Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/30258
Title: GENCODE 2025: reference gene annotation for human and mouse
Authors: Mudge, JM
Carbonell-Sala, S
Diekhans, M
Martinez, JG
Hunt, T
Jungreis, I
Loveland, JE
Arnan, C
Barnes, I
Bennett, R
Berry, A
Bignell, A
Cerdán-Vélez, D
Cochran, K
Cortés, LT
Davidson, C
Donaldson, S
Dursun, C
Fatima, R
Hardy, M
Hebbar, P
Hollis, Z
James, BT
Jiang, Y
Johnson, R
Kaur, G
Kay, M
Mangan, RJ
Maquedano, M
Gómez, LM
Mathlouthi, N
Merritt, R
Ni, P
Palumbo, E
Perteghella, T
Pozo, F
Raj, S
Sisu, C
Steed, E
Sumathipala, D
Suner, M-M
Uszczynska-Ratajczak, B
Wass, E
Yang, YT
Zhang, D
Finn, RD
Gerstein, M
Guigó, R
Hubbard, TJP
Kellis, M
Kundaje, A
Paten, B
Tress, ML
Birney, E
Martin, FJ
Frankish, A
Issue Date: 20-Nov-2024
Publisher: Oxford University Press
Citation: Mudge, J.M. et al (2024) 'GENCODE 2025: reference gene annotation for human and mouse', Nucleic Acids Research, 2024, 0 (ahead of print), gkae1078, pp. 1 - 10. doi: 10.1093/nar/gkae1078.
Abstract: GENCODE produces comprehensive reference gene annotation for human and mouse. Entering its twentieth year, the project remains highly active as new technologies and methodologies allow us to catalog the genome at ever-increasing granularity. In particular, long-read transcriptome sequencing enables us to identify large numbers of missing transcripts and to substantially improve existing models, and our long non-coding RNA catalogs have undergone a dramatic expansion and reconfiguration as a result. Meanwhile, we are incorporating data from state-of-the-art proteomics and Ribo-seq experiments to fine-tune our annotation of translated sequences, while further insights into function can be gained from multi-genome alignments that grow richer as more species’ genomes are sequenced. Such methodologies are combined into a fully integrated annotation workflow. However, the increasing complexity of our resources can present usability challenges, and we are resolving these with the creation of filtered genesets such as MANE Select and GENCODE Primary. The next challenge is to propagate annotations throughout multiple human and mouse genomes, as we enter the pangenome era. Our resources are freely available at our web portal www.gencodegenes.org, and via the Ensembl and UCSC genome browsers.
Description: Data availability: A new GENCODE release is produced up to four times each year for both human and mouse. Each release is made freely available immediately upon release from the Ensembl website (https://www.ensembl.org) and the GENCODE web portal (https://www.gencodegenes.org), with a release on the UCSC Genome Browser shortly after that (https://genome.ucsc.edu/). GENCODE is currently the default annotation in both genome browsers, and is embedded in numerous genomics and clinical projects. The current human release is GENCODE 47, and the current mouse release is GENCODE M36 (October 2024). Additional information and previous releases can be found at https://www.gencodegenes.org. MANE annotations are available from the Ensembl and RefSeq NCBI websites and can be viewed on both the Ensembl and UCSC genome browsers. To expedite public access to updated annotation between releases, all annotation changes are made freely available within 24 h via the ‘GENCODE Annotation Updates’ Track Hub, accessed at both the Ensembl and UCSC genome browsers. GENCODE has been designated a Global Core Biodata Resource by the Global Biodata Coalition. GENCODE produces the human and mouse gene annotation for the Ensembl project, in collaboration with Ensembl. Human 47 and mouse M36 are contained within Ensembl release e113. Programmatic access to the GENCODE gene sets is possible via the extensive Ensembl Perl API and the language-agnostic Ensembl REST API (50). Programmatic access facilitates advanced genome-wide analysis such as retrieval of supporting features and associated gene trees. Examples of REST endpoint usage and starter scripts in different languages are at https://rest.ensembl.org. Other interfaces include the Ensembl FTP site (ftp://ftp.ensembl.org/pub/), which includes gene sets in GFF3, Genbank and GTF formats and full download of the complete Ensembl databases.
GENCODE-specific training materials and GENCODE-focused workshops from the Ensembl Outreach team are available via the Ensembl Training portal (http://training.ensembl.org) and EMBL-EBI (https://www.ebi.ac.uk/training/on-demand), and are regularly presented at online and in-person training events. Further information on the results of the GENCODE CLS pipeline to produce a collection of full-length high-quality transcripts—including access to the human and mouse master tables of transcript models prior to full annotation—is available here: https://github.com/guigolab/gencode-cls-master-table. All raw transcriptomics data produced by GENCODE to support the CLS work have been uploaded to the ENCODE data repository (see https://www.encodeproject.org/about/data-access/) and will be made publicly available as part of a manuscript describing this work, currently in preparation. Our resources are freely available at our web portal, www.gencodegenes.org, and via the Ensembl (https://www.ensembl.org) and UCSC genome browsers (https://genome.ucsc.edu/).
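As a rough illustration of the programmatic access mentioned in the Description, the following Python sketch builds a request to the Ensembl REST server named there. It is not taken from the article: the function names are hypothetical, the `lookup/id` endpoint and its `expand` option are used as documented at https://rest.ensembl.org to the best of our knowledge, and the gene ID in the usage note is only an example.

```python
import json
import urllib.parse
import urllib.request

# Server named in the Description; everything below is an illustrative sketch.
REST_SERVER = "https://rest.ensembl.org"


def lookup_url(stable_id: str, expand: bool = False) -> str:
    """Build the URL of the REST 'lookup/id' endpoint for one stable ID."""
    query = {"content-type": "application/json"}
    if expand:
        # Assumption: expand=1 asks the server to include child features
        # (for a gene, its transcripts) in the response.
        query["expand"] = "1"
    path = urllib.parse.quote(stable_id)
    return f"{REST_SERVER}/lookup/id/{path}?{urllib.parse.urlencode(query)}"


def lookup(stable_id: str, expand: bool = False) -> dict:
    """Fetch the JSON record for a stable ID (requires network access)."""
    with urllib.request.urlopen(lookup_url(stable_id, expand), timeout=30) as response:
        return json.load(response)
```

For example, `lookup("ENSG00000139618")` would request the record for one human gene; the fields returned depend on the current Ensembl release, so none are listed here.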
URI: https://bura.brunel.ac.uk/handle/2438/30258
DOI: https://doi.org/10.1093/nar/gkae1078
ISSN: 0305-1048
Other Identifiers: ORCiD: Jonathan M Mudge https://orcid.org/0000-0003-4789-7495
ORCiD: Mark Diekhans https://orcid.org/0000-0002-0430-0989
ORCiD: Irwin Jungreis https://orcid.org/0000-0002-3197-5367
ORCiD: Jane E Loveland https://orcid.org/0000-0002-7669-2934
ORCiD: Carme Arnan https://orcid.org/0000-0002-7431-2088
ORCiD: Cristina Sisu https://orcid.org/0000-0001-9371-0797
ORCiD: Marie-Marthe Suner https://orcid.org/0000-0002-0380-7171
ORCiD: Robert D Finn https://orcid.org/0000-0001-8626-2148
ORCiD: Anshul Kundaje https://orcid.org/0000-0003-3084-2287
ORCiD: Benedict Paten https://orcid.org/0000-0001-8863-3539
ORCiD: Michael L Tress https://orcid.org/0000-0001-9046-6370
ORCiD: Ewan Birney https://orcid.org/0000-0001-8314-8497
ORCiD: Fergal J Martin https://orcid.org/0000-0002-1672-050X
ORCiD: Adam Frankish https://orcid.org/0000-0002-4333-628X
gkae1078
Appears in Collections: Dept of Life Sciences Research Papers

Files in This Item:
File: FullText.pdf
Description: Copyright © The Author(s) 2024. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Size: 1.18 MB
Format: Adobe PDF


This item is licensed under a Creative Commons License.