Please use this identifier to cite or link to this item:
http://bura.brunel.ac.uk/handle/2438/30258
Title: | GENCODE 2025: reference gene annotation for human and mouse |
Authors: | Mudge, JM Carbonell-Sala, S Diekhans, M Martinez, JG Hunt, T Jungreis, I Loveland, JE Arnan, C Barnes, I Bennett, R Berry, A Bignell, A Cerdán-Vélez, D Cochran, K Cortés, LT Davidson, C Donaldson, S Dursun, C Fatima, R Hardy, M Hebbar, P Hollis, Z James, BT Jiang, Y Johnson, R Kaur, G Kay, M Mangan, RJ Maquedano, M Gómez, LM Mathlouthi, N Merritt, R Ni, P Palumbo, E Perteghella, T Pozo, F Raj, S Sisu, C Steed, E Sumathipala, D Suner, M-M Uszczynska-Ratajczak, B Wass, E Yang, YT Zhang, D Finn, RD Gerstein, M Guigó, R Hubbard, TJP Kellis, M Kundaje, A Paten, B Tress, ML Birney, E Martin, FJ Frankish, A |
Issue Date: | 20-Nov-2024 |
Publisher: | Oxford University Press |
Citation: | Mudge, J.M. et al. (2024) 'GENCODE 2025: reference gene annotation for human and mouse', Nucleic Acids Research, 2024, 0 (ahead of print), gkae1078, pp. 1 - 10. doi: 10.1093/nar/gkae1078. |
Abstract: | GENCODE produces comprehensive reference gene annotation for human and mouse. Entering its twentieth year, the project remains highly active as new technologies and methodologies allow us to catalog the genome at ever-increasing granularity. In particular, long-read transcriptome sequencing enables us to identify large numbers of missing transcripts and to substantially improve existing models, and our long non-coding RNA catalogs have undergone a dramatic expansion and reconfiguration as a result. Meanwhile, we are incorporating data from state-of-the-art proteomics and Ribo-seq experiments to fine-tune our annotation of translated sequences, while further insights into function can be gained from multi-genome alignments that grow richer as more species’ genomes are sequenced. Such methodologies are combined into a fully integrated annotation workflow. However, the increasing complexity of our resources can present usability challenges, and we are resolving these with the creation of filtered genesets such as MANE Select and GENCODE Primary. The next challenge is to propagate annotations throughout multiple human and mouse genomes, as we enter the pangenome era. Our resources are freely available at our web portal www.gencodegenes.org, and via the Ensembl and UCSC genome browsers. |
Description: | Data availability: A new GENCODE release is produced up to four times each year for both human and mouse. Each release is made freely available immediately from the Ensembl website (https://www.ensembl.org) and the GENCODE web portal (https://www.gencodegenes.org), with a release on the UCSC Genome Browser (https://genome.ucsc.edu/) shortly afterwards. GENCODE is currently the default annotation in both genome browsers, and is embedded in numerous genomics and clinical projects. The current human release is GENCODE 47, and the current mouse release is GENCODE M36 (October 2024). Additional information and previous releases can be found at https://www.gencodegenes.org. MANE annotations are available from the Ensembl and RefSeq NCBI websites and can be viewed on both the Ensembl and UCSC genome browsers. To expedite public access to updated annotation between releases, all annotation changes are made freely available within 24 h via the ‘GENCODE Annotation Updates’ Track Hub, accessed at both the Ensembl and UCSC genome browsers. GENCODE has been designated a Global Core Biodata Resource by the Global Biodata Coalition. GENCODE produces the human and mouse gene annotation for the Ensembl project, in collaboration with Ensembl. Human 47 and mouse M36 are contained within Ensembl release e113. Programmatic access to the GENCODE gene sets is possible via the extensive Ensembl Perl API and the language-agnostic Ensembl REST API (50). Programmatic access facilitates advanced genome-wide analysis such as retrieval of supporting features and associated gene trees. Examples of REST endpoint usage and starter scripts in different languages are at https://rest.ensembl.org. Other interfaces include the Ensembl FTP site (ftp://ftp.ensembl.org/pub/), which includes gene sets in GFF3, Genbank and GTF formats and full download of the complete Ensembl databases. 
GENCODE-specific training materials and GENCODE-focused workshops from the Ensembl Outreach team are available via the Ensembl Training portal (http://training.ensembl.org) and EMBL-EBI (https://www.ebi.ac.uk/training/on-demand), and are regularly presented at online and in-person training events. Further information on the results of the GENCODE CLS pipeline to produce a collection of full-length high-quality transcripts—including access to the human and mouse master tables of transcript models prior to full annotation—is available here: https://github.com/guigolab/gencode-cls-master-table. All raw transcriptomics data produced by GENCODE to support the CLS work have been uploaded to the ENCODE data repository (see https://www.encodeproject.org/about/data-access/) and will be made publicly available as part of a manuscript describing this work, currently in preparation. Our resources are freely available at our web portal, www.gencodegenes.org, and via the Ensembl (https://www.ensembl.org) and UCSC genome browsers (https://genome.ucsc.edu/). |
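As an illustration of the programmatic access described above, the following is a minimal sketch of a query against the Ensembl REST API's `lookup/id` endpoint using only the Python standard library. It is not taken from the article: the helper names `build_lookup_url` and `lookup_gene` are hypothetical, and the gene identifier in the usage note is just a commonly used example.

```python
import json
import urllib.parse
import urllib.request

# Public Ensembl REST server named in the record above.
REST_SERVER = "https://rest.ensembl.org"


def build_lookup_url(stable_id, expand=False):
    """Build the URL for the `lookup/id` endpoint (hypothetical helper).

    With expand=True the request asks the server to include child
    features (for a gene, its transcripts) in the response.
    """
    url = f"{REST_SERVER}/lookup/id/{urllib.parse.quote(stable_id)}"
    if expand:
        url += "?" + urllib.parse.urlencode({"expand": 1})
    return url


def lookup_gene(stable_id, expand=False, timeout=30):
    """Fetch the JSON record for an Ensembl stable ID (hypothetical helper)."""
    request = urllib.request.Request(
        build_lookup_url(stable_id, expand),
        # Ask the server for a JSON response.
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, `lookup_gene("ENSG00000139618", expand=True)` should return a dictionary describing the human BRCA2 gene. The exact fields present depend on the current Ensembl release, so code consuming the response would be safer reading them with `dict.get` than assuming a fixed layout.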
URI: | https://bura.brunel.ac.uk/handle/2438/30258 |
DOI: | https://doi.org/10.1093/nar/gkae1078 |
ISSN: | 0305-1048 |
Other Identifiers: | ORCiD: Jonathan M Mudge https://orcid.org/0000-0003-4789-7495; ORCiD: Mark Diekhans https://orcid.org/0000-0002-0430-0989; ORCiD: Irwin Jungreis https://orcid.org/0000-0002-3197-5367; ORCiD: Jane E Loveland https://orcid.org/0000-0002-7669-2934; ORCiD: Carme Arnan https://orcid.org/0000-0002-7431-2088; ORCiD: Cristina Sisu https://orcid.org/0000-0001-9371-0797; ORCiD: Marie-Marthe Suner https://orcid.org/0000-0002-0380-7171; ORCiD: Robert D Finn https://orcid.org/0000-0001-8626-2148; ORCiD: Anshul Kundaje https://orcid.org/0000-0003-3084-2287; ORCiD: Benedict Paten https://orcid.org/0000-0001-8863-3539; ORCiD: Michael L Tress https://orcid.org/0000-0001-9046-6370; ORCiD: Ewan Birney https://orcid.org/0000-0001-8314-8497; ORCiD: Fergal J Martin https://orcid.org/0000-0002-1672-050X; ORCiD: Adam Frankish https://orcid.org/0000-0002-4333-628X; Article number: gkae1078 |
Appears in Collections: | Dept of Life Sciences Research Papers |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
FullText.pdf | Copyright © The Author(s) 2024. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. | 1.18 MB | Adobe PDF | View/Open |
This item is licensed under a Creative Commons License