Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/28720
Title: PyCoM: a python library for large-scale analysis of residue-residue coevolution data
Authors: Bibik, P; Alibai, S; Pandini, A; Dantu, SC
Issue Date: 26-Mar-2024
Publisher: Oxford University Press (OUP)
Citation: Bibik, P. et al. (2024) 'PyCoM: a python library for large-scale analysis of residue-residue coevolution data', Bioinformatics, 40 (4), btae166, pp. 1 - 4. doi: 10.1093/bioinformatics/btae166.
Abstract: Motivation: Computational methods to detect correlated amino acid positions in proteins have become a valuable tool to predict intra- and inter-residue protein contacts, protein structures, and effects of mutation on protein stability and function. While there are many tools and webservers to compute coevolution scoring matrices, there is no central repository of alignments and coevolution matrices for large-scale studies and pattern detection that leverages the structural and biological annotation already available in UniProt.
Results: We present a Python library, PyCoM, which enables users to query and analyse coevolution matrices and sequence alignments of 457,622 proteins (length ≤ 500 residues) selected from the UniProtKB/Swiss-Prot database and stored in a pre-compiled coevolution matrix database (PyCoMdb). PyCoM facilitates the development of statistical analyses of residue coevolution patterns using filters on structural and biological annotation from UniProtKB/Swiss-Prot, and gives both novice and advanced users simple access to PyCoMdb through Jupyter Notebooks, Python scripts, and a web API. The resource is open source and will help generate data-driven computational models and methods to study and understand protein structures, stability, function, and design.
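The abstract describes a workflow of selecting proteins (for example by the ≤ 500-residue length cut-off) and then analysing their coevolution matrices to find correlated positions that may indicate contacts. The sketch below illustrates that idea in plain Python under stated assumptions; it does not use or reproduce PyCoM's actual API. `ProteinRecord`, `select_proteins`, and `top_pairs` are hypothetical names, the matrix is assumed to be a symmetric L x L list of scores, and the minimum sequence separation of 6 is an assumed cut-off for ignoring trivially close residue pairs.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical illustration only: this is NOT the PyCoM API.


@dataclass
class ProteinRecord:
    """Hypothetical protein entry: accession, sequence, and an L x L coevolution matrix."""
    accession: str
    sequence: str
    matrix: List[List[float]]


def select_proteins(records: List[ProteinRecord], max_length: int = 500) -> List[ProteinRecord]:
    """Keep proteins of at most max_length residues (the length cut-off used for PyCoMdb)."""
    return [r for r in records if len(r.sequence) <= max_length]


def top_pairs(
    matrix: List[List[float]], n_pairs: int, min_separation: int = 6
) -> List[Tuple[int, int, float]]:
    """Return the n_pairs highest-scoring residue pairs (i, j, score) with i < j.

    Pairs fewer than min_separation positions apart in sequence are skipped,
    since neighbouring residues are trivially close in 3D (assumed cut-off).
    """
    size = len(matrix)
    pairs = [
        (i, j, matrix[i][j])
        for i in range(size)
        for j in range(i + min_separation, size)
    ]
    # Sort by score, highest first; the top entries are candidate contacts.
    pairs.sort(key=lambda p: p[2], reverse=True)
    return pairs[:n_pairs]


if __name__ == "__main__":
    # Toy symmetric 8 x 8 matrix with three non-zero scores.
    size = 8
    scores = [[0.0] * size for _ in range(size)]
    for i, j, s in [(0, 7, 0.9), (1, 7, 0.5), (0, 2, 0.95)]:
        scores[i][j] = scores[j][i] = s

    proteins = [
        ProteinRecord("TOY1", "ACDEFGHI", scores),
        ProteinRecord("TOY2", "A" * 501, [[0.0]]),  # dropped by the length filter
    ]
    for protein in select_proteins(proteins):
        print(protein.accession, top_pairs(protein.matrix, n_pairs=2))
```

Running it prints `TOY1 [(0, 7, 0.9), (1, 7, 0.5)]`: the 501-residue toy protein is filtered out, and the 0.95 score between positions 0 and 2 is ignored because that pair is only two residues apart in sequence.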
Description: The version currently archived on this institutional repository is an accepted manuscript: the PDF version of the authors' final manuscript, as accepted for publication by the journal but prior to copyediting or typesetting. Accepted manuscripts can be cited using the author(s), article title, journal title, year of online publication, and DOI. They will be replaced by the final typeset articles, which may therefore contain changes. The DOI will remain the same throughout.
Data availability: The data underpinning this publication can be accessed from Brunel University London's data repository under a CC BY license: the coevolution matrix database (https://brunel.figshare.com/articles/dataset/PyCoM_ProCoM_Database_of_coevolution_matrices/23735613) and the protein database (https://brunel.figshare.com/articles/dataset/PyCoM_ProCoM_Curated_UniProt_protein_database/23733309).
Availability and implementation: PyCoM code is freely available from https://github.com/scdantu/pycom, and PyCoMdb and the Jupyter Notebook tutorials are freely available from https://pycom.brunel.ac.uk.
Supplementary information: Supplementary data are available at Bioinformatics online: https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btae166/7635577#supplementary-data
URI: https://bura.brunel.ac.uk/handle/2438/28720
DOI: https://doi.org/10.1093/bioinformatics/btae166
Other Identifiers: ORCiD: Alessandro Pandini https://orcid.org/0000-0002-4158-233X
ORCiD: Sarath Chandra Dantu https://orcid.org/0000-0003-2019-5311
Appears in Collections: Dept of Computer Science Research Papers

Files in This Item:
File: FullText.pdf
Description: Copyright © The Author(s) 2024. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Size: 1.2 MB
Format: Adobe PDF


This item is licensed under a Creative Commons License.