{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,8]],"date-time":"2026-01-08T18:20:16Z","timestamp":1767896416260,"version":"3.49.0"},"reference-count":32,"publisher":"Oxford University Press (OUP)","issue":"1","license":[{"start":{"date-parts":[[2024,12,13]],"date-time":"2024-12-13T00:00:00Z","timestamp":1734048000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000928","name":"Welch Foundation","doi-asserted-by":"publisher","award":["R01GM147367"],"award-info":[{"award-number":["R01GM147367"]}],"id":[{"id":"10.13039\/100000928","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,12,26]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Due to the breakthrough in protein structure prediction by AlphaFold, the scientific community has access to 200 million predicted protein structures with near-atomic accuracy from the AlphaFold protein structure DataBase (AFDB), covering nearly the entire protein universe. Segmenting these models into domains and classifying them into an evolutionary hierarchy hold tremendous potential for unraveling essential insights into protein function.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We introduce DPAM-AI, a Domain Parser for AlphaFold Models based on Artificial Intelligence. DPAM-AI utilizes a convolutional neural network trained with previously classified domains in the Evolutionary Classification Of protein Domains (ECOD) database. 
DPAM-AI integrates inter-residue distances, predicted aligned errors, and sequence and structural alignments to previously classified domains detected via sequence (HHsuite) and structural (Dali) similarity searches. DPAM-AI has demonstrated its power through rigorous tests, excelling in several benchmark sets compared to its predecessor, DPAM, and other recently published domain parsers, Merizo and Chainsaw. We applied DPAM-AI to representative AFDB models for proteins classified in Pfam. We obtained representative 3D structures for 18\u00a0487 (89%) of the 20\u00a0795 Pfam families. The remaining families either (i) belong to viral proteins that were excluded from AFDB or (ii) do not adopt globular 3D structures. Our structure-aware domain delineation uncovered a considerable fraction (15%) of Pfam domains containing multiple structural and evolutionary units and refined the boundaries for over half.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>Pfam and corresponding DPAM-AI domains are at http:\/\/prodata.swmed.edu\/DPAM-pfam\/. 
Our code is deposited at https:\/\/github.com\/Jsauce5p\/DPAM\/tree\/dpam_ai, and updates will be released through https:\/\/github.com\/CongLabCode\/DPAM.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btae740","type":"journal-article","created":{"date-parts":[[2024,12,14]],"date-time":"2024-12-14T02:48:58Z","timestamp":1734144538000},"source":"Crossref","is-referenced-by-count":2,"title":["DPAM-AI: a domain parser for AlphaFold models powered by artificial intelligence"],"prefix":"10.1093","volume":"41","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6335-1332","authenticated-orcid":false,"given":"Jesse","family":"Durham","sequence":"first","affiliation":[{"name":"Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center , Dallas, TX 75390,","place":["United States"]},{"name":"Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center , Dallas, TX 75390,","place":["United States"]},{"name":"Department of Biophysics, University of Texas Southwestern Medical Center , Dallas, TX 75390,","place":["United States"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4190-3065","authenticated-orcid":false,"given":"Jing","family":"Zhang","sequence":"additional","affiliation":[{"name":"Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center , Dallas, TX 75390,","place":["United States"]},{"name":"Harold C. 
Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center , Dallas, TX 75390,","place":["United States"]},{"name":"Department of Biophysics, University of Texas Southwestern Medical Center , Dallas, TX 75390,","place":["United States"]}]},{"given":"Richard D","family":"Schaeffer","sequence":"additional","affiliation":[{"name":"Department of Biophysics, University of Texas Southwestern Medical Center , Dallas, TX 75390,","place":["United States"]},{"name":"Department of Biochemistry, University of Texas Southwestern Medical Center , Dallas, TX 75390,","place":["United States"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8909-0414","authenticated-orcid":false,"given":"Qian","family":"Cong","sequence":"additional","affiliation":[{"name":"Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center , Dallas, TX 75390,","place":["United States"]},{"name":"Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center , Dallas, TX 75390,","place":["United States"]},{"name":"Department of Biophysics, University of Texas Southwestern Medical Center , Dallas, TX 75390,","place":["United States"]}]}],"member":"286","published-online":{"date-parts":[[2024,12,13]]},"reference":[{"key":"2025011021305339000_btae740-B1","doi-asserted-by":"crossref","first-page":"429","DOI":"10.1093\/bioinformatics\/btg006","article-title":"PDP: protein domain parser","volume":"19","author":"Alexandrov","year":"2003","journal-title":"Bioinformatics"},{"key":"2025011021305339000_btae740-B2","doi-asserted-by":"crossref","first-page":"751","DOI":"10.1042\/BST0370751","article-title":"The evolution of protein domain families","volume":"37","author":"Buljan","year":"2009","journal-title":"Biochem Soc Transac"},{"key":"2025011021305339000_btae740-B3","doi-asserted-by":"crossref","first-page":"421","DOI":"10.1186\/1471-2105-10-421","article-title":"BLAST+: architecture and 
applications","volume":"10","author":"Camacho","year":"2009","journal-title":"BMC Bioinformatics"},{"key":"2025011021305339000_btae740-B4","doi-asserted-by":"crossref","first-page":"e1003926","DOI":"10.1371\/journal.pcbi.1003926","article-title":"ECOD: an evolutionary classification of protein domains","volume":"10","author":"Cheng","year":"2014","journal-title":"PLoS Comput Biol"},{"key":"2025011021305339000_btae740-B5","doi-asserted-by":"crossref","first-page":"953","DOI":"10.1038\/nsb1101-953","article-title":"Identification of homology in protein structure classification","volume":"8","author":"Dietmann","year":"2001","journal-title":"Nat Struct Biol"},{"key":"2025011021305339000_btae740-B6","doi-asserted-by":"crossref","first-page":"527","DOI":"10.1016\/j.tibs.2023.03.003","article-title":"Recent advances in predicting and modeling protein-protein interactions","volume":"48","author":"Durham","year":"2023","journal-title":"Trends Biochem Sci"},{"key":"2025011021305339000_btae740-B7","doi-asserted-by":"crossref","first-page":"696","DOI":"10.1110\/ps.0233103","article-title":"Prediction of protein domain boundaries from sequence alone","volume":"12","author":"Galzitskaya","year":"2003","journal-title":"Protein Sci"},{"key":"2025011021305339000_btae740-B8","doi-asserted-by":"crossref","first-page":"5326","DOI":"10.1093\/bioinformatics\/btz536","article-title":"Benchmarking fold detection by DaliLite v.5","volume":"35","author":"Holm","year":"2019","journal-title":"Bioinformatics"},{"key":"2025011021305339000_btae740-B9","doi-asserted-by":"crossref","first-page":"256","DOI":"10.1002\/prot.340190309","article-title":"Parser for protein folding units","volume":"19","author":"Holm","year":"1994","journal-title":"Proteins"},{"key":"2025011021305339000_btae740-B10","doi-asserted-by":"crossref","first-page":"1711","DOI":"10.1002\/prot.26257","article-title":"Applying and improving AlphaFold at 
CASP14","volume":"89","author":"Jumper","year":"2021","journal-title":"Proteins"},{"key":"2025011021305339000_btae740-B11","doi-asserted-by":"crossref","first-page":"583","DOI":"10.1038\/s41586-021-03819-2","article-title":"Highly accurate protein structure prediction with AlphaFold","volume":"596","author":"Jumper","year":"2021","journal-title":"Nature"},{"key":"2025011021305339000_btae740-B12","doi-asserted-by":"crossref","first-page":"2577","DOI":"10.1002\/bip.360221211","article-title":"Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features","volume":"22","author":"Kabsch","year":"1983","journal-title":"Biopolymers"},{"key":"2025011021305339000_btae740-B13","doi-asserted-by":"crossref","first-page":"532","DOI":"10.1093\/nar\/gkg161","article-title":"Structural classification of zinc fingers: survey and summary","volume":"31","author":"Krishna","year":"2003","journal-title":"Nucleic Acids Res"},{"key":"2025011021305339000_btae740-B14","doi-asserted-by":"crossref","first-page":"8445","DOI":"10.1038\/s41467-023-43934-4","article-title":"Merizo: a rapid and accurate protein domain segmentation method using invariant point attention","volume":"14","author":"Lau","year":"2023","journal-title":"Nat Commun"},{"key":"2025011021305339000_btae740-B15","doi-asserted-by":"crossref","first-page":"2775","DOI":"10.1038\/s41467-024-46808-5","article-title":"PLMSearch: protein language model powers accurate and fast sequence search for remote homology","volume":"15","author":"Liu","year":"2024","journal-title":"Nat Commun"},{"key":"2025011021305339000_btae740-B16","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1007\/s10115-007-0088-0","article-title":"On the use of structure and sequence-based features for protein classification and retrieval","volume":"14","author":"Marsolo","year":"2008","journal-title":"Knowl Inf 
Syst"},{"key":"2025011021305339000_btae740-B17","doi-asserted-by":"crossref","first-page":"4301","DOI":"10.1093\/bioinformatics\/btac527","article-title":"Human mitochondrial protein complexes revealed by large-scale coevolution analysis and deep learning-based structure modeling","volume":"38","author":"Pei","year":"2022","journal-title":"Bioinformatics"},{"key":"2025011021305339000_btae740-B18","doi-asserted-by":"crossref","first-page":"vbad064","DOI":"10.1093\/bioadv\/vbad064","article-title":"Nightingale: web components for protein feature visualization","volume":"3","author":"Salazar","year":"2023","journal-title":"Bioinform Adv"},{"key":"2025011021305339000_btae740-B19","doi-asserted-by":"crossref","first-page":"e2214069120","DOI":"10.1073\/pnas.2214069120","article-title":"Classification of domains in predicted structures of the human proteome","volume":"120","author":"Schaeffer","year":"2023","journal-title":"Proc Natl Acad Sci USA"},{"key":"2025011021305339000_btae740-B20","doi-asserted-by":"crossref","first-page":"e45","DOI":"10.1002\/cpbi.45","article-title":"Searching ECOD for homologous domains by sequence and structure","volume":"61","author":"Schaeffer","year":"2018","journal-title":"Curr Protoc Bioinformatics"},{"key":"2025011021305339000_btae740-B21","doi-asserted-by":"crossref","first-page":"W431","DOI":"10.1093\/nar\/gkab314","article-title":"Mol* viewer: modern web app for 3D visualization and analysis of large biomolecular structures","volume":"49","author":"Sehnal","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2025011021305339000_btae740-B22","doi-asserted-by":"crossref","first-page":"D266","DOI":"10.1093\/nar\/gkaa1079","article-title":"CATH: increased structural coverage of functional space","volume":"49","author":"Sillitoe","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2025011021305339000_btae740-B23","doi-asserted-by":"crossref","first-page":"320","DOI":"10.1093\/nar\/26.1.320","article-title":"Pfam: multiple 
sequence alignments and HMM-profiles of protein domains","volume":"26","author":"Sonnhammer","year":"1998","journal-title":"Nucleic Acids Res"},{"key":"2025011021305339000_btae740-B24","doi-asserted-by":"crossref","first-page":"1026","DOI":"10.1038\/nbt.3988","article-title":"MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets","volume":"35","author":"Steinegger","year":"2017","journal-title":"Nat Biotechnol"},{"key":"2025011021305339000_btae740-B25","doi-asserted-by":"crossref","first-page":"473","DOI":"10.1186\/s12859-019-3019-7","article-title":"HH-suite3 for fast remote homology detection and deep protein annotation","volume":"20","author":"Steinegger","year":"2019","journal-title":"BMC Bioinformatics"},{"key":"2025011021305339000_btae740-B26","doi-asserted-by":"crossref","first-page":"243","DOI":"10.1038\/s41587-023-01773-0","article-title":"Fast and accurate protein structure search with foldseek","volume":"42","author":"van Kempen","year":"2024","journal-title":"Nat Biotechnol"},{"key":"2025011021305339000_btae740-B27","doi-asserted-by":"crossref","first-page":"D439","DOI":"10.1093\/nar\/gkab1061","article-title":"AlphaFold protein structure database: massively expanding the structural coverage of protein-sequence space with high-accuracy models","volume":"50","author":"Varadi","year":"2022","journal-title":"Nucleic Acids Res"},{"key":"2025011021305339000_btae740-B28","doi-asserted-by":"crossref","first-page":"208","DOI":"10.1016\/j.sbi.2004.03.011","article-title":"Structure, function and evolution of multidomain proteins","volume":"14","author":"Vogel","year":"2004","journal-title":"Curr Opin Struct Biol"},{"key":"2025011021305339000_btae740-B29","doi-asserted-by":"publisher","DOI":"10.1101\/2023.07.19.549732","article-title":"Chainsaw: protein domain segmentation with fully convolutional neural 
networks","author":"Wells","year":"2023"},{"key":"2025011021305339000_btae740-B30","doi-asserted-by":"crossref","first-page":"307","DOI":"10.1017\/S0033583503003901","article-title":"Prediction of protein function from protein sequence and structure","volume":"36","author":"Whisstock","year":"2003","journal-title":"Q Rev Biophys"},{"key":"2025011021305339000_btae740-B31","doi-asserted-by":"crossref","first-page":"709","DOI":"10.1038\/nrm2762","article-title":"Structural and functional constraints in the evolution of protein families","volume":"10","author":"Worth","year":"2009","journal-title":"Nat Rev Mol Cell Biol"},{"key":"2025011021305339000_btae740-B32","doi-asserted-by":"crossref","first-page":"e4548","DOI":"10.1002\/pro.4548","article-title":"DPAM: a domain parser for AlphaFold models","volume":"32","author":"Zhang","year":"2023","journal-title":"Protein Sci"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btae740\/61156925\/btae740.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/1\/btae740\/61156925\/btae740.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/1\/btae740\/61156925\/btae740.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,1,10]],"date-time":"2025-01-10T21:31:18Z","timestamp":1736544678000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btae740\/7923418"}},"subtitle":[],"editor":[{"given":"Xin","family":"Gao","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2024,12,13]]},"references-count":32,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2024,12,26]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btae740","relation":{},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025,1]]},"published":{"date-parts":[[2024,12,13]]},"article-number":"btae740"}}