{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T10:53:04Z","timestamp":1740135184346,"version":"3.37.3"},"reference-count":37,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2021,10,23]],"date-time":"2021-10-23T00:00:00Z","timestamp":1634947200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,10,23]],"date-time":"2021-10-23T00:00:00Z","timestamp":1634947200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Higher Education Commission, Mauritius"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2021,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec>\n                <jats:title>Background<\/jats:title>\n                <jats:p>The wealth of biological information available nowadays in public databases has triggered an unprecedented rise in multi-database search and data retrieval for obtaining detailed information about key functional and structural entities. This concerns investigations ranging from gene or genome analysis to protein structural analysis. However, the retrieval of interconnected data from a number of different databases is very often done repeatedly in an unsystematic way.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Results<\/jats:title>\n                <jats:p>Here, we present TAxonomy, Gene, Ontology, Protein, Structure INtegrated (TAGOPSIN), a command line program written in Java for rapid and systematic retrieval of select data from seven of the most popular public biological databases relevant to comparative genomics and protein structure studies. The program allows a user to retrieve organism-centred data and assemble them in a single data warehouse which constitutes a useful resource for several biological applications. TAGOPSIN was tested with a number of organisms encompassing eukaryotes, prokaryotes and viruses. For example, it successfully integrated data for about 17,000 UniProt entries of <jats:italic>Homo sapiens<\/jats:italic> and 21 UniProt entries of human coronavirus.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Conclusion<\/jats:title>\n                <jats:p>TAGOPSIN demonstrates efficient data integration whereby manipulation of interconnected data is more convenient than doing multi-database queries. The program facilitates for instance interspecific comparative analyses of protein-coding genes in a molecular evolutionary study, or identification of taxa-specific protein domains and three-dimensional structures. TAGOPSIN is available as a JAR file at <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/github.com\/ebundhoo\/TAGOPSIN\">https:\/\/github.com\/ebundhoo\/TAGOPSIN<\/jats:ext-link> and is released under the GNU General Public License.<\/jats:p>\n              <\/jats:sec>","DOI":"10.1186\/s12859-021-04429-5","type":"journal-article","created":{"date-parts":[[2021,10,23]],"date-time":"2021-10-23T18:02:25Z","timestamp":1635012145000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["TAGOPSIN: collating taxa-specific gene and protein functional and structural information"],"prefix":"10.1186","volume":"22","author":[{"given":"Eshan","family":"Bundhoo","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8273-6000","authenticated-orcid":false,"given":"Anisah W.","family":"Ghoorah","sequence":"additional","affiliation":[]},{"given":"Yasmina","family":"Jaufeerally-Fakim","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2021,10,23]]},"reference":[{"issue":"D1","key":"4429_CR1","doi-asserted-by":"publisher","first-page":"84","DOI":"10.1093\/nar\/gkz899","volume":"48","author":"EW Sayers","year":"2020","unstructured":"Sayers EW, Cavanaugh M, Clark K, Ostell J, Pruitt KD, Karsch-Mizrachi I. GenBank. Nucleic Acids Res. 2020;48(D1):84\u20136.","journal-title":"Nucleic Acids Res"},{"issue":"D1","key":"4429_CR2","doi-asserted-by":"publisher","first-page":"506","DOI":"10.1093\/nar\/gky1049","volume":"47","author":"The UniProt Consortium","year":"2019","unstructured":"The UniProt Consortium. UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res. 2019;47(D1):506\u201315.","journal-title":"Nucleic Acids Res"},{"issue":"D1","key":"4429_CR3","doi-asserted-by":"publisher","first-page":"464","DOI":"10.1093\/nar\/gky1004","volume":"47","author":"SK Burley","year":"2019","unstructured":"Burley SK, Berman HM, Bhikadiya C, Bi C, Chen L, Di Costanzo L, Christie C, Dalenberg K, Duarte JM, Dutta S, Feng Z, Ghosh S, Goodsell DS, Green RK, Guranovic V, Guzenko D, Hudson BP, Kalro T, Liang Y, Lowe R, Namkoong H, Peisach E, Periskova I, Prlic A, Randle C, Rose A, Rose P, Sala R, Sekharan M, Shao C, Tan L, Tao YP, Valasatava Y, Voigt M, Westbrook J, Woo J, Yang H, Young J, Zhuravleva M, Zardecki C. RCSB Protein Data Bank: biological macromolecular structures enabling research and education in fundamental biology, biomedicine, biotechnology and energy. Nucleic Acids Res. 2019;47(D1):464\u201374.","journal-title":"Nucleic Acids Res"},{"issue":"D1","key":"4429_CR4","doi-asserted-by":"publisher","first-page":"427","DOI":"10.1093\/nar\/gky995","volume":"47","author":"S El-Gebali","year":"2019","unstructured":"El-Gebali S, Mistry J, Bateman A, Eddy SR, Luciani A, Potter SC, Qureshi M, Richardson LJ, Salazar GA, Smart A, Sonnhammer ELL, Hirsh L, Paladin L, Piovesan D, Tosatto SCE, Finn RD. The Pfam protein families database in 2019. Nucleic Acids Res. 2019;47(D1):427\u201332.","journal-title":"Nucleic Acids Res"},{"issue":"4","key":"4429_CR5","doi-asserted-by":"publisher","first-page":"1077","DOI":"10.1093\/gbe\/evz049","volume":"11","author":"IM Velsko","year":"2019","unstructured":"Velsko IM, Perez MS, Richards VP. Resolving phylogenetic relationships for Streptococcus mitis and Streptococcus oralis through core- and pan-genome analyses. Genome Biol Evol. 2019;11(4):1077\u201387.","journal-title":"Genome Biol Evol"},{"issue":"9","key":"4429_CR6","doi-asserted-by":"publisher","first-page":"2255","DOI":"10.1093\/gbe\/evy178","volume":"10","author":"C Liu","year":"2018","unstructured":"Liu C, Wright B, Allen-Vercoe E, Gu H, Beiko R. Phylogenetic clustering of genes reveals shared evolutionary trajectories and putative gene functions. Genome Biol Evol. 2018;10(9):2255\u201365.","journal-title":"Genome Biol Evol"},{"key":"4429_CR7","doi-asserted-by":"publisher","first-page":"2753","DOI":"10.3389\/fmicb.2018.02753","volume":"9","author":"R Coates-Brown","year":"2018","unstructured":"Coates-Brown R, Moran JC, Pongchaikul P, Darby AC, Horsburgh MJ. Comparative genomics of staphylococcus reveals determinants of speciation and diversification of antimicrobial defense. Front Microbiol. 2018;9:2753.","journal-title":"Front Microbiol"},{"issue":"1","key":"4429_CR8","doi-asserted-by":"publisher","first-page":"1437","DOI":"10.1038\/s41598-018-19944-4","volume":"8","author":"S Sandhaus","year":"2018","unstructured":"Sandhaus S, Chapagain PP, Tse-Dinh YC. Discovery of novel bacterial topoisomerase I inhibitors by use of in silico docking and in vitro assays. Sci Rep. 2018;8(1):1437.","journal-title":"Sci Rep"},{"issue":"10","key":"4429_CR9","doi-asserted-by":"publisher","first-page":"2217","DOI":"10.3390\/ijms18102217","volume":"18","author":"G Nitulescu","year":"2017","unstructured":"Nitulescu G, Nicorescu IM, Olaru OT, Ungurianu A, Mihai DP, Zanfirescu A, Nitulescu GM, Margina D. Molecular docking and screening studies of new natural sortase A inhibitors. Int J Mol Sci. 2017;18(10):2217.","journal-title":"Int J Mol Sci"},{"issue":"12","key":"4429_CR10","doi-asserted-by":"publisher","first-page":"0168035","DOI":"10.1371\/journal.pone.0168035","volume":"11","author":"D Talens-Perales","year":"2016","unstructured":"Talens-Perales D, Gorska A, Huson DH, Polaina J, Marin-Navarro J. Analysis of domain architecture and phylogenetics of family 2 glycoside hydrolases (GH2). PLoS ONE. 2016;11(12):0168035.","journal-title":"PLoS ONE"},{"issue":"1","key":"4429_CR11","first-page":"00744","volume":"5","author":"NF Goodacre","year":"2013","unstructured":"Goodacre NF, Gerloff DL, Uetz P. Protein domains of unknown function are essential in bacteria. MBio. 2013;5(1):00744\u201300713.","journal-title":"MBio"},{"issue":"13","key":"4429_CR12","doi-asserted-by":"publisher","first-page":"00046-18","DOI":"10.1128\/JB.00046-18","volume":"200","author":"MA Jorgenson","year":"2018","unstructured":"Jorgenson MA, Young KD. YtfB, an OapA domain-containing protein, is a new cell division protein in Escherichia coli. J Bacteriol. 2018;200(13):00046\u201318.","journal-title":"J Bacteriol"},{"issue":"D1","key":"4429_CR13","doi-asserted-by":"publisher","first-page":"9","DOI":"10.1093\/nar\/gkz899","volume":"48","author":"EW Sayers","year":"2020","unstructured":"Sayers EW, Beck J, Brister JR, Bolton EE, Canese K, Comeau DC, Funk K, Ketter A, Kim S, Kimchi A, Kitts PA, Kuznetsov A, Lathrop S, Lu Z, McGarvey K, Madden TL, Murphy TD, O\u2019Leary N, Phan L, Schneider VA, Thibaud-Nissen F, Trawick BW, Pruitt KD, Ostell J. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2020;48(D1):9\u201316.","journal-title":"Nucleic Acids Res"},{"issue":"D1","key":"4429_CR14","doi-asserted-by":"publisher","first-page":"482","DOI":"10.1093\/nar\/gky1114","volume":"47","author":"JM Dana","year":"2019","unstructured":"Dana JM, Gutmanas A, Tyagi N, Qi G, O\u2019Donovan C, Martin M, Velankar S. SIFTS: updated Structure Integration with Function, Taxonomy and Sequences resource allows 40-fold increase in coverage of structure-based annotations for proteins. Nucleic Acids Res. 2019;47(D1):482\u20139.","journal-title":"Nucleic Acids Res"},{"key":"4429_CR15","doi-asserted-by":"publisher","first-page":"141","DOI":"10.1016\/S0076-6879(96)66012-1","volume":"266","author":"GD Schuler","year":"1996","unstructured":"Schuler GD, Epstein JA, Ohkawa H, Kans JA. Entrez: molecular biology database and retrieval system. Meth Enzymol. 1996;266:141\u201362.","journal-title":"Meth Enzymol"},{"issue":"D1","key":"4429_CR16","doi-asserted-by":"publisher","first-page":"520","DOI":"10.1093\/nar\/gky949","volume":"47","author":"wwPDB consortium","year":"2019","unstructured":"wwPDB consortium. Protein Data Bank: the single global archive for 3D macromolecular structure data. Nucleic Acids Res. 2019;47(D1):520\u20138.","journal-title":"Nucleic Acids Res"},{"issue":"D1","key":"4429_CR17","doi-asserted-by":"publisher","first-page":"330","DOI":"10.1093\/nar\/gky1055","volume":"47","author":"The Gene Ontology Consortium","year":"2019","unstructured":"The Gene Ontology Consortium. The Gene Ontology resource: 20 years and still GOing strong. Nucleic Acids Res. 2019;47(D1):330\u20138.","journal-title":"Nucleic Acids Res"},{"key":"4429_CR18","doi-asserted-by":"publisher","first-page":"52","DOI":"10.1186\/1745-6150-5-52","volume":"5","author":"A Termanini","year":"2010","unstructured":"Termanini A, Tieri P, Franceschi C. Encoding the states of interacting proteins to facilitate biological pathways reconstruction. Biol Direct. 2010;5:52.","journal-title":"Biol Direct"},{"key":"4429_CR19","doi-asserted-by":"publisher","first-page":"13210","DOI":"10.1038\/srep13210","volume":"5","author":"H Luo","year":"2015","unstructured":"Luo H, Gao F, Lin Y. Evolutionary conservation analysis between the essential and nonessential genes in bacterial genomes. Sci Rep. 2015;5:13210.","journal-title":"Sci Rep"},{"key":"4429_CR20","doi-asserted-by":"publisher","first-page":"15328","DOI":"10.1038\/srep15328","volume":"5","author":"W Hong","year":"2015","unstructured":"Hong W, Wang Y, Chang Z, Yang Y, Pu J, Sun T, Kaur S, Sacchettini JC, Jung H, Lin Wong W, Fah Yap L, Fong Ngeow Y, Paterson IC, Wang H. The identification of novel Mycobacterium tuberculosis DHFR inhibitors and the investigation of their binding preferences by using molecular modelling. Sci Rep. 2015;5:15328.","journal-title":"Sci Rep"},{"issue":"Database issue","key":"4429_CR21","doi-asserted-by":"publisher","first-page":"136","DOI":"10.1093\/nar\/gkr1178","volume":"40","author":"S Federhen","year":"2012","unstructured":"Federhen S. The NCBI Taxonomy database. Nucleic Acids Res. 2012;40(Database issue):136\u201343.","journal-title":"Nucleic Acids Res"},{"issue":"D1","key":"4429_CR22","doi-asserted-by":"publisher","first-page":"851","DOI":"10.1093\/nar\/gkx1068","volume":"46","author":"DH Haft","year":"2018","unstructured":"Haft DH, DiCuccio M, Badretdin A, Brover V, Chetvernin V, O\u2019Neill K, Li W, Chitsaz F, Derbyshire MK, Gonzales NR, Gwadz M, Lu F, Marchler GH, Song JS, Thanki N, Yamashita RA, Zheng C, Thibaud-Nissen F, Geer LY, Marchler-Bauer A, Pruitt KD. RefSeq: an update on prokaryotic genome annotation and curation. Nucleic Acids Res. 2018;46(D1):851\u201360.","journal-title":"Nucleic Acids Res"},{"issue":"1","key":"4429_CR23","doi-asserted-by":"publisher","first-page":"160","DOI":"10.1101\/gr.1645104","volume":"14","author":"A Kasprzyk","year":"2004","unstructured":"Kasprzyk A, Keefe D, Smedley D, London D, Spooner W, Melsopp C, Hammond M, Rocca-Serra P, Cox T, Birney E. EnsMart: a generic system for fast and flexible access to biological data. Genome Res. 2004;14(1):160\u20139.","journal-title":"Genome Res"},{"key":"4429_CR24","doi-asserted-by":"publisher","first-page":"34","DOI":"10.1186\/1471-2105-6-34","volume":"6","author":"SP Shah","year":"2005","unstructured":"Shah SP, Huang Y, Xu T, Yuen MM, Ling J, Ouellette BF. Atlas\u2013a data warehouse for integrative bioinformatics. BMC Bioinform. 2005;6:34.","journal-title":"BMC Bioinform"},{"key":"4429_CR25","doi-asserted-by":"publisher","first-page":"81","DOI":"10.1186\/1471-2105-6-81","volume":"6","author":"S Trissl","year":"2005","unstructured":"Trissl S, Rother K, Muller H, Steinke T, Koch I, Preissner R, Frommel C, Leser U. Columba: an integrated database of proteins, structures, and annotations. BMC Bioinform. 2005;6:81.","journal-title":"BMC Bioinform"},{"issue":"Database issue","key":"4429_CR26","doi-asserted-by":"publisher","first-page":"235","DOI":"10.1093\/nar\/gkj153","volume":"34","author":"A Birkland","year":"2006","unstructured":"Birkland A, Yona G. BIOZON: a hub of heterogeneous biological data. Nucleic Acids Res. 2006;34(Database issue):235\u201342.","journal-title":"Nucleic Acids Res"},{"key":"4429_CR27","doi-asserted-by":"publisher","first-page":"170","DOI":"10.1186\/1471-2105-7-170","volume":"7","author":"TJ Lee","year":"2006","unstructured":"Lee TJ, Pouliot Y, Wagner V, Gupta P, Stringer-Calvert DW, Tenenbaum JD, Karp PD. BioWarehouse: a bioinformatics database warehouse toolkit. BMC Bioinform. 2006;7:170.","journal-title":"BMC Bioinform"},{"key":"4429_CR28","doi-asserted-by":"publisher","first-page":"051","DOI":"10.1093\/database\/bat051","volume":"2013","author":"R Vera","year":"2013","unstructured":"Vera R, Perez-Riverol Y, Perez S, Ligeti B, Kertesz-Farkas A, Pongor S. JBioWH: an open-source Java framework for bioinformatics data integration. Database (Oxford). 2013;2013:051.","journal-title":"Database (Oxford)"},{"key":"4429_CR29","doi-asserted-by":"publisher","DOI":"10.1101\/016758","author":"P Pareja-Tobes","year":"2015","unstructured":"Pareja-Tobes P, Tobes R, Manrique M, Pareja E, Pareja-Tobes E. Bio4j: a high-performance cloud-enabled graph-based data platform. bioRxiv. 2015. https:\/\/doi.org\/10.1101\/016758.","journal-title":"bioRxiv"},{"key":"4429_CR30","unstructured":"Neo4j, Inc.: Neo4j Graph Platform\u2014the Leader in Graph Databases. https:\/\/neo4j.com Accessed 10 Oct 2019"},{"issue":"1","key":"4429_CR31","first-page":"025","volume":"2017","author":"TE Putman","year":"2017","unstructured":"Putman TE, Lelong S, Burgstaller-Muehlbacher S, Waagmeester A, Diesh C, Dunn N, Munoz-Torres M, Stupp GS, Wu C, Su AI, Good BM. WikiGenomes: an open web application for community consumption and curation of gene annotation data in Wikidata. Database (Oxford). 2017;2017(1):025.","journal-title":"Database (Oxford)."},{"issue":"1","key":"4429_CR32","doi-asserted-by":"publisher","first-page":"19","DOI":"10.5808\/GI.2017.15.1.19","volume":"15","author":"BH Yoon","year":"2017","unstructured":"Yoon BH, Kim SK, Kim SY. Use of graph database for the integration of heterogeneous biological data. Genomics Inform. 2017;15(1):19\u201327.","journal-title":"Genomics Inform"},{"key":"4429_CR33","doi-asserted-by":"publisher","first-page":"23","DOI":"10.1186\/s13040-016-0102-8","volume":"9","author":"A Lysenko","year":"2016","unstructured":"Lysenko A, Roznov\u01ce\u1e6d IA, Saqi M, Mazein A, Rawlings CJ, Auffray C. Representing and querying disease networks using graph databases. BioData Min. 2016;9:23.","journal-title":"BioData Min"},{"key":"4429_CR34","doi-asserted-by":"publisher","first-page":"882","DOI":"10.1186\/1471-2164-15-882","volume":"15","author":"J Bohlin","year":"2014","unstructured":"Bohlin J, Brynildsrud OB, Sekse C, Snipen L. An evolutionary analysis of genome expansion and pathogenicity in Escherichia coli. BMC Genomics. 2014;15:882.","journal-title":"BMC Genomics"},{"issue":"1","key":"4429_CR35","doi-asserted-by":"publisher","first-page":"196","DOI":"10.1186\/s12864-019-5568-6","volume":"20","author":"VK Sharma","year":"2019","unstructured":"Sharma VK, Akavaram S, Schaut RG, Bayles DO. Comparative genomics reveals structural and functional features specific to the genome of a foodborne Escherichia coli O157:H7. BMC Genomics. 2019;20(1):196.","journal-title":"BMC Genomics"},{"issue":"18","key":"4429_CR36","doi-asserted-by":"publisher","first-page":"4348","DOI":"10.3390\/ijms20184348","volume":"20","author":"D Latek","year":"2019","unstructured":"Latek D, Langer I, Krzysko K, Charzewski L. A molecular dynamics study of vasoactive intestinal peptide receptor 1 and the basis of its therapeutic antagonism. Int J Mol Sci. 2019;20(18):4348.","journal-title":"Int J Mol Sci"},{"issue":"D1","key":"4429_CR37","doi-asserted-by":"publisher","first-page":"353","DOI":"10.1093\/nar\/gkw1092","volume":"45","author":"M Kanehisa","year":"2017","unstructured":"Kanehisa M, Furumichi M, Tanabe M, Sato Y, Morishima K. KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Res. 2017;45(D1):353\u201361.","journal-title":"Nucleic Acids Res"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-021-04429-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s12859-021-04429-5\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-021-04429-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,10,23]],"date-time":"2021-10-23T18:02:39Z","timestamp":1635012159000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-021-04429-5"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,10,23]]},"references-count":37,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2021,12]]}},"alternative-id":["4429"],"URL":"https:\/\/doi.org\/10.1186\/s12859-021-04429-5","relation":{},"ISSN":["1471-2105"],"issn-type":[{"type":"electronic","value":"1471-2105"}],"subject":[],"published":{"date-parts":[[2021,10,23]]},"assertion":[{"value":"13 November 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"6 October 2021","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"23 October 2021","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare that they have no competing interests.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"517"}}