{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,13]],"date-time":"2026-04-13T11:02:51Z","timestamp":1776078171075,"version":"3.50.1"},"reference-count":52,"publisher":"Oxford University Press (OUP)","issue":"Supplement_1","license":[{"start":{"date-parts":[[2025,7,15]],"date-time":"2025-07-15T00:00:00Z","timestamp":1752537600000},"content-version":"vor","delay-in-days":14,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"European Partnership under Joint Call 2022"},{"DOI":"10.13039\/501100000780","name":"European Commission","doi-asserted-by":"publisher","award":["GA N\u00b0101069750"],"award-info":[{"award-number":["GA N\u00b0101069750"]}],"id":[{"id":"10.13039\/501100000780","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Norway, France, Czech Republic, and Saxon"},{"DOI":"10.13039\/100020806","name":"ANR","doi-asserted-by":"publisher","award":["ANR-23-CETP-0002"],"award-info":[{"award-number":["ANR-23-CETP-0002"]}],"id":[{"id":"10.13039\/100020806","id-type":"DOI","asserted-by":"publisher"}]},{"name":"International Society for Computational Biology"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,7,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Taxonomic analysis of environmental microbial communities is now routinely performed thanks to advances in DNA sequencing. Determining the role of these communities in global biogeochemical cycles requires the identification of their metabolic functions, such as hydrogen oxidation, sulfur reduction, and carbon fixation. These functions can be directly inferred from metagenomics data, but in many environmental applications metabarcoding is still the method of choice. 
The reconstruction of metabolic functions from metabarcoding data and their integration into coarse-grained representations of biogeochemical cycles remain a difficult bioinformatics problem.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We developed a pipeline, called Tabigecy, which exploits taxonomic affiliations to predict metabolic functions constituting biogeochemical cycles. In a first step, Tabigecy uses the tool EsMeCaTa to predict consensus proteomes from input affiliations. To optimize this process, we generated a precomputed database containing information about 2404 taxa from UniProt. The consensus proteomes are searched using bigecyhmm, a newly developed Python package relying on Hidden Markov Models to identify key enzymes involved in the metabolic functions of biogeochemical cycles. The metabolic functions are then projected onto a coarse-grained representation of the cycles. We applied Tabigecy to two salt cavern datasets and validated its predictions with microbial activity and hydrochemistry measurements performed on the samples. The results highlight the utility of the approach to investigate the impact of microbial communities on biogeochemical processes.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>The Tabigecy pipeline is available at https:\/\/github.com\/ArnaudBelcour\/tabigecy. 
The Python package bigecyhmm and the precomputed EsMeCaTa database are also separately available at https:\/\/github.com\/ArnaudBelcour\/bigecyhmm and https:\/\/doi.org\/10.5281\/zenodo.13354073, respectively.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaf230","type":"journal-article","created":{"date-parts":[[2025,4,28]],"date-time":"2025-04-28T07:38:25Z","timestamp":1745825905000},"page":"i49-i57","source":"Crossref","is-referenced-by-count":2,"title":["Predicting coarse-grained representations of biogeochemical cycles from metabarcoding data"],"prefix":"10.1093","volume":"41","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1170-0785","authenticated-orcid":false,"given":"Arnaud","family":"Belcour","sequence":"first","affiliation":[{"name":"Univ. Grenoble Alpes, Inria , 38000 Grenoble,","place":["France"]},{"name":"Universit\u00e9 Grenoble Alpes, CNRS, LIPhy , 38000 Grenoble,","place":["France"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Loris","family":"Megy","sequence":"additional","affiliation":[{"name":"Gricad, Inria, CNRS, Universit\u00e9 Grenoble Alpes, Grenoble INP , 38000 Grenoble,","place":["France"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sylvain","family":"Stephant","sequence":"additional","affiliation":[{"name":"French Geological Survey (BRGM) , 45060 Orl\u00e9ans,","place":["France"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Caroline","family":"Michel","sequence":"additional","affiliation":[{"name":"French Geological Survey (BRGM) , 45060 Orl\u00e9ans,","place":["France"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"S\u00e9tareh","family":"Rad","sequence":"additional","affiliation":[{"name":"French Geological Survey (BRGM) , 45060 Orl\u00e9ans,","place":["France"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Petra","family":"Bombach","sequence":"additional","affiliation":[{"name":"Isodetect GmbH , 04103 
Leipzig,","place":["Germany"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Nicole","family":"Dopffel","sequence":"additional","affiliation":[{"name":"NORCE Norwegian Research Center AS , 5008 Bergen,","place":["Norway"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hidde","family":"de Jong","sequence":"additional","affiliation":[{"name":"Univ. Grenoble Alpes, Inria , 38000 Grenoble,","place":["France"]},{"name":"Universit\u00e9 Grenoble Alpes, CNRS, LIPhy , 38000 Grenoble,","place":["France"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Delphine","family":"Ropers","sequence":"additional","affiliation":[{"name":"Univ. Grenoble Alpes, Inria , 38000 Grenoble,","place":["France"]},{"name":"Universit\u00e9 Grenoble Alpes, CNRS, LIPhy , 38000 Grenoble,","place":["France"]}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2025,7,15]]},"reference":[{"key":"2025071509030091200_btaf230-B1","doi-asserted-by":"crossref","first-page":"W83","DOI":"10.1093\/nar\/gkae410","article-title":"The galaxy platform for accessible, reproducible, and collaborative data analyses: 2024 update","volume":"52","author":"Abueg","year":"2024","journal-title":"Nucleic Acids Res"},{"key":"2025071509030091200_btaf230-B2","doi-asserted-by":"crossref","first-page":"2629","DOI":"10.1128\/JB.186.9.2629-2635.2004","article-title":"Divergence and redundancy of 16S rRNA sequences in genomes with multiple rrn operons","volume":"186","author":"Acinas","year":"2004","journal-title":"J Bacteriol"},{"key":"2025071509030091200_btaf230-B3","doi-asserted-by":"crossref","first-page":"147","DOI":"10.1038\/s41579-024-01110-5","article-title":"Anthropogenic impacts on the terrestrial subsurface biosphere","volume":"23","author":"Amundson","year":"2025","journal-title":"Nat Rev 
Microbiol"},{"key":"2025071509030091200_btaf230-B4","doi-asserted-by":"crossref","first-page":"D609","DOI":"10.1093\/nar\/gkae1010","article-title":"UniProt: the universal protein knowledgebase in 2025","volume":"53","author":"Bateman","year":"2025","journal-title":"Nucleic Acids Res"},{"key":"2025071509030091200_btaf230-B5","doi-asserted-by":"publisher","author":"Belcour","year":"2025","DOI":"10.1101\/2022.03.16.484574"},{"key":"2025071509030091200_btaf230-B6","doi-asserted-by":"crossref","first-page":"bbab318","DOI":"10.1093\/bib\/bbab318","article-title":"FROGS: a powerful tool to analyse the diversity of fungi with special management of internal transcribed spacers","volume":"22","author":"Bernard","year":"2021","journal-title":"Brief Bioinform"},{"key":"2025071509030091200_btaf230-B7","doi-asserted-by":"crossref","first-page":"82","DOI":"10.1016\/j.ibiod.2012.08.001","article-title":"Microbial community structure and microbial activities related to CO2 storage capacities of a salt cavern","volume":"81","author":"Bordenave","year":"2013","journal-title":"Int Biodeterior Biodegradation"},{"key":"2025071509030091200_btaf230-B8","doi-asserted-by":"crossref","first-page":"421","DOI":"10.1186\/1471-2105-10-421","article-title":"BLAST+: architecture and applications","volume":"10","author":"Camacho","year":"2009","journal-title":"BMC Bioinform"},{"key":"2025071509030091200_btaf230-B9","doi-asserted-by":"crossref","first-page":"5825","DOI":"10.1093\/molbev\/msab293","article-title":"eggNOG-mapper v2: functional annotation, orthology assignments, and domain prediction at the metagenomic scale","volume":"38","author":"Cantalapiedra","year":"2021","journal-title":"Mol Biol Evol"},{"key":"2025071509030091200_btaf230-B10","doi-asserted-by":"crossref","first-page":"e24","DOI":"10.1371\/journal.pcbi.0010024","article-title":"Bioinformatics for whole-genome shotgun sequencing of microbial communities","volume":"1","author":"Chen","year":"2005","journal-title":"PLoS Comput 
Biol"},{"key":"2025071509030091200_btaf230-B11","first-page":"5","article-title":"The ade4 package \u2013 I: one-table methods","volume":"4","author":"Chessel","year":"2004","journal-title":"R News"},{"key":"2025071509030091200_btaf230-B12","doi-asserted-by":"crossref","first-page":"316","DOI":"10.1038\/nbt.3820","article-title":"Nextflow enables reproducible computational workflows","volume":"35","author":"Di Tommaso","year":"2017","journal-title":"Nat Biotechnol"},{"key":"2025071509030091200_btaf230-B13","doi-asserted-by":"crossref","first-page":"8594","DOI":"10.1016\/j.ijhydene.2020.12.058","article-title":"Microbial side effects of underground hydrogen storage \u2013 knowledge gaps, risks and opportunities for successful implementation","volume":"46","author":"Dopffel","year":"2021","journal-title":"Int J Hydrogen Energy"},{"key":"2025071509030091200_btaf230-B14","doi-asserted-by":"crossref","first-page":"685","DOI":"10.1038\/s41587-020-0548-6","article-title":"PICRUSt2 for prediction of metagenome functions","volume":"38","author":"Douglas","year":"2020","journal-title":"Nat Biotechnol"},{"key":"2025071509030091200_btaf230-B15","doi-asserted-by":"crossref","first-page":"1","DOI":"10.18637\/jss.v022.i04","article-title":"The ade4 package: implementing the duality diagram for ecologists","volume":"22","author":"Dray","year":"2007","journal-title":"J Stat Soft"},{"key":"2025071509030091200_btaf230-B16","first-page":"47","article-title":"The ade4 package \u2013 II: two-table and K-table methods","volume":"7","author":"Dray","year":"2007","journal-title":"R News"},{"key":"2025071509030091200_btaf230-B17","doi-asserted-by":"crossref","first-page":"e1002195","DOI":"10.1371\/journal.pcbi.1002195","article-title":"Accelerated profile HMM searches","volume":"7","author":"Eddy","year":"2011","journal-title":"PLoS Comput Biol"},{"key":"2025071509030091200_btaf230-B18","doi-asserted-by":"crossref","first-page":"348","DOI":"10.3389\/fgene.2015.00348","article-title":"The 
road to metagenomics: from microbiology to DNA sequencing technologies and bioinformatics","volume":"6","author":"Escobar-Zepeda","year":"2015","journal-title":"Front Genet"},{"key":"2025071509030091200_btaf230-B19","doi-asserted-by":"crossref","first-page":"1287","DOI":"10.1093\/bioinformatics\/btx791","article-title":"FROGS: find, rapidly, OTUs with galaxy solution","volume":"34","author":"Escudi\u00e9","year":"2018","journal-title":"Bioinformatics"},{"key":"2025071509030091200_btaf230-B20","doi-asserted-by":"crossref","first-page":"1635","DOI":"10.1093\/molbev\/msw046","article-title":"ETE 3: reconstruction, analysis, and visualization of phylogenomic data","volume":"33","author":"Huerta-Cepas","year":"2016","journal-title":"Mol Biol Evol"},{"key":"2025071509030091200_btaf230-B21","doi-asserted-by":"crossref","first-page":"D309","DOI":"10.1093\/nar\/gky1085","article-title":"eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses","volume":"47","author":"Huerta-Cepas","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2025071509030091200_btaf230-B22","doi-asserted-by":"crossref","first-page":"90","DOI":"10.1109\/MCSE.2007.55","article-title":"Matplotlib: a 2d graphics environment","volume":"9","author":"Hunter","year":"2007","journal-title":"Comput Sci Eng"},{"key":"2025071509030091200_btaf230-B23","doi-asserted-by":"publisher","author":"Kassambara","year":"2020","DOI":"10.32614\/CRAN.package.factoextra"},{"key":"2025071509030091200_btaf230-B24","doi-asserted-by":"crossref","first-page":"btad214","DOI":"10.1093\/bioinformatics\/btad214","article-title":"PyHMMER: a Python library binding to HMMER for efficient sequence analysis","volume":"39","author":"Larralde","year":"2023","journal-title":"Bioinformatics"},{"key":"2025071509030091200_btaf230-B25","doi-asserted-by":"crossref","first-page":"315","DOI":"10.1007\/s13238-020-00724-8","article-title":"A practical guide to amplicon and 
metagenomic analysis of microbiome data","volume":"12","author":"Liu","year":"2021","journal-title":"Protein Cell"},{"key":"2025071509030091200_btaf230-B26","doi-asserted-by":"crossref","first-page":"205","DOI":"10.1016\/j.syapm.2012.11.008","article-title":"New insights into the archaeal diversity of a hypersaline microbial mat obtained by a metagenomic approach","volume":"36","author":"L\u00f3pez-L\u00f3pez","year":"2013","journal-title":"Syst Appl Microbiol"},{"key":"2025071509030091200_btaf230-B27","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1093\/bioinformatics\/btab493","article-title":"Swarm v3: towards tera-scale amplicon clustering","volume":"38","author":"Mah\u00e9","year":"2021","journal-title":"Bioinformatics"},{"key":"2025071509030091200_btaf230-B28","first-page":"001203","article-title":"On the limits of 16S rRNA gene-based metagenome prediction and functional profiling","volume":"10","author":"Matchado","year":"2024","journal-title":"Microb Genom"},{"key":"2025071509030091200_btaf230-B29","doi-asserted-by":"publisher","author":"Murray","year":"2025","DOI":"10.5281\/zenodo.14586647"},{"key":"2025071509030091200_btaf230-B30","author":"P. T. 
Inc","year":"2015"},{"key":"2025071509030091200_btaf230-B31","doi-asserted-by":"crossref","first-page":"829","DOI":"10.1038\/s41576-024-00746-6","article-title":"Sequencing-based analysis of microbiomes","volume":"25","author":"Pinto","year":"2024","journal-title":"Nat Rev Genet"},{"key":"2025071509030091200_btaf230-B32","doi-asserted-by":"crossref","first-page":"D590","DOI":"10.1093\/nar\/gks1219","article-title":"The SILVA ribosomal RNA gene database project: improved data processing and web-based tools","volume":"41","author":"Quast","year":"2013","journal-title":"Nucleic Acids Res"},{"key":"2025071509030091200_btaf230-B33","volume-title":"R: A Language and Environment for Statistical Computing","author":"R Core Team","year":"2024"},{"key":"2025071509030091200_btaf230-B34","doi-asserted-by":"crossref","first-page":"e2584","DOI":"10.7717\/peerj.2584","article-title":"VSEARCH: a versatile open source tool for metagenomics","volume":"4","author":"Rognes","year":"2016","journal-title":"PeerJ"},{"key":"2025071509030091200_btaf230-B35","doi-asserted-by":"crossref","first-page":"e1012577","DOI":"10.1371\/journal.pcbi.1012577","article-title":"SPARTA: interpretable functional classification of microbiomes and detection of hidden cumulative effects","volume":"20","author":"Ruiz","year":"2024","journal-title":"PLoS Comput Biol"},{"key":"2025071509030091200_btaf230-B36","doi-asserted-by":"crossref","first-page":"20684","DOI":"10.1016\/j.ijhydene.2022.04.170","article-title":"Structural analysis of microbiomes from salt caverns used for underground gas storage","volume":"47","author":"Schwab","year":"2022","journal-title":"Int J Hydrogen Energy"},{"key":"2025071509030091200_btaf230-B37","doi-asserted-by":"crossref","first-page":"2128","DOI":"10.1038\/s41564-022-01266-x","article-title":"Standardized multi-omics of Earth\u2019s microbiomes reveals microbial and metabolite diversity","volume":"7","author":"Shaffer","year":"2022","journal-title":"Nat 
Microbiol"},{"key":"2025071509030091200_btaf230-B38","doi-asserted-by":"crossref","first-page":"415","DOI":"10.1038\/s41579-022-00695-z","article-title":"Life and death in the soil microbiome: how ecological processes influence biogeochemistry","volume":"20","author":"Sokol","year":"2022","journal-title":"Nat Rev Microbiol"},{"key":"2025071509030091200_btaf230-B39","doi-asserted-by":"crossref","first-page":"e70030","DOI":"10.1111\/1758-2229.70030","article-title":"Hydrochemical gradients driving extremophile distribution in saline and brine groundwater of southern Poland","volume":"16","author":"S\u0142owakiewicz","year":"2024","journal-title":"Environ Microbiol Rep"},{"key":"2025071509030091200_btaf230-B40","doi-asserted-by":"crossref","first-page":"3297","DOI":"10.3390\/foods11203297","article-title":"The application of metagenomics to study microbial communities and develop desirable traits in fermented foods","volume":"11","author":"Srinivas","year":"2022","journal-title":"Foods"},{"key":"2025071509030091200_btaf230-B41","doi-asserted-by":"crossref","first-page":"1026","DOI":"10.1038\/nbt.3988","article-title":"MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets","volume":"35","author":"Steinegger","year":"2017","journal-title":"Nat Biotechnol"},{"key":"2025071509030091200_btaf230-B42","doi-asserted-by":"crossref","first-page":"428","DOI":"10.1038\/s41579-020-0364-5","article-title":"Tara oceans: towards global ocean ecosystems biology","volume":"18","author":"Sunagawa","year":"2020","journal-title":"Nat Rev Microbiol"},{"key":"2025071509030091200_btaf230-B43","doi-asserted-by":"crossref","first-page":"2045","DOI":"10.1111\/j.1365-294X.2012.05470.x","article-title":"Towards next-generation biodiversity assessment using DNA metabarcoding","volume":"21","author":"Taberlet","year":"2012","journal-title":"Mol Ecol"},{"key":"2025071509030091200_btaf230-B44","doi-asserted-by":"publisher","author":"T. M. D. 
Team","year":"2024","DOI":"10.5281\/zenodo.13308876"},{"key":"2025071509030091200_btaf230-B45","doi-asserted-by":"crossref","DOI":"10.1007\/978-1-4939-8850-1","volume-title":"Multivariate Analysis of Ecological Data with ade4","author":"Thioulouse","year":"2018"},{"key":"2025071509030091200_btaf230-B46","doi-asserted-by":"crossref","first-page":"457","DOI":"10.1038\/nature24621","article-title":"A communal catalogue reveals Earth\u2019s multiscale microbial diversity","volume":"551","author":"Thompson","year":"2017","journal-title":"Nature"},{"key":"2025071509030091200_btaf230-B47","first-page":"11","author":"Voormeij","year":"2004"},{"key":"2025071509030091200_btaf230-B48","doi-asserted-by":"crossref","first-page":"3021","DOI":"10.21105\/joss.03021","article-title":"seaborn: statistical data visualization","volume":"6","author":"Waskom","year":"2021","journal-title":"J Open Source Softw"},{"key":"2025071509030091200_btaf230-B49","doi-asserted-by":"publisher","author":"Wei","year":"2024","DOI":"10.32614\/CRAN.package.corrplot"},{"key":"2025071509030091200_btaf230-B50","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1186\/s40793-020-00358-7","article-title":"Tax4Fun2: prediction of habitat-specific functional profiles and functional redundancy based on 16S rRNA gene sequences","volume":"15","author":"Wemheuer","year":"2020","journal-title":"Environ Microbiome"},{"key":"2025071509030091200_btaf230-B51","doi-asserted-by":"crossref","first-page":"2557","DOI":"10.1038\/ismej.2016.45","article-title":"Challenges in microbial ecology: building predictive understanding of community function and dynamics","volume":"10","author":"Widder","year":"2016","journal-title":"ISME J"},{"key":"2025071509030091200_btaf230-B52","doi-asserted-by":"crossref","first-page":"33","DOI":"10.1186\/s40168-021-01213-8","article-title":"METABOLIC: high-throughput profiling of microbial genomes for functional traits, metabolism, biogeochemistry, and community-scale functional 
networks","volume":"10","author":"Zhou","year":"2022","journal-title":"Microbiome"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/Supplement_1\/i49\/63745560\/btaf230.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/Supplement_1\/i49\/63745560\/btaf230.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,15]],"date-time":"2025-07-15T13:03:15Z","timestamp":1752584595000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/41\/Supplement_1\/i49\/8199388"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7,1]]},"references-count":52,"journal-issue":{"issue":"Supplement_1","published-print":{"date-parts":[[2025,7,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaf230","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2025.01.30.635649","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025,7]]},"published":{"date-parts":[[2025,7,1]]}}}