{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T20:34:31Z","timestamp":1772138071300,"version":"3.50.1"},"reference-count":24,"publisher":"Oxford University Press (OUP)","issue":"3","license":[{"start":{"date-parts":[[2025,2,24]],"date-time":"2025-02-24T00:00:00Z","timestamp":1740355200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100009619","name":"Japan Agency for Medical Research and Development","doi-asserted-by":"publisher","award":["JP21ae0121040"],"award-info":[{"award-number":["JP21ae0121040"]}],"id":[{"id":"10.13039\/100009619","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001691","name":"Japanese Society for the Promotion of Science","doi-asserted-by":"crossref","award":["22H00477"],"award-info":[{"award-number":["22H00477"]}],"id":[{"id":"10.13039\/501100001691","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,3,4]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Identifying bacteriophages (phages) within metagenomic sequences is essential for understanding microbial community dynamics. Transformer-based foundation models have been successfully employed to address various biological challenges. However, these models are typically pre-trained with self-supervised tasks that do not consider label variance in the pre-training data. 
This presents a challenge for phage identification as pre-training on mixed bacterial and phage data may lead to information bias due to the imbalance between bacterial and phage samples.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>To overcome this limitation, we proposed a novel conditional BERT framework that incorporates label classes as special tokens during pre-training. Specifically, our conditional BERT model attaches labels directly during tokenization, introducing label constraints into the model\u2019s input. Additionally, we introduced a new fine-tuning scheme that enables the conditional BERT to be effectively utilized for classification tasks. This framework allows the BERT model to acquire label-specific contextual representations from mixed sequence data during pre-training and applies the conditional BERT as a classifier during fine-tuning, and we named the fine-tuned model as PharaCon. We evaluated PharaCon against several existing methods on both simulated sequence datasets and real metagenomic contig datasets. 
The results demonstrate PharaCon\u2019s effectiveness and efficiency in phage identification, highlighting the advantages of incorporating label information during both pre-training and fine-tuning.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>The source code and associated data can be accessed at https:\/\/github.com\/Celestial-Bai\/PharaCon.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaf085","type":"journal-article","created":{"date-parts":[[2025,2,20]],"date-time":"2025-02-20T07:17:35Z","timestamp":1740035855000},"source":"Crossref","is-referenced-by-count":0,"title":["PharaCon: a new framework for identifying bacteriophages via conditional representation learning"],"prefix":"10.1093","volume":"41","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4921-0936","authenticated-orcid":false,"given":"Zeheng","family":"Bai","sequence":"first","affiliation":[{"name":"Division of Health Medical Intelligence, Human Genome Center, The Institute of Medical Science, The University of Tokyo , 4-6-1, Shirokanedai , Minato-ku, Tokyo, 108-8639,","place":["Japan"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5598-2521","authenticated-orcid":false,"given":"Yao-zhong","family":"Zhang","sequence":"additional","affiliation":[{"name":"Division of Health Medical Intelligence, Human Genome Center, The Institute of Medical Science, The University of Tokyo , 4-6-1, Shirokanedai , Minato-ku, Tokyo, 108-8639,","place":["Japan"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6778-7034","authenticated-orcid":false,"given":"Yuxuan","family":"Pang","sequence":"additional","affiliation":[{"name":"Division of Health Medical Intelligence, Human Genome Center, The Institute of Medical Science, The University of Tokyo , 4-6-1, Shirokanedai , Minato-ku, Tokyo, 
108-8639,","place":["Japan"]}]},{"given":"Seiya","family":"Imoto","sequence":"additional","affiliation":[{"name":"Division of Health Medical Intelligence, Human Genome Center, The Institute of Medical Science, The University of Tokyo , 4-6-1, Shirokanedai , Minato-ku, Tokyo, 108-8639,","place":["Japan"]},{"name":"Collaborative Research Institute for Innovative Microbiology, The University of Tokyo , 1-1-1, Yayoi , Bunkyo-ku, Tokyo, 113-8657,","place":["Japan"]}]}],"member":"286","published-online":{"date-parts":[[2025,2,24]]},"reference":[{"key":"2025032208012845200_btaf085-B1","doi-asserted-by":"crossref","first-page":"105","DOI":"10.1038\/s41587-020-0603-3","article-title":"A unified catalog of 204,938 reference genomes from the human gut microbiome","volume":"39","author":"Almeida","year":"2021","journal-title":"Nature Biotechnology"},{"key":"2025032208012845200_btaf085-B2","doi-asserted-by":"crossref","first-page":"e121","DOI":"10.1093\/nar\/gkaa856","article-title":"Seeker: alignment-free identification of bacteriophage genomes by deep learning","volume":"48","author":"Auslander","year":"2020","journal-title":"Nucleic Acids Res"},{"key":"2025032208012845200_btaf085-B3","doi-asserted-by":"crossref","first-page":"4264","DOI":"10.1093\/bioinformatics\/btac509","article-title":"Identification of bacteriophage genome sequences with representation learning","volume":"38","author":"Bai","year":"2022","journal-title":"Bioinformatics"},{"key":"2025032208012845200_btaf085-B4","doi-asserted-by":"crossref","first-page":"1303","DOI":"10.1038\/s41587-023-01953-y","article-title":"Identification of mobile genetic elements with genomad","volume":"42","author":"Camargo","year":"2024","journal-title":"Nat Biotechnol"},{"key":"2025032208012845200_btaf085-B5","first-page":"287","volume-title":"Nat 
Methods","author":"Dalla-Torre"},{"key":"2025032208012845200_btaf085-B6","author":"Devlin","year":"2018"},{"key":"2025032208012845200_btaf085-B7","author":"Dosovitskiy","year":"2020"},{"key":"2025032208012845200_btaf085-B8","doi-asserted-by":"crossref","first-page":"755","DOI":"10.1093\/bioinformatics\/14.9.755","article-title":"Profile hidden Markov models","volume":"14","author":"Eddy","year":"1998","journal-title":"Bioinformatics"},{"key":"2025032208012845200_btaf085-B9","doi-asserted-by":"crossref","first-page":"724","DOI":"10.1016\/j.chom.2020.08.003","article-title":"The gut virome database reveals age-dependent patterns of virome diversity in the human gut","volume":"28","author":"Gregory","year":"2020","journal-title":"Cell Host Microbe"},{"key":"2025032208012845200_btaf085-B10","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1186\/s40168-020-00990-y","article-title":"VirSorter2: a multi-classifier, expert-guided approach to detect diverse DNA and RNA viruses","volume":"9","author":"Guo","year":"2021","journal-title":"Microbiome"},{"key":"2025032208012845200_btaf085-B11","doi-asserted-by":"crossref","first-page":"119","DOI":"10.1186\/1471-2105-11-119","article-title":"Prodigal: prokaryotic gene recognition and translation initiation site identification","volume":"11","author":"Hyatt","year":"2010","journal-title":"BMC Bioinformatics"},{"key":"2025032208012845200_btaf085-B12","doi-asserted-by":"crossref","first-page":"2112","DOI":"10.1093\/bioinformatics\/btab083","article-title":"DNABERT: pre-trained bidirectional encoder representations from transformers model for DNA-language in genome","volume":"37","author":"Ji","year":"2021","journal-title":"Bioinformatics"},{"key":"2025032208012845200_btaf085-B13","doi-asserted-by":"crossref","first-page":"90","DOI":"10.1186\/s40168-020-00867-0","article-title":"VIBRANT: automated recovery, annotation and curation of microbial viruses, and evaluation of viral community function from genomic 
sequences","volume":"8","author":"Kieft","year":"2020","journal-title":"Microbiome"},{"key":"2025032208012845200_btaf085-B14","doi-asserted-by":"crossref","first-page":"3772","DOI":"10.1109\/TCBB.2023.3322870","article-title":"Identifying phage sequences from metagenomic data using deep neural network with word embedding and attention mechanism","volume":"20","author":"Ma","year":"2023","journal-title":"IEEE\/ACM Trans Comput Biol Bioinform"},{"key":"2025032208012845200_btaf085-B15","author":"Marin","year":"2024"},{"key":"2025032208012845200_btaf085-B16","doi-asserted-by":"crossref","first-page":"W20","DOI":"10.1093\/nar\/gkh435","article-title":"Blast: at the core of a powerful and diverse set of sequence analysis tools","volume":"32","author":"McGinnis","year":"2004","journal-title":"Nucleic Acids Res"},{"key":"2025032208012845200_btaf085-B17","doi-asserted-by":"crossref","first-page":"D412","DOI":"10.1093\/nar\/gkaa913","article-title":"Pfam: the protein families database in 2021","volume":"49","author":"Mistry","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2025032208012845200_btaf085-B18","author":"Ng","year":"2017"},{"key":"2025032208012845200_btaf085-B19","doi-asserted-by":"crossref","first-page":"64","DOI":"10.1007\/s40484-019-0187-4","article-title":"Identifying viruses from metagenomic data using deep learning","volume":"8","author":"Ren","year":"2020","journal-title":"Quant Biol"},{"key":"2025032208012845200_btaf085-B20","doi-asserted-by":"crossref","first-page":"e2023202118","DOI":"10.1073\/pnas.2023202118","article-title":"A catalog of tens of thousands of viruses from human metagenomes reveals hidden associations with chronic diseases","volume":"118","author":"Tisza","year":"2021","journal-title":"Proc Natl Acad Sci U S A"},{"key":"2025032208012845200_btaf085-B21","doi-asserted-by":"crossref","first-page":"veaa100","DOI":"10.1093\/ve\/veaa100","article-title":"Cenote-taker 2 democratizes virus discovery and sequence 
annotation","volume":"7","author":"Tisza","year":"2021","journal-title":"Virus Evol"},{"key":"2025032208012845200_btaf085-B22","first-page":"5998","volume-title":"Proceedings of the 31st International Conference on Neural Information Processing Systems (NeurIPS 2017), Vol. 30.","author":"Vaswani"},{"key":"2025032208012845200_btaf085-B23","first-page":"84","author":"Wu","year":"2019"},{"key":"2025032208012845200_btaf085-B24","author":"Zhou","year":"2023"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaf085\/62111828\/btaf085.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/3\/btaf085\/62111828\/btaf085.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/3\/btaf085\/62111828\/btaf085.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,23]],"date-time":"2025-03-23T07:20:00Z","timestamp":1742714400000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btaf085\/8037845"}},"subtitle":[],"editor":[{"given":"Pier 
Luigi","family":"Martelli","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2025,2,24]]},"references-count":24,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2025,3,4]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaf085","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2024.06.16.599237","asserted-by":"object"}]},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025,3]]},"published":{"date-parts":[[2025,2,24]]},"article-number":"btaf085"}}