{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,14]],"date-time":"2026-02-14T02:28:59Z","timestamp":1771036139494,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":27,"publisher":"ACM","license":[{"start":{"date-parts":[[2023,9,3]],"date-time":"2023-09-03T00:00:00Z","timestamp":1693699200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2023,9,3]]},"DOI":"10.1145\/3584371.3612953","type":"proceedings-article","created":{"date-parts":[[2023,10,4]],"date-time":"2023-10-04T18:52:30Z","timestamp":1696445550000},"page":"1-6","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["Leveraging Large Language Models for Predicting Microbial Virulence from Protein Structure and Sequence"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0009-0006-4701-2967","authenticated-orcid":false,"given":"Felix","family":"Quintana","sequence":"first","affiliation":[{"name":"Computer Science, Rice University, Houston, Texas, United States"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3760-564X","authenticated-orcid":false,"given":"Todd","family":"Treangen","sequence":"additional","affiliation":[{"name":"Computer Science, Rice University, Houston, Texas, United States"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0699-8038","authenticated-orcid":false,"given":"Lydia","family":"Kavraki","sequence":"additional","affiliation":[{"name":"Computer Science, Rice University, Houston, Texas, United States"}]}],"member":"320","published-online":{"date-parts":[[2023,10,4]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"Oct.","author":"Amos B.","year":"2021","unstructured":"B. Amos, C. Aurrecoechea, M. Barba, A. 
Barreto, E. Y. Basenko, W. Ba\u017cant, R. Belnap, A. S. Blevins, U. B\u00f6hme, J. Brestelli, B. P. Brunk, M. Caddick, D. Callan, L. Campbell, M. B. Christensen, G. K. Christophides, K. Crouch, K. Davis, J. DeBarry, R. Doherty, Y. Duan, M. Dunn, D. Falke, S. Fisher, P. Flicek, B. Fox, B. Gajria, G. I. Giraldo-Calder\u00f3n, O. S. Harb, E. Harper, C. Hertz-Fowler, M. J. Hickman, C. Howington, S. Hu, J. Humphrey, J. Iodice, A. Jones, J. Judkins, S. A. Kelly, J. C. Kissinger, D. K. Kwon, K. Lamoureux, D. Lawson, W. Li, K. Lies, D. Lodha, J. Long, R. M. MacCallum, G. Maslen, M. A. McDowell, J. Nabrzyski, D. S. Roos, S. S. C. Rund, S. W. Schulman, A. Shanmugasundram, V. Sitnik, D. Spruill, D. Starns, C. J. Stoeckert, S. S. Tomko, H. Wang, S. Warrenfeltz, R. Wieck, P. A. Wilkinson, L. Xu, and J. Zheng. VEuPathDB: the eukaryotic pathogen, vector and host bioinformatics resource center. Nucleic Acids Research, 50(D1):D898--D911, Oct. 
2021."},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1038\/75556"},{"key":"e_1_3_2_1_3_1","volume-title":"Nov.","author":"Aurrecoechea C.","year":"2016","unstructured":"C. Aurrecoechea, A. Barreto, E. Y. Basenko, J. Brestelli, B. P. Brunk, S. Cade, K. Crouch, R. Doherty, D. Falke, S. Fischer, B. Gajria, O. S. Harb, M. Heiges, C. Hertz-Fowler, S. Hu, J. Iodice, J. C. Kissinger, C. Lawrence, W. Li, D. F. Pinney, J. A. Pulman, D. S. Roos, A. Shanmugasundram, F. Silva-Franco, S. Steinbiss, C. J. Stoeckert, D. Spruill, H. Wang, S. Warrenfeltz, and J. Zheng. EuPathDB: the eukaryotic pathogen genomics database resource. Nucleic Acids Research, 45(D1):D581--D591, Nov. 2016."},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1093\/nar\/28.1.45"},{"key":"e_1_3_2_1_5_1","volume-title":"June","author":"Balaji A.","year":"2022","unstructured":"A. Balaji, B. Kille, A. D. Kappell, G. D. Godbold, M. Diep, R. A. L. Elworth, Z. Qian, D. Albin, D. J. Nasko, N. Shah, M. Pop, S. Segarra, K. L. Ternus, and T. J. Treangen. 
SeqScreen: accurate and sensitive functional screening of pathogenic sequences via ensemble learning. Genome Biology, 23(1), June 2022."},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btac020"},{"issue":"2","key":"e_1_3_2_1_7_1","doi-asserted-by":"crossref","first-page":"216","DOI":"10.1038\/s41594-022-00910-8","article-title":"Towards a structurally resolved human protein interaction network","volume":"30","author":"Burke D. F.","year":"2023","unstructured":"D. F. Burke, P. Bryant, I. Barrio-Hernandez, D. Memon, G. Pozzati, A. Shenoy, W. Zhu, A. S. Dunham, P. Albanese, A. Keller, R. A. Scheltema, J. E. Bruce, A. Leitner, P. Kundrotas, P. Beltrao, and A. Elofsson. Towards a structurally resolved human protein interaction network. Nature Structural & Molecular Biology, 30(2):216--225, Jan. 2023.","journal-title":"Nature Structural & Molecular Biology"},{"key":"e_1_3_2_1_8_1","volume-title":"Dec.","author":"Chen L.","year":"2004","unstructured":"L. Chen. VFDB: a reference database for bacterial virulence factors. Nucleic Acids Research, 33(Database issue):D325--D328, Dec. 2004."},{"key":"e_1_3_2_1_9_1","first-page":"11","article-title":"the universal protein knowledgebase in 2021","author":"T. U. Consortium","year":"2020","unstructured":"T. U. Consortium. 
UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Research, 49(D1):D480--D489, 11 2020.","journal-title":"Nucleic Acids Research, 49(D1):D480--D489"},{"issue":"8","key":"e_1_3_2_1_10_1","doi-asserted-by":"crossref","first-page":"e1008649","DOI":"10.1371\/journal.ppat.1008649","article-title":"Synthetic DNA and biosecurity: Nuances of predicting pathogenicity and the impetus for novel computational approaches for screening oligonucleotides","volume":"16","author":"Elworth R. A. L.","year":"2020","unstructured":"R. A. L. Elworth, C. Diaz, J. Yang, P. de Figueiredo, K. Ternus, and T. Treangen. Synthetic DNA and biosecurity: Nuances of predicting pathogenicity and the impetus for novel computational approaches for screening oligonucleotides. PLOS Pathogens, 16(8):e1008649, Aug. 2020.","journal-title":"PLOS Pathogens"},{"key":"e_1_3_2_1_11_1","volume-title":"Sept.","author":"Geffen Y.","year":"2022","unstructured":"Y. Geffen, Y. Ofran, and R. Unger. DistilProtBert: a distilled protein language model used to distinguish between real proteins and their randomly shuffled counterparts. Bioinformatics, 38(Supplement_2):ii95--ii98, Sept. 2022."},{"key":"e_1_3_2_1_12_1","volume-title":"May","author":"Gligorijevi\u0107 V.","year":"2021","unstructured":"V. Gligorijevi\u0107, P. D. Renfrew, T. Kosciolek, J. K. Leman, D. Berenberg, T. Vatanen, C. Chandler, B. C. Taylor, I. M. Fisk, H. Vlamakis, R. J. Xavier, R. Knight, K. Cho, and R. 
Bonneau. Structure-based protein function prediction using graph convolutional networks. Nature Communications, 12(1), May 2021."},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P18-1031"},{"key":"e_1_3_2_1_14_1","unstructured":"R. Jacak, J. Proescher, G. Godbold, A. Ernlund, and T. Zudock. PathGO: The Pathogenesis Gene Ontology."},{"key":"e_1_3_2_1_15_1","volume-title":"International Conference on Learning Representations","author":"Jing B.","year":"2021","unstructured":"B. Jing, S. Eismann, P. Suriana, R. J. L. Townshend, and R. Dror. Learning from protein structure with geometric vector perceptrons. In International Conference on Learning Representations, 2021."},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1038\/s41586-021-03819-2"},{"key":"e_1_3_2_1_17_1","volume-title":"Structure, Function, and Motion","author":"Kessel A.","year":"2018","unstructured":"A. Kessel and N. Ben-Tal. : Structure, Function, and Motion, Second Edition. Chapman and Hall\/CRC, New York, 2 edition, Mar. 
2018."},{"issue":"4","key":"e_1_3_2_1_18_1","doi-asserted-by":"crossref","first-page":"660","DOI":"10.1093\/bioinformatics\/btx624","article-title":"DeepGO: predicting protein functions from sequence and interactions using a deep ontology-aware classifier","volume":"34","author":"Kulmanov M.","year":"2017","unstructured":"M. Kulmanov, M. A. Khan, and R. Hoehndorf. DeepGO: predicting protein functions from sequence and interactions using a deep ontology-aware classifier. Bioinformatics, 34(4):660--668, Oct. 2017.","journal-title":"Bioinformatics"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1126\/science.ade2574"},{"key":"e_1_3_2_1_20_1","volume-title":"Nature Biotechnology","author":"Madani A.","year":"2023","unstructured":"A. Madani, B. Krause, E. R. Greene, S. Subramanian, B. P. Mohr, J. M. Holton, J. L. Olmos, C. Xiong, Z. Z. Sun, R. Socher, J. S. Fraser, and N. Naik. Large language models generate functional protein sequences across diverse families. Nature Biotechnology, Jan. 2023."},{"issue":"6","key":"e_1_3_2_1_21_1","doi-asserted-by":"crossref","first-page":"679","DOI":"10.1038\/s41592-022-01488-1","article-title":"ColabFold: making protein folding accessible to all","volume":"19","author":"Mirdita M.","year":"2022","unstructured":"M. Mirdita, K. Sch\u00fctze, Y. 
Moriwaki, L. Heo, S. Ovchinnikov, and M. Steinegger. ColabFold: making protein folding accessible to all. Nature Methods, 19(6):679--682, May 2022.","journal-title":"Nature Methods"},{"key":"e_1_3_2_1_22_1","volume-title":"June","author":"Oliveira G. B.","year":"2023","unstructured":"G. B. Oliveira, H. Pedrini, and Z. Dias. TEMPROT: protein function annotation using transformers embeddings and homology search. BMC Bioinformatics, 24(1), June 2023."},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jmb.2004.04.012"},{"key":"e_1_3_2_1_24_1","volume-title":"Nov.","author":"Varadi M.","year":"2021","unstructured":"M. Varadi, S. Anyango, M. Deshpande, S. Nair, C. Natassia, G. Yordanova, D. Yuan, O. Stroe, G. Wood, A. Laydon, A. \u017d\u00eddek, T. Green, K. Tunyasuvunakool, S. Petersen, J. Jumper, E. Clancy, R. Green, A. Vora, M. Lutfi, M. Figurnov, A. Cowie, N. Hobbs, P. Kohli, G. Kleywegt, E. Birney, D. Hassabis, and S. Velankar. AlphaFold protein structure database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Research, 50(D1):D439--D444, Nov. 
2021."},{"key":"e_1_3_2_1_25_1","volume-title":"Advances in Neural Information Processing Systems","author":"Vaswani A.","year":"2017","unstructured":"A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, \u0141. Kaiser, and I. Polosukhin. Attention is all you need. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017."},{"key":"e_1_3_2_1_26_1","volume-title":"Nov.","author":"Wattam A. R.","year":"2016","unstructured":"A. R. Wattam, J. J. Davis, R. Assaf, S. Boisvert, T. Brettin, C. Bun, N. Conrad, E. M. Dietrich, T. Disz, J. L. Gabbard, S. Gerdes, C. S. Henry, R. W. Kenyon, D. Machi, C. Mao, E. K. Nordberg, G. J. Olsen, D. E. Murphy-Olson, R. Olson, R. Overbeek, B. Parrello, G. D. Pusch, M. Shukla, V. Vonstein, A. Warren, F. Xia, H. Yoo, and R. L. Stevens. Improvements to PATRIC, the all-bacterial bioinformatics database and analysis resource center. Nucleic Acids Research, 45(D1):D535--D542, Nov. 
2016."},{"key":"e_1_3_2_1_27_1","volume-title":"Mar.","author":"Yuan Q.","year":"2023","unstructured":"Q. Yuan, J. Xie, J. Xie, H. Zhao, and Y. Yang. Fast and accurate protein function prediction from sequence through pretrained language model and homology-based label diffusion. Briefings in Bioinformatics, 24(3), Mar. 2023."}],"event":{"name":"BCB '23: 14th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics","location":"Houston TX USA","acronym":"BCB '23","sponsor":["SIGBio ACM Special Interest Group on Bioinformatics"]},"container-title":["Proceedings of the 14th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3584371.3612953","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3584371.3612953","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T17:48:54Z","timestamp":1750182534000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3584371.3612953"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,9,3]]},"references-count":27,"alternative-id":["10.1145\/3584371.3612953","10.1145\/3584371"],"URL":"https:\/\/doi.org\/10.1145\/3584371.3612953","relation":{},"subject":[],"published":{"date-parts":[[2023,9,3]]},"assertion":[{"value":"2023-10-04","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}