{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,27]],"date-time":"2026-02-27T04:28:25Z","timestamp":1772166505758,"version":"3.50.1"},"reference-count":51,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2023,12,8]],"date-time":"2023-12-08T00:00:00Z","timestamp":1701993600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,12,8]],"date-time":"2023-12-08T00:00:00Z","timestamp":1701993600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Cheminform"],"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>The solubility of proteins stands as a pivotal factor in the realm of pharmaceutical research and production. Addressing the imperative to enhance production efficiency and curtail experimental costs, the demand arises for computational models adept at accurately predicting solubility based on provided datasets. Prior investigations have leveraged deep learning models and feature engineering techniques to distill features from raw protein sequences for solubility prediction. However, these methodologies have not thoroughly delved into the interdependencies among features or their respective magnitudes of significance. This study introduces HybridGCN, a pioneering Hybrid Graph Convolutional Network that elevates solubility prediction accuracy through the combination of diverse features, encompassing sophisticated deep-learning features and classical biophysical features. 
An exploration into the intricate interplay between deep-learning features and biophysical features revealed that specific biophysical attributes, notably evolutionary features, complement features extracted by advanced deep-learning models. Augmenting the model\u2019s capability for feature representation, we employed ESM, a substantial protein language model, to derive a zero-shot learning feature capturing comprehensive and pertinent information concerning protein functions and structures. Furthermore, we proposed a novel feature fusion module termed Adaptive Feature Re-weighting (AFR) to integrate multiple features, thereby enabling the fine-tuning of feature importance. Ablation experiments and comparative analyses attest to the efficacy of the HybridGCN approach, culminating in state-of-the-art performances on the public eSOL and S. cerevisiae datasets.<\/jats:p>","DOI":"10.1186\/s13321-023-00788-8","type":"journal-article","created":{"date-parts":[[2023,12,8]],"date-time":"2023-12-08T05:02:01Z","timestamp":1702011721000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":17,"title":["HybridGCN for protein solubility prediction with adaptive weighting of multiple features"],"prefix":"10.1186","volume":"15","author":[{"given":"Long","family":"Chen","sequence":"first","affiliation":[]},{"given":"Rining","family":"Wu","sequence":"additional","affiliation":[]},{"given":"Feixiang","family":"Zhou","sequence":"additional","affiliation":[]},{"given":"Huifeng","family":"Zhang","sequence":"additional","affiliation":[]},{"given":"Jian K.","family":"Liu","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,12,8]]},"reference":[{"issue":"3","key":"788_CR1","doi-asserted-by":"publisher","first-page":"582","DOI":"10.1110\/ps.041009005","volume":"14","author":"S Idicula-Thomas","year":"2005","unstructured":"Idicula-Thomas S, Balaji PV (2005) Understanding the 
relationship between the primary structure of proteins and its propensity to be soluble on overexpression in escherichia coli. Prot Sci 14(3):582\u2013592","journal-title":"Prot Sci"},{"issue":"4","key":"788_CR2","doi-asserted-by":"publisher","first-page":"382","DOI":"10.1002\/(SICI)1097-0290(19991120)65:4<382::AID-BIT2>3.0.CO;2-I","volume":"65","author":"GD Davis","year":"1999","unstructured":"Davis GD, Elisee C, Newham DM, Harrison RG (1999) New fusion protein systems designed to give soluble expression in escherichia coli. Biotechnol Bioeng 65(4):382\u2013388","journal-title":"Biotechnol Bioeng"},{"issue":"10","key":"788_CR3","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0045869","volume":"7","author":"BA van den Berg","year":"2012","unstructured":"van den Berg BA, Reinders MJ, Hulsman M, Wu L, Pel HJ, Roubos JA, de Ridder D (2012) Exploring sequence characteristics related to high-level production of secreted proteins in aspergillus niger. PLoS ONE 7(10):e45869","journal-title":"PLoS ONE"},{"key":"788_CR4","doi-asserted-by":"publisher","first-page":"136","DOI":"10.1016\/j.sbi.2017.01.004","volume":"42","author":"K Trainor","year":"2017","unstructured":"Trainor K, Broom A, Meiering EM (2017) Exploring the relationships between protein sequence, structure and solubility. Curr Opin Struct Biol 42:136\u2013146","journal-title":"Curr Opin Struct Biol"},{"issue":"7","key":"788_CR5","doi-asserted-by":"publisher","first-page":"1092","DOI":"10.1093\/bioinformatics\/btx662","volume":"34","author":"R Rawi","year":"2018","unstructured":"Rawi R, Mall R, Kunji K, Shen C-H, Kwong PD, Chuang G-Y (2018) Parsnip: sequence-based protein solubility prediction using gradient boosting machine. 
Bioinformatics 34(7):1092\u20131098","journal-title":"Bioinformatics"},{"issue":"19","key":"788_CR6","doi-asserted-by":"publisher","first-page":"2536","DOI":"10.1093\/bioinformatics\/btl623","volume":"23","author":"P Smialowski","year":"2007","unstructured":"Smialowski P, Martin-Galiano AJ, Mikolajka A, Girschick T, Holak TA, Frishman D (2007) Protein solubility: sequence based prediction and experimental verification. Bioinformatics 23(19):2536\u20132542","journal-title":"Bioinformatics"},{"issue":"1","key":"788_CR7","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s12859-019-3220-8","volume":"20","author":"M Heinzinger","year":"2019","unstructured":"Heinzinger M, Elnaggar A, Wang Y, Dallago C, Nechaev D, Matthes F, Rost B (2019) Modeling aspects of the language of life through transfer-learning protein sequences. BMC Bioinform 20(1):1\u201317","journal-title":"BMC Bioinform"},{"issue":"3","key":"788_CR8","doi-asserted-by":"publisher","first-page":"278","DOI":"10.1093\/bioinformatics\/bti810","volume":"22","author":"S Idicula-Thomas","year":"2006","unstructured":"Idicula-Thomas S, Kulkarni AJ, Kulkarni BD, Jayaraman VK, Balaji PV (2006) A support vector machine-based method for predicting the propensity of a protein to be soluble or to form inclusion body on overexpression in escherichia coli. Bioinformatics 22(3):278\u2013284","journal-title":"Bioinformatics"},{"issue":"15","key":"788_CR9","doi-asserted-by":"publisher","first-page":"2605","DOI":"10.1093\/bioinformatics\/bty166","volume":"34","author":"S Khurana","year":"2018","unstructured":"Khurana S, Rawi R, Kunji K, Chuang G-Y, Bensmail H, Mall R (2018) Deepsol: a deep learning framework for sequence-based protein solubility prediction. 
Bioinformatics 34(15):2605\u20132613","journal-title":"Bioinformatics"},{"issue":"5","key":"788_CR10","first-page":"443","volume":"9","author":"DL Wilkinson","year":"1991","unstructured":"Wilkinson DL, Harrison RG (1991) Predicting the solubility of recombinant proteins in Escherichia coli. Bio\/technology 9(5):443\u2013448","journal-title":"Bio\/technology"},{"issue":"12","key":"788_CR11","doi-asserted-by":"publisher","first-page":"2192","DOI":"10.1111\/j.1742-4658.2012.08603.x","volume":"279","author":"P Smialowski","year":"2012","unstructured":"Smialowski P, Doose G, Torkler P, Kaufmann S, Frishman D (2012) Proso ii-a new method for protein solubility prediction. FEBS J 279(12):2192\u20132200","journal-title":"FEBS J"},{"issue":"7873","key":"788_CR12","doi-asserted-by":"publisher","first-page":"583","DOI":"10.1038\/s41586-021-03819-2","volume":"596","author":"J Jumper","year":"2021","unstructured":"Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, Tunyasuvunakool K, Bates R, \u017d\u00eddek A, Potapenko A et al (2021) Highly accurate protein structure prediction with alphafold. Nature 596(7873):583\u2013589","journal-title":"Nature"},{"issue":"6557","key":"788_CR13","doi-asserted-by":"publisher","first-page":"871","DOI":"10.1126\/science.abj8754","volume":"373","author":"M Baek","year":"2021","unstructured":"Baek M, DiMaio F, Anishchenko I, Dauparas J, Ovchinnikov S, Lee GR, Wang J, Cong Q, Kinch LN, Schaeffer RD, Mill\u00e1n C, Park H, Adams C, Glassman CR, DeGiovanni A, Pereira JH, Rodrigues AV, van Dijk AA, Ebrecht AC, Opperman DJ, Sagmeister T, Buhlheller C, Pavkov-Keller T, Rathinaswamy MK, Dalwadi U, Yip CK, Burke JE, Garcia KC, Grishin NV, Adams PD, Read RJ, Baker D (2021) Accurate prediction of protein structures and interactions using a three-track neural network. 
Science 373(6557):871\u2013876","journal-title":"Science"},{"issue":"7949","key":"788_CR14","doi-asserted-by":"publisher","first-page":"774","DOI":"10.1038\/s41586-023-05696-3","volume":"614","author":"AH-W Yeh","year":"2023","unstructured":"Yeh AH-W, Norn C, Kipnis Y, Tischer D, Pellock SJ, Evans D, Ma P, Lee GR, Zhang JZ, Anishchenko I, Coventry B, Cao L, Dauparas J, Halabiya S, DeWitt M, Carter L, Houk KN, Baker D (2023) De novo design of luciferases using deep learning. Nature 614(7949):774\u2013780","journal-title":"Nature"},{"issue":"6615","key":"788_CR15","doi-asserted-by":"publisher","first-page":"49","DOI":"10.1126\/science.add2187","volume":"378","author":"J Dauparas","year":"2022","unstructured":"Dauparas J, Anishchenko I, Bennett N, Bai H, Ragotte RJ, Milles LF, Wicky BIM, Courbet A, de Haas RJ, Bethel N, Leung PJY, Huddy TF, Pellock S, Tischer D, Chan F, Koepnick B, Nguyen H, Kang A, Sankaran B, Bera AK, King NP, Baker D (2022) Robust deep learning-based protein sequence design using ProteinMPNN. Science 378(6615):49\u201356","journal-title":"Science"},{"issue":"1","key":"788_CR16","doi-asserted-by":"publisher","DOI":"10.1038\/s41467-023-38328-5","volume":"14","author":"NR Bennett","year":"2023","unstructured":"Bennett NR, Coventry B, Goreshnik I, Huang B, Allen A, Vafeados D, Peng YP, Dauparas J, Baek M, Stewart L, DiMaio F, Munck SD, Savvides SN, Baker D (2023) Improving de novo protein binder design with deep learning. Nat Commun 14(1):2625","journal-title":"Nat Commun"},{"issue":"12","key":"788_CR17","doi-asserted-by":"publisher","DOI":"10.1002\/pro.4480","volume":"31","author":"G Li","year":"2022","unstructured":"Li G, Buric F, Zrimec J, Viknander S, Nielsen J, Zelezniak A, Engqvist MK (2022) Learning deep representations of enzyme thermal adaptation. 
Prot Sci 31(12):e4480","journal-title":"Prot Sci"},{"issue":"11","key":"788_CR18","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pcbi.1008291","volume":"16","author":"B Li","year":"2020","unstructured":"Li B, Yang YT, Capra JA, Gerstein MB (2020) Predicting changes in protein thermodynamic stability upon point mutation with deep 3d convolutional neural networks. PLoS Comput Biol 16(11):e1008291","journal-title":"PLoS Comput Biol"},{"issue":"1","key":"788_CR19","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13321-021-00488-1","volume":"13","author":"J Chen","year":"2021","unstructured":"Chen J, Zheng S, Zhao H, Yang Y (2021) Structure-aware protein solubility prediction from sequence through graph convolutional network and predicted contact map. J Cheminform 13(1):1\u201310","journal-title":"J Cheminform"},{"issue":"20","key":"788_CR20","doi-asserted-by":"publisher","first-page":"2975","DOI":"10.1093\/bioinformatics\/btu420","volume":"30","author":"F Agostini","year":"2014","unstructured":"Agostini F, Cirillo D, Livi CM, Delli Ponti R, Tartaglia GG (2014) cc sol omics: a webserver for solubility prediction of endogenous and heterologous expression in Escherichia coli. Bioinformatics 30(20):2975\u20132977","journal-title":"Bioinformatics"},{"issue":"17","key":"788_CR21","doi-asserted-by":"publisher","first-page":"2200","DOI":"10.1093\/bioinformatics\/btp386","volume":"25","author":"CN Magnan","year":"2009","unstructured":"Magnan CN, Randall A, Baldi P (2009) Solpro: accurate sequence-based prediction of protein solubility. 
Bioinformatics 25(17):2200\u20132207","journal-title":"Bioinformatics"},{"issue":"Suppl 17","key":"788_CR22","doi-asserted-by":"publisher","first-page":"S3","DOI":"10.1186\/1471-2105-13-S17-S3","volume":"13","author":"H-L Huang","year":"2012","unstructured":"Huang H-L, Charoenkwan P, Kao T-F, Lee H-C, Chang F-L, Huang W-L, Ho S-J, Shu L-S, Chen W-L, Ho S-Y (2012) Prediction and analysis of protein solubility using a novel scoring card method with dipeptide composition. BMC Bioinformatics 13(Suppl 17):S3","journal-title":"BMC Bioinformatics"},{"issue":"3","key":"788_CR23","doi-asserted-by":"publisher","first-page":"37","DOI":"10.1201\/9781420089653.ch3","volume":"6","author":"H Xue","year":"2009","unstructured":"Xue H, Yang Q, Chen S (2009) Svm: Support vector machines. Top Ten Algor Data Mining 6(3):37\u201360","journal-title":"Top Ten Algor Data Mining"},{"issue":"6245","key":"788_CR24","doi-asserted-by":"publisher","first-page":"261","DOI":"10.1126\/science.aaa8685","volume":"349","author":"J Hirschberg","year":"2015","unstructured":"Hirschberg J, Manning CD (2015) Advances in natural language processing. Science 349(6245):261\u2013266","journal-title":"Science"},{"issue":"9","key":"788_CR25","doi-asserted-by":"publisher","first-page":"2352","DOI":"10.1162\/neco_a_00990","volume":"29","author":"W Rawat","year":"2017","unstructured":"Rawat W, Wang Z (2017) Deep convolutional neural networks for image classification: a comprehensive review. Neural Comput 29(9):2352\u20132449","journal-title":"Neural Comput"},{"issue":"1","key":"788_CR26","doi-asserted-by":"publisher","first-page":"5743","DOI":"10.1038\/s41467-021-25976-8","volume":"12","author":"Y Luo","year":"2021","unstructured":"Luo Y, Jiang G, Yu T, Liu Y, Vo L, Ding H, Su Y, Qian WW, Zhao H, Peng J (2021) Ecnet is an evolutionary context-integrated deep learning framework for protein engineering. 
Nature Commun 12(1):5743","journal-title":"Nature Commun"},{"issue":"1","key":"788_CR27","doi-asserted-by":"publisher","first-page":"29","DOI":"10.14311\/NNW.2021.31.002","volume":"31","author":"ABP Samson","year":"2021","unstructured":"Samson ABP, Chandra SRA, Manikant M (2021) A deep neural network approach for the prediction of protein subcellular localization. Neural Netw World 31(1):29\u201345","journal-title":"Neural Netw World"},{"issue":"23","key":"788_CR28","doi-asserted-by":"publisher","first-page":"4314","DOI":"10.1093\/bioinformatics\/btab463","volume":"37","author":"X Wu","year":"2021","unstructured":"Wu X, Yu L (2021) Epsol: sequence-based protein solubility prediction using multidimensional embedding. Bioinformatics 37(23):4314\u20134320","journal-title":"Bioinformatics"},{"key":"788_CR29","doi-asserted-by":"publisher","DOI":"10.1016\/j.compchemeng.2019.106533","volume":"131","author":"X Han","year":"2019","unstructured":"Han X, Zhang L, Zhou K, Wang X (2019) Progan: Protein solubility generative adversarial nets for data augmentation in dnn framework. Comp Chem Eng 131:106533","journal-title":"Comp Chem Eng"},{"key":"788_CR30","first-page":"9689","volume":"32","author":"R Rao","year":"2019","unstructured":"Rao R, Bhattacharya N, Thomas N, Duan Y, Chen P, Canny J, Abbeel P, Song Y (2019) Evaluating protein transfer learning with tape. Adv Neural Inf Process Syst 32:9689\u20139701","journal-title":"Adv Neural Inf Process Syst"},{"issue":"4","key":"788_CR31","doi-asserted-by":"publisher","first-page":"941","DOI":"10.1093\/bioinformatics\/btab801","volume":"38","author":"V Thumuluri","year":"2022","unstructured":"Thumuluri V, Martiny H-M, Almagro Armenteros JJ, Salomon J, Nielsen H, Johansen AR (2022) Netsolp: predicting protein solubility in Escherichia coli using language models. 
Bioinformatics 38(4):941\u2013946","journal-title":"Bioinformatics"},{"key":"788_CR32","doi-asserted-by":"publisher","first-page":"59397","DOI":"10.1109\/ACCESS.2023.3284464","volume":"11","author":"F Mehmood","year":"2023","unstructured":"Mehmood F, Arshad S, Shoaib M (2023) RPPSP: a robust and precise protein solubility predictor by utilizing novel protein sequence encoder. IEEE Access 11:59397\u201359416","journal-title":"IEEE Access"},{"key":"788_CR33","volume-title":"Feature engineering for machine learning and data analytics","author":"G Dong","year":"2018","unstructured":"Dong G, Liu H (2018) Feature engineering for machine learning and data analytics. CRC Press, Boca Raton"},{"key":"788_CR34","volume-title":"Feature engineering for machine learning: principles and techniques for data scientists","author":"A Zheng","year":"2018","unstructured":"Zheng A, Casari A (2018) Feature engineering for machine learning: principles and techniques for data scientists. O\u2019 Reilly Media, Inc, Sebastopol"},{"key":"788_CR35","doi-asserted-by":"crossref","unstructured":"Kang B, Liu Z, Wang X, Yu F, Feng J, Darrell T (2019)\u00a0Few-shot object detection via feature reweighting. In: IEEE\/CVF international conference on computer vision (ICCV), pp 8419\u20138428","DOI":"10.1109\/ICCV.2019.00851"},{"key":"788_CR36","doi-asserted-by":"crossref","unstructured":"Heaton J (2016) An empirical analysis of feature engineering for predictive modeling. In: IEEE SoutheastCon","DOI":"10.1109\/SECON.2016.7506650"},{"issue":"5","key":"788_CR37","doi-asserted-by":"publisher","first-page":"3104","DOI":"10.1007\/s10489-021-02199-4","volume":"51","author":"CSR Annavarapu","year":"2021","unstructured":"Annavarapu CSR et al (2021) Deep learning-based improved snapshot ensemble technique for covid-19 chest x-ray classification. 
Appl Intell 51(5):3104\u20133120","journal-title":"Appl Intell"},{"issue":"29","key":"788_CR38","first-page":"287","volume":"34","author":"J Meier","year":"2021","unstructured":"Meier J, Rao R, Verkuil R, Liu J, Sercu T, Rives A (2021) Language models enable zero-shot prediction of the effects of mutations on protein function. Adv Neural Inform Proc Syst 34(29):287","journal-title":"Adv Neural Inform Proc Syst"},{"key":"788_CR39","doi-asserted-by":"crossref","unstructured":"Mount DW (2008) \u201cUsing blosum in sequence alignments,\u201d Cold Spring Harbor Protocols, vol. 2008, no.\u00a06, pp. pdb\u2013top39","DOI":"10.1101\/pdb.top39"},{"issue":"9","key":"788_CR40","doi-asserted-by":"publisher","first-page":"360","DOI":"10.1007\/s008940100038","volume":"7","author":"J Meiler","year":"2001","unstructured":"Meiler J, M\u00fcller M, Zeidler A, Schm\u00e4schke F (2001) Generation and evaluation of dimension-reduced amino acid parameter representations by artificial neural networks. Mol Model Ann 7(9):360\u2013369","journal-title":"Mol Model Ann"},{"issue":"17","key":"788_CR41","doi-asserted-by":"publisher","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","volume":"25","author":"SF Altschul","year":"1997","unstructured":"Altschul SF, Madden TL, Sch\u00e4ffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997) Gapped blast and psi-blast: a new generation of protein database search programs. Nucleic Acids Res 25(17):3389\u20133402","journal-title":"Nucleic Acids Res"},{"issue":"D1","key":"788_CR42","doi-asserted-by":"publisher","first-page":"D170","DOI":"10.1093\/nar\/gkw1081","volume":"45","author":"M Mirdita","year":"2017","unstructured":"Mirdita M, Von Den Driesch L, Galiez C, Martin MJ, S\u00f6ding J, Steinegger M (2017) Uniclust databases of clustered and deeply annotated protein sequences and alignments. 
Nucleic acids Res 45(D1):D170\u2013D176","journal-title":"Nucleic acids Res"},{"issue":"18","key":"788_CR43","doi-asserted-by":"publisher","first-page":"2842","DOI":"10.1093\/bioinformatics\/btx218","volume":"33","author":"R Heffernan","year":"2017","unstructured":"Heffernan R, Yang Y, Paliwal K, Zhou Y (2017) Capturing non-local interactions by long short-term memory bidirectional recurrent neural networks for improving prediction of protein secondary structure, backbone angles, contact numbers and solvent accessibility. Bioinformatics 33(18):2842\u20132849","journal-title":"Bioinformatics"},{"issue":"23","key":"788_CR44","doi-asserted-by":"publisher","first-page":"4039","DOI":"10.1093\/bioinformatics\/bty481","volume":"34","author":"J Hanson","year":"2018","unstructured":"Hanson J, Paliwal K, Litfin T, Yang Y, Zhou Y (2018) Accurate prediction of protein contact maps by coupling residual two-dimensional bidirectional long short-term memory with convolutional neural networks. Bioinformatics 34(23):4039\u20134045","journal-title":"Bioinformatics"},{"issue":"22","key":"788_CR45","doi-asserted-by":"publisher","first-page":"4640","DOI":"10.1093\/bioinformatics\/btz294","volume":"35","author":"X Han","year":"2019","unstructured":"Han X, Wang X, Zhou K (2019) Develop machine learning-based regression predictive models for engineering protein solubility. Bioinformatics 35(22):4640\u20134646","journal-title":"Bioinformatics"},{"issue":"3","key":"788_CR46","doi-asserted-by":"publisher","first-page":"299","DOI":"10.1016\/j.ymeth.2005.04.006","volume":"36","author":"Y Shimizu","year":"2005","unstructured":"Shimizu Y, Kanamori T, Ueda T (2005) Protein synthesis by pure translation systems. 
Methods 36(3):299\u2013304","journal-title":"Methods"},{"issue":"5","key":"788_CR47","doi-asserted-by":"publisher","first-page":"1445","DOI":"10.1093\/bioinformatics\/btz773","volume":"36","author":"Q Hou","year":"2020","unstructured":"Hou Q, Kwasigroch JM, Rooman M, Pucci F (2020) Solart: a structure-based method to predict protein solubility and aggregation. Bioinformatics 36(5):1445\u20131452","journal-title":"Bioinformatics"},{"issue":"19","key":"788_CR48","doi-asserted-by":"publisher","first-page":"3098","DOI":"10.1093\/bioinformatics\/btx345","volume":"33","author":"M Hebditch","year":"2017","unstructured":"Hebditch M, Carballo-Amador MA, Charonis S, Curtis R, Warwicker J (2017) Protein-sol: a web tool for predicting protein solubility from sequence. Bioinformatics 33(19):3098\u20133100","journal-title":"Bioinformatics"},{"key":"788_CR49","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2307.06435","author":"H Naveed","year":"2023","unstructured":"Naveed H, Khan AU, Qiu S, Saqib M, Anwar S, Usman M, Akhtar N, Barnes N, Mian A (2023) A comprehensive overview of large language models. arXiv. https:\/\/doi.org\/10.48550\/arXiv.2307.06435","journal-title":"arXiv"},{"issue":"8","key":"788_CR50","doi-asserted-by":"publisher","first-page":"1099","DOI":"10.1038\/s41587-022-01618-2","volume":"41","author":"A Madani","year":"2023","unstructured":"Madani A, Krause B, Greene ER, Subramanian S, Mohr BP, Holton JM, Olmos JL, Xiong C, Sun ZZ, Socher R, Fraser JS, Naik N (2023) Large language models generate functional protein sequences across diverse families. Nat Biotechnol 41(8):1099\u20131106","journal-title":"Nat Biotechnol"},{"issue":"1","key":"788_CR51","doi-asserted-by":"publisher","first-page":"4348","DOI":"10.1038\/s41467-022-32007-7","volume":"13","author":"N Ferruz","year":"2022","unstructured":"Ferruz N, Schmidt S, H\u00f6cker B (2022) Protgpt2 is a deep unsupervised language model for protein design. 
Nat Commun 13(1):4348","journal-title":"Nat Commun"}],"container-title":["Journal of Cheminformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-023-00788-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s13321-023-00788-8\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-023-00788-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,12,8]],"date-time":"2023-12-08T05:08:51Z","timestamp":1702012131000},"score":1,"resource":{"primary":{"URL":"https:\/\/jcheminf.biomedcentral.com\/articles\/10.1186\/s13321-023-00788-8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,12,8]]},"references-count":51,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2023,12]]}},"alternative-id":["788"],"URL":"https:\/\/doi.org\/10.1186\/s13321-023-00788-8","relation":{"has-preprint":[{"id-type":"doi","id":"10.21203\/rs.3.rs-3294983\/v1","asserted-by":"object"}]},"ISSN":["1758-2946"],"issn-type":[{"value":"1758-2946","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,12,8]]},"assertion":[{"value":"25 August 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"26 November 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"8 December 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not applicable. 
The authors declare that they have no human or animal studies.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethical approval"}},{"value":"The authors declare that they have no competing interests.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"118"}}