{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,30]],"date-time":"2025-12-30T03:25:31Z","timestamp":1767065131431,"version":"3.41.2"},"reference-count":36,"publisher":"Oxford University Press (OUP)","issue":"8","license":[{"start":{"date-parts":[[2023,7,25]],"date-time":"2023-07-25T00:00:00Z","timestamp":1690243200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["1909536"],"award-info":[{"award-number":["1909536"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["R01GM132391","R35GM148219","W911NF2210239"],"award-info":[{"award-number":["R01GM132391","R35GM148219","W911NF2210239"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,8,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Accurately predicting the likelihood of interaction between two objects (compound\u2013protein sequence, user\u2013item, author\u2013paper, etc.) is a fundamental problem in Computer Science. Current deep-learning models rely on learning accurate representations of the interacting objects. Importantly, relationships between the interacting objects, or features of the interaction, offer an opportunity to partition the data to create multi-views of the interacting objects. 
The resulting congruent and non-congruent views can then be exploited via contrastive learning techniques to learn enhanced representations of the objects.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We present a novel method, Contrastive Stratification for Interaction Prediction (CSI), to stratify (partition) a dataset in a manner that can be exploited via Contrastive Multiview Coding to learn embeddings that maximize the mutual information across congruent data views. CSI assigns a key and multiple views to each data point, where data partitions under a particular key form congruent views of the data. We showcase the effectiveness of CSI by applying it to the compound\u2013protein sequence interaction prediction problem, a pressing problem whose solution promises to expedite drug delivery (drug\u2013protein interaction prediction), metabolic engineering, and synthetic biology (compound\u2013enzyme interaction prediction) applications. 
Comparing CSI with a baseline model that does not utilize data stratification and contrastive learning, we show gains in average precision ranging from 13.7% to 39% using compounds and sequences as keys across multiple drug\u2013target and enzymatic datasets, and gains ranging from 16.9% to 63% using reaction features as keys across enzymatic datasets.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>Code and dataset available at https:\/\/github.com\/HassounLab\/CSI.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btad456","type":"journal-article","created":{"date-parts":[[2023,7,25]],"date-time":"2023-07-25T17:28:55Z","timestamp":1690306135000},"source":"Crossref","is-referenced-by-count":4,"title":["CSI: Contrastive data Stratification for Interaction prediction and its application to compound\u2013protein interaction prediction"],"prefix":"10.1093","volume":"39","author":[{"given":"Apurva","family":"Kalia","sequence":"first","affiliation":[{"name":"Department of Computer Science, Tufts University , Medford, MA 02155, United States"}]},{"given":"Dilip","family":"Krishnan","sequence":"additional","affiliation":[{"name":"Google Research , Cambridge, MA 02142, United States"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9477-2199","authenticated-orcid":false,"given":"Soha","family":"Hassoun","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Tufts University , Medford, MA 02155, United States"},{"name":"Department of Chemical and Biological Engineering, Tufts University , Medford, MA 02155, United States"}]}],"member":"286","published-online":{"date-parts":[[2023,7,25]]},"reference":[{"key":"2023081217033851300_btad456-B1","doi-asserted-by":"crossref","first-page":"2100","DOI":"10.2174\/0929867327666200907141016","article-title":"Deep learning in drug target interaction prediction: current and 
future perspectives","volume":"28","author":"Abbasi","year":"2021","journal-title":"Curr Med Chem"},{"key":"2023081217033851300_btad456-B2","doi-asserted-by":"crossref","first-page":"247","DOI":"10.1093\/bib\/bbz157","article-title":"Machine learning approaches and databases for prediction of drug\u2013target interaction: a survey paper","volume":"22","author":"Bagherian","year":"2021","journal-title":"Brief Bioinform"},{"key":"2023081217033851300_btad456-B3","doi-asserted-by":"crossref","first-page":"1798","DOI":"10.1109\/TPAMI.2013.50","article-title":"Representation learning: a review and new perspectives","volume":"35","author":"Bengio","year":"2013","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"2023081217033851300_btad456-B4","doi-asserted-by":"crossref","first-page":"D498","DOI":"10.1093\/nar\/gkaa1025","article-title":"BRENDA, the ELIXIR core data resource in 2021: new developments and updates","volume":"49","author":"Chang","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2023081217033851300_btad456-B5","doi-asserted-by":"crossref","first-page":"696","DOI":"10.1093\/bib\/bbv066","article-title":"Drug\u2013target interaction prediction: databases, web servers and computational models","volume":"17","author":"Chen","year":"2016","journal-title":"Brief Bioinform"},{"key":"2023081217033851300_btad456-B6","doi-asserted-by":"crossref","first-page":"e1005678","DOI":"10.1371\/journal.pcbi.1005678","article-title":"Computational-experimental approach to drug-target interaction mapping: a case study on kinase inhibitors","volume":"13","author":"Cichonska","year":"2017","journal-title":"PLoS Comput Biol"},{"key":"2023081217033851300_btad456-B7","doi-asserted-by":"crossref","first-page":"12788","DOI":"10.1021\/acs.chemrev.0c00534","article-title":"Thermodynamics and kinetics of drug-target binding by molecular simulation","volume":"120","author":"Decherchi","year":"2020","journal-title":"Chem 
Rev"},{"year":"2018","author":"Feng","key":"2023081217033851300_btad456-B8"},{"key":"2023081217033851300_btad456-B9","doi-asserted-by":"crossref","first-page":"D1045","DOI":"10.1093\/nar\/gkv1072","article-title":"BindingDB in 2015: a public database for medicinal chemistry, computational chemistry and systems pharmacology","volume":"44","author":"Gilson","year":"2016","journal-title":"Nucleic Acids Res"},{"key":"2023081217033851300_btad456-B10","doi-asserted-by":"crossref","first-page":"139","DOI":"10.1145\/3422622","article-title":"Generative adversarial networks","volume":"63","author":"Goodfellow","year":"2020","journal-title":"Commun ACM"},{"year":"2016","author":"He","key":"2023081217033851300_btad456-B11"},{"year":"2017","author":"He","first-page":"173","key":"2023081217033851300_btad456-B12"},{"key":"2023081217033851300_btad456-B13","first-page":"44","volume-title":"International Conference on Artificial Neural Networks","author":"Hinton","year":"2011"},{"key":"2023081217033851300_btad456-B14","doi-asserted-by":"crossref","first-page":"830","DOI":"10.1093\/bioinformatics\/btaa880","article-title":"MolTrans: molecular interaction transformer for drug\u2013target interaction prediction","volume":"37","author":"Huang","year":"2021","journal-title":"Bioinformatics"},{"key":"2023081217033851300_btad456-B15","doi-asserted-by":"crossref","first-page":"D545","DOI":"10.1093\/nar\/gkaa970","article-title":"KEGG: integrating viruses and cellular organisms","volume":"49","author":"Kanehisa","year":"2021","journal-title":"Nucleic Acids Res"},{"year":"2014","author":"Kingma","key":"2023081217033851300_btad456-B16"},{"year":"2016","author":"Kipf","key":"2023081217033851300_btad456-B17"},{"key":"2023081217033851300_btad456-B18","doi-asserted-by":"crossref","first-page":"24","DOI":"10.1016\/j.pisc.2014.02.003","article-title":"Predictive genomic and metabolomic analysis for the standardization of enzyme 
data","volume":"1","author":"Kotera","year":"2014","journal-title":"Perspect Sci"},{"year":"2013","author":"Landrum","key":"2023081217033851300_btad456-B19"},{"key":"2023081217033851300_btad456-B20","doi-asserted-by":"crossref","first-page":"e1007129","DOI":"10.1371\/journal.pcbi.1007129","article-title":"DeepConv-DTI: prediction of drug-target interactions via deep learning with convolution on protein sequences","volume":"15","author":"Lee","year":"2019","journal-title":"PLoS Comput Biol"},{"key":"2023081217033851300_btad456-B21","doi-asserted-by":"crossref","first-page":"1863","DOI":"10.1109\/TKDE.2018.2872063","article-title":"A survey of multi-view representation learning","volume":"31","author":"Li","year":"2019","journal-title":"IEEE Trans Knowl Data Eng"},{"year":"2020","author":"Lin","key":"2023081217033851300_btad456-B22"},{"key":"2023081217033851300_btad456-B23","doi-asserted-by":"crossref","first-page":"1435","DOI":"10.1126\/science.2983426","article-title":"Rapid and sensitive protein similarity searches","volume":"227","author":"Lipman","year":"1985","journal-title":"Science"},{"key":"2023081217033851300_btad456-B24","doi-asserted-by":"crossref","first-page":"123912","DOI":"10.1109\/ACCESS.2021.3110269","article-title":"Pre-training of deep bidirectional protein sequence representations with structural information","volume":"9","author":"Min","year":"2021","journal-title":"IEEE Access"},{"key":"2023081217033851300_btad456-B25","doi-asserted-by":"crossref","first-page":"1140","DOI":"10.1093\/bioinformatics\/btaa921","article-title":"GraphDTA: predicting drug\u2013target binding affinity with graph neural networks","volume":"37","author":"Nguyen","year":"2021","journal-title":"Bioinformatics"},{"key":"2023081217033851300_btad456-B26","doi-asserted-by":"crossref","first-page":"i821","DOI":"10.1093\/bioinformatics\/bty593","article-title":"DeepDTA: deep drug\u2013target binding affinity 
prediction","volume":"34","author":"\u00d6zt\u00fcrk","year":"2018","journal-title":"Bioinformatics"},{"year":"2021","author":"Radford","first-page":"8748","key":"2023081217033851300_btad456-B27"},{"key":"2023081217033851300_btad456-B28","first-page":"776","volume-title":"European Conference on Computer Vision","author":"Tian","year":"2020"},{"key":"2023081217033851300_btad456-B29","first-page":"6827","article-title":"What makes for good views for contrastive learning?","volume":"33","author":"Tian","year":"2020","journal-title":"Adv Neural Inf Process Syst"},{"key":"2023081217033851300_btad456-B30","article-title":"Molecular docking: from lock and key to combination lock","volume":"2","author":"Tripathi","year":"2017","journal-title":"J Mol Med Clin Appl"},{"key":"2023081217033851300_btad456-B31","doi-asserted-by":"crossref","first-page":"309","DOI":"10.1093\/bioinformatics\/bty535","article-title":"Compound\u2013protein interaction prediction with end-to-end learning of neural networks for graphs and sequences","volume":"35","author":"Tsubaki","year":"2019","journal-title":"Bioinformatics"},{"key":"2023081217033851300_btad456-B32","doi-asserted-by":"crossref","first-page":"463","DOI":"10.1038\/s41573-019-0024-5","article-title":"Applications of machine learning in drug discovery and development","volume":"18","author":"Vamathevan","year":"2019","journal-title":"Nat Rev Drug Discov"},{"key":"2023081217033851300_btad456-B33","doi-asserted-by":"crossref","first-page":"2017","DOI":"10.1093\/bioinformatics\/btab054","article-title":"Enzyme promiscuity prediction using hierarchy-informed multi-label 
classification","volume":"37","author":"Visani","year":"2021","journal-title":"Bioinformatics"},{"year":"2017","author":"Xue","first-page":"3203","key":"2023081217033851300_btad456-B34"},{"year":"2018","author":"Yao","first-page":"684","key":"2023081217033851300_btad456-B35"},{"key":"2023081217033851300_btad456-B36","doi-asserted-by":"crossref","first-page":"1714","DOI":"10.3390\/molecules24091714","article-title":"Revealing drug-target interactions with computational models and algorithms","volume":"24","author":"Zhou","year":"2019","journal-title":"Molecules"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btad456\/50959384\/btad456.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/8\/btad456\/51103124\/btad456.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/8\/btad456\/51103124\/btad456.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,8,12]],"date-time":"2023-08-12T17:04:47Z","timestamp":1691859887000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btad456\/7230783"}},"subtitle":[],"editor":[{"given":"Pier 
Luigi","family":"Martelli","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2023,7,25]]},"references-count":36,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2023,8,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btad456","relation":{},"ISSN":["1367-4811"],"issn-type":[{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2023,8,1]]},"published":{"date-parts":[[2023,7,25]]},"article-number":"btad456"}}