{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T20:34:55Z","timestamp":1772138095663,"version":"3.50.1"},"reference-count":8,"publisher":"Oxford University Press (OUP)","issue":"20","license":[{"start":{"date-parts":[[2019,3,15]],"date-time":"2019-03-15T00:00:00Z","timestamp":1552608000000},"content-version":"vor","delay-in-days":1,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Institute for Integrated Data Science"},{"name":"Purdue Instructional Innovation Award"},{"DOI":"10.13039\/100007289","name":"Purdue Research Foundation","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100007289","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Department of Chemistry start up award"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2019,10,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>The Protein Data Bank (PDB) currently holds over 140 000 biomolecular structures and continues to release new structures on a weekly basis. The PDB is an essential resource for the structural bioinformatics community to develop software that mines, uses, categorizes and analyzes such data. New computational biology methods are evaluated using custom benchmarking sets derived as subsets of 3D experimentally determined structures and structural features from the PDB. Currently, such benchmarking features are manually curated with custom scripts in a non-standardized manner that results in slow distribution and updates with new experimental structures. 
Finally, there is a scarcity of standardized tools to rapidly query 3D descriptors of the entire PDB.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>Our solution is the Lemon framework, a C++11 library with Python bindings, which provides a consistent workflow methodology for selecting biomolecular interactions based on user criteria and computing desired 3D structural features. This framework can parse and characterize the entire PDB in &lt;10\u00a0min on modern, multithreaded hardware. The speed in parsing is obtained by using the recently developed MacroMolecule Transmission Format to reduce the computational cost of reading text-based PDB files. The use of C++ lambda functions and Python bindings provides extensive flexibility for analysis and categorization of the PDB by allowing the user to write custom functions to suit their objective. We think Lemon will become a one-stop-shop to quickly mine the entire PDB to generate desired structural biology features.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>The Lemon software is available as a C++ header library along with a PyPI package and example functions at https:\/\/github.com\/chopralab\/lemon.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btz178","type":"journal-article","created":{"date-parts":[[2019,3,13]],"date-time":"2019-03-13T08:33:02Z","timestamp":1552465982000},"page":"4165-4167","source":"Crossref","is-referenced-by-count":3,"title":["Lemon: a framework for rapidly mining structural information 
from the Protein Data Bank"],"prefix":"10.1093","volume":"35","author":[{"given":"Jonathan","family":"Fine","sequence":"first","affiliation":[{"name":"Department of Chemistry, Purdue University , West Lafayette, IN, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0942-7898","authenticated-orcid":false,"given":"Gaurav","family":"Chopra","sequence":"additional","affiliation":[{"name":"Department of Chemistry, Purdue University , West Lafayette, IN, USA"}]}],"member":"286","published-online":{"date-parts":[[2019,3,14]]},"reference":[{"key":"2023013108281038700_btz178-B1","doi-asserted-by":"crossref","first-page":"115","DOI":"10.1002\/prot.22323","article-title":"A generalized knowledge-based discriminatory function for biomolecular interactions","volume":"76","author":"Bernard","year":"2009","journal-title":"Proteins"},{"key":"2023013108281038700_btz178-B2","doi-asserted-by":"crossref","first-page":"e1005575.","DOI":"10.1371\/journal.pcbi.1005575","article-title":"MMTF\u2014an efficient file format for the transmission, visualization, and analysis of macromolecular structures","volume":"13","author":"Bradley","year":"2017","journal-title":"PLoS Comput. Biol"},{"key":"2023013108281038700_btz178-B3","doi-asserted-by":"crossref","first-page":"20239","DOI":"10.1073\/pnas.0810818105","article-title":"Solvent dramatically affects protein structure refinement","volume":"105","author":"Chopra","year":"2008","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023013108281038700_btz178-B4","first-page":"137","volume-title":"OSDI'04 Proceedings of the 6th Conference on Symposium on Opearting Systems Design & Implementation","author":"Dean","year":"2004"},{"key":"2023013108281038700_btz178-B5","doi-asserted-by":"crossref","first-page":"726","DOI":"10.1021\/jm061277y","article-title":"Diverse, high-quality test set for the validation of protein-ligand docking performance","volume":"50","author":"Hartshorn","year":"2007","journal-title":"J. Med. 
Chem"},{"key":"2023013108281038700_btz178-B6","doi-asserted-by":"crossref","first-page":"302","DOI":"10.1021\/acs.accounts.6b00491","article-title":"Forging the basis for developing protein-ligand interaction scoring functions","volume":"50","author":"Liu","year":"2017","journal-title":"Acc. Chem. Res"},{"key":"2023013108281038700_btz178-B7","doi-asserted-by":"crossref","first-page":"6582","DOI":"10.1021\/jm300687e","article-title":"Directory of useful decoys, enhanced (DUD-E): better ligands and decoys for better benchmarking","volume":"55","author":"Mysinger","year":"2012","journal-title":"J. Med. Chem"},{"key":"2023013108281038700_btz178-B8","doi-asserted-by":"crossref","first-page":"D345","DOI":"10.1093\/nar\/gku1214","article-title":"The RCSB Protein Data Bank: views of structural biology for basic and applied research and education","volume":"43","author":"Rose","year":"2015","journal-title":"Nucleic Acids Res"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/35\/20\/4165\/48976349\/bioinformatics_35_20_4165.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/35\/20\/4165\/48976349\/bioinformatics_35_20_4165.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,31]],"date-time":"2023-01-31T12:18:36Z","timestamp":1675167516000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/35\/20\/4165\/5380765"}},"subtitle":[],"editor":[{"given":"Alfonso","family":"Valencia","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2019,3,14]]},"references-count":8,"journal-issue":{"issue":"20","published-print":{"date-parts":[[2019,10,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btz178","relation"
:{"has-preprint":[{"id-type":"doi","id":"10.1101\/379891","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2019,10,15]]},"published":{"date-parts":[[2019,3,14]]}}}