{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,11]],"date-time":"2026-03-11T17:00:27Z","timestamp":1773248427802,"version":"3.50.1"},"update-to":[{"DOI":"10.1371\/journal.pcbi.1008502","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2020,12,17]],"date-time":"2020-12-17T00:00:00Z","timestamp":1608163200000}}],"reference-count":39,"publisher":"Public Library of Science (PLoS)","issue":"12","license":[{"start":{"date-parts":[[2020,12,7]],"date-time":"2020-12-07T00:00:00Z","timestamp":1607299200000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100008982","name":"National Science Foundation","doi-asserted-by":"publisher","award":["DBI-1832184"],"award-info":[{"award-number":["DBI-1832184"]}],"id":[{"id":"10.13039\/501100008982","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000015","name":"U.S. Department of Energy","doi-asserted-by":"publisher","award":["DE-SC0019749"],"award-info":[{"award-number":["DE-SC0019749"]}],"id":[{"id":"10.13039\/100000015","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["R01GM133198"],"award-info":[{"award-number":["R01GM133198"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["www.ploscompbiol.org"],"crossmark-restriction":false},"short-container-title":["PLoS Comput Biol"],"abstract":"<jats:p>\n                    Biochemical and biological functions of proteins are the product of both the overall fold of the polypeptide chain, and, typically, structural motifs made up of smaller numbers of amino acids constituting a catalytic center or a binding site that may be remote from one another in amino acid sequence. Detection of such structural motifs can provide valuable insights into the function(s) of previously uncharacterized proteins. Technically, this remains an extremely challenging problem because of the size of the Protein Data Bank (PDB) archive. Existing methods depend on a clustering by sequence similarity and can be computationally slow. We have developed a new approach that uses an inverted index strategy capable of analyzing &gt;170,000 PDB structures with unmatched speed. The efficiency of the inverted index method depends critically on identifying the small number of structures containing the query motif and ignoring most of the structures that are irrelevant. Our approach (implemented at\n                    <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"http:\/\/motif.rcsb.org\" xlink:type=\"simple\">motif.rcsb.org<\/jats:ext-link>\n                    ) enables real-time retrieval and superposition of structural motifs, either extracted from a reference structure or uploaded by the user. Herein, we describe the method and present five case studies that exemplify its efficacy and speed for analyzing 3D structures of both proteins and nucleic acids.\n                  <\/jats:p>","DOI":"10.1371\/journal.pcbi.1008502","type":"journal-article","created":{"date-parts":[[2020,12,7]],"date-time":"2020-12-07T18:08:17Z","timestamp":1607364497000},"page":"e1008502","update-policy":"https:\/\/doi.org\/10.1371\/journal.pcbi.corrections_policy","source":"Crossref","is-referenced-by-count":37,"title":["Real-time structural motif searching in proteins using an inverted index strategy"],"prefix":"10.1371","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3576-0387","authenticated-orcid":true,"given":"Sebastian","family":"Bittrich","sequence":"first","affiliation":[]},{"given":"Stephen K.","family":"Burley","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0893-5551","authenticated-orcid":true,"given":"Alexander S.","family":"Rose","sequence":"additional","affiliation":[]}],"member":"340","published-online":{"date-parts":[[2020,12,7]]},"reference":[{"key":"pcbi.1008502.ref001","first-page":"1","volume-title":"Sequence and Genome Analysis II\u2014Methods and Applications","author":"A Via","year":"2011"},{"issue":"12","key":"pcbi.1008502.ref002","doi-asserted-by":"crossref","first-page":"4501","DOI":"10.1021\/cr000033x","article-title":"Serine protease mechanism and specificity","volume":"102","author":"L Hedstrom","year":"2002","journal-title":"Chemical reviews"},{"issue":"17","key":"pcbi.1008502.ref003","doi-asserted-by":"crossref","first-page":"6878","DOI":"10.1073\/pnas.87.17.6878","article-title":"Molecular structure of leucine aminopeptidase at 2.7-A resolution","volume":"87","author":"SK Burley","year":"1990","journal-title":"Proceedings of the National Academy of Sciences"},{"key":"pcbi.1008502.ref004","doi-asserted-by":"crossref","first-page":"361","DOI":"10.1007\/978-94-024-1069-3_11","volume-title":"From Protein Structure to Function with Bioinformatics","author":"JP Nilmeier","year":"2017"},{"issue":"1","key":"pcbi.1008502.ref005","doi-asserted-by":"crossref","first-page":"313","DOI":"10.1146\/annurev.biochem.70.1.313","article-title":"Design and selection of novel Cys2His2 zinc finger proteins","volume":"70","author":"CO Pabo","year":"2001","journal-title":"Annual review of biochemistry"},{"issue":"4","key":"pcbi.1008502.ref006","doi-asserted-by":"crossref","first-page":"962","DOI":"10.1002\/prot.20099","article-title":"Superfamily active site templates","volume":"55","author":"EC Meng","year":"2004","journal-title":"PROTEINS: Structure, Function, and Bioinformatics"},{"issue":"19","key":"pcbi.1008502.ref007","doi-asserted-by":"crossref","first-page":"5402","DOI":"10.1093\/nar\/gkl655","article-title":"Quadruplex DNA: sequence, topology and structure","volume":"34","author":"S Burge","year":"2006","journal-title":"Nucleic acids research"},{"key":"pcbi.1008502.ref008","first-page":"29","volume-title":"Proceedings of the Workshop on Molecular Graphics and Visual Analysis of Molecular Data","author":"D Sehnal","year":"2018"},{"issue":"7","key":"pcbi.1008502.ref009","doi-asserted-by":"crossref","first-page":"e1003750","DOI":"10.1371\/journal.pcbi.1003750","article-title":"A real-time all-atom structural search engine for proteins","volume":"10","author":"G Gonzalez","year":"2014","journal-title":"PLoS computational biology"},{"issue":"23","key":"pcbi.1008502.ref010","doi-asserted-by":"crossref","first-page":"10495","DOI":"10.1073\/pnas.88.23.10495","article-title":"Efficient detection of three-dimensional structural motifs in biological macromolecules by computer vision techniques","volume":"88","author":"R Nussinov","year":"1991","journal-title":"Proceedings of the National Academy of Sciences"},{"issue":"2","key":"pcbi.1008502.ref011","doi-asserted-by":"crossref","first-page":"671","DOI":"10.1109\/TCBB.2017.2786250","article-title":"Unsupervised Discovery of Geometrically Common Structural Motifs and Long-Range Contacts in Protein 3D Structures","volume":"16","author":"F Kaiser","year":"2017","journal-title":"IEEE\/ACM transactions on computational biology and bioinformatics"},{"issue":"D1","key":"pcbi.1008502.ref012","doi-asserted-by":"crossref","first-page":"D618","DOI":"10.1093\/nar\/gkx1012","article-title":"Mechanism and Catalytic Site Atlas (M-CSA): a database of enzyme reaction mechanisms and active sites","volume":"46","author":"AJM Ribeiro","year":"2017","journal-title":"Nucleic acids research"},{"issue":"6","key":"pcbi.1008502.ref013","doi-asserted-by":"crossref","first-page":"516","DOI":"10.1093\/bioinformatics\/14.6.516","article-title":"A geometric algorithm to find small but highly similar 3D substructures in proteins","volume":"14","author":"X Pennec","year":"1998","journal-title":"Bioinformatics (Oxford, England)"},{"issue":"11","key":"pcbi.1008502.ref014","doi-asserted-by":"crossref","first-page":"2308","DOI":"10.1002\/pro.5560061104","article-title":"TESS: a geometric hashing algorithm for deriving 3D coordinate templates for searching structural databases. Application to enzyme active sites","volume":"6","author":"AC Wallace","year":"1997","journal-title":"Protein science"},{"issue":"1","key":"pcbi.1008502.ref015","doi-asserted-by":"crossref","first-page":"555","DOI":"10.1186\/1471-2105-11-555","article-title":"The LabelHash algorithm for substructure matching","volume":"11","author":"M Moll","year":"2010","journal-title":"BMC bioinformatics"},{"issue":"4","key":"pcbi.1008502.ref016","doi-asserted-by":"crossref","first-page":"10","DOI":"10.1109\/99.641604","article-title":"Geometric hashing: An overview","volume":"4","author":"HJ Wolfson","year":"1997","journal-title":"IEEE computational science and engineering"},{"issue":"9","key":"pcbi.1008502.ref017","doi-asserted-by":"crossref","first-page":"1160","DOI":"10.1093\/bioinformatics\/btq100","article-title":"ProBiS algorithm for detection of structurally similar protein binding sites by local structural alignment","volume":"26","author":"J Konc","year":"2010","journal-title":"Bioinformatics"},{"issue":"W1","key":"pcbi.1008502.ref018","doi-asserted-by":"crossref","first-page":"W380","DOI":"10.1093\/nar\/gks401","article-title":"SPRITE and ASSAM: web servers for side chain 3D-motif searching in protein structures","volume":"40","author":"N Nadzirin","year":"2012","journal-title":"Nucleic acids research"},{"issue":"W1","key":"pcbi.1008502.ref019","doi-asserted-by":"crossref","first-page":"W256","DOI":"10.1093\/nar\/gkt403","article-title":"Catalytic site identification\u2014a web server to identify catalytic site structural matches throughout PDB","volume":"41","author":"DA Kirshner","year":"2013","journal-title":"Nucleic acids research"},{"issue":"7","key":"pcbi.1008502.ref020","doi-asserted-by":"crossref","first-page":"698","DOI":"10.1089\/cmb.2014.0263","article-title":"A novel algorithm for enhanced structural motif matching in proteins","volume":"22","author":"F Kaiser","year":"2015","journal-title":"Journal of Computational Biology"},{"issue":"5","key":"pcbi.1008502.ref021","doi-asserted-by":"crossref","first-page":"1307","DOI":"10.1016\/S0022-2836(03)00045-7","article-title":"A model for statistical significance of local similarities in structure","volume":"326","author":"A Stark","year":"2003","journal-title":"Journal of molecular biology"},{"key":"pcbi.1008502.ref022","doi-asserted-by":"crossref","unstructured":"Fofanov VY, Chen BY, Bryant DH, Moll M, Lichtarge O, Kavraki L, et al. A statistical model to correct systematic bias introduced by algorithmic thresholds in protein structural comparison algorithms. In: 2008 IEEE International Conference on Bioinformatics and Biomedicine Workshops. IEEE; 2008. p. 1\u20138.","DOI":"10.1109\/BIBMW.2008.4686202"},{"issue":"D1","key":"pcbi.1008502.ref023","doi-asserted-by":"crossref","first-page":"D464","DOI":"10.1093\/nar\/gky1004","article-title":"RCSB Protein Data Bank: biological macromolecular structures enabling research and education in fundamental biology, biomedicine, biotechnology and energy","volume":"47","author":"SK Burley","year":"2019","journal-title":"Nucleic acids research"},{"issue":"D1","key":"pcbi.1008502.ref024","doi-asserted-by":"crossref","first-page":"D520","DOI":"10.1093\/nar\/gky949","article-title":"Protein Data Bank: the single global archive for 3D macromolecular structure data","volume":"47","year":"2019","journal-title":"Nucleic acids research"},{"issue":"7","key":"pcbi.1008502.ref025","doi-asserted-by":"crossref","first-page":"e1007970","DOI":"10.1371\/journal.pcbi.1007970","article-title":"Real time structural search of the Protein Data Bank","volume":"16","author":"D Guzenko","year":"2020","journal-title":"PLoS computational biology"},{"key":"pcbi.1008502.ref026","volume-title":"The art of computer programming","author":"DE Knuth","year":"1997"},{"issue":"5","key":"pcbi.1008502.ref027","doi-asserted-by":"crossref","first-page":"792","DOI":"10.1093\/bioinformatics\/btv637","article-title":"Fit3D: a web application for highly accurate screening of spatial residue patterns in protein structure data","volume":"32","author":"F Kaiser","year":"2016","journal-title":"Bioinformatics"},{"key":"pcbi.1008502.ref028","first-page":"gkw1000","article-title":"The RCSB protein data bank: integrative view of protein, gene and 3D structural information","author":"PW Rose","year":"2016","journal-title":"Nucleic acids research"},{"issue":"D1","key":"pcbi.1008502.ref029","doi-asserted-by":"crossref","first-page":"D344","DOI":"10.1093\/nar\/gks1067","article-title":"New and continuing developments at PROSITE","volume":"41","author":"CJ Sigrist","year":"2012","journal-title":"Nucleic acids research"},{"issue":"5","key":"pcbi.1008502.ref030","doi-asserted-by":"crossref","first-page":"1061","DOI":"10.1006\/jmbi.1998.2089","article-title":"Structures of native and complexed complement factor D: implications of the atypical His57 conformation and self-inhibitory loop in the regulation of specific serine protease activity","volume":"282","author":"H Jing","year":"1998","journal-title":"Journal of molecular biology"},{"issue":"6","key":"pcbi.1008502.ref031","doi-asserted-by":"crossref","first-page":"791","DOI":"10.1089\/cmb.2007.R017","article-title":"The MASH pipeline for protein function prediction and an algorithm for the geometric refinement of 3D motifs","volume":"14","author":"BY Chen","year":"2007","journal-title":"Journal of Computational Biology"},{"issue":"51","key":"pcbi.1008502.ref032","doi-asserted-by":"crossref","first-page":"16489","DOI":"10.1021\/bi9616413","article-title":"The enolase superfamily: a general strategy for enzyme-catalyzed abstraction of the \u03b1-protons of carboxylic acids","volume":"35","author":"PC Babbitt","year":"1996","journal-title":"Biochemistry"},{"issue":"3","key":"pcbi.1008502.ref033","doi-asserted-by":"crossref","first-page":"536","DOI":"10.1016\/j.str.2017.01.004","article-title":"OneDep: unified wwPDB system for deposition, biocuration, and validation of macromolecular structures in the PDB archive","volume":"25","author":"JY Young","year":"2017","journal-title":"Structure"},{"issue":"6","key":"pcbi.1008502.ref034","doi-asserted-by":"crossref","first-page":"e1005575","DOI":"10.1371\/journal.pcbi.1005575","article-title":"MMTF\u2014An efficient file format for the transmission, visualization, and analysis of macromolecular structures","volume":"13","author":"AR Bradley","year":"2017","journal-title":"PLoS computational biology"},{"issue":"3","key":"pcbi.1008502.ref035","doi-asserted-by":"crossref","first-page":"e0174846","DOI":"10.1371\/journal.pone.0174846","article-title":"Towards an efficient compression of 3D coordinates of macromolecular structures","volume":"12","author":"Y Valasatava","year":"2017","journal-title":"PloS one"},{"issue":"10","key":"pcbi.1008502.ref036","doi-asserted-by":"crossref","first-page":"e1008247","DOI":"10.1371\/journal.pcbi.1008247","article-title":"BinaryCIF and CIFTools\u2014Lightweight, Efficient and Extensible Macromolecular Data Management","volume":"16","author":"D Sehnal","year":"2020","journal-title":"PLoS computational biology"},{"issue":"15","key":"pcbi.1008502.ref037","doi-asserted-by":"crossref","first-page":"1496","DOI":"10.1002\/jcc.25802","article-title":"RMSD and Symmetry","volume":"40","author":"EA Coutsias","year":"2019","journal-title":"Journal of computational chemistry"},{"issue":"7","key":"pcbi.1008502.ref038","doi-asserted-by":"crossref","first-page":"1561","DOI":"10.1002\/jcc.21439","article-title":"Fast determination of the optimal rotational matrix for macromolecular superpositions","volume":"31","author":"P Liu","year":"2010","journal-title":"Journal of computational chemistry"},{"issue":"21","key":"pcbi.1008502.ref039","doi-asserted-by":"crossref","first-page":"3755","DOI":"10.1093\/bioinformatics\/bty419","article-title":"NGL viewer: web-based molecular graphics for large complexes","volume":"34","author":"AS Rose","year":"2018","journal-title":"Bioinformatics"}],"updated-by":[{"DOI":"10.1371\/journal.pcbi.1008502","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2020,12,17]],"date-time":"2020-12-17T00:00:00Z","timestamp":1608163200000}}],"container-title":["PLOS Computational Biology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1008502","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2020,12,17]],"date-time":"2020-12-17T16:47:05Z","timestamp":1608223625000},"score":1,"resource":{"primary":{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1008502"}},"subtitle":[],"editor":[{"given":"Marco","family":"Punta","sequence":"first","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2020,12,7]]},"references-count":39,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2020,12,7]]}},"URL":"https:\/\/doi.org\/10.1371\/journal.pcbi.1008502","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2020.09.11.293977","asserted-by":"object"}]},"ISSN":["1553-7358"],"issn-type":[{"value":"1553-7358","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,12,7]]}}}