{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,2]],"date-time":"2025-11-02T15:46:59Z","timestamp":1762098419319,"version":"3.38.0"},"reference-count":23,"publisher":"Oxford University Press (OUP)","issue":"24","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2011,12,15]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: The extraction of the set of features most relevant to function from classified biological sequence sets is still a challenging problem. A central issue is the determination of expected counts for higher order features so that artifact features may be screened.<\/jats:p><jats:p>Results: Cascade detection (CD), a new algorithm for the extraction of localized features from sequence sets, is introduced. CD is a natural extension of the proportional modeling techniques used in contingency table analysis into the domain of feature detection. The algorithm is successfully tested on synthetic data and then applied to feature detection problems from two different domains to demonstrate its broad utility. An analysis of HIV-1 protease specificity reveals patterns of strong first-order features that group hydrophobic residues by side chain geometry and exhibit substantial symmetry about the cleavage site. Higher order results suggest that favorable cooperativity is weak by comparison and broadly distributed, but indicate possible synergies between negative charge and hydrophobicity in the substrate. Structure\u2013function results for the Schellman loop, a helix-capping motif in proteins, contain strong first-order features and also show statistically significant cooperativities that provide new insights into the design of the motif. These include a new \u2018hydrophobic staple\u2019 and multiple amphipathic and electrostatic pair features. 
CD should prove useful not only for sequence analysis, but also for the detection of multifactor synergies in cross-classified data from clinical studies or other sources.<\/jats:p><jats:p>Availability: Windows XP\/7 application and data files available at: https:\/\/sites.google.com\/site\/cascadedetect\/home.<\/jats:p><jats:p>Contact: \u00a0nacnewell@comcast.net<\/jats:p><jats:p>Supplementary Information: Supplementary information is available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btr594","type":"journal-article","created":{"date-parts":[[2011,10,29]],"date-time":"2011-10-29T02:04:25Z","timestamp":1319853865000},"page":"3415-3422","source":"Crossref","is-referenced-by-count":8,"title":["Cascade detection for the extraction of localized sequence features; specificity results for HIV-1 protease and structure\u2013function results for the Schellman loop"],"prefix":"10.1093","volume":"27","author":[{"given":"Nicholas E.","family":"Newell","sequence":"first","affiliation":[{"name":"21 Parkview Road, Reading, MA 01867, USA"}]}],"member":"286","published-online":{"date-parts":[[2011,10,28]]},"reference":[{"key":"2023012511305947700_B1","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1002\/pro.5560070103","article-title":"Helix capping","volume":"7","author":"Aurora","year":"1998","journal-title":"Prot. Sci."},{"key":"2023012511305947700_B2","doi-asserted-by":"crossref","first-page":"37","DOI":"10.2174\/1568005024605837","article-title":"Defining HIV-1 protease substrate selectivity","volume":"2","author":"Beck","year":"2002","journal-title":"Curr. Drug Targets Infect. Disord."},{"key":"2023012511305947700_B3","doi-asserted-by":"crossref","first-page":"289","DOI":"10.1111\/j.2517-6161.1995.tb02031.x","article-title":"Controlling the false discovery rate: a practical and powerful approach to multiple testing","volume":"57","author":"Benjamini","year":"1995","journal-title":"J. R. Stat. Soc. Ser. 
B"},{"key":"2023012511305947700_B4","doi-asserted-by":"crossref","first-page":"220","DOI":"10.1111\/j.2517-6161.1963.tb00504.x","article-title":"Maximum likelihood in three-way contingency tables","volume":"25","author":"Birch","year":"1963","journal-title":"J. R. Stat. Soc. Ser. B"},{"volume-title":"Discrete Multivariate Analysis.","year":"1975","author":"Bishop","key":"2023012511305947700_B5"},{"key":"2023012511305947700_B6","doi-asserted-by":"crossref","first-page":"307","DOI":"10.1093\/bib\/bbn021","article-title":"Computational intelligence approaches for pattern discovery in biological systems","volume":"9","author":"Fogel","year":"2008","journal-title":"Brief. Bioinform."},{"key":"2023012511305947700_B7","doi-asserted-by":"crossref","first-page":"312","DOI":"10.1186\/1471-2105-9-312","article-title":"MSDmotif: exploring protein sites and motifs","volume":"9","author":"Golovin","year":"2008","journal-title":"BMC Bioinformatics"},{"key":"2023012511305947700_B8","doi-asserted-by":"crossref","first-page":"389","DOI":"10.1023\/A:1012487302797","article-title":"Gene Selection for Cancer Classification using Support Vector Machines","volume":"46","author":"Guyon","year":"2002","journal-title":"Mach. Learn."},{"key":"2023012511305947700_B9","doi-asserted-by":"crossref","first-page":"1072","DOI":"10.1002\/pro.5560020701","article-title":"Probing the roles of the residues at the e and g positions of the GCN4 leucine zipper by combinatorial mutagenesis","volume":"2","author":"Hu","year":"1993","journal-title":"Prot. Sci."},{"key":"2023012511305947700_B10","doi-asserted-by":"crossref","first-page":"4686","DOI":"10.1073\/pnas.85.13.4686","article-title":"Active human immunodeficiency virus protease is required for viral infectivity","volume":"85","author":"Kohl","year":"1988","journal-title":"Proc. Natl Acad. Sci. 
USA"},{"key":"2023012511305947700_B11","doi-asserted-by":"crossref","first-page":"60","DOI":"10.1186\/1471-2105-10-60","article-title":"Motivated Proteins: a web application for studying small three-dimensional protein motifs","volume":"10","author":"Leader","year":"2009","journal-title":"BMC Bioinformatics"},{"key":"2023012511305947700_B12","doi-asserted-by":"crossref","first-page":"18718","DOI":"10.1073\/pnas.0808709105","article-title":"A general framework for multiple testing dependence","volume":"105","author":"Leek","year":"2008","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012511305947700_B13","doi-asserted-by":"crossref","first-page":"775","DOI":"10.1517\/17460441.3.7.775","article-title":"Second generation HIV protease inhibitors against resistant virus","volume":"3","author":"Lu","year":"2008","journal-title":"Expert Opin. Drug Discov."},{"key":"2023012511305947700_B14","doi-asserted-by":"crossref","first-page":"380","DOI":"10.1038\/nsb0595-380","article-title":"The hydrophobic-staple motif and a role for loop-residues in \u03b1-helix stability and protein folding","volume":"2","author":"Munoz","year":"1995","journal-title":"Nat. Struct. Biol."},{"key":"2023012511305947700_B15","first-page":"70","article-title":"Minimum redundancy maximum relevance feature selection","volume":"20","author":"Peng","year":"2005","journal-title":"IEEE Intell. Syst."},{"key":"2023012511305947700_B16","doi-asserted-by":"crossref","first-page":"4709","DOI":"10.1074\/jbc.271.9.4709","article-title":"Human immunodeficiency virus, type I protease substrate specificity is limited by interactions between substrate amino acids bound in adjacent enzyme subsites","volume":"271","author":"Ridky","year":"1996","journal-title":"J. Biol. 
Chem."},{"key":"2023012511305947700_B17","doi-asserted-by":"crossref","first-page":"149","DOI":"10.1186\/1471-2105-10-149","article-title":"How to find simple and accurate rules for viral protease cleavage specificities","volume":"10","author":"R\u00f6gnvaldsson","year":"2009","journal-title":"BMC Bioinformatics"},{"key":"2023012511305947700_B18","doi-asserted-by":"crossref","first-page":"2507","DOI":"10.1093\/bioinformatics\/btm344","article-title":"A review of feature selection techniques in bioinformatics","volume":"23","author":"Saeys","year":"2007","journal-title":"Bioinformatics"},{"key":"2023012511305947700_B19","doi-asserted-by":"crossref","first-page":"157","DOI":"10.1016\/S0006-291X(67)80055-X","article-title":"On the size of the active site in proteases. I. Papain","volume":"27","author":"Schechter","year":"1967","journal-title":"Biochem. Biophys. Res. Commun."},{"key":"2023012511305947700_B20","first-page":"53","article-title":"The \u03b1L-conformation at the ends of helices","volume-title":"Protein Folding.","author":"Schellman","year":"1980"},{"key":"2023012511305947700_B21","doi-asserted-by":"crossref","first-page":"685","DOI":"10.1038\/nbt1408","article-title":"Proteome-derived, database-searchable peptide libraries for identifying protease cleavage sites","volume":"26","author":"Schilling","year":"2008","journal-title":"Nat. Biotechnol."},{"key":"2023012511305947700_B22","doi-asserted-by":"crossref","first-page":"12477","DOI":"10.1128\/JVI.79.19.12477-12486.2005","article-title":"Comprehensive bioinformatics analysis of the specificity of human immunodeficiency virus type I protease","volume":"79","author":"You","year":"2005","journal-title":"J. 
Virol."},{"key":"2023012511305947700_B23","first-page":"856","article-title":"Feature selection for high-dimensional data: a fast correlation-based filter solution","volume-title":"ICML-03.","author":"Yu","year":"2003"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/27\/24\/3415\/48861470\/bioinformatics_27_24_3415.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/27\/24\/3415\/48861470\/bioinformatics_27_24_3415.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,13]],"date-time":"2025-03-13T12:53:45Z","timestamp":1741870425000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/27\/24\/3415\/307105"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2011,10,28]]},"references-count":23,"journal-issue":{"issue":"24","published-print":{"date-parts":[[2011,12,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btr594","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"type":"electronic","value":"1367-4811"},{"type":"print","value":"1367-4803"}],"subject":[],"published-other":{"date-parts":[[2011,12,15]]},"published":{"date-parts":[[2011,10,28]]}}}