{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:33:54Z","timestamp":1750221234716,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":54,"publisher":"ACM","license":[{"start":{"date-parts":[[2018,8,15]],"date-time":"2018-08-15T00:00:00Z","timestamp":1534291200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"National Institutes of Health","award":["R01 CA180777"],"award-info":[{"award-number":["R01 CA180777"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2018,8,15]]},"DOI":"10.1145\/3233547.3233717","type":"proceedings-article","created":{"date-parts":[[2018,8,24]],"date-time":"2018-08-24T12:05:17Z","timestamp":1535112317000},"page":"566-566","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Choosing Non-redundant Representative Subsets Of Protein Sequence Data Sets Using Submodular Optimization"],"prefix":"10.1145","author":[{"given":"Maxwell W.","family":"Libbrecht","sequence":"first","affiliation":[{"name":"Simon Fraser University, Burnaby, BC, Canada"}]},{"given":"Jeffrey A.","family":"Bilmes","sequence":"additional","affiliation":[{"name":"University of Washington, Seattle, WA, USA"}]},{"given":"William Stafford","family":"Noble","sequence":"additional","affiliation":[{"name":"University of Washington, Seattle, WA, USA"}]}],"member":"320","published-online":{"date-parts":[[2018,8,15]]},"reference":[{"key":"e_1_3_2_1_1_1","first-page":"207","volume":"486","author":"Human Microbiome Project Consortium","unstructured":"Human Microbiome Project Consortium. 
Structure, function and diversity of the healthy human microbiome. Nature. 2012;486:207--214.","journal-title":"Nature."},{"key":"e_1_3_2_1_2_1","first-page":"409","volume":"1","author":"Hobohm U","unstructured":"Hobohm U, Scharf M, Schneider R, Sander C. Selection of representative protein data sets. Protein Science. 1992;1(3):409--417.","journal-title":"Protein Science."},{"key":"e_1_3_2_1_3_1","first-page":"423","volume":"14","author":"Holm L","unstructured":"Holm L, Sander C. Removing near-neighbour redundancy from large protein sequence collections. Bioinformatics. 1998;14(5):423--429.","journal-title":"Bioinformatics."},{"key":"e_1_3_2_1_4_1","first-page":"282","volume":"17","author":"Li W","unstructured":"Li W, Jaroszewski L, Godzik A. Clustering of highly homologous sequences to reduce the size of large protein databases. Bioinformatics. 2001;17(3):282--283.","journal-title":"Bioinformatics."},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btq461"},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"crossref","unstructured":"Parsons J, Brenner S, Bishop M. Clustering cDNA sequences. Computer applications in the biosciences: CABIOS. 
1992;8(5):461--466.","DOI":"10.1093\/bioinformatics\/8.5.461"},{"key":"e_1_3_2_1_7_1","first-page":"3389","volume":"25","author":"Altschul SF","unstructured":"Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, et al. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Research. 1997;25:3389--3402.","journal-title":"Nucleic Acids Research."},{"key":"e_1_3_2_1_8_1","first-page":"451","volume":"16","author":"Enright AJ","unstructured":"Enright AJ, Ouzounis CA. GeneRAGE: a robust algorithm for sequence clustering and domain detection. Bioinformatics. 2000;16(5):451--457.","journal-title":"Bioinformatics."},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"crossref","unstructured":"Rice P, Longden I, Bleasby A, et al. EMBOSS: the European molecular biology open software suite. Trends in genetics. 2000;16(6):276--277.","DOI":"10.1016\/S0168-9525(00)02024-2"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"crossref","unstructured":"Sikic K, Carugo O. Protein sequence redundancy reduction: comparison of various method. Bioinformation. 2010;5(6):234.","DOI":"10.6026\/97320630005234"},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btl158"},{"key":"e_1_3_2_1_12_1","first-page":"1589","volume":"19","author":"Wang G","unstructured":"Wang G, Dunbrack, Jr RL. PISCES: a protein sequence culling server. 
Bioinformatics. 2003;19:1589--1591.","journal-title":"Bioinformatics."},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"crossref","unstructured":"Fisher ML, Nemhauser GL, Wolsey LA. An analysis of approximations for maximizing submodular set functions--II. Polyhedral combinatorics. 1978;p. 73--87.","DOI":"10.1007\/BFb0121195"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF01588971"},{"key":"e_1_3_2_1_15_1","first-page":"234","author":"Minoux M.","unstructured":"Minoux M. Accelerated greedy algorithms for maximizing submodular set functions. Optimization Techniques. 1978;p. 234--243.","journal-title":"Optimization Techniques."},{"key":"e_1_3_2_1_16_1","volume-title":"Oligopoly pricing: Old ideas and new tools","author":"Vives X.","year":"2001","unstructured":"Vives X. Oligopoly pricing: Old ideas and new tools. The MIT Press; 2001."},{"key":"e_1_3_2_1_17_1","volume-title":"Foundations of Mathematical Economics","author":"Carter M.","year":"2001","unstructured":"Carter M. Foundations of Mathematical Economics. The MIT Press; 2001."},{"key":"e_1_3_2_1_18_1","volume-title":"Supermodularity and complementarity","author":"Topkis DM.","year":"1998","unstructured":"Topkis DM. Supermodularity and complementarity. 
Princeton University Press; 1998."},{"key":"e_1_3_2_1_19_1","first-page":"11","volume":"1","author":"Shapley LS.","unstructured":"Shapley LS. Cores of convex games. International Journal of Game Theory. 1971;1(1):11--26.","journal-title":"International Journal of Game Theory."},{"key":"e_1_3_2_1_20_1","first-page":"69","author":"Edmonds J.","unstructured":"Edmonds J. Matroids, Submodular Functions, and Certain Polyhedra. Combinatorial Structures and Their Applications. 1970;p. 69--87.","journal-title":"Combinatorial Structures and Their Applications."},{"key":"e_1_3_2_1_21_1","volume-title":"Mathematical Programming -- The State of the Art","author":"Lov\u00e1sz L.","year":"1983","unstructured":"Lov\u00e1sz L. Submodular functions and convexity. In: Bachem A, Gr\u00f6tschel M, Korte B, editors. Mathematical Programming -- The State of the Art. Springer-Verlag; 1983. p. 235--257."},{"key":"e_1_3_2_1_22_1","volume-title":"Combinatorial Optimization","author":"Schrijver A.","year":"2004","unstructured":"Schrijver A. Combinatorial Optimization. Springer; 2004."},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"crossref","unstructured":"Narayanan H. Submodular functions and electrical networks. Annals of Discrete Mathematics. 1997;54.","DOI":"10.1016\/S0167-5060(08)70678-2"},{"key":"e_1_3_2_1_24_1","volume-title":"Discrete Location Theory","author":"Cornun\u00e9jols G","year":"1990","unstructured":"Cornun\u00e9jols G, Nemhauser GL, Wolsey LA. 
The uncapacitated facility location problem. In: Mirchandani PB, Francis RL, editors. Discrete Location Theory. New York: Wiley\/Interscience; 1990. p. 119--171."},{"key":"e_1_3_2_1_25_1","first-page":"724","volume":"71","author":"Lin G","unstructured":"Lin G, Chawla MK, Olson K, Barnes CA, Guzowski JF, Bjornsson C, et al. A multi-model approach to simultaneous segmentation and classification of heterogeneous populations of cell nuclei in 3D confocal microscope images. Cytometry A. 2007;71(9):724--736.","journal-title":"Cytometry A."},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.5555\/2002472.2002537"},{"key":"e_1_3_2_1_27_1","volume-title":"Uncertainty in Artificial Intelligence (UAI)","author":"Lin H","year":"2012","unstructured":"Lin H, Bilmes J. Learning Mixtures of Submodular Shells with Application to Document Summarization. In: Uncertainty in Artificial Intelligence (UAI). Catalina Island, USA: AUAI; 2012. p. 479--490."},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2013.6639057"},{"key":"e_1_3_2_1_29_1","volume-title":"HLT-NAACL","author":"Wei K","year":"2013","unstructured":"
Wei K, Liu Y, Kirchhoff K, Bilmes J. Using Document Summarization Techniques for Speech Data Subset Selection. In: HLT-NAACL; 2013. p. 721--726."},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2014.6854213"},{"key":"e_1_3_2_1_31_1","volume-title":"Empirical Methods in Natural Language Processing (EMNLP)","author":"Kirchhoff K","year":"2014","unstructured":"Kirchhoff K, Bilmes J. Submodularity for data selection in machine translation. In: Empirical Methods in Natural Language Processing (EMNLP); 2014."},{"key":"e_1_3_2_1_32_1","volume-title":"Advances in Neural Information Processing Systems","author":"Tschiatschek S","year":"2014","unstructured":"Tschiatschek S, Iyer RK, Wei H, Bilmes JA. Learning mixtures of submodular functions for image collection summarization. In: Advances in Neural Information Processing Systems; 2014. p. 1413--1421."},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"crossref","unstructured":"Wei K, Libbrecht MW, Bilmes JA, Noble WS. Choosing panels of genomics assays using submodular optimization. Genome biology. 2016;17(1):229.","DOI":"10.1186\/s13059-016-1089-7"},{"key":"e_1_3_2_1_34_1","volume-title":"Submodular functions and optimization","author":"Fujishige S.","year":"2005","unstructured":"Fujishige S. Submodular functions and optimization. vol. 58. 
Elsevier Science; 2005."},{"key":"e_1_3_2_1_35_1","first-page":"6559","volume":"101","author":"Weston J","unstructured":"Weston J, Elisseeff A, Zhou D, Leslie C, Noble WS. Protein ranking: from local to global structure in the protein similarity network. Proceedings of the National Academy of Sciences. 2004;101(17):6559--6563.","journal-title":"Proceedings of the National Academy of Sciences."},{"key":"e_1_3_2_1_36_1","first-page":"403","volume":"215","author":"Altschul SF","unstructured":"Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. A basic local alignment search tool. Journal of Molecular Biology. 1990;215:403--410.","journal-title":"Journal of Molecular Biology."},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/285055.285059"},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/FOCS.2012.73"},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1137\/090779346"},{"key":"e_1_3_2_1_40_1","volume-title":"AAAI","author":"Mirzasoleiman B","year":"2015","unstructured":"Mirzasoleiman B, Badanidiyuru A, Karbasi A, Vondr\u00e1k J, Krause A. Lazier Than Lazy Greedy. In: AAAI; 2015. p. 1812--1818."},{"key":"e_1_3_2_1_41_1","first-page":"1494","volume-title":"International Conference on Machine Learning","author":"Wei K","year":"2014","unstructured":"Wei K, Iyer R, Bilmes J. Fast multi-stage submodular maximization. 
In: International Conference on Machine Learning; 2014. p. 1494--1502."},{"key":"e_1_3_2_1_42_1","doi-asserted-by":"crossref","unstructured":"Hauser M, Mayer CE, S\u00f6ding J. kClust: fast and sensitive clustering of large protein sequence databases. BMC Bioinformatics. 2013;14(1):248.","DOI":"10.1186\/1471-2105-14-248"},{"key":"e_1_3_2_1_43_1","first-page":"1575","volume":"30","author":"Enright AJ","unstructured":"Enright AJ, Van Dongen S, Ouzounis CA. An efficient algorithm for large-scale detection of protein families. Nucleic Acids Research. 2002;30(7):1575--1584.","journal-title":"Nucleic Acids Research."},{"key":"e_1_3_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2003.1260785"},{"key":"e_1_3_2_1_45_1","first-page":"783","volume":"14","author":"Guan X","unstructured":"Guan X, Du L. Domain identification by clustering sequence alignments. Bioinformatics. 1998;14(9):783--788.","journal-title":"Bioinformatics."},{"key":"e_1_3_2_1_46_1","doi-asserted-by":"crossref","unstructured":"Bull SC, Muldoon MR, Doig AJ. Maximising the size of non-redundant protein datasets using graph theory. PLoS ONE. 2013;8(2):e55484.","DOI":"10.1371\/journal.pone.0055484"},{"key":"e_1_3_2_1_47_1","first-page":"1571","volume":"34","author":"Paccanaro A","unstructured":"Paccanaro A, Casbon JA, Saqi MA. 
Spectral clustering of protein sequences. Nucleic Acids Research. 2006;34(5):1571--1580.","journal-title":"Nucleic Acids Research."},{"key":"e_1_3_2_1_48_1","unstructured":"Arthur D, Vassilvitskii S. k-means++"},{"key":"e_1_3_2_1_49_1","first-page":"1027","volume-title":"Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms","year":"2007","unstructured":": The advantages of careful seeding. In: Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms. Society for Industrial and Applied Mathematics; 2007. p. 1027--1035."},{"key":"e_1_3_2_1_50_1","first-page":"536","volume":"247","author":"Murzin AG","unstructured":"Murzin AG, Brenner SE, Hubbard T, Chothia C. SCOP: A structural classification of proteins database for the investigation of sequences and structures. Journal of Molecular Biology. 1995;247:536--540.","journal-title":"Journal of Molecular Biology."},{"key":"e_1_3_2_1_51_1","first-page":"254","volume":"28","author":"Brenner SE","unstructured":"Brenner SE, Koehl P, Levitt M. The ASTRAL compendium for sequence and structure analysis. Nucleic Acids Research. 2000;28:254--256.","journal-title":"Nucleic Acids Research."},{"key":"e_1_3_2_1_52_1","first-page":"972","volume":"315","author":"Frey BJ","unstructured":"
Frey BJ, Dueck D. Clustering by passing messages between data points. Science. 2007;315:972--976.","journal-title":"Science."},{"key":"e_1_3_2_1_53_1","first-page":"385","volume":"2","author":"Wolsey LA.","unstructured":"Wolsey LA. An analysis of the greedy algorithm for the submodular set covering problem. Combinatorica. 1982;2(4):385--393.","journal-title":"Combinatorica."},{"key":"e_1_3_2_1_54_1","first-page":"2579","volume":"9","author":"Van der Maaten L","unstructured":"Van der Maaten L, Hinton G. Visualizing data using t-SNE. Journal of Machine Learning Research. 2008;9:2579--2605.","journal-title":"Journal of Machine Learning Research."}],"event":{"name":"BCB '18: 9th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics","sponsor":["SIGBio ACM Special Interest Group on Bioinformatics"],"location":"Washington DC USA","acronym":"BCB '18"},"container-title":["Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health 
Informatics"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3233547.3233717","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3233547.3233717","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T02:07:11Z","timestamp":1750212431000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3233547.3233717"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,8,15]]},"references-count":54,"alternative-id":["10.1145\/3233547.3233717","10.1145\/3233547"],"URL":"https:\/\/doi.org\/10.1145\/3233547.3233717","relation":{},"subject":[],"published":{"date-parts":[[2018,8,15]]},"assertion":[{"value":"2018-08-15","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}