{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,13]],"date-time":"2025-10-13T19:57:36Z","timestamp":1760385456934,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":33,"publisher":"ACM","license":[{"start":{"date-parts":[[2015,10,22]],"date-time":"2015-10-22T00:00:00Z","timestamp":1445472000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Australian Research Council Discovery Project grant: Natural language processing for automated validation of protein databases","award":["DP150101550"],"award-info":[{"award-number":["DP150101550"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2015,10,22]]},"DOI":"10.1145\/2811163.2811175","type":"proceedings-article","created":{"date-parts":[[2015,10,27]],"date-time":"2015-10-27T13:07:31Z","timestamp":1445951251000},"page":"4-12","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":17,"title":["Evaluation of a Machine Learning Duplicate Detection Method for Bioinformatics Databases"],"prefix":"10.1145","author":[{"given":"Qingyu","family":"Chen","sequence":"first","affiliation":[{"name":"University of Melbourne, Melbourne, Australia"}]},{"given":"Justin","family":"Zobel","sequence":"additional","affiliation":[{"name":"University of Melbourne, Melbourne, Australia"}]},{"given":"Karin","family":"Verspoor","sequence":"additional","affiliation":[{"name":"University of Melbourne, Melbourne, Australia"}]}],"member":"320","published-online":{"date-parts":[[2015,10,22]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1038\/ng0294-119"},{"key":"e_1_3_2_1_2_1","volume-title":"The universal protein resource (uniprot). Nucleic acids research, 33(suppl 1):D154--D159","author":"Bairoch A.","year":"2005","unstructured":"A. Bairoch , R. Apweiler , C. H. Wu , W. C. Barker , B. Boeckmann , S. Ferro , E. Gasteiger , H. Huang , R. Lopez , M. Magrane , The universal protein resource (uniprot). Nucleic acids research, 33(suppl 1):D154--D159 , 2005 . A. Bairoch, R. Apweiler, C. H. Wu, W. C. Barker, B. Boeckmann, S. Ferro, E. Gasteiger, H. Huang, R. Lopez, M. Magrane, et al. The universal protein resource (uniprot). Nucleic acids research, 33(suppl 1):D154--D159, 2005."},{"key":"e_1_3_2_1_3_1","volume-title":"its effect on cross-sectional and trend analyses. Journal of clinical epidemiology, 47(3):293--301","author":"Bennett S.","year":"1994","unstructured":"S. Bennett . Blood pressure measurement error : its effect on cross-sectional and trend analyses. Journal of clinical epidemiology, 47(3):293--301 , 1994 . S. Bennett. Blood pressure measurement error: its effect on cross-sectional and trend analyses. Journal of clinical epidemiology, 47(3):293--301, 1994."},{"key":"e_1_3_2_1_4_1","volume-title":"Nucleic acids research, page gks1195","author":"Benson D. A.","year":"2012","unstructured":"D. A. Benson , M. Cavanaugh , K. Clark , I. Karsch-Mizrachi , D. J. Lipman , J. Ostell , and E. W. Sayers . Genbank . Nucleic acids research, page gks1195 , 2012 . D. A. Benson, M. Cavanaugh, K. Clark, I. Karsch-Mizrachi, D. J. Lipman, J. Ostell, and E. W. Sayers. Genbank. Nucleic acids research, page gks1195, 2012."},{"key":"e_1_3_2_1_5_1","volume-title":"Nucleic acids research, page gks1195","author":"Benson D. A.","year":"2006","unstructured":"D. A. Benson , I. Karsch-Mizrachi , D. J. Lipman , J. Ostell , and D. L. Wheeler . Genbank . Nucleic acids research, page gks1195 , 2006 . D. A. Benson, I. Karsch-Mizrachi, D. J. Lipman, J. Ostell, and D. L. Wheeler. Genbank. Nucleic acids research, page gks1195, 2006."},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1016\/0168-9525(96)60040-7"},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1089\/cmb.2007.R005"},{"issue":"4","key":"e_1_3_2_1_8_1","article-title":"Detecting redundancy in biological databases? an efficient approach","volume":"9","author":"Chellamuthu S.","year":"2009","unstructured":"S. Chellamuthu and D. M. Punithavalli . Detecting redundancy in biological databases? an efficient approach . Global Journal of Computer Science and Technology , 9 ( 4 ), 2009 . S. Chellamuthu and D. M. Punithavalli. Detecting redundancy in biological databases? an efficient approach. Global Journal of Computer Science and Technology, 9(4), 2009.","journal-title":"Global Journal of Computer Science and Technology"},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0020-0255(00)00070-0"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0168-9525(01)02348-4"},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2007.9"},{"key":"e_1_3_2_1_12_1","first-page":"1","volume-title":"Web-Age Information Management","author":"Fan W.","year":"2012","unstructured":"W. Fan . Data quality: Theory and practice . In Web-Age Information Management , pages 1 -- 16 . Springer , 2012 . W. Fan. Data quality: Theory and practice. In Web-Age Information Management, pages 1--16. Springer, 2012."},{"key":"e_1_3_2_1_13_1","volume-title":"Proc. 27th Int'l Conf. Very Large Databases","author":"Galhardas H.","year":"2001","unstructured":"H. Galhardas , D. Florescu , D. Shasha , E. Simon , and C. Saita . Declarative data cleaning: Language, model, and algorithms . Proc. 27th Int'l Conf. Very Large Databases , 2001 . H. Galhardas, D. Florescu, D. Shasha, E. Simon, and C. Saita. Declarative data cleaning: Language, model, and algorithms. Proc. 27th Int'l Conf. Very Large Databases, 2001."},{"issue":"1","key":"e_1_3_2_1_14_1","first-page":"1","article-title":"Cleanup: a fast computer program for removing redundancies from nucleotide sequence databases. Computer applications in the biosciences","volume":"12","author":"Grillo G.","year":"1996","unstructured":"G. Grillo , M. Attimonelli , S. Liuni , and G. Pesole . Cleanup: a fast computer program for removing redundancies from nucleotide sequence databases. Computer applications in the biosciences : CABIOS , 12 ( 1 ): 1 -- 8 , 1996 . G. Grillo, M. Attimonelli, S. Liuni, and G. Pesole. Cleanup: a fast computer program for removing redundancies from nucleotide sequence databases. Computer applications in the biosciences: CABIOS, 12(1):1--8, 1996.","journal-title":"CABIOS"},{"key":"e_1_3_2_1_15_1","volume-title":"The Comprehensive R Archive Network","author":"Hahsler M.","year":"2009","unstructured":"M. Hahsler , B. Gr\u00fcn , K. Hornik , and C. Buchta . Introduction to arules - a computational environment for mining association rules and frequent item sets . The Comprehensive R Archive Network , 2009 . M. Hahsler, B. Gr\u00fcn, K. Hornik, and C. Buchta. Introduction to arules - a computational environment for mining association rules and frequent item sets. The Comprehensive R Archive Network, 2009."},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/14.5.423"},{"key":"e_1_3_2_1_17_1","volume-title":"Duplicate detection in biological data using association rule mining. Locus, 501(P34180):S22388","author":"Koh J. L.","year":"2004","unstructured":"J. L. Koh , M. L. Lee , A. M. Khan , P. T. Tan , and V. Brusic . Duplicate detection in biological data using association rule mining. Locus, 501(P34180):S22388 , 2004 . J. L. Koh, M. L. Lee, A. M. Khan, P. T. Tan, and V. Brusic. Duplicate detection in biological data using association rule mining. Locus, 501(P34180):S22388, 2004."},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0145-305X(00)00044-6"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btl158"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1093\/protein\/15.8.643"},{"key":"e_1_3_2_1_21_1","volume-title":"Eighth International Conference on Information Quality (IQ 2003)","author":"M\u00fcller H.","year":"2003","unstructured":"H. M\u00fcller , F. Naumann , and J.-C. Freytag . Data quality in genome databases . Eighth International Conference on Information Quality (IQ 2003) , 2003 . H. M\u00fcller, F. Naumann, and J.-C. Freytag. Data quality in genome databases. Eighth International Conference on Information Quality (IQ 2003), 2003."},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1504\/IJDMB.2010.034196"},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.6026\/97320630005234"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10115-009-0254-7"},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btm098"},{"key":"e_1_3_2_1_26_1","volume-title":"Blast 2 sequences, a new tool for comparing protein and nucleotide sequences. FEMS microbiology letters, 174(2):247--250","author":"Tatusova T. A.","year":"1999","unstructured":"T. A. Tatusova and T. L. Madden . Blast 2 sequences, a new tool for comparing protein and nucleotide sequences. FEMS microbiology letters, 174(2):247--250 , 1999 . T. A. Tatusova and T. L. Madden. Blast 2 sequences, a new tool for comparing protein and nucleotide sequences. FEMS microbiology letters, 174(2):247--250, 1999."},{"key":"e_1_3_2_1_27_1","volume-title":"R language definition","author":"Team R. C.","year":"2000","unstructured":"R. C. Team . R language definition , 2000 . R. C. Team. R language definition, 2000."},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0306-4379(01)00042-4"},{"key":"e_1_3_2_1_29_1","volume-title":"Using duplicate genotyped data in genetic analyses: testing association and estimating error rates. Statistical applications in genetics and molecular biology, 6(1)","author":"Tintle N. L.","year":"2007","unstructured":"N. L. Tintle , D. Gordon , F. J. McMahon , and S. J. Finch . Using duplicate genotyped data in genetic analyses: testing association and estimating error rates. Statistical applications in genetics and molecular biology, 6(1) , 2007 . N. L. Tintle, D. Gordon, F. J. McMahon, and S. J. Finch. Using duplicate genotyped data in genetic analyses: testing association and estimating error rates. Statistical applications in genetics and molecular biology, 6(1), 2007."},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0020-0255(00)00013-X"},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/MC.2007.331"},{"key":"e_1_3_2_1_32_1","volume-title":"Molecular phylogeny of north american branchiobdellida (annelida: Clitellata). Molecular phylogenetics and evolution, 66(1):30--42","author":"Williams B. W.","year":"2013","unstructured":"B. W. Williams , S. R. Gelder , H. C. Proctor , and D. W. Coltman . Molecular phylogeny of north american branchiobdellida (annelida: Clitellata). Molecular phylogenetics and evolution, 66(1):30--42 , 2013 . B. W. Williams, S. R. Gelder, H. C. Proctor, and D. W. Coltman. Molecular phylogeny of north american branchiobdellida (annelida: Clitellata). Molecular phylogenetics and evolution, 66(1):30--42, 2013."},{"key":"e_1_3_2_1_33_1","volume-title":"Starcode: sequence clustering based on all-pairs search. Bioinformatics, page btv053","author":"Zorita E. V.","year":"2015","unstructured":"E. V. Zorita , P. Cusc\u00f3 , and G. Filion . Starcode: sequence clustering based on all-pairs search. Bioinformatics, page btv053 , 2015 . E. V. Zorita, P. Cusc\u00f3, and G. Filion. Starcode: sequence clustering based on all-pairs search. Bioinformatics, page btv053, 2015."}],"event":{"name":"CIKM'15: 24th ACM International Conference on Information and Knowledge Management","sponsor":["SIGWEB ACM Special Interest Group on Hypertext, Hypermedia, and Web","SIGIR ACM Special Interest Group on Information Retrieval"],"location":"Melbourne Australia","acronym":"CIKM'15"},"container-title":["Proceedings of the ACM Ninth International Workshop on Data and Text Mining in Biomedical Informatics"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2811163.2811175","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2811163.2811175","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T19:04:15Z","timestamp":1750273455000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2811163.2811175"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2015,10,22]]},"references-count":33,"alternative-id":["10.1145\/2811163.2811175","10.1145\/2811163"],"URL":"https:\/\/doi.org\/10.1145\/2811163.2811175","relation":{},"subject":[],"published":{"date-parts":[[2015,10,22]]},"assertion":[{"value":"2015-10-22","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}