{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,8]],"date-time":"2025-12-08T22:06:10Z","timestamp":1765231570943,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":37,"publisher":"ACM","license":[{"start":{"date-parts":[[2004,8,22]],"date-time":"2004-08-22T00:00:00Z","timestamp":1093132800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2004,8,22]]},"DOI":"10.1145\/1014052.1014065","type":"proceedings-article","created":{"date-parts":[[2004,10,7]],"date-time":"2004-10-07T17:39:48Z","timestamp":1097170788000},"page":"89-98","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":116,"title":["Exploiting dictionaries in named entity extraction"],"prefix":"10.1145","author":[{"given":"William W.","family":"Cohen","sequence":"first","affiliation":[{"name":"Carnegie Mellon University, Pittsburgh, PA"}]},{"given":"Sunita","family":"Sarawagi","sequence":"additional","affiliation":[{"name":"IIT Bombay, Mumbai, India"}]}],"member":"320","published-online":{"date-parts":[[2004,8,22]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/336597.336644"},{"key":"e_1_3_2_1_2_1","volume-title":"Proceedings of the 20th International Conference on Machine Learning (ICML)","author":"Altun Y.","year":"2003","unstructured":"Y. Altun, I. Tsochantaridis, and T. Hofmann. Hidden markov support vector machines. 
In Proceedings of the 20th International Conference on Machine Learning (ICML), 2003."},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1007558221122"},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/375663.375682"},{"key":"e_1_3_2_1_5_1","volume-title":"Sixth Workshop on Very Large Corpora New Brunswick, New Jersey. Association for Computational Linguistics.","author":"Borthwick A.","year":"1998","unstructured":"A. Borthwick, J. Sterling, E. Agichtein, and R. Grishman. Exploiting diverse knowledge sources via maximum entropy in named entity recognition. In Sixth Workshop on Very Large Corpora New Brunswick, New Jersey. Association for Computational Linguistics, 1998."},{"key":"e_1_3_2_1_6_1","volume-title":"Learning to extract proteins and their interactions from medline abstracts. Available from http:\/\/www.cs.utexas.edu\/users\/ml\/publication\/ie.html","author":"Bunescu R.","year":"2002","unstructured":"R. Bunescu, R. Ge, R. J. Kate, E. M. Marcotte, R. J. Mooney, A. K. Ramani, and Y. W. Wong. Learning to extract proteins and their interactions from medline abstracts. Available from http:\/\/www.cs.utexas.edu\/users\/ml\/publication\/ie.html, 2002."},
{"key":"e_1_3_2_1_7_1","volume-title":"Available from http:\/\/www.cs.utexas.edu\/users\/ml\/publication\/ie.html","author":"Bunescu R.","year":"2002","unstructured":"R. Bunescu, R. Ge, R. J. Mooney, E. Marcotte, and A. K. Ramani. Extracting gene and protein names from biomedical abstracts. Unpublished Technical Note, Available from http:\/\/www.cs.utexas.edu\/users\/ml\/publication\/ie.html, 2002."},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1162\/153244304322972685"},{"key":"e_1_3_2_1_9_1","unstructured":"W. W. Cohen and P. Ravikumar. Secondstring: An open-source Java toolkit of approximate string-matching techniques. Project web page http:\/\/secondstring.sourceforge.net 2003."},{"key":"e_1_3_2_1_10_1","volume-title":"Proceedings of the IJCAI-2003 Workshop on Information Integration on the Web (IIWeb-03)","author":"Cohen W. W.","year":"2003","unstructured":"W. W. Cohen, P. Ravikumar, and S. E. Fienberg. A comparison of string distance metrics for name-matching tasks. In Proceedings of the IJCAI-2003 Workshop on Information Integration on the Web (IIWeb-03), 2003."},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.3115\/1118693.1118694"},
{"key":"e_1_3_2_1_12_1","volume-title":"Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora (EMNLP99)","author":"Collins M.","year":"1999","unstructured":"M. Collins and Y. Singer. Unsupervised models for named entity classification. In Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora (EMNLP99), College Park, MD, 1999."},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1162\/jmlr.2003.3.4-5.951"},{"key":"e_1_3_2_1_14_1","first-page":"77","volume-title":"Proceedings of the 7th International Conference on Intelligent Systems for Molecular Biology (ISMB-99)","author":"Craven M.","year":"1999","unstructured":"M. Craven and J. Kumlien. Constructing biological knowledge bases by extracting information from text sources. In Proceedings of the 7th International Conference on Intelligent Systems for Molecular Biology (ISMB-99), pages 77--86. AAAI Press, 1999."},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511790492","volume-title":"Biological sequence analysis - Probabilistic models of proteins and nucleic acids","author":"Durbin R.","year":"1998","unstructured":"R. Durbin, S. R. Eddy, A. Krogh, and G. Mitchison. Biological sequence analysis - Probabilistic models of proteins and nucleic acids. Cambridge University Press, Cambridge, 1998."},{"key":"e_1_3_2_1_16_1","volume-title":"Proceedings of the Fifteenth International Conference on Machine Learning. 
Morgan Kaufmann","author":"Freitag D.","year":"1998","unstructured":"D. Freitag. Multistrategy learning for information extraction. In Proceedings of the Fifteenth International Conference on Machine Learning. Morgan Kaufmann, 1998."},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/279943.279985"},{"key":"e_1_3_2_1_19_1","first-page":"403","volume-title":"Pac Symp Biocomput","author":"Hanisch D.","year":"2003","unstructured":"D. Hanisch, J. Fluck, H. Mevissen, and R. Zimmer. Playing biology's name game: identifying protein names in scientific text. In Pac Symp Biocomput, pages 403--414, 2003."},{"key":"e_1_3_2_1_20_1","first-page":"502","volume-title":"Proceedings of 2000 the Pacific Symposium on Biocomputing (PSB-2000)","author":"Humphreys K.","year":"2000","unstructured":"K. Humphreys, G. Demetriou, and R. Gaizauskas. Two applications of information extraction to biological science journal articles: Enzyme interactions and protein structures. In Proceedings of 2000 the Pacific Symposium on Biocomputing (PSB-2000), pages 502--513, 2000."},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.3115\/1118693.1118695"},{"key":"e_1_3_2_1_22_1","volume-title":"Coordination in teams: evidence from a simulated management game. To appear in the Journal of Organizational Behavior","author":"Kraut R. 
E.","year":"2004","unstructured":"R. E. Kraut, S. R. Fussell, F. J. Lerch, and J. A. Espinosa. Coordination in teams: evidence from a simulated management game. To appear in the Journal of Organizational Behavior, 2004."},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0378-1119(00)00431-5"},{"key":"e_1_3_2_1_24_1","volume-title":"Proceedings of the International Conference on Machine Learning (ICML-2001)","author":"Lafferty J.","year":"2001","unstructured":"J. Lafferty, A. McCallum, and F. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the International Conference on Machine Learning (ICML-2001), Williams, MA, 2001."},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/2.769447"},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1022869011914"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.3115\/1118853.1118872"},{"key":"e_1_3_2_1_28_1","first-page":"591","volume-title":"Proceedings of the International Conference on Machine Learning (ICML-2000)","author":"McCallum A.","year":"2000","unstructured":"A. McCallum, D. Freitag, and F. Pereira. Maximum entropy markov models for information extraction and segmentation. 
In Proceedings of the International Conference on Machine Learning (ICML-2000), pages 591--598, Palo Alto, CA, 2000."},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1009953814988"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1007502103375"},{"key":"e_1_3_2_1_31_1","first-page":"1044","volume-title":"Proceedings of the Sixteenth National Conference on Artificial Intelligence","author":"Riloff E.","year":"1999","unstructured":"E. Riloff and R. Jones. Learning Dictionaries for Information Extraction by Multi-level Bootstrapping. In Proceedings of the Sixteenth National Conference on Artificial Intelligence, pages 1044--1049, 1999."},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/775047.775087"},{"key":"e_1_3_2_1_33_1","first-page":"37","volume-title":"Papers from the AAAI-99 Workshop on Machine Learning for Information Extraction","author":"Seymore K.","year":"1999","unstructured":"K. Seymore, A. McCallum, and R. Rosenfeld. Learning Hidden Markov Model structure for information extraction. In Papers from the AAAI-99 Workshop on Machine Learning for Information Extraction, pages 37--42, 1999."},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.3115\/1073445.1073473"},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.5555\/101883.102055"},
{"key":"e_1_3_2_1_36_1","volume-title":"Technical Report CMU-CS-03-168, CMU-ISRI-03-104","author":"Sweeney L.","year":"2003","unstructured":"L. Sweeney. Finding lists of people on the web. Technical Report CMU-CS-03-168, CMU-ISRI-03-104, Carnegie Mellon University School of Computer Science, 2003. Available from: http:\/\/privacy.cs.cmu.edu\/dataprivacy\/projects\/rosterfinder\/."},{"key":"e_1_3_2_1_37_1","volume-title":"Business Survey methods","author":"Winkler W. E.","year":"1995","unstructured":"W. E. Winkler. Matching and record linkage. In Business Survey methods. Wiley, 1995."},{"key":"e_1_3_2_1_38_1","volume-title":"Proceedings of the ICML Workshop on The Continuum from Labeled to Unlabeled Data, Washington, D.C","author":"Winston Lin R. Y.","year":"2003","unstructured":"R. Y. Winston Lin and R. Grishman. Bootstrapped learning of semantic classes from positive and negative examples. 
In Proceedings of the ICML Workshop on The Continuum from Labeled to Unlabeled Data, Washington, D.C, August 2003."}],"event":{"name":"KDD04: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","sponsor":["SIGMOD ACM Special Interest Group on Management of Data","SIGKDD ACM Special Interest Group on Knowledge Discovery in Data","ACM Association for Computing Machinery"],"location":"Seattle WA USA","acronym":"KDD04"},"container-title":["Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1014052.1014065","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1014052.1014065","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T16:31:30Z","timestamp":1750264290000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1014052.1014065"}},"subtitle":["combining semi-Markov extraction processes and data integration methods"],"short-title":[],"issued":{"date-parts":[[2004,8,22]]},"references-count":37,"alternative-id":["10.1145\/1014052.1014065","10.1145\/1014052"],"URL":"https:\/\/doi.org\/10.1145\/1014052.1014065","relation":{},"subject":[],"published":{"date-parts":[[2004,8,22]]},"assertion":[{"value":"2004-08-22","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}