{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:41:56Z","timestamp":1750308116389,"version":"3.41.0"},"reference-count":21,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2005,6,1]],"date-time":"2005-06-01T00:00:00Z","timestamp":1117584000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["SIGKDD Explor. Newsl."],"published-print":{"date-parts":[[2005,6]]},"abstract":"<jats:p>We present a framework that bridges the gap between natural language processing (NLP) and text mining. Central to this is a new approach to text parameterization that captures many interesting attributes of text usually ignored by standard indices, like the term-document matrix. By storing NLP tags, the new index supports a higher degree of knowledge discovery and pattern finding from text. The index is relatively compact, enabling dynamic search of arbitrary relationships and events in large document collections. We can export search results in formats and data structures that are transparent to statistical analysis tools like S-PLUSID\u00ae. 
In a number of experiments, we demonstrate how this framework can turn mountains of unstructured information into informative statistical graphs.<\/jats:p>","DOI":"10.1145\/1089815.1089825","type":"journal-article","created":{"date-parts":[[2007,1,17]],"date-time":"2007-01-17T18:32:02Z","timestamp":1169058722000},"page":"67-75","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["Extracting statistical data frames from text"],"prefix":"10.1145","volume":"7","author":[{"given":"Jisheng","family":"Liang","sequence":"first","affiliation":[{"name":"Insightful Corporation, Seattle, WA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Krzysztof","family":"Koperski","sequence":"additional","affiliation":[{"name":"Insightful Corporation, Seattle, WA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Thien","family":"Nguyen","sequence":"additional","affiliation":[{"name":"Insightful Corporation, Seattle, WA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Giovanni","family":"Marchisio","sequence":"additional","affiliation":[{"name":"Insightful Corporation, Seattle, WA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2005,6]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/1046456.1046478"},{"key":"e_1_2_1_2_1","volume-title":"Foundation of Statistical Natural Language Processing","author":"Manning C. D.","year":"2000","unstructured":"Manning, C. D., and Schutze H. Foundation of Statistical Natural Language Processing. 
The MIT Press, 2000."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.3115\/1034678.1034679"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/18.12.1553"},{"key":"e_1_2_1_5_1","volume-title":"Dubitzky and Pereira, Artificial intelligence methods and tools for systems biology","author":"Cohen K. B.","year":"2004","unstructured":"Cohen, K. B., and Hunter, H. Natural language processing and systems biology. In Dubitzky and Pereira, Artificial intelligence methods and tools for systems biology. Springer Verlag, 2004."},{"key":"e_1_2_1_6_1","volume-title":"Seventh Message Understanding Conferences (MUC-7). Morgan Kaufmann Publishers","author":"Proceedings","year":"1997","unstructured":"Proceedings of the Seventh Message Understanding Conferences (MUC-7). Morgan Kaufmann Publishers, 1997."},{"key":"e_1_2_1_7_1","unstructured":"NIST. Automatic Content Extraction (ACE) program. http:\/\/www.nist.gov\/speech\/tests\/ace\/"},{"key":"e_1_2_1_8_1","unstructured":"JUNG Java Universal Network\/Graph Framework. http:\/\/jung.sourceforge.net"},{"key":"e_1_2_1_9_1","first-page":"7821","article-title":"Community Structure","volume":"2002","author":"Girvan M","unstructured":"Girvan M, Newman MEJ. Community Structure in Social and Biological Networks. Proc. Natl. Acad. Sci. 
USA 2002, 99:7821--7826.","journal-title":"Social and Biological Networks. Proc. Natl. Acad. Sci. USA"},{"key":"e_1_2_1_10_1","first-page":"65","volume-title":"Proceedings of the Pacific Asia Conf on Knowledge Discovery and Data Mining PAKDD'99 workshop on Knowledge Discovery from Advanced Databases","author":"Tan A. H.","year":"1999","unstructured":"Tan, A. H. 1999, Text mining: The state of the art and the challenges, in Proceedings of the Pacific Asia Conf on Knowledge Discovery and Data Mining PAKDD'99 workshop on Knowledge Discovery from Advanced Databases, pp. 65--70."},{"key":"e_1_2_1_11_1","volume-title":"IEEE International Conference on Systems, Man, and Cybernetics","author":"Montes","year":"2001","unstructured":"Montes-Y-Gomez, M. Gelbukh, A. Lopez-Lopez, A. Baeza-Yates, R., Text mining with conceptual graphs, In IEEE International Conference on Systems, Man, and Cybernetics, 2001"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1147\/sj.433.0516"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1147\/sj.404.0967"},{"key":"e_1_2_1_14_1","volume-title":"English language news stories","author":"Reuters Corp","year":"1996","unstructured":"Reuters Corpus, Volume 1, English language news stories, 1996-08-20 to 1997-08-19. Available at NIST: http:\/\/trec.nist.gov\/data\/reuters\/reuters.html"},{"key":"e_1_2_1_15_1","unstructured":"Tipster Corpus. 
Available at LDC: http:\/\/www.ldc.upenn.edu"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1002\/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9"},{"key":"e_1_2_1_17_1","volume-title":"Proc. Symp. Document Image Understanding Technology","author":"Marchisio G.","year":"2001","unstructured":"Marchisio, G. and Liang, J. Experiments in trilingual cross-language information retrieval. Proc. Symp. Document Image Understanding Technology, 2001."},{"key":"e_1_2_1_18_1","unstructured":"Marchisio G. Inverse inference engine for high performance Web search. United States patent No. 6 510 406 January 2003"},{"key":"e_1_2_1_19_1","volume-title":"June 29","author":"Marchisio G.","year":"2004","unstructured":"Marchisio, G. Extended functionality for an inverse inference engine based Web search. United States patent No. 6,757,646, June 29, 2004"},{"key":"e_1_2_1_20_1","unstructured":"Marchisio G. Internet navigation using soft hyperlinks. United States patent No. 
6 862 710 March 2005"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/69.842269"}],"container-title":["ACM SIGKDD Explorations Newsletter"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1089815.1089825","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1089815.1089825","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T16:08:16Z","timestamp":1750262896000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1089815.1089825"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2005,6]]},"references-count":21,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2005,6]]}},"alternative-id":["10.1145\/1089815.1089825"],"URL":"https:\/\/doi.org\/10.1145\/1089815.1089825","relation":{},"ISSN":["1931-0145","1931-0153"],"issn-type":[{"type":"print","value":"1931-0145"},{"type":"electronic","value":"1931-0153"}],"subject":[],"published":{"date-parts":[[2005,6]]},"assertion":[{"value":"2005-06-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}