{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T21:31:32Z","timestamp":1760131892631,"version":"3.41.0"},"reference-count":26,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2005,6,1]],"date-time":"2005-06-01T00:00:00Z","timestamp":1117584000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["SIGKDD Explor. Newsl."],"published-print":{"date-parts":[[2005,6]]},"abstract":"<jats:p>At a time when experimental throughput in the field of molecular biology is increasing, it is necessary for biologists and people working in related fields to have access to sophisticated tools to enable them to efficiently process large amounts of information in order to stay abreast of current research. Rhetorical zone analysis is an application of natural language processing in which areas of text in scientific papers are classified in terms of argumentation and intellectual contribution in order to pinpoint and distinguish certain types of information. Such analysis can be employed to assist in information extraction, helping to assess and integrate data generated by experiments into the scientific community's store of knowledge. We present results for several experiments in automatic zone identification on the ZAISA-1 dataset, a new dataset composed of full biomedical research papers hand-annotated for rhetorical zones. We concentrate on general purpose and linguistically motivated features, and report results for a variety of sets of features. 
It is our intention to provide a baseline feature set for modeling, which can be extended in future work using combinations of heuristics and more sophisticated and task-specific modeling techniques.<\/jats:p>","DOI":"10.1145\/1089815.1089823","type":"journal-article","created":{"date-parts":[[2007,1,17]],"date-time":"2007-01-17T18:32:02Z","timestamp":1169058722000},"page":"52-58","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":12,"title":["A baseline feature set for learning rhetorical zones using full articles in the biomedical domain"],"prefix":"10.1145","volume":"7","author":[{"given":"Tony","family":"Mullen","sequence":"first","affiliation":[{"name":"National Institute of Informatics, Chiyoda-ku, Tokyo, Japan"}]},{"given":"Yoko","family":"Mizuta","sequence":"additional","affiliation":[{"name":"National Institute of Informatics, Chiyoda-ku, Tokyo, Japan"}]},{"given":"Nigel","family":"Collier","sequence":"additional","affiliation":[{"name":"National Institute of Informatics, Chiyoda-ku, Tokyo, Japan"}]}],"member":"320","published-online":{"date-parts":[[2005,6]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1093\/nar\/29.1.242"},{"key":"e_1_2_1_2_1","doi-asserted-by":"crossref","unstructured":"A. Bairoch R. Apweiler. The SWISS-PROT protein sequence database and its supplement TrEMBL in 200 Nucleic Acids Research 28:302--303. 2000.","DOI":"10.1093\/nar\/28.1.45"},{"key":"e_1_2_1_3_1","doi-asserted-by":"crossref","unstructured":"H. M. Berman J. Westbrook Z. Feng G. Gilliland T. N. Bhat H. Weissig I. N. Shindyalov and P. E. Bourne. 
The Protein Data Bank\/ Nucleic Acids Research 28:235--242. 2000.","DOI":"10.1093\/nar\/28.1.235"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1009715923555"},{"key":"e_1_2_1_5_1","first-page":"77","volume-title":"ISMB'99","author":"Craven M.","year":"1999"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.5555\/345662"},{"issue":"2","key":"e_1_2_1_7_1","first-page":"144","article-title":"the challenges of searching the scientific literature","volume":"1","author":"Dickman S.","year":"2003","journal-title":"PLoS Biology"},{"key":"e_1_2_1_8_1","first-page":"502","volume-title":"BSB2000","author":"Humphreys K.","year":"2000"},{"volume-title":"Kluwer Academic Publishers","year":"2001","author":"Joachims T.","key":"e_1_2_1_9_1"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1101\/gr.835903"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1093\/protein\/gzh020"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1093\/nar\/30.1.264"},{"key":"e_1_2_1_13_1","first-page":"1737","volume-title":"LREC2004","author":"Mizuta Y.","year":"2004"},{"volume-title":"International Journal of Medical Informatics","author":"Mizuta Y.","key":"e_1_2_1_15_1"},{"key":"e_1_2_1_16_1","doi-asserted-by":"crossref","unstructured":"S. Novichova S. Egorov and N. Darasalia. Medscan a natural language processing engine for medline abstracts. Bioinformatics; 19(13):1699--1706 2003.","DOI":"10.1093\/bioinformatics\/btg207"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.5555\/1567594.1567597"},{"key":"e_1_2_1_18_1","unstructured":"G. Salton and M. J. McGill. The SMART and SIRE Experimental Retrieval Systems pp.118--155 New York: McGraw-Hill. 
1983."},{"key":"e_1_2_1_19_1","unstructured":"H. Schauer and U. Hahn Phrases as carriers of coherence relations CogSci 2000---Proceedings of the 22nd Annual Conference of the Cognitive Science Society pp. 429--434. 2000."},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/18.8.1124"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.3115\/974557.974568"},{"key":"e_1_2_1_22_1","unstructured":"S. Teufel. Argumentative Zoning: Information Extraction from Scientific Text PhD Thesis. University of Edinburgh. 1999."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1162\/089120102762671936"},{"volume-title":"LREC2004","year":"2004","author":"Teufel S.","key":"e_1_2_1_24_1"},{"key":"e_1_2_1_25_1","unstructured":"V. N. Vapnik. Statistical Learning Theory. Springer. 
1998."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1186\/1471-2105-5-155"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0014-5793(01)03293-8"}],"container-title":["ACM SIGKDD Explorations Newsletter"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1089815.1089823","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1089815.1089823","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T16:08:16Z","timestamp":1750262896000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1089815.1089823"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2005,6]]},"references-count":26,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2005,6]]}},"alternative-id":["10.1145\/1089815.1089823"],"URL":"https:\/\/doi.org\/10.1145\/1089815.1089823","relation":{},"ISSN":["1931-0145","1931-0153"],"issn-type":[{"type":"print","value":"1931-0145"},{"type":"electronic","value":"1931-0153"}],"subject":[],"published":{"date-parts":[[2005,6]]},"assertion":[{"value":"2005-06-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}