{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,17]],"date-time":"2026-03-17T08:11:40Z","timestamp":1773735100992,"version":"3.50.1"},"reference-count":38,"publisher":"MIT Press - Journals","issue":"3","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Computational Linguistics"],"published-print":{"date-parts":[[2013,9]]},"abstract":"<jats:p> With the increasing rate of patent application filings, automated patent classification is of rising economic importance. This article investigates how patent classification can be improved by using different representations of the patent documents. Using the Linguistic Classification System (LCS), we compare the impact of adding statistical phrases (in the form of bigrams) and linguistic phrases (in two different dependency formats) to the standard bag-of-words text representation on a subset of 532,264 English abstracts from the CLEF-IP 2010 corpus. In contrast to previous findings on classification with phrases in the Reuters-21578 data set, for patent classification the addition of phrases results in significant improvements over the unigram baseline. The best results were achieved by combining all four representations, and the second best by combining unigrams and lemmatized bigrams. This article includes extensive analyses of the class models (a.k.a. class profiles) created by the classifiers in the LCS framework, to examine which types of phrases are most informative for patent classification. It appears that bigrams contribute most to improvements in classification accuracy. Similar experiments were performed on subsets of French and German abstracts to investigate the generalizability of these findings. <\/jats:p>","DOI":"10.1162\/coli_a_00149","type":"journal-article","created":{"date-parts":[[2012,11,16]],"date-time":"2012-11-16T14:50:51Z","timestamp":1353077451000},"page":"755-775","source":"Crossref","is-referenced-by-count":45,"title":["Text Representations for Patent Classification"],"prefix":"10.1162","volume":"39","author":[{"given":"Eva","family":"D'hondt","sequence":"first","affiliation":[{"name":"Radboud University Nijmegen"}]},{"given":"Suzan","family":"Verberne","sequence":"additional","affiliation":[{"name":"Radboud University Nijmegen"}]},{"given":"Cornelis","family":"Koster","sequence":"additional","affiliation":[{"name":"Radboud University Nijmegen"}]},{"given":"Lou","family":"Boves","sequence":"additional","affiliation":[{"name":"Radboud University Nijmegen"}]}],"member":"281","reference":[{"key":"R1","doi-asserted-by":"publisher","DOI":"10.1145\/183422.183423"},{"key":"R3","volume-title":"Proceedings of the Conference on Multilingual and Multimodal Information Access Evaluation (CLEF 2010)","author":"Beney Jean","year":"2010"},{"key":"R4","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-19231-9_12"},{"key":"R5","first-page":"489","volume-title":"Proceedings of Progress in Artificial Intelligence, 14th Portuguese Conference on Artificial Intelligence (EPIA 2009)","author":"Braga Igor","year":"2009"},{"key":"R6","first-page":"78","volume-title":"Text Databases & Document Management.","author":"Caropreso Maria Fernanda","year":"2001"},{"key":"R7","first-page":"59","volume-title":"Proceedings of the 9th Australasian Document Computing Symposium (ADCS)","author":"Crawford Elisabeth","year":"2004"},{"key":"R8","first-page":"55","volume-title":"Proceedings of 2nd Conference on Empirical Methods in NLP","author":"Dagan Ido","year":"1997"},{"key":"R9","doi-asserted-by":"publisher","DOI":"10.3115\/1608858.1608859"},{"key":"R10","volume-title":"Proceedings of the Conference on Multilingual and Multimodal Information Access Evaluation (CLEF 2010)","author":"Derieux Franck","year":"2010"},{"key":"R11","doi-asserted-by":"publisher","DOI":"10.1145\/288627.288651"},{"key":"R13","doi-asserted-by":"publisher","DOI":"10.1145\/945546.945547"},{"key":"R15","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-48412-4_41"},{"key":"R16","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-45268-0_6"},{"key":"R17","volume-title":"Proceedings of the Conference on Multilingual and Multimodal Information Access Evaluation (CLEF 2010)","author":"Guyot Jacques","year":"2010"},{"key":"R19","first-page":"169","volume-title":"Advances in Kernel Methods.","author":"Joachims Thorsten","year":"1999"},{"key":"R20","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-19231-9_13"},{"key":"R21","first-page":"19","volume-title":"Proceedings Benelearn 2001.","author":"Koster Cornelis","year":"2001"},{"key":"R22","first-page":"546","volume-title":"Perspectives of Systems Informatics: 5th International Andrei Ershov Memorial Conference, volume 2890 of Lecture Notes in Computer Science.","author":"Koster Cornelis","year":"2003"},{"key":"R23","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-36618-0_12"},{"key":"R24","doi-asserted-by":"publisher","DOI":"10.1016\/S0172-2190(02)00026-1"},{"key":"R25","first-page":"87","volume-title":"Working Notes of the Workshop on Learning for Text Categorization, 15th National Conference on AI","author":"Larkey Leah","year":"1998"},{"key":"R26","doi-asserted-by":"publisher","DOI":"10.1145\/313238.313304"},{"key":"R27","doi-asserted-by":"publisher","DOI":"10.1145\/133160.133172"},{"key":"R28","volume-title":"Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC'08)","author":"Mille Simon","year":"2008"},{"key":"R29","first-page":"200","volume-title":"Proceedings of RIAO'97 Computer-Assisted Information Searching on Internet","author":"Mitra Mandar","year":"1997"},{"key":"R30","first-page":"145","volume-title":"Proceedings of the 17th Electrotechnical and Computer Science Conference (ERK98)","author":"Mladenic Dunja","year":"1998"},{"key":"R31","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-24752-4_14"},{"key":"R33","doi-asserted-by":"publisher","DOI":"10.1561\/1500000015"},{"key":"R34","first-page":"195","volume-title":"Proceedings of Tenth International Conference on Intelligent Text Processing and Computational Linguistics (CICLing 2009)","author":"Ozg\u00fcr Levent","year":"2009"},{"key":"R35","doi-asserted-by":"publisher","DOI":"10.1016\/j.patrec.2010.05.005"},{"key":"R36","doi-asserted-by":"publisher","DOI":"10.1007\/s10044-010-0195-5"},{"key":"R37","doi-asserted-by":"publisher","DOI":"10.1145\/1651343.1651351"},{"key":"R38","doi-asserted-by":"publisher","DOI":"10.1016\/0306-4573(88)90021-0"},{"key":"R39","first-page":"379","volume-title":"Proceedings of the Sixteenth International Conference on Machine Learning (ICML '99)","author":"Scott Sam","year":"1999"},{"key":"R40","doi-asserted-by":"publisher","DOI":"10.1016\/S0172-2190(02)00067-4"},{"key":"R41","doi-asserted-by":"publisher","DOI":"10.1016\/S0306-4573(01)00045-0"},{"key":"R42","volume-title":"Proceedings of the Conference on Multilingual and Multimodal Information Access Evaluation (CLEF 2011)","author":"Verberne Suzan","year":"2011"},{"key":"R43","volume-title":"Proceedings of the Conference on Multilingual and Multimodal Information Access Evaluation (CLEF 2010)","author":"Verberne Suzan","year":"2010"}],"container-title":["Computational Linguistics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mitpressjournals.org\/doi\/pdf\/10.1162\/COLI_a_00149","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,3,12]],"date-time":"2021-03-12T21:27:25Z","timestamp":1615584445000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/coli\/article\/39\/3\/755-775\/1435"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,9]]},"references-count":38,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2013,9]]}},"alternative-id":["10.1162\/COLI_a_00149"],"URL":"https:\/\/doi.org\/10.1162\/coli_a_00149","relation":{},"ISSN":["0891-2017","1530-9312"],"issn-type":[{"value":"0891-2017","type":"print"},{"value":"1530-9312","type":"electronic"}],"subject":[],"published":{"date-parts":[[2013,9]]}}}