{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,24]],"date-time":"2026-04-24T17:25:21Z","timestamp":1777051521207,"version":"3.51.4"},"reference-count":57,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2021,12,18]],"date-time":"2021-12-18T00:00:00Z","timestamp":1639785600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,12,18]],"date-time":"2021-12-18T00:00:00Z","timestamp":1639785600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Scientometrics"],"published-print":{"date-parts":[[2022,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Patent classification is an expensive and time-consuming task that has conventionally been performed by domain experts. However, the increase in the number of filed patents and the complexity of the documents make the classification task challenging. The text used in patent documents is not always written in a way to efficiently convey knowledge. Moreover, patent classification is a multi-label classification task with a large number of labels, which makes the problem even more complicated. Hence, automating this expensive and laborious task is essential for assisting domain experts in managing patent documents, facilitating reliable search, retrieval, and further patent analysis tasks. Transfer learning and pre-trained language models have recently achieved state-of-the-art results in many Natural Language Processing tasks. In this work, we focus on investigating the effect of fine-tuning the pre-trained language models, namely, BERT, XLNet, RoBERTa, and ELECTRA, for the essential task of multi-label patent classification. 
We compare these models with the baseline deep-learning approaches used for patent classification. We use various word embeddings to enhance the performance of the baseline models. The publicly available USPTO-2M patent classification benchmark and M-patent datasets are used for conducting experiments. We conclude that fine-tuning the pre-trained language models on the patent text improves the multi-label patent classification performance. Our findings indicate that XLNet performs the best and achieves a new state-of-the-art classification performance with respect to precision, recall, F1 measure, as well as coverage error, and LRAP.<\/jats:p>","DOI":"10.1007\/s11192-021-04179-4","type":"journal-article","created":{"date-parts":[[2021,12,18]],"date-time":"2021-12-18T00:03:37Z","timestamp":1639785817000},"page":"207-231","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":71,"title":["PatentNet: multi-label classification of patent documents using deep learning based language understanding"],"prefix":"10.1007","volume":"127","author":[{"given":"Arousha","family":"Haghighian Roudsari","sequence":"first","affiliation":[]},{"given":"Jafar","family":"Afshar","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8586-4577","authenticated-orcid":false,"given":"Wookey","family":"Lee","sequence":"additional","affiliation":[]},{"given":"Suan","family":"Lee","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2021,12,18]]},"reference":[{"key":"4179_CR1","unstructured":"Abdelgawad, L., Kluegl, P., Genc, E., Falkner, S., & Hutter, F. (2019). Optimizing neural networks for patent classification. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases (pp. 688\u2013703). Springer."},{"key":"4179_CR2","doi-asserted-by":"crossref","unstructured":"Al\u00a0Shamsi, F., & Aung, Z. (2016). 
Automatic patent classification by a three-phase model with document frequency matrix and boosted tree. In 2016 5th International Conference on Electronic Devices, Systems and Applications (ICEDSA) (pp. 1\u20134). IEEE.","DOI":"10.1109\/ICEDSA.2016.7818566"},{"key":"4179_CR3","doi-asserted-by":"publisher","first-page":"135","DOI":"10.1162\/tacl_a_00051","volume":"5","author":"P Bojanowski","year":"2017","unstructured":"Bojanowski, P., Grave, E., Joulin, A., & Mikolov, T. (2017). Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5, 135\u2013146.","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"4179_CR4","unstructured":"Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., et\u00a0al. (2020). Language models are few-shot learners. arXiv preprint arXiv:200514165"},{"key":"4179_CR5","doi-asserted-by":"crossref","unstructured":"Caruana, R., Lawrence, S., & Giles, C. L. (2001). Overfitting in neural nets: Backpropagation, conjugate gradient, and early stopping. In Advances in Neural Information Processing Systems (pp. 402\u2013408).","DOI":"10.1109\/IJCNN.2000.857823"},{"key":"4179_CR6","volume-title":"Multilabel classification: Problem analysis, metrics and techniques","author":"F Charte","year":"2016","unstructured":"Charte, F., del Jesus, M. J., & Rivera, A. J. (2016). Multilabel classification: Problem analysis, metrics and techniques. Berlin: Springer."},{"issue":"3","key":"4179_CR7","doi-asserted-by":"publisher","first-page":"2091","DOI":"10.1007\/s11192-020-03666-4","volume":"125","author":"J Chen","year":"2020","unstructured":"Chen, J., Chen, J., Zhao, S., Zhang, Y., & Tang, J. (2020). Exploiting word embedding for heterogeneous topic model towards patent recommendation. 
Scientometrics, 125(3), 2091\u20132108.","journal-title":"Scientometrics"},{"issue":"1","key":"4179_CR8","doi-asserted-by":"publisher","first-page":"289","DOI":"10.1007\/s11192-020-03634-y","volume":"125","author":"L Chen","year":"2020","unstructured":"Chen, L., Xu, S., Zhu, L., Zhang, J., Lei, X., & Yang, G. (2020). A deep learning based method for extracting semantic information from patent documents. Scientometrics, 125(1), 289\u2013312.","journal-title":"Scientometrics"},{"key":"4179_CR9","unstructured":"Chollet, F., et\u00a0al. (2015). Keras. https:\/\/github.com\/fchollet\/keras."},{"key":"4179_CR10","unstructured":"Clark, K., Luong, M. T., Le, Q. V., & Manning, C. D. (2020). Electra: Pre-training text encoders as discriminators rather than generators. arXiv preprint arXiv:200310555."},{"key":"4179_CR11","doi-asserted-by":"crossref","unstructured":"Dai, Z., Yang, Z., Yang, Y., Carbonell, J., Le, Q. V., & Salakhutdinov, R. (2019). Transformer-xl: Attentive language models beyond a fixed-length context. arXiv preprint arXiv:190102860.","DOI":"10.18653\/v1\/P19-1285"},{"key":"4179_CR12","unstructured":"Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:181004805."},{"key":"4179_CR13","doi-asserted-by":"crossref","unstructured":"D\u2019hondt, E., & Verberne, S. (2010). Clef-ip 2010: Prior art retrieval using the different sections in patent documents.","DOI":"10.1007\/978-3-642-15754-7_60"},{"key":"4179_CR14","doi-asserted-by":"crossref","unstructured":"D\u2019hondt, E., Verberne, S., Koster, C., & Boves, L. (2013). Text representations for patent classification. Computational Linguistics 39(3), 755\u2013775.","DOI":"10.1162\/COLI_a_00149"},{"key":"4179_CR15","doi-asserted-by":"crossref","unstructured":"D\u2019hondt, E., Verberne, S., Oostdijk, N., & Boves, L. (2017). Patent classification on subgroup level using balanced winnow. In CCPIR (pp. 
299\u2013324). Springer.","DOI":"10.1007\/978-3-662-53817-3_11"},{"key":"4179_CR16","doi-asserted-by":"crossref","unstructured":"Fall, C. J., T\u00f6rcsv\u00e1ri, A., Benzineb, K., & Karetka, G. (2003). Automated categorization in the international patent classification. In Acm Sigir Forum, ACM New York, NY, USA (Vol. 37, pp. 10\u201325).","DOI":"10.1145\/945546.945547"},{"issue":"6","key":"4179_CR17","first-page":"411","volume":"4","author":"E Gibaja","year":"2014","unstructured":"Gibaja, E., & Ventura, S. (2014). Multi-label learning: a review of the state of the art and ongoing research. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 4(6), 411\u2013444.","journal-title":"Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery"},{"issue":"3","key":"4179_CR18","doi-asserted-by":"publisher","first-page":"1239","DOI":"10.1007\/s11192-019-03246-1","volume":"121","author":"JC Gomez","year":"2019","unstructured":"Gomez, J. C. (2019). Analysis of the effect of data properties in automated patent classification. Scientometrics, 121(3), 1239\u20131268.","journal-title":"Scientometrics"},{"key":"4179_CR19","doi-asserted-by":"crossref","unstructured":"Gomez, J. C., & Moens, M. F. (2014). A survey of automated hierarchical classification of patents. In PSMW (pp. 215\u2013249). Springer.","DOI":"10.1007\/978-3-319-12511-4_11"},{"key":"4179_CR20","doi-asserted-by":"crossref","unstructured":"Grawe, M. F., Martins, C. A., & Bonfante, A. G. (2017). Automated patent classification using word embedding. In 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA) (pp. 408\u2013411). IEEE.","DOI":"10.1109\/ICMLA.2017.0-127"},{"issue":"1","key":"4179_CR21","doi-asserted-by":"publisher","first-page":"219","DOI":"10.3390\/su10010219","volume":"10","author":"J Hu","year":"2018","unstructured":"Hu, J., Li, S., Hu, J., & Yang, G. (2018). A hierarchical feature extraction model for multi-label mechanical patent classification. 
Sustainability, 10(1), 219.","journal-title":"Sustainability"},{"issue":"2","key":"4179_CR22","doi-asserted-by":"publisher","first-page":"104","DOI":"10.3390\/e20020104","volume":"20","author":"J Hu","year":"2018","unstructured":"Hu, J., Li, S., Yao, Y., Yu, L., Yang, G., & Hu, J. (2018). Patent keyword extraction algorithm based on distributed representation for patent classification. Entropy, 20(2), 104.","journal-title":"Entropy"},{"key":"4179_CR23","unstructured":"Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:14126980."},{"key":"4179_CR24","doi-asserted-by":"crossref","unstructured":"Kudo, T., & Richardson, J. (2018). Sentencepiece: A simple and language independent subword tokenizer and detokenizer for neural text processing. arXiv preprint arXiv:180806226.","DOI":"10.18653\/v1\/D18-2012"},{"key":"4179_CR25","doi-asserted-by":"crossref","unstructured":"Lee, J. S., & Hsiang, J. (2019). Patentbert: Patent classification with fine-tuning a pre-trained bert model. arXiv preprint arXiv:190602124.","DOI":"10.1016\/j.wpi.2020.101965"},{"issue":"2","key":"4179_CR26","doi-asserted-by":"publisher","first-page":"721","DOI":"10.1007\/s11192-018-2905-5","volume":"117","author":"S Li","year":"2018","unstructured":"Li, S., Hu, J., Cui, Y., & Hu, J. (2018). Deeppatent: patent classification with convolutional neural networks and word embedding. Scientometrics, 117(2), 721\u2013744.","journal-title":"Scientometrics"},{"key":"4179_CR27","doi-asserted-by":"crossref","unstructured":"Liu, J., Chang, W. C., Wu, Y., & Yang, Y. (2017). Deep learning for extreme multi-label text classification. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 115\u2013124).","DOI":"10.1145\/3077136.3080834"},{"key":"4179_CR28","unstructured":"Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., & Stoyanov, V. (2019). 
Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:190711692."},{"issue":"1","key":"4179_CR29","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1561\/1500000027","volume":"7","author":"M Lupu","year":"2013","unstructured":"Lupu, M., & Hanbury, A. (2013). Patent retrieval. Foundations and Trends in Information Retrieval, 7(1), 1\u201397.","journal-title":"Foundations and Trends in Information Retrieval"},{"key":"4179_CR30","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-662-53817-3","volume-title":"Current challenges in patent information retrieval","author":"M Lupu","year":"2017","unstructured":"Lupu, M., Mayer, K., Kando, N., & Trippe, A. J. (2017). Current challenges in patent information retrieval (Vol. 37). Berlin: Springer."},{"key":"4179_CR31","unstructured":"Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:13013781."},{"key":"4179_CR32","doi-asserted-by":"crossref","unstructured":"Minaee, S., Kalchbrenner, N., Cambria, E., Nikzad, N., Chenaghlu, M., & Gao, J. (2020). Deep learning based text classification: A comprehensive review. arXiv preprint arXiv:200403705.","DOI":"10.1145\/3439726"},{"key":"4179_CR33","doi-asserted-by":"crossref","unstructured":"Pennington, J., Socher, R., & Manning, C. D. (2014). Glove: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP) (pp. 1532\u20131543).","DOI":"10.3115\/v1\/D14-1162"},{"key":"4179_CR34","unstructured":"Piroi, F., Lupu, M., Hanbury, A., & Zenz, V. (2011). Clef-ip 2011: Retrieval in the intellectual property domain. In  CLEF (notebook papers\/labs\/workshop)."},{"key":"4179_CR35","unstructured":"Rajapakse, T. (2019). Simple transformers. https:\/\/github.com\/ThilinaRajapakse\/simpletransformers."},{"key":"4179_CR36","unstructured":"\u0158eh\u016f\u0159ek, R., & Sojka, P. (2010). 
Software framework for topic modelling with large corpora. In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, ELRA, Valletta, Malta (pp. 45\u201350), http:\/\/is.muni.cz\/publication\/884893\/en."},{"key":"4179_CR37","doi-asserted-by":"crossref","unstructured":"Risch, J., & Krestel, R. (2019). Domain-specific word embeddings for patent classification. Data Technologies and Applications .","DOI":"10.1108\/DTA-01-2019-0002"},{"key":"4179_CR39","doi-asserted-by":"publisher","unstructured":"Roudsari, A. H., Afshar, J., Lee, S., & Lee, W. (2021). Comparison and analysis of embedding methods for patent documents. In 2021 IEEE International Conference on Big Data and Smart Computing (BigComp) (pp. 152\u2013155). https:\/\/doi.org\/10.1109\/BigComp51126.2021.00037.","DOI":"10.1109\/BigComp51126.2021.00037"},{"key":"4179_CR40","doi-asserted-by":"crossref","unstructured":"Schuster, M., & Nakajima, K. (2012). Japanese and korean voice search. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5149\u20135152). IEEE.","DOI":"10.1109\/ICASSP.2012.6289079"},{"key":"4179_CR41","doi-asserted-by":"crossref","unstructured":"Sennrich, R., Haddow, B., & Birch, A. (2015). Neural machine translation of rare words with subword units. arXiv preprint arXiv:150807909.","DOI":"10.18653\/v1\/P16-1162"},{"key":"4179_CR42","doi-asserted-by":"crossref","unstructured":"Shalaby, M., Stutzki, J., Schubert, M., & G\u00fcnnemann, S. (2018). An lstm approach to patent classification based on fixed hierarchy vectors. In SIAM (pp. 495\u2013503). SIAM.","DOI":"10.1137\/1.9781611975321.56"},{"key":"4179_CR43","doi-asserted-by":"crossref","unstructured":"Shalaby, W., & Zadrozny, W. (2019). Patent retrieval: a literature review. 
Knowledge and Information Systems, 1\u201330.","DOI":"10.1007\/s10115-018-1322-7"},{"key":"4179_CR44","doi-asserted-by":"publisher","first-page":"101603","DOI":"10.1016\/j.datak.2017.07.006","volume":"123","author":"JJ Song","year":"2019","unstructured":"Song, J. J., Lee, W., & Afshar, J. (2019). An effective high recall retrieval method. Data & Knowledge Engineering, 123, 101603.","journal-title":"Data & Knowledge Engineering"},{"key":"4179_CR45","doi-asserted-by":"crossref","unstructured":"Souza, C. M., Meireles, M. R., & Almeida, P. E. (2020). A comparative study of abstractive and extractive summarization techniques to label subgroups on patent dataset. Scientometrics, 1\u201322.","DOI":"10.1007\/s11192-020-03732-x"},{"key":"4179_CR46","unstructured":"Srebrovic, R., & Yonamine, J. (2020). Leveraging the bert algorithm for patents with tensorflow and bigquery [white paper]. https:\/\/services.google.com\/fh\/files\/blogs\/bert_for_patents_white_paper.pdf."},{"issue":"1","key":"4179_CR47","first-page":"1929","volume":"15","author":"N Srivastava","year":"2014","unstructured":"Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1), 1929\u20131958.","journal-title":"The Journal of Machine Learning Research"},{"key":"4179_CR48","doi-asserted-by":"crossref","unstructured":"Tsoumakas, G., Katakis, I., & Vlahavas, I. (2009). Mining multi-label data. In Data mining and knowledge discovery handbook (pp. 667\u2013685). Springer.","DOI":"10.1007\/978-0-387-09823-4_34"},{"key":"4179_CR49","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention is all you need. 
arXiv preprint arXiv:170603762."},{"key":"4179_CR50","doi-asserted-by":"crossref","unstructured":"Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., Cistac, P., Rault, T., Louf, R., Funtowicz, M., et\u00a0al. (2019) Huggingface\u2019s transformers: State-of-the-art natural language processing. arXiv:191003771.","DOI":"10.18653\/v1\/2020.emnlp-demos.6"},{"issue":"4","key":"4179_CR51","doi-asserted-by":"publisher","first-page":"1164","DOI":"10.1016\/j.asoc.2009.11.033","volume":"10","author":"CH Wu","year":"2010","unstructured":"Wu, C. H., Ken, Y., & Huang, T. (2010). Patent classification system using a new hybrid genetic algorithm support vector machine. Applied Soft Computing, 10(4), 1164\u20131177.","journal-title":"Applied Soft Computing"},{"key":"4179_CR52","doi-asserted-by":"publisher","first-page":"305","DOI":"10.1016\/j.asoc.2016.01.020","volume":"41","author":"JL Wu","year":"2016","unstructured":"Wu, J. L., Chang, P. C., Tsao, C. C., & Fan, C. Y. (2016). A patent quality analysis and classification system using self-organizing maps with support vector machine. Applied Soft Computing, 41, 305\u2013316.","journal-title":"Applied Soft Computing"},{"key":"4179_CR53","unstructured":"Wu, X. Z., & Zhou, Z. H. (2017). A unified view of multi-label performance measures. In International Conference on Machine Learning, PMLR (pp. 3780\u20133788)."},{"key":"4179_CR54","doi-asserted-by":"crossref","unstructured":"Yang, B., Sun, J. T., Wang, T., & Chen, Z. (2009). Effective multi-label active learning for text classification. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 917\u2013926).","DOI":"10.1145\/1557019.1557119"},{"key":"4179_CR55","unstructured":"Yang, Z., Dai, Z., Yang, Y., Carbonell, J., Salakhutdinov, R., & Le, Q. V. (2019). Xlnet: Generalized autoregressive pretraining for language understanding. 
arXiv preprint arXiv:190608237."},{"issue":"3","key":"4179_CR56","first-page":"55","volume":"13","author":"T Young","year":"2018","unstructured":"Young, T., Hazarika, D., Poria, S., & Cambria, E. (2018). Recent trends in deep learning based natural language processing. CIM, 13(3), 55\u201375.","journal-title":"CIM"},{"key":"4179_CR57","doi-asserted-by":"publisher","first-page":"106636","DOI":"10.1016\/j.cie.2020.106636","volume":"147","author":"J Yun","year":"2020","unstructured":"Yun, J., & Geum, Y. (2020). Automated classification of patents: A topic modeling approach. Computers & Industrial Engineering, 147, 106636.","journal-title":"Computers & Industrial Engineering"},{"issue":"2","key":"4179_CR58","first-page":"1","volume":"16","author":"L Zhang","year":"2015","unstructured":"Zhang, L., Li, L., & Li, T. (2015). Patent mining: a survey. SIGKDD Explorations, 16(2), 1\u201319.","journal-title":"SIGKDD Explorations"}],"container-title":["Scientometrics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11192-021-04179-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11192-021-04179-4\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11192-021-04179-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,1,21]],"date-time":"2022-01-21T16:33:35Z","timestamp":1642782815000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11192-021-04179-4"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,12,18]]},"references-count":57,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2022,1]]}},"alternative-id":["4179"],"URL":"https:\/\/
doi.org\/10.1007\/s11192-021-04179-4","relation":{},"ISSN":["0138-9130","1588-2861"],"issn-type":[{"value":"0138-9130","type":"print"},{"value":"1588-2861","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,12,18]]},"assertion":[{"value":"12 November 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"7 October 2021","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"18 December 2021","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}