{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,12]],"date-time":"2026-05-12T15:35:46Z","timestamp":1778600146291,"version":"3.51.4"},"reference-count":42,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2025,2,15]],"date-time":"2025-02-15T00:00:00Z","timestamp":1739577600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100009100","name":"Universiti Brunei Darussalam","doi-asserted-by":"publisher","award":["UBD\/RSCH\/1.3\/FICBF(b)\/2024\/023"],"award-info":[{"award-number":["UBD\/RSCH\/1.3\/FICBF(b)\/2024\/023"]}],"id":[{"id":"10.13039\/100009100","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>The rapid advancement of technology has led to a sustained accumulation of patent documents globally, as newly filed applications add to an ever-expanding repository of prior art. The need for innovation and progress within the patent system underscores the significance of robust patent investigation, which includes prior art searches. The swift expansion of the patent arena poses challenges for experts employing conventional qualitative practices to handle the increasing quantitative needs. In this study, we propose a novel method to enhance patent prior art search through the integration of advanced natural language processing (NLP) techniques. Our approach leverages the abstract and top terms of patent documents to generate a unique set of labelled databases. This database is then utilized to train Bidirectional Encoder Representations from Transformers (BERT) for patents, enabling domain-specific prior art searches. Testing our method on the Google Public Patent Database yielded an improved F1 score of 0.94 on the testing data. 
Not only does our method demonstrate superior accuracy compared to baseline approaches, but it also exhibits enhanced computational efficiency. The refined prior art search promises to provide valuable assistance to specialists in their decision-making processes, offering insightful analyses and relevant information that can significantly increase the efficiency and accuracy of their judgments.<\/jats:p>","DOI":"10.3390\/info16020145","type":"journal-article","created":{"date-parts":[[2025,2,17]],"date-time":"2025-02-17T03:41:47Z","timestamp":1739763707000},"page":"145","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["Optimizing Patent Prior Art Search: An Approach Using Patent Abstract and Key Terms"],"prefix":"10.3390","volume":"16","author":[{"given":"Amna","family":"Ali","sequence":"first","affiliation":[{"name":"Faculty of Integrated Technologies, Universiti Brunei Darussalam, Bandar Seri Begawan BE1410, Brunei"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mohammad Ali","family":"Humayun","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Information Technology University of the Punjab, Lahore 54590, Pakistan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Liyanage Chandratilak De","family":"Silva","sequence":"additional","affiliation":[{"name":"School of Digital Science, Universiti Brunei Darussalam, Bandar Seri Begawan BE1410, Brunei"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7006-3838","authenticated-orcid":false,"given":"Pg Emeroylariffion","family":"Abas","sequence":"additional","affiliation":[{"name":"Faculty of Integrated Technologies, Universiti Brunei Darussalam, Bandar Seri Begawan BE1410, 
Brunei"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2025,2,15]]},"reference":[{"key":"ref_1","unstructured":"Suzgun, M., Melas-Kyriazi, L., Sarkar, S., Kominers, S.D., and Shieber, S. (2022). The Harvard USPTO Patent Dataset: A Large-Scale, Well-Structured, and Multi-Purpose Corpus of Patent Applications. arXiv."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Roudsari, A.H., Afshar, J., Lee, S., and Lee, W. (2021, January 17\u201320). Comparison and Analysis of Embedding Methods for Patent Documents. Proceedings of the IEEE International Conference on Big Data and Smart Computing (BigComp), Jeju Island, Republic of Korea.","DOI":"10.1109\/BigComp51126.2021.00037"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"631","DOI":"10.1007\/s10115-018-1322-7","article-title":"Patent retrieval: A literature review","volume":"61","author":"Shalaby","year":"2019","journal-title":"Knowl. Inf. Syst."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"37","DOI":"10.11114\/aef.v8i5.5182","article-title":"Transformer-Based Patent Novelty Search by Training Claims to Their Own Description","volume":"8","author":"Freunek","year":"2021","journal-title":"Appl. Econ. Financ."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"103379","DOI":"10.1016\/j.compind.2020.103379","article-title":"Patent infringement analysis using a text mining technique based on SAO structure","volume":"125","author":"Kim","year":"2021","journal-title":"Comput. Ind."},{"key":"ref_6","first-page":"216","article-title":"Efficiency of Boolean Search strings for Information Retrieval","volume":"6","author":"Muhammad","year":"2017","journal-title":"Am. J. Eng. Res. (AJER)"},{"key":"ref_7","unstructured":"Management, C. (2025, January 09). Boolean Search. 
Available online: https:\/\/ceopedia.org\/index.php\/Boolean_search."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Helmers, L., Horn, F., Biegler, F., Oppermann, T., and M\u00fcller, K.R. (2019). Automating the search for a patent\u2019s prior art with a full text similarity search. PLoS ONE, 14.","DOI":"10.1371\/journal.pone.0212103"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"7396","DOI":"10.1109\/ACCESS.2022.3141494","article-title":"A fast and scalable algorithm for prior art search","volume":"10","author":"Lee","year":"2022","journal-title":"IEEE Access"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"191","DOI":"10.1016\/j.procs.2019.01.036","article-title":"Toward Contextual Information Retrieval: A Review And Trends","volume":"148","author":"Merrouni","year":"2019","journal-title":"Procedia Comput. Sci."},{"key":"ref_11","unstructured":"Risch, J., Alder, N., Hewel, C., and Krestel, R. (2020). PatentMatch: A Dataset for Matching Patent Claims & Prior Art. arXiv."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Ganesh, P., Chen, Y., Lou, X., Khan, M., Yang, Y., Chen, D., Winslett, M., Sajjad, H., and Nakov, P. (2020). Compressing Large-Scale Transformer-Based Models: A Case Study on BERT. arXiv.","DOI":"10.1162\/tacl_a_00413"},{"key":"ref_13","unstructured":"Srebrovic, R. (2025, January 09). BERT for Patents. Available online: https:\/\/github.com\/google\/patents-public-data\/blob\/master\/models\/BERT%20for%20Patents.md."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"102021","DOI":"10.1016\/j.wpi.2021.102021","article-title":"Artificial intelligence for patent prior art searching","volume":"64","author":"Setchi","year":"2021","journal-title":"World Pat. 
Inf."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"117627","DOI":"10.1016\/j.eswa.2022.117627","article-title":"Summarization, Simplification, and Generation: The Case of Patents","volume":"205","year":"2022","journal-title":"Expert Syst. Appl."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Andersson, L., Lupu, M., Palotti, J., Hanbury, A., and Rauber, A. (2016, January 24\u201328). When is the Time Ripe for Natural Language Processing for Patent Passage Retrieval?. Proceedings of the CIKM \u201916: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, Indianapolis, IN, USA.","DOI":"10.1145\/2983323.2983858"},{"key":"ref_17","unstructured":"Hannes Jansson, J.N. (2020). Using Natural Language Processing to Identify Similar Patent Documents, Department of Computer Science, Lund University."},{"key":"ref_18","first-page":"2113","article-title":"Articles Ignoring Information Quality","volume":"89","author":"Freilich","year":"2021","journal-title":"FLR Fordham Law Rev."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Kang, D.M., Lee, C.C., Lee, S., and Lee, W. (2020, January 12\u201318). Patent prior art search using deep learning language model. Proceedings of the IDEAS \u201920: Proceedings of the 24th Symposium on International Database Engineering & Applications, Incheon, Republic of Korea.","DOI":"10.1145\/3410566.3410597"},{"key":"ref_20","first-page":"108","article-title":"Domain-Specific Word Embeddings for Patent Classification","volume":"53","author":"Risch","year":"2019","journal-title":"Data Technol. Appl."},{"key":"ref_21","unstructured":"Pogiatzis, A. (2025, January 09). NLP: Contextualized Word Embeddings from BERT. Retrieved from Towards Data Science. 
Available online: https:\/\/medium.com\/towards-data-science\/nlp-extract-contextualized-word-embeddings-from-bert-keras-tf-67ef29f60a7b."},{"key":"ref_22","first-page":"5534","article-title":"A Knowledge-Enriched Ensemble Method for Word Embedding and Multi-Sense Embedding","volume":"35","author":"Fang","year":"2022","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"6115","DOI":"10.1007\/s00521-022-07944-5","article-title":"A transformer fine-tuning strategy for text dialect identification","volume":"35","author":"Humayun","year":"2023","journal-title":"Neural Comput. Appl."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"100048","DOI":"10.1016\/j.nlp.2023.100048","article-title":"A Survey of GPT-3 Family Large Language Models Including ChatGPT and GPT-4","volume":"6","author":"Kalyan","year":"2024","journal-title":"Nat. Lang. Process. J."},{"key":"ref_25","unstructured":"Liu, Y., Naman Goyal, M.O., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., and Stoyanov, V. (2019). RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"101965","DOI":"10.1016\/j.wpi.2020.101965","article-title":"Patent Classification by Fine-Tuning BERT Language Model","volume":"61","author":"Lee","year":"2020","journal-title":"World Pat. Inf."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"721","DOI":"10.1007\/s11192-018-2905-5","article-title":"DeepPatent: Patent classification with convolutional neural networks and word embedding","volume":"117","author":"Li","year":"2018","journal-title":"Scientometrics"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Stamatis, V. (2022). End to End Neural Retrieval for Patent Prior Art Search. Proceedings of Advances in Information Retrieval, Springer.","DOI":"10.1007\/978-3-030-99739-7_66"},{"key":"ref_29","unstructured":"Srebrovic, R., and Yonamine, J. 
(2025, January 09). Leveraging the BERT Algorithm for Patents with TensorFlow and BigQuery. Available online: https:\/\/services.google.com\/fh\/files\/blogs\/bert_for_patents_white_paper.pdf."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Chikkamath, R., Parmar, V.R., Otiefy, Y., and Endres, M. (2022, January 23\u201325). Patent Classification Using BERT-for-Patents on USPTO. Proceedings of the MLNLP \u201922: Proceedings of the 2022 5th International Conference on Machine Learning and Natural Language Processing, Sanya, China.","DOI":"10.1145\/3578741.3578746"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"102192","DOI":"10.1016\/j.wpi.2023.102192","article-title":"SEARCHFORMER: Semantic patent embeddings by siamese transformers for prior art search","volume":"73","author":"Vowinckel","year":"2023","journal-title":"World Pat. Inf."},{"key":"ref_32","first-page":"446","article-title":"Similarity Matching for Patent Documents Using Ensemble BERT-Related Model and Novel Text Processing Method","volume":"15","author":"Yu","year":"2024","journal-title":"J. Adv. Inf. Technol."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"102282","DOI":"10.1016\/j.wpi.2024.102282","article-title":"A novel re-ranking architecture for patent search","volume":"78","author":"Stamatis","year":"2024","journal-title":"World Pat. Inf."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"102254","DOI":"10.1016\/j.wpi.2023.102254","article-title":"Is your search query well-formed? A natural query understanding for patent prior art search","volume":"76","author":"Chikkamath","year":"2023","journal-title":"World Pat. Inf."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Ali, A., Tufail, A., De Silva, L., and Abas, P.E. (2024). Innovating Patent Retrieval: A Comprehensive Review of Techniques, Trends, and Challenges in Prior Art Searches. Appl. Syst. 
Innov., 7.","DOI":"10.3390\/asi7050091"},{"key":"ref_36","unstructured":"Google, I.C.P.S. (2025, January 06). Google Patents Public Data, I.C.P.S.a. Google, Editor. Available online: https:\/\/cloud.google.com\/bigquery\/."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"51","DOI":"10.1149\/2.F07202IF","article-title":"Looking at Patent Law: Defining the Meaning and Scope of Claims through Claim Construction; Insights from the Lithium Battery Patent Infringement Case","volume":"29","author":"Taylor","year":"2020","journal-title":"Electrochem. Soc. Interface"},{"key":"ref_38","unstructured":"Office, U.I.P. (2025, February 13). Patent Factsheets: Abstract. IPO, Ed. UK, Available online: www.gov.uk\/ipo."},{"key":"ref_39","unstructured":"(2025, January 09). Lens. How to Read a Patent. Available online: https:\/\/support.lens.org\/knowledge-base\/how-to-read-a-patent\/."},{"key":"ref_40","unstructured":"Kumar, R., Tripathi, R.C., and Singh, V. (2015, January 11\u201312). Keyword based search and its limitations in the Patent document to secure the idea from its infringement. Proceedings of the International Conference on Information Security & Privacy (ICISP2015), Nagpur, India."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"115","DOI":"10.1016\/j.patrec.2023.02.005","article-title":"Soft precision and recall","volume":"167","year":"2023","journal-title":"Pattern Recognit. Lett."},{"key":"ref_42","unstructured":"(2025, January 09). Minesoft Origin | Advanced AI Patent Search. 1996 Edition. 2024. Last Updated January 2025. 
Available online: https:\/\/minesoft.com\/."}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/16\/2\/145\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T16:35:21Z","timestamp":1760027721000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/16\/2\/145"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,2,15]]},"references-count":42,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2025,2]]}},"alternative-id":["info16020145"],"URL":"https:\/\/doi.org\/10.3390\/info16020145","relation":{},"ISSN":["2078-2489"],"issn-type":[{"value":"2078-2489","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,2,15]]}}}