{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,30]],"date-time":"2025-07-30T14:04:38Z","timestamp":1753884278106,"version":"3.41.2"},"reference-count":15,"publisher":"World Scientific Pub Co Pte Ltd","issue":"09","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Int. J. Patt. Recogn. Artif. Intell."],"published-print":{"date-parts":[[2023,7]]},"abstract":"<jats:p> Extracting information from scanned invoices and other commercial documents, a critical component of corporate function, typically requires significant manual processing. Much research has been conducted in the field of automated information extraction and document processing to alleviate the manual resources used for document analysis, but resultant literature and commercially available products have demonstrated limitations in customizability for identifying specific information. In this paper, we propose a customized machine learning-based pipeline for extracting and tabulating relevant key\u2013value pairs from commercial invoice documents. Specifically, the pipeline combines general document understanding, OCR extraction, and key\u2013value matching with custom rules pertaining to a provided invoice dataset. Then, we demonstrate that the pipeline greatly outperforms a commercially available product and can significantly reduce the amount of manual labor required to process invoice documents. Future work will focus on generalizing the pipeline, so as to apply it on more varied datasets. <\/jats:p>","DOI":"10.1142\/s0218001423540137","type":"journal-article","created":{"date-parts":[[2023,6,5]],"date-time":"2023-06-05T03:21:00Z","timestamp":1685935260000},"source":"Crossref","is-referenced-by-count":1,"title":["Customized Information Extraction and Processing Pipeline for Commercial Invoices"],"prefix":"10.1142","volume":"37","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7888-0381","authenticated-orcid":false,"given":"Pierce","family":"Lai","sequence":"first","affiliation":[{"name":"CSAIL, MIT, 32 Vassar St, Cambridge, MA 02139, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Abhishek","family":"Mohan","sequence":"additional","affiliation":[{"name":"CSAIL, MIT, 32 Vassar St, Cambridge, MA 02139, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Seok","family":"Kim","sequence":"additional","affiliation":[{"name":"CSAIL, MIT, 32 Vassar St, Cambridge, MA 02139, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jung Soo Victor","family":"Chu","sequence":"additional","affiliation":[{"name":"CSAIL, MIT, 32 Vassar St, Cambridge, MA 02139, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Samuel","family":"Lee","sequence":"additional","affiliation":[{"name":"CSAIL, MIT, 32 Vassar St, Cambridge, MA 02139, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Prabhakar","family":"Kafle","sequence":"additional","affiliation":[{"name":"CSAIL, MIT, 32 Vassar St, Cambridge, MA 02139, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Patrick","family":"Wang","sequence":"additional","affiliation":[{"name":"CSAIL, MIT, 32 Vassar St, Cambridge, MA 02139, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"219","published-online":{"date-parts":[[2023,7,17]]},"reference":[{"key":"S0218001423540137BIB005","doi-asserted-by":"crossref","first-page":"651","DOI":"10.1007\/978-3-319-54472-4_61","volume-title":"ACIIDS 2017: Intelligent Information and Database Systems","author":"Chen S.-H.","year":"2017"},{"key":"S0218001423540137BIB006","first-page":"1","volume-title":"2018 Conf. Information and Communication Technology (CICT)","author":"Ghosh R.","year":"2018"},{"issue":"3","key":"S0218001423540137BIB009","doi-asserted-by":"crossref","first-page":"e13331","DOI":"10.2196\/13331","volume":"7","author":"Han J.","year":"2019","journal-title":"JMIR Med. Inf."},{"issue":"1","key":"S0218001423540137BIB010","doi-asserted-by":"crossref","first-page":"861","DOI":"10.1007\/s42001-021-00149-1","volume":"5","author":"Hegghammer T.","year":"2021","journal-title":"J. Comput. Soc. Sci."},{"volume-title":"Towards Data Science","year":"2021","author":"Holomb V.","key":"S0218001423540137BIB011"},{"key":"S0218001423540137BIB012","first-page":"1516","volume-title":"2019 Int. Conf. Document Analysis and Recognition (ICDAR)","author":"Huang Z.","year":"2019"},{"key":"S0218001423540137BIB014","doi-asserted-by":"crossref","first-page":"4614","DOI":"10.1145\/3503161.3547751","volume-title":"Proc. 30th ACM Int. Conf. Multimedia, MM\u201922","author":"Li X.","year":"2022"},{"key":"S0218001423540137BIB016","first-page":"220","volume-title":"2013 12th Int. Conf. Document Analysis and Recognition","author":"Nafchi H. Z.","year":"2013"},{"issue":"6","key":"S0218001423540137BIB017","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3453476","volume":"54","author":"Nguyen T. T. H.","year":"2021","journal-title":"ACM Comput. Surv."},{"key":"S0218001423540137BIB019","doi-asserted-by":"crossref","first-page":"150","DOI":"10.21786\/bbrc\/13.13\/21","volume":"13","author":"Priya K.","year":"2020","journal-title":"Biosci. Biotechnol. Res. Commun."},{"key":"S0218001423540137BIB022","doi-asserted-by":"crossref","first-page":"67","DOI":"10.1007\/s10032-002-0082-8","volume":"5","author":"Schulz K.","year":"2001","journal-title":"Int. J. Doc. Anal. Recogn."},{"issue":"1","key":"S0218001423540137BIB023","doi-asserted-by":"crossref","first-page":"64","DOI":"10.1080\/17439884.2014.921628","volume":"40","author":"Selwyn N.","year":"2015","journal-title":"Learn., Media Technol."},{"key":"S0218001423540137BIB024","doi-asserted-by":"crossref","DOI":"10.1007\/978-1-4842-6222-1","volume-title":"Practical Machine Learning with AWS","author":"Singh H.","year":"2021"},{"issue":"3","key":"S0218001423540137BIB025","doi-asserted-by":"crossref","first-page":"331","DOI":"10.1093\/jamia\/ocx132","volume":"25","author":"Soysal E.","year":"2017","journal-title":"J. Am. Med. Informatics Assoc."},{"key":"S0218001423540137BIB028","first-page":"1192","volume-title":"Proc. 26th ACM SIGKDD Int. Conf. Knowledge Discovery & Data Mining, KDD\u201920","author":"Xu Y.","year":"2020"}],"container-title":["International Journal of Pattern Recognition and Artificial Intelligence"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.worldscientific.com\/doi\/pdf\/10.1142\/S0218001423540137","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,8,16]],"date-time":"2023-08-16T04:27:21Z","timestamp":1692160041000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.worldscientific.com\/doi\/10.1142\/S0218001423540137"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,7]]},"references-count":15,"journal-issue":{"issue":"09","published-print":{"date-parts":[[2023,7]]}},"alternative-id":["10.1142\/S0218001423540137"],"URL":"https:\/\/doi.org\/10.1142\/s0218001423540137","relation":{},"ISSN":["0218-0014","1793-6381"],"issn-type":[{"type":"print","value":"0218-0014"},{"type":"electronic","value":"1793-6381"}],"subject":[],"published":{"date-parts":[[2023,7]]},"article-number":"2354013"}}