{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,30]],"date-time":"2025-07-30T14:12:48Z","timestamp":1753884768648,"version":"3.41.2"},"reference-count":3,"publisher":"World Scientific Pub Co Pte Ltd","issue":"12","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Int. J. Patt. Recogn. Artif. Intell."],"published-print":{"date-parts":[[2024,9,30]]},"abstract":"<jats:p> Extracting information from scanned business documents, while a necessary commercial task, continues to be mostly done manually, requiring significant human effort. Current solutions for automated document information extraction still have limited capabilities in regards to user-required customizability and extraction of dataset-specific information, leaving the area as a very active field of research. In this paper, we propose modifications and improvements to our previously developed custom pipeline for extracting and tabulating key-value pairs from commercial invoice documents. Our design changes and additions adapt the pipeline to a wider variety of document types and use cases, primarily through the implementation of dataset-specific configuration files that promote customizability along with new technical modules that address both general and dataset-specific complexities. We compare our pipeline\u2019s performance against current machine learning and commercial solutions on a real-world dataset, and demonstrate that it is able to extract a wider variety of fields while maintaining competitive or greater accuracies compared to the alternate solutions. <\/jats:p>","DOI":"10.1142\/s0218001424590122","type":"journal-article","created":{"date-parts":[[2024,7,5]],"date-time":"2024-07-05T05:55:08Z","timestamp":1720158908000},"source":"Crossref","is-referenced-by-count":0,"title":["Configurable Customized Information Extraction and Processing Pipeline"],"prefix":"10.1142","volume":"38","author":[{"given":"Seok","family":"Kim","sequence":"first","affiliation":[{"name":"Computer Science and Artificial Intelligence Laboratory (CSAIL), Massachusetts Institute of Technology (MIT), 32 Vassar Street, Cambridge, MA 02139, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Pierce","family":"Lai","sequence":"additional","affiliation":[{"name":"Computer Science and Artificial Intelligence Laboratory (CSAIL), Massachusetts Institute of Technology (MIT), 32 Vassar Street, Cambridge, MA 02139, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Dariyan","family":"Khan","sequence":"additional","affiliation":[{"name":"Computer Science and Artificial Intelligence Laboratory (CSAIL), Massachusetts Institute of Technology (MIT), 32 Vassar Street, Cambridge, MA 02139, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kevin","family":"Zhao","sequence":"additional","affiliation":[{"name":"Computer Science and Artificial Intelligence Laboratory (CSAIL), Massachusetts Institute of Technology (MIT), 32 Vassar Street, Cambridge, MA 02139, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Brian","family":"Le","sequence":"additional","affiliation":[{"name":"Computer Science and Artificial Intelligence Laboratory (CSAIL), Massachusetts Institute of Technology (MIT), 32 Vassar Street, Cambridge, MA 02139, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Alex","family":"Luchianov","sequence":"additional","affiliation":[{"name":"Computer Science and Artificial Intelligence Laboratory (CSAIL), Massachusetts Institute of Technology (MIT), 32 Vassar Street, Cambridge, MA 02139, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Margaret","family":"Yu","sequence":"additional","affiliation":[{"name":"Computer Science and Artificial Intelligence Laboratory (CSAIL), Massachusetts Institute of Technology (MIT), 32 Vassar Street, Cambridge, MA 02139, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Patrick","family":"Wang","sequence":"additional","affiliation":[{"name":"Computer Science and Artificial Intelligence Laboratory (CSAIL), Massachusetts Institute of Technology (MIT), 32 Vassar Street, Cambridge, MA 02139, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"219","published-online":{"date-parts":[[2024,8,22]]},"reference":[{"issue":"8","key":"S0218001424590122BIB013","first-page":"38","volume":"5","author":"Kunduru A. R.","year":"2023","journal-title":"Int. J. Orange Technol."},{"key":"S0218001424590122BIB014","doi-asserted-by":"publisher","DOI":"10.1142\/S0218001423540137"},{"key":"S0218001424590122BIB026","doi-asserted-by":"publisher","DOI":"10.1007\/978-981-99-4752-2_7"}],"container-title":["International Journal of Pattern Recognition and Artificial Intelligence"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.worldscientific.com\/doi\/pdf\/10.1142\/S0218001424590122","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,24]],"date-time":"2024-09-24T03:15:16Z","timestamp":1727147716000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.worldscientific.com\/doi\/10.1142\/S0218001424590122"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,8,22]]},"references-count":3,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2024,9,30]]}},"alternative-id":["10.1142\/S0218001424590122"],"URL":"https:\/\/doi.org\/10.1142\/s0218001424590122","relation":{},"ISSN":["0218-0014","1793-6381"],"issn-type":[{"type":"print","value":"0218-0014"},{"type":"electronic","value":"1793-6381"}],"subject":[],"published":{"date-parts":[[2024,8,22]]},"article-number":"2459012"}}