{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,30]],"date-time":"2026-04-30T23:16:53Z","timestamp":1777591013980,"version":"3.51.4"},"reference-count":90,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2023,1,16]],"date-time":"2023-01-16T00:00:00Z","timestamp":1673827200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"U.S. Dept of Energy through a subcontract from Oak Ridge National Laboratory","award":["4000175929"],"award-info":[{"award-number":["4000175929"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Manage. Inf. Syst."],"published-print":{"date-parts":[[2023,3,31]]},"abstract":"<jats:p>There is an urgent need in many critical infrastructure sectors, including the energy sector, for attaining detailed insights into cybersecurity features and compliance with cybersecurity requirements related to their Operational Technology (OT) deployments. Frequent feature changes of OT devices interfere with this need, posing a great risk to customers. One effective way to address this challenge is via a semi-automated cyber-physical security assurance approach, which enables verification and validation of the OT device cybersecurity claims against actual capabilities, both pre- and post-deployment. To realize this approach, this article presents new methodology and algorithms to automatically identify cybersecurity-related claims expressed in natural language form in ICS device documents. We developed an identification process that employs natural language processing (NLP) techniques with the goal of semi-automated vetting of detected claims against their device implementation. We also present our novel NLP components for verifying feature claims against relevant cybersecurity requirements. 
The verification pipeline includes components such as automated vendor identification, device document curation, feature claim identification utilizing sentiment analysis for conflict resolution, and reporting of features that are claimed to be supported or indicated as unsupported. Our novel matching engine represents the first automated information system available in the cybersecurity domain that directly aids the generation of ICS compliance reports.<\/jats:p>","DOI":"10.1145\/3546580","type":"journal-article","created":{"date-parts":[[2022,7,14]],"date-time":"2022-07-14T11:16:01Z","timestamp":1657797361000},"page":"1-35","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":11,"title":["Design of a Novel Information System for Semi-automated Management of Cybersecurity in Industrial Control Systems"],"prefix":"10.1145","volume":"14","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2879-1871","authenticated-orcid":false,"given":"Kimia","family":"Ameri","sequence":"first","affiliation":[{"name":"Department of Electrical & Computer Engineering, University of Nebraska-Lincoln, Omaha NE, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7091-8349","authenticated-orcid":false,"given":"Michael","family":"Hempel","sequence":"additional","affiliation":[{"name":"Department of Electrical & Computer Engineering, University of Nebraska-Lincoln, Omaha NE, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6229-2043","authenticated-orcid":false,"given":"Hamid","family":"Sharif","sequence":"additional","affiliation":[{"name":"Department of Electrical & Computer Engineering, University of Nebraska-Lincoln, Omaha NE, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5083-8627","authenticated-orcid":false,"suffix":"Jr.","given":"Juan","family":"Lopez","sequence":"additional","affiliation":[{"name":"Oak Ridge National Laboratory, Oak Ridge TN, 
USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7458-0832","authenticated-orcid":false,"given":"Kalyan","family":"Perumalla","sequence":"additional","affiliation":[{"name":"Oak Ridge National Laboratory, Oak Ridge TN, USA"}]}],"member":"320","published-online":{"date-parts":[[2023,1,16]]},"reference":[{"issue":"4","key":"e_1_3_1_2_2","doi-asserted-by":"crossref","first-page":"615","DOI":"10.3390\/jcp1040031","article-title":"CyBERT: Cybersecurity claim classification by fine-tuning the BERT language model","volume":"1","author":"Ameri Kimia","year":"2021","unstructured":"Kimia Ameri, Michael Hempel, Hamid Sharif, Juan Lopez Jr., and Kalyan Perumalla. 2021. CyBERT: Cybersecurity claim classification by fine-tuning the BERT language model. J. Cybersecur. Privacy 1, 4 (2021), 615\u2013637.","journal-title":"J. Cybersecur. Privacy"},{"key":"e_1_3_1_3_2","article-title":"Smart semi-supervised accumulation of large repositories for industrial control systems device information","author":"Ameri Kimia","year":"2021","unstructured":"Kimia Ameri, Michael Hempel, Hamid Sharif, Juan Lopez Jr., and Kalyan Perumalla. 2021. Smart semi-supervised accumulation of large repositories for industrial control systems device information. In Proceedings of the 16th International Conference on Cyber Warfare and Security. Academic Conferences Limited, 1.","journal-title":"Proceedings of the 16th International Conference on Cyber Warfare and Security"},{"issue":"4","key":"e_1_3_1_4_2","first-page":"685","article-title":"Gather-narrow-extract: A framework for studying local policy variation using web-scraping and natural language processing","volume":"12","author":"Anglin Kylie L.","year":"2019","unstructured":"Kylie L. Anglin. 2019. Gather-narrow-extract: A framework for studying local policy variation using web-scraping and natural language processing. J. Res. Edu. Effect. 12, 4 (2019), 685\u2013706.","journal-title":"J. Res. Edu. Effect."},{"key":"e_1_3_1_5_2","unstructured":"Dogu Araci. 
2019. Finbert: Financial sentiment analysis with pre-trained language models. Retrieved from https:\/\/arXiv:1908.10063."},{"key":"e_1_3_1_6_2","first-page":"1","volume-title":"Proceedings of the Digital Image Computing: Techniques and Applications (DICTA\u201918)","author":"Arif Saman","year":"2018","unstructured":"Saman Arif and Faisal Shafait. 2018. Table detection in document images using foreground and background features. In Proceedings of the Digital Image Computing: Techniques and Applications (DICTA\u201918). IEEE, 1\u20138."},{"key":"e_1_3_1_7_2","unstructured":"Manuel Aristar\u00e1n and Mike Tigas. 2013. Introducing Tabula - Features - Source: An OpenNews project. Retrieved December 2 2014 from https:\/\/source.opennews.org\/en-US\/articles\/introducing-tabula\/."},{"key":"e_1_3_1_8_2","first-page":"74","volume-title":"Proceedings of the 1st Workshop on Financial Technology and Natural Language Processing","author":"Azzi Abderrahim Ait","year":"2019","unstructured":"Abderrahim Ait Azzi, Houda Bouamor, and Sira Ferradans. 2019. The finsbd-2019 shared task: Sentence boundary detection in pdf noisy text in the financial domain. In Proceedings of the 1st Workshop on Financial Technology and Natural Language Processing. 74\u201380."},{"key":"e_1_3_1_9_2","volume-title":"Adobe Acrobat 6: The Professional User\u2019s Guide","author":"Baker Donna L.","year":"2008","unstructured":"Donna L. Baker and Tom Carson. 2008. Adobe Acrobat 6: The Professional User\u2019s Guide. Apress."},{"key":"e_1_3_1_10_2","doi-asserted-by":"crossref","unstructured":"Iz Beltagy Kyle Lo and Arman Cohan. 2019. SciBERT: A pretrained language model for scientific text. 
Retrieved from https:\/\/arXiv:1903.10676.","DOI":"10.18653\/v1\/D19-1371"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.5555\/944919.944937"},{"key":"e_1_3_1_12_2","doi-asserted-by":"crossref","first-page":"199","DOI":"10.1007\/978-3-030-57058-3_15","volume-title":"Proceedings of the International Workshop on Document Analysis Systems","author":"Casado-Garc\u00eda \u00c1ngela","year":"2020","unstructured":"\u00c1ngela Casado-Garc\u00eda, C\u00e9sar Dom\u00ednguez, J\u00f3nathan Heras, Eloy Mata, and Vico Pascual. 2020. The benefits of close-domain fine-tuning for table detection in document images. In Proceedings of the International Workshop on Document Analysis Systems. Springer, 199\u2013215."},{"key":"e_1_3_1_13_2","volume-title":"Python API Development Fundamentals: Develop a full-stack Web Application with Python and Flask","author":"Chan Jack","year":"2019","unstructured":"Jack Chan, Ray Chung, and Jack Huang. 2019. Python API Development Fundamentals: Develop a full-stack Web Application with Python and Flask. Packt Publishing."},{"key":"e_1_3_1_14_2","doi-asserted-by":"crossref","first-page":"853","DOI":"10.1007\/978-981-15-0947-6_81","volume-title":"Embedded Systems and Artificial Intelligence","author":"Chandrika G. Naga","year":"2020","unstructured":"G. Naga Chandrika, Somula Ramasubbareddy, K. Govinda, and E. Swetha. 2020. Web scraping for unstructured data over web. In Embedded Systems and Artificial Intelligence. Springer, 853\u2013859."},{"key":"e_1_3_1_15_2","first-page":"236","volume-title":"Proceedings of the 4th International Conference on Software Engineering and Information Management","author":"Chen YuXuan","year":"2021","unstructured":"YuXuan Chen, Jianwei Ding, Dashuang Li, and Zhouguo Chen. 2021. Joint BERT model based cybersecurity named entity recognition. In Proceedings of the 4th International Conference on Software Engineering and Information Management. 
236\u2013242."},{"key":"e_1_3_1_16_2","first-page":"1130","volume-title":"Proceedings of the 43rd International Convention on Information, Communication and Electronic Technology (MIPRO\u201920)","author":"Cherepanov Igor","year":"2020","unstructured":"Igor Cherepanov, Andrey Mikhailov, Alexey Shigarov, and Viacheslav Paramonov. 2020. On automated workflow for fine-tuning deep neural network models for table detection in document images. In Proceedings of the 43rd International Convention on Information, Communication and Electronic Technology (MIPRO\u201920). IEEE, 1130\u20131133."},{"key":"e_1_3_1_17_2","doi-asserted-by":"crossref","first-page":"181","DOI":"10.1145\/3311790.3396647","volume-title":"Practice and Experience in Advanced Research Computing","author":"Cleveland Sean B.","year":"2020","unstructured":"Sean B. Cleveland, Anagha Jamthe, Smruti Padhy, Joe Stubbs, Michale Packard, Julia Looney, Steve Terry, Richard Cardone, Maytal Dahan, and Gwen A. Jacobs. 2020. Tapis API development with Python: Best practices in scientific REST API implementation: experience implementing a distributed stream API. In Practice and Experience in Advanced Research Computing. 181\u2013187."},{"key":"e_1_3_1_18_2","doi-asserted-by":"crossref","first-page":"54","DOI":"10.1145\/3085228.3085278","volume-title":"Proceedings of the 18th Annual International Conference on Digital Government Research","author":"Corr\u00eaa Andreiwid Sheffer","year":"2017","unstructured":"Andreiwid Sheffer Corr\u00eaa and P\u00e4r-Ola Zander. 2017. Unleashing tabular content to open data: A survey on PDF table extraction methods and tools. In Proceedings of the 18th Annual International Conference on Digital Government Research. 
54\u201363."},{"key":"e_1_3_1_19_2","doi-asserted-by":"crossref","first-page":"129","DOI":"10.1007\/11669487_12","volume-title":"Proceedings of the International Workshop on Document Analysis Systems","author":"D\u00e9jean Herv\u00e9","year":"2006","unstructured":"Herv\u00e9 D\u00e9jean and Jean-Luc Meunier. 2006. A system for converting PDF documents into structured XML format. In Proceedings of the International Workshop on Document Analysis Systems. Springer, 129\u2013140."},{"key":"e_1_3_1_20_2","unstructured":"Jacob Devlin Ming-Wei Chang Kenton Lee and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. Retrieved from https:\/\/arXiv:1810.04805."},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.5555\/932295"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1186\/s42400-021-00072-y"},{"key":"e_1_3_1_23_2","first-page":"1510","volume-title":"Proceedings of the International Conference on Document Analysis and Recognition (ICDAR\u201919)","author":"Gao Liangcai","year":"2019","unstructured":"Liangcai Gao, Yilun Huang, Herv\u00e9 D\u00e9jean, Jean-Luc Meunier, Qinqin Yan, Yu Fang, Florian Kleber, and Eva Lang. 2019. Icdar 2019 competition on table detection and recognition (ctdar). In Proceedings of the International Conference on Document Analysis and Recognition (ICDAR\u201919). IEEE, 1510\u20131515."},{"key":"e_1_3_1_24_2","first-page":"553","volume-title":"Proceedings of the 14th IAPR International Conference on Document Analysis and Recognition (ICDAR\u201917)","volume":"1","author":"Gao Liangcai","year":"2017","unstructured":"Liangcai Gao, Xiaohan Yi, Yuan Liao, Zhuoren Jiang, Zuoyu Yan, and Zhi Tang. 2017. A deep learning-based formula detection method for PDF documents. In Proceedings of the 14th IAPR International Conference on Document Analysis and Recognition (ICDAR\u201917), Vol. 1. 
IEEE, 553\u2013558."},{"key":"e_1_3_1_25_2","doi-asserted-by":"crossref","first-page":"431","DOI":"10.1007\/978-3-030-44041-1_39","volume-title":"Proceedings of the International Conference on Advanced Information Networking and Applications","author":"Ghasiya Piyush","year":"2020","unstructured":"Piyush Ghasiya and Koji Okamura. 2020. Comparative analysis of Japan and the US cybersecurity related newspaper articles: A content and sentiment analysis approach. In Proceedings of the International Conference on Advanced Information Networking and Applications. Springer, 431\u2013443."},{"key":"e_1_3_1_26_2","first-page":"771","volume-title":"Proceedings of the 14th IAPR International Conference on Document Analysis and Recognition (ICDAR\u201917)","volume":"1","author":"Gilani Azka","year":"2017","unstructured":"Azka Gilani, Shah Rukh Qasim, Imran Malik, and Faisal Shafait. 2017. Table detection using deep learning. In Proceedings of the 14th IAPR International Conference on Document Analysis and Recognition (ICDAR\u201917), Vol. 1. IEEE, 771\u2013776."},{"key":"e_1_3_1_27_2","first-page":"45","volume-title":"Proceedings of the ACM Symposium on Document Engineering","author":"G\u00f6bel Max","year":"2012","unstructured":"Max G\u00f6bel, Tamir Hassan, Ermelinda Oro, and Giorgio Orsi. 2012. A methodology for evaluating algorithms for table understanding in PDF documents. In Proceedings of the ACM Symposium on Document Engineering. 45\u201348."},{"key":"e_1_3_1_28_2","first-page":"1449","volume-title":"Proceedings of the 12th International Conference on Document Analysis and Recognition","author":"G\u00f6bel Max","year":"2013","unstructured":"Max G\u00f6bel, Tamir Hassan, Ermelinda Oro, and Giorgio Orsi. 2013. ICDAR 2013 table competition. In Proceedings of the 12th International Conference on Document Analysis and Recognition. 
IEEE, 1449\u20131453."},{"key":"e_1_3_1_29_2","article-title":"Twitter sentiment analysis: An examination of cybersecurity attitudes and behavior","volume":"3","author":"Gupta Babita","year":"2016","unstructured":"Babita Gupta, Shwadhin Sharma, and Anitha Chennamaneni. 2016. Twitter sentiment analysis: An examination of cybersecurity attitudes and behavior. Proceedings of the Pre-ICIS SIGDSA\/IFIP WG8 3 (2016).","journal-title":"Proceedings of the Pre-ICIS SIGDSA\/IFIP WG8"},{"key":"e_1_3_1_30_2","first-page":"175","volume-title":"Proceedings of the International Conference on Informatics and Computing (ICIC\u201916)","author":"Haekal Muhamad","year":"2016","unstructured":"Muhamad Haekal et\u00a0al. 2016. Token-based authentication using JSON web token on SIKASIR RESTful web service. In Proceedings of the International Conference on Informatics and Computing (ICIC\u201916). IEEE, 175\u2013179."},{"key":"e_1_3_1_31_2","first-page":"287","volume-title":"Proceedings of the 12th IAPR Workshop on Document Analysis Systems (DAS\u201916)","author":"Hao Leipeng","year":"2016","unstructured":"Leipeng Hao, Liangcai Gao, Xiaohan Yi, and Zhi Tang. 2016. A table detection method for PDF documents based on convolutional neural networks. In Proceedings of the 12th IAPR Workshop on Document Analysis Systems (DAS\u201916). IEEE, 287\u2013292."},{"key":"e_1_3_1_32_2","unstructured":"S. Hoffstaetter M. Lee J. Bochi and L. Kistner. Python-docx. A Python library for creating and updating Microsoft Word (.docx) files\u2014python-docx 0.8.5 documentation. Retrieved from https:\/\/python-docx.readthedocs.io\/en\/latest\/."},{"key":"e_1_3_1_33_2","unstructured":"S. Hoffstaetter M. Lee J. Bochi and L. Kistner. 2014. pytesseract. https:\/\/pypi.org\/project\/pytesseract\/. (2014). https:\/\/pypi.org\/project\/pytesseract\/."},{"key":"e_1_3_1_34_2","doi-asserted-by":"crossref","unstructured":"Jeremy Howard and Sebastian Ruder. 2018. 
Universal language model fine-tuning for text classification. Retrieved from https:\/\/arXiv:1801.06146.","DOI":"10.18653\/v1\/P18-1031"},{"key":"e_1_3_1_35_2","first-page":"813","volume-title":"Proceedings of the International Conference on Document Analysis and Recognition (ICDAR\u201919)","author":"Huang Yilun","year":"2019","unstructured":"Yilun Huang, Qinqin Yan, Yibo Li, Yifan Chen, Xiong Wang, Liangcai Gao, and Zhi Tang. 2019. A YOLO-based table detection method. In Proceedings of the International Conference on Document Analysis and Recognition (ICDAR\u201919). IEEE, 813\u2013818."},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1609\/icwsm.v8i1.14550"},{"key":"e_1_3_1_37_2","doi-asserted-by":"crossref","unstructured":"M. Jones J. Bradley and N. Sakimura. May 2015. RFC 7519: JSON web token (JWT). Tech. Rep. Internet Engineering Task Force. Retrieved from http:\/\/www.ietf.org\/rfc\/rfc7519.txt.","DOI":"10.17487\/RFC7519"},{"key":"e_1_3_1_38_2","unstructured":"M. Jones J. Padilla and J. Lindsay. pyJWT: A Python implementation of RFC 7519. Retrieved from https:\/\/pypi.python.org\/pypi\/pyjwt."},{"key":"e_1_3_1_39_2","unstructured":"M. Jones J. Padilla and J. Lindsay. 2016. PyMuPDF. Retrieved from https:\/\/github.com\/pymupdf\/PyMuPDF."},{"issue":"19","key":"e_1_3_1_40_2","doi-asserted-by":"crossref","first-page":"4062","DOI":"10.3390\/app9194062","article-title":"exbake: Automatic fake news detection model based on bidirectional encoder representations from transformers (bert)","volume":"9","author":"Jwa Heejung","year":"2019","unstructured":"Heejung Jwa, Dongsuk Oh, Kinam Park, Jang Mook Kang, and Heuiseok Lim. 2019. exbake: Automatic fake news detection model based on bidirectional encoder representations from transformers (bert). Appl. Sci. 9, 19 (2019), 4062.","journal-title":"Appl. 
Sci."},{"key":"e_1_3_1_41_2","article-title":"Expanding the orthologous matrix (OMA) programmatic interfaces: REST API and the OmaDB packages for R and Python","volume":"8","author":"Kaleb Klara","year":"2019","unstructured":"Klara Kaleb, Alex Warwick Vesztrocy, Adrian Altenhoff, and Christophe Dessimoz. 2019. Expanding the orthologous matrix (OMA) programmatic interfaces: REST API and the OmaDB packages for R and Python. F1000Res. 8 (2019).","journal-title":"F1000Res."},{"key":"e_1_3_1_42_2","first-page":"1366","volume-title":"Proceedings of the International Conference on Document Analysis and Recognition (ICDAR\u201919)","author":"Khan Saqib Ali","year":"2019","unstructured":"Saqib Ali Khan, Syed Muhammad Daniyal Khalid, Muhammad Ali Shahzad, and Faisal Shafait. 2019. Table structure extraction with bi-directional gated recurrent unit networks. In Proceedings of the International Conference on Document Analysis and Recognition (ICDAR\u201919). IEEE, 1366\u20131371."},{"key":"e_1_3_1_43_2","doi-asserted-by":"crossref","unstructured":"Vivek Khetan Roshni Ramnani Mayuresh Anand Shubhashis Sengupta and Andrew E. Fano. 2020. Causal BERT: Language models for causality detection between events expressed in text. Retrieved from https:\/\/arXiv:2012.05453.","DOI":"10.1007\/978-3-030-80119-9_64"},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.3390\/app9112347"},{"issue":"18","key":"e_1_3_1_45_2","article-title":"Web mining of firm websites: A framework for web scraping and a pilot study for Germany","author":"Kinne Jan","year":"2018","unstructured":"Jan Kinne and Janna Axenbeck. 2018. Web mining of firm websites: A framework for web scraping and a pilot study for Germany. 
ZEW-Centre for European Economic Research Discussion Paper 18-033 (2018).","journal-title":"ZEW-Centre for European Economic Research Discussion Paper"},{"key":"e_1_3_1_46_2","volume-title":"Cybersecurity\u2014What\u2019s Language got to do with it?","author":"Klavans Judith L.","year":"2015","unstructured":"Judith L. Klavans. 2015. Cybersecurity\u2014What\u2019s Language got to do with it? Technical Report."},{"issue":"1","key":"e_1_3_1_47_2","doi-asserted-by":"crossref","first-page":"169","DOI":"10.2308\/jeta-52063","article-title":"Research note: Scraping financial data from the web using the R language","volume":"15","author":"Krotov Vlad","year":"2018","unstructured":"Vlad Krotov and Matthew Tennyson. 2018. Research note: Scraping financial data from the web using the R language. J. Emerg. Technol. Account. 15, 1 (2018), 169\u2013181.","journal-title":"J. Emerg. Technol. Account."},{"key":"e_1_3_1_48_2","first-page":"1","article-title":"Deep-learning and graph-based approach to table structure recognition","author":"Lee Eunji","year":"2021","unstructured":"Eunji Lee, Jaewoo Park, Hyung Il Koo, and Nam Ik Cho. 2021. Deep-learning and graph-based approach to table structure recognition. Multimedia Tools Appl. (2021), 1\u201322.","journal-title":"Multimedia Tools Appl."},{"issue":"4","key":"e_1_3_1_49_2","doi-asserted-by":"crossref","first-page":"1234","DOI":"10.1093\/bioinformatics\/btz682","article-title":"BioBERT: A pre-trained biomedical language representation model for biomedical text mining","volume":"36","author":"Lee Jinhyuk","year":"2020","unstructured":"Jinhyuk Lee, Wonjin Yoon, Sungdong Kim, Donghyeon Kim, Sunkyu Kim, Chan Ho So, and Jaewoo Kang. 2020. BioBERT: A pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 36, 4 (2020), 1234\u20131240.","journal-title":"Bioinformatics"},{"key":"e_1_3_1_50_2","unstructured":"Jieh-Sheng Lee and Jieh Hsiang. 2019. 
Patentbert: Patent classification with fine-tuning a pre-trained bert model. Retrieved from https:\/\/arXiv:1906.02124."},{"key":"e_1_3_1_51_2","first-page":"231","volume-title":"Proceedings of the International Workshop on Document Analysis Systems","author":"Li Xiao-Hui","year":"2020","unstructured":"Xiao-Hui Li, Fei Yin, and Cheng-Lin Liu. 2020. Page segmentation using convolutional neural network and graphical model. In Proceedings of the International Workshop on Document Analysis Systems. Springer, 231\u2013245."},{"key":"e_1_3_1_52_2","doi-asserted-by":"publisher","DOI":"10.2200\/S00416ED1V01Y201204HLT016"},{"issue":"2010","key":"e_1_3_1_53_2","first-page":"627","article-title":"Sentiment analysis and subjectivity.","volume":"2","author":"Liu Bing","year":"2010","unstructured":"Bing Liu et\u00a0al. 2010. Sentiment analysis and subjectivity. Handbook Natur. Lang. Process. 2, 2010 (2010), 627\u2013666.","journal-title":"Handbook Natur. Lang. Process."},{"key":"e_1_3_1_54_2","first-page":"172","volume-title":"Proceedings of the International Conference on Knowledge Science, Engineering and Management","author":"Liu Chao","year":"2019","unstructured":"Chao Liu, Xinghua Wu, Min Yu, Gang Li, Jianguo Jiang, Weiqing Huang, and Xiang Lu. 2019. A two-stage model based on BERT for short fake news detection. In Proceedings of the International Conference on Knowledge Science, Engineering and Management. Springer, 172\u2013183."},{"key":"e_1_3_1_55_2","doi-asserted-by":"crossref","unstructured":"Edward Loper and Steven Bird. 2002. Nltk: The natural language toolkit. Retrieved from https:\/\/cs\/0205028.","DOI":"10.3115\/1118108.1118117"},{"key":"e_1_3_1_56_2","first-page":"1","article-title":"Algorithmic thinking in the public interest: Navigating technical, legal, and ethical hurdles to web scraping in the social sciences","author":"Luscombe Alex","year":"2021","unstructured":"Alex Luscombe, Kevin Dick, and Kevin Walby. 2021. 
Algorithmic thinking in the public interest: Navigating technical, legal, and ethical hurdles to web scraping in the social sciences. Qual. Quant. (2021), 1\u201322.","journal-title":"Qual. Quant."},{"key":"e_1_3_1_57_2","first-page":"88","volume-title":"Proceedings of the Extended Semantic Web Conference","author":"Maynard Diana","year":"2011","unstructured":"Diana Maynard and Adam Funk. 2011. Automatic detection of political opinions in tweets. In Proceedings of the Extended Semantic Web Conference. Springer, 88\u201399."},{"key":"e_1_3_1_58_2","unstructured":"Vinayak Mehta. 2018. Camelot. https:\/\/pypi.org\/project\/camelot-py\/. Retrieved from https:\/\/pypi.org\/project\/camelot-py\/."},{"key":"e_1_3_1_59_2","doi-asserted-by":"publisher","DOI":"10.1145\/1557019.1557156"},{"key":"e_1_3_1_60_2","first-page":"23","volume-title":"Proceedings of the International Conference on Computer Networks, Big Data, and IoT","author":"Modi Sangita S.","year":"2018","unstructured":"Sangita S. Modi and Sudhir B. Jagtap. 2018. Multimodal web content mining to filter non-learning sites using NLP. In Proceedings of the International Conference on Computer Networks, Big Data, and IoT. Springer, 23\u201330."},{"key":"e_1_3_1_61_2","first-page":"271","volume-title":"Proceedings of the International Conference on Information Communication and Embedded Systems (ICICES\u201913)","author":"Mouthami K.","year":"2013","unstructured":"K. Mouthami, K. Nirmala Devi, and V. Murali Bhaskaran. 2013. Sentiment analysis and classification based on textual reviews. In Proceedings of the International Conference on Information Communication and Embedded Systems (ICICES\u201913). IEEE, 271\u2013276."},{"key":"e_1_3_1_62_2","volume-title":"Algorithmic Extraction of Data in Tables in PDF Documents","author":"Nurminen Anssi","year":"2013","unstructured":"Anssi Nurminen. 2013. Algorithmic Extraction of Data in Tables in PDF Documents. 
Master\u2019s thesis."},{"key":"e_1_3_1_63_2","first-page":"1","volume-title":"Proceedings of the IEEE Kansas Power and Energy Conference (KPEC\u201920)","author":"Perumalla Kalyan","year":"2020","unstructured":"Kalyan Perumalla, Juan Lopez, Maksudul Alam, Olivera Kotevska, Michael Hempel, and Hamid Sharif. 2020. A novel vetting approach to cybersecurity verification in energy grid systems. In Proceedings of the IEEE Kansas Power and Energy Conference (KPEC\u201920). IEEE, 1\u20136."},{"key":"e_1_3_1_64_2","doi-asserted-by":"crossref","unstructured":"Matthew E. Peters Mark Neumann Mohit Iyyer Matt Gardner Christopher Clark Kenton Lee and Luke Zettlemoyer. 2018. Deep contextualized word representations. Retrieved from https:\/\/arXiv:1802.05365.","DOI":"10.18653\/v1\/N18-1202"},{"key":"e_1_3_1_65_2","doi-asserted-by":"crossref","unstructured":"Qiao Qian Minlie Huang Jinhao Lei and Xiaoyan Zhu. 2016. Linguistically regularized lstms for sentiment classification. Retrieved from https:\/\/arXiv:1611.03949.","DOI":"10.18653\/v1\/P17-1154"},{"issue":"8","key":"e_1_3_1_66_2","first-page":"9","article-title":"Language models are unsupervised multitask learners","volume":"1","author":"Radford Alec","year":"2019","unstructured":"Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et\u00a0al. 2019. Language models are unsupervised multitask learners. OpenAI Blog 1, 8 (2019), 9.","journal-title":"OpenAI Blog"},{"key":"e_1_3_1_67_2","first-page":"70","volume-title":"Proceedings of the European Conference on Computer Vision","author":"Raja Sachin","year":"2020","unstructured":"Sachin Raja, Ajoy Mondal, and C. V. Jawahar. 2020. Table structure recognition using top-down and bottom-up cues. In Proceedings of the European Conference on Computer Vision. 
Springer, 70\u201386."},{"issue":"1","key":"e_1_3_1_68_2","doi-asserted-by":"crossref","first-page":"111","DOI":"10.1093\/phe\/phaa006","article-title":"Scraping the web for public health gains: Ethical considerations from a \u201cbig data\u201d research project on HIV and incarceration","volume":"13","author":"Rennie Stuart","year":"2020","unstructured":"Stuart Rennie, Mara Buchbinder, Eric Juengst, Lauren Brinkley-Rubinstein, Colleen Blue, and David L. Rosen. 2020. Scraping the web for public health gains: Ethical considerations from a \u201cbig data\u201d research project on HIV and incarceration. Public Health Ethics 13, 1 (2020), 111\u2013121.","journal-title":"Public Health Ethics"},{"key":"e_1_3_1_69_2","unstructured":"Victor Sanh Lysandre Debut Julien Chaumond and Thomas Wolf. 2019. DistilBERT a distilled version of BERT: Smaller faster cheaper and lighter. Retrieved from https:\/\/arXiv:1910.01108."},{"key":"e_1_3_1_70_2","first-page":"964X","article-title":"Implementasi algoritme Blake2s pada JSON web token (JWT) sebagai algoritme hashing untuk Mekanisme Autentikasi Layanan REST-API","volume":"2548","author":"Satria Bagus","year":"2018","unstructured":"Bagus Satria, Ari Kusyanti, and Widhi Yahya. 2018. Implementasi algoritme Blake2s pada JSON web token (JWT) sebagai algoritme hashing untuk Mekanisme Autentikasi Layanan REST-API. Jurnal Pengembangan Teknologi Informasi dan Ilmu Komputer e-ISSN 2548 (2018), 964X.","journal-title":"Jurnal Pengembangan Teknologi Informasi dan Ilmu Komputer e-ISSN"},{"key":"e_1_3_1_71_2","first-page":"1162","volume-title":"Proceedings of the 14th IAPR International Conference on Document Analysis and Recognition (ICDAR\u201917)","volume":"1","author":"Schreiber Sebastian","year":"2017","unstructured":"Sebastian Schreiber, Stefan Agne, Ivo Wolf, Andreas Dengel, and Sheraz Ahmed. 2017. Deepdesrt: Deep learning for detection and structure recognition of tables in document images. 
In Proceedings of the 14th IAPR International Conference on Document Analysis and Recognition (ICDAR\u201917), Vol. 1. IEEE, 1162\u20131167."},{"key":"e_1_3_1_72_2","first-page":"119","volume-title":"Proceedings of the ACM Symposium on Document Engineering","author":"Shigarov Alexey","year":"2016","unstructured":"Alexey Shigarov, Andrey Mikhailov, and Andrey Altaev. 2016. Configurable table structure recognition in untagged PDF documents. In Proceedings of the ACM Symposium on Document Engineering. 119\u2013122."},{"issue":"9","key":"e_1_3_1_73_2","doi-asserted-by":"crossref","first-page":"1527","DOI":"10.3390\/electronics9091527","article-title":"A new text classification model based on contrastive word embedding for detecting cybersecurity intelligence in twitter","volume":"9","author":"Shin Han-Sub","year":"2020","unstructured":"Han-Sub Shin, Hyuk-Yoon Kwon, and Seung-Jin Ryu. 2020. A new text classification model based on contrastive word embedding for detecting cybersecurity intelligence in twitter. Electronics 9, 9 (2020), 1527.","journal-title":"Electronics"},{"key":"e_1_3_1_74_2","first-page":"377","volume-title":"Proceedings of the International Conference on Social Computing, Behavioral-Cultural Modeling and Prediction and Behavior Representation in Modeling and Simulation","author":"Shu Kai","year":"2018","unstructured":"Kai Shu, Amy Sliva, Justin Sampson, and Huan Liu. 2018. Understanding cyber attack behaviors with sentiment information on social media. In Proceedings of the International Conference on Social Computing, Behavioral-Cultural Modeling and Prediction and Behavior Representation in Modeling and Simulation. Springer, 377\u2013388."},{"key":"e_1_3_1_75_2","first-page":"1631","volume-title":"Proceedings of the Conference on Empirical Methods in Natural Language Processing","author":"Socher Richard","year":"2013","unstructured":"Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D. Manning, Andrew Y. Ng, and Christopher Potts. 
2013. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. 1631\u20131642."},{"key":"e_1_3_1_76_2","doi-asserted-by":"crossref","first-page":"194","DOI":"10.1007\/978-3-030-32381-3_16","volume-title":"Proceedings of the China National Conference on Chinese Computational Linguistics","author":"Sun Chi","year":"2019","unstructured":"Chi Sun, Xipeng Qiu, Yige Xu, and Xuanjing Huang. 2019. How to fine-tune bert for text classification? In Proceedings of the China National Conference on Chinese Computational Linguistics. Springer, 194\u2013206."},{"key":"e_1_3_1_77_2","first-page":"16","volume-title":"Proceedings of the International Conference on Applications of Natural Language to Information Systems","author":"Tikhomirov Mikhail","year":"2020","unstructured":"Mikhail Tikhomirov, N. Loukachevitch, Anastasiia Sirotina, and Boris Dobrov. 2020. Using bert and augmentation in named entity recognition for cybersecurity domain. In Proceedings of the International Conference on Applications of Natural Language to Information Systems. Springer, 16\u201324."},{"key":"e_1_3_1_78_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jvcir.2016.05.023"},{"key":"e_1_3_1_79_2","doi-asserted-by":"crossref","DOI":"10.1007\/978-1-4842-3582-9","volume-title":"Practical Web Scraping for Data Science: Best Practices and Examples with Python","author":"Broucke Seppe vanden","year":"2018","unstructured":"Seppe vanden Broucke and Bart Baesens. 2018. Practical Web Scraping for Data Science: Best Practices and Examples with Python. Apress."},{"key":"e_1_3_1_80_2","first-page":"1","volume-title":"Proceedings of International Conference on Intelligent Computing, Information and Control Systems","author":"Varela Noel","year":"2021","unstructured":"Noel Varela, Omar Bonerge Pineda Lezama, and Milvio Charris. 2021. Web scraping and Na\u00efve Bayes classification for political analysis. 
In Proceedings of International Conference on Intelligent Computing, Information and Control Systems. Springer, 1\u20138."},{"key":"e_1_3_1_81_2","first-page":"120","article-title":"Confusion matrix-based feature selection.","volume":"710","author":"Visa Sofia","year":"2011","unstructured":"Sofia Visa, Brian Ramsay, Anca L. Ralescu, and Esther Van Der Knaap. 2011. Confusion matrix-based feature selection. MAICS 710 (2011), 120\u2013127.","journal-title":"MAICS"},{"key":"e_1_3_1_82_2","first-page":"599","volume-title":"Proceedings of the IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA\u201920)","author":"Vogel Inna","year":"2020","unstructured":"Inna Vogel and Meghana Meghana. 2020. Detecting fake news spreaders on Twitter from a multilingual perspective. In Proceedings of the IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA\u201920). IEEE, 599\u2013606."},{"issue":"1","key":"e_1_3_1_83_2","first-page":"1","article-title":"PathMarker: Protecting web contents against inside crawlers","volume":"2","author":"Wan Shengye","year":"2019","unstructured":"Shengye Wan, Yue Li, and Kun Sun. 2019. PathMarker: Protecting web contents against inside crawlers. Cybersecurity 2, 1 (2019), 1\u201317.","journal-title":"Cybersecurity"},{"key":"e_1_3_1_84_2","first-page":"214","volume-title":"Proceedings of the 30th Conference on Computational Linguistics and Speech Processing (ROCLING\u201918)","author":"Wang Jenq-Haur","year":"2018","unstructured":"Jenq-Haur Wang, Ting-Wei Liu, Xiong Luo, and Long Wang. 2018. An LSTM approach to short text sentiment classification with word embeddings. In Proceedings of the 30th Conference on Computational Linguistics and Speech Processing (ROCLING\u201918). 
214\u2013223."},{"key":"e_1_3_1_85_2","doi-asserted-by":"publisher","DOI":"10.3115\/1220575.1220619"},{"key":"e_1_3_1_86_2","article-title":"The named entity recognition of Chinese cybersecurity using an active learning strategy","volume":"2021","author":"Xie Bo","year":"2021","unstructured":"Bo Xie, Guowei Shen, Chun Guo, and Yunhe Cui. 2021. The named entity recognition of Chinese cybersecurity using an active learning strategy. Wireless Commun. Mobile Comput 2021. (2021).","journal-title":"Wireless Commun. Mobile Comput"},{"key":"e_1_3_1_87_2","doi-asserted-by":"publisher","DOI":"10.1145\/3394486.3403172"},{"key":"e_1_3_1_88_2","first-page":"1773","volume-title":"Proceedings of the Indian International Conference on Artificial Intelligence (IICAI)","author":"Yildiz Burcu","year":"2005","unstructured":"Burcu Yildiz, Katharina Kaiser, and Silvia Miksch. 2005. pdf2table: A method to extract table information from PDF files. In Proceedings of the Indian International Conference on Artificial Intelligence (IICAI). Citeseer, 1773\u20131785."},{"key":"e_1_3_1_89_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2020.106529"},{"key":"e_1_3_1_90_2","first-page":"697","volume-title":"Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision","author":"Zheng Xinyi","year":"2021","unstructured":"Xinyi Zheng, Douglas Burdick, Lucian Popa, Xu Zhong, and Nancy Xin Ru Wang. 2021. Global table extractor (GTE): A framework for joint table identification and cell structure recognition using visual context. In Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision. 697\u2013706."},{"key":"e_1_3_1_91_2","first-page":"316","volume-title":"Proceedings of the IEEE 6th International Conference on Big Data Analytics (ICBDA\u201921)","author":"Zhou Shieheng","year":"2021","unstructured":"Shieheng Zhou, Jingju Liu, Xiaofeng Zhong, and Wendian Zhao. 2021. 
Named entity recognition using BERT with whole world masking in cybersecurity domain. In Proceedings of the IEEE 6th International Conference on Big Data Analytics (ICBDA\u201921). IEEE, 316\u2013320."}],"container-title":["ACM Transactions on Management Information Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3546580","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3546580","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T18:44:02Z","timestamp":1750272242000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3546580"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,1,16]]},"references-count":90,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2023,3,31]]}},"alternative-id":["10.1145\/3546580"],"URL":"https:\/\/doi.org\/10.1145\/3546580","relation":{},"ISSN":["2158-656X","2158-6578"],"issn-type":[{"value":"2158-656X","type":"print"},{"value":"2158-6578","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,1,16]]},"assertion":[{"value":"2021-10-26","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-06-24","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-01-16","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}