{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,12]],"date-time":"2025-11-12T14:01:14Z","timestamp":1762956074609,"version":"build-2065373602"},"reference-count":51,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2019,3,4]],"date-time":"2019-03-04T00:00:00Z","timestamp":1551657600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000185","name":"Defense Advanced Research Projects Agency","doi-asserted-by":"publisher","award":["FA8750-14-C-0240"],"award-info":[{"award-number":["FA8750-14-C-0240"]}],"id":[{"id":"10.13039\/100000185","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Future Internet"],"abstract":"<jats:p>With advances in machine learning, knowledge discovery systems have become very complicated to set up, requiring extensive tuning and programming effort. Democratizing such technology so that non-technical domain experts can avail themselves of these advances in an interactive and personalized way is an important problem. We describe myDIG, a highly modular, open source pipeline-construction system that is specifically geared towards investigative users (e.g., law enforcement) with no programming abilities. The myDIG system allows users both to build a knowledge graph of entities, relationships, and attributes for illicit domains from a raw HTML corpus and also to set up a personalized search interface for analyzing the structured knowledge. 
We use qualitative and quantitative data from five case studies involving investigative experts from illicit domains such as securities fraud and illegal firearms sales to illustrate the potential of myDIG.<\/jats:p>","DOI":"10.3390\/fi11030059","type":"journal-article","created":{"date-parts":[[2019,3,4]],"date-time":"2019-03-04T05:22:26Z","timestamp":1551676946000},"page":"59","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":12,"title":["myDIG: Personalized Illicit Domain-Specific Knowledge Discovery with No Programming"],"prefix":"10.3390","volume":"11","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5988-8305","authenticated-orcid":false,"given":"Mayank","family":"Kejriwal","sequence":"first","affiliation":[{"name":"Information Sciences Institute, University of Southern California, Marina del Rey, CA 90502, USA"}]},{"given":"Pedro","family":"Szekely","sequence":"additional","affiliation":[{"name":"Information Sciences Institute, University of Southern California, Marina del Rey, CA 90502, USA"}]}],"member":"1968","published-online":{"date-parts":[[2019,3,4]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"135","DOI":"10.1016\/j.accinf.2008.03.001","article-title":"Measuring the effects of business intelligence systems: The relationship between business process and organizational performance","volume":"9","author":"Elbashir","year":"2008","journal-title":"Int. J. Account. Inf. Syst."},{"key":"ref_2","first-page":"135","article-title":"Approach to building and implementing business intelligence systems","volume":"2","author":"Olszak","year":"2007","journal-title":"Interdiscip. J. Inf. Knowl. Manag."},{"key":"ref_3","unstructured":"Veak, T.J. (2012). Democratizing Technology: Andrew Feenberg\u2019s Critical Theory Of Technology, Suny Press."},{"key":"ref_4","unstructured":"Tanenbaum, J.G., Williams, A.M., Desjardins, A., and Tanenbaum, K. (May, January 27). 
Democratizing technology: Pleasure, utility and expressiveness in DIY and maker practice. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Paris, France."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Lakkaraju, K., Yurcik, W., and Lee, A.J. (2004, January 29). NVisionIP: Netflow visualizations of system state for security situational awareness. Proceedings of the 2004 ACM Workshop on Visualization and Data Mining for Computer Security, Washington, DC, USA.","DOI":"10.1145\/1029208.1029219"},{"key":"ref_6","first-page":"773","article-title":"An Investment Masquerade: A Descriptive Overview of penny stock fraud and the federal securities laws","volume":"47","author":"Goldstein","year":"1992","journal-title":"Bus. Lawyer"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Lavrenko, V., Allan, J., DeGuzman, E., LaFlamme, D., Pollard, V., and Thomas, S. (2002, January 24\u201327). Relevance models for topic detection and tracking. Proceedings of the Second International Conference on Human Language Technology Research, San Diego, CA, USA.","DOI":"10.3115\/1289189.1289268"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"42","DOI":"10.4018\/jswis.2012070103","article-title":"Elementary: Large-scale knowledge-base construction via machine learning and statistical inference","volume":"8","author":"Niu","year":"2012","journal-title":"Int. J. Semant. Web Inf. Syst. (IJSWIS)"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"69","DOI":"10.1016\/S0004-3702(00)00004-7","article-title":"Learning to construct knowledge bases from the World Wide Web","volume":"118","author":"Craven","year":"2000","journal-title":"Artif. Intell."},{"key":"ref_10","unstructured":"Singhal, A. (2019, February 27). Introducing the Knowledge Graph: Things, Not Strings. 
Available online: https:\/\/www.blog.google\/products\/search\/introducing-knowledge-graph-things-not\/."},{"key":"ref_11","unstructured":"World Wide Web Consortium (2019, February 27). RDF 1.1 Concepts and Abstract Syntax 2014. Available online: https:\/\/www.w3.org\/TR\/rdf11-concepts\/."},{"key":"ref_12","first-page":"28","article-title":"The semantic web","volume":"284","author":"Hendler","year":"2001","journal-title":"Sci. Am."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"261","DOI":"10.1561\/1900000003","article-title":"Information extraction","volume":"1","author":"Sarawagi","year":"2008","journal-title":"Found. Trends\u00ae Databases"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Aggarwal, C.C., and Zhai, C. (2012). Mining Text Data, Springer Science & Business Media.","DOI":"10.1007\/978-1-4614-3223-4"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"1411","DOI":"10.1109\/TKDE.2006.152","article-title":"A survey of web information extraction systems","volume":"18","author":"Chang","year":"2006","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_16","unstructured":"Kushmerick, N., Weld, D.S., and Doorenbos, R. (1997). Wrapper Induction for Information Extraction, University of Washington."},{"key":"ref_17","first-page":"57","article-title":"Web wrapper induction: A brief survey","volume":"17","author":"Flesca","year":"2004","journal-title":"AI Commun."},{"key":"ref_18","unstructured":"Riloff, E., and Jones, R. (1999). Learning Dictionaries for Information Extraction by Multi-Level Bootstrapping, AAAI\/IAAI."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Rebele, T., Suchanek, F., Hoffart, J., Biega, J., Kuzey, E., and Weikum, G. (2016, January 17\u201321). YAGO: A multilingual knowledge base from wikipedia, wordnet, and geonames. 
Proceedings of the International Semantic Web Conference, Kobe, Japan.","DOI":"10.1007\/978-3-319-46547-0_19"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1145\/2935694.2935696","article-title":"A Relational Framework for Information Extraction","volume":"44","author":"Fagin","year":"2016","journal-title":"ACM SIGMOD Rec."},{"key":"ref_21","first-page":"25","article-title":"DeepDive: Web-scale Knowledge-base Construction using Statistical Learning and Inference","volume":"12","author":"Niu","year":"2012","journal-title":"VLDS"},{"key":"ref_22","unstructured":"(2017, September 19). SpaCy Natural Language Package. Available online: https:\/\/SpaCy.io\/."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Finkel, J.R., Grenager, T., and Manning, C. (2005, January 25\u201330). Incorporating non-local information into information extraction systems by gibbs sampling. Proceedings of the 43rd Annual Meeting on Association For Computational Linguistics, Ann Arbor, MI, USA.","DOI":"10.3115\/1219840.1219885"},{"key":"ref_24","first-page":"3567","article-title":"Data programming: Creating large training sets, quickly","volume":"29","author":"Ratner","year":"2016","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"751","DOI":"10.1016\/j.ipm.2005.05.005","article-title":"Analysis of multiple query reformulations on the web: The interactive information retrieval context","volume":"42","author":"Rieh","year":"2006","journal-title":"Inf. Process. Manag."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"101900M","DOI":"10.1117\/12.2266590","article-title":"In context query reformulation for failing SPARQL queries","volume":"10190","author":"Viswanathan","year":"2017","journal-title":"Proc. SPIE"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Muslea, I. (2004, January 22\u201325). Machine learning for online query relaxation. 
Proceedings of the Tenth ACM SIGKDD International Conference On Knowledge Discovery and Data Mining, Seattle, WA, USA.","DOI":"10.1145\/1014052.1014081"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Mirzadeh, N., Ricci, F., and Bansal, M. (September, January 31). Supporting user query relaxation in a recommender system. Proceedings of the Ec-Web, Zaragoza, Spain.","DOI":"10.1007\/978-3-540-30077-9_4"},{"key":"ref_29","unstructured":"Gormley, C., and Tong, Z. (2015). Elasticsearch: The Definitive Guide: A Distributed Real-Time Search and Analytics Engine, O\u2019Reilly Media, Inc."},{"key":"ref_30","unstructured":"Han, J., Haihong, E., Le, G., and Du, J. (2011, January 26\u201328). Survey on NoSQL database. Proceedings of the 2011 6th International Conference on Pervasive Computing and Applications (ICPCA), Port Elizabeth, South Africa."},{"key":"ref_31","unstructured":"Amitay, E., Carmel, D., Golbandi, N., Har\u2019el, N.Y., Ofek-Koifman, S., and Yogev, S. (2011). Information Retrieval with Unified Search Using Multiple Facets. (8,024,324), U.S. Patent."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Banks, K., and Hersman, E. (2009, January 17\u201319). FrontlineSMS and Ushahidi-a demo. Proceedings of the 2009 International Conference on Information and Communication Technologies and Development (ICTD), Doha, Qatar.","DOI":"10.1109\/ICTD.2009.5426725"},{"key":"ref_33","unstructured":"Jadhav, A.S., Purohit, H., Kapanipathi, P., Anantharam, P., Ranabahu, A.H., Nguyen, V., Mendes, P.N., Smith, A.G., Cooney, M., and Sheth, A.P. (2019, February 27). Twitris 2.0: Semantically Empowered System for Understanding Perceptions From Social Data. Available online: https:\/\/works.bepress.com\/amit_sheth\/284\/."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Abel, F., Hauff, C., Houben, G.J., Stronkman, R., and Tao, K. (2012, January 16\u201320). Twitcident: Fighting fire with information from social web streams. 
Proceedings of the 21st International Conference on World Wide Web, Lyon, France.","DOI":"10.1145\/2187980.2188035"},{"key":"ref_35","unstructured":"Imran, M., Castillo, C., Lucas, J., Meier, P., and Vieweg, S. (2014, January 7\u201311). AIDR: Artificial intelligence for disaster response. Proceedings of the 23rd International Conference on World Wide Web, Seoul, Korea."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"4:1","DOI":"10.1147\/JRD.2013.2260692","article-title":"CrisisTracker: Crowdsourced social media curation for disaster awareness","volume":"57","author":"Rogstadius","year":"2013","journal-title":"IBM J. Res. Dev."},{"key":"ref_37","unstructured":"Kumar, S., Barbier, G., Abbasi, M.A., and Liu, H. (2011, January 17\u201321). TweetTracker: An Analysis Tool for Humanitarian and Disaster Relief. Proceedings of the Fifth International Conference on Weblogs and Social Media (ICWSM), Barcelona, Spain."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Choi, S., and Bae, B. (2015). The real-time monitoring system of social big data for disaster management. Computer Science and Its Applications, Springer.","DOI":"10.1007\/978-3-662-45402-2_115"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Thom, D., Kr\u00fcger, R., Ertl, T., Bechstedt, U., Platz, A., Zisgen, J., and Volland, B. (2015, January 14\u201317). Can twitter really save your life? A case study of visual social media analytics for situation awareness. Proceedings of the 2015 IEEE Pacific Visualization Symposium (PacificVis), Hangzhou, China.","DOI":"10.1109\/PACIFICVIS.2015.7156376"},{"key":"ref_40","unstructured":"Gao, T., Hullman, J.R., Adar, E., Hecht, B., and Diakopoulos, N. (May, January 26). NewsViews: An Automated Pipeline for Creating Custom Geovisualizations for News. 
Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI \u201914), Toronto, ON, Canada."},{"key":"ref_41","unstructured":"Krishnamurthy, Y., Pham, K., A\u00e9cio, S., and Freire, J. (2016, January 14). Interactive Exploration for Domain Discovery on the Web. Proceedings of the KDD IDEA, San Francisco, CA, USA."},{"key":"ref_42","first-page":"196","article-title":"Topical Web Crawling for Domain-Specific Resource Discovery Enhanced by Selectively using Link-Context","volume":"12","author":"Liu","year":"2015","journal-title":"Int. Arab J. Inf. Technol. (IAJIT)"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Lopez, L.A., Duerr, R., and Khalsa, S.J.S. (November, January 29). Optimizing apache nutch for domain specific crawling at large scale. Proceedings of the 2015 IEEE International Conference on Big Data (Big Data), Santa Clara, CA, USA.","DOI":"10.1109\/BigData.2015.7363976"},{"key":"ref_44","unstructured":"Ramnandan, S.K., Mittal, A., Knoblock, C.A., and Szekely, P. (June, January 31). Assigning semantic labels to data sources. Proceedings of the European Semantic Web Conference, Portoroz, Slovenia."},{"key":"ref_45","unstructured":"(2017, September 19). Inferlink. Available online: http:\/\/www.inferlink.com\/."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"56","DOI":"10.1145\/2934664","article-title":"Apache Spark: A unified engine for big data processing","volume":"59","author":"Zaharia","year":"2016","journal-title":"Commun. ACM"},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Crockford, D. (2006). The Application\/json Media Type for Javascript Object Notation (Json), The Internet Society.","DOI":"10.17487\/rfc4627"},{"key":"ref_48","unstructured":"McCandless, M., Hatcher, E., and Gospodnetic, O. (2010). 
Lucene in Action: Covers Apache Lucene 3.0, Manning Publications Co."},{"key":"ref_49","first-page":"229","article-title":"The Prot\u00e9g\u00e9 OWL plugin: An open development environment for semantic web applications","volume":"Volume 3298","author":"Knublauch","year":"2004","journal-title":"Proceedings of the International Semantic Web Conference"},{"key":"ref_50","first-page":"2004","article-title":"OWL web ontology language overview","volume":"10","author":"McGuinness","year":"2004","journal-title":"W3C Recomm."},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1038\/75556","article-title":"Gene Ontology: Tool for the unification of biology","volume":"25","author":"Ashburner","year":"2000","journal-title":"Nat. Genet."}],"container-title":["Future Internet"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-5903\/11\/3\/59\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T12:36:04Z","timestamp":1760186164000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-5903\/11\/3\/59"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,3,4]]},"references-count":51,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2019,3]]}},"alternative-id":["fi11030059"],"URL":"https:\/\/doi.org\/10.3390\/fi11030059","relation":{},"ISSN":["1999-5903"],"issn-type":[{"type":"electronic","value":"1999-5903"}],"subject":[],"published":{"date-parts":[[2019,3,4]]}}}