{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,2]],"date-time":"2025-11-02T00:17:41Z","timestamp":1762042661502,"version":"build-2065373602"},"reference-count":42,"publisher":"Springer Science and Business Media LLC","issue":"4","license":[{"start":{"date-parts":[[2021,10,11]],"date-time":"2021-10-11T00:00:00Z","timestamp":1633910400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,10,11]],"date-time":"2021-10-11T00:00:00Z","timestamp":1633910400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Complex Intell. Syst."],"published-print":{"date-parts":[[2023,8]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>The process of integration through classification provides a unified representation of diverse data sources in Big data. The main challenges of big data analysis are due to the various granularities, irreconcilable data models, and multipart interdependencies between data content. Previously designed models were facing problems in integrating and analyzing big data due to highly complex and dynamic multi-source and heterogeneous information variation and also in processing and classifying the association among the attributes in a schema. In this paper, we propose an integration and classification approach through designing a Probabilistic Semantic Association (PSA) method to generate the feature pattern for the sources of big data. The PSA approach is trained to understand the data association and dependency pattern between the data class and incoming data to map the data objects accurately. 
It first builds a data integration mechanism that transforms data into a structured form and learns to use the trained knowledge to classify the probabilistic association between the data and knowledge patterns. It then builds a data analysis mechanism that analyzes the mapped data through PSA to evaluate the integration efficiency. An experimental evaluation is performed on a real-time crime dataset generated from multiple locations with various event classes. The analysis of the results confirms that using knowledge patterns from accurate classification is an appropriate way to enhance the integration of multi-source data. The precision, recall, fall-out rate, and F-measure support the efficiency of the proposed PSA method. A comparison with a state-of-the-art classification method and with the SC-LDA algorithm shows an improvement in prediction accuracy and enhanced data integration.<\/jats:p>","DOI":"10.1007\/s40747-021-00548-x","type":"journal-article","created":{"date-parts":[[2021,10,11]],"date-time":"2021-10-11T11:04:07Z","timestamp":1633950247000},"page":"3681-3694","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["Integration and classification approach based on probabilistic semantic association for big data"],"prefix":"10.1007","volume":"9","author":[{"given":"Vishnu","family":"VandanaKolisetty","sequence":"first","affiliation":[]},{"given":"Dharmendra Singh","family":"Rajput","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2021,10,11]]},"reference":[{"key":"548_CR1","doi-asserted-by":"publisher","first-page":"171754","DOI":"10.1109\/ACCESS.2020.3024558","volume":"8","author":"JL Lcuadrado","year":"2020","unstructured":"Lcuadrado JL, Carrasco IG, Hern\u00e1ndez JLL, Fern\u00e1ndez PM, Fern\u00e1ndez JLM (2020) Automatic learning framework for pharmaceutical record matching. 
IEEE Access 8:171754","journal-title":"IEEE Access"},{"key":"548_CR2","doi-asserted-by":"publisher","first-page":"148370","DOI":"10.1109\/ACCESS.2020.3008820","volume":"8","author":"G Yan","year":"2020","unstructured":"Yan G, Wang H (2020) Autonomous coordinated control strategy for complex process of traffic information physical fusion system based on big data. IEEE Access 8:148370","journal-title":"IEEE Access"},{"issue":"1","key":"548_CR3","doi-asserted-by":"publisher","first-page":"42","DOI":"10.1109\/MCOM.2015.7010514","volume":"53","author":"R Mao","year":"2015","unstructured":"Mao R, Xu H, Wu W, Li J, Li Y, Lu M (2015) Overcoming the challenge of variety: Big data abstraction, the next evolution of data management for AAL communication systems. IEEE Communication Magazine 53(1):42\u201347","journal-title":"IEEE Communication Magazine"},{"key":"548_CR4","doi-asserted-by":"crossref","unstructured":"L. da Silva Daniel, L. P. Pedro, S. L. Stanzani, A. Paulo, A. C. Sheffer (2014) A computational framework for integrating and retrieving biodiversity data on a large scale. IEEE International Congress on Big Data","DOI":"10.1109\/BigData.Congress.2014.123"},{"issue":"1","key":"548_CR5","doi-asserted-by":"publisher","first-page":"91","DOI":"10.1186\/s40537-019-0254-8","volume":"6","author":"K Adnan","year":"2019","unstructured":"Adnan K, Akbar R (2019) An analytical study of information extraction from unstructured and multidimensional big data. Journal of Big Data 6(1):91","journal-title":"Journal of Big Data"},{"key":"548_CR6","first-page":"1","volume":"1","author":"XL Dong","year":"2015","unstructured":"Dong XL, Srivastava D (2015) Big data integration. 
Morgan & Claypool 1:1\u201398","journal-title":"Morgan & Claypool"},{"issue":"1","key":"548_CR7","doi-asserted-by":"publisher","first-page":"186","DOI":"10.1109\/TKDE.2016.2611577","volume":"29","author":"B Gu","year":"2017","unstructured":"Gu B, Li Z, Zhang X, Liu A, Liu G, Zheng K, Zhao L, Zhou X (2017) The Interaction between schema matching and record matching in data integration. IEEE Trans Knowl Data Eng 29(1):186\u2013199","journal-title":"IEEE Trans Knowl Data Eng"},{"key":"548_CR8","unstructured":"Y. Roh, G. Heo, S. E. Whang (2019) A survey on data collection for machine learning: a big data integration perspective. IEEE Transactions on Knowledge and Data Engineering, 1\u20131"},{"key":"548_CR9","doi-asserted-by":"publisher","first-page":"103285","DOI":"10.1016\/j.jbi.2019.103285","volume":"99","author":"VS Paniagua","year":"2019","unstructured":"Paniagua VS, Zavala RMR, Segura-Bedmar I, Mart\u00ednez P (2019) A two-stage deep learning approach for extracting entities and relationships from medical texts. Journal Biomed. Information 99:103285","journal-title":"Journal Biomed. Information"},{"key":"548_CR10","doi-asserted-by":"crossref","unstructured":"B. Louie, L. Detwiler, N. N. Dalvi, R. Shaker, P. Tarczy-Hornoch, D. Suciu, (2007) Incorporating uncertainty metrics into a general-purpose data integration system. 19th International Conference on Scientific and Statistical Database Management (SSDBM) 19\u201319","DOI":"10.1109\/SSDBM.2007.36"},{"key":"548_CR11","doi-asserted-by":"publisher","first-page":"61","DOI":"10.1016\/j.ins.2018.02.014","volume":"439\u2013440","author":"B Desmet","year":"2018","unstructured":"Desmet B, Hoste V (2018) Online suicide prevention through optimized text classification. 
Information Science 439\u2013440:61\u201378","journal-title":"Information Science"},{"key":"548_CR12","doi-asserted-by":"publisher","first-page":"594","DOI":"10.1016\/j.future.2018.05.039","volume":"88","author":"P Gong","year":"2018","unstructured":"Gong P, Cao Y, Cai B, Li K (2018) Multi-information location data fusion system of railway signal based on cloud computing. Future Gener Computer System 88:594\u2013598","journal-title":"Future Gener Computer System"},{"key":"548_CR13","unstructured":"B. Marthi, B. Milch, S. Russell (2003) First-order probabilistic models for information extraction. In IJCAI workshop on learning statistical models from relational data"},{"key":"548_CR14","first-page":"177","volume":"4","author":"ME Califf","year":"2003","unstructured":"Califf ME, Mooney RJ (2003) Bottom-up relational learning of pattern matching rules for information extraction. J Mach Learn Res 4:177\u2013210","journal-title":"J Mach Learn Res"},{"issue":"1","key":"548_CR15","doi-asserted-by":"publisher","first-page":"20","DOI":"10.4018\/IJDWM.2016010102","volume":"12","author":"Y Sun","year":"2016","unstructured":"Sun Y, Bie R, Zhang J (2016) Measuring semantic-based structural similarity in multi-relational networks. International Journal of Data Warehouse and Mining 12(1):20\u201333","journal-title":"International Journal of Data Warehouse and Mining"},{"key":"548_CR16","first-page":"405","volume":"2","author":"Y Zhang","year":"2013","unstructured":"Zhang Y, Wu H, Sorathia V, Prasanna VK (2013) Event recommendation in social networks with linked data enablement. In ICEIS Conference 2:405","journal-title":"In ICEIS Conference"},{"key":"548_CR17","doi-asserted-by":"publisher","DOI":"10.1007\/11603412_5","author":"P Shvaiko","year":"2005","unstructured":"Shvaiko P, Euzenat J (2005) A survey of schema-based matching approaches. Springer J Data Semantics. 
https:\/\/doi.org\/10.1007\/11603412_5","journal-title":"Springer J Data Semantics"},{"key":"548_CR18","doi-asserted-by":"crossref","unstructured":"S. Bergamaschi, L. Po, S. Sorrentino (2007) Automatic annotation in data integration systems. OTM Workshops Springer Berlin Heidelberg Berlin, Heidelberg pp 27-28","DOI":"10.1007\/978-3-540-76888-3_14"},{"key":"548_CR19","doi-asserted-by":"crossref","unstructured":"M. Magnani, N. Rizopoulos, P. McBrien, D. Montesi (2005) Schema integration based on uncertain semantic mappings, In Proc. of Conference on Conceptual Modeling, Springer Berlin Heidelberg, Berlin, Heidelberg","DOI":"10.1007\/11568322_3"},{"key":"548_CR20","volume-title":"Principles of Data Integration","author":"A Doan","year":"2012","unstructured":"Doan A, Halevy A, Ives Z (2012) Principles of Data Integration. Morgan Kaufmann, Waltham, MA, USA"},{"key":"548_CR21","unstructured":"A. Jinchuan, C. Yueguo, Xiaoyong, Li. Cuiping, L. Jiaheng, Z. Suyun, Z. Xuan (2013) Big data challenge: a data management perspective. Front Computer Science"},{"key":"548_CR22","doi-asserted-by":"publisher","first-page":"219","DOI":"10.1016\/j.jnca.2014.09.017","volume":"59","author":"Y Sun","year":"2016","unstructured":"Sun Y, Bie R, Zhang J (2016) Semantic relation computing theory and its application. In J Net Comput Applicat 59:219\u2013229","journal-title":"In J Net Comput Applicat"},{"issue":"8","key":"548_CR23","doi-asserted-by":"publisher","first-page":"1821","DOI":"10.1007\/s00779-014-0786-z","volume":"18","author":"Y Sun","year":"2014","unstructured":"Sun Y, Jara AJ (2014) An extensible and active semantic model of information organizing for the internet of things. Pers Ubiquit Computing 18(8):1821\u20131833","journal-title":"Pers Ubiquit Computing"},{"key":"548_CR24","unstructured":"M. D. Lee, B. Pincombe, and M. Welsh (2005) an empirical evaluation of models of text document similarity. Mahwah, NJ: Erlbaum, pp. 
1254\u20131259"},{"key":"548_CR25","doi-asserted-by":"publisher","DOI":"10.1007\/s00779-016-0940","author":"J Zhang","year":"2016","unstructured":"Zhang J, Yao C, Sun Y, Fang Z (2016) Building text-based temporally linked event network for scientific big data analytics. Pers Ubiquit Comput. https:\/\/doi.org\/10.1007\/s00779-016-0940","journal-title":"Pers Ubiquit Comput"},{"key":"548_CR26","unstructured":"M. A. Hasan, V. Chaoji, S. Salem, M. Zaki (2006) Link prediction using supervised learning. In Proceedings of SDM-06 workshop on Link Analysis, Counterterrorism and Security"},{"key":"548_CR27","doi-asserted-by":"crossref","unstructured":"E. Agichtein, V. Ganti (2004) Mining reference tables for automatic text segmentation\". In Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 20\u201329","DOI":"10.1145\/1014052.1014058"},{"key":"548_CR28","doi-asserted-by":"crossref","unstructured":"G. Kumaran and J. Allan (2004) Text classification and named entities for new event detection, In Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval, pp. 297\u2013304","DOI":"10.1145\/1008992.1009044"},{"key":"548_CR29","doi-asserted-by":"crossref","unstructured":"N. N. Dalvi, D. Suciu, (2007) Management of probabilistic data: foundations and challenges\", PODS, pp. 1\u201312, ACM","DOI":"10.1145\/1265530.1265531"},{"key":"548_CR30","volume-title":"Pattern Recognition and Machine Learning","author":"CM Bishop","year":"2006","unstructured":"Bishop CM (2006) Pattern Recognition and Machine Learning. Springer, New York, NY, USA"},{"key":"548_CR31","unstructured":"SFPD Datasets: \"City and County of San Francisco-SF Open Data\". https:\/\/www.kaggle.com\/c\/sf-crime\/data, May 2015."},{"key":"548_CR32","doi-asserted-by":"crossref","unstructured":"N. Rizopoulos, P. McBrien (2005) A general approach to the generation of conceptual model transformations, In Proc. 
CAiSE. LNCS","DOI":"10.1007\/11431855_23"},{"key":"548_CR33","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4419-9326-7","volume-title":"\"Ensemble Machine Learning: Methods and Applications\", Springer: Boston","author":"C Zhang","year":"2012","unstructured":"Zhang C, Ma Y (2012) \u201cEnsemble Machine Learning: Methods and Applications\u201d, Springer: Boston. MA, USA"},{"issue":"4","key":"548_CR34","doi-asserted-by":"publisher","first-page":"2608","DOI":"10.1109\/TNSE.2020.3009832","volume":"7","author":"G Bovenzi","year":"2020","unstructured":"Bovenzi G, Aceto G, Ciuonzo D, Persico V, Pescap\u00e9 A (2020) A big data-enabled hierarchical framework for traffic classification. IEEE Trans Netw Sci Eng 7(4):2608\u20132619","journal-title":"IEEE Trans Netw Sci Eng"},{"key":"548_CR35","doi-asserted-by":"publisher","first-page":"1405","DOI":"10.1109\/ACCESS.2017.2726543","volume":"5","author":"JKP Seng","year":"2017","unstructured":"Seng JKP, Ang KL-M (2017) Big feature data analytics: split and combine linear discriminant analysis (SC-LDA) for integration towards decision making analytics. IEEE Access 5:1405","journal-title":"IEEE Access"},{"issue":"4","key":"548_CR36","doi-asserted-by":"publisher","first-page":"306","DOI":"10.26599\/BDMA.2019.9020008","volume":"2","author":"W Liu","year":"2019","unstructured":"Liu W, Wu W, Wang Y, Fu Y, Lin Y (2019) Selective ensemble learning method for belief-rule-base classification system based on PAES. Big Data Mining and Analytics 2(4):306\u20133018","journal-title":"Big Data Mining and Analytics"},{"key":"548_CR37","doi-asserted-by":"publisher","first-page":"5343","DOI":"10.1109\/ACCESS.2018.2888508","volume":"7","author":"C Zhang","year":"2019","unstructured":"Zhang C, Cui C, Gao S, Nie X, Xu W, Yang L, Xi X, Yin Y (2019) Multi-Gram CNN-based self-attention model for relation classification. 
IEEE Access 7:5343\u20135357","journal-title":"IEEE Access"},{"key":"548_CR38","doi-asserted-by":"publisher","first-page":"54776","DOI":"10.1109\/ACCESS.2020.2980942","volume":"8","author":"GT Reddy","year":"2020","unstructured":"Reddy GT, Reddy MPK, Lakshmanna K, Kaluri R, Rajput DS, Srivastava G, Baker T (2020) Analysis of dimensionality reduction techniques on big data. In IEEE Access 8:54776\u201354788","journal-title":"In IEEE Access"},{"key":"548_CR39","doi-asserted-by":"crossref","unstructured":"N. Deepa, Q. -V. Pham, D. C. Nguyen, S. Bhattacharya, B. Prabadevi, T. R. Gadekallu, P. K. R. Maddikunta, F, Fang, P. N. Pathirana (2021) A survey on blockchain for big data: Approaches, opportunities, and future directions. In Cryptography and Security","DOI":"10.1016\/j.future.2022.01.017"},{"issue":"3","key":"548_CR40","doi-asserted-by":"publisher","first-page":"317","DOI":"10.1109\/TBDATA.2017.2723570","volume":"5","author":"M Tang","year":"2019","unstructured":"Tang M, Alazab M, Luo Y (2019) Big data for cybersecurity: vulnerability disclosure trends and dependencies. IEEE Trans Big Data 5(3):317\u2013329","journal-title":"IEEE Trans Big Data"},{"issue":"11","key":"548_CR41","doi-asserted-by":"publisher","first-page":"1","DOI":"10.3390\/app9112375","volume":"9","author":"RU Khan","year":"2019","unstructured":"Khan RU, Zhang X, Kumar R, Sharif A, Golilarz NA, Alazab M (2019) An adaptive multi-layer botnet detection technique using machine learning classifiers. Appl Sci 9(11):1\u201322","journal-title":"Appl Sci"},{"key":"548_CR42","first-page":"1","volume":"18","author":"T Gadekallu","year":"2020","unstructured":"Gadekallu T, Rajput D, Reddy P, Lakshman K, Bhattacharya S, Singh S, Jolfaei A, Alazab M (2020) A novel PCA\u2013whale optimization-based deep neural network model for classification of tomato plant diseases using GPU. 
J Real-Time Image Proc 18:1\u201314","journal-title":"J Real-Time Image Proc"}],"container-title":["Complex &amp; Intelligent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-021-00548-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s40747-021-00548-x\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-021-00548-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,7,27]],"date-time":"2023-07-27T13:08:26Z","timestamp":1690463306000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s40747-021-00548-x"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,10,11]]},"references-count":42,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2023,8]]}},"alternative-id":["548"],"URL":"https:\/\/doi.org\/10.1007\/s40747-021-00548-x","relation":{},"ISSN":["2199-4536","2198-6053"],"issn-type":[{"type":"print","value":"2199-4536"},{"type":"electronic","value":"2198-6053"}],"subject":[],"published":{"date-parts":[[2021,10,11]]},"assertion":[{"value":"30 March 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"17 September 2021","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"11 October 2021","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"On behalf of all authors, the corresponding author states that there is no conflict of 
interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}