{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,5]],"date-time":"2026-02-05T07:27:41Z","timestamp":1770276461482,"version":"3.49.0"},"reference-count":61,"publisher":"SAGE Publications","issue":"4","license":[{"start":{"date-parts":[[2019,4,9]],"date-time":"2019-04-09T00:00:00Z","timestamp":1554768000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["Journal of Intelligent &amp; Fuzzy Systems"],"published-print":{"date-parts":[[2019,4,11]]},"abstract":"<jats:p>\n                    \u00a0Various sources and sophisticated tools are used to gather and process the comparatively large volume of data or big data that sometimes leads to privacy disclosure (at broader or finer level) for the data owner. Privacy preserving data publishing approaches such as\n                    <jats:italic>k<\/jats:italic>\n                    -anonymity,\n                    <jats:italic>l<\/jats:italic>\n                    -diversity, and\n                    <jats:italic>t<\/jats:italic>\n                    -closeness are very well used to de-identify data, however, chances of re-identification of attributes always exist as data is collected from multiple sources such as public web, social media, Internet whereabouts, and sensors that are highly prone to data linkages. In literature,\n                    <jats:italic>k<\/jats:italic>\n                    -anonymity stands out amongst the most popular mainstream data anonymization approaches that can also be used for large sized data. 
However, applying\n                    <jats:italic>k<\/jats:italic>\n                    -anonymization to a variety of data (especially unstructured data) is difficult in the traditional way because it requires the given data to be classified into personal data, quasi-identifiers, and sensitive data. We identify existing approaches from the Natural Language Processing (NLP) literature to convert unstructured data to structured form in order to apply\n                    <jats:italic>k<\/jats:italic>\n                    -anonymization over the generated structured records. We adopt a two-phase Conditional Random Field (CRF) based Named Entity Recognition (NER) approach to represent unstructured data in structured form. Further, we propose an Improved Scalable\n                    <jats:italic>k<\/jats:italic>\n                    -Anonymization (ImSKA) to anonymize the well-represented unstructured data, which achieves privacy-preserving unstructured big data publishing. We compare both proposed approaches, namely NER and ImSKA, with existing approaches, and the results show that our proposed solutions outperform the existing approaches in terms of\n                    <jats:italic>F<\/jats:italic>\n                    1 score and Normalized Cardinality Penalty (NCP), respectively. 
Since NER approaches are widely used for biomedical datasets, we also used a well-known Bio-NER dataset, the GENIA corpus, to measure performance.\n                  <\/jats:p>","DOI":"10.3233\/jifs-181231","type":"journal-article","created":{"date-parts":[[2019,4,12]],"date-time":"2019-04-12T10:37:33Z","timestamp":1555065453000},"page":"3471-3482","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":12,"title":["Towards privacy preserving unstructured big data publishing"],"prefix":"10.1177","volume":"36","author":[{"given":"Brijesh","family":"Mehta","sequence":"first","affiliation":[{"name":"Department of Computer Science and Engineering, College of Technology and Engineering, Maharana Pratap University of Agriculture and Technology, Udaipur, Rajasthan, India"}]},{"given":"Udai Pratap","family":"Rao","sequence":"additional","affiliation":[{"name":"Computer Engineering Department, Sardar Vallabhbhai National Institute of Technology, Surat, Gujarat, India"}]},{"given":"Ruchika","family":"Gupta","sequence":"additional","affiliation":[{"name":"Computer Science and Engineering Department, Chandigarh University, Mohali, Punjab, India"}]},{"given":"Mauro","family":"Conti","sequence":"additional","affiliation":[{"name":"Department of Mathematics and HIT Center, University of Padua, Italy"}]}],"member":"179","published-online":{"date-parts":[[2019,4,9]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1186\/s40537-016-0059-y"},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-38586-5_8"},{"key":"e_1_3_1_4_2","first-page":"120","volume-title":"Procedia Computer Science","volume":"78","author":"Mehta B.B.","unstructured":"MehtaB.B. and RaoU.P., Privacy preserving unstructured big data analytics: Issues and challenges, Procedia Computer Science. Elsevier, Jan 2016, vol. 78, pp. 
120\u2013124, Jan 2016, 1st International Conference on Information Security and Privacy 2015, Nagpur, India."},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1016\/S0020-0255(99)00035-3"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.compbiolchem.2009.07.004"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1016\/S0959-440X(96)80056-X"},{"key":"e_1_3_1_8_2","first-page":"96","volume-title":"Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and Its Applications","author":"GuoDong Z.","unstructured":"GuoDongZ. and JianS., Exploring deep knowledge resources in biomedical name recognition, in Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and Its Applications, ser. JNLPBA \u201904. Geneva, Switzerland: Association for Computational Linguistics, Aug 2004, pp. 96\u201399."},{"issue":"1","key":"e_1_3_1_9_2","first-page":"4","article-title":"An introduction to hidden markov models","volume":"3","author":"Rabiner L.","year":"1986","unstructured":"RabinerL. and JuangB., An introduction to hidden markov models, IEEE Acoustics, Speech, and Signal Processing Magazine. IEEE 3(1) (1986), 4\u201316.","journal-title":"IEEE Acoustics, Speech, and Signal Processing Magazine. 
IEEE"},{"key":"e_1_3_1_10_2","unstructured":"Introduction to support vector machine [Online] Available: http:\/\/docs.opencv.org\/2.4\/doc\/tutorials\/ml\/introduction_to_svm\/introduction_to_svm.html [Accessed: 23-Jun-2017]."},{"key":"e_1_3_1_11_2","unstructured":"Support vector machine [Online] Available: http:\/\/scikit-learn.org\/stable\/modules\/svm.html [Accessed: 23-Jun-2017]."},{"key":"e_1_3_1_12_2","first-page":"80","volume-title":"Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications","author":"Lee C.","unstructured":"LeeC., HouW.-J., ChenH.-H., Annotating multiple types of biomedical entities: a single word classification approach, in Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications, ser. JNLPBA \u201904. Geneva, Switzerland: Association for Computational Linguistics, Aug 2004, pp. 80\u201383."},{"key":"e_1_3_1_13_2","first-page":"88","volume-title":"Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications","author":"Finkel J.","unstructured":"FinkelJ., DingareS., NguyenH., NissimM., ManningC., SinclairG., Exploiting context for biomedical entity recognition: from syntax to the web, in Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications, ser. JNLPBA \u201904. Geneva, Switzerland: Association for Computational Linguistics, Aug 2004, pp. 
88\u201391."},{"issue":"1","key":"e_1_3_1_14_2","first-page":"1","article-title":"Identifying gene and protein mentions in text using conditional random fields, BMC bioin-formatics","volume":"6","author":"McDonald R.","year":"2005","unstructured":"McDonaldR., PereiraF., Identifying gene and protein mentions in text using conditional random fields, BMC bioin-formatics, BioMed Central 6(1) (2005), 1\u20137.","journal-title":"BioMed Central"},{"issue":"5","key":"e_1_3_1_15_2","first-page":"1","article-title":"NERBio: using selected word conjunctions, term normalization, and global patterns to improve biomedical named entity recognition","volume":"7","author":"Tsai R.T.-H.","year":"2006","unstructured":"TsaiR.T.-H., SungC.-L., DaiH.-J., HungH.-C., SungT.-Y. , and HsuW.-L., NERBio: using selected word conjunctions, term normalization, and global patterns to improve biomedical named entity recognition, BMC bioinformatics. BioMed Central 7(5) (2006), 1\u201314.","journal-title":"BMC bioinformatics. BioMed Central"},{"key":"e_1_3_1_16_2","first-page":"282","volume-title":"Proceedings of the eighteenth international conference on machine learning, ICML","volume":"1","author":"Lafferty J.","unstructured":"LaffertyJ., McCallumA., PereiraF., Conditional random fields: Probabilistic models for segmenting and labeling sequence data, in Proceedings of the eighteenth international conference on machine learning, ICML, vol. 1. Williamstown, MA, USA: Morgan Kaufmann Publishers Inc., Jun 2001, pp. 
282\u2013289."},{"key":"e_1_3_1_17_2","first-page":"85","volume-title":"Proceedings of the Second International Symposium on Semantic Mining in Biomedicine (SMBM 2006)","volume":"7","author":"Friedrich C.M.","unstructured":"FriedrichC.M., RevillionT., HofmannM., and FluckJ., Biomedical and chemical named entity recognition with conditional random fields: The advantage of dictionary features, in Proceedings of the Second International Symposium on Semantic Mining in Biomedicine (SMBM 2006), vol. 7. Jena, Germany: BMC Bioinformatics, Apr 2006, pp. 85\u201389."},{"key":"e_1_3_1_18_2","first-page":"104","volume-title":"Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications","author":"Settles B.","unstructured":"SettlesB., Biomedical named entity recognition using conditional random fields and rich feature sets, in Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications, ser. JNLPBA \u201904. Geneva, Switzerland: Association for Computational Linguistics, Aug 2004, pp. 104\u2013107."},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2005.09.072"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11767-006-0255-6"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCBB.2013.106"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10115-013-0637-7"},{"key":"e_1_3_1_23_2","first-page":"33","volume-title":"Proceedings of the ACL 2003 workshop on Natural language processing in biomedicine-Volume 13","author":"Lee K.-J.","unstructured":"LeeK.-J., HwangY.-S., RimH.-C., Two-phase biomedical NE recognition based on SVMs, in Proceedings of the ACL 2003 workshop on Natural language processing in biomedicine-Volume 13, Sapporo, Japan: Association for Computational Linguistics, Jul 2003), pp. 
33\u201340."},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btg1023"},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1007\/11562214_57"},{"issue":"7","key":"e_1_3_1_26_2","first-page":"1103","article-title":"Experimental study on a two phase method for biomedical named entity recognition, IEICE transactions on information and systems","volume":"90","author":"Seonho K.","year":"2007","unstructured":"SeonhoK., Experimental study on a two phase method for biomedical named entity recognition, IEICE transactions on information and systems, The Institute of Electronics, Information and Communication Engineers, 90(7) (2007), 1103\u20131110.","journal-title":"The Institute of Electronics, Information and Communication Engineers,"},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/IALP.2010.41"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.3115\/1119176.1119206"},{"key":"e_1_3_1_29_2","first-page":"1","volume-title":"Proceedings of the Seventeenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems","author":"Samarati P.","unstructured":"SamaratiP., SweeneyL., Generalizing data to provide anonymity when disclosing information, in Proceedings of the Seventeenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, ser. PODS \u201998. Seattle, Washington, USA: ACM, Jun 1998, pp. 1\u201313."},{"key":"e_1_3_1_30_2","unstructured":"SamaratiP. SweeneyL. Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression Computer Science Laboratory SRI International Menlo Park California United States Tech. Rep. 
Apr 1998 [Online] Available: http:\/\/www.csl.sri.com\/papers\/sritr-98-04\/ [Accessed: 18-Feb-2015]."},{"key":"e_1_3_1_31_2","first-page":"1","volume-title":"Proceedings of the 22nd International Conference on Data Engineering","author":"LeFevre K.","unstructured":"LeFevreK., DeWittD.J., RamakrishnanR., Mondrian multidimensional k-anonymity, in Proceedings of the 22nd International Conference on Data Engineering, ser. ICDE \u201906. Washington, DC, USA: IEEE Computer Society, Apr 2006), 1\u201311."},{"key":"e_1_3_1_32_2","first-page":"49","volume-title":"Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data","author":"LeFevre K.","unstructured":"LeFevreK., DeWittD.J., RamakrishnanR., Incognito: Efficient full-domain k-anonymity, in Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data, ser. SIGMOD \u201905. Baltimore, Maryland: ACM, Jun 2005, 49\u201360."},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2008.210"},{"key":"e_1_3_1_34_2","first-page":"747","volume-title":"Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data","author":"Wong W.K.","unstructured":"WongW.K., MamoulisN., CheungD.W.L., Nonhomogeneous generalization in privacy preserving data publishing, in Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, ser. SIGMOD \u201910. Indianapolis, Indiana, USA: ACM, Jun 2010), 747\u2013758."},{"key":"e_1_3_1_35_2","first-page":"93","volume-title":"Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data","author":"Liu K.","unstructured":"LiuK., TerziE., Towards identity anonymization on graphs, in Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, ser. SIGMOD \u201908. Vancouver, Canada: ACM, June 2008, pp. 
93\u2013106."},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.14778\/1453856.1453873"},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10207-013-0196-7"},{"key":"e_1_3_1_38_2","first-page":"648","volume-title":"Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology","author":"Zhou B.","unstructured":"ZhouB., HanY., PeiJ., JiangB., TaoY., JiaY., Continuous privacy preserving publishing of data streams, in Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, ser. EDBT \u201909. Saint Petersburg, Russia: ACM, March 2009, pp. 648\u2013659."},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1145\/2339530.2339696"},{"key":"e_1_3_1_40_2","first-page":"28","volume-title":"Proceedings of the 2016 Sixth International Conference on Advanced Computing and Communication Technologies","author":"Mehta B.B.","year":"2016","unstructured":"MehtaB.B., RaoU.P., KumarN., GadekulaS.K., Towards privacy preserving big data analytics, in Proceedings of the 2016 Sixth International Conference on Advanced Computing and Communication Technologies, ser. ACCT-Rohtak, India: Research Publishing, (2016), 28\u201335."},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1145\/1327452.1327492"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2013.48"},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2007.1015"},{"key":"e_1_3_1_44_2","first-page":"249","volume-title":"Proceedings of the Fourth IEEE International Conference on Data Mining, 2004. ICDMrsquo;04","author":"Wang K.","unstructured":"WangK., YuP.S., ChakrabortyS., Bottom-up generalization: A data mining solution to privacy protection, in Proceedings of the Fourth IEEE International Conference on Data Mining, 2004. ICDMrsquo;04, IEEE. Brighton, UK: IEEE, Nov 2004, pp. 
249\u2013256."},{"key":"e_1_3_1_45_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jcss.2014.02.007"},{"key":"e_1_3_1_46_2","first-page":"26:1","volume-title":"Proceedings of the 27th International Conference on Scientific and Statistical Database Management","author":"Zakerzadeh H.","unstructured":"ZakerzadehH., AggarwalC.C., BarkerK., Privacypreserving big data publishing, in Proceedings of the 27th International Conference on Scientific and Statistical Database Management, ser. SSDBM \u201915. La Jolla, California: ACM, Jun 2015 pp. 26:1\u201326:11."},{"key":"e_1_3_1_47_2","first-page":"271","article-title":"Privacy preserving big data publishing: a scalable k-anonymization approach using mapreduce, IET Software","volume":"11","author":"Mehta B.B.","year":"2017","unstructured":"MehtaB.B., RaoU.P., Privacy preserving big data publishing: a scalable k-anonymization approach using mapreduce, IET Software, Institution of Engineering and Technology 11 (2017), 271\u2013276.","journal-title":"Institution of Engineering and Technology"},{"key":"e_1_3_1_48_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10586-015-0426-z"},{"key":"e_1_3_1_49_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2014.2368568"},{"key":"e_1_3_1_50_2","unstructured":"GENIA tagger v3.0 [Online] Available: http:\/\/www.nactem.ac.uk\/GENIA\/tagger\/ [Accessed: 11-Nov-2017]."},{"key":"e_1_3_1_51_2","unstructured":"MEDLINE [Online] Available: https:\/\/www.nlm.nih.gov\/pubs\/factsheets\/medline.html [Accessed: 11-Nov-2017]."},{"key":"e_1_3_1_52_2","unstructured":"Wall street journal corpus [Online] Available: https:\/\/catalog.ldc.upenn.edu\/docs\/LDC94S13A\/wsj1.txt [Accessed: 11-Nov-2017]."},{"key":"e_1_3_1_53_2","unstructured":"PennBioIE corpus [Online] Available: https:\/\/catalog.ldc.upenn.edu\/LDCT21 [Accessed: 11-Nov-2017]."},{"issue":"3","key":"e_1_3_1_54_2","first-page":"1409","article-title":"A geometric interpretation of darroch and ratcliffrsquo;s generalized iterative 
scaling","volume":"17","author":"Csiszar I.","year":"1989","unstructured":"CsiszarI., A geometric interpretation of darroch and ratcliff\u2019s generalized iterative scaling, The Annals of Statistics. Institute of Mathematical Statistics 17(3) (1989), 1409\u20131413.","journal-title":"The Annals of Statistics. Institute of Mathematical Statistics"},{"key":"e_1_3_1_55_2","doi-asserted-by":"publisher","DOI":"10.1109\/34.588021"},{"key":"e_1_3_1_56_2","first-page":"87","volume-title":"Proceedings of 2009 13th Panhellenic Conference on Informatics","author":"Livieris I.E.","unstructured":"LivierisI.E., ApostolopoulouM.S., SotiropoulosD. G., SioutasS., and PintelasP., Classification of Large Biomedical Data Using ANNs Based on BFGS Method, in Proceedings of 2009 13th Panhellenic Conference on Informatics, Corfu Island, Greece: IEEE, Sept 2009, pp. 87\u201391."},{"key":"e_1_3_1_57_2","doi-asserted-by":"publisher","DOI":"10.1007\/BF01589116"},{"key":"e_1_3_1_58_2","doi-asserted-by":"publisher","DOI":"10.1109\/PROC.1973.9030"},{"key":"e_1_3_1_59_2","first-page":"70","volume-title":"Proceedings of the international joint workshop on natural language processing in biomedicine and its applications","author":"Kim J.-D.","unstructured":"KimJ.-D., OhtaT., TsuruokaY., TateisiY., CollierN., Introduction to the bio-entity recognition task at jnlpba, in Proceedings of the international joint workshop on natural language processing in biomedicine and its applications, ser. JNLPBA \u201904. Geneva, Switzerland: Association for Computational Linguistics, Aug 2004, pp. 
70\u201375."},{"key":"e_1_3_1_60_2","first-page":"465","volume-title":"Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics","author":"Okanohara D.","unstructured":"OkanoharaD., MiyaoY., TsuruokaY., TsujiiJ., Improving the scalability of semi-markov conditional random fields for named entity recognition, in Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, ser. ACL-44. Sydney, Australia: Association for Computational Linguistics, Jul 2006, pp. 465\u2013472."},{"key":"e_1_3_1_61_2","first-page":"1103","volume-title":"IEICE -Transactions on Information Systems","volume":"90","author":"Kim S.","unstructured":"KimS., YoonJ., Experimental study on a two phase method for biomedical named entity recognition, IEICE -Transactions on Information Systems, Oxford, UK: Oxford University Press, Jul 2007, vol. E90-D, no. 7, pp. 1103\u20131110, Jul 2007."},{"key":"e_1_3_1_62_2","unstructured":"GhinitaG. KarrasP. KalnisP. MamoulisN. Fast data anonymization with low information loss in Proceedings of the 33rd International Conference on Very Large Data Bases ser. VLDB \u201907. Vienna Austria: VLDB Endowment Sep 2007 pp. 
758\u2013769."}],"container-title":["Journal of Intelligent &amp; Fuzzy Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.3233\/JIFS-181231","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.3233\/JIFS-181231","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.3233\/JIFS-181231","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,2,4]],"date-time":"2026-02-04T17:28:26Z","timestamp":1770226106000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.3233\/JIFS-181231"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,4,9]]},"references-count":61,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2019,4,11]]}},"alternative-id":["10.3233\/JIFS-181231"],"URL":"https:\/\/doi.org\/10.3233\/jifs-181231","relation":{},"ISSN":["1064-1246","1875-8967"],"issn-type":[{"value":"1064-1246","type":"print"},{"value":"1875-8967","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,4,9]]}}}