{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,10]],"date-time":"2026-06-10T15:16:53Z","timestamp":1781104613212,"version":"3.54.1"},"reference-count":43,"publisher":"IGI Global Scientific Publishing","issue":"3","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2013,7,1]]},"abstract":"<p>Choosing the optimal threshold for collocation extraction remains a manual task performed by experts. To date, no in-depth study has explored possible solutions for automating the learning of this threshold in the field of statistical terminology. In this paper, the authors try to shed light on this problem by first exploring the performance evaluation techniques used in several scientific areas (such as biomedicine and biometrics) and then applying them to the field of statistical terminology. The experimental study gives promising results. First, it shows the effectiveness of the usual techniques (such as ROC and Precision-Recall curves) used to evaluate the performance of binary classification systems. 
Second, it provides a practical solution for automatic estimation of optimal thresholds for collocation extraction systems.<\/p>","DOI":"10.4018\/ijitwe.2013070103","type":"journal-article","created":{"date-parts":[[2014,1,23]],"date-time":"2014-01-23T12:07:36Z","timestamp":1390478856000},"page":"34-49","source":"Crossref","is-referenced-by-count":12,"title":["Estimation of a Priori Decision Threshold for Collocations Extraction"],"prefix":"10.4018","volume":"8","author":[{"given":"Fethi","family":"Fkih","sequence":"first","affiliation":[{"name":"MARS Research Unit, Faculty of sciences of Monastir, University of Monastir, Monastir, Tunisia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Mohamed Nazih","family":"Omri","sequence":"additional","affiliation":[{"name":"MARS Research Unit, Faculty of sciences of Monastir, University of Monastir, Monastir, Tunisia"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"2432","reference":[{"key":"ijitwe.2013070103-0","doi-asserted-by":"publisher","DOI":"10.1191\/1471082X04st075oa"},{"key":"ijitwe.2013070103-1","doi-asserted-by":"publisher","DOI":"10.1093\/ijl\/3.1.23"},{"key":"ijitwe.2013070103-2","doi-asserted-by":"publisher","DOI":"10.1016\/S0031-3203(96)00142-2"},{"key":"ijitwe.2013070103-3","doi-asserted-by":"crossref","unstructured":"Church, K., Gale, W., Hanks, P., & Hindle, D. (1989). Parsing, word associations and typical predicate-argument relations. In Proc. The workshop on Speech and Natural Language (HLT '89). Stroudsburg, PA: Association for Computational Linguistics.","DOI":"10.3115\/1075434.1075449"},{"issue":"1","key":"ijitwe.2013070103-4","first-page":"22","article-title":"Word association norms, mutual information and lexicography. 
J.","volume":"16","author":"K.Church","year":"1990","journal-title":"Computational Linguistics"},{"issue":"3","key":"ijitwe.2013070103-5","doi-asserted-by":"crossref","first-page":"963","DOI":"10.1093\/genetics\/138.3.963","article-title":"Empirical threshold values for quantitative trait mapping.","volume":"138","author":"G. A.Churchill","year":"1994","journal-title":"Genetics"},{"key":"ijitwe.2013070103-6","unstructured":"Daille, B., Gaussier, E., & Lang\u00e9, J. (1996). An evaluation of statistical scores for word association. In J. Ginzburg, Z. Khasidashvili, C. Vogel, J.-J. L\u00e9vy and E. Vallduv\u00ed (eds.), The Tbilisi Symposium on Logic, Language and Computation: Selected Papers. Studies in Logic, Language and Information (pp. 177\u2013188). CSLI Publications."},{"issue":"4","key":"ijitwe.2013070103-7","doi-asserted-by":"crossref","first-page":"1019","DOI":"10.1214\/aos\/1176346318","article-title":"Estimating the stable index a in order to measure tail thickness: A critique.","volume":"11","author":"W. H.DuMouchel","year":"1983","journal-title":"Annals of Statistics"},{"key":"ijitwe.2013070103-8","unstructured":"Dunning, T. (1993). Accurate methods for the statistics of surprise and coincidence. J. Computational Linguistics - Special Issue on Using Large Corpora, 19(1), 61-74. Cambridge, MA: MIT Press."},{"key":"ijitwe.2013070103-9","doi-asserted-by":"publisher","DOI":"10.1075\/arcl.7.08ell"},{"key":"ijitwe.2013070103-10","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-33483-2"},{"key":"ijitwe.2013070103-11","doi-asserted-by":"publisher","DOI":"10.1016\/j.csl.2005.02.005"},{"key":"ijitwe.2013070103-12","author":"R.Fano","year":"1961","journal-title":"Transmission of information: A statistical theory of communications"},{"key":"ijitwe.2013070103-13","doi-asserted-by":"publisher","DOI":"10.1016\/j.patrec.2005.10.012"},{"key":"ijitwe.2013070103-14","first-page":"l-32","author":"J. 
R.Firth","year":"1957","journal-title":"A synopsis of linguistic theory 1930-1955. Studies in Linguistic Analysis"},{"key":"ijitwe.2013070103-15","unstructured":"Fkih, F., & Omri, M. N. (2012). Learning the size of the sliding window for the collocations extraction: A ROC-based approach. In Proc. The 2012 International Conference on Artificial Intelligence (ICAI'12), Las Vegas, NV (pp. 1071-1077)."},{"key":"ijitwe.2013070103-16","unstructured":"Fkih, F., & Omri, M. N. (2013). A statistical classifier based Markov chain for complex terms filtration. In Proc. International Conference on Web and Information Technologies (ICWIT'13), Hammamet, Tunisia."},{"key":"ijitwe.2013070103-17","doi-asserted-by":"publisher","DOI":"10.1109\/TE.2008.930092"},{"key":"ijitwe.2013070103-18","first-page":"148","author":"M. A. K.Halliday","year":"1966","journal-title":"Lexis as a linguistic level"},{"key":"ijitwe.2013070103-19","doi-asserted-by":"publisher","DOI":"10.1007\/s10994-009-5119-5"},{"key":"ijitwe.2013070103-20","doi-asserted-by":"publisher","DOI":"10.1016\/S0304-4076(99)00025-1"},{"key":"ijitwe.2013070103-21","doi-asserted-by":"publisher","DOI":"10.1021\/ac025747h"},{"key":"ijitwe.2013070103-22","doi-asserted-by":"publisher","DOI":"10.1201\/9781439800225"},{"key":"ijitwe.2013070103-23","doi-asserted-by":"publisher","DOI":"10.1109\/TDEI.2012.6180248"},{"key":"ijitwe.2013070103-24","unstructured":"Lin, J. F., Li, S., & Cai, Y. (2008). A new collocation extraction method combining multiple association measures. In Proc. International Conference on Machine Learning and Cybernetics (pp. 12-17)."},{"key":"ijitwe.2013070103-25","author":"C. D.Manning","year":"1999","journal-title":"Foundations of statistical natural language processing"},{"key":"ijitwe.2013070103-26","doi-asserted-by":"crossref","unstructured":"Mazurowski, M. A., & Tourassi, G. D. (2009). Evaluating classifiers: Relation between area under the receiver operator characteristic curve and overall accuracy. In Proc. 
International Joint Conference on Neural Networks, Atlanta, GA (pp. 2045- 2049).","DOI":"10.1109\/IJCNN.2009.5178752"},{"key":"ijitwe.2013070103-27","doi-asserted-by":"publisher","DOI":"10.1007\/s10579-009-9101-4"},{"key":"ijitwe.2013070103-28","doi-asserted-by":"crossref","unstructured":"Pecina, P., & Schlesinger, P. (2006). Combining association measures for collocation extraction. In Proc. COLING-ACL '06 Proceedings of the COLING\/ACL (pp. 651-658). Stroudsburg, PA: Association for Computational Linguistics.","DOI":"10.3115\/1273073.1273157"},{"key":"ijitwe.2013070103-29","doi-asserted-by":"publisher","DOI":"10.1016\/j.csl.2009.06.001"},{"key":"ijitwe.2013070103-30","unstructured":"Provost, F. J., Fawcett, T., & Kohavi, R. (1998). The case against accuracy estimation for comparing induction algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning (pp. 445-453)."},{"key":"ijitwe.2013070103-31","doi-asserted-by":"publisher","DOI":"10.1145\/65943.65945"},{"key":"ijitwe.2013070103-32","unstructured":"Roche, M., Az\u00e9, J., Kodratoff, Y., & Sebag, M. (2004). Learning interestingness measures in terminology extraction a ROC based approach. In Proc. ROC Analysis in AI Workshop (ECAI 2004), Valencia, Espagne."},{"key":"ijitwe.2013070103-33","doi-asserted-by":"publisher","DOI":"10.1002\/bimj.200710415"},{"issue":"1","key":"ijitwe.2013070103-34","doi-asserted-by":"crossref","first-page":"73","DOI":"10.1097\/01.ede.0000147512.81966.ba","article-title":"Optimal cut-point and its corresponding Youden index to discriminate individuals using pooled blood samples.","volume":"16","author":"E. F.Schisterman","year":"2005","journal-title":"Journal of Epidemiology"},{"key":"ijitwe.2013070103-35","doi-asserted-by":"publisher","DOI":"10.1002\/j.1538-7305.1948.tb01338.x"},{"key":"ijitwe.2013070103-36","doi-asserted-by":"crossref","unstructured":"Shao, H., & Zou, H. (2009). Threshold estimation based on perona-malik model. In Proceedings of the Conf. 
International Conference on Computational Intelligence and Software Engineering (CiSE 2009) (pp. 1-4).","DOI":"10.1109\/CISE.2009.5366025"},{"key":"ijitwe.2013070103-37","unstructured":"Smadja, F. (1993). Retrieving collocations from text: Xtract. J. Computational Linguistics - Special Issue on Using Large Corpora, 19(1), 143-177. Cambridge, MA: MIT Press."},{"key":"ijitwe.2013070103-38","doi-asserted-by":"crossref","unstructured":"Sokolova, M., Japkowicz, N., & Szpakowicz, S. (2006). Beyond accuracy, f-score and ROC: A family of discriminant measures for performance evaluation. In Abdul Sattar and Byeong-ho Kang (Eds.), Proceedings of the Advances in Artificial Intelligence (AI 2006) (pp. 1015-1021). Springer Berlin Heidelberg.","DOI":"10.1007\/11941439_114"},{"key":"ijitwe.2013070103-39","doi-asserted-by":"publisher","DOI":"10.1038\/scientificamerican1000-82"},{"key":"ijitwe.2013070103-40","doi-asserted-by":"publisher","DOI":"10.1109\/10.846690"},{"key":"ijitwe.2013070103-41","doi-asserted-by":"crossref","unstructured":"Wermter, J., & Hahn, U. (2004). Collocation extraction based on modifiability statistics. In Proc of the 20th International Conference on Computational Linguistics (COLING '04). Stroudsburg, PA: Association for Computational Linguistics.","DOI":"10.3115\/1220355.1220496"},{"key":"ijitwe.2013070103-42","doi-asserted-by":"crossref","first-page":"32","DOI":"10.1002\/1097-0142(1950)3:1<32::AID-CNCR2820030106>3.0.CO;2-3","article-title":"Index for rating diagnostic tests.","volume":"3","author":"W. 
J.Youden","year":"1950","journal-title":"Journal of Cancer"}],"container-title":["International Journal of Information Technology and Web Engineering"],"original-title":[],"language":"ng","link":[{"URL":"https:\/\/www.igi-global.com\/viewtitle.aspx?TitleId=100051","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,6,1]],"date-time":"2022-06-01T12:58:34Z","timestamp":1654088314000},"score":1,"resource":{"primary":{"URL":"https:\/\/services.igi-global.com\/resolvedoi\/resolve.aspx?doi=10.4018\/ijitwe.2013070103"}},"subtitle":["An Empirical Study"],"short-title":[],"issued":{"date-parts":[[2013,7,1]]},"references-count":43,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2013,7]]}},"URL":"https:\/\/doi.org\/10.4018\/ijitwe.2013070103","relation":{},"ISSN":["1554-1045","1554-1053"],"issn-type":[{"value":"1554-1045","type":"print"},{"value":"1554-1053","type":"electronic"}],"subject":[],"published":{"date-parts":[[2013,7,1]]}}}