{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,4]],"date-time":"2026-05-04T00:15:48Z","timestamp":1777853748038,"version":"3.51.4"},"reference-count":63,"publisher":"SAGE Publications","issue":"2","license":[{"start":{"date-parts":[[2024,8,27]],"date-time":"2024-08-27T00:00:00Z","timestamp":1724716800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["Journal of Integrated Design and Process Science: Transactions of the SDPS, Official Journal of the Society for Design and Process Science"],"published-print":{"date-parts":[[2025,5]]},"abstract":"<jats:p>Despite recent improvements in the quality and volume of collected drilling data, the number of wells used in studies remains low, and studies are often limited to a single source and oil field, producing results that are prone to overfitting and are non-transferable.<\/jats:p>\n                  <jats:p>In our study, we access oil drilling data from 5 of more than 20 oil drilling companies, collected from 2005 to 2016 and obtained from our industrial partner, to create well drilling duration models for well planning. This project could lead to more generalized models built from larger datasets than others in the literature. However, the data is difficult to process without expert knowledge, and is further complicated by being unharmonized, source-locked, semantically heterogeneous, sparse, and unlabelled. 
Conventional automated techniques for feature selection, propositionalization, multi-source learning, or block-wise missing data could not be used.<\/jats:p>\n                  <jats:p>In this paper, we describe our method to assist the Knowledge Discovery in Databases (KDD) Selection stage for the abovementioned data: Feature Selection before Propositionalization (FSbP) via Database Attribute Health Feature Reduction (DAHFR) and Report Feature Correlation Matrix (RFCM), collectively known as FvDR. DAHFR and RFCM are filter-type feature selection techniques that can measure relational missingness and keyword correlations, respectively, despite the complexity of multi-source oil drilling data. FvDR reduced the scope from 700 tables containing 20,000 columns to 22 tables containing fewer than 707 columns while selecting 13 of the 16 relevant tables suggested by the literature. Despite the loss of information caused by limitations of subsequent KDD procedures, preliminary models show promising results, with over half of the test predictions falling within the 20% error margin required for well planning. 
FvDR proves to be indispensable in KDD as a FSbP framework as it reduces features for examination and streamlines the research process necessary to understand business rules for data harmonization and propositionalization.<\/jats:p>","DOI":"10.3233\/jid-230026","type":"journal-article","created":{"date-parts":[[2024,8,27]],"date-time":"2024-08-27T10:09:44Z","timestamp":1724753384000},"page":"106-129","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":0,"title":["Feature Selection Before Propositionalization of Multi-Source Oil Drilling Data"],"prefix":"10.1177","volume":"28","author":[{"given":"Clement Ting Pek","family":"Wen","sequence":"first","affiliation":[{"name":"Centre for Digital Futures, Swinburne University of Technology Sarawak Campus, Kuching, Malaysia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Patrick Then Hang","family":"Hui","sequence":"additional","affiliation":[{"name":"Centre for Digital Futures, Swinburne University of Technology Sarawak Campus, Kuching, Malaysia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Man Fai","family":"Lau","sequence":"additional","affiliation":[{"name":"Department of Computing Technologies, Swinburne University of Technology, Hawthorn, Australia"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"179","published-online":{"date-parts":[[2024,8,27]]},"reference":[{"key":"e_1_3_3_2_1","article-title":"Correcting classified activities with natural language processing","author":"AbouLaban S.","year":"2022","unstructured":"AbouLaban S., AlAwami N., Herve P., Amrite J. (2022). Correcting classified activities with natural language processing. International Petroleum Technology Conference, D021S042R002. 
doi: https:\/\/doi.org\/10.2523\/IPTC-22057-EA","journal-title":"International Petroleum Technology Conference"},{"key":"e_1_3_3_3_1","article-title":"Wear analysis and optimization on impregnated diamond bits in vibration assisted rotary drilling (Vard)","author":"Abtahi A.","year":"2011","unstructured":"Abtahi A., Butt S., Molgaard J., Arvani F. (2011). Wear analysis and optimization on impregnated diamond bits in vibration assisted rotary drilling (Vard). ARMA US Rock Mechanics\/Geomechanics Symposium, ARMA-11-266.","journal-title":"ARMA US Rock Mechanics\/Geomechanics Symposium"},{"key":"e_1_3_3_4_1","doi-asserted-by":"publisher","DOI":"10.2118\/119287-PA"},{"key":"e_1_3_3_5_1","article-title":"Improving drilling performance through systematic analysis of historical data: Case study of a Canadian field","author":"Adeleye A.","year":"2004","unstructured":"Adeleye A., Virginillo B., Iyoho A., Parenteau K., Licis H. (2004). Improving drilling performance through systematic analysis of historical data: Case study of a Canadian field. SPE\/IADC Drilling Conference and Exhibition, SPE\u201387177\u2013MS. doi: https:\/\/doi.org\/10.2118\/87177-MS","journal-title":"SPE\/IADC Drilling Conference and Exhibition"},{"key":"e_1_3_3_6_1","doi-asserted-by":"publisher","DOI":"10.3390\/su11236776"},{"key":"e_1_3_3_7_1","doi-asserted-by":"publisher","DOI":"10.3390\/su11236861"},{"key":"e_1_3_3_8_1","first-page":"5","article-title":"Rate of Penetration Prediction and Optimization Using Advances in Artificial Neural Networks, a Comparative Study","author":"Amar K.","year":"2012","unstructured":"Amar K., Ibrahim A. (2012). Rate of Penetration Prediction and Optimization Using Advances in Artificial Neural Networks, a Comparative Study. 
Proceedings of the 4th International Joint Conference on Computational Intelligence, Barcelona, Spain, 5\u20137.","journal-title":"Proceedings of the 4th International Joint Conference on Computational Intelligence"},{"key":"e_1_3_3_9_1","first-page":"675","article-title":"A statistical solution for cost estimation in oil well drilling","volume":"72","author":"Amorim D.S.","year":"2019","unstructured":"Amorim D.S., Santos O.L.A., Azevedo R.C.D. (2019). A statistical solution for cost estimation in oil well drilling. REM-International Engineering Journal, 72, 675\u2013683. doi: https:\/\/doi.org\/10.1590\/0370-44672018720183","journal-title":"REM-International Engineering Journal"},{"key":"e_1_3_3_10_1","doi-asserted-by":"publisher","DOI":"10.1088\/1742-2140\/aaac5d"},{"key":"e_1_3_3_11_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.petrol.2018.12.013"},{"key":"e_1_3_3_12_1","doi-asserted-by":"publisher","DOI":"10.1007\/s13202-020-01066-1"},{"key":"e_1_3_3_13_1","doi-asserted-by":"publisher","DOI":"10.2118\/15362-MS"},{"key":"e_1_3_3_14_1","unstructured":"Byrom T.G. (2014) Casing and Liners for Drilling and Completion: Design and Application: Elsevier."},{"key":"e_1_3_3_15_1","article-title":"Machine learning and natural language processing for automated analysis of drilling and completion data","author":"Casti\u00f1eira D.","year":"2018","unstructured":"Casti\u00f1eira D., Toronyi R., Saleri N. (2018). Machine learning and natural language processing for automated analysis of drilling and completion data. SPE Kingdom of Saudi Arabia Annual Technical Symposium and Exhibition, SPE\u2013192280\u2013MS. 
doi: https:\/\/doi.org\/10.2118\/192280-MS","journal-title":"SPE Kingdom of Saudi Arabia Annual Technical Symposium and Exhibition"},{"key":"e_1_3_3_16_1","doi-asserted-by":"publisher","DOI":"10.3389\/fpsyg.2021.667802"},{"key":"e_1_3_3_17_1","doi-asserted-by":"publisher","DOI":"10.1080\/00273171.2023.2193600"},{"key":"e_1_3_3_18_1","article-title":"Probabilistic well time estimation using operations reporting data","author":"Codling J.","year":"2013","unstructured":"Codling J., Leatherby J. (2013). Probabilistic well time estimation using operations reporting data. SPE Digital Energy Conference and Exhibition, SPE\u2013163687\u2013MS. doi: https:\/\/doi.org\/10.2118\/163687-MS","journal-title":"SPE Digital Energy Conference and Exhibition"},{"key":"e_1_3_3_19_1","first-page":"6","article-title":"Risk assessment of drilling and completion operations in petroleum wells using a monte carlo and a neural network approach","author":"Coelho D.K.","year":"2005","unstructured":"Coelho D.K., Roisenberg M., F Filho P.J., Jacinto C.M.C. (2005). Risk assessment of drilling and completion operations in petroleum wells using a monte carlo and a neural network approach. Proceedings of the Winter Simulation Conference, 2005, p. 6. doi: https:\/\/doi.org\/10.1109\/WSC.2005.1574466","journal-title":"Proceedings of the Winter Simulation Conference"},{"key":"e_1_3_3_20_1","doi-asserted-by":"publisher","DOI":"10.1037\/1082-989X.6.4.330"},{"key":"e_1_3_3_21_1","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0157077"},{"key":"e_1_3_3_22_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11336-023-09918-5"},{"key":"e_1_3_3_23_1","first-page":"4256","article-title":"Structured Feature Selection","author":"Gao T.","year":"2015","unstructured":"Gao T., Wang Z., Ji Q. (2015). Structured Feature Selection. Proceedings of the IEEE International Conference on Computer Vision, pp. 
4256\u20134264.","journal-title":"Proceedings of the IEEE International Conference on Computer Vision"},{"key":"e_1_3_3_24_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jrmge.2017.02.002"},{"key":"e_1_3_3_25_1","doi-asserted-by":"publisher","DOI":"10.21817\/indjcse\/2022\/v13i6\/221306103"},{"key":"e_1_3_3_26_1","doi-asserted-by":"publisher","DOI":"10.1146\/annurev.psych.58.110405.085530"},{"key":"e_1_3_3_27_1","first-page":"1","article-title":"On the performance of multiple imputation for multivariate data with small sample size","volume":"50","author":"Graham J.W.","year":"1999","unstructured":"Graham J.W., Schafer J.L. (1999). On the performance of multiple imputation for multivariate data with small sample size. Statistical Strategies for Small Sample Research, 50, 1\u201327.","journal-title":"Statistical Strategies for Small Sample Research"},{"key":"e_1_3_3_28_1","doi-asserted-by":"crossref","unstructured":"Hamrick T.R. (2011). Optimization of Operating Parameters for Minimum Mechanical Specific Energy in Drilling: West Virginia University.","DOI":"10.2172\/1060223"},{"key":"e_1_3_3_29_1","doi-asserted-by":"publisher","DOI":"10.7569\/JSEE.2014.629520"},{"key":"e_1_3_3_30_1","doi-asserted-by":"publisher","DOI":"10.2118\/135166-PA"},{"issue":"1","key":"e_1_3_3_31_1","first-page":"1","article-title":"A survey of drilling cost and complexity estimation models","volume":"1","author":"Kaiser M.J.","year":"2007","unstructured":"Kaiser M.J. (2007). A survey of drilling cost and complexity estimation models. 
International Journal of Petroleum Science and Technology, 1(1), 1\u201322.","journal-title":"International Journal of Petroleum Science and Technology"},{"key":"e_1_3_3_32_1","first-page":"1","article-title":"Multi-source synthesis, harmonization, and inventory of critical infrastructure and human-impacted areas in permafrost regions of Alaska (Sirius)","volume":"2024","author":"Kaiser S.","year":"2024","unstructured":"Kaiser S., Boike J., Grosse G., Langer M. (2024). Multi-source synthesis, harmonization, and inventory of critical infrastructure and human-impacted areas in permafrost regions of Alaska (Sirius). Earth System Science Data Discussions, 2024, 1\u201353. doi: https:\/\/doi.org\/10.5194\/essd-2023-393","journal-title":"Earth System Science Data Discussions"},{"key":"e_1_3_3_33_1","doi-asserted-by":"crossref","first-page":"430","DOI":"10.1007\/3-540-36182-0_45","article-title":"Feature Selection for Propositionalization","volume":"5","author":"Krogel M.-A.","year":"2002","unstructured":"Krogel M.-A., Wrobel S. (2002). Feature Selection for Propositionalization. Discovery Science: 5th International Conference, DS 2002 L\u00fcbeck, Germany, Proceedings 5, pp. 430\u2013434.","journal-title":"Discovery Science: 5th International Conference, DS 2002 L\u00fcbeck, Germany, Proceedings"},{"key":"e_1_3_3_34_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11121-016-0644-5"},{"key":"e_1_3_3_35_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.asoc.2017.07.012"},{"key":"e_1_3_3_36_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2019.07.010"},{"key":"e_1_3_3_37_1","article-title":"The how\u2019s and why\u2019s of probabilistic well cost estimation","author":"L\u00f8berg T.","year":"2008","unstructured":"L\u00f8berg T., Arild \u00d8., Merlo A., D\u2019Alesio P. (2008). The how\u2019s and why\u2019s of probabilistic well cost estimation. 
IADC\/SPE Asia Pacific Drilling Technology Conference and Exhibition, SPE\u2013114696\u2013MS. doi: https:\/\/doi.org\/10.2118\/114696-MS","journal-title":"IADC\/SPE Asia Pacific Drilling Technology Conference and Exhibition"},{"key":"e_1_3_3_38_1","first-page":"1","article-title":"To impute or not impute: That\u2019s the question","author":"Lodder P.","year":"2013","unstructured":"Lodder P. (2013). To impute or not impute: That\u2019s the question. Advising on Research Methods: Selected Topics, 1\u20137.","journal-title":"Advising on Research Methods: Selected Topics"},{"key":"e_1_3_3_39_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.petrol.2014.03.012"},{"key":"e_1_3_3_40_1","doi-asserted-by":"publisher","DOI":"10.3390\/en12050942"},{"key":"e_1_3_3_41_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jclinepi.2019.02.016"},{"key":"e_1_3_3_42_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.petrol.2020.107338"},{"key":"e_1_3_3_43_1","doi-asserted-by":"publisher","DOI":"10.3390\/rs12040601"},{"key":"e_1_3_3_44_1","article-title":"An investigation of different approaches for probabilistic cost and time estimation of rigless P&A in subsea multi-well campaign","author":"Moeinikia F.","year":"2014","unstructured":"Moeinikia F., Fjelde K.K., Saasen A., Vr\u00e5lstad T. (2014). An investigation of different approaches for probabilistic cost and time estimation of rigless P&A in subsea multi-well campaign. SPE Norway Subsurface Conference, SPE\u2013169203\u2013MS. doi: https:\/\/doi.org\/10.2118\/169203-MS","journal-title":"SPE Norway Subsurface Conference"},{"issue":"8","key":"e_1_3_3_45_1","first-page":"567","article-title":"Current trends and future development in casing drilling","volume":"2","author":"Mohammed A.","year":"2012","unstructured":"Mohammed A., Okeke C.J., Abolle-Okoyeagu I. (2012). Current trends and future development in casing drilling. 
International Journal of Science and Technology, 2(8), 567\u2013582.","journal-title":"International Journal of Science and Technology"},{"key":"e_1_3_3_46_1","first-page":"44","article-title":"Do we need to observe features to perform feature selection?","author":"Motl J.","year":"2018","unstructured":"Motl J., Kord\u00edk P. (2018). Do we need to observe features to perform feature selection? ITAT, 44\u201351.","journal-title":"ITAT"},{"key":"e_1_3_3_47_1","doi-asserted-by":"publisher","DOI":"10.1093\/acprof:oso\/9780199672547.003.0005"},{"key":"e_1_3_3_48_1","article-title":"Development of well complexity index to improve risk and cost assessments of oil and gas wells","author":"Nzeda B.G.","year":"2014","unstructured":"Nzeda B.G., Schamp J.H., Schmitt T. (2014). Development of well complexity index to improve risk and cost assessments of oil and gas wells. SPE\/IADC Drilling Conference and Exhibition, SPE\u2013167932\u2013MS.","journal-title":"SPE\/IADC Drilling Conference and Exhibition"},{"key":"e_1_3_3_49_1","article-title":"An overview to applicability of multilateral drilling in the middle east fields","author":"Paiaman A.M.","year":"2009","unstructured":"Paiaman A.M., Moghadasi J. (2009). An overview to applicability of multilateral drilling in the middle east fields. SPE Offshore Europe Conference and Exhibition, SPE\u2013123955\u2013MS. doi: https:\/\/doi.org\/10.2118\/123955-MS","journal-title":"SPE Offshore Europe Conference and Exhibition"},{"key":"e_1_3_3_50_1","first-page":"98","article-title":"Lazybum: Decision tree learning using lazy propositionalization","author":"Schouterden J.","year":"2019","unstructured":"Schouterden J., Davis J., Blockeel H. (2019). Lazybum: Decision tree learning using lazy propositionalization. International Conference on Inductive Logic Programming, pp.98\u2013113. 
doi: https:\/\/doi.org\/10.1007\/978-3-030-49210-6_9","journal-title":"International Conference on Inductive Logic Programming"},{"key":"e_1_3_3_51_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2018.10.085"},{"key":"e_1_3_3_52_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.petrol.2017.08.064"},{"key":"e_1_3_3_53_1","article-title":"Casing drilling vs. liner drilling: Critical analysis of an operation in the Gulf of Mexico","author":"Steppe R.","year":"2005","unstructured":"Steppe R., Clark L., Johns R. (2005). Casing drilling vs. liner drilling: Critical analysis of an operation in the Gulf of Mexico. SPE Annual Technical Conference and Exhibition, SPE\u201396810\u2013MS. doi: https:\/\/doi.org\/10.2118\/96810-MS","journal-title":"SPE Annual Technical Conference and Exhibition"},{"key":"e_1_3_3_54_1","article-title":"Application of neural network to the determination of well-test interpretation model for horizontal wells","author":"Sultan M.A.","year":"2002","unstructured":"Sultan M.A., Al-Kaabi A.U. (2002). Application of neural network to the determination of well-test interpretation model for horizontal wells. SPE Asia Pacific Oil and Gas Conference and Exhibition, SPE\u201377878\u2013MS. doi: https:\/\/doi.org\/10.2118\/77878-MS","journal-title":"SPE Asia Pacific Oil and Gas Conference and Exhibition"},{"key":"e_1_3_3_55_1","first-page":"45","article-title":"Feature reduction of relational oil drilling data before propositionalization and harmonization by measuring relational data missingness","author":"Ting C.P.W.","year":"2022","unstructured":"Ting C.P.W., Then P.H.H. (2022). Feature reduction of relational oil drilling data before propositionalization and harmonization by measuring relational data missingness. ASEAN Australian Engineering Congress, 45\u201355. doi: https:\/\/doi.org\/10.1007\/978-981-99-5547-3_4","journal-title":"ASEAN Australian Engineering Congress"},{"key":"e_1_3_3_56_1","unstructured":"Veeningen D. Givens K. 
Ravichandran G. Jeffers J. (2009). Method system and program storage device for automatically calculating and displaying time and cost data in a well planning system using a monte carlo simulation software: Google patents."},{"key":"e_1_3_3_57_1","article-title":"An automated system for predicting drilling performance","author":"Whelehan O.","year":"1994","unstructured":"Whelehan O., Thorogood J. (1994). An automated system for predicting drilling performance. SPE\/IADC Drilling Conference and Exhibition, SPE\u201327487\u2013MS. doi: https:\/\/doi.org\/10.2118\/27487-MS","journal-title":"SPE\/IADC Drilling Conference and Exhibition"},{"key":"e_1_3_3_58_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2017.2685597"},{"key":"e_1_3_3_59_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neuroimage.2013.08.015"},{"key":"e_1_3_3_60_1","doi-asserted-by":"publisher","DOI":"10.1080\/01621459.2020.1751176"},{"key":"e_1_3_3_61_1","doi-asserted-by":"publisher","DOI":"10.3390\/rs11111266"},{"key":"e_1_3_3_62_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.inffus.2021.10.017"},{"key":"e_1_3_3_63_1","article-title":"Multi-source learning via completion of block-wise overlapping noisy matrices","author":"Zhou D.","year":"2021","unstructured":"Zhou D., Cai T., Lu J. (2021). Multi-source learning via completion of block-wise overlapping noisy matrices. arXiv preprint arXiv:2105.10360. 
doi: https:\/\/doi.org\/10.48550\/arXiv.2105.10360","journal-title":"arXiv preprint arXiv:2105.10360"},{"key":"e_1_3_3_64_1","doi-asserted-by":"publisher","DOI":"10.1093\/biostatistics\/kxy052"}],"container-title":["Journal of Integrated Design and Process Science: Transactions of the SDPS, Official Journal of the Society for Design and Process Science"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.3233\/JID-230026","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.3233\/JID-230026","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.3233\/JID-230026","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T22:56:36Z","timestamp":1777503396000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.3233\/JID-230026"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,8,27]]},"references-count":63,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2025,5]]}},"alternative-id":["10.3233\/JID-230026"],"URL":"https:\/\/doi.org\/10.3233\/jid-230026","relation":{},"ISSN":["1092-0617","1875-8959"],"issn-type":[{"value":"1092-0617","type":"print"},{"value":"1875-8959","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,8,27]]}}}