{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T03:34:17Z","timestamp":1760240057023,"version":"build-2065373602"},"reference-count":39,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2019,1,31]],"date-time":"2019-01-31T00:00:00Z","timestamp":1548892800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Data"],"abstract":"<jats:p>E-commerce businesses employ recommender models to assist in identifying a personalized set of products for each visitor. To accurately assess the recommendations\u2019 influence on customer clicks and buys, three target areas\u2014customer behavior, data collection, user-interface\u2014will be explored for possible sources of erroneous data. Varied customer behavior misrepresents the recommendations\u2019 true influence on a customer due to the presence of B2B interactions and outlier customers. Non-parametric statistical procedures for outlier removal are delineated and other strategies are investigated to account for the effect of a large percentage of new customers or high bounce rates. Subsequently, in data collection we identify probable misleading interactions in the raw data, propose a robust method of tracking unique visitors, and accurately attributing the buy influence for combo products. Lastly, user-interface issues discuss the possible problems caused due to the recommendation widget\u2019s positioning on the e-commerce website and the stringent conditions that should be imposed when utilizing data from the product listing page. 
This collective methodology results in an exact and valid estimation of the customer\u2019s interactions influenced by the recommendation model in the context of standard industry metrics, such as Click-through rates, Buy-through rates, and Conversion revenue.<\/jats:p>","DOI":"10.3390\/data4010023","type":"journal-article","created":{"date-parts":[[2019,2,1]],"date-time":"2019-02-01T03:08:05Z","timestamp":1548990485000},"page":"23","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":11,"title":["Data Preprocessing for Evaluation of Recommendation Models in E-Commerce"],"prefix":"10.3390","volume":"4","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2568-0199","authenticated-orcid":false,"given":"Namrata","family":"Chaudhary","sequence":"first","affiliation":[{"name":"Boxx.ai | AI for E-commerce, Data Science dept., Bengaluru 560095, India"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9336-8650","authenticated-orcid":false,"given":"Drimik","family":"Roy Chowdhury","sequence":"additional","affiliation":[{"name":"University of Michigan, Department of Mathematics, Ann Arbor, USA 48109 & Boxx.ai | AI for E-commerce, Data Science dept., Bengaluru 560095, India"}]}],"member":"1968","published-online":{"date-parts":[[2019,1,31]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"101","DOI":"10.1007\/s11257-011-9112-x","article-title":"Recommender systems: From algorithms to user experience","volume":"22","author":"Konstan","year":"2012","journal-title":"User Model. User-Adapt. Interact."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Pu, P., Chen, L., and Hu, R. (2011, January 23\u201327). A user-centric evaluation framework for recommender systems. 
Proceedings of the Fifth ACM Conference on Recommender Systems (RecSys\u201911), Chicago, IL, USA.","DOI":"10.1145\/2043932.2043962"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"32","DOI":"10.4018\/ijeei.2013100103","article-title":"Recommender Systems: The Importance of Personalization in E-Business Environments","volume":"4","author":"Polatidis","year":"2013","journal-title":"Int. J. E-Entrep. Innov."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Ricci, F., Rokach, L., Shapira, B., and Kantor, P. (2011). Evaluating Recommendation Systems. Recommender Systems Handbook, Springer.","DOI":"10.1007\/978-0-387-85820-3"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1145\/963770.963772","article-title":"Evaluating collaborative filtering recommender systems","volume":"22","author":"Herlocker","year":"2004","journal-title":"ACM Trans. Inf. Syst."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/2151163.2151166","article-title":"Impact of data characteristics on recommender systems performance","volume":"3","author":"Adomavicius","year":"2012","journal-title":"ACM Trans. Manag. Inf. Syst."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Amatriain, X., Pujol, J., and Oliver, N. (2009, January 22\u201326). I like it... I like it not: Evaluating user ratings noise in recommender systems. Proceedings of the 17th International Conference on User Modeling, Adaptation and Personalization (UMAP), Trento, Italy.","DOI":"10.1007\/978-3-642-02247-0_24"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Tan, P.N., and Kumar, V. (2004). Discovery of Web Robot Sessions Based on Their Navigational Patterns. Intelligent Technologies for Information Analysis, Springer.","DOI":"10.1007\/978-3-662-07952-2_9"},{"key":"ref_9","unstructured":"Kohavi, R., and Parekh, R. (2003, January 24\u201327). Ten Supplementary Analyses to Improve E-commerce Web Sites. 
Proceedings of the WebKDD Workshop: Web Mining as a Premise to Effective and Intelligent Web Applications, International Conference on Knowledge Discovery and Data Mining (KDD), Washington, DC, USA."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"12","DOI":"10.1145\/846183.846188","article-title":"Web Usage Mining: Discovery and Applications of Usage Patterns from Web Data","volume":"1","author":"Srivastava","year":"2000","journal-title":"SIGKDD Explor."},{"key":"ref_11","unstructured":"Kaushik, A. (2019, January 15). Bounce Rate as Sexiest Web Metric Ever. Available online: http:\/\/www.marketingprofs.com\/7\/bounce-rate-sexiest-web-metric-ever-kaushik.asp?sp=1."},{"key":"ref_12","unstructured":"Kaushik, A. (2019, January 15). Excellent Analytics Tip 11: Measure Effectiveness of Your Web Pages. Available online: https:\/\/www.kaushik.net\/avinash\/excellent-analytics-tip-11-measure-effectiveness-of-your-web-pages\/."},{"key":"ref_13","unstructured":"Robinson, B. (2019, January 15). B2C and B2B Ecommerce: What\u2019s the Difference Anyway?. Available online: https:\/\/www.business.com\/articles\/b2c-and-b2b-ecommerce-whats-the-difference-anyway\/."},{"key":"ref_14","unstructured":"Hellerstein, J.M. (2008, February 27). Quantitative Data Cleaning for Large Databases, EECS Computer Science Division, UC Berkeley. Available online: http:\/\/db.cs.berkeley.edu\/jmh."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"83","DOI":"10.1023\/B:MACH.0000035473.11134.83","article-title":"Lessons and Challenges from Mining Retail E-Commerce Data","volume":"57","author":"Kohavi","year":"2004","journal-title":"Mach. Learn."},{"key":"ref_16","unstructured":"Campbel, K. (2019, January 15). How to Track Ecommerce Shoppers Across Devices. 
Available online: https:\/\/www.practicalecommerce.com\/How-to-Track-Ecommerce-Shoppers-Across-Devices."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1080\/10864415.2001.11044222","article-title":"Economics and Electronic Commerce: Survey and Directions for Research","volume":"5","author":"Kauffman","year":"2001","journal-title":"Int. J. Electron. Commer."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"295","DOI":"10.1108\/10662240010342577","article-title":"Developing usable Web sites \u2013 a review and model","volume":"10","author":"Cunliffe","year":"2000","journal-title":"Internet Res. Electron. Netw. Appl. Policy"},{"key":"ref_19","unstructured":"Deshmukh, R., and Wangikar, V. (2011, January 8\u201310). Data Cleaning: Current Approaches and Issues. Proceedings of the Conference: IEEE International Conference on Knowledge Engineering, Aurangabad, India."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Xu, H., Li, Z., Chu, C., Chen, Y., Yang, Y., Lu, H., Wang, H., and Stavrou, A. (2018, January 3\u20137). Detecting and Characterizing Web Bot Traffic in a Large E-commerce Marketplace. Proceedings of the 23rd European Symposium on Research in Computer Security, ESORICS 2018, Barcelona, Spain.","DOI":"10.1007\/978-3-319-98989-1_8"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Suchacka, G. (2014, January 7\u201310). Analysis of aggregated bot and human traffic on e-commerce site. Proceedings of the 2014 Federated Conference on Computer Science and Information Systems, Warsaw, Poland.","DOI":"10.15439\/2014F346"},{"key":"ref_22","first-page":"279","article-title":"An Algorithmic Approach to Data Preprocessing in Web Usage Mining","volume":"2","author":"Tyagi","year":"2010","journal-title":"Int. J. Inf. Technol. Knowl. Manag."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Kohavi, R. (2001). Mining E-Commerce Data: The Good, the Bad, and the Ugly. 
Advances in Knowledge Discovery and Data Mining. PAKDD 2001, Springer. Lecture Notes in Computer Science.","DOI":"10.1145\/502512.502518"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Suchacka, G., and Sobk\u00f3w, M. (2015, January 24\u201326). Detection of Internet robots using a Bayesian approach. Proceedings of the 2015 IEEE 2nd International Conference on Cybernetics (CYBCONF), Gdynia, Poland.","DOI":"10.1109\/CYBConf.2015.7175961"},{"key":"ref_25","first-page":"532","article-title":"Bootlier-Plot: Bootstrap Based Outlier Detection Plot","volume":"65","author":"Singh","year":"2003","journal-title":"Sankhy\u0101: Indian J. Stat. (2003\u20132007)"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Candelon, B., and Metiu, N. (2019, January 15). A Distribution-Free Test for Outliers (2013). Available online: https:\/\/ssrn.com\/abstract=2796894.","DOI":"10.2139\/ssrn.2796894"},{"key":"ref_27","unstructured":"Genson, R. (2019, January 15). The B2B Ecommerce Trends Report: Millennial Buyers, Payment Options and a Maturing Market. Available online: https:\/\/www.bigcommerce.com\/blog\/b2b-ecommerce-trends\/#b2b-customer-acquisition-trends."},{"key":"ref_28","unstructured":"Oro Team (2019, January 15). B2C vs B2B Customers: How to Handle the Difference, B2B eCommerce Tips & Trends. Available online: https:\/\/oroinc.com\/b2b-ecommerce\/blog\/b2c-vs-b2b-customers-how-handle-difference."},{"key":"ref_29","unstructured":"Bailey, M. (2019, January 15). 7 Wholescale eCommerce Features You Cannot Overlook. Available online: https:\/\/www.handshake.com\/blog\/wholesale-ecommerce."},{"key":"ref_30","unstructured":"Aalberg, T., Papatheodorou, C., Dobreva, M., Tsakonas, G., and Farrugia, C.J. (2013). Persistence in Recommender Systems: Giving the Same Recommendations to the Same Users Multiple Times. Research and Advanced Technology for Digital Libraries. TPDL 2013, Springer. 
Lecture Notes in Computer Science."},{"key":"ref_31","unstructured":"Kurosu, M. (2009). Using Google Analytics to Evaluate the Usability of E-Commerce Sites. Human Centered Design. HCD 2009, Springer. Lecture Notes in Computer Science."},{"key":"ref_32","unstructured":"Sculley, D., Malkin, R.G., Basu, S., and Bayardo, R.J. (2009, June 28\u2013July 1). Predicting bounce rates in sponsored search advertisements. Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Paris, France."},{"key":"ref_33","unstructured":"Mikalef, P., Giannakos, M.N., and Pateli, A.G. (2012, January 17\u201320). Exploring the Business Potential of Social Media: An Utilitarian and Hedonic Motivation Approach. Proceedings of the Bled eConference, Bled, Slovenia."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"183","DOI":"10.1007\/s10618-010-0180-z","article-title":"Web robot detection techniques: Overview and limitations","volume":"22","author":"Doran","year":"2011","journal-title":"Data Min. Knowl. Discov."},{"key":"ref_35","unstructured":"Koshy, J. (2019, January 15). Web Scraping: Challenges and Roadblocks. Available online: https:\/\/www.promptcloud.com\/blog\/challenges-roadblocks-in-web-scraping."},{"key":"ref_36","unstructured":"Seroussi, Y. (2019, January 15). The Wonderful World of Recommender Systems. Available online: https:\/\/yanirseroussi.com\/2015\/10\/02\/the-wonderful-world-of-recommender-systems\/."},{"key":"ref_37","unstructured":"Episerver (2019, January 15). A Guide to Intelligent Personalization: Part 2\u2014Personalizing Product and Basket Pages. Available online: https:\/\/www.episerver.com\/learn\/guides\/guide-to-intelligent-ecommerce-personalization\/personalized-product-recommendations-and-product-pages\/."},{"key":"ref_38","unstructured":"Tackett, J. (2019, January 15). Online Testing: How to Use A\/A Testing to Break through the Noise. 
Available online: https:\/\/marketingexperiments.com\/a-b-testing\/use-variance-testing-break-through-noise."},{"key":"ref_39","unstructured":"(2019, January 15). A\/A Testing. Available online: https:\/\/www.optimizely.com\/optimization-glossary\/aa-testing\/."}],"container-title":["Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2306-5729\/4\/1\/23\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T12:30:01Z","timestamp":1760185801000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2306-5729\/4\/1\/23"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,1,31]]},"references-count":39,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2019,3]]}},"alternative-id":["data4010023"],"URL":"https:\/\/doi.org\/10.3390\/data4010023","relation":{},"ISSN":["2306-5729"],"issn-type":[{"type":"electronic","value":"2306-5729"}],"subject":[],"published":{"date-parts":[[2019,1,31]]}}}