{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,2]],"date-time":"2025-11-02T09:04:39Z","timestamp":1762074279358,"version":"build-2065373602"},"reference-count":55,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2022,11,15]],"date-time":"2022-11-15T00:00:00Z","timestamp":1668470400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["JTAER"],"abstract":"<jats:p>A typical retailer carries 10,000 stock-keeping units (SKUs). However, these numbers may exceed hundreds of millions for giants such as Walmart and Amazon. Besides the volume, SKU data can also be high-dimensional, which means that SKUs can be segmented on the basis of various attributes. Given the data volumes and the multitude of potentially important dimensions to consider, it becomes computationally impossible to individually manage each SKU. Even though the application of clustering for SKU segmentation is common, previous studies do not address the problem of parametrization and model finetuning, which may be extremely tedious and time-consuming in real-world applications. Our work closes the research gap by proposing a solution that leverages automated machine learning for the automated cluster analysis of SKUs. The proposed framework for automated SKU segmentation incorporates minibatch K-means clustering, principal component analysis, and grid search for parameter tuning. It operates on top of the Apache Parquet file format, an efficient, structured, compressed, column-oriented, and big-data-friendly format. 
The proposed solution was tested on the basis of a real-world dataset that contained data at the pallet level.<\/jats:p>","DOI":"10.3390\/jtaer17040076","type":"journal-article","created":{"date-parts":[[2022,11,15]],"date-time":"2022-11-15T02:36:40Z","timestamp":1668479800000},"page":"1512-1528","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["AutoML Approach to Stock Keeping Units Segmentation"],"prefix":"10.3390","volume":"17","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7457-6040","authenticated-orcid":false,"given":"Ilya","family":"Jackson","sequence":"first","affiliation":[{"name":"Center for Transportation & Logistics, Massachusetts Institute of Technology, 1 Amherst Street, Cambridge, MA 02142, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2022,11,15]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Phadnis, S.S., Sheffi, Y., and Caplice, C. (2022). Scenario Creation in Supply Chain Contexts. Strategic Planning for Dynamic Supply Chains, Springer International Publishing.","DOI":"10.1007\/978-3-030-91810-1"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Jackson, I. (2022, January 4\u20137). Deep Reinforcement Learning for Supply Chain Synchronization. Proceedings of the Annual Hawaii International Conference on System Sciences, Maui, HI, USA.","DOI":"10.24251\/HICSS.2022.246"},{"key":"ref_3","unstructured":"US Department of Commerce (2022, October 15). Manufacturing and Trade Inventories and Sales, Main Page, US Census Bureau, Available online: https:\/\/www.census.gov\/mtis\/index.html."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"652","DOI":"10.3390\/jtaer17020034","article-title":"Effect of Interactivity Level and Price on Online Purchase Intention","volume":"17","author":"Summerlin","year":"2022","journal-title":"J. Theor. Appl. Electron. Commer. 
Res."},{"key":"ref_5","first-page":"748","article-title":"Universals in Management Planning and Control","volume":"43","author":"Juran","year":"1954","journal-title":"Manag. Rev."},{"key":"ref_6","unstructured":"Fisher, M., and Raman, A. (2010). The New Science of Retailing, Harvard Business Review Press."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"574","DOI":"10.1016\/0272-6963(90)90010-B","article-title":"Operations related groups (ORGs): A clustering procedure for production\/inventory systems","volume":"9","author":"Ernst","year":"1990","journal-title":"J. Oper. Manag."},{"key":"ref_8","unstructured":"Kabashkin, I., Yatskiv (Jackiva), I., and Prentkovskis, O. (2019). Unsupervised Learning-Based Stock Keeping Units Segmentation. Reliability and Statistics in Transportation and Communication, Springer International Publishing."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1111\/j.1937-5956.1997.tb00427.x","article-title":"Configuring a Supply Chain to Reduce the Cost of Demand Uncertainty","volume":"6","author":"Fisher","year":"1997","journal-title":"Prod. Oper. Manag."},{"key":"ref_10","first-page":"465","article-title":"Pack size effects on retail store inventory and storage space needs","volume":"59","author":"Das","year":"2021","journal-title":"INFOR Inf. Syst. Oper. Res."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Guin\u00e9e, J., and Heijungs, R. (2016). Introduction to Life Cycle Assessment. Sustainable Supply Chains, Springer International Publishing.","DOI":"10.1007\/978-3-319-29791-0_2"},{"key":"ref_12","unstructured":"Thomson Reuters Streetevents (2022, October 15). WMT\u2014Q4 2018 Wal Mart Stores Inc Earnings Call. Available online: https:\/\/corporate.walmart.com\/media-library\/document\/q4fy18-earnings-webcast-transcript\/_proxyDocument?id=00000161-d2c0-dfc5-a76b-f3f01e430000."},{"key":"ref_13","unstructured":"Big Commerce (2022, October 15). 
Amazon Statistics You Should Know: Opportunities to Make the Most of America\u2019s Top Online Marketplace. Available online: https:\/\/www.bigcommerce.com\/blog\/amazon-statistics\/#a-shopping-experience-beyond-compare."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"38","DOI":"10.1108\/eb054765","article-title":"Multiple Criteria ABC Analysis","volume":"6","author":"Flores","year":"1986","journal-title":"Int. J. Oper. Prod. Manag."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Jackson, I. (2020). Neuroevolutionary Approach to Metamodeling of Production-Inventory Systems with Lost-Sales and Markovian Demand. Lecture Notes in Networks and Systems, Springer International Publishing.","DOI":"10.1007\/978-3-030-44610-9_10"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Cohen, M.C., Gras, P.E., Pentecoste, A., and Zhang, R. (2022). Clustering Techniques. Demand Prediction in Retail, Springer International Publishing.","DOI":"10.1007\/978-3-030-85855-1"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"615","DOI":"10.1016\/S0360-8352(99)00155-2","article-title":"A comprehensive clustering algorithm for strategic analysis of supply chain networks","volume":"36","author":"Srinivasan","year":"1999","journal-title":"Comput. Ind. Eng."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"263","DOI":"10.1016\/j.ijpe.2012.06.010","article-title":"Empirically-driven hierarchical classification of stock keeping units","volume":"143","author":"Bacchetti","year":"2013","journal-title":"Int. J. Prod. Econ."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Hennig, C., Meila, M., Murtagh, F., and Rocci, R. (2015). Handbook of Cluster Analysis, Chapman and Hall\/CRC.","DOI":"10.1201\/b19706"},{"key":"ref_20","unstructured":"Wierzchon, S., and Klopotek, M. (2019). 
Modern Algorithms of Cluster Analysis, Studies in Big Data; Springer International Publishing."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"774","DOI":"10.1080\/09537280500180949","article-title":"Applying two-stage SOM-based clustering approaches to industrial data analysis","volume":"16","author":"Canetta","year":"2005","journal-title":"Prod. Plan. Control."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"234","DOI":"10.1287\/inte.1050.0195","article-title":"Managing Short Life-Cycle Technology Products for Agere Systems","volume":"36","author":"Wu","year":"2006","journal-title":"Interfaces"},{"key":"ref_23","unstructured":"Egas, C., and Masel, D.T. (2010, January 21\u201324). Determining Warehouse Storage Location Assignments Using Clustering Analysis. Proceedings of the 11th IMHRC Conference, Milwaukee, WI, USA."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Ozturk, Z.K., Cetin, Y., Isik, Y., and Cicek, Z.E. (2021). Demand Forecasting with Clustering and Artificial Neural Networks Methods: An Application for Stock Keeping Units. Springer Proceedings in Mathematics and Statistics, Springer International Publishing.","DOI":"10.1007\/978-3-030-78163-7_15"},{"key":"ref_25","first-page":"1","article-title":"Multiple Criteria ABC Analysis with FCM Clustering","volume":"2013","author":"Keskin","year":"2013","journal-title":"J. Ind. Eng."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Kucukdeniz, T., and Erkal, S. (2022, January 19\u201321). Integrated Warehouse Layout Planning with Fuzzy C-Means Clustering. 
Proceedings of the International Conference on Intelligent and Fuzzy Systems\u2014INFUS 2022, \u0130zmir, Turkey.","DOI":"10.1007\/978-3-031-09173-5_24"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1049\/iet-ipr.2016.0282","article-title":"Generalised fuzzy c-means clustering algorithm with local information","volume":"11","author":"Memon","year":"2017","journal-title":"IET Image Process."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"667","DOI":"10.1108\/IMDS-09-2015-0361","article-title":"Constrained clustering method for class-based storage location assignment in warehouse","volume":"116","author":"Yang","year":"2016","journal-title":"Ind. Manag. Data Syst."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"45","DOI":"10.1007\/s11334-020-00372-5","article-title":"Product recommendation for e-commerce business by applying principal component analysis (PCA) and K-means clustering: Benefit for the society","volume":"17","author":"Bandyopadhyay","year":"2020","journal-title":"Innov. Syst. Softw. Eng."},{"key":"ref_30","first-page":"2349","article-title":"Orange: Data Mining Toolbox in Python","volume":"14","author":"Gorup","year":"2013","journal-title":"J. Mach. Learn. Res."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Liu, B. (2011). Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data, Springer. [2nd ed.]. Data-Centric Systems and Applications.","DOI":"10.1007\/978-3-642-19460-3"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"603","DOI":"10.1109\/34.1000236","article-title":"Mean shift: A robust approach toward feature space analysis","volume":"24","author":"Comaniciu","year":"2002","journal-title":"IEEE Trans. Pattern Anal. Mach. 
Intell."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"92","DOI":"10.1016\/j.chemolab.2012.11.006","article-title":"Revised DBSCAN algorithm to cluster data with dense adjacent clusters","volume":"120","author":"Tran","year":"2013","journal-title":"Chemom. Intell. Lab. Syst."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Vohra, D. (2016). Apache Parquet. Practical Hadoop Ecosystem, Apress.","DOI":"10.1007\/978-1-4842-2199-0"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Floratou, A. (2019). Columnar Storage Formats. Encyclopedia of Big Data Technologies, Springer International Publishing.","DOI":"10.1007\/978-3-319-77525-8_248"},{"key":"ref_36","unstructured":"Han, J., Pei, J., Kamber, M., and Safari, A.O.M.C. (2011). Data Mining: Concepts and Techniques, Elsevier. [3rd ed.]. OCLC: 1112917381."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Sculley, D. (2010, January 26\u201330). Web-scale k-means clustering. Proceedings of the 19th International Conference on World Wide Web\u2014WWW\u201910, Raleigh, NC, USA.","DOI":"10.1145\/1772690.1772862"},{"key":"ref_38","unstructured":"van der Maaten, L., Postma, E.O., and van den Herik, J. (2009). Dimensionality Reduction: A Comparative Review, Tilburg University."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Lippi, V., and Ceccarelli, G. (2019, January 29\u201331). Incremental Principal Component Analysis: Exact Implementation and Continuity Corrections. Proceedings of the 16th International Conference on Informatics in Control, Automation and Robotics, Prague, Czech Republic.","DOI":"10.5220\/0007743604730480"},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"126","DOI":"10.1016\/j.ins.2015.06.039","article-title":"Recovering the number of clusters in data sets with noise features using feature rescaling factors","volume":"324","author":"Hennig","year":"2015","journal-title":"Inf. 
Sci."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"53","DOI":"10.1016\/0377-0427(87)90125-7","article-title":"Silhouettes: A graphical aid to the interpretation and validation of cluster analysis","volume":"20","author":"Rousseeuw","year":"1987","journal-title":"J. Comput. Appl. Math."},{"key":"ref_42","unstructured":"Pelleg, D., and Moore, A.W. (July, January 29). X-Means: Extending K-Means with Efficient Estimation of the Number of Clusters. Proceedings of the Seventeenth International Conference on Machine Learning (ICML\u201900), Stanford, CA, USA."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"35","DOI":"10.1186\/s13040-017-0155-3","article-title":"Ten quick tips for machine learning in computational biology","volume":"10","author":"Chicco","year":"2017","journal-title":"BioData Min."},{"key":"ref_44","first-page":"281","article-title":"Random Search for Hyper-Parameter Optimization","volume":"13","author":"Bergstra","year":"2012","journal-title":"J. Mach. Learn. Res."},{"key":"ref_45","unstructured":"McKinney, W. (July, January 28). Data Structures for Statistical Computing in Python. Proceedings of the 9th Python in Science Conference, Austin, TX, USA."},{"key":"ref_46","unstructured":"Pandas Development Team (2022, November 01). pandas-dev\/pandas: Pandas. Available online: https:\/\/zenodo.org\/record\/7223478#.Y3HSduRBxPY."},{"key":"ref_47","first-page":"2825","article-title":"Scikit-learn: Machine Learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J. Mach. Learn. Res."},{"key":"ref_48","unstructured":"Buitinck, L., Louppe, G., Blondel, M., Pedregosa, F., Mueller, A., Grisel, O., Niculae, V., Prettenhofer, P., Gramfort, A., and Grobler, J. (2013, January 23). API design for machine learning software: Experiences from the scikit-learn project. 
Proceedings of the ECML PKDD Workshop: Languages for Data Mining and Machine Learning, Prague, Czech Republic."},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"125","DOI":"10.1007\/s11263-007-0075-7","article-title":"Incremental Learning for Robust Visual Tracking","volume":"77","author":"Ross","year":"2007","journal-title":"Int. J. Comput. Vis."},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"3021","DOI":"10.21105\/joss.03021","article-title":"seaborn: Statistical data visualization","volume":"6","author":"Waskom","year":"2021","journal-title":"J. Open Source Softw."},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"261","DOI":"10.1038\/s41592-019-0686-2","article-title":"SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python","volume":"17","author":"Virtanen","year":"2020","journal-title":"Nat. Methods"},{"key":"ref_52","doi-asserted-by":"crossref","first-page":"67","DOI":"10.1109\/4235.585893","article-title":"No free lunch theorems for optimization","volume":"1","author":"Wolpert","year":"1997","journal-title":"IEEE Trans. Evol. Comput."},{"key":"ref_53","doi-asserted-by":"crossref","first-page":"35","DOI":"10.1007\/s00357-001-0004-3","article-title":"K-modes Clustering","volume":"18","author":"Chaturvedi","year":"2001","journal-title":"J. Classif."},{"key":"ref_54","doi-asserted-by":"crossref","first-page":"590","DOI":"10.1016\/j.neucom.2013.04.011","article-title":"An improved k-prototypes clustering algorithm for mixed numeric and categorical data","volume":"120","author":"Ji","year":"2013","journal-title":"Neurocomputing"},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Guo, X., Liu, X., Zhu, E., and Yin, J. (2017). Deep Clustering with Convolutional Autoencoders. 
Neural Information Processing, Springer International Publishing.","DOI":"10.1007\/978-3-319-70096-0_39"}],"container-title":["Journal of Theoretical and Applied Electronic Commerce Research"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/0718-1876\/17\/4\/76\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T01:18:11Z","timestamp":1760145491000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/0718-1876\/17\/4\/76"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,11,15]]},"references-count":55,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2022,12]]}},"alternative-id":["jtaer17040076"],"URL":"https:\/\/doi.org\/10.3390\/jtaer17040076","relation":{},"ISSN":["0718-1876"],"issn-type":[{"type":"electronic","value":"0718-1876"}],"subject":[],"published":{"date-parts":[[2022,11,15]]}}}