{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,3]],"date-time":"2026-06-03T21:26:19Z","timestamp":1780521979814,"version":"3.54.1"},"reference-count":28,"publisher":"MDPI AG","issue":"8","license":[{"start":{"date-parts":[[2025,8,2]],"date-time":"2025-08-02T00:00:00Z","timestamp":1754092800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>I address the challenge of extracting reliable insights from large datasets using a simplified model that illustrates how hierarchical classification can distort outcomes. The model consists of discrete pixels labeled red, blue, or white. Red and blue indicate distinct properties, while white represents unclassified or ambiguous data. A macro-color is assigned only if one color holds a strict majority among the pixels. Otherwise, the aggregate is labeled white, reflecting uncertainty. This setup mimics a percolation threshold at fifty percent. Assuming that directly accessing the various proportions from the data of colors is infeasible, I implement a hierarchical coarse-graining procedure. Elements (first pixels, then aggregates) are recursively grouped and reclassified via local majority rules, ultimately producing a single super-aggregate for which the color represents the inferred macro-property of the collection of pixels as a whole. Analytical results supported by simulations show that the process introduces additional white aggregates beyond white pixels, which could be present initially; these arise from groups lacking a clear majority, requiring arbitrary symmetry-breaking decisions to attribute a color to them. While each local resolution may appear minor and inconsequential, their repetitions introduce a growing systematic bias. 
Even with complete data, unavoidable asymmetries in local rules are shown to skew outcomes. This study highlights a critical limitation of recursive data reduction. Insight extraction is shaped not only by data quality but also by how local ambiguity is handled, resulting in built-in biases. Thus, the related flaws are not due to the data but to structural choices made during local aggregations. Although based on a simple model, these findings expose a high likelihood of inherent flaws in widely used hierarchical classification techniques.<\/jats:p>","DOI":"10.3390\/info16080661","type":"journal-article","created":{"date-parts":[[2025,8,5]],"date-time":"2025-08-05T10:50:21Z","timestamp":1754391021000},"page":"661","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Ambiguities, Built-In Biases, and Flaws in Big Data Insight Extraction"],"prefix":"10.3390","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9003-5420","authenticated-orcid":false,"given":"Serge","family":"Galam","sequence":"first","affiliation":[{"name":"CEVIPOF \u2014Centre for Political Research, SciencesPo and CNRS, 1, Place Saint-Thomas d\u2019Aquin, 75007 Paris, France"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2025,8,2]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"20150202","DOI":"10.1098\/rsta.2015.0202","article-title":"Principal Component Analysis: A Review and Recent Developments","volume":"374","author":"Jolliffe","year":"2016","journal-title":"Philos. Trans. R. Soc. A"},{"key":"ref_2","first-page":"2579","article-title":"Visualizing Data Using t-SNE","volume":"9","author":"Maaten","year":"2008","journal-title":"J. Mach. Learn. Res."},{"key":"ref_3","unstructured":"Ester, M., Kriegel, H.-P., Sander, J., and Xu, X. (1996, January 2\u20134). 
A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining (KDD-96), Portland, OR, USA."},{"key":"ref_4","unstructured":"(2025, April 05). Coarse-Grained Modeling. Available online: https:\/\/en.wikipedia.org\/wiki\/Coarse-grained_modeling."},{"key":"ref_5","unstructured":"Goldenfeld, N. (1992). Lectures on Phase Transitions and the Renormalization Group, CRC Press."},{"key":"ref_6","unstructured":"Barocas, S., Hardt, M., and Narayanan, A. (2025, April 05). Fairness and Machine Learning: Limitations and Opportunities. Available online: http:\/\/fairmlbook.org."},{"key":"ref_7","unstructured":"Doshi-Velez, F., and Kim, B. (2017). Towards A Rigorous Science of Interpretable Machine Learning. arXiv."},{"key":"ref_8","unstructured":"Lipton, Z.C. (2016, January 23). The Mythos of Model Interpretability. Proceedings of the 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), New York, NY, USA."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"113119","DOI":"10.1016\/j.chaos.2023.113119","article-title":"Identifying a would-be terrorist: An ineradicable error in the data processing?","volume":"168","author":"Galam","year":"2023","journal-title":"Chaos Solitons Fractals"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Galam, S., and Cheon, T. (2020). Tipping points in opinion dynamics: A universal formula in five dimensions. Front. Phys., 8.","DOI":"10.3389\/fphy.2020.566580"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"3619","DOI":"10.1016\/j.physa.2010.04.039","article-title":"Public debates driven by incomplete scientific data: The cases of evolution theory, global warming and H1N1 pandemic influenza","volume":"389","author":"Galam","year":"2010","journal-title":"Phys. 
A"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"209","DOI":"10.1016\/j.chaos.2016.03.011","article-title":"The invisible hand and the rational agent are behind bubbles and crashes","volume":"88","author":"Galam","year":"2016","journal-title":"Chaos Solitons Fractals"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.chaos.2016.03.022","article-title":"Preface: Complexity in quantitative finance and economics","volume":"88","author":"Anteneodo","year":"2016","journal-title":"Chaos Solitons Fractals"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Alencar, D.S.M., Alves, T.F.A., Alves, G.A., Macedo-Filho, A., Ferreira, R.S., Lima, F.W.S., and Plascak, J.A. (2023). Opinion Dynamics Systems on Barab\u00e1si-Albert Networks: Biswas-Chatterjee-Sen Model. Entropy, 25.","DOI":"10.3390\/e25020183"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"2450125","DOI":"10.1142\/S0129183124501250","article-title":"Phase transition and universality of the majority-rule model on complex networks","volume":"35","author":"Mulya","year":"2024","journal-title":"Int. J. Mod. Phys. C"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"101667","DOI":"10.1016\/j.jocs.2022.101667","article-title":"On reaching the consensus by disagreeing","volume":"61","author":"Weron","year":"2022","journal-title":"J. Comput. Sci."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"128714","DOI":"10.1016\/j.physa.2023.128714","article-title":"Exploring the foundation of social diversity and coherence with a novel attraction-repulsion model framework","volume":"618","author":"Cui","year":"2023","journal-title":"Phys. 
A"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"468","DOI":"10.3390\/physics6020031","article-title":"A Theory of Best Choice Selection through Objective Arguments Grounded in Linear Response Theory Concepts","volume":"6","author":"Ausloos","year":"2024","journal-title":"Physics"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"1013","DOI":"10.3390\/physics6030062","article-title":"Agent Mental Models and Bayesian Rules as a Tool to Create Opinion Dynamics Models","volume":"6","author":"Martins","year":"2024","journal-title":"Physics"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Dworak, M., and Malarz, K. (2023). Vanishing Opinions in Latan\u00e9 Model of Opinion Formation. Entropy, 25.","DOI":"10.3390\/e25010058"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Devia, C.A., and Giordano, G. (2023). Probabilistic analysis of agent-based opinion formation models. Sci. Rep., 13.","DOI":"10.1038\/s41598-023-46789-3"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Devia, C.A., and Giordano, G. (2024). Graphical analysis of agent-based opinion formation models. PLoS ONE, 19.","DOI":"10.1371\/journal.pone.0303204"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"2450201","DOI":"10.1142\/S0129183124502012","article-title":"Dynamics of drug trafficking: Results from a simple compartmental model","volume":"36","author":"Crokidakis","year":"2025","journal-title":"Int. J. Mod. Phys. 
C"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"114544","DOI":"10.1016\/j.chaos.2024.114544","article-title":"Breaking the symmetry neutralizes the extremization under the repulsion and higher order interactions","volume":"180","author":"Huang","year":"2024","journal-title":"Chaos Solitons Fractals"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"pgaf082","DOI":"10.1093\/pnasnexus\/pgaf082","article-title":"How out-group animosity can shape partisan divisions: A model of affective polarization","volume":"4","author":"Nettasinghe","year":"2025","journal-title":"PNAS Nexus"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Maksymov, I.S., and Pogrebna, G. (2024). The Physics of Preference: Unravelling Imprecision of Human Preferences through Magnetisation Dynamics. Information, 15.","DOI":"10.3390\/info15070413"},{"key":"ref_27","first-page":"129254","article-title":"A mathematical model for the bullying dynamics in schools","volume":"492","author":"Crokidakis","year":"2025","journal-title":"Appl. Math. Comput."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"13","DOI":"10.1142\/S0218202517400012","article-title":"Geometric vulnerability of democratic institutions against lobbying: A sociophysics approach","volume":"27","author":"Galam","year":"2017","journal-title":"Math. Model. Methods Appl. 
Sci."}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/16\/8\/661\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T18:21:49Z","timestamp":1760034109000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/16\/8\/661"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,8,2]]},"references-count":28,"journal-issue":{"issue":"8","published-online":{"date-parts":[[2025,8]]}},"alternative-id":["info16080661"],"URL":"https:\/\/doi.org\/10.3390\/info16080661","relation":{},"ISSN":["2078-2489"],"issn-type":[{"value":"2078-2489","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,8,2]]}}}