TY  - JOUR
T1  - Data Sovereignty and the Myth of the Universal Dataset: A Critical Review of Benchmarking in Machine Learning
AU  - Rimban, Erwin
JO  - International Journal of Soft Computing
VL  - 19
IS  - 1
SP  - 1
EP  - 8
PY  - 2024
DA  - 2001/08/19
SN  - 1816-9503
DO  - makijsc.2024.1.8
UR  - https://makhillpublications.co/view-article.php?doi=makijsc.2024.1.8
KW  - artificial intelligence
KW  - benchmark datasets
KW  - data sovereignty
KW  - decolonial theory
KW  - machine learning
KW  - epistemic justice
KW  - indigenous data governance
KW  - data pluriverses
AB  - This paper presents a critical review of the concept of universal benchmark datasets in machine learning through the lens of data sovereignty and decolonial theory. While benchmark datasets like ImageNet, COCO and GLUE have become standard tools for evaluating model performance, they often reflect Western cultural norms, linguistic biases and geopolitical priorities. Drawing on theoretical frameworks from Walter Mignolo's epistemic disobedience, Boaventura de Sousa Santos's epistemologies of the South, Miranda Fricker's epistemic injustice and Philip Alston's digital colonialism, this paper critically examines the historical development, construction politics and universality claims of benchmark datasets. The analysis reveals how these datasets marginalize non‐Western knowledge systems and perpetuate colonial power dynamics in data practices. As alternatives, this paper proposes data pluriverses, co‐design frameworks for localized benchmarking, decentralized dataset stewardship and integration of Indigenous data governance principles like CARE (Collective Benefit, Authority to Control, Responsibility, Ethics). The paper concludes by emphasizing the urgent need to dismantle universalist assumptions in AI development and calls for more ethical and pluralistic data practices in machine learning research.

ER  -