TY - JOUR
T1 - Data Sovereignty and the Myth of the Universal Dataset: A Critical Review of Benchmarking in Machine Learning
AU - Rimban, Erwin
JO - International Journal of Soft Computing
VL - 19
IS - 1
SP - 1
EP - 8
PY - 2024
DA - 2001/08/19
SN - 1816-9503
DO - makijsc.2024.1.8
UR - https://makhillpublications.co/view-article.php?doi=makijsc.2024.1.8
KW - Artificial intelligence
KW - benchmark datasets
KW - data sovereignty
KW - decolonial theory
KW - machine learning
KW - epistemic justice
KW - indigenous data governance
KW - data pluriverses
AB -
This paper presents a critical review of the concept of universal benchmark datasets in machine learning through the lens of data sovereignty and decolonial theory. While benchmark datasets such as ImageNet, COCO and GLUE have become standard tools for evaluating model performance, they often reflect Western cultural norms, linguistic biases and geopolitical priorities. Drawing on theoretical frameworks from Walter Mignolo's epistemic disobedience, Boaventura de Sousa Santos's epistemologies of the South, Miranda Fricker's epistemic injustice and Philip Alston's digital colonialism, this paper critically examines the historical development, construction politics and universality claims of benchmark datasets. The analysis reveals how these datasets marginalize non‐Western knowledge systems and perpetuate colonial power dynamics in data practices. As alternatives, this paper proposes data pluriverses, co‐design frameworks for localized benchmarking, decentralized dataset stewardship and integration of Indigenous data governance principles such as CARE (Collective Benefit, Authority to Control, Responsibility, Ethics). The paper concludes by emphasizing the urgent need to dismantle universalist assumptions in AI development and calls for more ethical and pluralistic data practices in machine learning research.
ER -