International Journal of Soft Computing

ISSN: Online
ISSN: Print 1816-9503

Data Sovereignty and the Myth of the Universal Dataset: A Critical Review of Benchmarking in Machine Learning

Erwin L. Rimban
Page: 1-8 | Received 05 Feb 2024, Published online: 29 Apr 2024

Abstract

This paper presents a critical review of the concept of universal benchmark datasets in machine learning through the lens of data sovereignty and decolonial theory. While benchmark datasets like ImageNet, COCO and GLUE have become standard tools for evaluating model performance, they often reflect Western cultural norms, linguistic biases and geopolitical priorities. Drawing on theoretical frameworks from Walter Mignolo's epistemic disobedience, Boaventura de Sousa Santos's epistemologies of the South, Miranda Fricker's epistemic injustice and Philip Alston's digital colonialism, this paper critically examines the historical development, construction politics and universality claims of benchmark datasets. The analysis reveals how these datasets marginalize non-Western knowledge systems and perpetuate colonial power dynamics in data practices. As alternatives, this paper proposes data pluriverses, co-design frameworks for localized benchmarking, decentralized dataset stewardship and integration of Indigenous data governance principles like CARE (Collective Benefit, Authority to Control, Responsibility, Ethics). The paper concludes by emphasizing the urgent need to dismantle universalist assumptions in AI development and calls for more ethical and pluralistic data practices in machine learning research.


How to cite this article:

Erwin L. Rimban. Data Sovereignty and the Myth of the Universal Dataset: A Critical Review of Benchmarking in Machine Learning.
DOI: https://doi.org/10.36478/makijsc.2024.1.8
URL: https://www.makhillpublications.co/view-article/1816-9503/makijsc.2024.1.8