Emerging Data Practices: Data Work in the Era of Large Language Models


Abstract:

Data is one of the foundations of making Artificial Intelligence (AI) work as intended. As large language models (LLMs) become the epicenter of AI, it is crucial to better understand how the datasets that maintain such models are created. The emergent nature of LLMs makes it critical to understand the challenges faced by practitioners developing Gen AI technologies, so that alternatives can be designed that respond better to Gen AI's ethical issues. In this paper, we provide such an understanding by reporting on 25 interviews with practitioners who handle data in three distinct development stages of different LLMs. Our contributions are (1) empirical evidence of how uncertainty, data practices, and reliance mechanisms change across the LLM development cycle; (2) an account of how the unique qualities of LLMs impact data practices, and of the implications for the future of Gen AI technologies; and (3) three opportunities for HCI researchers interested in supporting practitioners developing Gen AI technologies.

Publication year:

2025

Keywords:

  • AI
  • AI practitioners
  • Data governance
  • data practices
  • Data work
  • GenAI
  • generative AI
  • LLMs
  • synthetic data

Source:

Scopus

Document type:

Other

Status:

Restricted access

Knowledge areas:

  • Machine learning
  • Communication
  • Information and communication technologies

Dewey subject areas:

  • Special computer methods
  • Computer programming, programs, data, security
  • Social interaction
Processed with AI

Sustainable Development Goals:

  • SDG 9: Industry, innovation and infrastructure
  • SDG 12: Responsible consumption and production
  • SDG 17: Partnerships for the goals
Processed with AI