CLexIS²: A New Corpus for Complex Word Identification Research in Computing Studies
Abstract:
Reading is a complex process, and part of that complexity comes from words or passages that are difficult for the reader to understand. Complex word identification (CWI) is the task of detecting, within a document, the words that members of a given target group find difficult or complex to understand. Annotated corpora for English learners are widely available, but they are less common for Spanish. In this article, we present CLexIS², a new Spanish corpus intended to advance research in lexical simplification, specifically the identification and prediction of complex words in computing studies. Several metrics commonly used to evaluate the complexity of Spanish texts were applied: LC, LDI, ILFW, SSR, SCI, ASL, and CS. In addition, as an initial baseline for the corpus, two experiments were carried out to predict word complexity: one using a supervised learning approach and the other an unsupervised approach based on word frequencies in a general corpus.
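The unsupervised experiment rests on a simple idea: words that are rare in a general reference corpus are likely to be harder for readers. The sketch below illustrates that idea under stated assumptions and is not the authors' implementation. The function names `build_frequency_table` and `label_complex_words`, the relative-frequency measure, the whitespace tokenization, and the `threshold` parameter are all hypothetical choices made for illustration.

```python
from collections import Counter


def build_frequency_table(corpus_tokens):
    """Relative frequency of each lowercased token in a general reference corpus.

    Hypothetical helper: assumes the corpus is already tokenized.
    """
    counts = Counter(tok.lower() for tok in corpus_tokens)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def label_complex_words(text_tokens, freq_table, threshold=1e-4):
    """Label each token 1 (complex) if its reference frequency is below threshold, else 0.

    Hypothetical rule: words unseen in the reference corpus get frequency 0.0,
    so they are always labeled complex.
    """
    return [
        (tok, int(freq_table.get(tok.lower(), 0.0) < threshold))
        for tok in text_tokens
    ]


if __name__ == "__main__":
    # Toy Spanish reference corpus, split on whitespace (an assumption for brevity).
    reference = "el programa usa datos y el programa guarda datos en el disco".split()
    table = build_frequency_table(reference)
    text = "El compilador optimiza el programa".split()
    print(label_complex_words(text, table, threshold=0.1))
    # [('El', 0), ('compilador', 1), ('optimiza', 1), ('el', 0), ('programa', 0)]
```

In a real setting the reference corpus would be far larger and the threshold would be tuned against annotated data such as the corpus described here; a supervised model, as in the other experiment, could use this frequency as one feature among several.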
Year of publication:
2021
Keywords:
Source:

Document type:
Conference Object
Status:
Open access
Knowledge areas:
Subject areas:
- Computer programming, programs, data, security
- Culture and institutions
- Philosophy and theory