Crowdsourced corpus with entity salience annotations
Abstract:
In this paper, we present a crowdsourced dataset that adds entity salience (importance) annotations to the Reuters-128 dataset, a subset of Reuters-21578. The dataset is distributed under a free license and published in the NLP Interchange Format, which fosters interoperability and reuse. We show the potential of the dataset on the task of learning an entity salience classifier and report the results of several experiments.
Year of publication:
2016
Keywords:
- Named entities
- Entity importance
- Text analysis
- Entity salience
- Document aboutness
Source:

Document type:
Conference Object
Status:
Restricted access
Knowledge areas:
- Machine learning
- Computer science
Subject areas:
- Library and archive operations
- Philosophical miscellany
- Linguistics