Crowdsourced corpus with entity salience annotations


Abstract:

In this paper, we present a crowdsourced dataset that adds entity salience (importance) annotations to the Reuters-128 dataset, which is a subset of Reuters-21578. The dataset is distributed under a free license and published in the NLP Interchange Format, which fosters interoperability and re-use. We show the potential of the dataset on the task of learning an entity salience classifier and report on the results from several experiments.

Publication year:

2016

Keywords:

  • Named entities
  • Entity importance
  • Text analysis
  • Entity salience
  • Document aboutness

Source:

Scopus

Document type:

Conference Object

Status:

Restricted access

Knowledge areas:

  • Machine learning
  • Computer science

Subject areas:

  • Library and archive operations
  • Philosophical miscellany
  • Linguistics