AL4LA: Active Learning for Text Labeling Based on Paragraph Vectors
Abstract:
Nowadays, despite the huge amount of digitized information available, the biggest obstacle to using machine learning in text mining is the lack of tagged data, mainly because producing it requires a great deal of user effort, which is not always viable. In this paper, with the aim of reducing the heavy workload required to manually process the contents of large volumes of documents, we present a methodology based on probabilistic inference and active learning to label documents in Spanish using a semi-supervised approach. First, a vector representation of the documents is generated; then, an interactive learning process that applies both automatic and manual labeling is proposed. To evaluate the accuracy of the predictions and the efficiency of the methodology, different configurations of the automatic and manual labeling processes have been studied. The proposed methodology reduces the need for a large corpus of manually labeled texts by introducing a self-labeling process during training. We have shown that both labeling approaches can be combined, maintaining accuracy while reducing user intervention.
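The abstract gives no implementation details, so what follows is only a minimal sketch, under our own assumptions, of how automatic (self-) labeling and manual labeling can be interleaved in a single active-learning loop. It is not the authors' method. Document vectors are assumed to be precomputed (for example by a paragraph-vector model such as gensim's `Doc2Vec`) and are replaced here by toy 2-D tuples. A nearest-centroid classifier with a softmax over cosine similarities stands in for the paper's probabilistic inference. The function names, the `threshold` of 0.9 and the `temperature` of 0.1 are all hypothetical. Predictions whose probability reaches the threshold are self-labeled; when none do, the least confident document is sent to the user (uncertainty sampling).

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


def class_centroids(labeled: Sequence[Tuple[Vector, str]]) -> Dict[str, List[float]]:
    """Mean document vector for each label among the currently labeled documents."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for vec, label in labeled:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def class_probabilities(vec: Vector, centroids: Dict[str, List[float]],
                        temperature: float = 0.1) -> Dict[str, float]:
    """Softmax over cosine similarities to each centroid (hypothetical stand-in
    for the paper's probabilistic model). Subtracting the max keeps exp() stable."""
    scores = {lab: cosine(vec, c) / temperature for lab, c in centroids.items()}
    top = max(scores.values())
    exps = {lab: math.exp(s - top) for lab, s in scores.items()}
    total = sum(exps.values())
    return {lab: e / total for lab, e in exps.items()}


def label_corpus(seed: Sequence[Tuple[Vector, str]], pool: Sequence[Vector],
                 ask_user: Callable[[Vector], str], threshold: float = 0.9):
    """Label every vector in `pool`. Returns (labeled_pairs, n_auto, n_manual)."""
    labeled = list(seed)
    pending = list(pool)
    n_auto = n_manual = 0
    while pending:
        centroids = class_centroids(labeled)  # retrain on everything labeled so far
        scored = []  # (confidence, predicted_label, index_in_pending)
        for i, vec in enumerate(pending):
            probs = class_probabilities(vec, centroids)
            best = max(probs, key=probs.get)
            scored.append((probs[best], best, i))
        confident = [(lab, i) for p, lab, i in scored if p >= threshold]
        if confident:
            # Automatic labeling: accept every high-confidence prediction.
            for lab, i in confident:
                labeled.append((pending[i], lab))
            n_auto += len(confident)
            done = {i for _, i in confident}
        else:
            # Manual labeling: ask the user about the least confident document.
            _, _, i = min(scored, key=lambda t: t[0])
            labeled.append((pending[i], ask_user(pending[i])))
            n_manual += 1
            done = {i}
        # Each round removes at least one document, so the loop terminates.
        pending = [vec for j, vec in enumerate(pending) if j not in done]
    return labeled, n_auto, n_manual


def simulated_user(vec: Vector) -> str:
    """Hypothetical oracle standing in for the human annotator."""
    return "sports" if vec[0] >= vec[1] else "politics"


if __name__ == "__main__":
    seed = [((1.0, 0.0), "sports"), ((0.0, 1.0), "politics")]
    pool = [(0.9, 0.1), (0.1, 0.95), (0.5, 0.5)]
    labeled, n_auto, n_manual = label_corpus(seed, pool, simulated_user)
    print(n_auto, n_manual)  # 2 1
```

In this toy run the two clear-cut documents are self-labeled and only the ambiguous `(0.5, 0.5)` is sent to the user. The seed set must contain at least one example of every label, because a centroid classifier can only predict labels it has already seen. Raising `threshold` shifts work from the automatic side to the manual side, which loosely mirrors the trade-off between accuracy and user intervention that the abstract describes.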
Year of publication:
2019
Keywords:
- Paragraph vectors
- Text labeling
- Text categorization
- Active learning
Source:
Document type:
Conference Object
Status:
Restricted access
Knowledge areas:
- Machine learning
- Computer science
Subject areas:
- Operations of libraries and archives