Comparison and evaluation of different methods for the feature extraction from educational contents


Abstract:

This paper analyses the capabilities of different techniques to build a semantic representation of educational digital resources. Educational digital resources are modeled using the Learning Object Metadata (LOM) standard, and these semantic representations can be obtained from different LOM fields, like the title, description, among others, in order to extract the features/characteristics from the digital resources. The feature extraction methods used in this paper are the Best Matching 25 (BM25), the Latent Semantic Analysis (LSA), Doc2Vec, and the Latent Dirichlet allocation (LDA). The utilization of the features/descriptors generated by them are tested in three types of educational digital resources (scientific publications, learning objects, patents), a paraphrase corpus and two use cases: in an information retrieval context and in an educational recommendation system. For this analysis are used unsupervised metrics to determine the feature quality proposed by each one, which are two similarity functions and the entropy. In addition, the paper presents tests of the techniques for the classification of paraphrases. The experiments show that according to the type of content and metric, the performance of the feature extraction methods is very different; in some cases are better than the others, and in other cases is the inverse.

Año de publicación:

2020

Keywords:

  • content analysis
  • Educational contents
  • Semantic representation
  • Recommendation system
  • Information Retrieval
  • Feature Extraction

Fuente:

scopusscopus

Tipo de documento:

Article

Estado:

Acceso abierto

Áreas de conocimiento:

  • Tecnología educativa
  • Ciencias de la computación

Áreas temáticas:

  • Funcionamiento de bibliotecas y archivos