Building a Gold Standard Dataset to Identify Articles About Geographic Information Science


Abstract:

Knowing the overall regional or international scientific output is vitally important in many fields of knowledge. In interdisciplinary areas such as Geographic Information Science (GISc), however, counting the papers published in specific journals is not enough. Most of these journals, the International Journal of Remote Sensing (IJRS) among them, welcome GISc papers but are not exclusive to the field, so attributing output to authors in a region requires considering not only affiliation but also whether each paper falls within the scope of GISc. IJRS publishes far more papers than any other GISc journal, so it is important to assess quantitatively how many of them belong to GISc. In this work, a representative sample of IJRS articles published over a period of almost 30 years was analyzed against a specific definition of GISc. A panel of experts classified the sampled articles manually, and the resulting dataset was built, analyzed, and statistically tested. We estimate, at a 95% confidence level, that between 47% and 76% of IJRS articles can be considered GISc. Beyond this primary goal, the dataset can serve as a gold standard for future classification tasks. It is the first GISc dataset of its kind and may be used to train artificial intelligence systems that perform the same classification automatically and at scale. A similar procedure could be applied to other interdisciplinary fields of knowledge.
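As an illustration of how a proportion estimated from a classified sample can be reported with a 95% confidence interval, the following is a minimal Python sketch. It uses the Wilson score interval, which is an assumption made here for illustration: the abstract does not state the sample size or the interval method the authors used. The function name `wilson_interval` and the counts (30 of 50 articles labeled GISc) are hypothetical and are not taken from the study.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    successes: number of sampled articles labeled as GISc (hypothetical input)
    n:         total number of sampled articles
    z:         normal quantile; 1.96 corresponds to roughly 95% confidence
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must lie between 0 and n")

    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Hypothetical counts, chosen only to demonstrate the calculation.
low, high = wilson_interval(successes=30, n=50)
print(f"Estimated GISc share: {30 / 50:.0%} (95% CI: {low:.1%} to {high:.1%})")
```

With a small sample the interval is wide, which is one reason an estimate of this kind may span tens of percentage points. Disagreement among expert labelers can widen the reported range further, but it is not modeled in this sketch.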

Year of publication:

2022

Keywords:

  • manual classification
  • indexer consistency
  • geographic information science
  • gold standard

Source:

Scopus

Document type:

Article

Status:

Open access

Knowledge areas:

  • Databases
  • Geography

Subject areas:

  • Geography and travel
  • Library and archive operations
  • Library science and computer documentation