Neural Machine Translation tool from Spanish to English in the Medical Domain


Abstract:

In Natural Language Processing (NLP), the scarcity of linguistic resources (labeled corpora, parallel corpora, pre-trained models, etc.) can lead to poor performance when applying machine learning models. This can be addressed with cross-lingual approaches (machine translation, word alignment, multilingual embeddings, etc.), a paradigm for transferring knowledge from a language with many resources to a language with fewer resources. In the medical domain, there are also few resources in Spanish compared to English, due to economic, legal, and ethical issues. Moreover, there is little evidence of the evaluation and optimization of machine translation from Spanish to English in the medical domain. To address this gap, this research builds a neural machine translation tool with induced word alignment, experiments with different optimization parameters, and applies various parallel corpora from the medical domain. As reference results, the EMA corpus (English-Spanish) reaches a BLEU of 88.55 with 15 epochs, and the Scielo corpus (Spanish-English) reaches a BLEU of 53.74 with 25 epochs. These results differ from those of convolutional translators and greatly outperform the pre-trained Fairseq results.

Year of publication:

2022

Keywords:

  • Natural Language Processing
  • medical domain
  • word alignment
  • machine translation
  • NLP
  • Cross-Lingual

Source:

Scopus

Document type:

Conference Object

Status:

Restricted access

Knowledge areas:

  • Artificial intelligence
  • Computer science

Dewey subject areas:

  • Medicine and health