'Spanish Políglota': An automatic Speech Recognition system based on HMM
Abstract:
The goal of this ASR system is to recognize audio queries that request the static translation of a given Spanish word into a specified language. We call this ASR system the Spanish Políglota. The pronunciation dictionary for the language model is obtained by applying grapheme-to-phoneme conversion. It was developed via Festival Speech Synthesis Scheme scripts and the SPPAS Spanish lexicon. The possible audio queries are restricted by a BNF grammar we designed for this project. A triphone acoustic model was generated from a set of 1621 word audio recordings. This acoustic model is based on an N-gram model that estimates its probabilities by maximum likelihood estimation (MLE). We evaluated the prediction of individual words as well as of synthetic phrases. We generated 1577 synthetic phrases by concatenating the words of our audio set. Performance was also measured on a new set of audio recordings from a different speaker. Isolated word recognition achieved 77.91% correct predictions. Nevertheless, performance dropped when evaluating the synthetic phrases as well as the second speaker's speech. We consider this an initial step towards the development of a fully functional automatic speech recognition system.
Year of publication:
2021
Keywords:
- Spanish
- Grapheme-to-Phoneme Conversion
- HMM
- HTK
- Automatic speech recognition
- ASR
- Voxforge
- Julius
Source:

Document type:
Conference Object
Status:
Restricted access
Knowledge areas:
- Artificial intelligence
- Computer science
Subject areas:
- Special computer methods
- Linguistics