Estudio sobre el impacto del corpus de entrenamiento del modelo de lenguaje en las prestaciones de un reconocedor de habla

  1. Docío Fernández, Laura
  2. Regueira, Xosé Luis
  3. Piñeiro Martín, Andrés
  4. García Mateo, Carmen
Journal:
Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2018

Issue: 61

Pages: 75-82

Type: Article

Abstract

In automatic speech recognition, statistical language models based on the probability of word sequences (n-grams) are one of the two pillars on which correct operation rests. This paper shows how recognition results change as these models are trained on more and better-quality text and adapted to the final application of the system, which in turn reduces the number of out-of-vocabulary (OOV) words. The recognizer was run with the different language models on audio segments from three experimental settings: formal spoken language, speech in newscasts, and TED talks in Galician. The results show a clear improvement across the proposed experimental settings.
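
The OOV reduction mentioned in the abstract can be illustrated with a minimal sketch. It assumes the usual definition of OOV rate as the fraction of running test tokens that are missing from the recognizer vocabulary. The function name `oov_rate` and the toy Galician word lists are hypothetical and are not taken from the paper.

```python
def oov_rate(vocabulary, tokens):
    """Return the fraction of running tokens that are absent from the vocabulary."""
    if not tokens:
        return 0.0
    vocab = set(vocabulary)
    missing = sum(1 for tok in tokens if tok not in vocab)
    return missing / len(tokens)


# Hypothetical toy data for illustration only, not the corpora used in the paper.
small_vocab = ["o", "tempo", "hoxe"]
large_vocab = small_vocab + ["está", "bo"]
test_tokens = "o tempo hoxe está moi bo".split()

rate_small = oov_rate(small_vocab, test_tokens)  # vocabulary from less training text
rate_large = oov_rate(large_vocab, test_tokens)  # vocabulary from more training text
print(f"small vocabulary: {rate_small:.3f}, large vocabulary: {rate_large:.3f}")
```

Under these assumptions, enlarging the vocabulary with words from additional training text lowers the OOV rate on the same test tokens, which is the effect the abstract attributes to larger, better-matched language model corpora.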
