Multilingual and crosslingual acoustic modelling for automatic speech recognition

  1. Diehl, Frank
Supervised by:
  1. María Asunción Moreno Bilbao (Supervisor)

University: Universitat Politècnica de Catalunya (UPC)

Defence date: 18 May 2007

Examination committee:
  1. José Bernardo Mariño Acebal (Chair)
  2. Enric Monte Moreno (Secretary)
  3. Daniel Tapias Merino (Member)
  4. Zdravco Kacic (Member)
  5. Carmen García Mateo (Member)

Type: Thesis

Teseo: 137977

Abstract

This thesis studies the definition, implementation and validation of multilingual and crosslingual acoustic models for automatic speech recognition (ASR). The acoustic model constitutes one of the basic building blocks of an ASR system. In today's state-of-the-art ASR systems it is common practice to estimate the parameters of the acoustic model from an acoustic template database, and this methodology has been shown to yield high-performance ASR systems. However, a principal drawback of this procedure is its dependency on suitable speech databases for training the models, and the resulting dependency of the final target system on the language used for training. That is, during training, an acoustic model can hardly be built if little or no speech material in the target language is available, and, during recognition, the ASR system is fixed to the language that was used to train it. Multilingual and crosslingual acoustic modelling is seen as a potential way to overcome these drawbacks, at least partly. The basic idea consists in sharing acoustic knowledge between languages, or in reusing acoustic knowledge already available from one or more source languages for a target language.

This thesis thus focuses on two major aspects of multilingual and crosslingual acoustic modelling: acoustic model definition and acoustic model adaptation. For acoustic model definition, the emphasis lies on defining suitable linguistic features. Linguistic features constitute the input domain of the phonetic-acoustic decision tree that is used to define context-dependent acoustic models. Usually such features are derived in a knowledge-based way by a linguistic expert who is familiar with both the source and the target language. However, linguistic experts who are familiar with all the languages concerned may be hard to find. Thus, in a multilingual but also in the crosslingual enviro