Automatic voice pleasantness classification and intensity estimation for speech synthesis

Martins Pinto-Coelho, Luis Filipe

Supervised by:

Carmen García Mateo Supervisor

University of defense: Universidade de Vigo

Date of defense: 23 March 2012

Committee:

Inmaculada Hernáez Rioja Chair
Eduardo Rodríguez Banga Secretary
José Luis Alba Castro Committee member
Francesc Josep Ferri Rabasa Committee member
António Joaquim da Silva Teixeira Committee member

Department:

Teoría do sinal e comunicacións

Type: Thesis

Teseo: 322070 DIALNET

Abstract

Speech synthesis systems based on hidden Markov models (HMMs) have marked the beginning of a new generation of text-to-speech (TTS) technology. These stochastic models can describe time-domain and frequency-domain events simultaneously while maintaining a powerful and highly flexible synthesis framework. Despite these recognized advantages, some authors report a background buzz or a muffled voice, among other issues, which shows the need for improvements to the speech description/generation model. Several vocoding technologies have already been adapted to the HMM synthesis framework, but none has provided an entirely satisfying result, so this work proposes a different approach. With the objective of improving synthetic voice quality, we propose the development of a perceptually weighted adaptive filter technique that can enhance parameter generation in the time and frequency domains and on an intra-segmental basis. The adaptation strategy will be based on prosodic correlates of voice preference in contextualized TTS applications, with the aim of maximizing voice intelligibility and overall naturalness. The proposed work will be entirely dedicated to the European Portuguese language, which still lacks several resources and tools.