Automatic voice pleasantness classification and intensity estimation for speech synthesis

Martins Pinto-Coelho, Luis Filipe

Supervised by:

Carmen García Mateo (Director)

University of defense: Universidade de Vigo

Date of defense: 23 March 2012

Examination committee:

Inmaculada Hernáez Rioja (Chair)
Eduardo Rodríguez Banga (Secretary)
José Luis Alba Castro (Member)
Francesc Josep Ferri Rabasa (Member)
António Joaquim da Silva Teixeira (Member)

Department:

Signal Theory and Communications (Teoría do sinal e comunicacións)

Type: Thesis

Teseo: 322070 DIALNET

Abstract

Speech synthesis systems based on hidden Markov models (HMMs) have marked the beginning of a new generation of text-to-speech (TTS) technology. These stochastic models can describe time-domain and frequency-domain events simultaneously while maintaining a powerful and highly flexible synthesis framework. Despite these recognized advantages, some authors report a background buzz or a muffled voice, among other issues, which shows the need for improvements to the speech description/generation model. Since several vocoding technologies have already been adapted to the HMM synthesis framework and none has provided an entirely satisfying result, this work proposes a different approach. With the objective of improving synthetic voice quality, we propose the development of a perceptually weighted adaptive filter technique that can enhance parameter generation in the time and frequency domains and on an intra-segmental basis. The adaptation strategy will be based on prosodic correlates of voice preference in contextualized TTS applications, with the aim of maximizing voice intelligibility and overall naturalness. The proposed work will be entirely dedicated to the European Portuguese language, which still lacks several resources and tools.