Análisis morfosintáctico estadístico en lengua gallega

  1. Méndez Pazó, Francisco
  2. Fernández Rei, Elisa
  3. Rodríguez Banga, Eduardo
  4. Campillo Díaz, Francisco
Aldizkaria:
Procesamiento del lenguaje natural

ISSN: 1135-5948

Argitalpen urtea: 2003

Zenbakia: 31

Orrialdeak: 159-166

Mota: Artikulua

Beste argitalpen batzuk: Procesamiento del lenguaje natural

Laburpena

This paper describes a morphosyntactic analyzer in Galician which, apart from its obvious linguistic interest, can be easily applied to speech recognition and speech synthesis systems. While rule-driven models produce the better performance, stochastic models have shown a comparable accuracy when properly designed. Moreover, rule-driven models are based on a complex set of linguistic rules, quite difficult to maintain and not directly extensible to other languages. On the contrary, stochastic models allow a quick design, if a training corpus is available, and are extremely flexible as they can be adapted to other languages with minor changes in their source code. In order to train the statistic models we began to collect a Galician corpus which, at this time, consists of about 400,000 words with morphosyntactic annotations.