Lingüística de corpus: de los datos textuales a la teoría lingüística

  1. García-Miguel, José M. (1)

  (1) Universidade de Vigo, Vigo, Spain. ROR: https://ror.org/05rdf8595

Journal: Estudios de Lingüística del Español (ELiEs)

ISSN: 1139-8736

Year of publication: 2022

Issue title: Metodologías lingüísticas: de los datos empíricos a la teoría del lenguaje

Issue: 45

Pages: 11-42

Type: Article

Abstract

This paper gives a general presentation of Corpus Linguistics: what a linguistic corpus is, how it relates to other types of data, why it needs to be annotated, and what the annotation process involves. It also reviews some of the most common tasks in corpus-based linguistic research, such as obtaining frequency lists, exploring concordances, and finding co-occurrences (collocations) and other types of contextual information. Throughout, it seeks to show the relevance of this type of data for linguistic theory, in particular for usage-based models such as cognitive and functional ones.
