How complex is professional academic writing? A corpus-based analysis of research articles in 'hard' and 'soft' disciplines

Authors:
  1. Pérez-Guerra, Javier (1)
  2. Smirnova, Elizaveta A. (2)

Affiliations:
  1. Universidade de Vigo, Vigo, Spain (ROR: https://ror.org/05rdf8595)
  2. Foreign Languages Department, HSE University / Universidade de Vigo

Journal:
VIAL, Vigo international journal of applied linguistics

ISSN: 1697-0381

Year of publication: 2023

Issue: 20

Pages: 149-183

Type: Article

DOI: 10.35869/VIAL.V0I20.4357

Access: Open access (publisher)

Abstract

This study analyses linguistic complexity in professional academic writing on the basis of empirical evidence from a 1,597,000-word corpus of ‘hard’ (life and physical sciences) and ‘soft’ (arts and social sciences) scientific research articles published in leading peer-reviewed journals. Specifically, the investigation aims both to describe the complexity features of texts written by professional authors and to test the hypothesis that linguistic complexity varies across disciplines. Since previous studies have shown that automatic complexity indices are not sufficient to provide a comprehensive description of the complexity of texts, complexity is measured here in two ways: quantitatively, through the indices provided by Lu’s (2010) L2 Syntactic Complexity Analyser, and more qualitatively, through the analysis of a selection of metrics associated with clausal and phrasal complexity in seminal studies. The data show, first, that syntactic complexity indices (basically, strategies of coordination and subordination) are statistically relevant specifically to the characterisation of the soft-science disciplines; second, that there is a continuum across subdisciplines within the broad distinction between soft and hard genres; and, third, that the soft genre shows a more stable productivity of clausal-complexity strategies, whereas phrasal-complexity features are more pervasive in the hard-science subcorpus.
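Lu’s (2010) analyser reports its indices as simple ratios over counts of production units: words, sentences (S), T-units (T), clauses (C), dependent clauses (DC), complex T-units (CT), coordinate phrases (CP), complex nominals (CN) and verb phrases (VP). The Python sketch below illustrates only that final arithmetic step and assumes the counts are already available; the counting itself, which L2SCA performs by querying parser output with Tregex patterns (cf. Klein & Manning 2003; Levy & Andrew 2006), is not reproduced. The class and function names and the example counts are hypothetical, chosen for illustration, and are not data from this study.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitCounts:
    """Hypothetical container for the raw unit counts of one text.

    In L2SCA these counts come from Tregex queries over parse trees;
    here they are simply supplied by the caller (an assumption).
    """
    words: int
    sentences: int           # S
    t_units: int             # T
    clauses: int             # C
    dependent_clauses: int   # DC
    complex_t_units: int     # CT: T-units containing a dependent clause
    coordinate_phrases: int  # CP
    complex_nominals: int    # CN
    verb_phrases: int        # VP


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN instead of raising when a unit type is absent from a text.
    return numerator / denominator if denominator else math.nan


def l2sca_indices(c: UnitCounts) -> dict[str, float]:
    """Compute the 14 L2SCA ratio indices, grouped into the five types
    commonly used in the L2SCA literature (e.g. Lu 2011)."""
    return {
        # Length of production unit
        "MLC": _ratio(c.words, c.clauses),
        "MLS": _ratio(c.words, c.sentences),
        "MLT": _ratio(c.words, c.t_units),
        # Sentence complexity
        "C/S": _ratio(c.clauses, c.sentences),
        # Subordination
        "C/T": _ratio(c.clauses, c.t_units),
        "CT/T": _ratio(c.complex_t_units, c.t_units),
        "DC/C": _ratio(c.dependent_clauses, c.clauses),
        "DC/T": _ratio(c.dependent_clauses, c.t_units),
        # Coordination
        "CP/C": _ratio(c.coordinate_phrases, c.clauses),
        "CP/T": _ratio(c.coordinate_phrases, c.t_units),
        "T/S": _ratio(c.t_units, c.sentences),
        # Particular structures
        "CN/C": _ratio(c.complex_nominals, c.clauses),
        "CN/T": _ratio(c.complex_nominals, c.t_units),
        "VP/T": _ratio(c.verb_phrases, c.t_units),
    }


# Invented counts for illustration only (not taken from the corpus).
example = UnitCounts(words=240, sentences=10, t_units=12, clauses=20,
                     dependent_clauses=8, complex_t_units=6,
                     coordinate_phrases=9, complex_nominals=30,
                     verb_phrases=26)
indices = l2sca_indices(example)
for name, value in indices.items():
    print(f"{name:<5} {value:.2f}")
```

Because every index is a ratio, texts of different lengths remain comparable. Most of these indices arguably capture clausal coordination and subordination, in line with the abstract’s gloss of the syntactic complexity indices; the complex-nominal ratios (CN/C, CN/T) are the closest L2SCA proxies for phrasal complexity.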

References

  • Ai, H., & Lu, X. (2013). A corpus-based comparison of syntactic complexity in NNS and NS university students’ writing. In A. Díaz-Negrillo, N. Ballier & P. Thompson (eds), Automatic treatment and analysis of learner corpus data (pp. 249-264). Amsterdam: John Benjamins.
  • Anthony, L. (2014). AntConc (Version 3.4.4) [Computer Software]. Tokyo: Waseda University.
  • Anthony, L. (2015). TagAnt (Version 1.2.0) [Computer Software]. Tokyo: Waseda University.
  • Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1-48.
  • Becher, T., & Trowler, P. R. (2001). Academic tribes and territories (2nd ed.). Philadelphia: Open University Press.
  • Biber, D. (1988). Variation across speech and writing. Cambridge: Cambridge University Press.
  • Biber, D., Johansson, S., Leech, G., Conrad, S., Finegan, E., & Quirk, R. (1999). Longman grammar of spoken and written English. London: Longman.
  • Biber, D., & Gray, B. (2011). Is conversation more grammatically complex than academic writing? In M. Konopka, J. Kubczak, Ch. Mair, F. Štícha & U.H. Waßner (eds), Grammatik und Korpora 2009: Dritte Internationale Konferenz. Grammar & Corpora 2009: Third International Conference (pp. 7-61). Tübingen: Narr Verlag.
  • Biber, D., & Gray, B. (2016). Grammatical complexity in academic English: Linguistic change in writing. Cambridge: Cambridge University Press.
  • Biber, D., Gray, B., & Poonpon, K. (2011). Should we use characteristics of conversation to measure grammatical complexity in L2 writing development? TESOL Quarterly, 45(1), 5-35.
  • Biber, D., Gray, B., & Poonpon, K. (2013). Pay attention to the phrasal structures: Going beyond T-units – A response to WeiWei Yang. TESOL Quarterly, 47(1), 192-201.
  • Biber, D., Gray, B., & Staples, S. (2016). Predicting patterns of grammatical complexity across language exam task types and proficiency levels. Applied Linguistics, 37(5), 639-668.
  • Biber, D., Gray, B., Staples, S., & Egbert, J. (2020). Investigating grammatical complexity in L2 English writing research: Linguistic description versus predictive measurement. Journal of English for Academic Purposes, 46, Article 100869.
  • Biber, D., Gray, B., Staples, S., & Egbert, J. (2021). The register-functional approach to grammatical complexity: Theoretical foundation, descriptive research findings, application. New York: Routledge.
  • Bulté, B., & Housen, A. (2012). Defining and operationalising L2 complexity. In A. Housen, F. Kuiken & I. Vedder (eds), Dimensions of L2 performance and proficiency: Complexity, accuracy and fluency in SLA (pp. 21-46). Amsterdam: John Benjamins.
  • Casal, J.E., & Lee, J.J. (2019). Syntactic complexity and writing quality in assessed first year L2 writing. Journal of Second Language Writing, 44, 51-62.
  • Casal, J.E., Lu, X., Qiu, X., Wang, Y., & Zhang, G. (2021). Syntactic complexity across academic research article part-genres: A cross-disciplinary perspective. Journal of English for Academic Purposes, 52, Article 100996.
  • Chen, D., & Manning, C.D. (2014). A fast and accurate dependency parser using neural networks. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 740-750). Doha, Qatar.
  • Chen, X., & Meurers, D. (2016). CTAP: A web-based tool supporting automatic complexity analysis. In Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity at COLING, Osaka, Japan, 11th December (pp. 113-119). Osaka, Japan: The International Committee on Computational Linguistics.
  • Crossley, S.A., & McNamara, D.S. (2012). Predicting second language writing proficiency: The role of cohesion, readability, and lexical difficulty. Journal of Research in Reading, 35(2), 115-135.
  • Crossley, S.A., & McNamara, D.S. (2014). Does writing development equal writing quality? A computational investigation of syntactic complexity in L2 learners. Journal of Second Language Writing, 26, 66-79.
  • Crossley, S.A., Allen, L.K., & McNamara, D.S. (2014). A Multi-Dimensional Analysis of essay writing. What linguistic features tell us about situational parameters and the effects of language functions on judgements of quality. In T. Berber Sardinha & M. Veirano Pinto (eds), Multi-Dimensional Analysis, 25 years on: A tribute to Douglas Biber (pp.197-238). Amsterdam: John Benjamins.
  • Dai, H.J., Lai, P.T., Chang, Y.C., & Tsai, R.T.H. (2015). Enhancing of chemical compound and drug name recognition using representative tag scheme and fine-grained tokenisation. Journal of Cheminformatics, 7, 1-14.
  • Dang, T.N.Y. (2018). The nature of vocabulary in academic speech of hard and soft-sciences. English for Specific Purposes, 51, 69-83.
  • Davis, P.J., & Hersh, R. (1981). The mathematical experience. Boston: Birkhäuser.
  • Fox, J., & Weisberg, S. (2019). An R companion to applied regression. Thousand Oaks: Sage.
  • Gardner, S., Nesi, H., & Biber, D. (2019). Discipline, level, genre: Integrating situational perspectives in a new MD analysis of university student writing. Applied Linguistics, 40(4), 646-674.
  • Gray, B. (2013). More than discipline: Uncovering multi-dimensional patterns of variation in academic research articles. Corpora, 8(2), 153-181.
  • Gray, B. (2015). On the complexity of academic writing: Disciplinary variation and structural complexity. In V. Cortes & E. Csomay (eds), Corpus-based research in applied linguistics: Studies in honor of Doug Biber (pp. 49-78). Amsterdam: John Benjamins.
  • Graesser, A.C., McNamara, D.S., Louwerse, M.M., & Cai, Z. (2004). Coh-Metrix: Analysis of text on cohesion and language. Behavior Research Methods, Instruments, & Computers, 36(2), 193-202.
  • Hardy, J.A., & Friginal, E. (2016). Genre variation in student writing: A Multi-Dimensional Analysis. Journal of English for Academic Purposes, 22, 119-131.
  • Harrell, F.E., Jr. (2021). Regression Modeling Strategies. https://github.com/harrelfe/rms
  • Harrell, F.E., Jr., with contributions from C. Dupont et al. (2020). Hmisc version 4.3-1. https://CRAN.R-project.org/package=Hmisc
  • Hinkel, E. (2003). Simplicity without elegance: Features of sentences in L1 and L2 academic texts. TESOL Quarterly, 37(2), 275-301.
  • Hothorn, T., Buehlmann, P., Dudoit, S., Molinaro, A., & Van Der Laan, M. (2006). Survival ensembles. Biostatistics, 7(3), 355-373.
  • Hunt, K.W. (1964). Differences in grammatical structures written at three grade levels: The structures to be analysed by transformational methods. Report no. CRP-1998. Tallahassee: Florida State University.
  • Hunt, K.W. (1965). Grammatical structures written at three grade levels. NCTE Research Report No. 3. Champaign, IL: National Council of Teachers of English.
  • Hunt, K.W. (1970). Recent measures in syntactic development. In M. Lester (ed), Readings in applied transformational grammar (pp. 179-192). New York: Holt, Rinehart and Winston.
  • Hyland, K. (2004). Disciplinary discourses: Social interactions in academic writing. Ann Arbor, MI: University of Michigan Press.
  • Kelly-Laubscher, R.F., Muna, N., & van der Merwe, M. (2017). Using the research article as a model for teaching laboratory report writing provides opportunities for development of genre awareness and adoption of new literacy practices. English for Specific Purposes, 48, 1-16.
  • Klein, D., & Manning, C.D. (2003). Fast exact inference with a factored model for Natural Language Parsing. In S. Becker, S. Thrun & K. Obermayer (eds), Advances in neural information processing systems (pp. 3-10). Cambridge, MA: MIT Press.
  • Kosem, I. (2010). Designing a model for a corpus-driven dictionary of academic English. PhD, Aston University.
  • Kyle, K. (2016). Measuring syntactic development in L2 writing: Fine grained indices of syntactic complexity and usage-based indices of syntactic sophistication. PhD, Georgia State University, Atlanta, GA.
  • Kyle, K., & Crossley, S.A. (2018). Measuring syntactic complexity in L2 writing using fine-grained clausal and phrasal indices. The Modern Language Journal, 102(2), 333-349.
  • Lambert, C., & Kormos, J. (2014). Complexity, accuracy, and fluency in task-based L2 research: Toward more developmentally based measures of second language acquisition. Applied Linguistics, 35(5), 607-614.
  • Lambert, C., & Nakamura, S. (2019). Proficiency‐related variation in syntactic complexity: A study of English L1 and L2 oral descriptive discourse. International Journal of Applied Linguistics, 29(2), 1-17.
  • Levshina, N. (2015). How to do linguistics with R: Data exploration and statistical analysis. Amsterdam: John Benjamins.
  • Levy, R., & Andrew, G. (2006). Tregex and Tsurgeon: Tools for querying and manipulating tree data structures. In Proceedings of the Fifth International Conference on Language Resources and Evaluation (pp. 2231-2234). Genoa: ELRA.
  • Lintunen, P., & Mäkilä, M. (2014). Measuring syntactic complexity in spoken and written learner language: Comparing the incomparable? Research in Language, 12(4), 377-399.
  • Lu, X. (2010). Automatic analysis of syntactic complexity in second language writing. International Journal of Corpus Linguistics, 15(4), 474-496.
  • Lu, X. (2011). A corpus-based evaluation of syntactic complexity measures as indices of college-level ESL writers’ language development. TESOL Quarterly, 45(1), 36-62.
  • Lu, X. (2017). Automated measurement of syntactic complexity in corpus-based L2 writing research and implications for writing assessment. Language Testing, 34(4), 493-511.
  • Maechler, M., Rousseeuw, P., Struyf, A., Hubert, M., & Hornik, K. (2019). cluster: Cluster analysis basics and extensions. R package version 2.1.0.
  • Mazgutova, D., & Kormos, J. (2015). Syntactic and lexical development in an intensive English for Academic Purposes programme. Journal of Second Language Writing, 29, 3-15.
  • McNamara, D.S., Louwerse, M.M., McCarthy, P.M., & Graesser, A.C. (2010). Coh-Metrix: Capturing linguistic features of cohesion. Discourse Processes, 47(4), 292-330.
  • Nesi, H. (2002). An English spoken academic wordlist. In A. Braasch & C. Povlsen (eds), Proceedings of the Tenth EURALEX International Congress. Vol. 1 (pp. 351-358). Copenhagen.
  • Nesi, H., & Gardner, S. (2019). Complex, but in what way? A step towards greater understanding of academic writing proficiency. In C. Danjo, I. Meddegama, D. O’Brien, J. Prudhoe, L. Walz & R. Wicaksono (eds), Online Proceedings of the 51st Annual Meeting of the British Association for Applied Linguistics: Taking Risks in Applied Linguistics, 6-8 September, 2018.
  • Ortega, L. (2003). Syntactic complexity measures and their relationship to L2 proficiency: A research synthesis of college-level L2 writing. Applied Linguistics, 24, 492-518.
  • R Core Team. (2022). R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing. https://www.R-project.org
  • Rayson, P. (2003). Matrix: A statistical method and software tool for linguistic analysis through corpus comparison. PhD thesis, Lancaster University. https://eprints.lancs.ac.uk/id/eprint/12287/1/phd2003.pdf (accessed on 20.12.2020)
  • Ruan, Z. (2018). Structural compression in academic writing: An English-Chinese comparison study of complex noun phrases in research article abstracts. Journal of English for Academic Purposes, 36, 37-47.
  • Staples, S., Egbert, J., Biber, D., & Gray, B. (2016). Academic writing development at the university level: Phrasal and clausal complexity across level of study, discipline, and genre. Written Communication, 33(2), 149-183.
  • Storer, N.W. (1967). The hard sciences and the soft: Some sociological observations. Bulletin of the Medical Library Association, 55(1), 75-84.
  • Suzuki, R., Terada, Y., & Shimodaira, H. (2019). pvclust: Hierarchical Clustering with P-Values via Multiscale Bootstrap Resampling. R package version 2.2-0.
  • Swales, J.M. (1990). Genre analysis: English in academic and research settings. Cambridge: Cambridge University Press.
  • Tagliamonte, S.A., & Baayen, R.H. (2012). Models, forests and trees of York English: Was/were variation as a case study for statistical practice. Language Variation and Change, 24, 135-178.
  • Wijers, M. (2018). The role of variation in L2 syntactic complexity: A case study on subordinate clauses in Swedish as a foreign language. Nordic Journal of Linguistics, 41(1), 75-116.
  • Wolfe-Quintero, K., Inagaki, S., & Kim, H.Y. (1998). Second language development in writing: Measures of fluency, accuracy, and complexity. Honolulu, HI: University of Hawaii Press.
  • Wu, X., Mauranen, A., & Lei, L. (2020). Syntactic complexity in English as a lingua franca academic writing. Journal of English for Academic Purposes, 43, Article 100798.
  • Yin, S., Gao, Y., & Lu, X. (2021). Syntactic complexity of research article part-genres: Differences between emerging and expert international publication writers. System, 97, Article 102427.