The use of artificial intelligence in the assessment of live subtitling quality: the NER Buddy
- Pablo Romero-Fresco 1
- Óscar Alonso Amigo
- Luis Alonso Bacigalupe 1
- 1 Department of Translation and Linguistics, Universidade de Vigo
ISSN: 1578-7559
Year of publication: 2024
Issue title: Interacció persona-ordinador en traducció i interpretació: programes i aplicacions [Human-computer interaction in translation and interpreting: programs and applications]
Issue: 22
Pages: 450–470
Type: Article
Journal: Revista tradumàtica: traducció i tecnologies de la informació i la comunicació
Abstract
Translation quality assessment has traditionally involved a high degree of subjectivity. In some areas of audiovisual translation, however, it has become common practice to evaluate the quality of the captions of live TV broadcasts objectively. In intralingual live subtitling, an accessibility service for people with hearing loss in which the captions are in the same language as the audio, the standard metric is the NER model (Romero-Fresco and Martínez, 2015). Applying the model manually, however, is complex and time-consuming. This contribution presents the results of our research on the development of an AI-based application for the (semi-)automatic assessment of live captions using the NER methodology, an application that international TV broadcasters are currently testing.
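Since the abstract builds on the NER model, a worked example of its accuracy formula may be useful. The sketch below applies the published formula, accuracy = (N − E − R) / N × 100, where N is the number of words in the subtitles, E the sum of edition-error penalties and R the sum of recognition-error penalties, with the standard severity weights (serious = 1, standard = 0.5, minor = 0.25) and the 98% quality threshold described by Romero-Fresco and Martínez (2015). The function and variable names are illustrative only and are not taken from the NER Buddy itself.

```python
# Illustrative sketch of the NER accuracy formula (Romero-Fresco and Martínez, 2015).
# Each error, whether an edition error (E) or a recognition error (R),
# is weighted by its severity before being subtracted from the word count.

SEVERITY_WEIGHTS = {"serious": 1.0, "standard": 0.5, "minor": 0.25}

def ner_accuracy(n_words: int, edition_errors: list[str],
                 recognition_errors: list[str]) -> float:
    """Return the NER accuracy rate (%) for a stretch of live subtitles."""
    e = sum(SEVERITY_WEIGHTS[s] for s in edition_errors)
    r = sum(SEVERITY_WEIGHTS[s] for s in recognition_errors)
    return (n_words - e - r) / n_words * 100

# Example: 1,000 subtitle words with two standard edition errors,
# plus one serious and four minor recognition errors.
score = ner_accuracy(1000, ["standard", "standard"],
                     ["serious", "minor", "minor", "minor", "minor"])
print(f"NER accuracy: {score:.2f}%")            # NER accuracy: 99.70%
print("Meets the 98% threshold:", score >= 98)  # True
```

In the NER Buddy scenario described in the abstract, classifying each error and its severity is the part of the workflow that the AI assists with; once errors are classified, the arithmetic above is mechanical.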
References
- Bender, E. M., Gebru, T., McMillan-Major, A. and Shmitchell, S. (2021). On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?. FAccT '21: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, pp. 610–623. https://doi.org/10.1145/3442188.3445922. [Accessed 20240825].
- Brown, T. et al. (2020). Language Models are Few-Shot Learners. https://doi.org/10.48550/arXiv.2005.14165. [Accessed 20240815].
- CRTC (2019a). Broadcasting Notice of Consultation CRTC 2019-9. Ottawa. https://crtc.gc.ca/eng/archive/2019/2019-9.htm. [Accessed 20240625].
- CRTC (2019b). Broadcasting Regulatory Policy CRTC 2019-308. Ottawa. https://crtc.gc.ca/eng/archive/2019/2019-308.htm. [Accessed 20240625].
- Devlin, J., Chang, M.-W., Lee, K. and Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. https://doi.org/10.48550/arXiv.1810.04805. [Accessed 20240621].
- Dumouchel, P., Boulianne, G. and Brousseau, J. (2011). Measures for quality of closed captioning, in Şerban, A., Matamala, A. and Lavaur, J. M. (eds.). Audiovisual translation in close-up: Practical and theoretical approaches. Bern: Peter Lang, pp. 161–172.
- HLAA (Hearing Loss Association of America) (2018). Hearing Loss: Facts and Statistics. https://www.hearingloss.org/wpcontent/uploads/HLAA_HearingLoss_Facts_Statistics.pdf?pdf=FactStats. [Accessed 20240511].
- Hughes, J. (2023). Introducing Ursa from Speechmatics. Speechmatics. https://www.speechmatics.com/company/articles-and-news/introducing-ursa-the-worlds-most-accurate-speech-to-text. [Accessed 20240908].
- Kocmi, T. and Federmann, C. (2023). Large language models are state-of-the-art evaluators of translation quality, in Proceedings of the Annual Conference of the European Association for Machine Translation (EAMT). https://arxiv.org/abs/2302.14520. [Accessed 20240807].
- Lambourne, A. (2006). Subtitle Respeaking, in Eugeni, C. and Mack, G. (eds.). Intralinea, Special Issue on Respeaking. https://www.intralinea.org/specials/article/1686. [Accessed 20240906].
- Marsh, A. (2006). Respeaking for the BBC, in Eugeni, C. and Mack, G. (eds.). Intralinea, Special Issue on Respeaking. https://www.intralinea.org/specials/article/Respeaking_for_the_BBC. [Accessed 20240906].
- Mykhalevych, N. (2022). Survey: Why America is obsessed with subtitles. https://preply.com/en/blog/americas-subtitles-use/. [Accessed 20221020].
- Papers with Code (2024). Multi-task Language Understanding on MMLU. https://paperswithcode.com/sota/multi-task-language-understanding-on-mmlu. [Accessed 20240908].
- Pezeshkpour, P. and Hruschka, E. (2023). Large Language Models Sensitivity to The Order of Options in Multiple-Choice Questions. https://doi.org/10.48550/arXiv.2308.11483. [Accessed 20240525].
- Radford, A., Narasimhan, K., Salimans, T. and Sutskever, I. (2018). Improving Language Understanding by Generative Pre-Training. OpenAI. https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf. [Accessed 20240908].
- Radford, A., Kim, J. W., Xu, T., Brockman, G., McLeavey, C. and Sutskever, I. (2022). Robust Speech Recognition via Large-Scale Weak Supervision. https://doi.org/10.48550/arXiv.2212.04356. [Accessed 20240910].
- Romero-Fresco, P. (2011). Subtitling Through Speech Recognition: Respeaking. Manchester: Routledge.
- Romero-Fresco, P. (2020). Negotiating quality assessment in media accessibility: the case of live subtitling. Universal Access in the Information Society 20, pp. 741–751. https://doi.org/10.1007/s10209-020-00735-6. [Accessed 20240602].
- Romero-Fresco, P. and Martínez, J. (2015). Accuracy rate in live subtitling: the NER model, in Díaz-Cintas, J. and Baños, R. (eds.). Audiovisual Translation in a Global Context: Mapping an Ever-changing Landscape. London: Palgrave Macmillan, pp. 28–50. https://doi.org/10.1057/9781137552891_3. [Accessed 20240602].
- Romero-Fresco, P. and Eugeni, C. (2020). Live subtitling through respeaking, in Bogucki, Ł. and Deckert, M. (eds.). Handbook of Audiovisual Translation and Media Accessibility. London: Palgrave Macmillan, pp. 269–297. https://doi.org/10.1007/978-3-030-42105-2_14. [Accessed 20240602].
- Romero-Fresco, P. and Fresno, N. (2023). The accuracy of automatic and human live captions in English. Linguistica Antverpiensia, New Series – Themes in Translation Studies, 22. https://doi.org/10.52034/lans-tts.v22i.774. [Accessed 20240715].
- Stinson, M. S. (2015). Speech-to-text interpreting, in Pöchhacker, F. (ed.). Routledge Encyclopedia of Interpreting Studies. London: Routledge, pp. 399–400.
- Stureborg, R., Alikaniotis, D. and Suhara, Y. (2024). Large Language Models are Inconsistent and Biased Evaluators. https://doi.org/10.48550/arXiv.2405.01724. [Accessed 20240908].
- Tang, L., Shalyminov, I., Wing-mei Wong, A., Burnsky, J., Vincent, J.W., Yang, Y., Singh, S., Feng, S., Song, H., Su, H., Sun, L., Zhang, Y., Mansour, S. and McKeown, K. (2024). TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization. https://paperswithcode.com/paper/tofueval-evaluating-hallucinations-of-llms-on. [Accessed 20240525].
- UNE (2012). Subtitulado para personas sordas y personas con discapacidad auditiva. Madrid: UNE. https://www.une.org/encuentra-tu-norma/busca-tu-norma/norma?c=N0049426. [Accessed 20240521].
- Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L. and Polosukhin, I. (2017). Attention Is All You Need. https://doi.org/10.48550/arXiv.1706.03762. [Accessed 20240502].
- Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., and Zhou, D. (2023). Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. https://doi.org/10.48550/arXiv.2201.11903. [Accessed 20240716].
- Wells, T., Christoffels, D., Vogler, C., Kushalnagar, R. (2022). Comparing the Accuracy of ACE and WER Caption Metrics When Applied to Live Television Captioning, in Miesenberger, K., Kouroupetroglou, G., Mavrou, K., Manduchi, R., Covarrubias Rodriguez, M., Penáz, P. (eds.). Computers Helping People with Special Needs. ICCHP-AAATE 2022. Lecture Notes in Computer Science, vol 13341. Springer, Cham. https://doi.org/10.1007/978-3-031-08648-9_61. [Accessed 20240602].