Talk 3

Title: Literary Translation in the Era of Large Language Models

Abstract: Automatic literary translation presents a unique challenge due to cultural nuances and the intricacies of literary language. Recently developed large language models (LLMs), such as GPT-4o, however, promise a new level of quality for literary translation. This talk explores a central question: how good are LLMs at translating literature, really? To answer it, I examine their performance through both human judgment and automatic evaluation metrics. The findings reveal that results vary significantly depending on the annotators (e.g., students vs. professionals) and the evaluation criteria (e.g., MQM annotation vs. pairwise preferences) used. Moreover, existing automatic metrics, such as XComet or Gemba, are largely inadequate for literary texts. I will then discuss how these metrics can be improved, through fine-tuning on human-annotated data and carefully designed prompts, to better assess the quality of automatic literary translations.
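
To make the prompt-based evaluation setup concrete, below is a minimal sketch of a Gemba-style direct-assessment metric: an LLM is asked to score a translation on a 0-100 scale and the numeric reply is parsed as the metric value. The exact prompt wording, the gpt-4o model name, and the gemba_da_score helper are illustrative assumptions, not the specific configuration evaluated in the talk.

```python
# Minimal sketch of a Gemba-style direct-assessment metric: prompt an LLM
# for a 0-100 quality score of a single translated segment.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT = (
    "Score the following translation from {src_lang} to {tgt_lang} on a "
    "continuous scale from 0 to 100, where 0 means \"no meaning preserved\" "
    "and 100 means \"perfect meaning and grammar\".\n\n"
    "{src_lang} source: {source}\n"
    "{tgt_lang} translation: {translation}\n"
    "Score:"
)

def gemba_da_score(source: str, translation: str,
                   src_lang: str = "German", tgt_lang: str = "English") -> float:
    """Ask the LLM for a 0-100 quality score and parse the reply."""
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed model; any chat-capable LLM would do
        messages=[{"role": "user", "content": PROMPT.format(
            src_lang=src_lang, tgt_lang=tgt_lang,
            source=source, translation=translation)}],
        temperature=0,  # deterministic scoring
    )
    reply = response.choices[0].message.content.strip()
    return float(reply.split()[0])  # naive parse; a robust metric needs fallbacks
```

Prompt-based metrics like this contrast with learned metrics such as XComet, which regress quality scores with a fine-tuned encoder; the talk's argument is that both families need adaptation, for instance fine-tuning on human-annotated literary data or more carefully designed prompts, before they reliably assess literary translations.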