Unbabel has unveiled CometKiwi XL and XXL, new open-source large language models (LLMs) built specifically to predict translation quality. Unbabel says CometKiwi offers strong quality estimation performance, helping businesses produce more accurate translations, run more cost-effective operations, and expand globally.
Despite significant progress in AI translation, perfect accuracy remains hard to achieve, especially in complex scenarios. Moreover, machine translation products are proliferating, and because each is trained on different datasets, their performance varies with the specific use case. This makes it harder for businesses to translate and localize content efficiently for a global audience. CometKiwi addresses these issues by assessing the quality of each translation, which lets businesses decide when human involvement is essential and when machine automation can deliver efficiency gains.
"By making these models available to the public, our goal is to promote collaboration, facilitate knowledge sharing, and drive further advancements in quality estimation techniques. We firmly believe that CometKiwi will make a significant contribution to the growth and innovation of the machine translation field as a whole," said João Graça, Co-Founder and Chief Technology Officer of Unbabel.
CometKiwi XL and XXL are the latest additions to the field of Quality Estimation (QE) systems, and each supports 100 languages. The models are notable for their scale, with 3.5 billion and 10.7 billion parameters, respectively. The name CometKiwi combines those of its open-source predecessors, OpenKiwi and COMET.
The models ranked first in the WMT 2023 QE shared task. The task covered a diverse range of language pairs, from high-resource pairs such as Chinese-English and English-German to low-resource pairs such as Hebrew-English, English-Tamil, and English-Telugu.
In July this year, Unbabel launched its new LangOps Platform to provide businesses with a comprehensive solution for managing multilingual content and communication.