EuroLLM: This is how the European open-source AI performs in translation tasks

New LLMs pop up like mushrooms after the rain. 🍄

ChatGPT, Claude, and Gemini are not the only players in the spotlight. From Deepseek to Bloom, Gemma 3, and CroissantLLM — new open-source LLMs tempt curious users around every corner.

Recently, EuroLLM joined the game, too. This European language model was developed as an open-source alternative to the US-focused giants. ChatGPT & co usually work well for English but tend to struggle with high accuracy and cultural sensitivity in less common languages. That's where this initiative comes in.

It strives to shift attention from the dominant US English toward underrepresented European languages while promoting diversity, equality, and open access. This noble mission also inspired another AI project called OpenEuroLLM, which has been recently announced.

Trained on European datasets, the model promises strong cultural adaptation and native fluency in 24 official European languages. It also aims to outperform US-centric models in areas such as gender neutrality, alignment with EU regulations (such as the EU AI Act), accurate idiomatic expressions, and sensitivity to local spelling and formal/informal nuances.

These are the model's big promises.

Does it live up to them?

It's time to put EuroLLM's capabilities to the test.

🔦 Looking for a comparison that includes a bigger set of LLMs? Check out our global analysis including seven different AI models

👾 The rules of the game 🔗

Four languages, three text types, one task. That's our test in a nutshell.

We'll find out how the model translates marketing copy, user manuals, and simple conversation. All these experiments will be performed in four language combinations:

🇵🇱 English to Polish
🇪🇸 English to Spanish
🇨🇿 English to Czech
🇬🇷 English to Greek

Next, we'll feed the same prompts with the same source texts and languages to ChatGPT to compare the results. All translations will be evaluated by professional linguists based on the following metrics:

1️⃣ Fluency
2️⃣ Accuracy
3️⃣ Cultural adaptation
4️⃣ Consistency in long texts
5️⃣ Handling of gender-neutral language

EuroLLM is available in two versions: EuroLLM-1.7B, containing 1.7 billion parameters and introduced in September 2024, and EuroLLM-9B, released in December 2024. We'll use the newest version and access it via Python.

👩‍💻 The standard command to call for translation recommended on the model's page on HuggingFace will be spiced up by three extra parameters:

max_new_tokens=200: To ensure full translation in case the model decides to stop halfway.
eos_token_id=tokenizer.eos_token_id: To tell the model to stop generating the content at a logical point. Otherwise, it keeps adding extra, unrelated text at the end of the output.
do_sample=False: To ensure more consistency in translation.

💬 Both EuroLLM and ChatGPT will receive the same prompt that highlights the test metrics:

“Translate the following English text into XYZ. Ensure that the translation is fluent, culturally appropriate, and maintains the original meaning.”

📝 The source texts from three different categories will be the same for all languages and models:

General text (conversation) 🔗

"Hey Alex, are you coming to the party tonight? Julia said she's bringing that new Spanish wine she got last week, and I heard it's amazing. Also, remember to bring some snacks. Last time, we ran out way too early! Oh, and don't forget, Lisa wants to talk to you about that job offer. She said it might be a great opportunity for you. Anyway, let me know what time you're planning to come, and if you need a ride, I can pick you up. Looking forward to it!"

Technical content (user manual) 🔗

"Device Setup Instructions
1. Unbox the device and ensure all components are included:
- Main unit
- Power adapter
- User manual
- USB cable
2. Connect the power adapter to the device and plug it into a power outlet. The LED indicator should turn green. If it remains off, check the power source.
3. Press and hold the power button for 3 seconds until the startup sound plays.
4. To connect to Wi-Fi, navigate to Settings > Network > Wi-Fi. Select your network and enter the password.
5. Download and install the companion app from the App Store or Google Play. Follow the on-screen instructions to complete the setup.
6. If the device is unresponsive, reset it by holding the reset button for 10 seconds.
Warning: Do not expose the device to water or extreme temperatures.
For troubleshooting, visit www.support.example.com."

Marketing copy (slogans) 🔗

"Unleash your potential with cutting-edge technology. Savor the taste of tradition in every bite. Designed for those who demand more. Your future, your way. Experience luxury, redefined. Innovation that moves you forward. Effortless style, undeniable confidence. Because you deserve the best. Power up your day with lasting energy. Where comfort meets performance. Crafted for perfection, built for life. Transform your space, transform your life. Timeless elegance for the modern world. Bold flavors, unforgettable experiences. Think smarter, live better. The ultimate upgrade for your lifestyle. More than a product, it's a statement. Quality you can trust, performance you can feel. Go beyond the ordinary. Excellence in every detail."

The text samples fall within the 85-150 word range. The idea is to test fluency and consistency beyond a single sentence.

Disclaimer: The translation quality assessments and comparisons presented in this article are based on a limited set of tests perfomed by language professionals and should not be considered exhaustive or definitive. Due to the broad range of potential test conditions and other constraints, our benchmarking efforts may not fully capture the capabilities of the models used in the test, which might be updated and improved in the near future. Our benchmarking criteria might also be limited.

🔍 The grand model battle in overview 🔗

Translating informal content with EuroLLM 🔗

First comes the informal conversation. What can go wrong with such a short, innocent text?

Nearly everything.

The conversation posed quite a big challenge for EuroLLM. The translated texts contained grammar mistakes, awkward phrases, and mistranslations. In every single language.

🇬🇷 For example, "Hey" in Greek was translated as "You said":

The Spanish translation contained punctuation mistakes, such as missing upside-down exclamation marks:

And both Polish and Czech versions included incorrectly inflected words in the phrase "new Spanish wine":

Not a great start for EuroLLM.

Next up is the technical manual.

Translating technical content with EuroLLM 🔗

You'd think this would be the easy part, right? Well, not quite.

EuroLLM dealt slightly better with this text, although not without its fair share of hiccups. Across all language combinations, there were inconsistencies, mistranslations, and overly literal phrases.

For example, in Spanish, the model mixed formal and informal tones:

In Czech, it transformed the phrase "the LED indicator should turn green" into "the LED indicator should remain turned off":

In Greek, it chose the wrong term for "password", and in Polish, the translation of "until the startup sound plays" was so literal that it felt more robotic than the device being described:

That's not a perfect result either.

What about the marketing copy?

Translating marketing copy with EuroLLM 🔗

To put it mildly, creativity is not the strongest point of EuroLLM. As expected, the marketing slogans were the biggest obstacle for the model. The texts sounded very literal, contained numerous mistranslations, and kept some words in English (e.g., "upgrade" in Spanish or "timeless elegance" in Czech):

As if this wasn't enough, EuroLLM failed to deliver on one of its biggest promises: using local vocabulary. For example, in Spanish, the model showed a tendency toward vocabulary and expressions closer to Latin American variants, rather than the Castilian Spanish from Spain (e.g., "desembala" vs "desempaqueta", "temprano" vs "pronto", the use of the simple past tense instead of the past perfect).

Overall, EuroLLM scored decent notes for gender-neutral language, but accuracy, fluency, and consistency remain serious downsides.

How does EuroLLM compare to ChatGPT? 🔗

In comparison, ChatGPT scored the highest in nearly all categories. It nailed the technical content, kept informal conversation sounding natural, and gracefully navigated the minefield of ambiguous marketing slogans across all languages. It managed to avoid mistranslations, overly literal phrases, and grammar and punctuation mistakes. In Spanish, it even favored the Castilian locale, which EuroLLM failed to achieve.

Below, you can see the metrics for all languages. In each case, the metric is an average for three different texts. This means that even if the model dealt well with the technical content, the overall fluency or accuracy is lower due to the poor results for informal or marketing texts.

LLMs	Consistency in long texts	Gender-neutral language	Cultural adaptation	Accuracy	Fluency	Total Score
ChatGPT	5	5	5	5	4	24
EuroLLM	2	2	3	2	1	10

Table 1. Results for translation into Czech.

LLMs	Consistency in long texts	Gender-neutral language	Cultural adaptation	Accuracy	Fluency	Total Score
ChatGPT	5	5	5	4	5	24
EuroLLM	5	5	3	2	2	17

Table 2. Results for translation into Greek.

LLMs	Consistency in long texts	Gender-neutral language	Cultural adaptation	Accuracy	Fluency	Total Score
ChatGPT	5	5	5	5	4	20
EuroLLM	5	4	3	3	1	16

Table 3. Results for translation into Polish.

LLMs	Consistency in long texts	Gender-neutral language	Cultural adaptation	Accuracy	Fluency	Total Score
ChatGPT	5	4	4	5	3	21
EuroLLM	2	4	3	3	3	15

Table 4. Results for translation into Spanish.

🥇 And the winner is… 🔗

When I first heard of EuroLLM, my expectations were unreasonably high. Finally, a European open-source initiative bold enough to take on the US giants. 👏 Sure, prompting via Command Panel instead of a slick interface wasn't exactly user-friendly, but I was ready to put up with it in the name of groundbreaking results.

But those never happened.

EuroLLM isn't bad. But it's not there yet, either. The model still has a long way to go before it can genuinely rival the industry giants. As the Spanish test results showed, EuroLLM doesn't always prioritize European vocabulary or cultural nuance. It stumbles in other areas, too. The ambition is great, but for it's more of a work in progress than a game-changing solution — at least when it comes to translation.

Hopefully, the model will improve with time, and one day it will offer a real, European-focused alternative. Until then, the industry giants remain giants for a reason, so if you're looking for an LLM-powered translation, turn your focus there.