Multilingual Search
Query in any of 24 EU languages. Find documents in any language. Semantic search understands meaning across languages.
How Cross-Lingual Search Works
Traditional keyword search requires you to know the exact words used in the target document. If a regulation is published in German and you search in Finnish, a keyword engine will find nothing.
Pauhu uses semantic search powered by multilingual embeddings. Every document is converted into a 1024-dimensional vector that captures its meaning, not just its words. Because the embedding model was trained on 100+ languages simultaneously, documents with similar meaning get similar vectors regardless of what language they are written in.
This means you can:
- Search in Finnish and find German regulations about the same topic
- Search in English and find French court judgments interpreting the same directive
- Search in any language and find results across all 24 EU official languages
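Ranking by meaning comes down to comparing vectors. The sketch below shows cosine similarity between a query vector and a few document vectors. It is illustrative only: the 4-dimensional vectors and the document IDs are made up, whereas real BGE-M3 embeddings have 1024 dimensions and would come from the model itself.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank(query_vec: list[float], docs: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Return (doc_id, score) pairs sorted by descending similarity."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy vectors; real BGE-M3 embeddings are 1024-dimensional.
query_fi = [0.9, 0.1, 0.0, 0.2]  # stands in for a Finnish query about AI
docs = {
    "de-ai-regulation": [0.8, 0.2, 0.1, 0.3],   # German document, same topic
    "fr-fisheries-rule": [0.0, 0.1, 0.9, 0.1],  # French document, unrelated topic
}

for doc_id, score in rank(query_fi, docs):
    print(f"{doc_id}: {score:.3f}")
```

The language of each document plays no part in the comparison; only the direction of its vector does.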
Embedding Model: BGE-M3
| Property | Value |
|---|---|
| Model | BGE-M3 (BAAI General Embedding; M3 stands for Multi-Linguality, Multi-Functionality, Multi-Granularity) |
| Dimensions | 1024 |
| Similarity metric | Cosine similarity (0.0–1.0) |
| Training languages | 100+ (includes all 24 EU official languages) |
| Max sequence length | 8,192 tokens |
| Format | ONNX (browser-native inference supported) |
BGE-M3 is specifically designed for multilingual and cross-lingual retrieval. It outperforms monolingual models on cross-lingual benchmarks and handles code-mixed text well (e.g., an English query with a German legal term).
Using the lang Parameter
The lang parameter on search endpoints controls result filtering, not query language. The search engine always understands your query regardless of what language you write it in.
| Usage | Behaviour |
|---|---|
| lang omitted | Returns results in all available languages, ranked by semantic similarity |
| lang=fi | Returns only Finnish-language results, still ranked by semantic similarity to your query (which can be in any language) |
| lang=en | Returns only English-language results |
Key insight: You can write your query in Finnish and set lang=de to find German documents that match your Finnish query. The embedding model bridges the language gap.
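The difference between filtering and query language can be seen in how a request is built. A minimal sketch using only the Python standard library: the base URL and the q, limit and lang parameters are taken from the curl examples on this page, while the helper name search_url and its default limit=10 are assumptions of this sketch, not documented API defaults.

```python
from typing import Optional
from urllib.parse import urlencode

# Staging base URL as used in the curl examples on this page.
BASE_URL = "https://staging.pauhu.eu/v1/search"


def search_url(product: str, query: str, lang: Optional[str] = None, limit: int = 10) -> str:
    """Build a search URL (hypothetical helper). lang filters results only."""
    params = {"q": query, "limit": limit}
    if lang is not None:
        params["lang"] = lang  # result-language filter, not the query language
    return f"{BASE_URL}/{product}?{urlencode(params)}"


finnish_query = "tekoaly korkean riskin jarjestelmat"

# Results in every language, ranked by similarity to the Finnish query.
print(search_url("eurlex", finnish_query))
# Only German documents, still ranked by similarity to the Finnish query.
print(search_url("eurlex", finnish_query, lang="de"))
```

The query text is identical in both requests; only the presence of lang changes which documents are returned.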
Cross-Lingual Example
The same concept — the EU AI Act — searched in three different languages, all returning the same document:
Query in English
```shell
curl "https://staging.pauhu.eu/v1/search/eurlex?q=artificial+intelligence+high-risk+systems&limit=1" \
  -H "Authorization: Bearer YOUR_API_KEY"
```
```json
{
  "product": "eurlex",
  "results": [{
    "id": "32024R1689",
    "title": "Regulation (EU) 2024/1689 laying down harmonised rules on artificial intelligence (AI Act)",
    "score": 0.96
  }]
}
```
Query in Finnish
```shell
curl "https://staging.pauhu.eu/v1/search/eurlex?q=tekoaly+korkean+riskin+jarjestelmat&limit=1" \
  -H "Authorization: Bearer YOUR_API_KEY"
```
```json
{
  "product": "eurlex",
  "results": [{
    "id": "32024R1689",
    "title": "Regulation (EU) 2024/1689 laying down harmonised rules on artificial intelligence (AI Act)",
    "score": 0.93
  }]
}
```
Query in German
```shell
curl "https://staging.pauhu.eu/v1/search/eurlex?q=kunstliche+Intelligenz+Hochrisiko-Systeme&limit=1" \
  -H "Authorization: Bearer YOUR_API_KEY"
```
```json
{
  "product": "eurlex",
  "results": [{
    "id": "32024R1689",
    "title": "Regulation (EU) 2024/1689 laying down harmonised rules on artificial intelligence (AI Act)",
    "score": 0.94
  }]
}
```
All three queries return the same AI Act regulation as the top result, with similarity scores between 0.93 and 0.96, even though each query is written in a different language. The small score differences reflect how the embedding model represents each language.
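To check cross-lingual agreement programmatically, you can compare the top hit across responses. A minimal sketch that uses the three sample responses above as static JSON strings (title field omitted for brevity) instead of live API calls; it relies only on the results[].id and results[].score fields shown in those samples.

```python
import json

# Sample responses from the English, Finnish and German queries above.
raw_responses = {
    "en": '{"product": "eurlex", "results": [{"id": "32024R1689", "score": 0.96}]}',
    "fi": '{"product": "eurlex", "results": [{"id": "32024R1689", "score": 0.93}]}',
    "de": '{"product": "eurlex", "results": [{"id": "32024R1689", "score": 0.94}]}',
}


def top_hit(raw: str) -> tuple[str, float]:
    """Return (id, score) of the first result in a search response."""
    body = json.loads(raw)
    first = body["results"][0]
    return first["id"], first["score"]


top_hits = {lang: top_hit(raw) for lang, raw in raw_responses.items()}
top_ids = {doc_id for doc_id, _ in top_hits.values()}
scores = [score for _, score in top_hits.values()]

print("same top document:", len(top_ids) == 1)
print(f"score spread: {max(scores) - min(scores):.2f}")
```

With live responses you would replace the static strings with the bodies returned by the API.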
24 Supported Languages
Pauhu supports all 24 official languages of the European Union:
| Code | Language | Code | Language | Code | Language |
|---|---|---|---|---|---|
| bg | Bulgarian | fi | Finnish | mt | Maltese |
| cs | Czech | fr | French | nl | Dutch |
| da | Danish | ga | Irish | pl | Polish |
| de | German | hr | Croatian | pt | Portuguese |
| el | Greek | hu | Hungarian | ro | Romanian |
| en | English | it | Italian | sk | Slovak |
| es | Spanish | lt | Lithuanian | sl | Slovenian |
| et | Estonian | lv | Latvian | sv | Swedish |
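If your client accepts a lang value from users, it can be useful to validate it before sending the request. A minimal client-side sketch: the code set is copied from the table above, and the helper name normalise_lang is hypothetical. Lower-casing the input is a choice of this sketch; whether the API itself accepts upper-case codes, or how it responds to an unknown code, is not documented here.

```python
# The 24 official EU language codes from the table above.
EU_LANGUAGES = frozenset({
    "bg", "cs", "da", "de", "el", "en", "es", "et",
    "fi", "fr", "ga", "hr", "hu", "it", "lt", "lv",
    "mt", "nl", "pl", "pt", "ro", "sk", "sl", "sv",
})


def normalise_lang(code: str) -> str:
    """Trim, lower-case and validate a lang value (hypothetical client helper)."""
    normalised = code.strip().lower()
    if normalised not in EU_LANGUAGES:
        raise ValueError(f"unsupported lang code: {code!r}")
    return normalised


print(len(EU_LANGUAGES))
print(normalise_lang(" FI "))
```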
Language Support Per Product
Most products contain documents in all 24 EU languages. Some specialised products have narrower coverage:
| Product | Languages | Notes |
|---|---|---|
| EUR-Lex | 24 | All EU official languages. Most documents available in all languages. |
| TED | 24 | Notices in the language of the contracting authority, summaries in English. |
| IATE | 24 | Terminology in all EU languages. Coverage varies by term. |
| Consilium | 24 | Council conclusions typically in all official languages. |
| Commission | 24 | Major communications in all languages; working documents often English/French/German only. |
| OEIL | 24 | Procedure summaries in the language of the rapporteur plus English. |
| National Law (lex) | 23 | National language per country. Malta uses both Maltese and English. |
| CURIA | 24 | Language of the case plus French (working language of the Court). |
| EPO | 3 | English, French, German (official languages of the EPO). |
| CORDIS | 2 | English plus the project coordinator’s language. |
| OSM | Multilingual | Names in local language plus English/international variants. |
| Wiki | 24 | Curated articles in all EU languages where available. |
| Code | 2 | Primarily English. Some projects include localised documentation. |
| All others | 24 | ECB, ECHA, EMA, Eurostat, Publications, data.europa.eu, Who is Who, DPP, European Parliament. |
Search Tips
- Use your native language. The embedding model handles all 24 EU languages, so you do not need to translate your query into English.
- Omit lang for the broadest results. Without the parameter, you get results across all languages, which is ideal for comprehensive research.
- Use lang to focus. Set lang=fi if you only want Finnish-language documents, even if your query is in English.
- Mix languages freely. A query like "GDPR tietosuoja high-risk" (mixing English and Finnish) works because the model understands both.
- Use legal terms from any language. Searching "Datenschutz-Grundverordnung" (German for GDPR) should surface the same documents as searching "General Data Protection Regulation".
- Prefer meaning over keywords. "rules for selling chemicals in the EU" will find REACH regulation results just as well as "REACH regulation 1907/2006".
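The curl examples above spell their queries in plain ASCII. If you write a query in its native spelling (with characters such as ä or ü, or in Greek or Cyrillic script), it has to be percent-encoded as UTF-8 before it goes into a URL; this is standard URL handling, not something specific to Pauhu. A minimal sketch with the Python standard library (with curl, combining --get and --data-urlencode does the same job). Whether the search engine treats "tekoaly" and "tekoäly" identically is not stated on this page.

```python
from urllib.parse import quote_plus

# Native-language queries, including non-ASCII characters.
queries = [
    "tekoäly korkean riskin järjestelmät",        # Finnish
    "künstliche Intelligenz Hochrisiko-Systeme",  # German
    "τεχνητή νοημοσύνη",                          # Greek
]

for q in queries:
    # quote_plus encodes the text as UTF-8, percent-escapes it, and turns spaces into "+".
    print(quote_plus(q))
```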