API Reference
IATE terminology (2.4M terms), neural translation (1,440+ language pairs), semantic search (20 semantic indexes), document annotation. All services run in EU jurisdiction.
Architecture
Pauhu® EU runs as a fleet of services in EU jurisdiction. Each service has its own domain. There is no single unified gateway URL; each API is accessed at its own endpoint.
| Service | Domain | Purpose |
|---|---|---|
| Terminology | terminology.pauhu.eu | IATE term lookup, search, TBX/TMX export |
| Translation | translate.pauhu.eu | Neural translation (1,440+ language pairs) |
| Search | search.pauhu.eu | Semantic search across 20 data sources |
| Annotation | annotate.pauhu.eu | Topic + legal modality classification |
| Models | models.pauhu.ai | Optimized model CDN (2,342 models) |
| Gateway | pauhu.eu | Gate orchestration |
All services return JSON by default. CORS is enabled for pauhu.ai, pauhu.eu, and localhost development origins.
Authentication
API keys
Generate a self-service API key via the Pauhu search service. Keys follow the format pk_*. Each key is linked to a seat tier that determines data access entitlements.
Create an API key. Requires an email address. New keys start on the live tier (activated by a Stripe subscription).
curl -X POST https://pauhu.eu/keys/generate \
  -H "Content-Type: application/json" \
  -d '{"email": "you@company.com"}'
Response:
{
"api_key": "pk_...",
"tier": "live",
"entitlements": {
"raw_feeds": true,
"annotated_feeds": false,
"training_export": false,
"pauhu_ai": false,
"search": true,
"terminology": true,
"translation": true,
"rerank": true
},
"created_at": "2026-02-24T12:00:00.000Z"
}
Check current entitlements and burst limits. Requires Authorization: Bearer <api_key>.
{
"tier": "live",
"entitlements": {
"raw_feeds": true,
"annotated_feeds": false,
"training_export": false,
"pauhu_ai": false,
"search": true,
"terminology": true,
"translation": true,
"rerank": true
},
"burst_limit": "10 req/sec sustained, 50 req/sec peak",
"active": true,
"org_id": null
}
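As a rough illustration, a client could read this response to decide which features to enable. The sketch below parses the sample response shown above; the helper name `allowed_features` is hypothetical and not part of the API.

```python
import json

# Sample copied from the entitlements response above.
SAMPLE = json.loads("""
{
  "tier": "live",
  "entitlements": {
    "raw_feeds": true,
    "annotated_feeds": false,
    "training_export": false,
    "pauhu_ai": false,
    "search": true,
    "terminology": true,
    "translation": true,
    "rerank": true
  },
  "burst_limit": "10 req/sec sustained, 50 req/sec peak",
  "active": true,
  "org_id": null
}
""")


def allowed_features(response: dict) -> list:
    """Return the sorted names of entitlements that are set to true."""
    return sorted(name for name, enabled in response["entitlements"].items() if enabled)


enabled = allowed_features(SAMPLE)
```

A client might call this once at startup and hide features that are missing from the list.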
Request authentication
Include your API key as a Bearer token:
Authorization: Bearer pk_...
Without a key, requests run in trial mode: 3 requests/day (IP-based), search + terminology + translation only. No access to raw feeds, annotations, or training export.
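A minimal Python sketch of the two modes, using only the standard library. The helper `build_request` and the key `pk_example` are placeholders for illustration; the URL is the one used in this reference's search curl example. The request is built but not sent.

```python
import json
import urllib.request
from typing import Optional


def build_request(url: str, payload: dict, api_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request.

    With api_key=None no Authorization header is attached, which this
    reference describes as trial mode.
    """
    headers = {"Content-Type": "application/json"}
    if api_key is not None:
        # Documented format: "Authorization: Bearer pk_..."
        headers["Authorization"] = f"Bearer {api_key}"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Placeholder key; real keys are issued by the key generation endpoint.
req = build_request("https://pauhu.eu/search", {"query": "data protection"}, api_key="pk_example")
trial_req = build_request("https://pauhu.eu/search", {"query": "data protection"})
```

Passing either object to `urllib.request.urlopen` would send it.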
Terminology API LIVE
Serves 2,456,445 IATE terms across 24 EU languages with exact lookup and semantic search (multilingual embeddings, 1024 dimensions).
Exact lookup
Exact match against IATE terminology database.
| Parameter | Type | Description |
|---|---|---|
| term | string | Term to look up (required) |
| lang | string | ISO 639-1 language code (optional, searches all if omitted) |
{
"query": "data protection",
"found": true,
"count": 3,
"results": [...],
"source": "IATE",
"stage": "LOOKUP"
}
Semantic search
Multilingual embedding search over the semantic index. Returns terms ranked by semantic similarity to the query.
curl -X POST https://pauhu.eu/search \
  -H "Content-Type: application/json" \
  -d '{"query": "personal data processing", "lang": "en", "limit": 10}'
{
"query": "personal data processing",
"count": 10,
"results": [...],
"source": "IATE Pauhu Search",
"stage": "MODEL"
}
Statistics
Term counts by language.
{
"total": 2456445,
"languages": 24,
"byLanguage": [{"lang": "en", "count": 312847}, ...],
"source": "IATE",
"reliability": "4-star"
}
TBX export (ISO 30042)
Export terminology in TermBase eXchange format. Returns application/x-tbx+xml.
TMX export
Export translation pairs in Translation Memory eXchange format. Returns application/x-tmx+xml.
Batch export
Paginated export for embedding generation pipelines.
{
"lang": "en",
"offset": 0,
"limit": 1000,
"count": 1000,
"hasMore": true,
"nextOffset": 1000,
"terms": [...],
"embedding_model": "multilingual-1024d"
}
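The pagination fields above (`hasMore`, `nextOffset`) suggest a simple loop. The sketch below is one way to walk all pages. `fetch_page` is a hypothetical callable that performs the HTTP request and returns the decoded JSON, and `fake_fetch` stands in for it so the example runs offline.

```python
from typing import Callable, Dict, Iterator


def iter_terms(fetch_page: Callable[[int, int], Dict], limit: int = 1000) -> Iterator[dict]:
    """Yield every term by following nextOffset until hasMore is false."""
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        yield from page["terms"]
        if not page.get("hasMore"):
            break
        offset = page["nextOffset"]


def fake_fetch(offset: int, limit: int) -> Dict:
    """Offline stand-in for the export endpoint: serves 2,500 dummy terms."""
    total = 2500
    terms = [{"id": i} for i in range(offset, min(offset + limit, total))]
    end = offset + len(terms)
    return {
        "offset": offset,
        "limit": limit,
        "count": len(terms),
        "hasMore": end < total,
        "nextOffset": end,
        "terms": terms,
    }


all_terms = list(iter_terms(fake_fetch))
```

Because `iter_terms` is a generator, an embedding pipeline can process each page as it arrives without holding the full export in memory.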
Custom glossaries (tenant)
Upload a custom glossary (CSV or TBX). Terms are merged with IATE at lookup time, with tenant terms taking priority.
Merged lookup: tenant glossary first, then IATE fallback.
List all glossaries for a tenant.
Delete a tenant glossary.
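The merge order (tenant glossary first, then IATE fallback) can be pictured with two dictionaries. This is an illustrative sketch, not the service's implementation: the function name, the in-memory dictionaries, the sample entries, and the case-insensitive matching are all assumptions.

```python
from typing import Dict, List, Optional


def merged_lookup(term: str, tenant: Dict[str, List[str]], iate: Dict[str, List[str]]) -> Optional[dict]:
    """Return tenant results if present, otherwise fall back to IATE."""
    key = term.casefold()  # assumption: lookups ignore case
    if key in tenant:
        return {"term": term, "results": tenant[key], "source": "tenant"}
    if key in iate:
        return {"term": term, "results": iate[key], "source": "IATE"}
    return None


# Illustrative data only.
tenant_glossary = {"data protection": ["tietosuoja (house style)"]}
iate_terms = {"data protection": ["tietosuoja"], "controller": ["rekisterinpitäjä"]}

hit_tenant = merged_lookup("Data protection", tenant_glossary, iate_terms)
hit_iate = merged_lookup("controller", tenant_glossary, iate_terms)
miss = merged_lookup("unknown term", tenant_glossary, iate_terms)
```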
Translation API LIVE
Neural translation models running in the browser. 1,440+ language pairs.
Translate text
6-stage translation cascade: KV cache → IATE terminology → rules engine → (reserved) → browser inference → semantic verification.
| Parameter | Type | Description |
|---|---|---|
| text | string | Text to translate (required) |
| source_lang | string | Source language (ISO 639-1) |
| target_lang | string | Target language (required) |
curl -X POST https://pauhu.eu/translate \
  -H "Content-Type: application/json" \
  -d '{"text": "General Data Protection Regulation", "source_lang": "en", "target_lang": "fi"}'
{
"source_lang": "en",
"target_lang": "fi",
"text": "General Data Protection Regulation",
"translation": "Yleinen tietosuoja-asetus",
"model_id": "en-fi"
}
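A small sketch of client-side validation before calling the endpoint, based on the parameter table above. The helper name is hypothetical, and treating `target_lang` as a two-letter ISO 639-1 code is an assumption drawn from the example rather than a documented rule.

```python
import json
import re
from typing import Optional

_TWO_LETTER = re.compile(r"^[a-z]{2}$")


def translate_payload(text: str, target_lang: str, source_lang: Optional[str] = None) -> str:
    """Return the JSON request body, raising ValueError on bad input."""
    if not text:
        raise ValueError("text is required")
    if not _TWO_LETTER.match(target_lang):
        raise ValueError(f"target_lang must be a two-letter code, got {target_lang!r}")
    payload = {"text": text, "target_lang": target_lang}
    if source_lang is not None:
        if not _TWO_LETTER.match(source_lang):
            raise ValueError(f"source_lang must be a two-letter code, got {source_lang!r}")
        payload["source_lang"] = source_lang
    # ensure_ascii=False keeps non-ASCII text readable in the body.
    return json.dumps(payload, ensure_ascii=False)


body = translate_payload("General Data Protection Regulation", "fi", source_lang="en")
```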
Full cascade (verbose)
Returns all 6 cascade stages with timing and provenance for each step.
Batch segment translation
Translate an array of segments in a single request.
Supported languages
List all supported language codes and available pairs.
Search API LIVE
Fan-out semantic search across 20 indexes using 1024-dimensional multilingual embeddings. Results are ordered by semantic ranking.
Semantic search
Search across all 20 data source semantic indexes simultaneously.
| Parameter | Type | Description |
|---|---|---|
| q | string | Search query (required) |
| limit | integer | Max results (default: 20) |
| domain | string | EuroVoc domain filter (1-21) |
{
"query": "digital product passport ESPR",
"limit": 20,
"results": [
{"product": "eurlex", "id": "32024R1781", "title": "...", "score": 0.89, "url": "..."},
{"product": "commission", "id": "...", "title": "...", "score": 0.84, "url": "..."}
]
}
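Because results from all indexes arrive in one list, a client may want to regroup them by `product`. The sketch below does that for the sample response above, keeping the elided values as they appear; the function name and the score threshold are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List

# Sample copied from the response above; "..." values are elided in the reference.
SAMPLE = {
    "query": "digital product passport ESPR",
    "limit": 20,
    "results": [
        {"product": "eurlex", "id": "32024R1781", "title": "...", "score": 0.89, "url": "..."},
        {"product": "commission", "id": "...", "title": "...", "score": 0.84, "url": "..."},
    ],
}


def group_by_product(results: List[dict], min_score: float = 0.0) -> Dict[str, List[dict]]:
    """Group hits by product code, dropping hits below min_score."""
    grouped = defaultdict(list)
    for hit in results:
        if hit["score"] >= min_score:
            grouped[hit["product"]].append(hit)
    return dict(grouped)


everything = group_by_product(SAMPLE["results"])
confident = group_by_product(SAMPLE["results"], min_score=0.85)
```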
Instant answers
Knowledge panels from IATE, EUR-Lex, and Wikidata. Returns structured answer snippets.
Web proxy
CORS proxy for institutional search sources. Normalizes results into a common schema.
| Source | Description |
|---|---|
| arxiv | arXiv academic papers |
| eurostat | Eurostat datasets |
| ted | TED procurement notices |
Cross-language siblings
Find all language versions of a EUR-Lex document by CELEX number.
Semantic reranking
Rerank a set of results using cross-encoder scoring.
DLC packs (browser models)
Signed manifest of downloadable model packs with Ed25519 signatures and SHA-256 checksums.
Core DLC pack: optimized models and terminology for browser-native inference.
Delta pack: incremental updates since last core download.
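Checksum verification of a downloaded pack can be sketched with the standard library. The manifest field names are not documented here, so the function takes raw bytes and an expected hex digest. Ed25519 signature verification needs a third-party library and is left out of this sketch.

```python
import hashlib
import hmac


def sha256_matches(data: bytes, expected_hex: str) -> bool:
    """Return True if the SHA-256 digest of data equals expected_hex."""
    actual = hashlib.sha256(data).hexdigest()
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(actual, expected_hex.strip().lower())


# Illustrative bytes standing in for a downloaded pack.
pack = b"example pack bytes"
digest = hashlib.sha256(pack).hexdigest()
```

A client would typically refuse to load a pack for which `sha256_matches` returns `False`.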
Annotation API LIVE
Classifies documents with topic annotations, legal modalities, and language detection. Split across two service instances (A–E and E–W) for the 20 data sources.
Annotate document
Full annotation pipeline: language detection, topic classification, legal modality (obligation/prohibition/permission/exemption), word count, product-specific metadata.
curl -X POST https://annotate.pauhu.eu/annotate \
  -H "Content-Type: application/json" \
  -d '{"text": "Member States shall ensure...", "product": "eurlex"}'
{
"original_path": "...",
"organized_path": "...",
"language": "en",
"legal obligation": {"modality": "obligation", "confidence": 0.95},
"product": "eurlex",
"topic_domain": "law",
"word_count": 847,
"char_count": 5231
}
Batch annotate
Annotate up to 50 documents in a single request.
Legal modality classification only
Lightweight endpoint: returns only legal modality classification.
{
"language": "en",
"annotation": {
"modality": "prohibition",
"confidence": 0.92
}
}
Legal modalities
| Modality | Meaning | Example |
|---|---|---|
| Prohibition | Action is forbidden | "Member States shall not permit..." |
| Obligation | Action is required | "Member States shall ensure..." |
| Permission | Action is allowed | "Member States may designate..." |
| Exemption | No requirement applies | "This Regulation shall not apply to..." |
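On the client side, the four modalities map naturally onto an enum. The sketch below parses the lightweight response shown earlier in this section. The class and function names and the 0.9 confidence cut-off are assumptions for illustration, not part of the API.

```python
from enum import Enum
from typing import Optional


class Modality(Enum):
    PROHIBITION = "prohibition"
    OBLIGATION = "obligation"
    PERMISSION = "permission"
    EXEMPTION = "exemption"


def parse_modality(response: dict, min_confidence: float = 0.9) -> Optional[Modality]:
    """Return the modality, or None when confidence is below the cut-off."""
    annotation = response["annotation"]
    if annotation["confidence"] < min_confidence:
        return None
    return Modality(annotation["modality"])


# Sample copied from the lightweight endpoint response.
sample = {"language": "en", "annotation": {"modality": "prohibition", "confidence": 0.92}}
result = parse_modality(sample)
```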
Service metadata
List all registered data source annotators and their product codes.
Annotation counts from sidecar metadata, grouped by product.
Full provenance audit: total annotated documents, per-product breakdown, provenance tier distribution (NATIVE 1.0, PARSED 0.95, KEYWORD ≤0.9).
Indexing API LIVE
Hybrid semantic + keyword search, alert monitoring, and document health checks across all 20 data sources. Split across two service instances (A–E and E–W).
Health check
Binding smoke test. Returns status of storage, database, vector index, and cache bindings for the service’s product set.
{
"service": "index",
"status": "healthy",
"bindings": {
"STORAGE_COMMISSION": "ok",
"DB_COMMISSION": "ok",
"RECIPE_ALERTS": "bound",
"AI_TOKEN": "set"
},
"timestamp": "2026-03-03T09:00:00Z"
}
Alerts
Query stored regulatory alerts from KV. Filter by recipe name and severity level.
| Parameter | Type | Description |
|---|---|---|
| recipe | string | Recipe name filter (optional, defaults to all) |
| severity | string | Severity filter: critical, high, medium, low (optional) |
| limit | integer | Max results (default: 20) |
{
"count": 5,
"filters": { "recipe": "*", "severity": "*" },
"alerts": [...]
}
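The filters above translate into a query string. The endpoint path is not given in this section, so the sketch only builds the query part; the helper name is hypothetical, and omitting a filter is assumed to mean "all", as the parameter table suggests.

```python
from typing import Optional
from urllib.parse import urlencode

SEVERITIES = ("critical", "high", "medium", "low")


def alerts_query(recipe: Optional[str] = None, severity: Optional[str] = None, limit: int = 20) -> str:
    """Build the query string for an alerts request."""
    params = {}
    if recipe:
        params["recipe"] = recipe
    if severity:
        if severity not in SEVERITIES:
            raise ValueError(f"severity must be one of {SEVERITIES}")
        params["severity"] = severity
    params["limit"] = limit
    return urlencode(params)


query = alerts_query(severity="high")
```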
Hybrid query
Hybrid semantic (70%) + keyword (30%) search within a single product index. Includes DSA Article 27 ranking transparency metadata.
| Parameter | Type | Description |
|---|---|---|
| product | string | Product code, e.g. COMMISSION, EURLEX (required) |
| q | string | Search query (required) |
| lang | string | ISO 639-1 language filter (optional) |
| domain | string | EuroVoc domain ID (optional) |
| limit | integer | Max results (default: 10) |
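The 70/30 weighting can be read as a weighted sum of two scores. The sketch below shows that arithmetic, assuming both component scores are normalised to the same 0 to 1 range, which this reference does not state. The function names and sample scores are illustrative.

```python
from typing import Dict, List, Tuple

SEMANTIC_WEIGHT = 0.7
KEYWORD_WEIGHT = 0.3


def hybrid_score(semantic: float, keyword: float) -> float:
    """Weighted sum: 0.7 * semantic + 0.3 * keyword."""
    return SEMANTIC_WEIGHT * semantic + KEYWORD_WEIGHT * keyword


def rank(scores: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return document ids ordered by descending hybrid score."""
    return sorted(scores, key=lambda doc: hybrid_score(*scores[doc]), reverse=True)


# Illustrative (semantic, keyword) pairs.
docs = {"a": (0.9, 0.2), "b": (0.7, 0.9)}
order = rank(docs)
```

Here document `b` outranks `a` (0.76 against 0.69) even though `a` has the higher semantic score, because the keyword component still carries 30% of the weight.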
Backfill ADMIN
Admin-only endpoint. Index unprocessed sidecars for a product. Useful after initial data seeding or to recover from indexing gaps. Requires infrastructure-level access (not available via public API keys).
| Parameter | Type | Description |
|---|---|---|
| product | string | Product code, e.g. COMMISSION, EURLEX (required) |
| limit | integer | Max documents to index in one batch (default: 2000) |
| prefix | string | Key prefix filter, e.g. en/ for English documents only (optional) |
{
"product": "COMMISSION",
"limit": 2000,
"prefix": "en/",
"indexed": 500,
"errors": 3
}
Cross-language siblings
Find all language versions of a EUR-Lex document. Returns available languages and document paths.
Ranking methodology
DSA Article 27 ranking transparency. Returns algorithm weights (0.7 semantic, 0.3 keyword), manipulation resistance details, and update frequency. Cached for 24 hours.
Statistics
Document counts per product. The secondary index instance includes IATE term counts.
IATE legal modality distribution
Distribution of legal modalities across IATE terminology entries.
IATE cross-lingual translation
Look up a term in one language and retrieve translations across all 24 EU languages via IATE concept IDs.
{
"concept_id": "C12345",
"source_term": "tietosuoja",
"source_language": "fi",
"languages": 24,
"translations": {
"en": [{ "term": "data protection", "reliability": 4 }],
"de": [{ "term": "Datenschutz", "reliability": 4 }]
}
}
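A client that needs one term per language could pick the entry with the highest `reliability`. The sketch below does this for the sample response above; the helper name and the tie-breaking rule (first entry wins) are assumptions.

```python
from typing import Dict

# Sample copied from the response above.
SAMPLE = {
    "concept_id": "C12345",
    "source_term": "tietosuoja",
    "source_language": "fi",
    "languages": 24,
    "translations": {
        "en": [{"term": "data protection", "reliability": 4}],
        "de": [{"term": "Datenschutz", "reliability": 4}],
    },
}


def best_terms(response: dict, min_reliability: int = 0) -> Dict[str, str]:
    """Map each language to its most reliable term at or above min_reliability."""
    best = {}
    for lang, entries in response["translations"].items():
        candidates = [e for e in entries if e["reliability"] >= min_reliability]
        if candidates:
            # max() returns the first entry among equals.
            best[lang] = max(candidates, key=lambda e: e["reliability"])["term"]
    return best


terms = best_terms(SAMPLE)
```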
Model CDN LIVE
Serves optimized models from EU storage (2,342 models). Supports HTTP range requests for large files.
Model manifest with all available models and SHA-256 checksums.
Serve model files with Accept-Ranges: bytes for resumable downloads. CORS enabled for browser-native inference.
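Resuming an interrupted download relies on the standard HTTP `Range` header. The sketch below builds that header from the number of bytes already on disk. The helper name is hypothetical and no model URL is assumed. A server that honours the range answers with 206 Partial Content.

```python
from typing import Dict


def range_header(bytes_on_disk: int) -> Dict[str, str]:
    """Return the headers needed to resume after bytes_on_disk bytes.

    An empty dict means a normal full download.
    """
    if bytes_on_disk < 0:
        raise ValueError("bytes_on_disk must not be negative")
    if bytes_on_disk == 0:
        return {}
    # "bytes=N-" asks for everything from offset N to the end of the file.
    return {"Range": f"bytes={bytes_on_disk}-"}


resume = range_header(1024)
fresh = range_header(0)
```

After the transfer completes, the file can be checked against the SHA-256 value from the model manifest.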
Model categories
| Category | Models | Format |
|---|---|---|
| Translation | 1,440+ language pairs | Optimized |
| Embeddings | Multilingual (1024d) | Optimized |
| Speech-to-text | Speech recognition | Optimized |
| Text-to-speech | TTS models | Optimized |
| Classification | Text classifiers, domain classifiers | Optimized |
| Code | Code models | Optimized |
Data infrastructure
20 EU institutional data sources are ingested into per-product storage, annotated via queue-triggered services, and indexed for semantic search. Each source has matching storage, queue, database, and vector index resources.
Data sources (20)
Pipeline
Documents flow through: Ingestion → EU storage (with metadata) → Event notification → Queue → Annotation service (topic + legal modality classification) → Sidecar JSON. Searchable via semantic ranking across all 20 indexes.
National law databases (27 countries)
27 national law adapters with source database links. Connected to EUR-Lex Sector 7 (290,172 national transposition measures linking EU directives to national implementations).
EuroVoc domains (21)
All annotations use the EU Publications Office EuroVoc thesaurus for domain classification:
| Domains | | |
|---|---|---|
| 04 Politics | 08 Education & Comms | 16 Environment |
| 08 International | 10 Business | 17 Industry |
| 10 EU Institutions | 11 Agriculture | 20 Energy |
| 16 Economics | 12 Law | 24 Production |
| 20 Trade | 14 Geography | 28 Employment |
| 24 Finance | 16 Intl Organisations | 32 Information |
| 28 Social Affairs | 20 Transport | |
EU AI Act transparency
All workers expose an Article 52 transparency endpoint:
{
"ai_system": true,
"provider": "Pauhu Ltd",
"eu_ai_act_article": 52,
"purpose": "...",
"risk_category": "limited",
"jurisdiction": "EU"
}
Access control
Pauhu uses entitlement-based access control, not volume-based rate limiting. Your seat tier determines what data you can access, not how many requests you can make.
Seat tiers
| Tier | Data access | Auth |
|---|---|---|
| Trial | Search, terminology, translation (3 req/day) | IP-based (no key needed) |
| Live | Raw feeds from 20 sources + search + terminology + translation + reranking | API key |
| Annotated | Live + annotated feeds (EuroVoc, legal modality) + Pauhu AI platform | API key |
| Training | Live + bulk export for ML training | API key |
Burst protection
Paying seats have no daily request caps. Burst protection prevents abuse:
- Sustained: 10 requests/second sliding window
- Peak: 50 requests/second absolute maximum
Trial tier: 3 requests/day (IP-based), plus burst protection.
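Clients can avoid tripping burst protection by throttling themselves. The sketch below is a generic sliding-window limiter, not Pauhu's server-side implementation. The class name is hypothetical, and the request budget is a parameter to be set from the limits above.

```python
import time
from collections import deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most max_requests calls within any window_seconds span."""

    def __init__(self, max_requests: int, window_seconds: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max = max_requests
        self._window = window_seconds
        self._clock = clock
        self._stamps = deque()  # timestamps of accepted requests

    def try_acquire(self) -> bool:
        """Return True and record the call if it fits in the window."""
        now = self._clock()
        # Drop timestamps that have left the window.
        while self._stamps and now - self._stamps[0] >= self._window:
            self._stamps.popleft()
        if len(self._stamps) < self._max:
            self._stamps.append(now)
            return True
        return False


# Fake clock so the example is deterministic.
fake_now = [0.0]
limiter = SlidingWindowLimiter(max_requests=2, window_seconds=1.0, clock=lambda: fake_now[0])
first, second, third = limiter.try_acquire(), limiter.try_acquire(), limiter.try_acquire()
```

When `try_acquire` returns `False`, the caller can sleep briefly and retry instead of sending a request that would be rejected.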
Response headers
X-Pauhu-Tier: live
Retry-After: 1 (only if burst limit hit)
Trial tier also receives X-RateLimit-Limit: 3.
Guides
| Guide | Domain | Description |
|---|---|---|
| IATE API Reference | pauhu.eu | Full reference for all 11 terminology endpoints: lookup, search, TBX/TMX export, custom glossaries |
| Recipe Catalog | pauhu.eu | 6 pre-configured monitoring recipes with alert format specification |
| How We Protect Your Data | pauhu.eu | Zone isolation, EU data residency, encryption, access control, audit trails |
| GPU Extensions | pauhu.eu | 6 GPU extension types (LLMs, video, image, real-time video, audio, 3D). Bring your own API keys. |
| Data Source Attributions | pauhu.eu | Licenses, publishers, and modification notices for all 20 data sources |
| Getting Started (Recipe Wizard) | pauhu.ai | Configure your EU regulatory feed in 3 steps |
| Benchmark Guide | pauhu.ai | Interpret browser inference benchmark results |
| Search Guide | pauhu.eu | Query syntax, filters, boolean operators, CELEX lookup, 20 product examples |
| Who is Who Privacy Notice | pauhu.eu | GDPR privacy notice for personal data from the EU Who is Who directory |
| LDS Connector Deployment | pauhu.eu | Deploy and configure the Language Data Space connector |
| LDS Demo Runbook | pauhu.eu | Step-by-step: login, Swagger, certificate upload, data publishing for lds.pauhu.eu |
| Data Catalog | pauhu.eu | 24 data products: source institution, update frequency, record count, languages, license. Machine-readable YAML + /v1/search API reference. |
| Cross-References | pauhu.eu | How EUR-Lex, CURIA, OEIL, TED, and national law link together. Example API responses with linked documents. |
| Data Freshness | pauhu.eu | Sync schedules per product, what “Last updated” means, data currency SLA. |
| Multilingual Search | pauhu.eu | Cross-lingual semantic search. Query in one language, find documents in another. 24 EU languages. |
| Data Pipeline | pauhu.eu | From EU source to grounded answer: ingestion, annotation, paragraph indexing, semantic search, grounded generation. Sovereign deployment data flow. |
| Grounded Generation | pauhu.eu | How retrieval and generation work together to produce grounded answers with citations. Cloud and sovereign deployment modes. |
| AI Transparency (Art. 52) | pauhu.eu | EU AI Act Art. 52 compliance: how Pauhu discloses AI involvement, system classification, user notification, training data sources |
| Pauhu for Government | pauhu.eu | Data sovereignty, GDPR Art. 25/32, NIS2, Traficom compliance, data residency guarantees, procurement compatibility |
| Pauhu julkishallinnolle | pauhu.eu | Overview for government organisations: procurement, legal compliance, terminology, translation |
| API Quickstart (EN) | pauhu.eu | English API quickstart with curl examples, 20 data sources, eForms procurement, translation, security overview |
| eForms-kenttäopas (BT) | pauhu.eu | eForms SDK 1.14 BT field reference for TED procurement data: 40+ Business Terms with Finnish descriptions, CPV codes, API response example |
| Government Procurement Training | pauhu.eu | 6-module training guide for procurement officials: EU law search, TED notices, IATE terminology, cross-references, multilingual search, compliance checklists |
| Demo: Government Procurement | pauhu.eu | Step-by-step walkthrough: search EUR-Lex, translate to Finnish, check national transposition, sovereign deployment |
| Demo: eForms Procurement Search | pauhu.eu | Search TED notices by BT fields, CPV codes, country comparison, monitoring recipes, CSV/JSON export |
| Demo: Pharmaceutical EMA Compliance | pauhu.eu | EMA variation procedures, ECHA substance checks, SmPC translation, CURIA case law monitoring |
| MACC Guide | all | Microsoft Azure Consumption Commitment - hot-swap Azure workloads to Pauhu at identical North Europe EUR rates |
| Changelog | all | Release notes: Document extraction integration, flat-tier pricing, 20 data feeds, browser-native inference |
| Getting Started | pauhu.eu | 7-section guide: signup, first search, filters, products, export, chat, next steps |
| Sovereign AI | pauhu.eu | How Pauhu thinks: dual-hemisphere architecture, browser-native inference, grounded generation |
| Install Sovereign AI | pauhu.eu | 8-container self-hosted deployment guide for air-gapped and on-premises environments |
| Chip-Agnostic Architecture | pauhu.eu | Why Pauhu runs on any device: browser runtime, ARM/x86, browser-native inference |
| Two-Path Pricing | pauhu.eu | Flat-tier data licensing model explained |
| Onboarding Wizard | pauhu.eu | Step-by-step account setup and configuration walkthrough |
| Guide vs. Encyclopedia | pauhu.eu | How Pauhu differs from static reference databases: guided search vs. keyword lookup |
Support
Technical: support@pauhu.eu
Sales: sales@pauhu.eu