Sovereign Deployment
Two specialised engines working together. The search engine comprehends. The answer engine synthesises. Together they answer questions without hallucinating - because every answer is grounded in 4.8 million EU documents.
1. How Pauhu is organised
Pauhu® separates comprehension from synthesis. This is a deliberate engineering decision. We separated these two functions because combining them in a single system is what causes hallucinations. When one AI model both retrieves and generates, it invents things. When two specialised subsystems work together - one that finds evidence and one that writes from that evidence - the output is grounded in fact.
| Search engine: analytical comprehension | Answer engine: grounded synthesis |
|---|---|
| Reads your question | Reads the evidence |
| Searches 4.8M documents | Generates a grounded answer |
| Classifies by topic and domain | Renders in 24 languages |
| Applies regulatory rules | Presents in your browser |
| Guards data integrity | Produces the response |

Controlled data flow: search engine ───► answer engine (only verified evidence crosses).

| Shared layer | Role |
|---|---|
| Index | 4.8M+ documents and 2.4M+ terms, shared by both engines |
| Sync | Automated synchronisation keeps the data fresh |
| Gateway | Single entry point with request validation |
The gateway
A single entry point receives every query, validates it, and routes it to the correct engine. The gateway decides which data sources, language models, and domain specialists are relevant to your question - before either engine does any work.
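Here is a minimal Python sketch of the validate-then-route idea. Every name in it is a hypothetical assumption: `route`, `RoutePlan`, the keyword table, the fallback sources and the length limit. The keyword lookup only stands in for whatever classification the real gateway performs.

```python
from dataclasses import dataclass, field

# The 24 EU official languages (ISO 639-1 codes).
EU_LANGUAGES = {
    "bg", "cs", "da", "de", "el", "en", "es", "et", "fi", "fr", "ga", "hr",
    "hu", "it", "lt", "lv", "mt", "nl", "pl", "pt", "ro", "sk", "sl", "sv",
}
# Hypothetical keyword -> (domain, sources) hints standing in for a classifier.
DOMAIN_HINTS = {
    "procurement": ("procurement", ["TED"]),
    "tender": ("procurement", ["TED"]),
    "medicine": ("pharmaceuticals", ["EMA"]),
    "patent": ("patents", ["EPO"]),
    "chemical": ("environment", ["ECHA"]),
}
DEFAULT_SOURCES = ["EUR-Lex", "CURIA"]  # assumed fallback when nothing matches
MAX_QUERY_CHARS = 2000                  # assumed validation limit


@dataclass
class RoutePlan:
    query: str
    language: str
    domains: list = field(default_factory=list)
    sources: list = field(default_factory=list)


def route(query: str, language: str = "en") -> RoutePlan:
    """Validate the request first, then decide which domains and sources apply."""
    query = query.strip()
    if not query or len(query) > MAX_QUERY_CHARS:
        raise ValueError("query is empty or too long")
    if language not in EU_LANGUAGES:
        raise ValueError(f"unsupported language: {language}")
    plan = RoutePlan(query=query, language=language)
    lowered = query.lower()
    for keyword, (domain, sources) in DOMAIN_HINTS.items():
        if keyword in lowered:
            if domain not in plan.domains:
                plan.domains.append(domain)
            plan.sources += [s for s in sources if s not in plan.sources]
    if not plan.sources:
        plan.sources = list(DEFAULT_SOURCES)
    return plan
```

For example, `route("Which tender rules apply to medicine procurement?", "fi")` plans the domains `["procurement", "pharmaceuticals"]` and the sources `["TED", "EMA"]`. A query that matches no hint falls back to the assumed default sources.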
2. Search engine - comprehension
The search engine is where understanding happens. When you ask a question, this is the side that reads it, determines what you need, and finds the relevant evidence from nearly five million documents.
What it does
- Comprehends your query - understands what you are asking, even across languages, and finds the most relevant passages in milliseconds
- Classifies by domain - automatically determines whether your query relates to law, environment, procurement, pharmaceuticals, patents, or another of the 21 EU policy domains
- Applies regulatory rules - determines what is prohibited, what is mandatory, what is permitted, and what is exempt under the relevant regulation
- Guards data quality - validates sources, checks document integrity, and ensures that only verified institutional data reaches the answer engine
Why it matters
The search engine does four things in sequence: comprehend the question, classify the domain, apply the rules, and guard data integrity. Only after all four steps produce a verified result does the answer engine receive the evidence.
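The strict ordering can be sketched in Python as four functions feeding one gate. Each step is a crude, hypothetical stand-in: word-overlap retrieval, a fixed domain label, a keyword-based obligation tagger and a source allow-list. The only point is that the answer engine receives nothing unless the final integrity step leaves verified passages.

```python
from dataclasses import dataclass

VERIFIED_SOURCES = {"EUR-Lex", "CURIA", "TED"}  # assumed allow-list


@dataclass(frozen=True)
class Passage:
    source: str
    doc_id: str
    text: str


def comprehend(query, corpus):
    """Step 1 (toy retrieval): keep passages sharing a word with the query."""
    words = set(query.lower().split())
    return [p for p in corpus if words & set(p.text.lower().split())]


def classify(query):
    """Step 2: stand-in for a domain classifier; always answers 'law' here."""
    return "law"


def apply_rules(passages):
    """Step 3: naive obligation label per passage, keyed by document id."""
    labels = {}
    for p in passages:
        text = p.text.lower()
        if "shall not" in text or "prohibited" in text:
            labels[p.doc_id] = "prohibited"
        elif "exempt" in text:
            labels[p.doc_id] = "exempt"
        elif "shall" in text:
            labels[p.doc_id] = "mandatory"
        else:
            labels[p.doc_id] = "permitted"
    return labels


def guard_integrity(passages):
    """Step 4: keep only passages from allow-listed institutional sources."""
    return [p for p in passages if p.source in VERIFIED_SOURCES]


def search_pipeline(query, corpus):
    """Run the four steps in order; return None if nothing survives."""
    passages = comprehend(query, corpus)
    domain = classify(query)
    labels = apply_rules(passages)
    verified = guard_integrity(passages)
    if not verified:
        return None  # the answer engine receives nothing to work with
    return {
        "domain": domain,
        "passages": verified,
        "rules": {p.doc_id: labels[p.doc_id] for p in verified},
    }
```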
3. Answer engine - synthesis
The answer engine takes the evidence found by the search engine and produces the output you see. It reads multiple passages simultaneously and writes a single coherent answer - always grounded in the source documents, never fabricated.
What it does
- Fuses multiple sources - reads 3 to 10 retrieved passages at once and generates an answer that draws on all of them, with citations
- Renders in your browser - every response is displayed natively, with no plugins or extensions required
- Speaks 24 languages - answers in any EU official language, with terminology validated against 2.4M+ IATE entries
- Drives the interface - the search, chat, translation, and document analysis views are all rendered by this engine
Why it matters
The answer engine fuses fragments into narrative, renders the layout, and adapts the language to the audience. It can only work with evidence the search engine has already verified - it cannot reach back into raw data or generate from statistical patterns alone.
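One way to picture this constraint is a prompt builder that accepts only passages already flagged as verified and numbers them for citation. This is a minimal sketch. The `Evidence` type, the prompt wording and the refusal behaviour are assumptions for illustration, not Pauhu's internal format.

```python
from dataclasses import dataclass

MAX_PASSAGES = 10  # upper end of the 3-10 passage range described above


@dataclass(frozen=True)
class Evidence:
    doc_id: str
    text: str
    verified: bool  # set by the search engine's integrity checks


def build_grounded_prompt(question, evidence):
    """Number verified passages as [1], [2], ... and refuse if there are none.

    Returns the prompt text and the document ids in citation order.
    """
    usable = [e for e in evidence if e.verified][:MAX_PASSAGES]
    if not usable:
        raise LookupError("no verified evidence; nothing to generate from")
    numbered = [f"[{i}] ({e.doc_id}) {e.text}" for i, e in enumerate(usable, start=1)]
    prompt = (
        "Answer only from the numbered passages and cite them as [n].\n"
        + "\n".join(numbered)
        + f"\nQuestion: {question}"
    )
    return prompt, [e.doc_id for e in usable]
```

Raising an error instead of returning an empty prompt makes the refusal explicit: the generator is never called without evidence.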
4. The data flow between them
A dedicated data flow carries verified evidence from the search engine to the answer engine. This is not a single API call - it is a structured pipeline that ensures:
- Only verified evidence crosses. The search engine's quality checks must pass before data reaches the answer engine. No unverified claims, no unchecked sources.
- Every crossing is auditable. Each passage that moves from comprehension to synthesis is logged with a SHA-256 integrity hash. Your compliance team can reconstruct exactly what evidence informed each answer.
- The pipeline is directional. Evidence flows one way only. The answer engine cannot reach back into raw data - it can only work with what the search engine has validated and delivered.
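The two properties in the list above, one-way flow and a SHA-256 hash for every crossing, can be sketched in Python as follows. The `Crossing` fields, the canonical-JSON hashing and the in-memory log are illustrative assumptions. The source does not specify what exactly is hashed or how the log is stored.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Crossing:
    """One passage handed from the search engine to the answer engine."""
    query_id: str
    doc_id: str
    text: str


def integrity_hash(crossing: Crossing) -> str:
    """SHA-256 over canonical JSON, so identical crossings hash identically."""
    payload = json.dumps(asdict(crossing), sort_keys=True,
                         separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class EvidenceChannel:
    """One-way channel: only send() adds data; the answer side reads copies."""

    def __init__(self):
        self._passages = []
        self._log = []

    def send(self, crossing: Crossing) -> str:
        digest = integrity_hash(crossing)
        self._passages.append(crossing)
        self._log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "query_id": crossing.query_id,
            "doc_id": crossing.doc_id,
            "sha256": digest,
        })
        return digest

    def receive(self, query_id: str) -> tuple:
        """What the answer engine sees: an immutable tuple of crossings."""
        return tuple(c for c in self._passages if c.query_id == query_id)

    def audit_trail(self, query_id: str) -> list:
        return [dict(e) for e in self._log if e["query_id"] == query_id]
```

Python cannot strictly forbid writes back into the channel. Here the one-way rule is enforced by the interface: the answer side only gets tuples of frozen records from `receive`.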
5. How it stays current
EU institutions publish new legislation, court rulings, procurement notices, and regulatory updates continuously. Pauhu's automated synchronisation keeps the data current:
- Daily - all 20 EU institutional sources and 28 national law databases are synced
- Weekly - statistical updates, chemical substance registrations, institutional directory changes
When new data arrives, it flows through the same two-engine pipeline: the search engine indexes, classifies, and validates it. Only then does it become available to the answer engine for synthesis. This means the system never serves an answer based on data it hasn't verified.
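One way to picture "only then does it become available" is a staging store in front of the live index. In this minimal Python sketch, the `Index` class and the `validate` callback are hypothetical. Indexing, classification and validation are collapsed into that one callback.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Index:
    """Two stores: only `live` is visible to the answer engine."""
    live: dict = field(default_factory=dict)
    staging: dict = field(default_factory=dict)

    def ingest(self, doc_id: str, text: str, source: str) -> None:
        """New documents from a sync run land in staging first."""
        self.staging[doc_id] = {"text": text, "source": source}

    def promote(self, validate: Callable[[dict], bool]):
        """Move documents that pass validation to `live`; drop the rest."""
        promoted, rejected = [], []
        for doc_id, doc in list(self.staging.items()):
            if validate(doc):
                self.live[doc_id] = doc
                promoted.append(doc_id)
            else:
                rejected.append(doc_id)
            del self.staging[doc_id]
        return promoted, rejected
```

For example, `idx.promote(lambda d: d["source"] in {"EUR-Lex", "TED"} and d["text"].strip() != "")` publishes only non-empty documents from the listed sources.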
6. The sovereign deployment
Everything described above - both engines, the data flow, the index, the automated sync - can run entirely on your hardware. This is the sovereign deployment: the same system, the same models, the same data, with no cloud dependency and no data leaving your premises.
One sentence for the CTO
A Docker container with 4.8M+ EU documents, 2.4M+ terminology entries, and 21 domain-specialist AI models - dual-engine architecture on a single server, no internet connection required.
What changes in a sovereign deployment
| Capability | Cloud (pauhu.eu) | Sovereign Deployment |
|---|---|---|
| Architecture | Two engines across EU servers | Same two engines on your server |
| Data sources | 20 EU institutional sources | Same 20 sources (snapshot included) |
| Documents | 4.8M+ | Same (delivered with container) |
| Languages | 24 EU official languages | Same 24 languages |
| Terminology | 2.4M+ IATE terms | Same (bundled locally) |
| AI models | 21 domain specialists + chat | Same models (optimized format) |
| Data leaves your network | Queries go to EU servers (Helsinki) | Never |
| Internet required | Yes | No (after deployment) |
| Data freshness | Daily and weekly (automated synchronisation) | Snapshot at delivery; periodic updates via secure transfer |
| Hardware | Managed by Pauhu | Your server (16 GB RAM minimum) |
Hardware requirements
| Requirement | Minimum | Recommended |
|---|---|---|
| CPU | 4 cores (x86_64 or ARM64) | 8+ cores |
| RAM | 16 GB | 32 GB |
| Storage | 100 GB SSD | 250 GB NVMe |
| GPU | Not required (CPU inference) | NVIDIA GPU for faster inference |
| OS | Any Linux with Docker | Ubuntu 22.04 LTS |
| Network | None (air-gap compatible) | LAN only (no internet) |
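Before delivery, you may want to check a target host against the hardware table above. This is a hypothetical preflight script, not something shipped with Pauhu. It uses Linux `sysconf` names to read RAM and checks free space on a path you choose. It allows 10% slack on RAM because the kernel reports slightly less than what is installed. GPU presence is not checked.

```python
import os
import shutil

GIB = 1024 ** 3  # RAM is measured in binary gigabytes
GB = 1000 ** 3   # disk capacity is usually quoted in decimal gigabytes

MINIMUM = {"cpu_cores": 4, "ram_gib": 16, "disk_gb": 100}
RECOMMENDED = {"cpu_cores": 8, "ram_gib": 32, "disk_gb": 250}


def host_resources(path: str = "/") -> dict:
    """Read cores, physical RAM and free disk space (Linux sysconf names)."""
    ram = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    return {
        "cpu_cores": os.cpu_count() or 0,
        "ram_gib": ram / GIB,
        "disk_gb": shutil.disk_usage(path).free / GB,
    }


def shortfalls(resources: dict, targets: dict, ram_slack: float = 0.9) -> list:
    """List the targets not met. RAM gets slack because the kernel reports
    a bit less memory than is physically installed."""
    missing = []
    for key, need in targets.items():
        if key == "ram_gib":
            need *= ram_slack
        if resources[key] < need:
            missing.append(key)
    return missing


def assess(resources: dict):
    """Return ('below minimum' | 'minimum' | 'recommended', shortfalls)."""
    below = shortfalls(resources, MINIMUM)
    if below:
        return "below minimum", below
    if not shortfalls(resources, RECOMMENDED):
        return "recommended", []
    return "minimum", []
```

On the target host, `assess(host_resources("/var/lib/docker"))` checks the disk that holds Docker's default data directory on Linux. Adjust the path if your Docker data root lives elsewhere.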
Delivery and installation
Three delivery methods: encrypted download via SFTP, physical media for classified environments, or push to your private container registry. Installation takes two commands, load and run:

```bash
# Load the image from the delivered archive and start the sovereign deployment
docker load -i pauhu-sovereign.tar.gz
docker run -d --name pauhu-sovereign --restart unless-stopped \
  -e PAUHU_SOVEREIGN=true -p 3000:3000 pauhu/sovereign:latest

# Once the container is up, verify: search for the EU AI Act
curl "http://localhost:3000/v1/search?q=artificial+intelligence+regulation"
```
No configuration files, no API keys, no cloud accounts, no database setup. The container includes both engines, the data flow pipeline, and the complete data snapshot.
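The `curl` check above translates directly into a Python client that uses only the standard library. The response schema is not documented on this page, so this sketch assumes a JSON body and returns whatever `json.loads` produces. The API documentation shipped in the container is the authority on fields and parameters.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:3000"  # port published by the docker run command


def build_search_url(query: str, base_url: str = BASE_URL) -> str:
    """Build a /v1/search URL with the query percent-encoded (spaces as '+')."""
    return f"{base_url.rstrip('/')}/v1/search?{urlencode({'q': query})}"


def search(query: str, base_url: str = BASE_URL, timeout: float = 10.0):
    """Call the local search endpoint; assumes it returns a JSON body."""
    with urlopen(build_search_url(query, base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the container running, `search("artificial intelligence regulation")` sends the same request as the `curl` command.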
7. What data is included
The sovereign deployment ships with a complete snapshot of all 20 EU institutional data sources - the same data that feeds the cloud version's search engine:
| Source | Documents | What it covers |
|---|---|---|
| EUR-Lex | 1.6M+ | EU legislation, case law, preparatory acts, international agreements |
| TED | 1.6M+ | Public procurement notices from all EU member states |
| National Law | 256k+ | National legislation from 28 countries (transposition tracking) |
| OEIL | 203k+ | Legislative Observatory - procedure files, committee reports |
| Consilium | 199k+ | Council of the EU documents, meeting outcomes |
| Publications Office | 172k+ | Official publications, EU bookshop |
| Who is Who | 161k+ | EU institutional directory (organisational charts) |
| Data Europa | 160k+ | EU Open Data Portal (datasets, metadata) |
| CURIA | 144k+ | Court of Justice of the EU (judgments, opinions) |
| Eurostat | 130k+ | Statistical tables, indicators |
| IATE | 2.4M+ terms | Inter-Active Terminology for Europe (24 languages) |
| ECB | 8,400+ | European Central Bank legal framework, opinions |
| CORDIS | 8,900+ | EU research and innovation projects |
| EMA | 5,200+ | European Medicines Agency (EPARs, product information) |
| EPO | 4,900+ | European Patent Office (patent publications) |
| ECHA | 490+ | Chemical substances (REACH, CLP, biocides) |
| DPP | 250+ | Digital Product Passport requirements (ESPR) |
| Commission | 190+ | European Commission press and decisions |
| Europarl | - | European Parliament plenary proceedings |
| Wiki | 5,400+ | Curated EU entity knowledge base |
Total: 4.8M+ documents plus 2.4M+ terminology entries in 24 languages. This is the shared index that both engines draw from.
8. What it guarantees
Once deployed, the sovereign deployment makes no network calls. It does not contact any external server, cloud API, or telemetry service. Verify this with your network monitoring tools.
All queries, search results, translations, and AI responses are processed locally. Both engines run inside the same container on your server.
Every operation - including every crossing from comprehension to synthesis - produces an audit record with a SHA-256 integrity hash, stored in a local database. Your compliance team can inspect the complete history.
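For a compliance review, the local audit store might look like the following minimal Python sketch, built on the standard-library `sqlite3` module. The table layout and function names are assumptions. The source only says that records carry a SHA-256 hash and live in a local database.

```python
import hashlib
import sqlite3


def open_audit_db(path=":memory:"):
    """Open (or create) the audit database and its single table."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS audit (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               answer_id TEXT NOT NULL,
               doc_id TEXT NOT NULL,
               passage TEXT NOT NULL,
               sha256 TEXT NOT NULL)"""
    )
    return db


def record(db, answer_id, doc_id, passage):
    """Store one evidence passage with its SHA-256 digest."""
    digest = hashlib.sha256(passage.encode("utf-8")).hexdigest()
    with db:  # commits the insert, or rolls it back on error
        db.execute(
            "INSERT INTO audit (answer_id, doc_id, passage, sha256) VALUES (?, ?, ?, ?)",
            (answer_id, doc_id, passage, digest),
        )
    return digest


def evidence_for(db, answer_id):
    """Reconstruct the evidence behind one answer and re-check each hash."""
    rows = db.execute(
        "SELECT doc_id, passage, sha256 FROM audit WHERE answer_id = ? ORDER BY id",
        (answer_id,),
    ).fetchall()
    return [
        (doc_id, hashlib.sha256(passage.encode("utf-8")).hexdigest() == stored)
        for doc_id, passage, stored in rows
    ]
```

A plain SHA-256 digest catches accidental or naive edits, which is what the re-check in `evidence_for` does. Anyone who can rewrite a passage can also rewrite its stored hash, though. Resisting a deliberate attacker needs a keyed hash, a digital signature or hash chaining on top.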
Works in classified environments and air-gapped networks. The container is delivered via physical media or secure file transfer. No internet required for installation or operation.
9. Supply chain sovereignty
Most AI systems depend on a chain of external providers: cloud compute, proprietary APIs, third-party model hosting, and centralised inference services. Remove any link in the chain and the system stops working. This is a single point of failure - or multiple single points of failure.
Your AI runs in your browser
No cloud provider. No government. No single point of failure. The models run in your browser or on your server. The data sits on your storage. The inference happens on your hardware. You own the entire chain from question to answer.
What supply chain sovereignty means
- No API dependency: Pauhu does not call external AI APIs. The models are optimized files that execute locally - in the browser via browser-native processing, or on the server via a lightweight runtime. If every cloud provider went offline simultaneously, Pauhu would still work.
- No model hosting dependency: The models ship with the container or are downloaded once to the browser cache. No ongoing model-as-a-service subscription. No inference-per-token billing.
- No data dependency: The 4.8 million EU documents are included. You do not need to query an external database. The data is yours, on your volume.
- No vendor lock-in: Open model format (ISO/IEC 17203). Docker containers are OCI-compliant. The REST API follows OpenAPI 3.1. Every component uses open standards.
The geopolitical dimension
Government agencies increasingly recognise that depending on foreign-controlled AI infrastructure creates a strategic vulnerability. Executive orders, sanctions, licensing changes, or corporate acquisitions can cut off access to critical AI services overnight. Pauhu's sovereign deployment eliminates this risk: the entire system - models, data, inference - is under your control, on your soil, subject to your laws.
10. Adaptive model loading
Pauhu adapts to the hardware it runs on. Not every deployment has a GPU server with 32 GB of RAM. A civil servant’s laptop, a ministry’s standard-issue workstation, a dedicated inference server - the same architecture works on all of them, at different performance levels.
Three tiers
| Available memory | Models loaded | What you get |
|---|---|---|
| < 4 GB | Search + multilingual embeddings only, quantized (~80 MB) | Paragraph retrieval in the browser. No generation. |
| 4–16 GB | Search + grounded generation (~300 MB) | Grounded answers with citations. Selected NMT translation pairs. |
| > 16 GB | All models: search, grounded generation, 552 NMT pairs, 21 domain classifiers, NER, specialists | Complete capability. |
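The tier choice reduces to a small lookup. In this hedged Python sketch, the boundary handling and the model labels are assumptions: exactly 4 GB and exactly 16 GB both map to the middle tier. How Pauhu actually measures available memory, for example inside a browser, is not described here.

```python
# Tier contents follow the three tiers above; the model names are
# illustrative labels, not Pauhu's actual file names.
TIER_MODELS = {
    "retrieval": ["search", "multilingual-embeddings"],
    "grounded": ["search", "multilingual-embeddings", "grounded-generation",
                 "nmt-selected-pairs"],
    "full": ["search", "multilingual-embeddings", "grounded-generation",
             "nmt-552-pairs", "domain-classifiers-21", "ner", "specialists"],
}


def select_tier(available_gib: float) -> str:
    """Map available memory to a tier; 4 and 16 GB count as the middle tier."""
    if available_gib < 4:
        return "retrieval"
    if available_gib <= 16:
        return "grounded"
    return "full"


def models_to_load(available_gib: float) -> list:
    return TIER_MODELS[select_tier(available_gib)]
```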
Why 300 MB matters
The global semiconductor supply chain is under sustained pressure. Memory prices fluctuate, procurement cycles lengthen, and government IT budgets rarely include high-end GPU servers. Pauhu’s grounded generation model fits in 300 MB of DRAM - less than a typical browser tab. This is not a limitation; it is a design decision. A model that fits in commodity hardware is a model that every government agency can deploy without special procurement.
Progressive download
Models are loaded in priority order, not all at once:
- Search models first - paragraph retrieval is available within seconds of start
- Grounded generation second - grounded answers become available in 10–30 seconds
- Translation on demand - only the language pairs you use are loaded. Finnish-English loads on first Finnish query, not at startup
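The priority order can be sketched as a loader that yields each core model as soon as it is ready and fetches translation pairs only on first use. The `ModelLoader` class, its `fetch` callback and the model names are hypothetical. The sketch also loads synchronously, whereas a real client would download in the background.

```python
from typing import Any, Callable, Dict, Iterator


class ModelLoader:
    """Loads core models in priority order; translation pairs load lazily."""

    # Assumed priority order: retrieval first, then grounded generation.
    STARTUP_ORDER = ("search", "grounded-generation")

    def __init__(self, fetch: Callable[[str], Any]):
        self._fetch = fetch              # downloads or opens a model by name
        self.loaded: Dict[str, Any] = {}

    def start(self) -> Iterator[str]:
        """Yield each model name as soon as it is ready, so the caller can
        switch features on progressively (search before answers)."""
        for name in self.STARTUP_ORDER:
            self.loaded[name] = self._fetch(name)
            yield name

    def translator(self, source: str, target: str) -> Any:
        """Fetch a translation pair on first use and cache it afterwards."""
        name = f"nmt-{source}-{target}"
        if name not in self.loaded:
            self.loaded[name] = self._fetch(name)
        return self.loaded[name]
```

A caller would iterate `for ready in loader.start(): ...` and enable retrieval as soon as `"search"` is yielded. The Finnish-English pair is fetched only when `loader.translator("fi", "en")` is first called.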
11. Double Anti-Hallucination
Most AI systems rely on a single layer of defence against hallucination: either they check the output after generation, or they constrain the input. Pauhu uses both - simultaneously.
The answer engine can only generate text from passages that the search engine has retrieved and verified. If the evidence does not exist in the corpus, the answer cannot be generated. This is architectural - it is not a filter applied after the fact.
Inside the answer model itself, pathways associated with ungrounded output are identified during training and suppressed during inference. The model is prevented from activating the patterns that produce hallucinated text.
The result: every claim in a Pauhu answer traces back to a specific paragraph in a verified EU document. If the system cannot ground a statement, it says so - rather than inventing a plausible-sounding answer.
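Auditors may want to test the traceability claim on actual output. The following Python sketch is a hypothetical after-the-fact check, not the architectural mechanism described above. It assumes answers cite passages as `[n]` before each sentence's final punctuation. It rejects a draft if any sentence lacks a citation or cites a passage that was never retrieved.

```python
import re

CITATION = re.compile(r"\[(\d+)\]")
REFUSAL = "I cannot ground an answer to this in the indexed documents."


def check_grounding(draft: str, num_passages: int) -> str:
    """Return the draft if every sentence cites a retrieved passage, else refuse."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", draft.strip()) if s]
    if num_passages == 0 or not sentences:
        return REFUSAL
    for sentence in sentences:
        cited = [int(n) for n in CITATION.findall(sentence)]
        if not cited or any(n < 1 or n > num_passages for n in cited):
            return REFUSAL
    return draft
```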
12. For procurement officers
Why government agencies choose sovereign deployment
- Data sovereignty: Your queries and results never leave your premises. No cloud processing, no data transfer to third countries, no dependency on foreign infrastructure.
- Classified environments: Works in air-gapped networks, SCIFs, and restricted environments where internet access is not available or permitted.
- GDPR Article 44: No third-country data transfers. All processing happens within your jurisdiction.
- EU AI Act Article 53: Full training data transparency. Every model includes a published summary of its training data. See the AI transparency disclosure.
- Grounded by design: The dual-engine architecture ensures every answer is traceable to verified EU institutional documents. The system cannot hallucinate because the answer engine only works with evidence the search engine has validated.
- No vendor lock-in: Standard Docker container, standard REST API, open-format models. If you stop using Pauhu, your data and audit trail remain on your hardware.
Tender-ready specifications
For public procurement (CPV code 72000000 - IT services):
- On-premises deployment with zero cloud dependency
- EU-origin software (Pauhu Ltd, Helsinki, Finland, Y-tunnus 3425757-6)
- All training data sourced from EU institutional open data
- Zone-based security architecture
- WCAG 2.1 AA compliant web interface
- REST API with OpenAPI specification
- Docker container (OCI-compliant)
- Open model format (ISO/IEC 17203-compliant runtime)
- 24 EU official languages supported
Contract model
| Item | What you get |
|---|---|
| Initial delivery | Sovereign deployment container with both engines, all 4.8M documents, 2.4M terminology entries, 21 AI models, and translation models for 24 languages |
| Data updates | Monthly or quarterly data snapshots delivered via your preferred secure channel |
| Model updates | Updated optimized models when improved versions are available (included in subscription) |
| Support | Helsinki-based technical support team. On-site deployment assistance available for EU government customers. |
| SLA | Custom SLAs available. Because the system runs on your hardware, uptime is under your control. |
Contact for government sales
Email: sales@pauhu.eu
For a demo, see the government procurement walkthrough (10-step guide using Finnish government as an example).
13. Frequently asked questions
Why two engines instead of one AI model?
Single-model systems generate text from statistical patterns. They can produce fluent, confident answers that are completely wrong. By separating comprehension from synthesis into two specialised engines, we ensure that the generation side can only work with evidence the comprehension side has verified. The result: grounded answers with citations, not plausible-sounding fabrications.
Does it really work offline?
Yes. After installation, you can disconnect the server from the network entirely. Both engines, the data flow between them, and all 4.8 million documents run locally. We encourage you to verify this with your network monitoring tools.
How fresh is the data?
The cloud version receives daily updates. The sovereign deployment contains a snapshot at the time of delivery. Data updates are delivered periodically via secure transfer - typically monthly or quarterly. The update process is a single command.
What hardware do we need?
A standard server with 16 GB RAM and 100 GB storage. No GPU required - all models run on CPU. A GPU speeds up inference but is not necessary. See Section 6 for full requirements.
Can we run it in a VM?
Yes. Docker runs in any common virtualisation environment (VMware, Hyper-V, KVM) as well as on bare metal. The container has no hardware-specific dependencies.
Is the source code available?
The sovereign deployment is provided as a container image. Source code review is available under NDA for government customers. Contact sales@pauhu.eu.
Can we integrate it with our existing systems?
The sovereign deployment exposes a standard REST API. Any system that can make HTTP requests can use it. API documentation is included in the container.
What is the licensing model?
Annual subscription per deployment. Volume discounts available for multiple installations. Contact sales@pauhu.eu for pricing.
Related documentation
- Grounded Generation Architecture - technical deep-dive for your engineering team
- The Guide vs. the Encyclopedia - why Pauhu exists
- Government Procurement Demo - 10-step walkthrough using Finnish government procurement
- MCP Sovereign Mode - developer reference for IDE integration
- Compass Search - how the 20 data sources are indexed