Install Pauhu Sovereign AI

Eight containers. One system. Each container handles a specific function - from search to speech to answer generation. This guide takes you from a bare server to a fully operational sovereign AI.

1. Prerequisites

Hardware

Profile | CPU | RAM | Disk | Containers
Minimum | 4 cores (x86_64 or ARM64) | 16 GB | 100 GB SSD | 5 core containers
Recommended | 8 cores | 32 GB | 250 GB NVMe | All 8 containers
Full + GPU | 8+ cores + NVIDIA GPU | 64 GB | 500 GB NVMe | All 8 + GPU inference

No GPU is required. All models are optimized for CPU inference. A GPU accelerates answer generation but is not needed for search, translation, classification, or voice.
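To see which row of the table a server falls into before installing, a small preflight check can help. This is a sketch that assumes a Linux host with GNU coreutils (`nproc`, `df --output`) and `/proc/meminfo`. The function names are illustrative, the thresholds are copied from the table, and the Full + GPU row is not checked because GPU detection is out of scope here. It measures free disk space, not total disk size.

```shell
#!/usr/bin/env bash
# Preflight sketch: compare a host against the Minimum and Recommended rows
# of the hardware table. hw_profile and host_profile are illustrative names.

# hw_profile CORES RAM_GB DISK_GB -> prints the highest profile the numbers satisfy
hw_profile() {
  local cores=$1 ram_gb=$2 disk_gb=$3
  if (( cores >= 8 && ram_gb >= 32 && disk_gb >= 250 )); then
    echo "recommended"
  elif (( cores >= 4 && ram_gb >= 16 && disk_gb >= 100 )); then
    echo "minimum"
  else
    echo "below-minimum"
  fi
}

# host_profile [PATH] -> reads this host's values (Linux only) and classifies them.
# Disk is the *free* space on the filesystem that holds PATH (default: /).
host_profile() {
  local path=${1:-/} cores mem_kb ram_gb disk_gb
  cores=$(nproc)
  mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
  # MemTotal is in kB and sits slightly below installed RAM, so round up to whole GiB.
  ram_gb=$(( (mem_kb + 1048575) / 1048576 ))
  disk_gb=$(df --output=avail -BG "$path" | tail -n 1 | tr -dc '0-9')
  echo "cores=$cores ram=${ram_gb}G free_disk=${disk_gb}G profile=$(hw_profile "$cores" "$ram_gb" "$disk_gb")"
}
```

Run `host_profile /opt` on the target server before step 1 of the Quickstart.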

Software

License agreement required. Use of the Pauhu Sovereign AI is governed by the Pauhu LDS Connector End User License Agreement. By deploying these containers you agree to the EULA terms. The containers are provided as binary images - source code is not included and reverse engineering is prohibited (EULA §3.2). Underlying EU institutional data is licensed under CC-BY 4.0 per Data Terms.
Air-gapped deployment: No internet is required after installation. The container images are delivered via SFTP, physical media, or your private container registry. See Sovereign AI §6 for delivery methods.

What is NOT required

2. Quickstart

From a clean server to a running system in four commands. Allow 15 minutes on recommended hardware (10 minutes on NVMe with pre-bundled delivery).

# 1. Load the container images (from SFTP download or USB delivery)
docker load -i pauhu-sovereign-ai-v1.tar.gz

# 2. Create the data directory
mkdir -p /opt/pauhu/data

# 3. Start the system
docker compose -f docker-compose1.yaml -f docker-compose-pauhu.yaml \
  --profile production --profile pauhu up -d

# 4. Verify the containers are healthy (pauhu-llm-adapter runs only with the sovereign-llm profile)
docker compose -f docker-compose1.yaml -f docker-compose-pauhu.yaml ps

Expected output after step 4 (columns abridged):

NAME               STATUS    PORTS
pauhu-compass      healthy   8060/tcp
pauhu-answer       healthy   8050/tcp
pauhu-nmt          healthy   8080/tcp
pauhu-specialist   healthy   8070/tcp
pauhu-tts          healthy   8000/tcp
pauhu-gateway      healthy   8090/tcp
pauhu-mcp          healthy   3100/tcp
pauhu-llm-adapter  healthy   8001/tcp   (only with --profile sovereign-llm)

Open http://<your-server>:8090 to access the search interface. Navigate to /pauhu for the admin panel.

First-start index build: On first launch, the search index takes 5–10 minutes to build. During this time, search queries may return empty results. Translation, TTS, and classification are available immediately.
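Because the search index is still building during the first minutes, scripts that run right after `up -d` should wait for Docker's health status rather than sleep for a fixed time. This sketch polls `docker inspect --format '{{.State.Health.Status}}'`, which only reports a value for containers that define a health check (the STATUS column above suggests these do). The function names and the `HEALTH_PROBE` override, which exists so the loop can be exercised without Docker, are hypothetical.

```shell
#!/usr/bin/env bash
# Poll Docker's health status for one container until it is "healthy" or time runs out.

# health_status NAME -> prints Docker's health status, or "unknown" if it cannot be read
health_status() {
  docker inspect --format '{{.State.Health.Status}}' "$1" 2>/dev/null || echo "unknown"
}

# wait_for_healthy NAME [TIMEOUT_S] [INTERVAL_S]
# HEALTH_PROBE may name another command to use instead of health_status (e.g. in tests).
wait_for_healthy() {
  local name=$1 timeout=${2:-900} interval=${3:-10}
  local waited=0 status=unknown
  (( interval > 0 )) || interval=1           # avoid a busy loop
  while (( waited <= timeout )); do
    status=$("${HEALTH_PROBE:-health_status}" "$name")
    if [[ $status == healthy ]]; then
      echo "$name is healthy after ${waited}s"
      return 0
    fi
    sleep "$interval"
    waited=$(( waited + interval ))
  done
  echo "$name is not healthy after ${timeout}s (last status: $status)" >&2
  return 1
}
```

For example, `wait_for_healthy pauhu-compass 900 10` blocks for up to 15 minutes, in line with the time allowance given at the start of the Quickstart.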

Quick verification

# Search for EU AI Act (via nginx reverse proxy)
curl -s "http://localhost/api/compass?q=artificial+intelligence+regulation" | head -20

# Translate to Finnish
curl -s -X POST http://localhost/api/translate \
  -H "Content-Type: application/json" \
  -d '{"text": "The regulation enters into force", "target": "fi"}'

# Ask a question (grounded answer)
curl -s -X POST http://localhost/api/answer \
  -H "Content-Type: application/json" \
  -d '{"message": "Does the EU AI Act apply to procurement systems?"}'

# Check container health
curl -s http://localhost/health/compass
curl -s http://localhost/health/answer
curl -s http://localhost/health/translate
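The three health checks can be folded into one pass/fail smoke test to run after restarts or upgrades. This is a sketch that assumes each health endpoint answers with an HTTP 2xx status when its service is up; it does not rely on the response body. The function name is illustrative.

```shell
#!/usr/bin/env bash
# Smoke test: call each health endpoint through the reverse proxy and
# report PASS/FAIL per service. Returns the number of failed checks.
smoke_test() {
  local base=${1:-http://localhost} failed=0 svc code
  for svc in compass answer translate; do
    # -w prints only the HTTP status code; curl prints 000 if it cannot connect.
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$base/health/$svc")
    if [[ $code == 2?? ]]; then
      echo "PASS $svc (HTTP $code)"
    else
      echo "FAIL $svc (HTTP ${code:-none})"
      failed=$(( failed + 1 ))
    fi
  done
  return "$failed"
}
```

Usage: `smoke_test || echo "$? check(s) failed"`.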

3. Architecture diagram

Each of the 8 containers handles a specific function. The diagram below shows how they connect.


  RETRIEVAL (comprehension)                GENERATION (synthesis)
  ┌─────────────────────────┐              ┌─────────────────────────┐
  │                         │              │                         │
  │  pauhu-compass          │              │  pauhu-answer           │
  │  (search engine)        │              │  (answer generation)    │
  │  Semantic search,       │              │  Model-agnostic,        │
  │  EU documents in 26ms   │              │  grounded answers       │
  │                         │              │                         │
  │  pauhu-specialist       │              │  pauhu-nmt              │
  │  (classification)       │              │  (translation)          │
  │  Domain classifiers     │              │  552 language pairs     │
  │                         │              │                         │
  │                         │              │  pauhu-tts              │
  │                         │              │  (speech)               │
  │                         │              │  Voice in 24 languages  │
  └────────────┬────────────┘              └────────────┬────────────┘
               │                                        │
               └──────────┐    ┌────────────────────────┘
                          │    │
                    ══════════════════
                    ║ pauhu-gateway  ║
                    ║   API GATEWAY  ║
                    ║ (request relay)║
                    ══════════════════
                          │
          ┌───────┐       │       ┌──────────┐
          │Docker │       │       │ pauhu-mcp│
          │volumes│  DATA STORE   │ (context)│
          │docs + │       │       │ IATE +   │
          │terms  │       │       │ EUR-Lex  │
          └───────┘       │       └──────────┘
                 ┌────────┴────────┐
                 │  VALIDATION     │
                 │  Safety checks  │
                 │  and sequencing │
                 │                 │
                 └────────┬────────┘
                          │
                 ┌────────┴────────┐
                 │  RUNTIME        │
                 │  Docker runtime │
                 └─────────────────┘

The 8 containers

pauhu-compass
Search engine

Semantic search across EU documents from 24 sources. Returns the exact paragraph in 26 milliseconds. Port 8060

pauhu-answer
Answer generation

Retrieval-augmented answer generation. Model-agnostic - swap the model, keep the grounding. Reads 3–10 retrieved passages and produces a grounded answer with citations. Port 8050

pauhu-nmt
Translation

Translation models in optimized format. 552 language pairs across all 24 EU official languages. CPU-only, no external API calls. Port 8080

pauhu-specialist
Domain classification

Domain specialist models for named entity recognition, regulatory classification, and compliance detection. Port 8070

pauhu-tts
Speech production

Text-to-speech engine in optimized format, covering all 24 EU official languages. Reads legislation aloud for accessibility compliance. Port 8000

pauhu-gateway
API gateway

Single entry point that routes requests to all services. Validation and safety checks on every request. SHA-256 audit trail. Admin panel at /pauhu. Port 8090

pauhu-mcp
Context server

MCP server with IATE (2.4M terms), EUR-Lex context, eForms BT fields. Powers the pauhu.ai VS Code extension and terminal CLI. Port 3100

pauhu-llm-adapter
Optional - sovereign LLM bridge

Model-agnostic bridge. Connect any LLM - bring your own weights, your own API, your own choice. We provide the grounded context. Port 8001

Container resources

Container | RAM | Disk | CPU | Required?
pauhu-compass | 2–4 GB | 5 GB | 2 cores | Yes
pauhu-answer | 2–6 GB | 2 GB | 2 cores | Yes
pauhu-gateway | 0.5 GB | 0.1 GB | 1 core | Yes
pauhu-specialist | 1–2 GB | 5 GB | 1 core | Yes
pauhu-mcp | 0.25 GB | 0.1 GB | 0.5 core | Yes
pauhu-nmt | 2–6 GB | 15 GB | 1 core | Recommended
pauhu-tts | 0.5–2 GB | 3 GB | 1 core | Optional
pauhu-llm-adapter | 2–8 GB | varies | 1+ cores | Optional

Minimum viable deployment: the five core containers (compass, answer, gateway, specialist, and mcp) run on an 8-core, 16 GB server. Add NMT for translation, TTS for voice output, and llm-adapter for a sovereign LLM as needed. Running all 8 containers requires 32 GB.
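To size a host for a particular subset of containers, you can add up the rows of the table. The sketch below hard-codes the table's figures, takes the low and high end of each RAM range, and sums them with `awk`. pauhu-llm-adapter is left out because its disk use varies. The function name is illustrative.

```shell
#!/usr/bin/env bash
# Sum the per-container figures from the "Container resources" table.
# Columns: name, CPU cores, RAM low (GB), RAM high (GB), disk (GB).
resources() {
  awk -v want="$*" '
    BEGIN { n = split(want, w, " "); for (i = 1; i <= n; i++) sel[w[i]] = 1 }
    ($1 in sel) { cpu += $2; ram_lo += $3; ram_hi += $4; disk += $5 }
    END { printf "cpu=%.1f cores ram=%.2f-%.2f GB disk=%.1f GB\n", cpu, ram_lo, ram_hi, disk }
  ' <<'EOF'
pauhu-compass     2    2     4     5
pauhu-answer      2    2     6     2
pauhu-gateway     1    0.5   0.5   0.1
pauhu-specialist  1    1     2     5
pauhu-mcp         0.5  0.25  0.25  0.1
pauhu-nmt         1    2     6     15
pauhu-tts         1    0.5   2     3
EOF
}
```

For the five core containers, `resources pauhu-compass pauhu-answer pauhu-gateway pauhu-specialist pauhu-mcp` prints `cpu=6.5 cores ram=5.75-12.75 GB disk=12.2 GB`; adding pauhu-nmt and pauhu-tts gives `cpu=8.5 cores ram=8.25-20.75 GB disk=30.2 GB`. These totals cover the containers only, not the host OS or the document data, so compare them with the hardware profiles in Section 1.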

4. Model swap guide

All AI models are stored as optimized model files in the data volume. You can swap any model without rebuilding containers.

Where models live

/opt/pauhu/data/models/
├── answer/                  # Answer generation model
│   ├── encoder.model        # Encoder, INT8 quantized
│   ├── decoder.model        # Decoder with cross-attention
│   ├── tokenizer.model      # SentencePiece tokenizer
│   └── manifest.json        # SHA-256 checksums
├── specialist/              # Domain classifiers
│   ├── law.model
│   ├── environment.model
│   ├── procurement.model
│   ├── ... (18 more)
│   └── manifest.json
├── nmt/                     # Translation models (552 pairs)
│   ├── en-fi.model
│   ├── fi-en.model
│   ├── en-de.model
│   ├── ... (549 more)
│   └── manifest.json
└── tts/                     # Voice models (24 languages)
    ├── en.model
    ├── fi.model
    ├── ... (22 more)
    └── manifest.json
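The commands in this section and the sections that follow use the short `docker compose <command>` form. Run without `-f`, Compose looks for a default file such as `compose.yaml` or `docker-compose.yml` in the current directory, so unless one of those exists it will not pick up the two files used in the Quickstart. One way to make the short form resolve to the same project is Compose's standard `COMPOSE_FILE` and `COMPOSE_PROFILES` environment variables, set in the directory that holds both files.

```shell
#!/usr/bin/env bash
# Point the short "docker compose <command>" form at the Quickstart's files and profiles.
# COMPOSE_FILE uses ":" as the separator on Linux and macOS (";" on Windows).
export COMPOSE_FILE="docker-compose1.yaml:docker-compose-pauhu.yaml"
export COMPOSE_PROFILES="production,pauhu"      # append ",sovereign-llm" if you use it

# With these set, for example:
#   docker compose ps
#   docker compose stop pauhu-answer
```

Put the two `export` lines in the administrator's shell profile to keep them across sessions.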

Swap a model

# 1. Stop the container that uses the model
docker compose stop pauhu-answer

# 2. Replace the model file
cp /path/to/new/encoder.model /opt/pauhu/data/models/answer/encoder.model
cp /path/to/new/decoder.model /opt/pauhu/data/models/answer/decoder.model

# 3. Update the manifest with new checksums
sha256sum /opt/pauhu/data/models/answer/*.model > /opt/pauhu/data/models/answer/manifest.json

# 4. Restart the container
docker compose start pauhu-answer

# 5. Verify the new model loads
curl -s http://localhost/health/answer | python3 -m json.tool
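Before step 4 you can check the copied files against the manifest yourself, so a bad copy shows up before the container refuses to load it. Step 3 writes the manifest in `sha256sum` output format, which GNU `sha256sum --check` can read back. This sketch relies only on that format; the function name is illustrative.

```shell
#!/usr/bin/env bash
# verify_manifest MANIFEST: re-hash every file listed in a sha256sum-style manifest.
# Prints OK, or lists mismatched/missing files and returns non-zero.
verify_manifest() {
  local manifest=$1
  # --quiet suppresses the per-file "OK" lines; failures are still reported.
  if sha256sum --check --quiet "$manifest"; then
    echo "OK: every file in $manifest matches"
  else
    echo "FAILED: re-copy the files listed above, then regenerate $manifest" >&2
    return 1
  fi
}
```

Usage: `verify_manifest /opt/pauhu/data/models/answer/manifest.json && docker compose start pauhu-answer`.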

Swap a domain specialist

# Replace only the law domain model with a retrained version
docker compose stop pauhu-specialist
cp /path/to/law-v2.model /opt/pauhu/data/models/specialist/law.model
# Regenerate the whole manifest; appending (>>) would leave the old law.model checksum in place
sha256sum /opt/pauhu/data/models/specialist/*.model > /opt/pauhu/data/models/specialist/manifest.json
docker compose start pauhu-specialist

Add a new translation pair

# Add Irish (ga) ↔ English (en) models
cp ga-en.model /opt/pauhu/data/models/nmt/ga-en.model
cp en-ga.model /opt/pauhu/data/models/nmt/en-ga.model
sha256sum /opt/pauhu/data/models/nmt/*.model > /opt/pauhu/data/models/nmt/manifest.json
docker compose restart pauhu-nmt

Integrity check: On startup, each container verifies the SHA-256 checksums in its manifest. If a model file does not match its manifest entry, the container will log a warning and refuse to load that model. Always update the manifest after replacing a model file.
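The 552 pairs are every ordered combination of the 24 official languages (24 × 23 = 552), so each model normally has a reverse-direction partner. This sketch lists models in the NMT directory whose reverse direction is missing and counts the models present. It assumes the `<source>-<target>.model` naming shown in the directory tree; the function names are illustrative.

```shell
#!/usr/bin/env bash
# missing_reverse_pairs DIR: print each <src>-<tgt>.model that has no <tgt>-<src>.model.
missing_reverse_pairs() {
  local dir=$1 f pair src tgt
  for f in "$dir"/*-*.model; do
    [[ -e $f ]] || continue          # glob matched nothing
    pair=$(basename "$f" .model)     # e.g. en-ga
    src=${pair%%-*}
    tgt=${pair#*-}
    [[ -e "$dir/$tgt-$src.model" ]] || echo "$pair (no $tgt-$src.model)"
  done
}

# count_pairs DIR: number of translation models present (552 for the full set).
count_pairs() {
  local -a models=("$1"/*-*.model)
  [[ -e ${models[0]} ]] || { echo 0; return; }
  echo "${#models[@]}"
}
```

Usage: `missing_reverse_pairs /opt/pauhu/data/models/nmt` prints nothing when every pair is complete.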

Bring your own LLM

The answer generation container is model-agnostic: the retrieval-grounding-citation pattern is the product, and the model is swappable. For an additional sovereign LLM, enable the pauhu-llm-adapter container:

# Enable the sovereign LLM profile
docker compose -f docker-compose1.yaml -f docker-compose-pauhu.yaml \
  --profile production --profile pauhu --profile sovereign-llm up -d

# Configure in .env:
MODEL_PROVIDER=local              # or: openai-compatible
MODEL_NAME=your-model-name
MODEL_PATH=/models/your-model    # volume-mounted

The gateway routes LLM requests through the same evidence bridge: the LLM receives only verified passages from the compass container, not raw user input. Three adapter patterns are supported: an OpenAI-compatible API, a local model, or edge inference.
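Editing `.env` by hand works, but when the same keys are changed from scripts (for example `MODEL_NAME` here, or the tuning variables in Section 8), a helper that replaces a key's line or appends it keeps the file free of duplicates. This is an illustrative sketch: it does not handle `export KEY=` lines, and because it passes the value through `awk -v`, backslashes in values are interpreted as escapes.

```shell
#!/usr/bin/env bash
# set_env_var FILE KEY VALUE: set KEY=VALUE in a .env file.
# Replaces the first KEY= line, drops any later duplicates, or appends if absent.
set_env_var() {
  local file=$1 key=$2 value=$3 tmp
  tmp=$(mktemp "$file.XXXXXX") || return 1   # same directory, so mv is atomic
  if [[ -f $file ]]; then
    awk -v k="$key" -v line="$key=$value" '
      index($0, k "=") == 1 { if (!done) print line; done = 1; next }
      { print }
      END { if (!done) print line }
    ' "$file" > "$tmp"
  else
    printf '%s=%s\n' "$key" "$value" > "$tmp"
  fi
  mv "$tmp" "$file"   # the result keeps mktemp's 0600 permissions
}
```

Usage: `set_env_var /opt/pauhu/.env MODEL_PROVIDER openai-compatible`, then re-run the `up -d` command with the sovereign-llm profile so Compose recreates the containers whose configuration changed.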

5. Admin panel

The admin panel (http://<your-server>/pauhu) is served by pauhu-gateway and provides a web interface for managing your Pauhu installation.

Dashboard

The main dashboard shows real-time status of all 8 containers:

Query logs

Every query is logged locally with a SHA-256 audit hash. The admin panel lets you:

Container management

Access control

The admin panel requires authentication. On first launch, it generates a random admin password and prints it to the container logs:

# View the initial admin password
docker compose logs pauhu-gateway | grep "Admin password"

# Change the admin password
curl -X POST http://localhost/pauhu/api/admin/password \
  -H "Authorization: Bearer <current-password>" \
  -H "Content-Type: application/json" \
  -d '{"new_password": "your-secure-password"}'

Restrict access: The admin panel is served at /pauhu on the gateway (port 8090). In production, use your firewall or nginx rules to restrict the /pauhu path to your management network only.

6. Feed subscriptions

The cloud version of Pauhu receives continuous updates from 20 EU institutional sources via automated sync. In a sovereign deployment, you control when and how data updates arrive.

Update methods

Method | How it works | Best for
Automatic (connected) | Server connects to Pauhu's EU update endpoint on a schedule you define. Downloads only new and changed documents since last sync. | Servers with internet access. Recommended for most installations.
Manual (SFTP) | Download a data update package from your Pauhu account. Transfer it to the server via SFTP. Apply with one command. | Restricted networks where outbound connections are controlled.
Air-gapped (physical) | Data update package delivered on encrypted media. Load onto the server via USB. Apply with one command. | Classified environments with no network access.

Configure automatic updates

# In /opt/pauhu/.env
PAUHU_LICENSE_KEY=your-license-key
PAUHU_UPDATE_ENDPOINT=https://pauhu.eu/v1/sovereign
PAUHU_UPDATE_SCHEDULE=0 2 * * 1    # Weekly, Monday at 02:00
PAUHU_UPDATE_SOURCES=all            # Or: eurlex,ted,curia (comma-separated)

The compass container checks for updates on the schedule you define. It downloads only delta packages (new and modified documents), verifies SHA-256 checksums, and applies them to the local databases. The search indexes are rebuilt automatically.
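Before the first scheduled run, it can help to confirm that the four update keys are present and non-empty. This sketch only checks that each `KEY=` line has a value; it does not validate the license key, contact the endpoint, or parse the cron expression. The function name is illustrative.

```shell
#!/usr/bin/env bash
# check_update_env FILE: report automatic-update keys that are missing or empty.
# Returns the number of problems found (0 = all present).
check_update_env() {
  local file=$1 key missing=0
  for key in PAUHU_LICENSE_KEY PAUHU_UPDATE_ENDPOINT PAUHU_UPDATE_SCHEDULE PAUHU_UPDATE_SOURCES; do
    # A value must start right after "=" with something other than whitespace or "#".
    if ! grep -Eq "^${key}=[^[:space:]#]" "$file"; then
      echo "missing or empty: $key"
      missing=$(( missing + 1 ))
    fi
  done
  return "$missing"
}
```

Usage: `check_update_env /opt/pauhu/.env`.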

Apply a manual update

# Transfer the update package to the server
scp pauhu-update-2026-03.tar.gz admin@your-server:/opt/pauhu/updates/

# Apply the update
docker exec pauhu-compass /update /updates/pauhu-update-2026-03.tar.gz

# Verify the update
docker exec pauhu-compass /health/data-freshness

Subscribe to specific sources

You can subscribe to all 20 sources or a subset. Configure in the admin panel under Settings → Feed Subscriptions, or via the .env file:

Source | Identifier | Update frequency (cloud)
EUR-Lex | eurlex | Every 4 hours (weekdays)
TED | ted | Every 6 hours
National Law | lex | Daily
CURIA | curia | Daily
OEIL | oeil | Every 4 hours
IATE | iate | Daily
ECB | ecb | Daily
EMA | ema | Daily
EPO* | epo | Daily
ECHA | echa | Weekly
All 20 sources | all | Mixed (see above)

* EPO patent data requires a separate Data Use Agreement with the European Patent Office. Contact sales@pauhu.eu for status.
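A comma-separated `PAUHU_UPDATE_SOURCES` value is easy to mistype. This sketch checks it against the identifiers in the table above. Because the table lists only ten of the 20 sources, extend `KNOWN_SOURCES` with the remaining identifiers for your installation. The variable and function names are illustrative.

```shell
#!/usr/bin/env bash
# Identifiers from the source table (ten of the 20 sources; extend as needed).
KNOWN_SOURCES="eurlex ted lex curia oeil iate ecb ema epo echa"

# validate_sources LIST: accept "all" or a comma-separated list of known identifiers.
validate_sources() {
  local list=$1 id bad=0
  local -a ids
  [[ $list == all ]] && return 0
  IFS=',' read -ra ids <<< "$list"
  for id in "${ids[@]}"; do
    id=${id//[[:space:]]/}            # tolerate "eurlex, ted"
    if [[ " $KNOWN_SOURCES " != *" $id "* ]]; then
      echo "unknown source identifier: '$id'" >&2
      bad=1
    elif [[ $id == epo ]]; then
      echo "note: epo requires a separate Data Use Agreement with the EPO" >&2
    fi
  done
  return "$bad"
}
```

Usage: `validate_sources "eurlex,ted,curia" && echo ok`.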

Delta updates only. The system never re-downloads the entire dataset. After the initial delivery (4.8M documents), updates contain only new and modified documents. A typical weekly update is 50–200 MB.

7. VS Code extension

The Pauhu VS Code extension connects your IDE to the Pauhu server. It provides EU regulatory context, terminology lookup, and compliance checks directly in your editor - useful for policy drafting, legislative analysis, and procurement document preparation.

Install

# From the VS Code marketplace
code --install-extension pauhu.pauhu-eu

# Or from the .vsix file (air-gapped install)
code --install-extension /path/to/pauhu-eu-1.0.0.vsix

Configure

Open VS Code settings (Ctrl+,) and search for pauhu:

{
  "pauhu.serverUrl": "http://your-server",
  "pauhu.apiKey": "",
  "pauhu.language": "en",
  "pauhu.showTerminology": true,
  "pauhu.showClassification": true
}

Setting | Default | Description
pauhu.serverUrl | http://localhost | URL of your Pauhu server (nginx reverse proxy)
pauhu.apiKey | (empty) | API key from the admin panel. Leave empty if your server allows unauthenticated access on the LAN.
pauhu.language | en | Default language for terminology lookup. Any of the 24 EU official language codes.
pauhu.showTerminology | true | Show IATE terminology annotations inline
pauhu.showClassification | true | Show domain classification in the status bar

Features

MCP integration: The VS Code extension also supports the MCP Sovereign Mode protocol. If you use AI coding assistants that support MCP (e.g., Claude Code, GitHub Copilot), Pauhu provides 4 MCP tools for EU regulatory context.

8. Troubleshooting

Container won't start

# Check container logs
docker compose logs pauhu-compass --tail 50

# Check available disk space
df -h /opt/pauhu/data

# Check available memory
free -h

# Verify the container image is loaded
docker images | grep pauhu

Symptom | Cause | Fix
Container exits immediately | Insufficient RAM | Check docker compose logs <container> for OOM messages. Increase RAM or stop non-essential containers.
Model integrity check failed | Model file corrupted or manifest mismatch | Re-copy the model file and update the manifest. See Section 4.
Port already in use | Another service is using the same port | Edit the port mappings in the compose files (docker-compose1.yaml, docker-compose-pauhu.yaml) or stop the conflicting service.
Search returns no results | Search index still building | Wait for pauhu-compass to reach "healthy" status. The initial index build takes 5–10 minutes on first start.
Translation timeout | Language pair model not loaded | Check docker compose logs pauhu-nmt. Verify the language pair model file exists in /opt/pauhu/data/models/nmt/.
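When you need to share these checks with support, it is convenient to capture them in one file. This sketch runs the diagnostic commands above, records each command's output and exit status, and keeps going if one of them fails. The function names and the report file name are illustrative.

```shell
#!/usr/bin/env bash
# run_section CMD...: print a header, the command's combined output,
# and its exit status if it failed.
run_section() {
  echo "== $* =="
  "$@" 2>&1 || echo "(exit status $?)"
  echo
}

# collect_diagnostics [CONTAINER] [REPORT]: write the checks to REPORT and print its path.
collect_diagnostics() {
  local container=${1:-pauhu-compass}
  local report=${2:-pauhu-diag-$(date +%Y%m%d-%H%M%S).txt}
  {
    run_section docker compose logs "$container" --tail 50
    run_section df -h /opt/pauhu/data
    run_section free -h
    run_section sh -c 'docker images | grep pauhu'
  } > "$report"
  echo "$report"
}
```

Usage: `collect_diagnostics pauhu-nmt` writes a timestamped report in the current directory and prints its name.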

Performance tuning

Issue | Tuning
Search is slow (>100 ms) | Increase pauhu-compass RAM to 4 GB. The search index is loaded into memory on startup; more RAM means more of the index is cached.
Answer generation is slow (>5 s) | Answer generation runs on CPU by default. For faster inference, add a GPU and set PAUHU_ANSWER_DEVICE=cuda in .env. Alternatively, reduce the number of passages with PAUHU_ANSWER_TOP_K=3 (default: 5).
High disk I/O | Move /opt/pauhu/data to NVMe storage. The compass container performs frequent reads during search index lookups.
Memory pressure | Stop TTS if not needed (docker compose stop pauhu-tts). Translation models can be restricted to a subset of language pairs by setting PAUHU_NMT_LANGUAGES=en,fi,de,fr,sv.
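If you restrict translation with `PAUHU_NMT_LANGUAGES`, checking the codes and counting how many directed pairs the list spans helps estimate the saving. The sketch below assumes the restriction keeps only pairs whose source and target are both in the list, which the table above does not state explicitly. The function name is illustrative.

```shell
#!/usr/bin/env bash
# The 24 EU official language codes (ISO 639-1).
EU_LANGS="bg cs da de el en es et fi fr ga hr hu it lt lv mt nl pl pt ro sk sl sv"

# check_nmt_languages LIST: validate a comma-separated list such as "en,fi,de"
# and print how many directed pairs it spans (n languages -> n * (n - 1) pairs).
check_nmt_languages() {
  local list=$1 code n=0
  local -a codes
  IFS=',' read -ra codes <<< "$list"
  for code in "${codes[@]}"; do
    if [[ " $EU_LANGS " != *" $code "* ]]; then
      echo "not an EU official language code: '$code'" >&2
      return 1
    fi
    n=$(( n + 1 ))               # assumes each code appears only once
  done
  echo "$n languages -> $(( n * (n - 1) )) directed pairs (of $(( 24 * 23 )) for all 24)"
}
```

For the example value, `check_nmt_languages en,fi,de,fr,sv` prints `5 languages -> 20 directed pairs (of 552 for all 24)`.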

Verify data integrity

# Check all model manifests
docker exec pauhu-answer /verify-integrity
docker exec pauhu-specialist /verify-integrity
docker exec pauhu-nmt /verify-integrity
docker exec pauhu-tts /verify-integrity

# Check data integrity (document count, index status)
docker exec pauhu-compass /verify-integrity

# Full system health report
curl -s http://localhost/health/all | python3 -m json.tool

Reset to factory state

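This deletes all query logs and custom configurations. Model files and data are preserved.

# Stop all containers (add --profile sovereign-llm if you enabled it)
docker compose -f docker-compose1.yaml -f docker-compose-pauhu.yaml \
  --profile production --profile pauhu down

# Remove configuration (keeps models and data)
rm -rf /opt/pauhu/data/config /opt/pauhu/data/logs

# Restart with the same files and profiles as in the Quickstart
docker compose -f docker-compose1.yaml -f docker-compose-pauhu.yaml \
  --profile production --profile pauhu up -d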
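The reset deletes `config` and `logs` outright. If you might need the query logs later, for example for the SHA-256 audit trail, archive the two directories before running `rm -rf`. This sketch assumes the paths used above; the function name is illustrative.

```shell
#!/usr/bin/env bash
# backup_config_logs [DATA_DIR] [DEST_DIR]: archive DATA_DIR/config and DATA_DIR/logs
# into a timestamped .tar.gz in DEST_DIR and print the archive path.
backup_config_logs() {
  local data=${1:-/opt/pauhu/data} dest=${2:-.} archive d
  local -a present=()
  for d in config logs; do
    [[ -d $data/$d ]] && present+=("$d")
  done
  if (( ${#present[@]} == 0 )); then
    echo "nothing to back up in $data" >&2
    return 1
  fi
  archive="$dest/pauhu-config-logs-$(date +%Y%m%d-%H%M%S).tar.gz"
  tar -czf "$archive" -C "$data" "${present[@]}" && echo "$archive"
}
```

Usage: `backup_config_logs /opt/pauhu/data /root` before the `rm -rf` step; models and data are not included in the archive.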

Get help