Why Pauhu Runs on Any Chip
Browser-native inference with automatic GPU acceleration. No CUDA dependency, no GPU vendor lock-in. From a government server in Helsinki to a smartphone in Lisbon - the same models, the same accuracy, EUR 0 inference cost.
- 1. The runtime: browser-native inference
- 2. Adaptive model loading
- 3. Device detection and progressive download
- 4. No CUDA, no lock-in
- 5. Market impact
- 6. FAQ
1. The Runtime: Browser-Native Inference
Pauhu models are exported in an open, industry-standard format supported by every major ML framework. At inference time, models run inside the browser via an optimized runtime, which supports two execution backends:
| Backend | Technology | Best for |
|---|---|---|
| Cross-platform execution | CPU-based, runs everywhere | Universal compatibility. Any browser, any device. |
| GPU acceleration | GPU-accelerated, shader-based | Faster inference on devices with a GPU. Automatic fallback to CPU if unavailable. |
The runtime selects the best backend automatically. On a laptop with a discrete GPU, GPU acceleration speeds up inference. On a smartphone or a locked-down government workstation without GPU drivers, CPU-based execution provides the same results at a slightly slower speed. The model weights are identical in both cases.
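The selection logic above can be sketched as a small TypeScript function. This is an illustrative sketch only, not Pauhu's actual runtime code; the backend names and the `hasWebGPU` capability flag are assumptions for illustration (in a browser, the flag would typically come from probing `navigator.gpu`).

```typescript
type Backend = "webgpu" | "wasm";

interface DeviceCapabilities {
  hasWebGPU: boolean; // e.g. `navigator.gpu !== undefined` in a browser
}

// Prefer GPU acceleration when available; otherwise fall back to
// CPU-based execution. The model weights are identical either way,
// so the choice affects speed, never results.
function selectBackend(caps: DeviceCapabilities): Backend {
  return caps.hasWebGPU ? "webgpu" : "wasm";
}

console.log(selectBackend({ hasWebGPU: true }));  // webgpu
console.log(selectBackend({ hasWebGPU: false })); // wasm
```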
2. Adaptive Model Loading
Not every device has 16 GB of RAM. Pauhu detects available memory and loads the appropriate model tier automatically:
| Tier | Device memory | Model | Capability | Use case |
|---|---|---|---|---|
| A | ≤4 GB | Encoder-only (lightweight) | Search and retrieval. Instant lookups across 4.8M documents. No generation. | Smartphones, tablets, low-spec laptops, embedded kiosks. |
| B | 4–8 GB | Multilingual model (encoder + decoder) | Search + grounded answer generation with citations. 24 EU languages. | Office laptops, standard government workstations. |
| C | 8 GB+ | Multilingual model (full encoder + decoder) | Full pipeline: search, generation, topic classification, legal modality analysis, translation. | Developer machines, servers, self-hosted containers. |
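The tier rule can be expressed as a one-line mapping. A minimal sketch, assuming the memory thresholds from the table above; the function name and the handling of the exact 4 GB / 8 GB boundaries are illustrative assumptions, not Pauhu's published API.

```typescript
type Tier = "A" | "B" | "C";

// Map detected device memory (in GB) to a model tier.
// Boundary handling (<=4 vs <8) is an assumption for this sketch.
function selectTier(deviceMemoryGb: number): Tier {
  if (deviceMemoryGb <= 4) return "A"; // encoder-only: search, no generation
  if (deviceMemoryGb < 8) return "B";  // encoder + decoder, 24 EU languages
  return "C";                          // full pipeline
}

console.log(selectTier(2));  // A
console.log(selectTier(6));  // B
console.log(selectTier(16)); // C
```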
Tier selection is automatic but can be overridden. In a self-hosted deployment, you can pin a specific tier via environment variable:
```shell
# Force Tier C regardless of detected memory
PAUHU_MODEL_TIER=full docker compose up -d
```
3. Device Detection and Progressive Download
The loading sequence is designed to minimise time-to-first-result:
1. Detect available device memory
2. Select tier (A, B, or C)
3. Download encoder (search model) → search is available immediately
4. Download decoder (generation model) → generation available when ready
5. Cache both locally → subsequent visits are instant
Encoder first, decoder lazy
The encoder (search/retrieval) downloads first because it is smaller and provides immediate value. Users can search and browse results while the decoder downloads in the background. If a user only needs search, the decoder is never downloaded at all.
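The encoder-first, decoder-lazy sequence can be sketched as an async loader. This is a hypothetical outline of the five steps above: `fetchModel` stands in for the real download-and-cache logic, and the callback names are assumptions for illustration.

```typescript
type Tier = "A" | "B" | "C";

// Placeholder for the real download + local-cache step.
async function fetchModel(name: string): Promise<string> {
  return `${name}-weights`;
}

async function loadModels(
  tier: Tier,
  onSearchReady: (encoder: string) => void,
  onGenerationReady: (decoder: string) => void,
): Promise<void> {
  // Encoder first: search becomes available immediately.
  onSearchReady(await fetchModel(`encoder-${tier}`));
  // Tier A is encoder-only; the decoder is never downloaded.
  if (tier === "A") return;
  // Decoder downloads lazily, after search is already usable.
  onGenerationReady(await fetchModel(`decoder-${tier}`));
}
```

A real implementation would also check the local cache before fetching, which is what makes subsequent visits instant.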
Progressive download
| Phase | Tier A | Tier B | Tier C |
|---|---|---|---|
| Encoder | Small | Small | Large |
| Decoder | - | Small | Large |
| Total | Lightweight | Medium | Full |
All models are optimised and compressed to reduce file size and inference latency without measurable accuracy loss on EU legal benchmarks.
4. No CUDA, No Lock-In
Most AI systems require NVIDIA GPUs with CUDA drivers. This creates three problems for government and enterprise buyers:
- Hardware lock-in: You must buy NVIDIA GPUs, which are supply-constrained and expensive.
- Driver dependency: CUDA drivers must be installed and maintained, which requires system-level access that many IT policies restrict.
- Chip sovereignty: NVIDIA is a US company subject to US export controls. Dependency on a single non-EU chip vendor is a supply chain risk.
Pauhu avoids all three. The optimized runtime compiles models to a cross-platform format that runs on any CPU architecture:
| Architecture | Examples | Status |
|---|---|---|
| x86-64 | Intel, AMD (most desktops and servers) | Full support |
| ARM64 | Apple Silicon (M1–M4), Qualcomm Snapdragon, AWS Graviton | Full support |
| ARM32 | Older Android devices, Raspberry Pi | CPU only (Tier A) |
| RISC-V | Emerging open-standard processors | CPU only (Tier A) |
5. Market Impact
There are approximately 3.5 billion smartphone users worldwide. Every one of them has a device capable of running Pauhu Tier A inference - search across 4.8 million EU documents at zero marginal cost.
For government buyers, chip-agnostic inference means:
- No GPU procurement: Run Pauhu on existing hardware. No additional capital expenditure.
- No cloud dependency: Inference happens on-device or on-premises. No data leaves your environment.
- No per-query cost: Once models are downloaded, every query is free. There is no API metering, no token counting, no usage-based billing.
- Future-proof: As new chip architectures emerge (RISC-V, Arm Neoverse, custom EU silicon), Pauhu’s cross-platform runtime runs on them without any changes.
The arithmetic
A cloud LLM charges EUR 0.01–0.03 per query. At 1,000 queries/day across a government ministry, that is EUR 10–30/day, or EUR 3,650–10,950/year - for a single ministry. Pauhu’s on-device inference costs EUR 0 per query after the initial subscription. The models run on hardware you already own.
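The arithmetic above, spelled out. The per-query rates are the cloud-LLM figures quoted in the text; the function itself is just the multiplication.

```typescript
// Annual cost of a metered cloud LLM at a given per-query rate.
function annualCloudCostEur(costPerQueryEur: number, queriesPerDay: number): number {
  return costPerQueryEur * queriesPerDay * 365;
}

console.log(annualCloudCostEur(0.01, 1000)); // 3650
console.log(annualCloudCostEur(0.03, 1000)); // 10950
```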
6. FAQ
Does browser-native inference have good performance?
For the model sizes Pauhu uses, browser-native inference on a modern laptop completes search queries in under 30 ms and generation in 1–3 seconds. This is comparable to cloud API latency when you include network round-trip time.
What browsers are supported?
All modern browsers: Chrome 90+, Firefox 89+, Safari 15+, Edge 90+. GPU acceleration requires Chrome 113+ or Edge 113+. Older browsers fall back to CPU-based execution automatically.
Can I force a specific tier?
Yes. In the VS Code extension: pauhu.model.tier setting. In the container: PAUHU_MODEL_TIER environment variable. In the browser: ?tier=full URL parameter.
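Reading the URL override might look like the following sketch. The accepted values and fallback behaviour are assumptions based on the `?tier=full` example above, not a documented contract.

```typescript
// Return the ?tier= override from a page URL, or null to auto-detect.
function tierOverride(pageUrl: string): string | null {
  return new URL(pageUrl).searchParams.get("tier");
}

console.log(tierOverride("https://example.org/search?tier=full")); // full
console.log(tierOverride("https://example.org/search"));           // null
```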
What about offline use?
Once models are cached in IndexedDB (browser) or on disk (container), Pauhu works fully offline. No network connection needed for search or generation. See the Data Sovereignty section for details.
Is the model format an open standard?
Yes. The model format Pauhu uses is maintained by the LF AI & Data Foundation (part of the Linux Foundation). It is supported by Microsoft, Meta, Google, Intel, AMD, and others. There is no single-vendor dependency.
Sovereign Architecture · Data Pipeline · Grounded Generation Architecture