Why Pauhu Runs on Any Chip
Browser-native inference via ONNX Runtime Web. No CUDA dependency, no GPU vendor lock-in. From a government server in Helsinki to a smartphone in Lisbon — the same models, the same accuracy, EUR 0 inference cost.
- 1. The runtime: ONNX in the browser
- 2. Adaptive model loading
- 3. Device detection and progressive download
- 4. No CUDA, no lock-in
- 5. Market impact
- 6. FAQ
1. The Runtime: ONNX in the Browser
Pauhu models are exported in the ONNX (Open Neural Network Exchange) format — an open standard supported by every major ML framework. At inference time, models run inside the browser via ONNX Runtime Web, which supports two execution backends:
| Backend | Technology | Best for |
|---|---|---|
| WebAssembly (WASM) | CPU-based, runs everywhere | Universal compatibility. Any browser, any device. |
| WebGPU | GPU-accelerated, shader-based | Faster inference on devices with a GPU. Automatic fallback to WASM if unavailable. |
The runtime selects the best backend automatically. On a laptop with a discrete GPU, WebGPU accelerates inference. On a smartphone or a locked-down government workstation without GPU drivers, WASM provides the same results at a slightly slower speed. The model weights are identical in both cases.
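In ONNX Runtime Web this fallback order is expressed through the `executionProviders` session option. A minimal sketch of the detection logic (the helper name and model path are illustrative, not Pauhu's actual code):

```javascript
// Prefer WebGPU when the browser exposes it, otherwise fall back to WASM.
// `navigator.gpu` is the standard WebGPU feature-detection entry point.
function pickExecutionProviders(nav) {
  return 'gpu' in nav ? ['webgpu', 'wasm'] : ['wasm'];
}

// Usage with onnxruntime-web (sketch):
//   import * as ort from 'onnxruntime-web';
//   const session = await ort.InferenceSession.create('/models/encoder.onnx', {
//     executionProviders: pickExecutionProviders(navigator),
//   });
```

Passing both providers lets the runtime try WebGPU first and silently fall back, which matches the behaviour described above.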
2. Adaptive Model Loading
Not every device has 16 GB of RAM. Pauhu detects available memory and loads the appropriate model tier automatically:
| Tier | Device memory | Model | Capability | Use case |
|---|---|---|---|---|
| A | ≤4 GB | Encoder-only (150 MB) | Search and retrieval. Instant lookups across 4.8M documents. No generation. | Smartphones, tablets, low-spec laptops, embedded kiosks. |
| B | 4–8 GB | mT5-small encoder + decoder (300 MB) | Search + grounded answer generation with citations. 24 EU languages. | Office laptops, standard government workstations. |
| C | 8 GB+ | mT5-base encoder + decoder (1.2 GB) | Full pipeline: search, generation, topic classification, deontic analysis, translation. | Developer machines, servers, self-hosted containers. |
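The cutoffs above can be sketched as a small selection function. `navigator.deviceMemory` reports approximate RAM in gigabytes; the function name and the choice to map exactly 8 GB to Tier C (per "8 GB+") are assumptions for illustration:

```javascript
// Map reported device memory (GB) to a model tier.
// A missing value (API unavailable) falls back to the safest tier.
function selectTier(deviceMemoryGb) {
  if (deviceMemoryGb == null) return 'A'; // no signal: encoder-only
  if (deviceMemoryGb <= 4) return 'A';    // encoder-only, 150 MB
  if (deviceMemoryGb < 8) return 'B';     // mT5-small, 300 MB
  return 'C';                             // mT5-base, 1.2 GB
}

// In page code: selectTier(navigator.deviceMemory)
```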
Tier selection is automatic but can be overridden. In a self-hosted deployment, you can pin a specific tier via environment variable:
# Force Tier C regardless of detected memory
PAUHU_MODEL_TIER=full docker compose up -d
3. Device Detection and Progressive Download
The loading sequence is designed to minimise time-to-first-result:
1. navigator.deviceMemory → detect available RAM
2. Select tier (A, B, or C)
3. Download encoder (search model) → search is available immediately
4. Download decoder (generation model) → generation available when ready
5. Cache both in IndexedDB → subsequent visits are instant
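The sequence above could be orchestrated roughly as follows. `download`, `cache`, and `onReady` are stand-ins for the real fetch, IndexedDB, and UI hooks; they are injected so the sketch stays self-contained:

```javascript
// Progressive loader sketch: encoder first (search available immediately),
// decoder lazily afterwards. Helper names are illustrative.
async function loadModels(tier, { download, cache, onReady }) {
  const encoder = await download(`encoder-${tier}`); // smaller model first
  await cache(`encoder-${tier}`, encoder);
  onReady('search');                                 // search usable now
  if (tier !== 'A') {                                // Tier A ships no decoder
    const decoder = await download(`decoder-${tier}`);
    await cache(`decoder-${tier}`, decoder);
    onReady('generation');
  }
}
```

Because `onReady('search')` fires before the decoder download even starts, users can query as soon as the encoder is cached.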
Encoder first, decoder lazy
The encoder (search/retrieval) downloads first because it is smaller and provides immediate value. Users can search and browse results while the decoder downloads in the background. If a user only needs search, the decoder is never downloaded at all.
Progressive download sizes
| Phase | Tier A | Tier B | Tier C |
|---|---|---|---|
| Encoder | 150 MB | 150 MB | 600 MB |
| Decoder | — | 150 MB | 600 MB |
| Total | 150 MB | 300 MB | 1.2 GB |
All models are quantized (INT8) using the ONNX quantization toolkit. This reduces file size and inference latency without measurable accuracy loss on EU legal benchmarks.
4. No CUDA, No Lock-In
Most AI systems require NVIDIA GPUs with CUDA drivers. This creates three problems for government and enterprise buyers:
- Hardware lock-in: You must buy NVIDIA GPUs, which are supply-constrained and expensive.
- Driver dependency: CUDA drivers must be installed and maintained, which requires system-level access that many IT policies restrict.
- Chip sovereignty: NVIDIA is a US company subject to US export controls. Dependency on a single non-EU chip vendor is a supply chain risk.
Pauhu avoids all three. ONNX Runtime Web's inference engine is compiled to WebAssembly, which runs on any CPU architecture:
| Architecture | Examples | Status |
|---|---|---|
| x86-64 | Intel, AMD (most desktops and servers) | Full support |
| ARM64 | Apple Silicon (M1–M4), Qualcomm Snapdragon, AWS Graviton | Full support |
| ARM32 | Older Android devices, Raspberry Pi | WASM only (Tier A) |
| RISC-V | Emerging open-standard processors | WASM only (Tier A) |
5. Market Impact
There are approximately 3.5 billion smartphone users worldwide. Every one of them has a device capable of running Pauhu Tier A inference — search across 4.8 million EU documents at zero marginal cost.
For government buyers, chip-agnostic inference means:
- No GPU procurement: Run Pauhu on existing hardware. No additional capital expenditure.
- No cloud dependency: Inference happens on-device or on-premises. No data leaves your environment.
- No per-query cost: Once models are downloaded, every query is free. There is no API metering, no token counting, no usage-based billing.
- Future-proof: As new chip architectures emerge (RISC-V, Arm Neoverse, custom EU silicon), WASM and WebGPU run on them without any changes to Pauhu.
The arithmetic
A cloud LLM charges EUR 0.01–0.03 per query. At 1,000 queries/day across a government ministry, that is EUR 10–30/day, or EUR 3,650–10,950/year — for a single ministry. Pauhu’s on-device inference costs EUR 0 per query after the initial subscription. The models run on hardware you already own.
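The figures above can be checked in a few lines (illustrative arithmetic only):

```javascript
// Annual cost range for 1,000 metered cloud queries/day at EUR 0.01-0.03 each.
const queriesPerDay = 1000;
const daysPerYear = 365;
const [low, high] = [0.01, 0.03].map(eur => eur * queriesPerDay * daysPerYear);
// low = 3650, high = 10950 (EUR/year, one ministry) -- versus EUR 0 on-device
```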
6. FAQ
Does WebAssembly inference have good performance?
For the model sizes Pauhu uses (150 MB–1.2 GB), WASM inference on a modern laptop completes search queries in under 30 ms and generation in 1–3 seconds. This is comparable to cloud API latency when you include network round-trip time.
What browsers support ONNX Runtime Web?
All modern browsers: Chrome 90+, Firefox 89+, Safari 15+, Edge 90+. WebGPU requires Chrome 113+ or Edge 113+. Older browsers fall back to WASM automatically.
Can I force a specific tier?
Yes. In the VS Code extension: pauhu.model.tier setting. In the container: PAUHU_MODEL_TIER environment variable. In the browser: ?tier=full URL parameter.
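In the browser, the `?tier=` override could be read like this; only the parameter name comes from the FAQ, the helper is a sketch:

```javascript
// Extract a tier override from a URL query string, e.g. "?tier=full".
function tierOverride(search) {
  return new URLSearchParams(search).get('tier'); // null when absent
}

// In page code: tierOverride(window.location.search)
```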
What about offline use?
Once models are cached in IndexedDB (browser) or on disk (container), Pauhu works fully offline. No network connection needed for search or generation. See the Data Sovereignty section for details.
Is ONNX an open standard?
Yes. ONNX is maintained by the LF AI & Data Foundation (part of the Linux Foundation). It is supported by Microsoft, Meta, Google, Intel, AMD, and others. There is no single-vendor dependency.