Compass Integration

How search retrieval feeds grounded AI answers. Left hemisphere meets right.

Architecture overview

Pauhu® uses a two-hemisphere architecture inspired by human neuroanatomy. The left hemisphere retrieves facts; the right hemisphere generates grounded answers from them.

Left hemisphere — Retrieval

Hybrid search covers 24 data products, combining BM25 keyword scoring with semantic similarity. It returns ranked paragraphs with confidence scores in approximately 26 ms. Each paragraph carries full provenance metadata: source document, publication date, language, and institution.
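
To make the idea of a hybrid score concrete, here is a minimal sketch of one common way to fuse BM25 and semantic scores. It is not Pauhu's documented method: the names hybridRank and minMaxNormalize, the weight alpha, and the choice of min-max normalization are all assumptions made for illustration.

```javascript
// Hypothetical sketch: fuse BM25 and semantic scores into a single ranking.
// The weight `alpha` and min-max normalization are assumptions, not documented behavior.

// Rescale raw scores to [0, 1] so the two signals are comparable.
function minMaxNormalize(values) {
  const min = Math.min(...values);
  const max = Math.max(...values);
  // If every score is identical, treat all candidates as equally relevant.
  if (max === min) return values.map(() => 1);
  return values.map((v) => (v - min) / (max - min));
}

// candidates: [{ id, bm25, semantic }]; returns a new array sorted by fused score.
function hybridRank(candidates, alpha = 0.5) {
  if (candidates.length === 0) return [];
  const bm25 = minMaxNormalize(candidates.map((c) => c.bm25));
  const semantic = minMaxNormalize(candidates.map((c) => c.semantic));
  return candidates
    .map((c, i) => ({ ...c, score: alpha * bm25[i] + (1 - alpha) * semantic[i] }))
    .sort((a, b) => b.score - a.score);
}
```

With alpha at 0.5 both signals count equally. Raising it favors exact keyword matches, and lowering it favors semantic matches.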

Right hemisphere — Generation

FiD (Fusion-in-Decoder) reads the retrieved paragraphs and generates a grounded answer. The model runs entirely in the browser via ONNX Runtime, producing responses in approximately 3 seconds. No server-side inference is required for the browser-native path.
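
In Fusion-in-Decoder, each retrieved passage is paired with the question and encoded on its own, and the decoder then attends over all encoded passages at once. The sketch below builds those per-passage input strings using the "question / title / context" layout from the original FiD work. Whether Pauhu uses the same layout is an assumption, and buildFidInputs, maxPassages, and the title and text fields are hypothetical names.

```javascript
// Hypothetical sketch: build one encoder input string per retrieved paragraph.
// The "question: ... title: ... context: ..." layout follows the original FiD
// convention; Pauhu's actual input format is not documented here.
function buildFidInputs(question, paragraphs, maxPassages = 10) {
  return paragraphs
    .slice(0, maxPassages) // keep only the top-ranked passages
    .map((p) => `question: ${question} title: ${p.title} context: ${p.text}`);
}
```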

Bridge

Ranked paragraphs cross from retrieval to generation through a structured bridge. Every generated token traces back to a source paragraph. If the retrieval step returns no relevant sources, the generation step does not produce an answer — this is the core anti-hallucination guarantee.
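
The gating behavior of the bridge can be sketched as a small function that decides whether generation may run. This is an illustration under stated assumptions: the function name bridge, the confidence field, and the minConfidence threshold of 0.5 are hypothetical and not taken from the Pauhu API.

```javascript
// Hypothetical sketch of the bridge gate: generation only runs when at least
// one retrieved paragraph clears a relevance threshold (threshold is an assumption).
function bridge(rankedParagraphs, minConfidence = 0.5) {
  const relevant = rankedParagraphs.filter((p) => p.confidence >= minConfidence);
  if (relevant.length === 0) {
    // No supporting evidence: return the sources only and skip generation.
    return { generate: false, sources: rankedParagraphs, context: [] };
  }
  // Only relevant paragraphs become context for the generator.
  return { generate: true, sources: rankedParagraphs, context: relevant };
}
```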

Data flow

A query passes through the following steps from user input to grounded answer:

  1. Query received — User sends a query via POST /v1/chat.
  2. Intent classification — The query is classified by intent: search, translate, code, app, or chat.
  3. Left hemisphere search — The products in scope for the requesting domain are searched (see Domain scoping below).
  4. Semantic ranking — The hybrid search ranks the top paragraphs using cosine similarity via the Born rule. Each paragraph receives a confidence score.
  5. Sources streamed — Paragraphs and metadata are streamed to the client as an SSE sources event.
  6. Right hemisphere reads — The browser-side FiD model reads the retrieved paragraphs and fuses them into context.
  7. Grounded answer generated — The generated answer includes inline citations to source documents. Each citation links to the exact paragraph.
  8. No hallucination — If no relevant sources are found, no answer is generated. The system returns the source results only.
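
Step 4 can be illustrated with a short ranking sketch. The text does not spell out how the Born rule enters the score. One plausible reading, assumed here, is that confidence is the squared cosine similarity between normalized query and paragraph embeddings, with negative similarities clamped to zero. The names cosineSimilarity, rankByConfidence, and the embedding field are hypothetical.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error('Vectors must have the same length');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction, so define its similarity as 0.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Assumption: confidence = max(cos, 0)^2, one reading of "cosine similarity via Born rule".
// Clamping keeps opposite-facing vectors from scoring as confident matches.
function rankByConfidence(queryEmbedding, paragraphs) {
  return paragraphs
    .map((p) => {
      const cos = Math.max(cosineSimilarity(queryEmbedding, p.embedding), 0);
      return { ...p, confidence: cos * cos };
    })
    .sort((a, b) => b.confidence - a.confidence);
}
```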

24 products searchable

Compass searches across 24 data products covering EU institutions, national law, terminology, and open knowledge:

Product       Source institution
commission    European Commission
consilium     Council of the European Union
cordis        Community Research and Development Information Service
curia         Court of Justice of the European Union
dataeuropa    data.europa.eu (European Data Portal)
dpp           Digital Product Passport (ESPR)
ecb           European Central Bank
echa          European Chemicals Agency
ema           European Medicines Agency
epo           European Patent Office
europarl      European Parliament
eurlex        EUR-Lex (Official Journal of the EU)
eurostat      Eurostat (Statistical Office)
iate          Inter-Active Terminology for Europe
lex           National legislation (27 EU member states)
news          EU institutional press releases
oeil          Legislative Observatory (European Parliament)
osm           OpenStreetMap (geospatial)
publications  EU Publications Office
ted           Tenders Electronic Daily (public procurement)
weather       Meteorological data
whoiswho      EU institutional directory
wiki          Wikipedia (multilingual)
code          Open-source code (GitHub, npm, PyPI, crates.io)

Domain scoping

Each Pauhu domain searches a different subset of products, tailored to its use case:

Domain                           Scope
pauhu.ai / pauhu.eu / pauhu.com  All 24 products
pauhu.dev                        eurlex, iate, wiki, code
pauhu.io                         eurostat, echa, ema, dpp, dataeuropa, osm, weather

Domain scoping is enforced server-side. A query on pauhu.dev will never return results from ted or europarl, for example.
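
As an illustration of server-side scoping, the sketch below maps each domain to its product list and drops any result outside that list. The product names and domain scopes come from the tables above. The function names, the product field on each result, and the choice to return nothing for unknown domains are assumptions.

```javascript
// All 24 product identifiers listed in this document.
const ALL_PRODUCTS = [
  'commission', 'consilium', 'cordis', 'curia', 'dataeuropa', 'dpp',
  'ecb', 'echa', 'ema', 'epo', 'europarl', 'eurlex',
  'eurostat', 'iate', 'lex', 'news', 'oeil', 'osm',
  'publications', 'ted', 'weather', 'whoiswho', 'wiki', 'code',
];

// Domain-to-scope mapping as described in the table above.
const DOMAIN_SCOPES = new Map([
  ['pauhu.ai', ALL_PRODUCTS],
  ['pauhu.eu', ALL_PRODUCTS],
  ['pauhu.com', ALL_PRODUCTS],
  ['pauhu.dev', ['eurlex', 'iate', 'wiki', 'code']],
  ['pauhu.io', ['eurostat', 'echa', 'ema', 'dpp', 'dataeuropa', 'osm', 'weather']],
]);

// Assumption: an unknown domain gets an empty scope rather than everything.
function productsForDomain(hostname) {
  const scope = DOMAIN_SCOPES.get(hostname.toLowerCase());
  return scope ? [...scope] : [];
}

// Drop every result whose product is outside the domain's scope.
function filterByScope(hostname, results) {
  const allowed = new Set(productsForDomain(hostname));
  return results.filter((r) => allowed.has(r.product));
}
```

Filtering on the server, rather than trusting a product list sent by the client, is what makes the "will never return" guarantee hold even for hand-crafted requests.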

Integration example

Stream a grounded answer with source paragraphs using the chat endpoint:

const response = await fetch('https://staging.pauhu.eu/v1/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    query: 'What are the GDPR fines for data breaches?',
    language: 'en'
  })
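});
if (!response.ok) throw new Error(`Request failed: ${response.status}`);

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = ''; // holds a partial line that spans two network chunks

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // stream: true keeps multi-byte characters intact across chunk boundaries
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop(); // the last element is incomplete until its newline arrives
  // Parse SSE events: sources, paragraphs, status, done
  for (const line of lines) {
    if (line.startsWith('data: ')) {
      const event = JSON.parse(line.slice(6));
      console.log(event);
    }
  }
}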

The SSE stream emits the following event types:

Event       Description
sources     Ranked source paragraphs with metadata and confidence scores
paragraphs  Full paragraph text for each source
status      Generation progress updates
done        Final answer with inline citations
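
To route these event types to separate handlers, a client can parse the stream into complete SSE blocks and dispatch on the event name. The sketch below assumes the server labels each block with a standard SSE "event:" field and sends JSON in "data:". If the event type is instead carried inside the JSON payload, the dispatch key would need to change. The names parseSseBlock and createSseDispatcher are hypothetical.

```javascript
// Parse one SSE block (the text between blank lines) into { event, data }.
function parseSseBlock(block) {
  let event = 'message'; // SSE default when no "event:" field is present
  const dataLines = [];
  for (const line of block.split('\n')) {
    if (line.startsWith('event:')) {
      event = line.slice(6).trim();
    } else if (line.startsWith('data:')) {
      // Per the SSE format, one leading space after the colon is dropped.
      dataLines.push(line.slice(5).replace(/^ /, ''));
    }
  }
  if (dataLines.length === 0) return null;
  // Assumption: every payload is JSON.
  return { event, data: JSON.parse(dataLines.join('\n')) };
}

// Returns a feed(chunk) function that buffers text and calls handlers[eventName](data).
function createSseDispatcher(handlers) {
  let buffer = '';
  return function feed(chunk) {
    // Normalize CRLF after concatenation so a split "\r\n" is still handled.
    buffer = (buffer + chunk).replace(/\r\n/g, '\n');
    let idx;
    while ((idx = buffer.indexOf('\n\n')) !== -1) {
      const block = buffer.slice(0, idx);
      buffer = buffer.slice(idx + 2);
      const parsed = parseSseBlock(block);
      if (parsed && Object.prototype.hasOwnProperty.call(handlers, parsed.event)) {
        handlers[parsed.event](parsed.data);
      }
    }
  };
}
```

A caller would create the dispatcher once with handlers for sources, paragraphs, status, and done, then pass each decoded chunk from the response stream to feed.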

Key properties

Property             Description
Grounded             Every answer cites exact source paragraphs. No answer is generated without supporting evidence.
Verifiable           Click any citation to view the original document at its source institution.
Offline-capable      The FiD model is cached in the browser. Once loaded, generation works without an internet connection.
Zero inference cost  The browser-native path has no per-query server-side charges. Inference runs on the user's device.
24 languages         Query in any EU official language. The hybrid search and FiD both support all 24.

Support

Technical: support@pauhu.eu

API keys: Get an API key

Full documentation: Documentation index