Data Freshness

How often each data source is synced, what the timestamps mean, and what latency to expect.

How Sync Works

Pauhu runs automated sync jobs for all 24 data products. Each job polls the upstream institutional API or portal for new and updated documents, downloads them into EU storage, annotates them (topic classification, language detection, deontic modality), and indexes them for semantic search.

The pipeline has three stages:

  1. Sync: Fetch new/changed documents from the source institution
  2. Annotate: Classify with topic annotations and deontic modalities (queue-triggered, typically within seconds)
  3. Index: Update the semantic search index (runs every 5 minutes)

End-to-end latency from source publication to searchability is typically the sync interval plus 5–10 minutes for annotation and indexing.
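
The estimate above is simple arithmetic, and the following minimal sketch makes it explicit. The function name and the two constants are illustrative, not part of any Pauhu API; the 5 and 10 minute bounds are taken directly from the sentence above.

```python
from datetime import timedelta

# Annotation + indexing overhead quoted above (5-10 minutes).
# These constants are illustrative and not read from any Pauhu API.
PROCESSING_MIN = timedelta(minutes=5)
PROCESSING_MAX = timedelta(minutes=10)


def typical_latency_window(sync_interval: timedelta) -> tuple[timedelta, timedelta]:
    """Return (low, high) for 'sync interval plus 5-10 minutes'."""
    return sync_interval + PROCESSING_MIN, sync_interval + PROCESSING_MAX


low, high = typical_latency_window(timedelta(hours=6))
print(low, high)  # 6:05:00 6:10:00
```

A document published shortly before a sync run starts can become searchable sooner than this window suggests, since it does not wait a full interval.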

Continuous Sync (every 15 minutes)

| Product | Source | Schedule | Typical Latency |
|---|---|---|---|
| National Law (lex) | 28 national law portals | Every 15 minutes | 15–25 min |

National law is synced most frequently because transposition deadlines and national legislative changes are time-sensitive. The sync job rotates through all 28 country adapters on each run.

Frequent Sync (every 4–6 hours)

| Product | Source | Schedule | Typical Latency |
|---|---|---|---|
| EUR-Lex | Publications Office of the EU | Every 4 hours (weekdays) | 4–4.5 h |
| OEIL | European Parliament | Every 4 hours | 4–4.5 h |
| Consilium | General Secretariat of the Council | Every 6 hours | 6–6.5 h |
| CORDIS | European Commission, DG Research | Every 6 hours | 6–6.5 h |
| data.europa.eu | Publications Office of the EU | Every 6 hours | 6–6.5 h |
| TED | Publications Office of the EU | Every 6 hours | 6–6.5 h |
| Commission | European Commission | Every 6 hours | 6–6.5 h |
| National Law sync | 28 national portals | Every 4 hours | 4–4.5 h |

EUR-Lex sync runs only on weekdays because the Publications Office rarely publishes on weekends. TED and the other 6-hour products sync around the clock.

Daily Sync (once per day)

| Product | Source | Schedule | Typical Latency |
|---|---|---|---|
| CURIA | Court of Justice of the EU | Daily (04:00 UTC) | < 24 h |
| DPP | European Commission (ESPR) | Daily (03:00 UTC) | < 24 h |
| ECB | European Central Bank | Daily (06:00 UTC) | < 24 h |
| EMA | European Medicines Agency | Daily (05:00 UTC) | < 24 h |
| EPO | European Patent Office | Daily (07:00 UTC) | < 24 h |
| European Parliament | European Parliament | Daily (03:00 UTC) | < 24 h |
| Publications | Publications Office of the EU | Daily (04:00 UTC) | < 24 h |
| Wiki | Wikimedia Foundation | Daily (04:00 UTC) | < 24 h |
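
For planning around the daily schedule, a client can work out when the next run for a product is due. The sketch below is a hedged example: the helper name is hypothetical, the start hours are copied from the table above, and it assumes each job starts exactly on the hour, which the table implies but does not guarantee.

```python
from datetime import datetime, timedelta, timezone


def next_daily_sync(now: datetime, hour_utc: int) -> datetime:
    """Return the next hour_utc:00 UTC start strictly after `now`.

    `now` must be timezone-aware. Assumes jobs start exactly on the hour.
    """
    now = now.astimezone(timezone.utc)
    candidate = now.replace(hour=hour_utc, minute=0, second=0, microsecond=0)
    if candidate <= now:
        # Today's slot has already passed, so the next run is tomorrow.
        candidate += timedelta(days=1)
    return candidate


now = datetime(2026, 3, 12, 5, 30, tzinfo=timezone.utc)
print(next_daily_sync(now, 4))  # CURIA (04:00 UTC): 2026-03-13 04:00:00+00:00
print(next_daily_sync(now, 7))  # EPO (07:00 UTC):   2026-03-12 07:00:00+00:00
```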

Weekly Sync (once per week)

| Product | Source | Schedule | Typical Latency |
|---|---|---|---|
| ECHA | European Chemicals Agency | Weekly (Sunday 00:00 UTC) | < 7 days |
| Eurostat | Eurostat | Weekly (Monday 05:00 UTC) | < 7 days |
| Who is Who | Publications Office of the EU | Weekly (Monday 02:00 UTC) | < 7 days |

These sources publish infrequently, so a weekly sync is sufficient. ECHA substance data changes only after formal regulatory decisions, and Eurostat datasets update on fixed release calendars.

Search Index Updates

After documents are synced and annotated, the search index is updated every 5 minutes. The indexing job processes newly annotated documents across all products and updates the semantic search vectors.

| Stage | Frequency | Description |
|---|---|---|
| Sync | Per product (see above) | Fetch from source institution |
| Annotation | Queue-triggered | Topic + deontic classification (typically < 30 seconds) |
| D1 + Vectorize indexing | Every 5 minutes | Insert into database and semantic search index |

Understanding Timestamps

API responses include a last_updated field. It records when Pauhu last synced the document from the source institution and indexed it, not when the source institution published the document.

{
  "id": "32024R1689",
  "title": "Regulation (EU) 2024/1689 (AI Act)",
  "date": "2024-07-12",
  "last_updated": "2026-03-12T08:15:00Z",
  "score": 0.96
}
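
As a hedged illustration of the distinction, the sketch below parses the sample response and keeps the two values apart. It assumes that date is the document's own date as reported by the source, which the sample suggests but this page does not state.

```python
import json
from datetime import date, datetime

raw = """
{
  "id": "32024R1689",
  "title": "Regulation (EU) 2024/1689 (AI Act)",
  "date": "2024-07-12",
  "last_updated": "2026-03-12T08:15:00Z",
  "score": 0.96
}
"""

doc = json.loads(raw)

# Assumption: "date" is the document date supplied by the source institution.
document_date = date.fromisoformat(doc["date"])

# "last_updated" is the sync/index time. Replacing the trailing "Z" keeps
# datetime.fromisoformat working on Python versions before 3.11.
synced_at = datetime.fromisoformat(doc["last_updated"].replace("Z", "+00:00"))

print(document_date)  # 2024-07-12
print(synced_at)      # 2026-03-12 08:15:00+00:00
```

Sorting or filtering by last_updated therefore orders documents by when they were last synced, not by when they were published.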

To check when a product was last synced, use the /v1/search/:product endpoint. The response header X-Pauhu-Last-Sync contains the ISO 8601 timestamp of the most recent successful sync run.
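
A minimal sketch of reading that header and comparing it with a product's published sync interval follows. Only the header name and the endpoint path come from this page. The helper names are hypothetical, the URL passed to fetch_last_sync (host, product slug, query parameters, authentication) is left to the caller because none of it is specified here, and the header value is assumed to carry a UTC offset or a trailing Z.

```python
from datetime import datetime, timedelta, timezone
from typing import Mapping, Optional
from urllib.request import urlopen

LAST_SYNC_HEADER = "X-Pauhu-Last-Sync"


def parse_last_sync(headers: Mapping[str, str]) -> Optional[datetime]:
    """Parse the ISO 8601 last-sync header; return None if it is absent."""
    value = headers.get(LAST_SYNC_HEADER)
    if value is None:
        return None
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def is_within_interval(last_sync: datetime, interval: timedelta, now: datetime) -> bool:
    """True if the last successful sync is no older than the sync interval."""
    return now - last_sync <= interval


def fetch_last_sync(url: str) -> Optional[datetime]:
    """Call a /v1/search/:product URL and read the header (needs network).

    The full URL is an assumption supplied by the caller.
    """
    with urlopen(url, timeout=10) as response:
        return parse_last_sync(response.headers)


# Offline demonstration with a hand-written header value.
headers = {"X-Pauhu-Last-Sync": "2026-03-12T08:15:00Z"}
last_sync = parse_last_sync(headers)
now = datetime(2026, 3, 12, 11, 0, tzinfo=timezone.utc)
print(is_within_interval(last_sync, timedelta(hours=4), now))  # True
```

Keeping the parsing separate from the network call makes the freshness check easy to test without contacting the API.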

Freshness Guarantees

| Tier | Guarantee |
|---|---|
| Free (3 requests/day) | Best effort. No guaranteed sync latency. Data is typically within the published sync interval. |
| Paid tiers | Data will be updated within the published sync interval for each product. If a sync job fails, the previous data remains available and the job retries automatically. |

Sync failures are rare and typically caused by upstream source outages (e.g., EUR-Lex maintenance windows). Failed syncs retry automatically, and no data is lost during temporary outages: the next successful sync picks up all missed updates.

Data Residency

EU jurisdiction only. All sync jobs run within the European Union. Source data is fetched from EU institutional APIs and stored in EU-jurisdiction storage. The annotation and indexing pipeline runs entirely within the EU. No data leaves EU jurisdiction at any point. There is no third-country data transfer involved in any sync operation.
