Browser Automation Architecture

Screenshot, reason, act, verify. Every browser action passes through safety gates, and form submission requires human-in-the-loop confirmation.

Overview

The Pauhu Browser Automation system automates browser-based tasks on EU institutional portals - such as TED eProcurement, EUR-Lex, and national law databases - while maintaining strict safety controls. Unlike unguarded browser automation, the system routes every action through safety gates before execution. Destructive actions (form submissions, data entry) require explicit human approval.

The system operates in a continuous loop: it observes the current browser state via screenshot, reasons about the next action using an AI model, executes the action (subject to safety gates), and verifies the result. This loop continues until the task is complete or the user intervenes.

The automation loop

Step-by-step

  1. Screenshot. The system captures a screenshot of the current browser viewport. It has no DOM access, only pixel-level observation.
  2. Reason. An AI model analyses the screenshot together with the task description and action history. It determines the next action: what to click, what to type, where to scroll, or whether the task is complete.
  3. Safety gate. Before execution, the proposed action is classified against four safety levels (see below). Prohibited actions are blocked. Actions that require confirmation pause until a human approves them.
  4. Act. The action is executed in the browser: a click at specific coordinates, text entry into a field, scrolling, navigation, or form submission.
  5. Verify. A new screenshot is taken. The system compares the result against the expected state. If the action failed or produced unexpected results, the loop can retry or escalate to the user.
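The five steps above can be sketched as a single loop. This is a minimal illustration, not the product's implementation: `runAutomationLoop`, the injected functions (`capture`, `reason`, `checkGate`, `act`, `verify`), the `done` action type, and the `maxSteps` budget are all hypothetical names chosen for the example.

```javascript
// Hypothetical sketch of the screenshot -> reason -> gate -> act -> verify loop.
// Every dependency is injected; none of these names come from the product.
async function runAutomationLoop(task, deps, maxSteps = 50) {
  const { capture, reason, checkGate, act, verify } = deps;
  const history = [];

  for (let step = 0; step < maxSteps; step++) {
    const before = await capture();                      // 1. Screenshot
    const action = await reason(before, task, history);  // 2. Reason
    if (action.type === 'done') {
      return { status: 'completed', history };
    }

    const allowed = await checkGate(action);             // 3. Safety gate
    if (!allowed) {
      // Blocked or rejected: record it and let the model propose something else.
      history.push({ ...action, status: 'cancelled' });
      continue;
    }

    await act(action);                                   // 4. Act
    const after = await capture();                       // 5. Verify
    const ok = await verify(action, before, after);
    history.push({ ...action, status: ok ? 'completed' : 'failed' });
  }

  // Assumption: exhausting the step budget counts as a failed session.
  return { status: 'failed', history };
}
```

Injecting the five functions keeps the loop itself free of browser or model details, so it can be exercised with stubs.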

Safety gates

Every action passes through safety gates before execution. The four levels control what the system may, must, must not, and need not do.

Safety levels applied to browser automation actions
| Level | Gate behaviour | Actions |
| --- | --- | --- |
| Prohibition | Blocks the action entirely. The system cannot proceed. | Submitting forms without human approval. Navigating to domains not on the allowlist. Entering credentials. Clicking “delete” or “remove” buttons. |
| Obligation | Pauses execution. Requires explicit human confirmation before the action proceeds. | Form submission (human reviews all filled fields). Data entry into official portals. Any action that creates or modifies a record. |
| Permission | Action may proceed while the user session is active. No confirmation needed, but the action is logged. | Clicking navigation links. Selecting dropdown values. Typing search queries. Scrolling. |
| Exemption | Read-only actions that require no gate check. | Taking screenshots. Observing page state. Reading text. Waiting for page load. |
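The gate behaviours in the table can be read as a small decision function. The sketch below is illustrative only: `applyGate`, its option names, and the outcome strings are hypothetical, and treating a Permission action as blocked when the session is inactive is an assumption drawn from the phrase "while the user session is active".

```javascript
// Hypothetical gate enforcement: maps a safety level to what the executor does next.
function applyGate(level, { sessionActive = true, humanApproved = false } = {}) {
  switch (level) {
    case 'Prohibition':
      return 'blocked';                      // never proceeds
    case 'Obligation':
      return humanApproved ? 'proceed' : 'awaiting_confirmation';
    case 'Permission':
      // Assumption: without an active user session the action does not run.
      return sessionActive ? 'proceed' : 'blocked';
    case 'Exemption':
      return 'proceed';                      // read-only, no gate check
    default:
      throw new Error(`Unknown safety level: ${level}`);
  }
}
```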

Human-in-the-loop flow

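When an action triggers an Obligation gate (typically form submission), the system pauses and presents a confirmation modal. The user sees every field the system has filled, the value entered in each, and the target form, and can approve, edit, or cancel the submission.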

The system does not resume until the user makes an explicit choice. This ensures no data is submitted to external portals without human review.
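One way to model a pause that only an explicit choice can end is a promise with no timer attached. The following is a sketch under assumptions: `createConfirmation` and its method names are hypothetical, and the three choices mirror the Approve, Edit, and Cancel buttons described for the ConfirmationModal.

```javascript
// Hypothetical confirmation handle. The promise has no timeout:
// it settles only when approve(), edit(), or cancel() is called.
function createConfirmation(form) {
  let settle;
  let settled = false;
  const decision = new Promise((resolve) => { settle = resolve; });

  const decide = (choice, fields = form.fields) => {
    if (settled) return false;      // only the first choice counts
    settled = true;
    settle({ choice, fields });
    return true;
  };

  return {
    decision,                                   // awaited by the executor
    isSettled: () => settled,
    approve: () => decide('approve'),           // submit as-is
    edit: (fields) => decide('edit', fields),   // submit with modified values
    cancel: () => decide('cancel'),             // abort
  };
}
```

An executor would `await confirmation.decision` before submitting, while the modal's buttons call `approve`, `edit`, or `cancel`.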

Action types

The system supports 7 browser action types. Each has a default safety gate level.

Action types and their default safety classification
| Action | Description | Default gate |
| --- | --- | --- |
| click | Click at specific (x, y) coordinates in the viewport | Permission |
| type | Enter text into the currently focused field | Permission |
| scroll | Scroll the page in a given direction and distance | Permission |
| navigate | Navigate to a URL (must be on the domain allowlist) | Permission |
| select | Select an option from a dropdown or list | Permission |
| submit | Submit a form - always requires human confirmation | Obligation |
| wait | Wait for page load or element to appear | Exemption |

Gate classification can be elevated by context. For example, a click on a “Submit” button is reclassified from Permission to Obligation. A navigate action that targets a domain not on the allowlist is reclassified from Permission to Prohibition (blocked).
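Default classification plus contextual elevation could look like the sketch below. It is not the product's classifier: `classifyAction`, `DEFAULT_GATES`, and the `allowedHosts` set are hypothetical, detecting a submit button by matching the word "submit" in the model's target description is an assumption, and so is failing closed on unknown action types or unparseable URLs.

```javascript
// Default gate per action type, taken from the table above.
const DEFAULT_GATES = new Map([
  ['click', 'Permission'],
  ['type', 'Permission'],
  ['scroll', 'Permission'],
  ['navigate', 'Permission'],
  ['select', 'Permission'],
  ['submit', 'Obligation'],
  ['wait', 'Exemption'],
]);

// Hypothetical classifier: returns the gate after contextual elevation.
function classifyAction(action, allowedHosts) {
  const gate = DEFAULT_GATES.get(action.type);
  if (gate === undefined) return 'Prohibition'; // assumption: unknown types fail closed

  if (action.type === 'navigate') {
    let host;
    try {
      host = new URL(action.target).hostname;
    } catch {
      return 'Prohibition';                     // assumption: unparseable URL is blocked
    }
    return allowedHosts.has(host) ? gate : 'Prohibition';
  }

  // Assumption: the target is the model's human-readable description.
  if (action.type === 'click' && /\bsubmit\b/i.test(action.target ?? '')) {
    return 'Obligation';
  }
  return gate;
}
```

For instance, `classifyAction({ type: 'click', target: 'Submit button' }, new Set())` yields `'Obligation'` rather than the default `'Permission'`.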

Action lifecycle

Each action progresses through a defined state machine with 7 possible statuses.

Action statuses
| Status | Meaning |
| --- | --- |
| pending | Action proposed by the AI model, awaiting gate check |
| confirmed | Gate check passed (or human approved for Obligation actions) |
| executing | Action is being performed in the browser |
| completed | Action finished successfully, verified by screenshot |
| failed | Action execution failed (element not found, timeout, etc.) |
| cancelled | User or system cancelled the action before execution |
| rolled_back | Action was reverted by rollback to a previous step |
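The statuses above suggest a transition table. The source lists the statuses but not the allowed transitions, so the edges in this sketch are an assumption inferred from the status descriptions, and `TRANSITIONS` and `transition` are hypothetical names.

```javascript
// Assumed transition table; the edges are inferred, not specified by the source.
const TRANSITIONS = Object.freeze({
  pending: ['confirmed', 'cancelled'],
  confirmed: ['executing', 'cancelled'],
  executing: ['completed', 'failed'],
  completed: ['rolled_back'],
  failed: ['rolled_back'],
  cancelled: [],
  rolled_back: [],
});

// Returns a copy of the action in the next status, or throws on an illegal move.
function transition(action, next) {
  const allowed = TRANSITIONS[action.status] ?? [];
  if (!allowed.includes(next)) {
    throw new Error(`Illegal transition: ${action.status} -> ${next}`);
  }
  return { ...action, status: next };
}
```

Rejecting illegal moves at one choke point means an action cannot, for example, jump from pending to completed without passing a gate check.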

Architecture components

ActionOverlay

Visual action indicator. Shows what the system is about to do: a highlight overlay positioned at the target coordinates (x, y) with confirm and cancel buttons. For click actions, a crosshair marks the exact pixel. For type actions, the overlay shows the text to be entered. The user can approve or reject any action directly from the overlay.

SessionPanel

Session sidebar. Displays the full action history for the current session: each step with its type, target, status, and timestamp. Includes a progress bar showing completion toward the task goal. Supports rollback - clicking any previous completed step offers to revert the session to that point.

ConfirmationModal

Human-in-the-loop gate (Obligation). Triggered before form submission. Displays all fields the system has filled, their values, and the target form. Three buttons: Approve (submit as-is), Edit (modify values before submission), Cancel (abort). The system is fully paused until the user responds. No timeout - the modal stays open until explicit human action.

ReplayViewer

Session replay. Step-through viewer for completed sessions. Each step shows the screenshot captured at that point, the action taken, coordinates, and result. Navigate forward and backward through the entire session. Useful for auditing and debugging.

Dashboard

Portal statistics. Aggregated view of automation activity: active sessions, total action count, success rate per portal (e.g., TED eProcurement, EUR-Lex). Tracks how many actions required human confirmation and how many were auto-approved.
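The per-portal figures could be derived from session data along these lines. This is a sketch with assumptions: `summarizeByPortal` is a hypothetical name, each action is assumed to carry a `gate` field naming the safety level applied, and the success rate is assumed to be completed actions divided by completed plus failed.

```javascript
// Hypothetical aggregation over sessions shaped like the session object.
function summarizeByPortal(sessions) {
  const stats = new Map();
  for (const session of sessions) {
    const s = stats.get(session.portal) ??
      { total: 0, completed: 0, failed: 0, humanConfirmed: 0, autoApproved: 0 };
    for (const action of session.actions) {
      s.total += 1;
      if (action.status === 'completed') s.completed += 1;
      if (action.status === 'failed') s.failed += 1;
      // Assumption: `gate` records the safety level applied to the action.
      if (action.gate === 'Obligation') s.humanConfirmed += 1;
      if (action.gate === 'Permission') s.autoApproved += 1;
    }
    stats.set(session.portal, s);
  }
  for (const s of stats.values()) {
    const finished = s.completed + s.failed;
    // null when nothing has finished yet, to avoid dividing by zero.
    s.successRate = finished === 0 ? null : s.completed / finished;
  }
  return stats;
}
```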

SSE event stream

The automation executor streams real-time events to the frontend via Server-Sent Events (SSE). The connection reconnects automatically with exponential backoff (maximum 5 attempts).
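A client-side reconnect with exponential backoff might be sketched as follows. The browser's built-in `EventSource` retry uses a fixed delay, so this sketch closes the source on error and schedules its own reconnect. `connectWithBackoff`, `backoffDelay`, the 1-second base, and the 30-second cap are assumptions; only the limit of 5 attempts comes from the text above.

```javascript
// Delay before reconnect attempt N (0-based): base * 2^N, capped.
// The base and cap values are assumptions for illustration.
function backoffDelay(attempt, baseMs = 1000, capMs = 30000) {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Hypothetical wrapper. createSource and schedule are injectable for testing.
function connectWithBackoff(url, {
  maxAttempts = 5,
  createSource = (u) => new EventSource(u),
  schedule = setTimeout,
} = {}) {
  let attempt = 0;
  let source;

  const open = () => {
    source = createSource(url);
    source.onopen = () => { attempt = 0; };  // a successful connection resets the counter
    source.onerror = () => {
      source.close();                        // take over from the built-in retry
      if (attempt >= maxAttempts) return;    // give up after the last attempt
      schedule(open, backoffDelay(attempt));
      attempt += 1;
    };
  };

  open();
  return { close: () => source.close(), attempts: () => attempt };
}
```

With these defaults the waits before the five reconnect attempts are 1, 2, 4, 8, and 16 seconds.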

Event types

SSE event types
| Event | Payload | When |
| --- | --- | --- |
| action_pending | Action type, target coordinates, description | AI model proposes a new action |
| action_executing | Action ID, type | Action passes gate check and begins execution |
| action_completed | Action ID, result, screenshot URL | Action finishes successfully |
| action_failed | Action ID, error message, screenshot URL | Action execution fails |
| session_update | Session status, current step, progress | Session state changes (pause, resume, complete) |
| form_confirmation | Form fields, values, target URL | Obligation gate triggers - awaiting human approval |
| screenshot | Screenshot data (base64 or URL) | New viewport capture available |

Connection example

// sessionId is assumed to hold the id of an existing session.
const source = new EventSource('/v1/cua/stream?session=' + encodeURIComponent(sessionId));

source.addEventListener('action_pending', (e) => {
  const action = JSON.parse(e.data);
  // Show ActionOverlay at action.coordinates
});

source.addEventListener('form_confirmation', (e) => {
  const form = JSON.parse(e.data);
  // Show ConfirmationModal with form.fields
});

source.addEventListener('action_completed', (e) => {
  const result = JSON.parse(e.data);
  // Update SessionPanel with completed step
});

Session model

Each automation session targets a specific portal and task. Sessions maintain full state across the action lifecycle.

Session properties

Session object fields
| Field | Type | Description |
| --- | --- | --- |
| id | string | Unique session identifier (UUID v4) |
| status | string | Session status: active, paused, completed, failed, cancelled |
| portal | string | Target portal (e.g., “TED eProcurement”, “EUR-Lex”) |
| taskDescription | string | Natural-language description of the task to perform |
| actions | array | Ordered list of all actions in the session |
| currentStep | integer | Index of the current action (0-based) |
| createdAt | string (ISO 8601) | Session creation timestamp |
| updatedAt | string (ISO 8601) | Last state change timestamp |

Session example

{
  "id": "cua_sess_a1b2c3d4",
  "status": "active",
  "portal": "TED eProcurement",
  "taskDescription": "Search for procurement notices matching CPV 72000000 in Finland",
  "actions": [
    {
      "step": 0,
      "type": "navigate",
      "target": "https://ted.europa.eu/en/search",
      "status": "completed",
      "timestamp": "2026-03-10T14:00:01Z",
      "screenshot": "screenshots/step-0.png"
    },
    {
      "step": 1,
      "type": "type",
      "target": "Search query input",
      "value": "CPV 72000000",
      "coordinates": { "x": 540, "y": 312 },
      "status": "completed",
      "timestamp": "2026-03-10T14:00:03Z",
      "screenshot": "screenshots/step-1.png"
    },
    {
      "step": 2,
      "type": "click",
      "target": "Country filter: Finland",
      "coordinates": { "x": 180, "y": 480 },
      "status": "pending",
      "timestamp": "2026-03-10T14:00:05Z"
    }
  ],
  "currentStep": 2,
  "createdAt": "2026-03-10T14:00:00Z",
  "updatedAt": "2026-03-10T14:00:05Z"
}

Pause and resume

Sessions can be paused at any time. When paused, the system stops proposing new actions but retains full state. Resuming continues from the current step. This is useful when the user needs to inspect intermediate results or perform manual steps.
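Pause and resume can be modelled as a status change that leaves the rest of the session untouched. This is a minimal sketch: `setPaused` and the injectable `now` clock are hypothetical, and rejecting a pause on a session that is not active (or a resume on one that is not paused) is an assumption.

```javascript
// Hypothetical helper: returns a copy of the session with its status toggled.
// actions and currentStep are carried over unchanged, so resuming
// continues from the current step.
function setPaused(session, paused, now = () => new Date().toISOString()) {
  const expected = paused ? 'active' : 'paused';
  if (session.status !== expected) {
    throw new Error(`Cannot ${paused ? 'pause' : 'resume'} a ${session.status} session`);
  }
  return { ...session, status: paused ? 'paused' : 'active', updatedAt: now() };
}
```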

Audit trail

Every automation action is logged with full provenance. The audit trail is immutable - entries cannot be modified or deleted after creation.

Audit record fields

Fields captured for every automation action
| Field | Description |
| --- | --- |
| Session ID | Links the action to its parent session |
| Step number | Sequential position in the session (0-based) |
| Action type | One of the 7 action types (click, type, scroll, navigate, select, submit, wait) |
| Target | Human-readable description of the action target (e.g., “Search button”, “CPV code input”) |
| Coordinates | Pixel coordinates (x, y) where the action was performed |
| Value | For type and select actions: the text entered or option selected |
| Safety gate | Which safety level was applied (Prohibition, Obligation, Permission, or Exemption) and whether it passed |
| Human approval | For Obligation actions: whether the user approved, edited, or cancelled |
| Screenshot (before) | Screenshot of the viewport before the action was executed |
| Screenshot (after) | Screenshot of the viewport after the action completed |
| Timestamp | ISO 8601 timestamp with millisecond precision |
| Duration | Time in milliseconds between action start and completion |
| Status | Final status of the action (completed, failed, cancelled, rolled_back) |
| Error | For failed actions: error message and context |

Audit records are stored in EU jurisdiction and retained per GDPR data retention policies. Screenshots are stored alongside action metadata for complete session reconstruction.
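An in-memory illustration of an append-only log with entries that cannot be changed after creation is shown below. It only demonstrates the idea: `createAuditLog` and `deepFreeze` are hypothetical names, the record fields in the usage are illustrative, and real immutability would have to be enforced by the storage layer rather than by frozen objects.

```javascript
// Recursively freezes plain JSON-like data so nested fields are read-only too.
function deepFreeze(value) {
  if (value !== null && typeof value === 'object' && !Object.isFrozen(value)) {
    Object.values(value).forEach(deepFreeze);
    Object.freeze(value);
  }
  return value;
}

// Hypothetical append-only log: entries can be added and read, nothing else.
function createAuditLog(clock = () => new Date().toISOString()) {
  const entries = [];
  return {
    append(record) {
      // Copy the record so later changes by the caller cannot reach the log.
      const copy = JSON.parse(JSON.stringify(record));
      const entry = deepFreeze({ ...copy, timestamp: clock() });
      entries.push(entry);
      return entry;
    },
    // Returns a copy of the list; the internal array is never exposed.
    list: () => entries.slice(),
  };
}
```

`Date.prototype.toISOString` emits millisecond precision, which matches the Timestamp field described above.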

Rollback capability

The system supports rollback to any previously completed step within a session. When a rollback is triggered:

  1. All actions after the target step are marked as rolled_back
  2. The browser navigates back to the state captured in the target step’s screenshot
  3. The session’s currentStep is reset to the target step
  4. The automation loop resumes from that point, proposing new actions based on the restored state

Rollback is non-destructive: rolled-back actions remain in the audit trail with their original timestamps and screenshots. The audit record shows that a rollback occurred, when it was triggered, and which step was the target.
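The bookkeeping part of a rollback (steps 1 and 3 above) could be sketched as a pure function over the session object. `rollbackTo` is a hypothetical name, and restoring the browser state and resuming the loop (steps 2 and 4) are deliberately left out.

```javascript
// Hypothetical rollback bookkeeping. Returns a new session object;
// the input session and its actions are not mutated.
function rollbackTo(session, targetStep) {
  const target = session.actions[targetStep];
  if (!target || target.status !== 'completed') {
    throw new Error(`Step ${targetStep} is not a completed step`);
  }
  const actions = session.actions.map((action, index) =>
    // Later actions keep their timestamps and screenshots; only status changes.
    index > targetStep ? { ...action, status: 'rolled_back' } : action
  );
  return { ...session, actions, currentStep: targetStep };
}
```

Because later actions are copied rather than removed, the history still shows what was attempted before the rollback.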

Rollback limitations

Support

Technical: support@pauhu.eu

Sales: sales@pauhu.eu