Browser Automation Architecture

Screenshot, reason, act, verify. Every browser action passes through safety gates with human-in-the-loop confirmation before form submission.

Overview

The Pauhu Browser Automation system automates browser-based tasks on EU institutional portals, such as TED eProcurement, EUR-Lex, and national law databases, while maintaining strict safety controls. Unlike unguarded browser automation, it routes every action through safety gates before execution. Actions that create or modify records (form submissions, data entry) require explicit human approval.

The system operates in a continuous loop: it observes the current browser state via screenshot, reasons about the next action using an AI model, executes the action (subject to safety gates), and verifies the result. This loop continues until the task is complete or the user intervenes.

The automation loop

┌──────────────────────────────────────────────────┐
│                 AUTOMATION LOOP                  │
│                                                  │
│  ┌────────────┐     ┌────────────┐               │
│  │ SCREENSHOT │────▶│   REASON   │               │
│  │            │     │            │               │
│  │  Capture   │     │  AI model  │               │
│  │  viewport  │     │  decides   │               │
│  │  state     │     │  next step │               │
│  └────────────┘     └─────┬──────┘               │
│        ▲                  │                      │
│        │                  ▼                      │
│  ┌─────┴──────┐    ┌─────────────┐               │
│  │   VERIFY   │    │   SAFETY    │               │
│  │            │◀───│   GATE      │               │
│  │ Screenshot │    │             │               │
│  │ confirms   │    │ permission  │               │
│  │ result     │    │ check       │               │
│  └────────────┘    └─────┬───────┘               │
│                          │                       │
│                   ┌──────▼──────┐                │
│                   │     ACT     │                │
│                   │             │                │
│                   │ click/type/ │                │
│                   │ scroll/nav/ │                │
│                   │ select/     │                │
│                   │ submit/wait │                │
│                   └─────────────┘                │
└──────────────────────────────────────────────────┘

Step-by-step

Screenshot. The system captures a screenshot of the current browser viewport. It has no DOM access, only pixel-level observation.
Reason. An AI model analyses the screenshot together with the task description and action history. It determines the next action: what to click, what to type, where to scroll, or whether the task is complete.
Safety gate. Before execution, the proposed action is classified into one of four safety levels (see below). Prohibited actions are blocked. Actions that require confirmation pause until a human approves them.
Act. The action is executed in the browser: a click at specific coordinates, text entry into a field, scrolling, navigation, or form submission.
Verify. A new screenshot is taken and the system compares it against the expected state. If the action failed or produced an unexpected result, the loop can retry or escalate to the user.
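
The five steps above can be sketched as a single loop. This is a minimal illustration, not the product's implementation: the injected functions (`capture`, `reason`, `classify`, `confirm`, `act`, `verify`), the `done` action type, the `maxSteps` limit, and the returned status strings are all hypothetical names chosen for the example. On a failed verification the sketch escalates immediately; the retry path is omitted for brevity.

```javascript
// Hypothetical sketch of the screenshot -> reason -> gate -> act -> verify loop.
// Every dependency is injected because the real interfaces are not documented here.
async function runLoop(task, deps, maxSteps = 20) {
  const history = [];
  for (let step = 0; step < maxSteps; step++) {
    const before = await deps.capture();                      // 1. Screenshot
    const action = await deps.reason(task, before, history);  // 2. Reason
    if (action.type === 'done') return { status: 'completed', history };

    const level = deps.classify(action);                      // 3. Safety gate
    if (level === 'Prohibition') {
      // Blocked outright: the action is never executed.
      history.push({ ...action, level, status: 'cancelled' });
      return { status: 'blocked', history };
    }
    if (level === 'Obligation' && !(await deps.confirm(action))) {
      // The human declined, so the action is dropped.
      history.push({ ...action, level, status: 'cancelled' });
      return { status: 'cancelled', history };
    }

    await deps.act(action);                                   // 4. Act
    const after = await deps.capture();                       // 5. Verify
    const ok = await deps.verify(action, after);
    history.push({ ...action, level, status: ok ? 'completed' : 'failed' });
    if (!ok) return { status: 'escalated', history };         // hand over to the user
  }
  return { status: 'step_limit', history };
}
```

Injecting the dependencies keeps the control flow testable without a browser: stubs can stand in for the screenshot capture and the AI model.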

Safety gates

Every action passes through safety gates before execution. The four levels define what the system must not do (Prohibition), must pause to confirm (Obligation), may do while logging (Permission), and need not check (Exemption).

Safety levels applied to browser automation actions
Level	Gate behaviour	Actions
Prohibition	Blocks the action entirely. The system cannot proceed.	Submitting forms without human approval. Navigating to domains not on the allowlist. Entering credentials. Clicking “delete” or “remove” buttons.
Obligation	Pauses execution. Requires explicit human confirmation before the action proceeds.	Form submission (human reviews all filled fields). Data entry into official portals. Any action that creates or modifies a record.
Permission	Action may proceed while the user session is active. No confirmation needed, but the action is logged.	Clicking navigation links. Selecting dropdown values. Typing search queries. Scrolling.
Exemption	Read-only actions that require no gate check.	Taking screenshots. Observing page state. Reading text. Waiting for page load.

Human-in-the-loop flow

When an action triggers an Obligation gate (typically form submission), the system pauses and presents a confirmation modal. The user sees:

All fields the system has filled, with their values
The target form and portal
Three options: Approve (proceed with submission), Edit (modify field values before submission), or Cancel (abort the action)

The system does not resume until the user makes an explicit choice. This ensures no data is submitted to external portals without human review.
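
The three choices can be modelled as a small pure function. The sketch below is illustrative only: the `form.fields` shape (a map of field name to value) and the `decision` object with `choice` and `edits` are assumptions, and it assumes that edited values are submitted once the user finishes editing.

```javascript
// Hypothetical sketch: turn the user's choice in the confirmation modal
// into an instruction for the automation loop.
function resolveConfirmation(form, decision) {
  switch (decision.choice) {
    case 'approve':
      // Submit exactly what the system filled in.
      return { proceed: true, fields: { ...form.fields } };
    case 'edit':
      // Overlay the user's edits on the filled values, then submit.
      return { proceed: true, fields: { ...form.fields, ...decision.edits } };
    case 'cancel':
      // Abort: nothing is sent to the portal.
      return { proceed: false, fields: null };
    default:
      // No implicit default: anything other than an explicit choice is an error.
      throw new Error('Unknown choice: ' + decision.choice);
  }
}
```

Throwing on an unrecognised choice mirrors the rule that the system resumes only after an explicit decision.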

Action types

The system supports seven browser action types. Each has a default safety gate level.

Action types and their default safety classification
Action	Description	Default gate
`click`	Click at specific (x, y) coordinates in the viewport	Permission
`type`	Enter text into the currently focused field	Permission
`scroll`	Scroll the page in a given direction and distance	Permission
`navigate`	Navigate to a URL (must be on the domain allowlist)	Permission
`select`	Select an option from a dropdown or list	Permission
`submit`	Submit a form - always requires human confirmation	Obligation
`wait`	Wait for page load or element to appear	Exemption

Context can elevate the gate classification. For example, a `click` on a “Submit” button is reclassified from Permission to Obligation, and a `navigate` to a domain not on the allowlist is reclassified from Permission to Prohibition (blocked).
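
A minimal sketch of default classification plus context elevation follows. The default gates come from the table above; everything else is an assumption made for illustration: the function name, the use of `action.target` as either a URL or a human-readable label, the keyword matching on button labels, and the fail-closed handling of unknown action types.

```javascript
// Default gates, taken from the action type table.
const DEFAULT_GATE = {
  click: 'Permission',
  type: 'Permission',
  scroll: 'Permission',
  navigate: 'Permission',
  select: 'Permission',
  submit: 'Obligation',
  wait: 'Exemption',
};

// Hypothetical classifier: returns the gate level for a proposed action.
function classifyAction(action, allowedHosts) {
  // Assumption: unknown action types fail closed.
  if (!Object.prototype.hasOwnProperty.call(DEFAULT_GATE, action.type)) {
    return 'Prohibition';
  }
  if (action.type === 'navigate') {
    let host;
    try {
      host = new URL(action.target).hostname;
    } catch (err) {
      return 'Prohibition'; // unparseable URL: treat as off-allowlist
    }
    if (!allowedHosts.includes(host)) return 'Prohibition';
  }
  if (action.type === 'click') {
    const label = String(action.target || '');
    // Assumption: the target label is matched by keyword.
    if (/\b(delete|remove)\b/i.test(label)) return 'Prohibition';
    if (/\bsubmit\b/i.test(label)) return 'Obligation';
  }
  return DEFAULT_GATE[action.type];
}
```

Elevation only ever moves an action to a stricter level, so a classification error in this sketch results in an unnecessary pause or block rather than an unreviewed action.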

Action lifecycle

Each action progresses through a defined state machine with 7 possible statuses.

pending ──▶ confirmed ──▶ executing ──▶ completed
   │            │             │
   │            │             ├──▶ failed
   ▼            ▼             │
cancelled   cancelled         ▼
                         rolled_back

Action statuses
Status	Meaning
`pending`	Action proposed by the AI model, awaiting gate check
`confirmed`	Gate check passed (or human approved for Obligation actions)
`executing`	Action is being performed in the browser
`completed`	Action finished successfully, verified by screenshot
`failed`	Action execution failed (element not found, timeout, etc.)
`cancelled`	User or system cancelled the action before execution
`rolled_back`	Action was reverted by rollback to a previous step
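
The lifecycle can be expressed as a transition table that rejects illegal status changes. This is a sketch under assumptions: the `transition` helper is hypothetical, the edges follow the lifecycle diagram as it appears above, and the `completed` to `rolled_back` edge is added because the rollback section describes marking already-completed actions as rolled back.

```javascript
// Allowed status changes for a single action.
const TRANSITIONS = {
  pending: ['confirmed', 'cancelled'],
  confirmed: ['executing', 'cancelled'],
  executing: ['completed', 'failed', 'rolled_back'],
  completed: ['rolled_back'], // assumption: a later rollback can revert a completed action
  failed: [],
  cancelled: [],
  rolled_back: [],
};

// Hypothetical helper: returns a copy of the action with the new status,
// or throws if the lifecycle does not allow the change.
function transition(action, next) {
  const allowed = TRANSITIONS[action.status] || [];
  if (!allowed.includes(next)) {
    throw new Error('Illegal transition: ' + action.status + ' -> ' + next);
  }
  return { ...action, status: next };
}
```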

Architecture components

ActionOverlay

Visual action indicator. Shows what the system is about to do: a highlight overlay positioned at the target coordinates (x, y) with confirm and cancel buttons. For click actions, a crosshair marks the exact pixel. For type actions, the overlay shows the text to be entered. The user can approve or reject any action directly from the overlay.

SessionPanel

Session sidebar. Displays the full action history for the current session: each step with its type, target, status, and timestamp. Includes a progress bar showing completion toward the task goal. Supports rollback: clicking any previous completed step offers to revert the session to that point.

ConfirmationModal

Human-in-the-loop gate (Obligation). Triggered before form submission. Displays all fields the system has filled, their values, and the target form. Three buttons: Approve (submit as-is), Edit (modify values before submission), Cancel (abort). The system is fully paused until the user responds. There is no timeout: the modal stays open until the user acts.

ReplayViewer

Session replay. Step-through viewer for completed sessions. Each step shows the screenshot captured at that point, the action taken, coordinates, and result. The user can navigate forward and backward through the entire session. Useful for auditing and debugging.

Dashboard

Portal statistics. Aggregated view of automation activity: active sessions, total action count, success rate per portal (e.g., TED eProcurement, EUR-Lex). Tracks how many actions required human confirmation and how many were auto-approved.

SSE event stream

The automation executor streams real-time events to the frontend via Server-Sent Events (SSE). The connection reconnects automatically with exponential backoff (maximum 5 attempts).
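
A browser's built-in `EventSource` retries on its own but does not apply exponential backoff or an attempt limit, so that behaviour has to live in client code. The sketch below shows one way to do it. Only the limit of 5 attempts comes from the text above; the 1-second base delay, the 30-second cap, and the names `backoffDelay`, `connectWithBackoff`, `createSource`, `schedule`, and `onGiveUp` are assumptions for illustration.

```javascript
// Delay before reconnect attempt n (0-based): base * 2^n, capped.
function backoffDelay(attempt, baseMs = 1000, capMs = 30000) {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Hypothetical wrapper: createSource() must return a fresh EventSource-like
// object (with onopen, onerror, close) and attach its own event listeners.
function connectWithBackoff(createSource, options = {}) {
  const { maxAttempts = 5, schedule = setTimeout, onGiveUp = () => {} } = options;
  let attempt = 0;
  let source = null;
  let closed = false;

  function open() {
    source = createSource();
    source.onopen = () => { attempt = 0; }; // a healthy connection resets the counter
    source.onerror = () => {
      source.close();                       // disable the built-in retry
      if (closed) return;
      if (attempt >= maxAttempts) { onGiveUp(); return; }
      schedule(open, backoffDelay(attempt));
      attempt += 1;
    };
  }

  open();
  return { close() { closed = true; if (source) source.close(); } };
}
```

In a browser, `createSource` would typically build the `EventSource` for the session stream and register the `addEventListener` handlers, so that each reconnect gets a fully wired source.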

Event types

SSE event types
Event	Payload	When
`action_pending`	Action type, target coordinates, description	AI model proposes a new action
`action_executing`	Action ID, type	Action passes gate check and begins execution
`action_completed`	Action ID, result, screenshot URL	Action finishes successfully
`action_failed`	Action ID, error message, screenshot URL	Action execution fails
`session_update`	Session status, current step, progress	Session state changes (pause, resume, complete)
`form_confirmation`	Form fields, values, target URL	Obligation gate triggers - awaiting human approval
`screenshot`	Screenshot data (base64 or URL)	New viewport capture available

Connection example

// sessionId is assumed to hold the id of an existing session.
const source = new EventSource('/v1/cua/stream?session=' + encodeURIComponent(sessionId));

source.addEventListener('action_pending', (e) => {
  const action = JSON.parse(e.data);
  // Show ActionOverlay at action.coordinates
});

source.addEventListener('form_confirmation', (e) => {
  const form = JSON.parse(e.data);
  // Show ConfirmationModal with form.fields
});

source.addEventListener('action_completed', (e) => {
  const result = JSON.parse(e.data);
  // Update SessionPanel with completed step
});

Session model

Each automation session targets a specific portal and task. Sessions maintain full state across the action lifecycle.

Session properties

Session object fields
Field	Type	Description
`id`	string	Unique session identifier (UUID v4)
`status`	string	Session status: `active`, `paused`, `completed`, `failed`, `cancelled`
`portal`	string	Target portal (e.g., “TED eProcurement”, “EUR-Lex”)
`taskDescription`	string	Natural-language description of the task to perform
`actions`	array	Ordered list of all actions in the session
`currentStep`	integer	Index of the current action (0-based)
`createdAt`	ISO 8601	Session creation timestamp
`updatedAt`	ISO 8601	Last state change timestamp

Session example

{
  "id": "cua_sess_a1b2c3d4",
  "status": "active",
  "portal": "TED eProcurement",
  "taskDescription": "Search for procurement notices matching CPV 72000000 in Finland",
  "actions": [
    {
      "step": 0,
      "type": "navigate",
      "target": "https://ted.europa.eu/en/search",
      "status": "completed",
      "timestamp": "2026-03-10T14:00:01Z",
      "screenshot": "screenshots/step-0.png"
    },
    {
      "step": 1,
      "type": "type",
      "target": "Search query input",
      "value": "CPV 72000000",
      "coordinates": { "x": 540, "y": 312 },
      "status": "completed",
      "timestamp": "2026-03-10T14:00:03Z",
      "screenshot": "screenshots/step-1.png"
    },
    {
      "step": 2,
      "type": "click",
      "target": "Country filter: Finland",
      "coordinates": { "x": 180, "y": 480 },
      "status": "pending",
      "timestamp": "2026-03-10T14:00:05Z"
    }
  ],
  "currentStep": 2,
  "createdAt": "2026-03-10T14:00:00Z",
  "updatedAt": "2026-03-10T14:00:05Z"
}

Pause and resume

Sessions can be paused at any time. When paused, the system stops proposing new actions but retains full state. Resuming continues from the current step. This is useful when the user needs to inspect intermediate results or perform manual steps.
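
As a sketch, pause and resume can be modelled as status changes that leave `actions` and `currentStep` untouched. The function names are hypothetical, and the rule that only an `active` session can be paused (and only a `paused` one resumed) is an assumption.

```javascript
// Hypothetical helpers: return an updated copy of the session object.
function pauseSession(session, now = new Date()) {
  if (session.status !== 'active') {
    throw new Error('Only an active session can be paused');
  }
  // actions and currentStep are carried over unchanged.
  return { ...session, status: 'paused', updatedAt: now.toISOString() };
}

function resumeSession(session, now = new Date()) {
  if (session.status !== 'paused') {
    throw new Error('Only a paused session can be resumed');
  }
  // The loop continues from session.currentStep.
  return { ...session, status: 'active', updatedAt: now.toISOString() };
}
```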

Audit trail

Every automation action is logged with full provenance. The audit trail is immutable - entries cannot be modified or deleted after creation.

Audit record fields

Fields captured for every automation action
Field	Description
Session ID	Links the action to its parent session
Step number	Sequential position in the session (0-based)
Action type	One of the 7 action types (`click`, `type`, `scroll`, `navigate`, `select`, `submit`, `wait`)
Target	Human-readable description of the action target (e.g., “Search button”, “CPV code input”)
Coordinates	Pixel coordinates (x, y) where the action was performed
Value	For `type` and `select` actions: the text entered or option selected
Safety gate	Which safety level was applied (Prohibition, Obligation, Permission, or Exemption) and whether it passed
Human approval	For Obligation actions: whether the user approved, edited, or cancelled
Screenshot (before)	Screenshot of the viewport before the action was executed
Screenshot (after)	Screenshot of the viewport after the action completed
Timestamp	ISO 8601 timestamp with millisecond precision
Duration	Time in milliseconds between action start and completion
Status	Final status of the action (completed, failed, cancelled, rolled_back)
Error	For failed actions: error message and context

Audit records are stored in EU jurisdiction and retained per GDPR data retention policies. Screenshots are stored alongside action metadata for complete session reconstruction.
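
The append-only property can be illustrated in memory. This is a sketch only, not the storage layer: `createAuditLog` and its methods are hypothetical, and `Object.freeze` is shallow, so nested values such as coordinates would need their own freezing in a fuller version.

```javascript
// Hypothetical in-memory audit log: entries can be appended and read,
// but there is no update or delete operation.
function createAuditLog() {
  const entries = []; // private to the closure

  return {
    append(record) {
      // Copy, then freeze, so later changes to `record` do not leak in.
      const entry = Object.freeze({ ...record });
      entries.push(entry);
      return entry;
    },
    list() {
      // Return a copy so callers cannot alter the internal array.
      return entries.slice();
    },
  };
}
```

A rollback would then be recorded by appending a new entry that names the target step, rather than by rewriting earlier entries.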

Rollback capability

The system supports rollback to any previously completed step within a session. When a rollback is triggered:

All actions after the target step are marked as rolled_back
The browser navigates back to the state captured in the target step’s screenshot
The session’s currentStep is reset to the target step
The automation loop resumes from that point, proposing new actions based on the restored state

Rollback is non-destructive: rolled-back actions remain in the audit trail with their original timestamps and screenshots. The audit record shows that a rollback occurred, when it was triggered, and which step was the target.
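
The session-state part of a rollback can be sketched as follows. It covers only the bookkeeping (marking later actions and resetting `currentStep`), not the browser navigation. The function name and the `portalStateMayBeInconsistent` flag are hypothetical; the flag illustrates how a rollback past a completed `submit` could be surfaced.

```javascript
// Hypothetical sketch: roll a session object back to a completed step.
function rollbackTo(session, targetStep, now = new Date()) {
  const target = session.actions.find((a) => a.step === targetStep);
  if (!target || target.status !== 'completed') {
    throw new Error('Rollback target must be a completed step');
  }
  const later = session.actions.filter((a) => a.step > targetStep);
  return {
    ...session,
    // Later actions stay in the list with their timestamps and screenshots;
    // only their status changes.
    actions: session.actions.map((a) =>
      a.step > targetStep ? { ...a, status: 'rolled_back' } : a
    ),
    currentStep: targetStep,
    updatedAt: now.toISOString(),
    // A submitted form cannot be undone on the portal, so flag it.
    portalStateMayBeInconsistent: later.some(
      (a) => a.type === 'submit' && a.status === 'completed'
    ),
  };
}
```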

Rollback limitations

Submitted forms cannot be un-submitted. If a form was submitted (approved) and the user rolls back to a step before that submission, the system flags that the portal state may be inconsistent.
External state changes persist. Rollback restores the system’s internal session state but cannot undo changes already made on external portals.

Support

Technical: support@pauhu.eu

Sales: sales@pauhu.eu