Browser Automation Architecture
Screenshot, reason, act, verify. Every browser action passes through safety gates with human-in-the-loop confirmation before form submission.
Overview
The Pauhu Browser Automation system automates browser-based tasks on EU institutional portals - such as TED eProcurement, EUR-Lex, and national law databases - while maintaining strict safety controls. Unlike unguarded browser automation, every action passes through safety gates before execution. Destructive actions (form submissions, data entry) require explicit human approval.
The system operates in a continuous loop: it observes the current browser state via screenshot, reasons about the next action using an AI model, executes the action (subject to safety gates), and verifies the result. This loop continues until the task is complete or the user intervenes.
The automation loop
Step-by-step
- Screenshot. The system captures a screenshot of the current browser viewport. It has no DOM access, only pixel-level observation.
- Reason. An AI model analyses the screenshot together with the task description and action history. It determines the next action: what to click, what to type, where to scroll, or whether the task is complete.
- Safety gate. Before execution, the proposed action is classified into one of four safety levels (see below). Prohibited actions are blocked. Actions that require confirmation pause until a human approves them.
- Act. The action is executed in the browser: a click at specific coordinates, text entry into a field, scrolling, navigation, or form submission.
- Verify. A new screenshot is taken and compared against the expected state. If the action failed or produced an unexpected result, the loop can retry or escalate to the user.
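The five steps above can be sketched as a single loop. This is a minimal illustration, not the product's implementation: `runAutomationLoop`, the `deps` object and its function names, the `done` action type, the `escalated` status, and the step budget are all assumptions made for the example. For brevity it escalates on the first failed verification rather than retrying.

```javascript
// Hypothetical sketch: `deps` bundles caller-supplied async functions.
// None of these names are part of a documented Pauhu API.
async function runAutomationLoop(task, deps, maxSteps = 20) {
  const history = [];
  for (let step = 0; step < maxSteps; step++) {
    // 1. Screenshot: observe the viewport (pixels only, no DOM).
    const before = await deps.captureScreenshot();

    // 2. Reason: the model proposes the next action, or reports completion.
    const action = await deps.proposeAction(before, task, history);
    if (action.type === 'done') return { status: 'completed', history };

    // 3. Safety gate: resolves to true only if the action may run,
    //    including any human confirmation the gate requires.
    const allowed = await deps.passesGate(action);
    if (!allowed) {
      history.push({ step, ...action, status: 'cancelled' });
      continue; // let the model propose a different action
    }

    // 4. Act, then 5. Verify against a fresh screenshot.
    await deps.execute(action);
    const after = await deps.captureScreenshot();
    const ok = await deps.verify(action, after);
    history.push({ step, ...action, status: ok ? 'completed' : 'failed' });
    if (!ok) return { status: 'escalated', history }; // hand over to the user
  }
  return { status: 'escalated', history }; // step budget exhausted
}
```

Passing the collaborators in as `deps` keeps the loop itself free of browser and model details, so it can be exercised with stubs.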
Safety gates
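Every proposed action is classified into one of four levels before execution. The levels define what the system must not do (Prohibition), must have confirmed by a human (Obligation), may do while logged (Permission), and may do without a gate check (Exemption).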
| Level | Gate behaviour | Actions |
|---|---|---|
| Prohibition | Blocks the action entirely. The system cannot proceed. | Submitting forms without human approval. Navigating to domains not on the allowlist. Entering credentials. Clicking “delete” or “remove” buttons. |
| Obligation | Pauses execution. Requires explicit human confirmation before the action proceeds. | Form submission (human reviews all filled fields). Data entry into official portals. Any action that creates or modifies a record. |
| Permission | Action may proceed while the user session is active. No confirmation needed, but the action is logged. | Clicking navigation links. Selecting dropdown values. Typing search queries. Scrolling. |
| Exemption | Read-only actions that require no gate check. | Taking screenshots. Observing page state. Reading text. Waiting for page load. |
Human-in-the-loop flow
When an action triggers an Obligation gate (typically form submission), the system pauses and presents a confirmation modal. The user sees:
- All fields the system has filled, with their values
- The target form and portal
- Three options: Approve (proceed with submission), Edit (modify field values before submission), or Cancel (abort the action)
The system does not resume until the user makes an explicit choice. This ensures no data is submitted to external portals without human review.
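The three choices in the confirmation modal could map onto action statuses roughly as follows. This is a sketch only: the function name, the `form` and `decision` shapes, and the rule that an edited form returns to `pending` for a fresh approval are assumptions, not documented behaviour.

```javascript
// Hypothetical shapes:
//   form     = { fields: { name: value }, targetUrl }
//   decision = { choice: 'approve' | 'edit' | 'cancel', edits?: { name: value } }
function resolveConfirmation(form, decision) {
  switch (decision.choice) {
    case 'approve':
      return { status: 'confirmed', fields: { ...form.fields } };
    case 'edit':
      // Assumption: edited values go back to the user for a fresh approval.
      return { status: 'pending', fields: { ...form.fields, ...decision.edits } };
    case 'cancel':
      return { status: 'cancelled', fields: { ...form.fields } };
    default:
      // Anything other than an explicit choice never resumes the session.
      throw new Error(`Unknown confirmation choice: ${decision.choice}`);
  }
}
```

Throwing on an unrecognised choice reflects the rule that nothing proceeds without an explicit user decision.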
Action types
The system supports 7 browser action types. Each maps to a specific safety gate level.
| Action | Description | Default gate |
|---|---|---|
| click | Click at specific (x, y) coordinates in the viewport | Permission |
| type | Enter text into the currently focused field | Permission |
| scroll | Scroll the page in a given direction and distance | Permission |
| navigate | Navigate to a URL (must be on the domain allowlist) | Permission |
| select | Select an option from a dropdown or list | Permission |
| submit | Submit a form - always requires human confirmation | Obligation |
| wait | Wait for page load or element to appear | Exemption |
Context can elevate the gate classification. For example, a click on a “Submit” button is reclassified from Permission to Obligation, and a navigate action targeting a domain not on the allowlist is reclassified from Permission to Prohibition (blocked).
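Default gates plus contextual elevation can be expressed as a lookup followed by a few overrides. In this sketch the default mapping comes from the table above, while `classifyAction`, the `action.label` field (visible text of the click target), and the `allowlist` array of hostnames are hypothetical names chosen for illustration. The label matching is deliberately naive.

```javascript
// Default gate per action type, as listed in the action types table.
const DEFAULT_GATES = {
  click: 'Permission',
  type: 'Permission',
  scroll: 'Permission',
  navigate: 'Permission',
  select: 'Permission',
  submit: 'Obligation',
  wait: 'Exemption',
};

// `action.label` and `allowlist` are hypothetical inputs for this sketch.
function classifyAction(action, allowlist) {
  if (!Object.prototype.hasOwnProperty.call(DEFAULT_GATES, action.type)) {
    throw new Error(`Unknown action type: ${action.type}`);
  }
  const label = action.label ?? '';

  // Elevation to Prohibition: delete/remove buttons and off-allowlist domains.
  if (action.type === 'click' && /\b(delete|remove)\b/i.test(label)) {
    return 'Prohibition';
  }
  if (action.type === 'navigate' &&
      !allowlist.includes(new URL(action.target).hostname)) {
    return 'Prohibition';
  }

  // Elevation to Obligation: a click that would submit a form.
  if (action.type === 'click' && /\bsubmit\b/i.test(label)) {
    return 'Obligation';
  }
  return DEFAULT_GATES[action.type];
}
```

Checking the Prohibition rules first means an action can only ever be elevated to the strictest level that applies.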
Action lifecycle
Each action progresses through a defined state machine with 7 possible statuses.
| Status | Meaning |
|---|---|
| pending | Action proposed by the AI model, awaiting gate check |
| confirmed | Gate check passed (or human approved for Obligation actions) |
| executing | Action is being performed in the browser |
| completed | Action finished successfully, verified by screenshot |
| failed | Action execution failed (element not found, timeout, etc.) |
| cancelled | User or system cancelled the action before execution |
| rolled_back | Action was reverted by rollback to a previous step |
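One way to enforce the lifecycle is a transition table that rejects illegal status changes. The seven statuses come from the table above, but the allowed edges are inferred from the status descriptions and are an assumption; `TRANSITIONS` and `transition` are hypothetical names.

```javascript
// Assumed edges between the seven documented statuses.
const TRANSITIONS = new Map([
  ['pending', ['confirmed', 'cancelled']],
  ['confirmed', ['executing', 'cancelled']],
  ['executing', ['completed', 'failed']],
  ['completed', ['rolled_back']],
  ['failed', ['rolled_back']],
  ['cancelled', []],   // terminal
  ['rolled_back', []], // terminal
]);

// Returns a copy of the action with the new status, or throws if the
// move is not listed in the table.
function transition(action, nextStatus) {
  const allowed = TRANSITIONS.get(action.status) ?? [];
  if (!allowed.includes(nextStatus)) {
    throw new Error(`Illegal transition: ${action.status} -> ${nextStatus}`);
  }
  return { ...action, status: nextStatus };
}
```

For example, `transition({ type: 'click', status: 'pending' }, 'confirmed')` yields a confirmed action, while moving straight from `pending` to `completed` throws.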
Architecture components
ActionOverlay
SessionPanel
ConfirmationModal
ReplayViewer
Dashboard
SSE event stream
The automation executor streams real-time events to the frontend via Server-Sent Events (SSE). The connection reconnects automatically with exponential backoff (maximum 5 attempts).
Event types
| Event | Payload | When |
|---|---|---|
| action_pending | Action type, target coordinates, description | AI model proposes a new action |
| action_executing | Action ID, type | Action passes gate check and begins execution |
| action_completed | Action ID, result, screenshot URL | Action finishes successfully |
| action_failed | Action ID, error message, screenshot URL | Action execution fails |
| session_update | Session status, current step, progress | Session state changes (pause, resume, complete) |
| form_confirmation | Form fields, values, target URL | Obligation gate triggers - awaiting human approval |
| screenshot | Screenshot data (base64 or URL) | New viewport capture available |
Connection example
    // sessionId is the id of an existing automation session.
    const source = new EventSource('/v1/cua/stream?session=' + encodeURIComponent(sessionId));

    source.addEventListener('action_pending', (e) => {
      const action = JSON.parse(e.data);
      // Show ActionOverlay at action.coordinates
    });

    source.addEventListener('form_confirmation', (e) => {
      const form = JSON.parse(e.data);
      // Show ConfirmationModal with form.fields
    });

    source.addEventListener('action_completed', (e) => {
      const result = JSON.parse(e.data);
      // Update SessionPanel with completed step
    });
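The reconnection policy for the event stream (exponential backoff, at most 5 attempts) might look like the sketch below. Only the limit of 5 attempts comes from this page. The 1-second base delay, the doubling factor, and the names `backoffDelay` and `connectWithBackoff` are assumptions. Note that `EventSource` also retries on its own, so the sketch closes the failed connection to control the timing itself.

```javascript
// Delay before reconnect attempt `attempt` (0-based), or null to give up.
// baseMs = 1000 is an assumed value, not a documented one.
function backoffDelay(attempt, baseMs = 1000, maxAttempts = 5) {
  if (attempt >= maxAttempts) return null;
  return baseMs * 2 ** attempt;
}

// Hypothetical helper: `handlers` maps event names to listener functions.
function connectWithBackoff(url, handlers, attempt = 0) {
  const source = new EventSource(url);
  source.onopen = () => { attempt = 0; }; // a healthy connection resets the count
  for (const [name, fn] of Object.entries(handlers)) {
    source.addEventListener(name, fn);
  }
  source.onerror = () => {
    source.close(); // stop the built-in retry; we schedule our own
    const delay = backoffDelay(attempt);
    if (delay === null) return; // attempts exhausted
    setTimeout(() => connectWithBackoff(url, handlers, attempt + 1), delay);
  };
}
```

With these assumed defaults the delays are 1, 2, 4, 8, and 16 seconds, after which the client stops reconnecting.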
Session model
Each automation session targets a specific portal and task. Sessions maintain full state across the action lifecycle.
Session properties
| Field | Type | Description |
|---|---|---|
| id | string | Unique session identifier (UUID v4) |
| status | string | Session status: active, paused, completed, failed, cancelled |
| portal | string | Target portal (e.g., “TED eProcurement”, “EUR-Lex”) |
| taskDescription | string | Natural-language description of the task to perform |
| actions | array | Ordered list of all actions in the session |
| currentStep | integer | Index of the current action (0-based) |
| createdAt | ISO 8601 | Session creation timestamp |
| updatedAt | ISO 8601 | Last state change timestamp |
Session example
    {
      "id": "cua_sess_a1b2c3d4",
      "status": "active",
      "portal": "TED eProcurement",
      "taskDescription": "Search for procurement notices matching CPV 72000000 in Finland",
      "actions": [
        {
          "step": 0,
          "type": "navigate",
          "target": "https://ted.europa.eu/en/search",
          "status": "completed",
          "timestamp": "2026-03-10T14:00:01Z",
          "screenshot": "screenshots/step-0.png"
        },
        {
          "step": 1,
          "type": "type",
          "target": "input#search-query",
          "value": "CPV 72000000",
          "coordinates": { "x": 540, "y": 312 },
          "status": "completed",
          "timestamp": "2026-03-10T14:00:03Z",
          "screenshot": "screenshots/step-1.png"
        },
        {
          "step": 2,
          "type": "click",
          "target": "Country filter: Finland",
          "coordinates": { "x": 180, "y": 480 },
          "status": "pending",
          "timestamp": "2026-03-10T14:00:05Z"
        }
      ],
      "currentStep": 2,
      "createdAt": "2026-03-10T14:00:00Z",
      "updatedAt": "2026-03-10T14:00:05Z"
    }
Pause and resume
Sessions can be paused at any time. When paused, the system stops proposing new actions but retains full state. Resuming continues from the current step. This is useful when the user needs to inspect intermediate results or perform manual steps.
Audit trail
Every automation action is logged with full provenance. The audit trail is immutable - entries cannot be modified or deleted after creation.
Audit record fields
| Field | Description |
|---|---|
| Session ID | Links the action to its parent session |
| Step number | Sequential position in the session (0-based) |
| Action type | One of the 7 action types (click, type, scroll, navigate, select, submit, wait) |
| Target | Human-readable description of the action target (e.g., “Search button”, “CPV code input”) |
| Coordinates | Pixel coordinates (x, y) where the action was performed |
| Value | For type and select actions: the text entered or option selected |
| Safety gate | Which safety level was applied (Prohibition, Obligation, Permission, or Exemption) and whether it passed |
| Human approval | For Obligation actions: whether the user approved, edited, or cancelled |
| Screenshot (before) | Screenshot of the viewport before the action was executed |
| Screenshot (after) | Screenshot of the viewport after the action completed |
| Timestamp | ISO 8601 timestamp with millisecond precision |
| Duration | Time in milliseconds between action start and completion |
| Status | Final status of the action (completed, failed, cancelled, rolled_back) |
| Error | For failed actions: error message and context |
Audit records are stored in EU jurisdiction and retained per GDPR data retention policies. Screenshots are stored alongside action metadata for complete session reconstruction.
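The append-only behaviour of the audit trail can be illustrated in memory with frozen records. This is a sketch under stated assumptions: `createAuditLog` and its methods are hypothetical names. The freeze is shallow and in-process, so it illustrates the contract only and gives no storage-level guarantee.

```javascript
// Hypothetical in-memory audit log: entries can be appended and read,
// but there is no method to modify or delete them.
function createAuditLog() {
  const entries = [];
  return {
    append(record) {
      // toISOString() yields an ISO 8601 timestamp with millisecond precision.
      const entry = Object.freeze({
        ...record,
        timestamp: new Date().toISOString(),
      });
      entries.push(entry);
      return entry;
    },
    // Return a copy so callers cannot alter the internal array.
    list() {
      return entries.slice();
    },
  };
}
```

Writing to a frozen entry is ignored in sloppy mode and throws a `TypeError` in strict mode. In both cases the stored record stays unchanged.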
Rollback capability
The system supports rollback to any previously completed step within a session. When a rollback is triggered:
- All actions after the target step are marked as rolled_back
- The browser navigates back to the state captured in the target step’s screenshot
- The session’s currentStep is reset to the target step
- The automation loop resumes from that point, proposing new actions based on the restored state
Rollback is non-destructive: rolled-back actions remain in the audit trail with their original timestamps and screenshots. The audit record shows that a rollback occurred, when it was triggered, and which step was the target.
Rollback limitations
- Submitted forms cannot be un-submitted. If a form was submitted (approved) and the user rolls back past that point, the system will flag that the portal state may be inconsistent.
- External state changes persist. Rollback restores the system’s internal session state but cannot undo changes already made on external portals.
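The session-state side of a rollback, together with the submitted-form limitation, can be sketched as a pure function over the session object. `rollbackSession` and the `portalMayBeInconsistent` flag are hypothetical names. Restoring the browser itself is out of scope for this sketch.

```javascript
// Returns a new session rolled back to `targetStep`; the input is not mutated.
function rollbackSession(session, targetStep) {
  const target = session.actions[targetStep];
  if (!target || target.status !== 'completed') {
    throw new Error(`Step ${targetStep} is not a completed step`);
  }

  // Limitation: a form that was already submitted stays submitted on the
  // external portal, so flag the session instead of pretending it was undone.
  const undone = session.actions.slice(targetStep + 1);
  const portalMayBeInconsistent = undone.some(
    (a) => a.type === 'submit' && a.status === 'completed'
  );

  return {
    ...session,
    // Later actions are kept and only re-labelled, so the history survives.
    actions: session.actions.map((a, i) =>
      i > targetStep ? { ...a, status: 'rolled_back' } : a
    ),
    currentStep: targetStep,
    portalMayBeInconsistent,
  };
}
```

Returning a new object rather than editing the session in place matches the non-destructive intent: the pre-rollback state remains available for the audit record.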
Support
Technical: support@pauhu.eu
Sales: sales@pauhu.eu