Engineering

Under the hood

Technical deep-dives on the infrastructure problems we solve to ship browser agents that survive production.


Knowledge graphs: teaching agents to navigate, extract, and interact

For a browser agent to operate reliably on the web, it must master three primitives: Navigation, Extraction, and Interaction. Every agent task, whether booking a flight, filing an insurance claim, or completing a purchase, is a composition of these three operations. Get any one of them wrong, and the task fails.

The standard approach is to drop an LLM into a browser, hand it the raw DOM or a screenshot, and hope it figures things out. This works in demos. It does not work in production. Raw HTML pages can exceed a million tokens. CSS selectors break on every redesign. The agent has no map of where it is, what it can do, or where it needs to go. It is navigating blind.

Knowledge graphs solve this. They convert a website from an opaque rendering surface into a structured, traversable map. Pages become nodes. Transitions become edges. The agent stops guessing and starts following a graph.

The three primitives

Every browser task decomposes into a sequence of three operations. Agents that treat the browser as a single undifferentiated action space fail because they conflate fundamentally different problems. Each primitive has its own failure modes and requires its own solution.

Primitive | What the agent does | Failure without a graph
Navigation | Moves between pages and application states | Gets lost in multi-step flows, loops back to visited pages, can't find the right portal
Extraction | Pulls structured data from the current page | Hallucinates field values, misses data behind dynamic rendering, breaks on every redesign
Interaction | Fills forms, clicks buttons, manipulates UI controls | Clicks the wrong element, misses required fields, mishandles complex UI controls

Primitive 1: Navigation

Navigation is the most underestimated primitive. The core problem is topological blindness—an agent that only sees the current page has no knowledge of the site's complete state space. It doesn't know how many steps remain, whether it's on the right path, or how to recover when something loads unexpectedly.

Consider a multi-step insurance quoting flow: broker portal → carrier selection → risk details → coverage options → quote summary → bind. Without a map, the agent is forced into trial-and-error exploration, burning tokens and time on dead ends.

A knowledge graph encodes the full site topology upfront. Each page state is a node. Each transition—clicking a link, submitting a form, opening a modal—is a directed edge with a specific action and a predicted destination. Navigation becomes deterministic pathfinding rather than probabilistic exploration.

The result: the agent traverses the graph instead of reasoning about where to go next. The LLM is invoked once to understand the task, not at every step to decide where to click.
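To make the pathfinding idea concrete, here is a minimal sketch that assumes a hypothetical in-memory graph: each key is a page state, and each edge pairs the action that triggers a transition with its predicted destination. Breadth-first search returns the shortest action sequence to the goal. The state names follow the insurance flow above; the data format and `plan_route` are illustrative, not StableBrowse's actual representation.

```python
from collections import deque

# Hypothetical page-state graph: state -> list of (action, predicted destination).
# Names mirror the insurance quoting flow; they are illustrative only.
SITE_GRAPH: dict[str, list[tuple[str, str]]] = {
    "broker_portal": [("click 'New quote'", "carrier_selection")],
    "carrier_selection": [
        ("select carrier and continue", "risk_details"),
        ("click 'Back'", "broker_portal"),
    ],
    "risk_details": [("submit risk form", "coverage_options")],
    "coverage_options": [("choose coverage and continue", "quote_summary")],
    "quote_summary": [("click 'Bind'", "bind"), ("click 'Edit'", "coverage_options")],
    "bind": [],
}


def plan_route(graph, start, goal):
    """Return the shortest list of (action, destination) steps from start to goal,
    or None if the goal is unreachable. BFS never revisits a state, so Back/Edit
    cycles cannot trap the planner."""
    queue = deque([start])
    came_from = {start: None}  # state -> (previous state, action taken)
    while queue:
        state = queue.popleft()
        if state == goal:
            steps = []
            while came_from[state] is not None:
                prev, action = came_from[state]
                steps.append((action, state))
                state = prev
            return list(reversed(steps))
        for action, dest in graph.get(state, []):
            if dest not in came_from:
                came_from[dest] = (state, action)
                queue.append(dest)
    return None


for action, dest in plan_route(SITE_GRAPH, "broker_portal", "bind"):
    print(f"{action} -> {dest}")
```

Because each edge predicts its destination, an executor built on this could compare the page it actually lands on with the expected state after every action and replan from there, instead of asking the LLM where it is.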

Primitive 2: Extraction

Extraction is the reason most agents exist—pulling structured data out of unstructured web pages. The standard DOM-based approach is fragile because the DOM is a rendering interface, not a data interface.

By the time data reaches the DOM, it has been fragmented across elements, hidden behind dynamic rendering, and decorated with framework-specific artifacts that change between versions. Selectors that work today break tomorrow.

Knowledge graphs solve extraction by attaching data schemas directly to graph nodes. Each region of the page knows what fields it contains and their types. The agent knows what to extract before it touches the page, and can target specific regions rather than sending the entire DOM to an LLM.

This schema-first approach means extraction is precise and efficient. Instead of asking an LLM “what data is on this page?” at massive token cost, the agent runs targeted extraction against known regions with known schemas.
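A minimal sketch of schema-first extraction, assuming (hypothetically) that discovery recorded, for each region node, which element id holds each field and what type it should have. Only the region's HTML fragment is parsed, using Python's standard `html.parser`, and fields that are missing or fail type conversion are reported rather than guessed. `FieldSpec`, `QUOTE_SUMMARY_SCHEMA`, and the id-based locators are illustrative; real pages may need more robust locators than element ids.

```python
from dataclasses import dataclass
from decimal import Decimal
from html.parser import HTMLParser

VOID_TAGS = {"area", "br", "col", "embed", "hr", "img", "input", "link", "meta", "source", "wbr"}


@dataclass(frozen=True)
class FieldSpec:
    locator: str   # hypothetical: element id recorded for this field during discovery
    type_: type    # expected type of the extracted value


# Hypothetical schema attached to a "quote summary" region node.
QUOTE_SUMMARY_SCHEMA = {
    "carrier": FieldSpec("carrier-name", str),
    "annual_premium": FieldSpec("premium-amount", Decimal),
    "deductible": FieldSpec("deductible-amount", int),
}


class _TextById(HTMLParser):
    """Collect the text inside every element that has an id attribute."""

    def __init__(self):
        super().__init__()
        self.open_ids = []      # one entry per open element (None if it has no id)
        self.text_by_id = {}

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_ids.append(dict(attrs).get("id"))

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.open_ids:
            self.open_ids.pop()

    def handle_data(self, data):
        for element_id in self.open_ids:
            if element_id is not None:
                self.text_by_id[element_id] = self.text_by_id.get(element_id, "") + data


def extract(region_html, schema):
    """Extract typed fields from one region's HTML. Missing or unparseable
    fields go into `errors`; nothing is guessed."""
    parser = _TextById()
    parser.feed(region_html)
    parser.close()
    values, errors = {}, {}
    for name, spec in schema.items():
        raw = parser.text_by_id.get(spec.locator)
        if raw is None:
            errors[name] = "missing"
            continue
        text = raw.strip()
        if spec.type_ is not str:
            # Assumes US-style numbers: drop currency symbols and thousands separators.
            text = text.replace("$", "").replace(",", "")
        try:
            values[name] = spec.type_(text)
        except (ValueError, ArithmeticError):  # decimal.InvalidOperation is an ArithmeticError
            errors[name] = f"cannot parse {raw.strip()!r} as {spec.type_.__name__}"
    return values, errors


region = """
<section id="quote-summary">
  <h2>Your quote</h2>
  <p>Carrier: <span id="carrier-name">Acme Mutual</span></p>
  <p>Premium: <span id="premium-amount">$1,249.50</span> per year</p>
  <p>Deductible: <span id="deductible-amount">$500</span></p>
</section>
"""
values, errors = extract(region, QUOTE_SUMMARY_SCHEMA)
print(values, errors)
```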

Primitive 3: Interaction

Interaction is the hardest primitive to get right. Clicking a button is simple. Filling out a multi-step form with date pickers, dropdowns, sliders, autocomplete fields, and conditional inputs is not. Every UI control has its own interaction model, and getting it wrong silently produces incorrect results.

Consider booking a flight. The agent needs to type a city into an autocomplete field and wait for suggestions. It needs to open a date picker and navigate to the right month. It needs to increment a passenger counter. Each step has timing dependencies and ordering constraints that a generic “click this element” approach can't handle.

Knowledge graphs encode the functional type of every interactive element—whether it's a text input, date picker, dropdown, slider, or submit button—along with the relationships between them. The graph knows that field B depends on field A, that a dropdown must be opened before an option can be selected, and that a form must be filled before it can be submitted.
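Dependency edges such as “field B depends on field A” turn ordering into a topological sort. This sketch uses Python's standard `graphlib.TopologicalSorter` on a hypothetical flight-search form, where opening a date picker precedes selecting a date and the return date waits on the departure date; all node names are illustrative. A cycle in the edges, which would point to a discovery error, surfaces as `graphlib.CycleError` rather than an endless loop.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical interaction nodes for a flight-search form.
# Each node maps to the set of nodes that must complete before it can run.
FORM_DEPENDENCIES = {
    "origin_autocomplete": set(),
    "destination_autocomplete": {"origin_autocomplete"},
    "open_depart_picker": set(),
    "select_depart_date": {"open_depart_picker"},
    "open_return_picker": {"select_depart_date"},
    "select_return_date": {"open_return_picker"},
    "passenger_counter": set(),
    "submit_search": {"destination_autocomplete", "select_return_date", "passenger_counter"},
}


def execution_order(dependencies):
    """Return an order in which every node runs after all of its prerequisites.
    Raises CycleError if the dependency edges form a cycle."""
    return list(TopologicalSorter(dependencies).static_order())


print(execution_order(FORM_DEPENDENCIES))
```

Independent nodes (here, the passenger counter and the two pickers) may come out in any relative order; only the declared edges are guaranteed.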

With this information, interaction becomes deterministic. The LLM parses the user's task into structured parameters. The executor maps those parameters to graph nodes and runs them in the correct order. No per-step reasoning, no DOM interpretation, no hallucination risk.
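The executor step can be sketched as a dispatch on each node's functional type. In this minimal example, assuming the nodes arrive already in dependency order, that the LLM has produced a flat parameter dictionary, and that the date picker opens on the current month, each handler expands one parameter into the low-level actions its control needs: an autocomplete waits for suggestions, a date picker pages forward month by month, a counter is clicked the right number of times. `ElementNode`, the handler names, and the action tuples are hypothetical; sending the tuples to a real browser is out of scope.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ElementNode:
    node_id: str
    functional_type: str            # e.g. "autocomplete", "date_picker", "counter", "button"
    param: Optional[str] = None     # task parameter this node consumes, if any


def plan_autocomplete(node, value, today):
    # Type, wait for the suggestion list, then pick the matching suggestion.
    return [("type", node.node_id, value),
            ("wait_for_suggestions", node.node_id),
            ("select_suggestion", node.node_id, value)]


def plan_date_picker(node, value, today):
    # Open the picker, page forward from the current month, then click the day.
    target = date.fromisoformat(value)
    months_ahead = (target.year - today.year) * 12 + (target.month - today.month)
    if months_ahead < 0:
        raise ValueError(f"{value} is before the picker's starting month")
    return ([("open", node.node_id)]
            + [("click_next_month", node.node_id)] * months_ahead
            + [("click_day", node.node_id, target.day)])


def plan_counter(node, value, today):
    # Assumes the counter starts at 1, a common default for passenger pickers.
    return [("click_increment", node.node_id)] * (int(value) - 1)


def plan_button(node, value, today):
    return [("click", node.node_id)]


HANDLERS = {
    "autocomplete": plan_autocomplete,
    "date_picker": plan_date_picker,
    "counter": plan_counter,
    "button": plan_button,
}


def compile_plan(nodes, params, today):
    """Turn structured task parameters into low-level actions, node by node.
    Raises KeyError for a missing parameter or an unknown functional type."""
    actions = []
    for node in nodes:
        value = params[node.param] if node.param is not None else None
        actions.extend(HANDLERS[node.functional_type](node, value, today))
    return actions


FLIGHT_FORM = [
    ElementNode("origin", "autocomplete", "origin"),
    ElementNode("destination", "autocomplete", "destination"),
    ElementNode("depart", "date_picker", "depart_date"),
    ElementNode("passengers", "counter", "passengers"),
    ElementNode("search", "button"),
]
# Parameters an LLM might extract from "Two passengers, SFO to JFK, departing 2025-03-14".
params = {"origin": "SFO", "destination": "JFK", "depart_date": "2025-03-14", "passengers": 2}
plan = compile_plan(FLIGHT_FORM, params, today=date(2025, 1, 20))
for step in plan:
    print(step)
```

In this sketch the only model call is the one that produced `params`; everything after it is ordinary, testable code.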

Graph construction

StableBrowse builds knowledge graphs through automated site discovery. We visit the target site, capture its semantic structure, and map it into a typed graph of pages, regions, elements, and data nodes connected by verified transitions.
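One way to picture the typed graph is sketched below, under assumed typing rules: pages contain regions, regions contain elements and data nodes, and transitions run from an element to a page. A transition is stored only when discovery actually observed it, which is one possible reading of “verified.” `SiteGraph`, `Kind`, and `ALLOWED_CHILDREN` are hypothetical names, not StableBrowse's schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class Kind(Enum):
    PAGE = "page"
    REGION = "region"
    ELEMENT = "element"
    DATA = "data"


# Assumed typing rules: which child kinds each parent kind may contain.
ALLOWED_CHILDREN = {
    Kind.PAGE: {Kind.REGION},
    Kind.REGION: {Kind.ELEMENT, Kind.DATA},
    Kind.ELEMENT: set(),
    Kind.DATA: set(),
}


@dataclass
class SiteGraph:
    kinds: dict = field(default_factory=dict)        # node id -> Kind
    children: dict = field(default_factory=dict)     # parent id -> list of child ids
    transitions: list = field(default_factory=list)  # (element id, destination page id)

    def add_node(self, node_id, kind, parent=None):
        """Add a node, rejecting containment that breaks the typing rules."""
        if parent is not None:
            parent_kind = self.kinds[parent]
            if kind not in ALLOWED_CHILDREN[parent_kind]:
                raise TypeError(f"a {kind.value} node cannot sit inside a {parent_kind.value} node")
            self.children.setdefault(parent, []).append(node_id)
        self.kinds[node_id] = kind

    def add_transition(self, element_id, page_id, observed):
        """Store element -> page only if discovery actually observed it happen."""
        if self.kinds.get(element_id) is not Kind.ELEMENT or self.kinds.get(page_id) is not Kind.PAGE:
            raise TypeError("transitions must run from an element node to a page node")
        if not observed:
            return False
        self.transitions.append((element_id, page_id))
        return True


g = SiteGraph()
g.add_node("search_page", Kind.PAGE)
g.add_node("search_form", Kind.REGION, parent="search_page")
g.add_node("search_button", Kind.ELEMENT, parent="search_form")
g.add_node("results_page", Kind.PAGE)
g.add_node("results_list", Kind.REGION, parent="results_page")
g.add_node("flight_price", Kind.DATA, parent="results_list")
g.add_transition("search_button", "results_page", observed=True)
print(g.transitions)
```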

The graph is built once per site and reused for every subsequent task. Structural fingerprinting detects when a site has changed, triggering automatic re-discovery and graph updates. Nodes that no longer exist are marked stale. Nodes that still work retain their history.
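The fingerprinting scheme itself is not described here, so the following is an assumed design: hash a region's tag skeleton (nesting depth, tag names, and a few structural attributes such as `id`, `role`, `name`, and `type`) while ignoring text and CSS classes, so copy edits and restyles leave the fingerprint unchanged but layout changes do not. Comparing stored and fresh per-node fingerprints then splits nodes into unchanged, changed, stale, and new. All names are hypothetical.

```python
import hashlib
from html.parser import HTMLParser

VOID_TAGS = {"area", "br", "col", "embed", "hr", "img", "input", "link", "meta", "source", "wbr"}
# Assumption: these attributes describe structure; text content and classes do not.
STRUCTURAL_ATTRS = ("id", "role", "name", "type")


class _Skeleton(HTMLParser):
    """Record one token per start tag: depth, tag name, and structural attributes."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        kept = ",".join(f"{a}={attr_map[a]}" for a in STRUCTURAL_ATTRS if a in attr_map)
        self.tokens.append(f"{self.depth}:{tag}[{kept}]")
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS:
            self.depth = max(0, self.depth - 1)


def fingerprint(html):
    """SHA-256 of the tag skeleton; text nodes are never recorded."""
    parser = _Skeleton()
    parser.feed(html)
    parser.close()
    return hashlib.sha256("\n".join(parser.tokens).encode("utf-8")).hexdigest()


def diff_fingerprints(known, current):
    """Split node ids by comparing stored fingerprints with fresh ones."""
    unchanged = {n for n in known if current.get(n) == known[n]}
    changed = {n for n in known if n in current and current[n] != known[n]}
    stale = set(known) - set(current)
    new = set(current) - set(known)
    return unchanged, changed, stale, new


before = '<form id="search"><input name="origin" type="text"><button type="submit">Search</button></form>'
restyled = '<form id="search" class="v2"><input name="origin" type="text"><button type="submit">Find flights</button></form>'
restructured = ('<form id="search"><div role="combobox"><input name="origin" type="text"></div>'
                '<button type="submit">Search</button></form>')
print(fingerprint(before) == fingerprint(restyled))      # True: only text and class changed
print(fingerprint(before) == fingerprint(restructured))  # False: new wrapper element
```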

Self-healing

Websites change constantly. A static graph would break just as fast as static selectors. StableBrowse's graphs adapt through empirical reliability tracking—every node carries a reliability score updated on each interaction. Nodes that consistently work rise in confidence. Nodes that start failing are deprioritized and eventually excluded from action plans.
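How the reliability score is computed is not specified. A minimal sketch, assuming an exponential moving average of successes and failures plus a hypothetical exclusion threshold: each interaction nudges the score toward 1 or 0, so nodes that keep working rise, and nodes that start failing sink below the cutoff and drop out of the candidate list. `alpha`, the 0.5 starting score, and `EXCLUDE_BELOW` are illustrative values.

```python
from dataclasses import dataclass


@dataclass
class Reliability:
    score: float = 0.5   # hypothetical neutral starting score for a new node
    alpha: float = 0.2   # assumed weight of the newest observation

    def record(self, success):
        """Exponential moving average: recent outcomes count the most."""
        self.score = (1 - self.alpha) * self.score + self.alpha * (1.0 if success else 0.0)
        return self.score


EXCLUDE_BELOW = 0.3  # hypothetical cutoff for dropping a node from action plans


def rank_candidates(scores):
    """Order usable nodes by reliability, excluding those below the cutoff."""
    usable = [(node, r.score) for node, r in scores.items() if r.score >= EXCLUDE_BELOW]
    return [node for node, _ in sorted(usable, key=lambda item: item[1], reverse=True)]


scores = {"continue_button_old": Reliability(), "continue_button_new": Reliability()}
for _ in range(5):
    scores["continue_button_old"].record(success=False)
    scores["continue_button_new"].record(success=True)
print(rank_candidates(scores))
```

After five failures the old button's score is 0.5 × 0.8^5 ≈ 0.164, below the cutoff, while five successes lift the new one to about 0.836. A moving average forgets old evidence at a rate set by `alpha`; a count-based estimate would be steadier but slower to react to a redesign.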

When the graph detects that a site's structure has drifted, it re-discovers the changed portions and merges them with the existing graph. The agent always has an up-to-date map, and the transition from old structure to new structure is seamless.
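A minimal sketch of the merge step, under assumed rules: re-discovery covers only a scoped part of the site; nodes it finds again keep their success and failure history while their fingerprint is updated; nodes it finds for the first time start with empty history; and nodes in scope that it no longer finds are marked stale rather than deleted. Everything outside the scope is left alone. `NodeRecord` and `merge_rediscovery` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class NodeRecord:
    fingerprint: str
    successes: int = 0
    failures: int = 0
    stale: bool = False


def merge_rediscovery(graph, scope, snapshot):
    """Merge a partial re-discovery into an existing graph, in place.

    graph:    node id -> NodeRecord for the whole site
    scope:    ids of previously known nodes inside the re-discovered portion
    snapshot: node id -> fingerprint observed during re-discovery
    """
    for node_id, fp in snapshot.items():
        record = graph.get(node_id)
        if record is None:
            graph[node_id] = NodeRecord(fingerprint=fp)  # brand-new node, no history yet
        else:
            record.fingerprint = fp                      # still present: keep its history
            record.stale = False
    for node_id in set(scope) - set(snapshot):
        if node_id in graph:
            graph[node_id].stale = True                  # vanished: mark it, don't delete it
    return graph


graph = {
    "search_form": NodeRecord("fp-a", successes=40, failures=1),
    "promo_banner": NodeRecord("fp-b", successes=3),
    "footer_links": NodeRecord("fp-c", successes=12),
}
merge_rediscovery(graph, scope={"search_form", "promo_banner"},
                  snapshot={"search_form": "fp-a2", "date_range_picker": "fp-d"})
for node_id, record in graph.items():
    print(node_id, record)
```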

Why this matters

The fundamental insight is that websites are not random. They have structure, and that structure is far more stable than the surface-level HTML. A knowledge graph captures that structure and lets agents operate on it directly.

The three primitives—navigation, extraction, interaction—are the minimal set of capabilities an agent needs to operate on any website. Each one maps to a different part of the graph: navigation uses edges to traverse between page states, extraction uses schemas attached to data nodes, and interaction uses functional types and dependency edges to execute actions deterministically.

Agents that treat the web as an undifferentiated stream of HTML will always be slow, expensive, and fragile. Agents that see the web as a graph can navigate it like a map.


Build once, run forever. A knowledge graph is built once per site. Every subsequent task runs at dramatically lower cost and higher reliability. The graph pays for itself on the second task.
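The “pays for itself” claim can be stated precisely with a simple cost model. Assume, for illustration, a one-time graph build cost $B$, a per-task cost $c_g$ for runs that use the graph, and a per-task cost $c_r$ for an agent working from raw pages, with $c_r > c_g$. After $n$ tasks:

```latex
\begin{aligned}
C_{\text{graph}}(n) &= B + n\,c_g, \qquad C_{\text{raw}}(n) = n\,c_r,\\
C_{\text{graph}}(n) \le C_{\text{raw}}(n) &\iff n \ge n^{*} = \frac{B}{c_r - c_g}.
\end{aligned}
```

Under this model, paying for itself on the second task means $n^{*} \le 2$, that is, $B \le 2(c_r - c_g)$: the build costs no more than the savings from two graph-based runs. The model leaves out re-discovery and maintenance, which would add to $C_{\text{graph}}$.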


Building Site Familiarity: The Discovery Pipeline Behind Knowledge Graphs

The web is not just a collection of pages. It is a collection of task paths. A page is only one snapshot. What matters for automation is what the system can do from that page and where those actions lead.


Discovery is not crawling. StableBrowse discovery builds a reusable site model: route grammar, interaction context, and state recognition that make future runs less uncertain.
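As an illustration of one ingredient, route grammar, here is a minimal sketch that assumes ID-like path segments (UUIDs, integers, long hex strings) can be abstracted by pattern, so many concrete URLs collapse into one route template. A real discovery pipeline would presumably also use page structure to decide which URLs share a template; the patterns and function names are hypothetical.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit

# Heuristic segment classifiers, checked in order (assumptions, not a full grammar).
SEGMENT_PATTERNS = [
    ("{uuid}", re.compile(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}", re.I)),
    ("{int}", re.compile(r"\d+")),
    ("{hex}", re.compile(r"[0-9a-f]{16,}", re.I)),
]


def route_template(url):
    """Map a concrete URL to a template by abstracting ID-like path segments.
    The query string is ignored."""
    segments = urlsplit(url).path.strip("/").split("/")
    out = []
    for seg in segments:
        for placeholder, pattern in SEGMENT_PATTERNS:
            if pattern.fullmatch(seg):
                out.append(placeholder)
                break
        else:
            out.append(seg)  # literal segment, kept as-is
    return "/" + "/".join(out)


def group_routes(urls):
    """Group observed URLs under their inferred route template."""
    groups = defaultdict(list)
    for url in urls:
        groups[route_template(url)].append(url)
    return dict(groups)


urls = [
    "https://example.com/flights/123/seats",
    "https://example.com/flights/456/seats?cabin=economy",
    "https://example.com/account/settings",
    "https://example.com/bookings/3f2b8c1e-9a4d-4e7b-8c2a-1d5e6f7a8b9c",
]
for template, members in group_routes(urls).items():
    print(template, len(members))
```

Grouping URLs this way lets a single graph node stand for every flight's seat page rather than one node per flight.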

This post explains how bounded exploration becomes a reusable site model: states become waypoints, transitions become executable routes, repeated layouts become templates, and controls become interaction affordances.
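Bounded exploration can be sketched as breadth-first search with two budgets, a maximum number of states and a maximum depth, plus a key function that collapses states sharing a layout so each template is explored once. `get_actions` stands in for whatever hook performs discovery on a live page; here a small dictionary simulates a site. All names, and the numeric-segment heuristic in `layout_key`, are assumptions.

```python
from collections import deque


def explore(start, get_actions, state_key, max_states=50, max_depth=6):
    """Bounded breadth-first exploration.

    get_actions(state) -> list of (action, next_state); hypothetical discovery hook.
    state_key(state)   -> hashable key; states with the same key share a layout.
    Returns (waypoints, transitions): key -> representative state, and
    (source key, action, destination key) triples between waypoints.
    """
    waypoints = {state_key(start): start}
    transitions = []
    queue = deque([(start, 0)])
    while queue:
        state, depth = queue.popleft()
        if depth >= max_depth:
            continue  # depth budget reached: keep the waypoint, don't expand it
        for action, nxt in get_actions(state):
            key = state_key(nxt)
            if key not in waypoints and len(waypoints) < max_states:
                waypoints[key] = nxt
                queue.append((nxt, depth + 1))
            if key in waypoints:  # only record edges between known waypoints
                transitions.append((state_key(state), action, key))
    return waypoints, transitions


# Simulated site: path -> list of (action, destination path).
SITE = {
    "/": [("open deals", "/deals"), ("open account", "/account")],
    "/deals": [("open product 1", "/product/1"), ("open product 2", "/product/2")],
    "/product/1": [("add to cart", "/cart")],
    "/product/2": [("add to cart", "/cart")],
    "/account": [("back home", "/")],
    "/cart": [("checkout", "/checkout")],
    "/checkout": [],
}


def layout_key(path):
    # Hypothetical: numeric segments are IDs, so all product pages share one layout.
    return "/".join("{id}" if seg.isdigit() else seg for seg in path.split("/"))


waypoints, transitions = explore("/", lambda s: SITE[s], layout_key)
print(sorted(waypoints))
for t in transitions:
    print(t)
```

Both products collapse into one `/product/{id}` waypoint, so the second product page is never visited, which is one way repeated layouts can become templates.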
