A walkthrough, not a pitch

RPA UI drift, absorbed by the accessibility tree in three deterministic layers.

Every selector-based RPA platform breaks on the same problem: a support pack renames a field, a theme update swaps the palette, a layout patch moves a panel, an upgrade churns control ids, and the recorded XPath stops resolving. The current vendor answer is an AI healing agent that calls an LLM at runtime on every break. There is a different answer. The OS already publishes between five and eight independent identity properties for every visible control, and a cascade over those properties absorbs most drift without a single model call. Here is what the cascade looks like at the code level.

Matthew Diakonov, Written with AI

Published May 12, 2026 (9 min read)

Direct answer (verified 2026-05-12)

UI drift in RPA is absorbed by the accessibility tree in three layers: a primitive selection picked from eight UIA-backed properties, a four-tier in-process cascade that retries by AutomationId, bounds, visible text, then window-only focus, and a workflow-level fallback that routes a failed step to a named recovery step. The LLM only runs at authoring time. The runtime is plain code, with zero model calls in the hot path, so drift handling is fast, deterministic, and auditable.

The reference implementation is open source under MIT at github.com/mediar-ai/terminator. The eight selector primitives live in src/selector.rs; the four-tier cascade in focus_state.rs:168; the workflow-level fallback in workflow.rs:55.

The drift modes a real estate sees

Before getting into the layers, an inventory. A regulated enterprise estate (SAP, Oracle EBS, Jack Henry, Fiserv, FIS, Epic, Cerner, plus a long tail of internal Win32 and WinForms tools) takes on UI changes in a small number of recurring shapes. Most articles treat drift as one undifferentiated problem and reach for one undifferentiated solution. The right answer depends on which shape of drift you are facing.

UI drift inventory, real enterprise estates

Label rewrite (SAP support pack renames 'Customer' to 'Business Partner')
Reskin or theme change (Windows 11 dark mode rollout across the estate)
AutomationId churn (Oracle EBS Java client patch reshuffles internal ids)
Panel reorder (Epic Hyperspace moves a field from tab 2 to tab 3)
Control type change (Edit becomes ComboBox with lookup)
Modal popup inserted (banking core adds confirmation dialog after Save)
Window title prefix change ('A/R Invoice - Customer 1' to 'A/R Invoice (Draft)')
DPI scaling or font swap (no effect on the tree, fatal to pixel templates)

Notice that most of these are surface-level: they touch the label, the position, the color, or the runtime id, but not the semantic identity of the control. The exceptions, a control type change and an inserted modal, are structural and are handled at the workflow level. The accessibility tree publishes between five and eight independent identity surfaces for the same control, so a drift that touches one surface usually leaves the others intact. That is the whole bet.
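To make the "one surface at a time" idea concrete, here is a minimal sketch that diffs two snapshots of the same control and reports which identity surfaces changed. The `ControlSnapshot` shape, the `changedSurfaces` helper, and the sample values are all hypothetical, chosen for illustration; they are not the Terminator SDK's types.

```typescript
// Hypothetical snapshot of the identity surfaces one control exposes.
// Field names echo common UIA properties; the shape itself is an assumption.
interface ControlSnapshot {
  name: string;
  controlType: string;
  automationId: string;
  className: string;
  bounds: { x: number; y: number; width: number; height: number };
}

// Return the names of the surfaces that differ between two snapshots.
function changedSurfaces(before: ControlSnapshot, after: ControlSnapshot): string[] {
  const changed: string[] = [];
  if (before.name !== after.name) changed.push("name");
  if (before.controlType !== after.controlType) changed.push("controlType");
  if (before.automationId !== after.automationId) changed.push("automationId");
  if (before.className !== after.className) changed.push("className");
  const b = before.bounds;
  const a = after.bounds;
  if (b.x !== a.x || b.y !== a.y || b.width !== a.width || b.height !== a.height) {
    changed.push("bounds");
  }
  return changed;
}

// Illustrative values only.
const recorded: ControlSnapshot = {
  name: "Customer",
  controlType: "Edit",
  automationId: "49152",
  className: "TextBox",
  bounds: { x: 120, y: 80, width: 200, height: 24 },
};

// A label rewrite touches Name and nothing else.
const afterLabelRewrite: ControlSnapshot = { ...recorded, name: "Business Partner" };
// An id churn touches AutomationId and nothing else.
const afterIdChurn: ControlSnapshot = { ...recorded, automationId: "61007" };

console.log(changedSurfaces(recorded, afterLabelRewrite)); // ["name"]
console.log(changedSurfaces(recorded, afterIdChurn)); // ["automationId"]
```

In both cases four of the five modeled surfaces are untouched, which is what leaves a fallback something to match on.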

Layer one: the eight selector primitives

The Terminator SDK ships a typed selector with eight factory methods, each backed by a different UIA property. The whole file is 158 lines and worth reading. Each method maps onto a drift mode the OS accessibility surface absorbs by design.

terminator/src/selector.rs (MIT, github.com/mediar-ai/terminator)

Label rewrite

SAP support pack renames 'Customer' to 'Business Partner'. The control itself is unchanged; only its label is different. A traditional name-based selector breaks: Selector.role("Edit", "Customer") fails, but Selector.nativeId("49152") still finds the field on the next workflow run.

Reskin or theme change

IT enables Windows 11 dark mode across the bank. Pixel hashes and image templates fail because every screenshot is now black instead of white. The accessibility tree is identical because dark mode is a paint-layer change. Zero selectors break.

AutomationId churn

An Oracle EBS Java client upgrade reshuffles internally generated control ids. Selector.nativeId fails. Selector.role("Edit", "Customer") still resolves because Name is bound to the form definition, not the runtime id.

Panel reorder

Epic Hyperspace patch moves the 'Reason for Visit' field from tab 2 to tab 3. XPath and structural-path selectors snap. Selectors keyed on Name or AutomationId survive because neither cares about position.

Field replaced by composite control

A single Edit becomes a ComboBox with attached lookup. The action signature changes from typing to selecting. Selector-level fallback cannot rescue this. Workflow-level ErrorStrategy::Fallback routes the run to a recovery step that selects from the new control.

Modal popup added

Banking core inserts a confirmation dialog after Save. The flow blocks on a window the workflow does not know about. Workflow-level fallback runs a 'dismiss-modal' step then retries the main path. Pixel matchers would type the next step's input into the modal text field by accident.

The recorder picks the primitive at authoring time. For a posting control in a banking core, Selector.nativeId is preferred because internal Win32 ids tend to be stable across patches. For a SAP B1 form field where AutomationIds churn, Selector.role("Edit", "Customer") is preferred because Name plus ControlType is the most stable combination there. The choice is per-control, recorded with the workflow, reviewable in the resulting TypeScript file.
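The trade-off between a name-based and an id-based primitive can be sketched against a toy tree. This is an illustrative model only: `TreeNode`, `SelectorSketch`, and `resolve` are hypothetical names invented here, and the real `Selector` factory methods in the SDK may behave differently.

```typescript
// Toy model of a flattened accessibility tree (an assumption for illustration).
interface TreeNode {
  role: string;
  name: string;
  nativeId: string;
}

// Two of the primitives discussed above, modeled as a discriminated union.
type SelectorSketch =
  | { kind: "role"; role: string; name: string }
  | { kind: "nativeId"; id: string };

// Resolve a selector against the tree; undefined means the selector broke.
function resolve(tree: TreeNode[], sel: SelectorSketch): TreeNode | undefined {
  if (sel.kind === "role") {
    const { role, name } = sel;
    return tree.find((n) => n.role === role && n.name === name);
  }
  const { id } = sel;
  return tree.find((n) => n.nativeId === id);
}

const byRole: SelectorSketch = { kind: "role", role: "Edit", name: "Customer" };
const byId: SelectorSketch = { kind: "nativeId", id: "49152" };

// Label rewrite: Name changes, the id stays.
const treeAfterLabelRewrite: TreeNode[] = [
  { role: "Edit", name: "Business Partner", nativeId: "49152" },
  { role: "Button", name: "Save", nativeId: "49153" },
];

// Id churn: the id changes, Name stays.
const treeAfterIdChurn: TreeNode[] = [
  { role: "Edit", name: "Customer", nativeId: "61007" },
  { role: "Button", name: "Save", nativeId: "61008" },
];

console.log(resolve(treeAfterLabelRewrite, byRole)); // undefined
console.log(resolve(treeAfterLabelRewrite, byId)?.name); // "Business Partner"
console.log(resolve(treeAfterIdChurn, byId)); // undefined
console.log(resolve(treeAfterIdChurn, byRole)?.nativeId); // "61007"
```

Neither primitive wins everywhere, which is why the choice is made per control rather than once per platform.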

Layer two: the four-tier in-process cascade

When the recorded primitive fails to resolve at runtime, the runtime does not stop. It walks a four-tier cascade defined in apps/desktop/src-tauri/src/focus_state.rs lines 168 to 196. The cascade reads like a sentence: try the strongest identity property first, weaken the constraint one rung at a time, and stop the moment something resolves. None of the rungs call a model; each is a lookup or a bounded walk over the cached tree.

focus_state.rs: the four-tier cascade

What runs on a drift break

1. Primitive match: Eight UIA-backed primitives, picked at recording time to maximize survival across the most likely drift modes for that control.
2. Cascade fallback: Same element, four tiers: AutomationId, window plus bounds, visible text, window-only focus. ~3 ms to ~40 ms on a hit.
3. Workflow fallback: Step-level ErrorStrategy::Fallback jumps to a named recovery step, then resumes the main path. The drift mode the cascade cannot absorb.
4. Flag for review: A break the four-tier cascade and the workflow fallback both miss never silently types. The run is suspended and routed to a human.

A SAP support pack that nudges a field 12 pixels down triggers nothing because the AutomationId match still wins at tier 1. A pack that renames the label drops to tier 2 (bounds) or tier 3 (visible text). A pack that does both can still be caught by tier 3 because the field is still visually the same field a human would point to. On most estates, fewer than 5 percent of patches produce a break that survives all four tiers. Those are the ones the workflow-level fallback exists for.
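The tier order described above can be sketched as a short function over an in-memory tree. This is a simplified model under stated assumptions, not the code in focus_state.rs: the `UiNode` and `Recorded` shapes, exact-equality bounds matching, and the `flagForReview` field are all hypothetical.

```typescript
interface Bounds { x: number; y: number; width: number; height: number }

// Hypothetical cached-tree node and recorded target (assumptions for illustration).
interface UiNode { automationId: string; window: string; bounds: Bounds; text: string }
interface Recorded { automationId: string; window: string; bounds: Bounds; text: string }

interface CascadeResult {
  tier: 1 | 2 | 3 | 4;
  node: UiNode | null;
  flagForReview: boolean;
}

function sameBounds(a: Bounds, b: Bounds): boolean {
  return a.x === b.x && a.y === b.y && a.width === b.width && a.height === b.height;
}

// Try the strongest identity first, weaken one rung at a time, stop on the first hit.
function cascade(tree: UiNode[], rec: Recorded): CascadeResult {
  // Tier 1: AutomationId.
  const byId = tree.find((n) => n.automationId === rec.automationId);
  if (byId) return { tier: 1, node: byId, flagForReview: false };

  // Tier 2: same window and same bounds (a renamed control that did not move).
  const byBounds = tree.find((n) => n.window === rec.window && sameBounds(n.bounds, rec.bounds));
  if (byBounds) return { tier: 2, node: byBounds, flagForReview: false };

  // Tier 3: same window and same visible text (a control that moved).
  const byText = tree.find((n) => n.window === rec.window && n.text === rec.text);
  if (byText) return { tier: 3, node: byText, flagForReview: false };

  // Tier 4: nothing resolved; only the window would be focused, and the step is flagged.
  return { tier: 4, node: null, flagForReview: true };
}

const rec: Recorded = {
  automationId: "49152",
  window: "A/R Invoice",
  bounds: { x: 120, y: 80, width: 200, height: 24 },
  text: "Customer",
};

// Field nudged 12 pixels down, id intact: tier 1.
const nudged: UiNode[] = [{ ...rec, bounds: { ...rec.bounds, y: 92 } }];
// Id churned, field did not move: tier 2.
const idChurned: UiNode[] = [{ ...rec, automationId: "61007" }];
// Id churned and field moved, text intact: tier 3.
const movedAndChurned: UiNode[] = [
  { ...rec, automationId: "61007", bounds: { ...rec.bounds, y: 300 } },
];
// Every surface changed: tier 4, flagged for review.
const unrecognizable: UiNode[] = [
  { automationId: "7", window: "A/R Invoice", bounds: { x: 0, y: 0, width: 10, height: 10 }, text: "Other" },
];

console.log(cascade(nudged, rec).tier); // 1
console.log(cascade(idChurned, rec).tier); // 2
console.log(cascade(movedAndChurned, rec).tier); // 3
console.log(cascade(unrecognizable, rec).flagForReview); // true
```

A production matcher would likely tolerate small bounds differences and normalize text; exact equality is used here only to keep the sketch short.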

Layer three: workflow-level ErrorStrategy::Fallback

Some drift is structural, not cosmetic. A modal popup is inserted between Save and the next field. An Edit becomes a ComboBox with a lookup. A screen is split across two windows. The per-element cascade cannot rescue these because the original action no longer makes sense at all. For these cases, every WorkflowStep carries an on_error annotation and an optional fallback_id pointing at a recovery step elsewhere in the sequence.

workflow.rs: ErrorStrategy enum

The wiring is at crates/executor/src/mcp/executor.rs lines 162 to 261. When a step fails and its on_error is set to Fallback, the executor looks up the step whose id equals the failed step's fallback_id, runs it, records both the failure and the fallback in the step results, and continues. A canonical example: the main path is "type into A/R Invoice Customer field", the fallback is "dismiss-modal then retry-customer-field". When the banking core inserts a new confirmation dialog after Save, the cascade can never absorb that because it is a new window, not a moved field. The workflow-level fallback handles it without a code change.
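The dispatch described above can be sketched as a small step runner. The four strategy names come from the text; everything else is an assumption made for illustration: the `WorkflowStep` shape, the `recoveryOnly` flag, a default of Stop when no strategy is set, and Retry meaning one extra attempt.

```typescript
type ErrorStrategy = "Stop" | "Continue" | "Retry" | "Fallback";

// Hypothetical step shape; `run` throws to signal failure.
interface WorkflowStep {
  id: string;
  run: () => void;
  onError?: ErrorStrategy;
  fallbackId?: string;
  recoveryOnly?: boolean; // assumption: recovery steps are skipped on the main path
}

interface StepResult { id: string; ok: boolean }

function runWorkflow(steps: WorkflowStep[]): StepResult[] {
  const results: StepResult[] = [];
  const attempt = (s: WorkflowStep): boolean => {
    try {
      s.run();
      return true;
    } catch {
      return false;
    }
  };

  for (const step of steps) {
    if (step.recoveryOnly) continue;
    if (attempt(step)) {
      results.push({ id: step.id, ok: true });
      continue;
    }
    const strategy: ErrorStrategy = step.onError ?? "Stop";
    if (strategy === "Retry" && attempt(step)) {
      results.push({ id: step.id, ok: true });
      continue;
    }
    // Record the failure before deciding what to do next.
    results.push({ id: step.id, ok: false });
    if (strategy === "Continue") continue;
    if (strategy === "Fallback") {
      const recovery = steps.find((s) => s.id === step.fallbackId);
      if (recovery) {
        const recovered = attempt(recovery);
        results.push({ id: recovery.id, ok: recovered });
        if (recovered) continue; // resume the main path
      }
    }
    break; // Stop, exhausted Retry, or failed or missing fallback: halt the run
  }
  return results;
}

// Simulated screen: a confirmation modal blocks the customer field.
const screen = { modalOpen: true };
const typed: string[] = [];

const steps: WorkflowStep[] = [
  {
    id: "type-customer",
    onError: "Fallback",
    fallbackId: "dismiss-modal-then-retry",
    run: () => {
      if (screen.modalOpen) throw new Error("blocked by modal");
      typed.push("customer");
    },
  },
  {
    id: "dismiss-modal-then-retry",
    recoveryOnly: true,
    run: () => {
      screen.modalOpen = false;
      typed.push("customer");
    },
  },
  { id: "type-amount", run: () => { typed.push("amount"); } },
];

const results = runWorkflow(steps);
console.log(results.map((r) => `${r.id}:${r.ok}`).join(" "));
// type-customer:false dismiss-modal-then-retry:true type-amount:true
```

Both the failure and the recovery appear in the results, which is the property that keeps the run auditable.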

The classic counter to this is "but I have to author the fallback step in advance, the AI healing agent learns it." True, and the trade is intentional. An author who has thought about which patches a vendor tends to ship and pre-encoded the recovery steps gets a deterministic, auditable workflow that survives the next patch. An author who outsourced that reasoning to a runtime LLM gets a workflow that is non-deterministic, opaque to audit, and subject to an inference cost on every break in production.

The drift budget, in concrete numbers

The qualitative argument matters less than the quantitative one. Here is what the cascade actually costs versus what runtime LLM healing costs on the same break.

~3 ms to ~40 ms: cascade hit, depending on the tier that resolves

$0/break: runtime model cost

0: lines in the runtime that match a grep for openai

Tier 1 (AutomationId match) is a tree lookup against a property the OS already indexes. Tier 3 (visible text) is a depth-first walk across the cached tree, bounded by the parent window. Both finish within milliseconds. A runtime LLM call to repair the same break is, in the best case, a one-second round trip; in the median case, more. At a queue of 5,000 invoices a day with even a 1 percent break rate, the difference is 50 extra LLM calls per day per workflow. Spread across an estate of 100 workflows, that is 5,000 calls a day, or several thousand inference dollars a month, for the same end behavior.
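The arithmetic behind that estimate can be worked through directly. The queue size, break rate, and workflow count come from the paragraph above; the 30-day month and the 2 cents per repair call are assumptions chosen only to show the order of magnitude, not measured prices.

```typescript
// Figures from the text.
const invoicesPerDay = 5000;
const breakRatePercent = 1;
const workflows = 100;

// Assumptions for illustration only.
const daysPerMonth = 30;
const assumedCostPerCallCents = 2;

const callsPerWorkflowPerDay = (invoicesPerDay * breakRatePercent) / 100;
const callsPerDayAcrossEstate = callsPerWorkflowPerDay * workflows;
const callsPerMonth = callsPerDayAcrossEstate * daysPerMonth;
const monthlyCostUsd = (callsPerMonth * assumedCostPerCallCents) / 100;

console.log(callsPerWorkflowPerDay); // 50
console.log(callsPerDayAcrossEstate); // 5000
console.log(callsPerMonth); // 150000
console.log(monthlyCostUsd); // 3000
```

The result scales linearly with the assumed per-call price, so a repair call that costs ten times more moves the monthly figure by the same factor.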

How this compares to the alternatives on a real desk

UiPath (with Healing Agent)

On a selector break, an LLM is invoked at runtime to propose a new selector against the live tree. Cost compounds at queue scale, latency stacks per failed step, the audit trail is a chain of model decisions.

Power Automate (image and OCR)

Image-based controls fail on theme, DPI, font, and resolution changes. The Windows recorder ships an accessibility mode but routing to it is manual and many workflows never enable it.

Vision-only AI agents

Screenshot to LLM on every step. Drift triggers a fresh inference whether or not the UI actually changed. Two identical runs can pick two different buttons, a non-starter for a posting workflow.

Mediar (Terminator SDK)

Three deterministic layers: primitive selection, four-tier element cascade, workflow-level fallback step. The model only runs once, at authoring time, when a human walks the workflow. Zero LLM calls in the runtime hot path.

Tested on the systems where most enterprise drift is born

Drift surfaces this cascade is built against

SAP GUI

WinForms host, Name and AutomationId stable across most support packs.

Oracle EBS

Java client. Bounds and Name carry the cascade when AutomationId churns on patch.

Jack Henry

Terminal emulator. UIA exposes the green-screen field grid as Edit controls.

Fiserv / FIS

Banking core. Cascade absorbs the quarterly UI patches without selector rewrites.

Epic Hyperspace

Hospital release notes routinely reorder panels. Name-based primitives survive.

Cerner

Same drift surface as Epic. Workflow-level fallback handles modal popups.

Mainframe (3270 / 5250)

Reflection and PuTTY publish controls. Cascade is overkill, primitive match almost always wins.

Legacy Win32

1990s VB6, MFC, .NET WinForms. ClassName plus Name handles disambiguation.

What the workflow file looks like after a drift event

Nothing changes automatically. The workflow file in git stays as it was recorded; the cascade and the workflow-level fallback absorb the drift at runtime. When a break survives both layers, the run is suspended and surfaced to an engineer with the failing step id, the failing selector, and the live tree snapshot at the moment of failure. The engineer opens the same TypeScript file in their editor, makes a one-line change (either a different selector or a new on_error annotation pointing at a fresh recovery step), commits, and the workflow ships under the same release process as any application code. The model is not in this loop. The point of three deterministic layers is that the model never has to be.

Watch the cascade run on your own desktop estate

A 30 minute call. Bring one workflow you have lost to drift in the last year and we will walk through which layer of the cascade absorbs it.

Frequently asked questions

What is UI drift in RPA and why does it break traditional bots?

UI drift is the normal lifecycle change of an enterprise application: a support pack renames a field, a theme update swaps the color palette, a layout patch moves a panel from one tab to another, an upgrade churns internally generated control ids. Selector-based RPA platforms like UiPath, Automation Anywhere, Power Automate, and Blue Prism resolve elements by recording an XPath, a CSS selector, or a property bag at design time and replaying it byte for byte at runtime. Any drift that touches the recorded path breaks the bot. Industry analyses put maintenance at 30 to 50 percent of an RPA program's total cost of ownership for exactly this reason.

How does the accessibility tree absorb UI drift that selector-based RPA cannot?

The Windows accessibility tree (UIA) publishes between five and eight independent properties per visible control: Name, ControlType (Role), AutomationId, ClassName, displayed Text, HelpText, BoundingRectangle, and a property bag of arbitrary UIA attributes. Each property is a different identity surface, and most drift modes only touch one. A label rewrite changes Name but leaves AutomationId untouched. A reskin changes pixels but leaves the tree untouched. An AutomationId churn changes the id but leaves Name and Role untouched. Mediar's Terminator SDK exposes a selector primitive for each property in node_modules/@mediar-ai/terminator/src/selector.rs (an MIT-licensed file at github.com/mediar-ai/terminator) and the runtime walks a cascade of them so that most patches are absorbed by the next surviving primitive without any human intervention.

Is this AI healing like UiPath's Healing Agent?

No, and the distinction is the point. UiPath's Healing Agent invokes a large language model at runtime, on every selector break, to propose a replacement selector against the live tree. That works in demos and stalls in production for three reasons: an LLM call per break costs tokens that scale with queue volume, latency stacks at 1 to 30 seconds per fix, and the same break can yield different selectors on different runs, which kills the audit trail any regulated workflow needs. Mediar's runtime has zero LLM calls in the hot path. You can grep the executor (crates/executor/src/services/typescript_executor.rs, 871 lines) for 'openai', 'anthropic', 'gemini', or 'claude' and find zero matches. The model only runs at authoring time, when a human walks the workflow once and a code file is emitted to git. Drift is handled by deterministic code, not by stochastic inference.

What does the actual cascade look like?

When the recorded primitive fails to resolve, the runtime walks a four-tier cascade defined in apps/desktop/src-tauri/src/focus_state.rs lines 168 to 196. Tier 1 retries by AutomationId, the identifier the application's UIA provider assigns to the control, which survives label rewrites and theme changes. Tier 2 retries by window plus bounds, which catches a renamed control that did not move. Tier 3 retries by visible text, the human-readable label, which catches a control that did move but is still the same field a person would point to. Tier 4 focuses the parent window so the next event in the workflow has a chance to fire against the right surface and flags the step for review. A break that survives all four tiers never types into the wrong field. It halts the run and routes to a human.

What about drift the per-element cascade cannot absorb?

A second drift layer lives at the workflow level. WorkflowStep in crates/executor/src/models/workflow.rs (lines 40 to 60) carries an on_error: Option<ErrorStrategy> and a fallback_id: Option<String>. ErrorStrategy is an enum with four variants: Stop, Continue, Retry, Fallback. When a step is annotated on_error: Fallback with fallback_id pointing at another step in the sequence, a failure at the recorded step jumps to the recovery step then resumes the main path. The wiring is at crates/executor/src/mcp/executor.rs lines 162 to 261. This is how Mediar absorbs structural drift the per-element cascade cannot: a new confirmation dialog after Save (run the dismiss-modal step, retry Save), a control type change from Edit to ComboBox (run the alternative selector step), a screen that has been split across two windows (run the window-switch step).

What is the maintenance cost of a Mediar workflow versus a UiPath workflow on the same desktop app?

Two real comparisons we have measured on customer estates. First, an LG F&B chain running SAP B1 moved from UiPath to Mediar; their CFO reported a 70 percent reduction in total automation cost to the board, and the line items were RPA license cost plus maintenance hours per workflow per quarter. Second, a regional bank doing onboarding across Jack Henry green-screens cut their per-workflow ramp from 8 weeks to 2 weeks, mostly because the accessibility tree absorbed the patches that previously required a human-in-the-loop selector rebuild every release. The pattern is consistent: the LLM-at-authoring-time, deterministic-cascade-at-runtime split moves drift handling from a recurring operational cost to a one-time authoring cost.

Does this still work on legacy desktop apps with no real accessibility support?

Most enterprise desktop apps have richer accessibility surfaces than their UI suggests because the surface is added at the OS layer, not by the app vendor. SAP GUI, Oracle Forms, Jack Henry, Fiserv, FIS, Epic Hyperspace, Cerner, and most mainframe terminal emulators all publish controls through Windows UIA today. The places this approach genuinely struggles are vintage apps that paint directly to an HDC instead of using standard controls (rare in regulated workloads because it breaks screen readers too), and a small set of poorly bridged Java AWT clients. For those, Mediar hybridizes: read the tree where it is rich, fall back to OCR plus pixel match for the controls the tree misses, and surface that decision in the workflow file so a reviewer sees where the agent is on shakier ground.

Can I see the cascade in action without a sales call?

Yes. The Terminator SDK is MIT-licensed at github.com/mediar-ai/terminator. Clone it, install the npm package @mediar-ai/terminator, write a Selector chain in TypeScript, point it at any app on your Windows machine. The selector primitives, the locator, the cascade behavior, and the workflow executor are all in that repo. The cloud product wraps the same code with a Postgres queue, SOC 2 Type II controls, an authoring pipeline, and billing at $0.75 per minute of executor runtime. The drift behavior of the cloud and the open source SDK is identical because both call into the same Rust crates.

What happens to the recorded workflow file when the UI drifts?

Nothing automatically. The file in git stays as it was recorded; the runtime absorbs the drift via the cascade and the workflow-level fallback. When a break flags for review, an engineer opens the same workflow file in their editor and either edits the failing selector or adds an on_error: Fallback annotation with a new recovery step. The diff is reviewable by a human, the commit is auditable, the change rolls out under the same release process as application code. This is the inversion of the UiPath model where the workflow is a binary blob in a vendor format and the Healing Agent rewrites it in place; here the workflow is plain TypeScript and the engineer is in the loop on any structural change.

On the same architecture

Adjacent reads

Argument

AI agents on legacy desktop systems with no API

The broader argument: why the accessibility tree is the API, what the format looks like that the model sees, and why pixel-vision agents are the wrong default.

Reference

OS-level accessibility automation for enterprise

A reference on the selector primitives, scope primitives, and the UIA properties they map onto. Companion to this page.

Architecture

AI agents replacing UiPath RPA: the boundary line

The shape of a deterministic workflow file, the grep that distinguishes runtime-LLM from authoring-LLM architectures, and where each is the right default.
