UiPath alternative, evaluated honestly

Most vendors selling AI agents on accessibility APIs are still selector-based underneath. Here is the test.

Every page on this topic ships the same line: we use accessibility APIs, not brittle selectors. None of them point at the runtime file. That is the gap. A real accessibility-API agent has a runtime you can read, a recording layer that captures intent (not pixels), and zero model calls in the replay loop. Five questions below, each with a source file that answers it.

Matthew Diakonov · 6 min read

Direct answer (verified 2026-05-01)

The category is real but most claimants are not. A genuine UiPath alternative built on accessibility APIs and AI agents has three load-bearing properties: a runtime that calls a named accessibility binding (Microsoft UI Automation on Windows, AT-SPI on Linux, NSAccessibility on macOS), a recording layer that normalizes input into intent events at the tree level, and zero LLM calls in the replay loop because the model ran once at authoring time.

Mediar fits the test. The runtime is open at github.com/mediar-ai/terminator, the executor is crates/executor/src/services/typescript_executor.rs at 871 lines, and the productized brief is at www.mediar.ai/llms.txt.

Why the accessibility-API claim collapses without source

Microsoft UI Automation is the same API surface a screen reader uses on Windows. Apps that ship in regulated environments (banking cores, EHRs, ERPs) expose UIA elements because they have to, for accessibility compliance. The surface is stable across UI patches for the same reason: assistive technology has to keep working when a label moves or a panel reorders.

UiPath uses UIA too. The classic Robot reads UIA properties at authoring time, bakes them into one XPath-style selector, and resolves it at replay. A miss on any predicate (an aaname rename, an automation id change, a class refactor) and the activity throws. That is the fragility the buyer feels when their UiPath estate stalls every Patch Tuesday.

An accessibility-API agent does not bake the selector. It reads the live tree on every step, walks a multi-strategy match cascade from the focused element outward, and falls through to the next strategy when one misses. Most UI patches get absorbed silently. That is the property worth paying for, and the property a vendor either has in their runtime source or does not.

Five questions for any UiPath alternative claiming AI agents on accessibility APIs

1. What is the runtime artifact, and how big is it?

The runtime is the program that drives Windows after the model is done thinking. If a vendor cannot point you at one specific file (or one specific binary) and tell you how many lines of code it is, the model is in the runtime, and you are buying a chatbot wrapper. Mediar's runtime is crates/executor/src/services/typescript_executor.rs, exactly 871 lines, in the open-source Terminator monorepo.
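You can run this check yourself. A minimal sketch, assuming git, a POSIX shell, and the repo URL and file path cited above:

```sh
# Clone the open-source runtime and count the executor's lines
# (the article cites 871).
git clone https://github.com/mediar-ai/terminator
cd terminator
wc -l crates/executor/src/services/typescript_executor.rs
```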

2. Which accessibility API does the executor actually call?

Saying 'accessibility APIs' is not enough. The Windows answer is Microsoft UI Automation (UIA, the IUIAutomation COM interface). The Linux answer is AT-SPI. The macOS answer is the Accessibility Framework via NSAccessibility. Mediar uses Microsoft UI Automation through the uiautomation Rust crate (Cargo.lock has it at version 0.x; grep the lockfile yourself). UiPath also uses UIA, but at a different layer: it bakes UIA properties into XPath-style selectors at authoring time and resolves them at replay. The agent reads the live tree on every step.
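To confirm the binding rather than take the phrasing on faith, check the lockfile and the imports directly. A sketch assuming a local clone and the paths this article cites:

```sh
# The dependency should appear as a concrete package in the lockfile...
grep -n -A 1 'name = "uiautomation"' Cargo.lock

# ...and as a real import in the source, not just a README claim.
grep -rn 'use uiautomation' crates/ apps/desktop/src-tauri/src | head
```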

3. What are the recorded events that go into a workflow?

If the recorder captures every mouse move and keystroke, you get pixel matchers in disguise. If it captures intent at the accessibility layer, you get a workflow that survives a UI patch. Mediar's recorder normalizes input into six event types in apps/desktop/src-tauri/src/recording_processor.rs: button_click, browser_click, text_input_completed, browser_tab_navigation, application_switch, file_opened. Everything else is filtered as noise before authoring.
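To verify the event list rather than trust the prose, a grep against the cited file should surface all six names; a sketch assuming a local clone:

```sh
# All six normalized intent events should appear as literals in the
# recording processor.
grep -nE 'button_click|browser_click|text_input_completed|browser_tab_navigation|application_switch|file_opened' \
  apps/desktop/src-tauri/src/recording_processor.rs
```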

4. What happens when the UI changes after the workflow ships?

Ask: when a button moves, an automation id renames, or a panel reorders, what specifically does the runtime do? Vague answers ('our AI handles it') mean nothing. Mediar's answer is a four-strategy match cascade in apps/desktop/src-tauri/src/focus_state.rs lines 168-196: try automation/accessibility id, then window plus bounds, then visible text, then parent window focus. Most enterprise UI patches are absorbed by one of the first three strategies. UiPath Studio resolves a single recorded selector and throws on miss.
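The cascade is short enough to read in one screen. Assuming a local clone and the line range cited above:

```sh
# Print the four-strategy cascade the article cites and check the order
# yourself: id, then window plus bounds, then visible text, then window focus.
sed -n '168,196p' apps/desktop/src-tauri/src/focus_state.rs
```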

5. How many LLM calls does the runtime make per step?

This is the question buyers skip and procurement should not. An LLM call per UI action stacks 30 to 60 seconds of latency per step, makes audit impossible (the action set is non-deterministic on every run), and bills per token. Mediar's answer is zero. The model runs once at authoring time to convert a recording into a TypeScript workflow file. The executor replays that file with no provider call. Grep the executor for gemini, openai, claude, or anthropic and you get an empty result.

The grep test, which the right vendor will not flinch at

If a vendor says their runtime has no LLM in the loop, the test is a one-liner. Clone the repo, count the lines of the executor, and grep it for provider names. The result for a real accessibility-API agent is zero. The result for a vendor whose runtime calls a model on every step is a non-empty grep with request bodies, JSON parsers, and provider keys.
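Concretely, the test this paragraph describes is two commands (paths as cited above, assuming a local clone); the expected count for a model-free replay loop is zero:

```sh
# Line count of the runtime artifact.
wc -l crates/executor/src/services/typescript_executor.rs

# Case-insensitive count of lines naming an LLM provider. Expected: 0.
grep -ciE 'gemini|openai|claude|anthropic' \
  crates/executor/src/services/typescript_executor.rs
```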

The grep test against the open-source runtime: 0 LLM calls.

The four-strategy match cascade lives in apps/desktop/src-tauri/src/focus_state.rs lines 168 to 196. Strategy 1 reads automation or accessibility id, Strategy 2 reads window plus bounds, Strategy 3 reads visible text, Strategy 4 falls back to window focus. The 871-line executor at crates/executor/src/services/typescript_executor.rs has zero LLM provider calls.

typescript_executor.rs at runtime, github.com/mediar-ai/terminator

Side by side, on the parts that matter

This is not a feature checklist. It is the architectural difference between a UiPath classic deployment and an accessibility-API agent on the same Windows estate, scoped to the properties that decide whether the workflow holds in production.

| Feature | UiPath classic | Mediar agent |
| --- | --- | --- |
| Where the accessibility tree is read | At authoring time. Studio captures one snapshot, bakes a selector, and resolves it at replay. | On every step. The executor walks the live UIA tree from the focused element outward. |
| Selector strategy on UI change | Single XPath-style selector. A miss on any predicate (aaname, cls, id) means the activity throws. | Four-strategy cascade in focus_state.rs:168-196: id, then window plus bounds, then text, then window focus. |
| Authoring artifact | .xaml workflow file in Studio, published as a NuGet package to Orchestrator. | TypeScript workflow file emitted by Gemini Vertex AI from a recording, stored in git, queued in Postgres. |
| Runtime LLM calls | None in classic Robot. Agent Builder adds them as an optional layer. | Zero. Grep crates/executor/src/services/typescript_executor.rs for any provider name and the count is 0. |
| Recorded event types | Pixel-aware Click, Type Into, and Get Text activities; selectors include screen coordinates as a fallback. | Six normalized events in recording_processor.rs: button_click, browser_click, text_input_completed, browser_tab_navigation, application_switch, file_opened. |
| Browser plus desktop unification | Different runtimes for browser (UiPath.WebAutomation) and desktop (UIA). | One executor. Browser clicks fall back to UIA via mcp_converter.rs (run_command with browser script plus UIA fallback). |
| Where the source lives | Closed source. Selectors are visible in Studio; the resolver is not. | Open source at github.com/mediar-ai/terminator. The executor crate, the recorder, and the match cascade are all readable. |

This comparison is scoped to Windows desktop workflows on legacy systems (SAP GUI, banking cores, EHRs). Browser-only flows on modern SaaS sit in a different bucket, where API-first integration beats both.

What this means if your UiPath estate has stalled

The honest path is parallel. Stable workflows that have run unchanged for eighteen months are paying for themselves; touching them is churn. The workflows worth moving are the ones where selector maintenance has stalled development for more than three months, or where the queue of new automation requests is permanently longer than the team that ships them. We see this most often at mid-market banks on Jack Henry, Fiserv, or FIS, at insurance carriers on claims intake, and at F&B chains on SAP B1 with high transaction volume.

For those workflows, the accessibility-API-agent replacement is not a UiPath feature match. It is a different architecture with the model at authoring and plain code at runtime. Read the executor source before you sign anything. If a vendor cannot show you the equivalent file in their stack, you are buying a chatbot that clicks buttons.

Bring one stalled UiPath workflow on the call. We will show the runtime file.

A 30-minute walk through the recorder, the authoring pipeline, and the 871-line executor against your real environment. No slides; the actual artifact that ships.

Frequently asked questions

Is 'accessibility tree' the same thing as 'accessibility API' in this context?

Yes. The accessibility API (Microsoft UI Automation on Windows, AT-SPI on Linux, NSAccessibility on macOS) is the binding; the accessibility tree is the live data structure that binding exposes (a tree of elements with role, name, automation id, bounds, parent, children). Vendors using the phrase 'accessibility tree agents' and the phrase 'accessibility API agents' are describing the same architectural pattern: an agent that walks the live tree on every step rather than baking a selector at authoring time. The five-question test above applies to both phrasings.

Why does it matter whether a UiPath alternative actually uses the accessibility API?

Because the alternative is doing one of three things under the hood: pixel matching (computer vision), accessibility-tree reading (UIA on Windows, AT-SPI on Linux, NSAccessibility on macOS), or a hybrid. Pixel matching breaks on screen resolution changes, dark-mode toggles, and font rendering shifts. Accessibility-tree reading is what screen readers use, which means the API surface is stable across UI patches because the same surface has to keep working for assistive technology. If a vendor markets accessibility APIs and ships pixel matching, the workflow will fail in production for the same reasons UiPath workflows fail. The answer is to ask for the source.

Doesn't UiPath also use Microsoft UI Automation?

Yes, but at a different layer. UiPath Studio reads UIA properties (automationid, name, role, class) at authoring time and bakes them into a single XPath-style selector that the Robot resolves at replay. If any predicate misses (a label rename, a class change, a panel reorder), the activity throws. An accessibility-API agent reads the live tree on every step from the currently focused element outward, with a multi-strategy match cascade that tolerates predicate misses. Same API, different replay model. The cost difference at scale is the maintenance hours that selector-based RPA charges back to its developers every month.

How do I tell if a vendor's AI agent really runs on accessibility APIs?

One test: ask for the file path of the runtime that drives the OS, then ask which crate or library that file imports for accessibility. On Windows that should be a UI Automation binding (the uiautomation Rust crate, the System.Windows.Automation .NET namespace, or a wrapper around IUIAutomation directly). On Linux, AT-SPI through pyatspi or libatspi. On macOS, Accessibility through PyObjC or a Swift wrapper. If the answer is a vision model, a screenshot pipeline, or 'we abstract that' without naming a binding, the agent is not reading the accessibility tree. Mediar's binding is the uiautomation Rust crate; the importing files are in apps/desktop/src-tauri/src and crates/executor/src.
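As a vendor-neutral spot check, the same grep works against whatever repo a vendor hands you. A sketch shown here against the Mediar paths named above; the per-OS binding names are the ones this article lists, so adapt them to the stack you are evaluating:

```sh
# Look for a named accessibility binding in the runtime's imports.
# Windows: uiautomation / System.Windows.Automation; Linux: atspi;
# macOS: NSAccessibility. An empty result means no tree reading.
grep -rniE 'uiautomation|System\.Windows\.Automation|atspi|NSAccessibility' \
  crates/executor/src apps/desktop/src-tauri/src
```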

What recording layer should an accessibility-API agent ship with?

Recording is where the agent learns intent. A good recorder filters mouse moves, raw keystrokes, and pixel coordinates. It captures normalized intent events at the accessibility layer. Mediar's recorder produces six events: button_click and browser_click (clicks normalized across UIA and browser DOM), text_input_completed (final value after focus loss, not every keystroke), browser_tab_navigation, application_switch, and file_opened. The list is in apps/desktop/src-tauri/src/recording_processor.rs around line 250. If a vendor records every mouse move, the workflow will be a pixel chain underneath whatever marketing is on top.
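To see the list in context rather than trust the summary, print the region the article points at; "around line 250" is the article's own approximation, so the window below is widened on both sides:

```sh
# Show the event-type definitions in the recorder.
sed -n '235,265p' apps/desktop/src-tauri/src/recording_processor.rs
```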

Does this approach work with SAP GUI, Jack Henry, Fiserv, and other legacy desktop systems?

Yes, and that is the whole point. Browser-only AI agents fail on these systems because there is no DOM. Vision-based agents fail because the screens are dense, low-contrast, and nondeterministic in layout. SAP GUI, Jack Henry, Fiserv, FIS, Epic, Cerner, Oracle EBS, and mainframe terminals all expose UIA elements (because they have to, for accessibility compliance), even when they look like opaque green screens. The Terminator SDK reads those elements directly. That is why Mediar exists for legacy desktop estates rather than for new SaaS workflows.

Is the runtime really 871 lines? What does the rest of the system look like?

The 871 lines are the executor service that pulls workflows off a queue and calls MCP execute_sequence. The MCP server is a separate service that wraps the Terminator SDK; the SDK itself is a published Rust crate (terminator-rs). The recorder is a Tauri desktop app at apps/desktop with a recorder module (apps/desktop/src-tauri/src/workflow_recorder.rs) and a tree-capture module (apps/desktop/src-tauri/src/ui_tree_capture.rs). The authoring layer (recording to TypeScript workflow file) runs in modal_apps and uses Gemini Vertex AI. The point is that the model is in the authoring path, not in the runtime path.
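The component layout is checkable with a single listing; file names as cited in the answer above, assuming a local clone:

```sh
# The cited components of the system, outside the 871-line executor.
ls -l apps/desktop/src-tauri/src/workflow_recorder.rs \
      apps/desktop/src-tauri/src/ui_tree_capture.rs \
      crates/executor/src/services/typescript_executor.rs
```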

What does a Mediar pilot look like compared to a UiPath pilot?

A UiPath pilot starts with selector authoring in Studio. A certified developer rebuilds the workflow click by click, tunes the selectors against your environment, publishes a NuGet package to Orchestrator, and schedules it. Three to six weeks for a moderately complex workflow is normal. A Mediar pilot starts with a recording. An ops analyst opens the desktop app, hits record, and walks the workflow once on a real environment. The authoring layer emits a TypeScript file, the executor replays it, and the result lands in the same Postgres queue your team can inspect. Days to first run instead of weeks.

What does this cost vs UiPath?

UiPath public list pricing for an Attended Robot starts at about $420 per month, plus per-seat Studio developer fees and a separate Orchestrator license tier. Enterprise contracts at Center-of-Excellence scale commonly land between $100K and $500K-plus for the first year. Mediar bills $0.75 per minute of executor runtime, plus a $10K turn-key program fee that converts to credits with a bonus. A workflow that runs ten minutes a day costs roughly $1,575 a year in runtime ($0.75 per minute × 10 minutes × roughly 210 working days). Mediar's site cites about 20% of UiPath cost at typical mid-enterprise volumes; the savings shrink at extreme concurrency and grow on long-tail workflows that selector-based RPA cannot keep alive.