RPA, at the engine level

How RPA automation actually runs

Most explanations of RPA stop at “software robots that click through your apps.” That tells you what it looks like, not what runs. This page opens the engine: what an automation is as a data structure, how a single step executes, and why a per-step failure rule is the difference between an automation that self-heals and one that stops the first time a screen moves.

Matthew Diakonov, Written with AI

Published June 18, 2026 · 9 min read

Direct answer · verified 2026-06-18

Automating a task with RPA means a software robot runs a saved sequence of interface actions against the same apps a person uses. The durable version of this reads the operating system’s accessibility tree (the layer screen readers use) instead of matching pixels or recorded selectors, and every step carries its own failure rule, so the automation retries or reroutes itself when a UI changes rather than throwing and stopping.

The conventional framing, that a classic RPA bot will not adapt to changes in the systems it interacts with, is true of selector-based tools. It is not a law of RPA. The rest of this page shows why.

An automation is a sequence of steps, not a recording

When people say “the robot does the task,” the robot is walking an ordered list. In Mediar’s executor, the unit of work is a WorkflowSequence: a list of WorkflowStep entries plus the variables and inputs they read from. You can see the exact shape in the Rust source at crates/executor/src/models/workflow.rs.

A single step is small and explicit. It names a tool_name (the action), the arguments that action needs, a timeout, a retry_count, and, crucially, an on_error rule. That last field is where the resilience lives. Here is one step that posts an invoice number into a SAP field:

step.post_invoice.json

Read it in plain terms: find the text field whose accessible name is “Vendor invoice no.” and type the invoice number into it. Give it 15 seconds. If it fails, try up to two more times. If it still fails, do not stop the whole run; jump to the reopen_sap_transaction step and recover. Nothing in that step references a pixel coordinate or a recorded XPath. It references the field by what the operating system already calls it.
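The step described above can be sketched as follows. This is a minimal illustration in Python, not the executor's Rust types. The top-level field names (tool_name, arguments, timeout, retry_count, on_error) come from the text. The id key, the nested argument names, the seconds unit, the {{invoice_number}} placeholder syntax, and the sample value INV-1042 are assumptions made for the example.

```python
import json
from dataclasses import dataclass, field

# Hypothetical JSON for the step described in the text. Only the five
# top-level field names are taken from the article; everything nested
# inside them is an assumed shape.
STEP_JSON = """
{
  "id": "post_invoice",
  "tool_name": "type_into_field",
  "arguments": {
    "role": "text field",
    "name": "Vendor invoice no.",
    "text": "{{invoice_number}}"
  },
  "timeout": 15,
  "retry_count": 2,
  "on_error": {"fallback": "reopen_sap_transaction"}
}
"""


@dataclass
class WorkflowStep:
    id: str            # assumed: a label other steps can jump to
    tool_name: str     # the action to perform
    arguments: dict    # what the action needs
    timeout: int       # assumed to be seconds
    retry_count: int   # extra attempts after the first failure
    on_error: dict     # what to do once retries are exhausted


@dataclass
class WorkflowSequence:
    steps: list                                    # ordered WorkflowStep entries
    variables: dict = field(default_factory=dict)  # values the steps read from


step = WorkflowStep(**json.loads(STEP_JSON))
sequence = WorkflowSequence(steps=[step], variables={"invoice_number": "INV-1042"})
print(step.tool_name, "->", step.arguments["name"], "| on_error:", step.on_error)
```

The point of the shape is that the target is a role plus a name, and the recovery plan travels with the step instead of living in a wrapper around the whole run.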

What happens when one step fires

The executor does not poke at the screen directly. It dispatches each step as a tool call over MCP to the machine running the target apps, which reads and acts on the accessibility tree and returns a result. On a failure, the step’s on_error rule decides what happens next.
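To make the dispatch concrete, here is a rough sketch of one step wrapped as an MCP tool call. The envelope follows the public MCP specification (a JSON-RPC 2.0 request with method tools/call, and a result that may carry isError). How Mediar's executor actually frames and transports its messages is not shown in this article, so the helper names and the sample payloads below are hypothetical.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Wrap one step as an MCP tools/call request (JSON-RPC 2.0)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


def step_failed(response):
    """True for a protocol-level error or a tool-reported error."""
    if "error" in response:
        return True
    return bool(response.get("result", {}).get("isError", False))


request = build_tool_call(
    1, "type_into_field", {"name": "Vendor invoice no.", "text": "INV-1042"}
)
print(json.dumps(request, indent=2))

# Two hypothetical responses from the machine running the target app.
ok = {"jsonrpc": "2.0", "id": 1,
      "result": {"content": [{"type": "text", "text": "typed"}], "isError": False}}
missed = {"jsonrpc": "2.0", "id": 1,
          "result": {"content": [{"type": "text", "text": "element not found"}],
                     "isError": True}}
print(step_failed(ok), step_failed(missed))
```

A failed result is what hands control to the step's on_error rule.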

One step, from dispatch to recovery

Because the target is identified by its accessible name rather than a screenshot or a brittle selector, a cosmetic change to the UI (a moved panel, a restyled button) usually leaves that name unchanged. That is the mechanism behind “self-healing”: there is no pixel matcher or recorded coordinate to invalidate in the first place.
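A toy model shows why a name-based lookup survives a layout change. The tree below is a plain Python structure, not a real operating system accessibility API, and the window and panel names are invented for the example. The only idea taken from the text is that a target is matched by role and accessible name.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    role: str
    name: str
    children: list = field(default_factory=list)


def find(node, role, name):
    """Depth-first search for the first node with this role and name."""
    if node.role == role and node.name == name:
        return node
    for child in node.children:
        hit = find(child, role, name)
        if hit is not None:
            return hit
    return None


# Before a UI update: the field sits directly in the header panel.
before = Node("window", "A/P Invoice", [
    Node("pane", "Header", [Node("text field", "Vendor invoice no.")]),
    Node("pane", "Contents", [Node("button", "Add")]),
])

# After the update: the field moved into a new, deeper panel.
after = Node("window", "A/P Invoice", [
    Node("pane", "Header", []),
    Node("pane", "Contents", [
        Node("button", "Add"),
        Node("pane", "Reference", [Node("text field", "Vendor invoice no.")]),
    ]),
])

for tree in (before, after):
    hit = find(tree, "text field", "Vendor invoice no.")
    print("found" if hit is not None else "missing")
```

The same query matches in both layouts because nothing in it encodes position. A renamed field would still miss, which is the case the step's on_error rule exists for.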

The part most RPA tutorials skip: the failure model

Anyone can record a happy path. The hard part of production automation is what happens when something goes wrong, and that is exactly where generic RPA guides go quiet. Mediar splits failures into two categories in crates/executor/src/config/retry.rs.

Infrastructure failure

The VM is down, the connection was refused, a gateway returned 502. The work itself is fine; the plumbing hiccuped. These are retried automatically with exponential backoff.

Workflow-logic failure

A business rule did not hold, a validation failed, a value was missing. Retrying would just repeat a wrong answer, so these are surfaced rather than silently retried.
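The two categories can be sketched as a small classifier. This is an assumption-heavy illustration: the real rules live in the Rust file named above and are not reproduced here, so the enum, the function names, and the substring markers are hypothetical. Substring matching is a deliberate simplification.

```python
from enum import Enum


class FailureKind(Enum):
    INFRASTRUCTURE = "infrastructure"  # plumbing failed; safe to retry
    WORKFLOW_LOGIC = "workflow_logic"  # the work itself is wrong; surface it


# Hypothetical markers modeled on the article's examples:
# a down VM, a refused connection, a 502 from a gateway.
INFRA_MARKERS = ("connection refused", "502", "vm unavailable")


def classify(error_message):
    """Sort an error message into one of the two failure categories."""
    text = error_message.lower()
    if any(marker in text for marker in INFRA_MARKERS):
        return FailureKind.INFRASTRUCTURE
    return FailureKind.WORKFLOW_LOGIC


def should_auto_retry(error_message):
    """Only infrastructure failures are retried without a human."""
    return classify(error_message) is FailureKind.INFRASTRUCTURE


for message in ("HTTP 502 Bad Gateway",
                "Connection refused by host",
                "validation failed: vendor code missing"):
    print(message, "->", classify(message).value)
```

Defaulting unknown errors to the workflow-logic bucket is a design choice in this sketch: when in doubt, a human sees the failure rather than the engine repeating it.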

For the infrastructure bucket, the default retry policy is concrete and lives in the same file:

4 infrastructure attempts (1 try + 3 retries)

30s initial backoff before first retry

2x backoff multiplier each attempt

600s backoff ceiling (10 min cap)
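The arithmetic behind that policy is short enough to write out. The sketch below uses the defaults this page states elsewhere (three retries, a 30 second initial delay, doubling each time, a 600 second cap). The function name and parameter names are made up for the example, and whether the real engine adds jitter is not stated, so none is modeled.

```python
def backoff_delays(retries=3, initial=30.0, multiplier=2.0, ceiling=600.0):
    """Seconds to wait before each retry: initial * multiplier**n, capped."""
    return [min(initial * multiplier ** n, ceiling) for n in range(retries)]


print(backoff_delays())            # [30.0, 60.0, 120.0]
print(sum(backoff_delays()))       # 210.0
print(backoff_delays(retries=6))   # [30.0, 60.0, 120.0, 240.0, 480.0, 600.0]
```

With the default three retries the cap is never reached; the worst case adds 210 seconds of waiting. The ceiling only matters if the retry count is raised: the sixth delay would be 960 seconds uncapped and is held at 600.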

On top of that engine-level retry, every step you author has its own on_error choice from four options: stop the run, continue past it, retry the step a set number of times, or fall back to a named recovery step. A traditional bot gives you one global try/catch the developer has to wire up. Here the failure plan is part of each step’s definition.
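A small interpreter makes the four options concrete. This is a sketch under stated assumptions, not Mediar's executor: steps are plain dicts, dispatch is a stand-in that returns True on success, retry_count is spent before on_error is consulted, a "retry" rule whose attempts are exhausted behaves like "stop", and a max_jumps guard (invented here) prevents a fallback loop from running forever.

```python
def run(steps, dispatch, max_jumps=10):
    """Walk steps in order and apply each step's on_error rule on failure."""
    index_of = {step["id"]: i for i, step in enumerate(steps)}
    log = []   # (step id, outcome) pairs in execution order
    i = 0
    jumps = 0
    while i < len(steps):
        step = steps[i]
        attempts = 1 + step.get("retry_count", 0)
        # any() stops calling dispatch as soon as one attempt succeeds.
        if any(dispatch(step) for _ in range(attempts)):
            log.append((step["id"], "ok"))
            i += 1
            continue
        rule = step.get("on_error", "stop")
        if isinstance(rule, dict) and "fallback" in rule and jumps < max_jumps:
            log.append((step["id"], "fallback"))
            i = index_of[rule["fallback"]]   # jump to the recovery step
            jumps += 1
        elif rule == "continue":
            log.append((step["id"], "skipped"))
            i += 1
        else:
            # "stop", an exhausted "retry", or too many fallback jumps
            log.append((step["id"], "stopped"))
            return False, log
    return True, log


steps = [
    {"id": "reopen_sap_transaction", "on_error": "stop"},
    {"id": "post_invoice", "retry_count": 2,
     "on_error": {"fallback": "reopen_sap_transaction"}},
    {"id": "save", "on_error": "stop"},
]

calls = {"post_invoice": 0}


def flaky_dispatch(step):
    """Hypothetical tool call: post_invoice fails on its first three calls."""
    if step["id"] != "post_invoice":
        return True
    calls["post_invoice"] += 1
    return calls["post_invoice"] > 3


ok, log = run(steps, flaky_dispatch)
print(ok)
for entry in log:
    print(entry)
```

In this run post_invoice fails three times (one try plus two retries), falls back to reopen_sap_transaction, and succeeds on the next pass, so the run completes without a global try/catch around it.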

Selector-based RPA vs accessibility-tree automation

Feature	Classic RPA	Mediar
How a target field is found	Recorded selector or pixel match against a screenshot	Read live from the OS accessibility tree by role + name
What happens when a label or layout changes	Selector misses, the activity throws, the run stops	Step matches on the stable accessibility name; if it still misses, the step's on_error rule retries or reroutes
Per-step failure handling	Global try/catch authored by a developer in the studio	Each step declares on_error: stop, continue, retry, or fallback
Auto vs manual recovery	Failed run lands in a queue, a developer reopens it	Engine classifies the error as infrastructure (auto-retried) or workflow logic (surfaced)
Legacy desktop apps with no API	Often the breaking point: SAP GUI, mainframes, banking core	Exactly where the accessibility-tree approach is designed to run
Time to a deployed automation	Weeks to months of authoring and selector tuning	Watch the workflow once, run it in days

Classic RPA here means the selector-and-recording model used by most traditional desktop RPA tools. Specific products differ in detail.

70%

“We moved an LG-customer F&B chain from UiPath to Mediar; their CFO told the board they are now saving 70% on costs.”

Mediar deployment, SAP Business One

The engine design is not academic. The payoff shows up as time and money on real desktop workloads. At one mid-market insurance carrier, claims intake went from 30 minutes per claim to 2 minutes, which their own AP-team headcount math put at roughly $750K a year. A regional bank cut onboarding from 8 weeks to 2 weeks. A regional healthcare group saved about $210K a year on patient intake. None of those run on a clean API; they run on SAP GUI, banking core screens, and EHR clients, the legacy desktop layer where an accessibility-tree agent works and a browser-only agent cannot reach.

When RPA automation is the wrong tool

Automating through the UI is the right move when there is no clean way in underneath it. If a system already exposes a stable API, call the API; it will be faster and steadier than driving a screen. And if the workflow is a brand-new browser-only SaaS flow, a browser-based agent may fit better than a desktop one. Mediar is built for the opposite case: the no-API legacy desktop systems where teams have either watched a UiPath project stall or are staring at a six-figure implementation quote. That is where reading the accessibility tree earns its place.

See your own workflow run on the accessibility tree

Bring one repetitive desktop task on SAP, a banking core, or an EHR. We will show you what the step sequence and failure plan look like for it.

RPA automation, answered

What does it mean to automate something with RPA?

It means a software robot runs a saved sequence of interface actions (click, type, read, copy) against the same desktop or web apps a person uses, so a rule-based, repetitive task runs without a human driving the keyboard. The robot does not call an API; it operates the UI the way a worker would.

How does RPA automation actually execute step by step?

Mediar represents a workflow as a sequence of steps. Each step names a tool (like type_into_field or click_element), the arguments it needs, a timeout, a retry count, and an on_error rule. The executor walks the steps one at a time, dispatches each one against the target machine through MCP, and reads results back. You can see the shape in the executor crate at crates/executor/src/models/workflow.rs.

Why do traditional RPA bots break when a screen changes?

Most RPA tools find a target by a recorded selector or by matching pixels in a screenshot. When a label is renamed, a control id changes, or a panel is reordered, the selector no longer matches, the activity throws, and the run stops. As the Wikipedia entry on RPA puts it, a classic RPA tool will not adapt to changes in the systems it interacts with.

What is the accessibility tree and why does it matter here?

Most Windows apps expose their controls through OS-level accessibility APIs, the same interfaces screen readers use. Each control has a role (button, text field, list item) and an accessible name. Reading targets from that tree is more stable than pixels because the accessible name usually survives cosmetic UI changes, which is what lets an automation self-heal instead of breaking.

What happens automatically when a step fails?

The engine classifies the error. Infrastructure failures (a VM is down, the connection was refused, a 502) are retried automatically with exponential backoff: by default that is up to three retries, starting at a 30-second delay, doubling each time, and capped at 600 seconds. Workflow-logic failures (a business rule did not hold, a validation failed) are not silently retried; they surface so a human can decide.

Does Mediar work on apps that have no API, like SAP GUI or a mainframe?

Yes, that is the design point. Because Mediar reads and drives the accessibility tree rather than calling an API, it runs on legacy Windows desktop systems where browser-only AI agents cannot help: SAP GUI, Oracle EBS, Jack Henry, Fiserv, FIS, Epic, Cerner, and mainframe terminals.

When is RPA automation not the right tool?

If your data lives in a modern web app with a clean API, call the API. And if a workflow is a brand-new browser-only SaaS flow, a browser-based agent may be a better fit. Mediar's edge is the legacy desktop layer, the no-API systems where every other approach stalls.