How RPA automation actually runs
Most explanations of RPA stop at “software robots that click through your apps.” That tells you what it looks like, not what runs. This page opens the engine: what an automation is as a data structure, how a single step executes, and why a per-step failure rule is the difference between an automation that self-heals and one that stops the first time a screen moves.
Direct answer · verified 2026-06-18
Automating a task with RPA means a software robot runs a saved sequence of interface actions against the same apps a person uses. The durable version of this reads the operating system’s accessibility tree (the layer screen readers use) instead of matching pixels or recorded selectors, and every step carries its own failure rule, so the automation retries or reroutes itself when a UI changes rather than throwing and stopping.
The conventional framing, that a classic RPA bot will not adapt to changes in the systems it interacts with, is true of selector-based tools. It is not a law of RPA. The rest of this page shows why.
An automation is a sequence of steps, not a recording
When people say “the robot does the task,” the robot is walking an ordered list. In Mediar’s executor, the unit of work is a WorkflowSequence: a list of WorkflowStep entries plus the variables and inputs they read from. You can see the exact shape in the Rust source at crates/executor/src/models/workflow.rs.
A single step is small and explicit. It names a tool_name (the action), the arguments that action needs, a timeout, a retry_count, and, crucially, an on_error rule. That last field is where the resilience lives. Take one step that enters an invoice number into an SAP field.
In plain terms, the step says: find the text field whose accessible name is “Vendor invoice no.” and type the invoice number into it. Give it 15 seconds. If it fails, try up to two more times. If it still fails, do not stop the whole run; jump to the reopen_sap_transaction step and recover. Nothing in that step references a pixel coordinate or a recorded XPath. It references the field by what the operating system already calls it.
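As a rough illustration of the step just described, here is a minimal Rust sketch. The field names tool_name, retry_count, and on_error come from the text; everything else (the id and timeout_secs fields, the argument keys, the `{{invoice_number}}` placeholder, and the concrete types) is an assumption made for illustration and may not match the real definitions in crates/executor/src/models/workflow.rs.

```rust
use std::collections::HashMap;

/// The four per-step failure rules named in the article.
#[derive(Debug, Clone, PartialEq)]
enum OnError {
    Stop,
    Continue,
    Retry,
    Fallback { step_id: String },
}

/// Hypothetical shape of one step. Field names follow the prose;
/// the types and the `id` / `timeout_secs` fields are guesses.
#[derive(Debug, Clone)]
struct WorkflowStep {
    id: String,
    tool_name: String,
    arguments: HashMap<String, String>,
    timeout_secs: u64,
    retry_count: u32,
    on_error: OnError,
}

/// Builds the "enter the vendor invoice number" step described in the text.
fn invoice_step() -> WorkflowStep {
    let mut arguments = HashMap::new();
    // The target is addressed by accessibility role and name, not by pixels.
    // These argument keys are illustrative, not a documented schema.
    arguments.insert("role".to_string(), "text field".to_string());
    arguments.insert("name".to_string(), "Vendor invoice no.".to_string());
    // Placeholder for a workflow variable; the template syntax is assumed.
    arguments.insert("text".to_string(), "{{invoice_number}}".to_string());

    WorkflowStep {
        id: "enter_invoice_number".to_string(),
        tool_name: "type_into_field".to_string(),
        arguments,
        timeout_secs: 15, // "Give it 15 seconds."
        retry_count: 2,   // "try up to two more times"
        on_error: OnError::Fallback {
            step_id: "reopen_sap_transaction".to_string(),
        },
    }
}
```

The point of the shape is that the recovery plan travels with the step: anything reading this value knows where to go on failure without consulting a global handler.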
What happens when one step fires
The executor does not poke at the screen directly. It dispatches each step as a tool call over MCP (Model Context Protocol) to the machine running the target apps. That machine reads and acts on the accessibility tree and returns a result. On a failure, the step’s on_error rule decides what happens next.
One step, from dispatch to recovery
Because the target is identified by its accessible name rather than a screenshot or a brittle selector, a cosmetic change to the UI (a moved panel, a restyled button) usually does not change the name. That is the mechanism behind “self-healing”: there is no pixel matcher or recorded coordinate to invalidate in the first place.
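To make the role-plus-name idea concrete, here is a toy sketch in Rust. It models an accessibility tree as a plain in-memory struct and searches it depth-first. The AxNode type, the window and pane names, and both layouts are invented for illustration; a real agent would query the operating system's accessibility API rather than a hand-built tree.

```rust
/// A toy accessibility node: a role, an accessible name, and children.
#[derive(Debug, Clone)]
struct AxNode {
    role: String,
    name: String,
    children: Vec<AxNode>,
}

fn node(role: &str, name: &str, children: Vec<AxNode>) -> AxNode {
    AxNode { role: role.to_string(), name: name.to_string(), children }
}

/// Depth-first search for the first node matching both role and name.
/// Position in the tree is never part of the match.
fn find_by_role_and_name<'a>(root: &'a AxNode, role: &str, name: &str) -> Option<&'a AxNode> {
    if root.role == role && root.name == name {
        return Some(root);
    }
    root.children
        .iter()
        .find_map(|child| find_by_role_and_name(child, role, name))
}

/// Hypothetical layout before a UI update: the field sits in the header pane.
fn layout_before() -> AxNode {
    node("window", "Enter Incoming Invoice", vec![
        node("pane", "Header", vec![node("text field", "Vendor invoice no.", vec![])]),
        node("pane", "Items", vec![]),
    ])
}

/// Hypothetical layout after the update: same field, moved two levels deep.
fn layout_after() -> AxNode {
    node("window", "Enter Incoming Invoice", vec![
        node("pane", "Header", vec![]),
        node("pane", "Details", vec![
            node("pane", "Basic data", vec![node("text field", "Vendor invoice no.", vec![])]),
        ]),
    ])
}
```

The same lookup succeeds against both layouts because the field's path through the tree is not part of the query. The limit is also visible here: if the accessible name itself is changed, the lookup returns None, and that is the case the step's on_error rule exists for.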
The part most RPA tutorials skip: the failure model
Anyone can record a happy path. The hard part of production automation is what happens when something goes wrong, and that is exactly where generic RPA guides go quiet. Mediar splits failures into two categories in crates/executor/src/config/retry.rs.
Infrastructure failure
The VM is down, the connection was refused, a gateway returned 502. The work itself is fine; the plumbing hiccuped. These are retried automatically with exponential backoff.
Workflow-logic failure
A business rule did not hold, a validation failed, a value was missing. Retrying would just repeat a wrong answer, so these are surfaced rather than silently retried.
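The two-bucket split can be sketched as a small classification function. This is an illustration of the idea only: the Failure variants below are taken from the examples in the text, while the type names and the classify function are assumptions, not the contents of crates/executor/src/config/retry.rs.

```rust
/// Example failures, drawn from the cases the article lists.
#[derive(Debug, Clone, PartialEq)]
enum Failure {
    VmDown,
    ConnectionRefused,
    BadGateway, // HTTP 502
    BusinessRuleViolated(String),
    ValidationFailed(String),
    MissingValue(String),
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum FailureClass {
    /// The plumbing hiccuped; the work itself is fine.
    Infrastructure,
    /// Retrying would only repeat a wrong answer.
    WorkflowLogic,
}

fn classify(failure: &Failure) -> FailureClass {
    match failure {
        Failure::VmDown | Failure::ConnectionRefused | Failure::BadGateway => {
            FailureClass::Infrastructure
        }
        Failure::BusinessRuleViolated(_)
        | Failure::ValidationFailed(_)
        | Failure::MissingValue(_) => FailureClass::WorkflowLogic,
    }
}

/// Only infrastructure failures are retried automatically;
/// workflow-logic failures are surfaced instead.
fn should_auto_retry(failure: &Failure) -> bool {
    classify(failure) == FailureClass::Infrastructure
}
```

Because the match is exhaustive, adding a new failure variant forces a decision about which bucket it belongs to at compile time.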
For the infrastructure bucket, the default retry policy is concrete and lives in the same file: up to three retries, starting at a 30-second delay, doubling each time, capped at 600 seconds.
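The default numbers this page gives for infrastructure retries (up to three retries, a 30-second first delay, doubling each time, a 600-second cap) can be written as a small backoff calculation. The struct, its field names, and the delay_before_retry method below are assumptions for illustration; only the four numbers come from the article.

```rust
use std::time::Duration;

/// Hypothetical retry policy; the default values are the ones the article states.
#[derive(Debug, Clone)]
struct RetryPolicy {
    max_retries: u32,
    initial_delay: Duration,
    multiplier: u32,
    max_delay: Duration,
}

impl Default for RetryPolicy {
    fn default() -> Self {
        RetryPolicy {
            max_retries: 3,
            initial_delay: Duration::from_secs(30),
            multiplier: 2,
            max_delay: Duration::from_secs(600),
        }
    }
}

impl RetryPolicy {
    /// Delay to wait before retry number `retry` (1-based).
    /// Returns None for retry 0 or once the retry budget is exhausted.
    fn delay_before_retry(&self, retry: u32) -> Option<Duration> {
        if retry == 0 || retry > self.max_retries {
            return None;
        }
        // initial_delay * multiplier^(retry - 1), saturating instead of overflowing.
        let factor = self.multiplier.saturating_pow(retry - 1);
        let delay = self.initial_delay.saturating_mul(factor);
        // Never wait longer than the cap.
        Some(delay.min(self.max_delay))
    }
}
```

With the defaults, the three retries wait 30, 60, and 120 seconds, so the 600-second cap never engages. It only matters if max_retries is raised: a sixth retry would compute 960 seconds and be clamped to 600.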
On top of that engine-level retry, every step you author has its own on_error choice from four options: stop the run, continue past the failed step, retry the step a set number of times, or fall back to a named recovery step. A traditional bot gives you one global try/catch that a developer has to wire up. Here the failure plan is part of each step’s definition.
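One way to picture how a per-step rule drives control flow is a small executor loop. The sketch below is an assumption-heavy illustration, not Mediar's executor: it treats retry_count as extra attempts made before on_error is consulted, treats Retry as a stop once those attempts are used up, resumes normal order from the fallback step, and adds a dispatch cap so a fallback that never helps cannot loop forever.

```rust
#[derive(Debug, Clone, PartialEq)]
enum OnError {
    Stop,
    Continue,
    Retry,
    Fallback { step_id: String },
}

#[derive(Debug, Clone)]
struct Step {
    id: String,
    retry_count: u32,
    on_error: OnError,
}

#[derive(Debug, PartialEq)]
enum RunOutcome {
    Completed,
    Stopped { at: String },
}

/// Safety cap so a failing step and its fallback cannot cycle forever.
const MAX_DISPATCHES: usize = 1_000;

/// Walks the steps in order, calling `execute` for each dispatch.
fn run<F>(steps: &[Step], mut execute: F) -> RunOutcome
where
    F: FnMut(&Step) -> Result<(), String>,
{
    let mut dispatches = 0usize;
    let mut i = 0usize;
    while i < steps.len() {
        let step = &steps[i];
        let mut succeeded = false;
        // One initial attempt plus `retry_count` retries.
        for _ in 0..=step.retry_count {
            dispatches += 1;
            if dispatches > MAX_DISPATCHES {
                return RunOutcome::Stopped { at: step.id.clone() };
            }
            if execute(step).is_ok() {
                succeeded = true;
                break;
            }
        }
        if succeeded {
            i += 1;
            continue;
        }
        // Attempts exhausted: the step's own rule decides what happens next.
        match &step.on_error {
            OnError::Continue => i += 1,
            OnError::Fallback { step_id } => {
                match steps.iter().position(|s| s.id == *step_id) {
                    Some(target) => i = target, // resume from the recovery step
                    None => return RunOutcome::Stopped { at: step.id.clone() },
                }
            }
            OnError::Stop | OnError::Retry => {
                return RunOutcome::Stopped { at: step.id.clone() }
            }
        }
    }
    RunOutcome::Completed
}
```

In this sketch, a step that keeps failing and falls back to an earlier recovery step causes the run to replay from that recovery step and reach the failing step again. Whether a real executor replays or jumps back afterwards is a design decision this example does not settle.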
Selector-based RPA vs accessibility-tree automation
| Feature | Classic RPA | Mediar |
|---|---|---|
| How a target field is found | Recorded selector or pixel match against a screenshot | Read live from the OS accessibility tree by role + name |
| What happens when a label or layout changes | Selector misses, the activity throws, the run stops | Step matches on the stable accessibility name; if it still misses, the step's on_error rule retries or reroutes |
| Per-step failure handling | Global try/catch authored by a developer in the studio | Each step declares on_error: stop, continue, retry, or fallback |
| Auto vs manual recovery | Failed run lands in a queue, a developer reopens it | Engine classifies the error as infrastructure (auto-retried) or workflow logic (surfaced) |
| Legacy desktop apps with no API | Often the breaking point: SAP GUI, mainframes, banking core | Exactly where the accessibility-tree approach is designed to run |
| Time to a deployed automation | Weeks to months of authoring and selector tuning | Watch the workflow once, run it in days |
Classic RPA here means the selector-and-recording model used by most traditional desktop RPA tools. Specific products differ in detail.
“We moved an LG-customer F&B chain from UiPath to Mediar; their CFO told the board they are now saving 70% on costs.”
Mediar deployment, SAP Business One
The engine design is not academic. The payoff shows up as time and money on real desktop workloads. At one mid-market insurance carrier, claims intake went from 30 minutes per claim to 2 minutes, which their own AP-team headcount math put at roughly $750K a year. A regional bank cut onboarding from 8 weeks to 2 weeks. A regional healthcare group saved about $210K a year on patient intake. None of those run on a clean API; they run on SAP GUI, banking core screens, and EHR clients, the legacy desktop layer where an accessibility-tree agent works and a browser-only agent cannot reach.
When RPA automation is the wrong tool
Automating through the UI is the right move when there is no clean way in underneath it. If a system already exposes a stable API, call the API; it will be faster and steadier than driving a screen. And if the workflow is a brand-new browser-only SaaS flow, a browser-based agent may fit better than a desktop one. Mediar is built for the opposite case: the no-API legacy desktop systems where teams have either watched a UiPath project stall or are staring at a six-figure implementation quote. That is where reading the accessibility tree earns its place.
See your own workflow run on the accessibility tree
Bring one repetitive desktop task on SAP, a banking core, or an EHR. We will show you what the step sequence and failure plan look like for it.
RPA automation, answered
What does it mean to automate something with RPA?
It means a software robot runs a saved sequence of interface actions (click, type, read, copy) against the same desktop or web apps a person uses, so a rule-based, repetitive task runs without a human driving the keyboard. The robot does not call an API; it operates the UI the way a worker would.
How does RPA automation actually execute step by step?
Mediar represents a workflow as a sequence of steps. Each step names a tool (like type_into_field or click_element), the arguments it needs, a timeout, a retry count, and an on_error rule. The executor walks the steps one at a time, dispatches each one against the target machine through MCP, and reads results back. You can see the shape in the executor crate at crates/executor/src/models/workflow.rs.
Why do traditional RPA bots break when a screen changes?
Most RPA tools find a target by a recorded selector or by matching pixels in a screenshot. When a label is renamed, a control id changes, or a panel is reordered, the selector no longer matches, the activity throws, and the run stops. As the Wikipedia entry on RPA puts it, a classic RPA tool will not adapt to changes in the systems it interacts with.
What is the accessibility tree and why does it matter here?
Most Windows apps expose their controls through OS-level accessibility APIs, the same interfaces screen readers use. Each control has a role (button, text field, list item) and an accessible name. Reading targets from that tree is more stable than matching pixels because the accessible name usually survives cosmetic UI changes, which is what lets an automation self-heal instead of breaking.
What happens automatically when a step fails?
The engine classifies the error. Infrastructure failures (a VM is down, the connection was refused, a gateway returned 502) are retried automatically with exponential backoff: by default, up to three retries, starting at a 30-second delay, doubling each time, capped at 600 seconds. Workflow-logic failures (a business rule did not hold, a validation failed) are not silently retried; they surface so a human can decide.
Does Mediar work on apps that have no API, like SAP GUI or a mainframe?
Yes, that is the design point. Because Mediar reads and drives the accessibility tree rather than calling an API, it runs on legacy Windows desktop systems where browser-only AI agents cannot help: SAP GUI, Oracle EBS, Jack Henry, Fiserv, FIS, Epic, Cerner, and mainframe terminals.
When is RPA automation not the right tool?
If your data lives in a modern web app with a clean API, call the API. And if a workflow is a brand-new browser-only SaaS flow, a browser-based agent may be a better fit. Mediar's edge is the legacy desktop layer, the no-API systems where every other approach stalls.