Plain-English explainer

What is RPA? The definition, and the one part every guide skips

Matthew Diakonov, Written with AI

Published June 18, 2026 · 8 min read

Direct answer · verified 2026-06-18

RPA (robotic process automation) is software that runs repetitive, rule-based tasks across applications by mimicking the clicks and keystrokes a person would make in the graphical interface. It follows a fixed set of rules rather than learning, which is what separates classic RPA from AI. People reach for it most when two systems need to exchange data but have no shared API to connect them.

Definition cross-checked against Wikipedia, IBM, and UiPath on 2026-06-18. The rest of this page covers the thing none of those pages explain: how the robot finds the button it clicks, which is what decides whether it keeps working.

The part everyone agrees on

Open any explainer and the definition is nearly identical: a software robot watches what a person does in an app, builds a list of those actions, then repeats them by driving the same graphical interface a human would use. Think of a macro that is not confined to one spreadsheet. It logs into systems, copies a value out of one screen, pastes it into another, clicks Submit, and moves to the next record. It is fast, it does not get bored, and it does not fat-finger a digit at 4pm.

The reason RPA exists at all is the API gap. In a clean world, two systems talk to each other directly through an interface built for machines. In the real world, the system holding the data (a 1990s SAP GUI, a mainframe green-screen, a core-banking terminal) exposes nothing a normal integration can call. The only interface it offers is the one a human stares at. RPA automates that interface. That is the whole pitch, and it is genuinely useful.

Every guide stops here, adds a list of benefits, and ends. But the definition describes what RPA looks like, not what makes one RPA bot survive a year in production while another dies on its first Tuesday. That difference comes down to a single mechanism nobody writes about.

The question every definition skips: how does the robot find the button?

“It clicks the button” hides the only hard problem in RPA. A screen is a grid of pixels. There is no button there, just colored rectangles and text. Before a robot can click anything, it has to decide which thing on screen is the button it wants. How it answers that question determines everything about whether the automation is robust or brittle. There are three ways to do it.

How it locates the element	What it stores	Breaks when
Coordinates	An x/y pixel position to click	The window moves, resizes, or the screen resolution changes
Image / OCR (vision)	A screenshot of the button, or text read off the pixels	Theme, font, scaling, or contrast shifts; slow and fuzzy at volume
Accessibility tree	The element’s role and name, e.g. a button labeled “Post”	Rarely; the label has to actually change, not just move

The third row is the same data a screen reader uses to announce a screen to a blind user. Every well-built Windows app already publishes it through the UI Automation API, whether or not anyone uses it. An RPA bot that reads that tree does not care where the button sits or what it looks like. It asks the operating system, “where is the control whose role is Button and whose name is Post,” and the OS answers with the current position. Move the button, restyle it, bump the resolution: the query still resolves.
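To make the contrast concrete, here is a minimal Rust sketch. The tree types (`Node`, `Rect`) and both lookup functions are hypothetical stand-ins, not the Windows UI Automation API or Terminator's; a real bot gets the tree from the OS. The sketch records the Post button's center pixel, moves the button the way a UI update might, and then resolves the step both ways.

```rust
// Hypothetical, in-memory model of an accessibility tree. Real trees come
// from the OS (e.g. Windows UI Automation); these types are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Rect {
    x: i32,
    y: i32,
    w: i32,
    h: i32,
}

impl Rect {
    fn contains(&self, px: i32, py: i32) -> bool {
        px >= self.x && px < self.x + self.w && py >= self.y && py < self.y + self.h
    }
    fn center(&self) -> (i32, i32) {
        (self.x + self.w / 2, self.y + self.h / 2)
    }
}

struct Node {
    role: &'static str,
    name: &'static str,
    bounds: Rect,
    children: Vec<Node>,
}

// Coordinate replay: return the deepest node under a remembered pixel.
fn hit_test(node: &Node, px: i32, py: i32) -> Option<&Node> {
    if !node.bounds.contains(px, py) {
        return None;
    }
    for child in &node.children {
        if let Some(hit) = hit_test(child, px, py) {
            return Some(hit);
        }
    }
    Some(node)
}

// Accessibility-style lookup: match role and name anywhere, ignoring position.
fn find_by_role_name<'a>(node: &'a Node, role: &str, name: &str) -> Option<&'a Node> {
    if node.role == role && node.name == name {
        return Some(node);
    }
    node.children.iter().find_map(|c| find_by_role_name(c, role, name))
}

fn window_with_post_button(button: Rect) -> Node {
    Node {
        role: "Window",
        name: "Composer",
        bounds: Rect { x: 0, y: 0, w: 800, h: 600 },
        children: vec![Node { role: "Button", name: "Post", bounds: button, children: vec![] }],
    }
}

fn main() {
    // Record time: the Post button sits at (100, 200); remember its center pixel.
    let before = window_with_post_button(Rect { x: 100, y: 200, w: 80, h: 30 });
    let recorded = find_by_role_name(&before, "Button", "Post").unwrap().bounds.center();

    // Replay time: a UI update moved the button elsewhere in the window.
    let after = window_with_post_button(Rect { x: 500, y: 450, w: 80, h: 30 });

    let by_coords = hit_test(&after, recorded.0, recorded.1).map(|n| n.role);
    let by_role_name = find_by_role_name(&after, "Button", "Post").map(|n| n.bounds);

    println!("coordinate locator hit: {:?}", by_coords); // Some("Window"): wrong target
    println!("role/name locator found: {:?}", by_role_name);
}
```

The coordinate replay lands on the window behind where the button used to be, so it would click the wrong thing. The role-and-name query returns the button's new bounds.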

What a robust locator actually looks like

This is not theory. In Mediar’s desktop recorder, the moment you click a control once, the code in apps/desktop/src-tauri/src/mcp_converter.rs reads the accessibility node under your cursor and writes a selector in this exact form:

// from a recorded click, generated at capture time
let element_role = ui_element.role();          // "Button"
let element_name = ui_element.name();           // "Post"

let selector = if !element_name.is_empty() {
    format!("role:{element_role} && text:{element_name}")
    //   ->  role:Button && text:Post
} else {
    format!("role:{element_role}")
};

No coordinate is saved. No screenshot is saved. The unit of memory is role:Button && text:Post, and selectors chain across windows, for example process:chrome >> role:Document. That single design decision is why the same recording survives the UI update that would have killed a coordinate-based or image-based bot. You can read the grammar yourself: the engine is the open-source, MIT-licensed Terminator SDK, described by its authors as “Playwright for Windows.”
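The excerpt above only builds selectors. To show how the grammar reads, here is a self-contained Rust sketch that builds one the same way and splits a selector into scopes (`>>`) and predicates (`&&`). The `Predicate` type and both parsing functions are hypothetical simplifications inferred from the two examples on this page; Terminator's real selector grammar may support more forms than this.

```rust
// Mirrors the excerpt above: role plus name when there is a name, else role only.
fn build_selector(role: &str, name: &str) -> String {
    if name.is_empty() {
        format!("role:{role}")
    } else {
        format!("role:{role} && text:{name}")
    }
}

// One "key:value" condition, e.g. key "role", value "Button".
#[derive(Debug, PartialEq)]
struct Predicate {
    key: String,
    value: String,
}

// Hypothetical: split "key:value" at the first ':' and trim both sides.
fn parse_predicate(pred: &str) -> Result<Predicate, String> {
    let pred = pred.trim();
    let (key, value) = pred
        .split_once(':')
        .ok_or_else(|| format!("missing ':' in {pred:?}"))?;
    Ok(Predicate {
        key: key.trim().to_string(),
        value: value.trim().to_string(),
    })
}

// Hypothetical: ">>" separates scopes from outer to inner (e.g. a process,
// then an element inside it); "&&" joins predicates that must all match.
fn parse_selector(selector: &str) -> Result<Vec<Vec<Predicate>>, String> {
    selector
        .split(">>")
        .map(|scope| {
            scope
                .split("&&")
                .map(parse_predicate)
                .collect::<Result<Vec<_>, String>>()
        })
        .collect()
}

fn main() {
    let sel = build_selector("Button", "Post");
    println!("{sel}"); // role:Button && text:Post
    println!("{:?}", parse_selector(&sel));
    println!("{:?}", parse_selector("process:chrome >> role:Document"));
}
```

Nothing in either structure is a position or an image: every predicate describes what the element is, which is why the stored step stays valid when the layout changes.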

Record: one click becomes a durable selector


Replay: the layout moved, the step still passes


Why this is the real cost of RPA

When teams say their RPA program “stalled,” they almost never mean it failed to build. They mean it failed to stay built. A bot bound to pixel positions or saved images needs a person to repair it every time a vendor ships a release, moves a field, or renames a label. Multiply that across hundreds of automations and the maintenance line item dwarfs the build cost. The locator mechanism you picked on day one is what sets that bill.

70%

“We moved an LG-customer F&B chain from UiPath to Mediar. Their CFO told the board they are now saving 70% on costs, largely because the automations stop needing a babysitter every time a screen changes.”

Mediar deployment, food and beverage chain on SAP Business One

This is also the honest line between approaches. Browser-based AI agents are strong on modern web SaaS. But if your data lives in SAP GUI, an Oracle EBS form, or a Jack Henry green-screen, a browser agent has nothing to grab onto. The accessibility-tree approach exists precisely for the desktop and legacy layer where the others do not reach. For a closer look at why those locators differ, see RPA selectors: Selenium paths vs the accessibility tree.

What happens when an RPA step runs

Stripped to its core, every RPA action is the same short loop, repeated thousands of times an hour. The locator strategy lives inside the first step, and it quietly decides whether the rest of the loop succeeds.

Locate

Find the target control. A robust bot queries the accessibility tree by role and name; a brittle one trusts a remembered pixel or image.

Act

Send the click or keystroke to that control: press a button, type into a field, select from a list.

Verify

Confirm the screen changed the way it should before moving on, so a missed click does not cascade into corrupt data.

Recover or stop

If the control was not found, a per-step rule decides what happens: retry, skip, fall back, or halt. This is where AI-assisted RPA re-finds an element instead of failing.
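A minimal Rust sketch of that loop, assuming a toy `Screen` that only knows which selectors currently resolve (a real engine would query the accessibility tree). `OnFailure`, `Step`, `run_step`, and the example selectors are hypothetical names chosen to mirror the four recovery options above; they are not Mediar's API.

```rust
use std::collections::HashSet;

// Hypothetical per-step failure rules, one per recovery option in the text.
enum OnFailure {
    Retry(u32),             // locate again up to N more times
    Skip,                   // note the miss and continue with the next step
    Fallback(&'static str), // try an alternate selector instead
    Halt,                   // stop the whole run
}

struct Step {
    selector: &'static str,
    on_failure: OnFailure,
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Done,
    Skipped,
    Halted,
}

// Toy stand-in for the screen: the set of selectors that currently resolve.
struct Screen {
    present: HashSet<&'static str>,
    clicks: Vec<&'static str>,
}

impl Screen {
    fn locate(&self, selector: &str) -> Option<&'static str> {
        self.present.get(selector).copied()
    }
    fn act(&mut self, target: &'static str) {
        self.clicks.push(target);
    }
    // Toy check that the action landed; a real bot would confirm the UI changed.
    fn verify(&self, target: &str) -> bool {
        self.clicks.last().copied() == Some(target)
    }
}

fn run_step(screen: &mut Screen, step: &Step) -> Outcome {
    // 1. Locate, applying this step's own rule if nothing resolves.
    let mut target = screen.locate(step.selector);
    if target.is_none() {
        match &step.on_failure {
            // On a live UI the control may appear between attempts.
            OnFailure::Retry(n) => {
                for _ in 0..*n {
                    target = screen.locate(step.selector);
                    if target.is_some() {
                        break;
                    }
                }
            }
            OnFailure::Fallback(alt) => target = screen.locate(alt),
            OnFailure::Skip => return Outcome::Skipped,
            OnFailure::Halt => return Outcome::Halted,
        }
    }
    // Recovery produced no target: stop rather than guess.
    let Some(target) = target else {
        return Outcome::Halted;
    };
    // 2. Act, then 3. verify before moving on.
    screen.act(target);
    if screen.verify(target) { Outcome::Done } else { Outcome::Halted }
}

fn main() {
    let mut screen = Screen {
        present: HashSet::from(["role:Edit && text:Amount", "role:Button && text:Submit"]),
        clicks: Vec::new(),
    };
    let steps = [
        Step { selector: "role:Edit && text:Amount", on_failure: OnFailure::Halt },
        Step { selector: "role:Button && text:Post", on_failure: OnFailure::Fallback("role:Button && text:Submit") },
        Step { selector: "role:Button && text:Print", on_failure: OnFailure::Skip },
        Step { selector: "role:Button && text:Close", on_failure: OnFailure::Retry(2) },
    ];
    for step in &steps {
        let outcome = run_step(&mut screen, step);
        println!("{:<26} -> {:?}", step.selector, outcome);
        if outcome == Outcome::Halted {
            break;
        }
    }
    println!("clicked: {:?}", screen.clicks);
}
```

Against that screen, the first two steps succeed (the second through its fallback selector), the third is skipped, and the fourth halts the run after its retries find nothing. Keeping the rule on each step rather than globally lets a run tolerate a missing optional control without also tolerating a missing critical one.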

For the data-structure view of that loop (how a whole automation is stored and executed step by step), read How RPA automation actually runs.

So what should you actually use?

If you are evaluating RPA, the most useful question is not “what is RPA” but “how does this tool find elements, and what happens when the UI changes?” The mature platforms (UiPath, Automation Anywhere, Blue Prism, Power Automate) can all do the work. The difference shows up in the maintenance bill and in whether they reach your legacy desktop apps at all.

Mediar’s bet is the accessibility-first one described above: watch a workflow once, execute it through Windows accessibility APIs, self-heal when labels and layouts shift, and keep the engine open source so the locator logic is inspectable rather than a black box. Pricing is usage-based at $0.75 per minute of runtime with no per-seat licensing. It is SOC 2 Type II certified and HIPAA compliant, and runs on-prem or in the cloud. If your workflows are stuck on SAP, Oracle, a core-banking system, or an EHR, that is the layer it was built for. If everything you automate is modern web SaaS, a browser-native tool may serve you just as well, and it is fair to say so.

See it find an element on your own legacy screen

Bring a workflow stuck on SAP, a core-banking system, or an EHR. We will record it once and show you the selector it generates, live.

What is RPA: common questions

What does RPA stand for and what is it in one sentence?

RPA stands for robotic process automation. It is software that runs repetitive, rule-based tasks across desktop and web applications by mimicking the same clicks and keystrokes a person would perform in the graphical interface, driven by a fixed set of rules rather than by learning.

Is RPA the same as AI?

No. Classic RPA is process-driven: it follows a fixed script of steps and does only what it was told. AI is data-driven: it recognizes patterns and can handle inputs it was not explicitly programmed for. Modern tools blend the two, using AI to read a workflow once and to recover when a screen changes, while still executing deterministic steps for speed and auditability.

How does an RPA robot actually find the button it clicks?

There are three approaches. Old or cheap tools record screen coordinates (x and y) and replay them blindly. Vision-based tools take a screenshot and match a saved image or run OCR. Accessibility-first tools read the operating system's UI Automation tree and target an element by its role and name, for example role:Button && text:Post, which is the same data a screen reader uses. The third approach survives layout changes because it never depends on where a control sits on screen.

Why does traditional RPA break so often?

Most RPA bots are bound to brittle locators: a pixel position, a saved image, or a deep selector path that names every parent element. When a vendor ships a UI update, moves a field, or renames a label, the locator no longer resolves and the bot stops. That maintenance burden is the single biggest hidden cost of an RPA program, and it is why projects that look cheap to build get expensive to keep running.
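The deep-path failure is easy to reproduce. In this hypothetical Rust sketch (types and function names are illustrative, not any vendor's API), a new release wraps the buttons in a Group container. A locator that names every parent stops resolving, while a role-and-name search still finds the Post button.

```rust
// Hypothetical, simplified UI tree for illustration only.
struct Node {
    role: &'static str,
    name: &'static str,
    children: Vec<Node>,
}

fn leaf(role: &'static str, name: &'static str) -> Node {
    Node { role, name, children: vec![] }
}

// Deep-path locator: every hop from the root must match by role, in order,
// and the last hop must also match the name. An inserted wrapper breaks it.
fn find_by_path<'a>(node: &'a Node, path: &[&str], name: &str) -> Option<&'a Node> {
    let (first, rest) = path.split_first()?;
    if node.role != *first {
        return None;
    }
    if rest.is_empty() {
        return (node.name == name).then_some(node);
    }
    node.children.iter().find_map(|c| find_by_path(c, rest, name))
}

// Role-and-name locator: search anywhere in the tree.
fn find_by_role_name<'a>(node: &'a Node, role: &str, name: &str) -> Option<&'a Node> {
    if node.role == role && node.name == name {
        return Some(node);
    }
    node.children.iter().find_map(|c| find_by_role_name(c, role, name))
}

fn main() {
    // Release 1: Window > Pane > Button "Post"
    let v1 = Node { role: "Window", name: "Composer", children: vec![Node {
        role: "Pane", name: "", children: vec![leaf("Button", "Cancel"), leaf("Button", "Post")],
    }] };
    // Release 2: the vendor wraps the buttons in a new Group.
    let v2 = Node { role: "Window", name: "Composer", children: vec![Node {
        role: "Pane", name: "", children: vec![Node {
            role: "Group", name: "Actions", children: vec![leaf("Button", "Cancel"), leaf("Button", "Post")],
        }],
    }] };
    let path = ["Window", "Pane", "Button"];
    for (label, ui) in [("release 1", &v1), ("release 2", &v2)] {
        println!(
            "{label}: deep path found = {}, role/name found = {}",
            find_by_path(ui, &path, "Post").is_some(),
            find_by_role_name(ui, "Button", "Post").is_some()
        );
    }
}
```

On release 1 both locators succeed; on release 2 the deep path returns nothing while the role-and-name search still resolves, even though the Post button itself never changed.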

What kinds of tasks is RPA good for?

High-volume, rule-based, repetitive work where the steps are stable: data entry, moving records between systems that have no shared API, reconciling spreadsheets, extracting fields from PDFs into a desktop app, claims intake, and core-banking or EHR data sync. It is a poor fit for judgment-heavy work or anything that requires understanding unstructured context the way a person would.

Where does RPA still beat APIs and integrations?

When the system you need to touch has no usable API. SAP GUI, mainframe green-screens, Oracle EBS, Jack Henry, Fiserv, FIS, Epic, and Cerner often expose nothing a normal integration can call. RPA works at the interface layer those systems do present to a human, which is exactly why it persists in finance, insurance, banking, and healthcare.

How is Mediar's approach to RPA different?

Mediar watches a workflow once, then executes it through Windows accessibility APIs rather than pixel matching or deep selector paths that name every parent element. Because each step is resolved by an element's role and name at runtime, the automation self-heals when a UI moves. The execution engine is the open-source, MIT-licensed Terminator SDK at github.com/mediar-ai/terminator, so the way it locates and acts on elements is something you can read, not a black box.

The mechanics behind robust desktop automation

Keep reading

Engine

How RPA automation actually runs

A deeper look under the hood: what an automation is as a data structure, how one step executes, and why a per-step failure rule decides whether it self-heals.

8 min read

Comparison

RPA selectors: Selenium paths vs the accessibility tree

Why deep selector paths break on the next release, and what changes when you target elements by their accessibility role and name instead.

7 min read

Context

Why legacy desktop apps with no API are the real moat

SAP GUI, mainframes, and core-banking screens expose nothing an API can call. That gap is exactly where RPA lives and where browser agents do not help.

6 min read

Want to try the recorder yourself? Start at app.mediar.ai/web.