Three words, two decades

The meaning of “robotic process automation”, one word at a time.

Most explanations of robotic process automation read like a textbook entry copied from another textbook entry. They define the phrase as a unit, hand you a list of use cases, and finish on a comparison with AI. The phrase deserves a closer reading. It is three words, coined by Blue Prism around 2003 to describe a specific Windows scripting technique, and in 2026 each word points at something the original definition did not anticipate. This page walks through each word in turn, lines up the 2003 meaning against the 2026 meaning, and grounds the modern reading in the source code of an open-source executor a reader can clone and inspect.

Matthew Diakonov · 11 min

Where the phrase came from

The story most often retold is that Blue Prism, a UK software company founded in 2001, coined the phrase “robotic process automation” around 2003 to describe its new product category. Blue Prism was selling a Windows runtime that could log into enterprise apps the same way a back-office worker would, click through screens, copy values from one system to another, and finish without supervision. The runtime was a server-side process that drove an unattended Windows session. Internally, the company called the units of work “digital workers”. Externally, the category did not yet have a name. The team needed a label that placed the product alongside business process management software while distinguishing it from a macro recorder, and they landed on robotic process automation.

Each word carried a specific load. “Robotic” signaled the metaphor of a tireless worker that could be cloned and supervised. “Process” tied the category to BPM, where the unit of automation was a business process, not a single screen or a single keystroke. “Automation” meant programmatic execution without a human in the loop, which was the whole pitch. UiPath, founded in Romania the same year as Blue Prism, ended up building a much larger version of the same technical idea, and Automation Anywhere did the same in the United States. The category became a global software market that, by the late 2010s, was measured in the tens of billions of dollars per year of license revenue.

What did not change for almost twenty years was the underlying technique: each of these runtimes records a script of UI actions against a rendered Windows interface and replays the script later. The scripts are brittle, the runtimes need professional services to keep alive, and the workflows take months to ship. Every word of “robotic process automation” in that sentence is doing real work, but every word means something slightly different in 2026 than it did in 2003. The next three sections walk each one.

Word one

“Robotic”: the robot used to be a script. Now it is a binary.

In 2003, the “robot” in robotic process automation was a script generated by a studio application. A developer opened the studio, walked through the target Windows app, and the studio recorded a sequence of selectors: XPath against an embedded browser, an image-hash for a button on a Citrix-published app, a control id for a Win32 window, an automation id for a WPF app. The robot was the runtime that walked that recorded selector list and dispatched Win32 input events at the matching element. It felt robotic in the sense that it was a tireless worker following a checklist, but mechanically it was a wrapped scripting engine.

In 2026, the “robot” in the Mediar build is an 871-line Rust binary at crates/executor/src/services/typescript_executor.rs. That file is where the runtime entry point lives. It is responsible for pulling a workflow off the queue, downloading the workflow file from cloud storage, injecting organization secrets into the parameters, calling the MCP execute_sequence tool, and parsing the result. The MCP server it talks to wraps the Terminator SDK, which is the part that actually walks the Windows UI Automation tree, finds the matching element via the four-strategy cascade, and dispatches a real Win32 input event. The entire chain is published as open source at github.com/mediar-ai/terminator.
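The chain of responsibilities described above can be sketched as a single loop. The real entry point is Rust; the sketch below is TypeScript, and every name in it (runOnce, the queue/storage/secrets interfaces, McpClient) is an illustrative assumption, not the actual Mediar API.

```typescript
// Hypothetical sketch of the executor loop described above.
// All interface and function names are invented for illustration.

interface QueuedJob {
  workflowUrl: string;              // cloud-storage location of the workflow file
  orgId: string;                    // used to look up organization secrets
  params: Record<string, string>;   // caller-supplied workflow parameters
}

interface McpClient {
  callTool(name: string, args: object): Promise<{ ok: boolean; output: string }>;
}

async function runOnce(
  queue: { pull(): Promise<QueuedJob | null> },
  storage: { download(url: string): Promise<string> },
  secrets: { forOrg(orgId: string): Promise<Record<string, string>> },
  mcp: McpClient,
): Promise<string | null> {
  const job = await queue.pull();                             // 1. pull a workflow off the queue
  if (!job) return null;

  const workflow = await storage.download(job.workflowUrl);   // 2. download the workflow file
  const orgSecrets = await secrets.forOrg(job.orgId);         // 3. inject organization secrets
  const args = { workflow, params: { ...job.params, ...orgSecrets } };

  const result = await mcp.callTool("execute_sequence", args); // 4. call the MCP tool
  if (!result.ok) throw new Error(`workflow failed: ${result.output}`);
  return result.output;                                        // 5. parse and return the result
}
```

The point of the sketch is the shape: everything before the callTool line is plumbing, and the element-matching intelligence lives entirely on the other side of the MCP boundary.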

Why does that matter for the meaning of the word? Because it changes what “robotic” reduces to once you open the box. The 2003 robot was a recording that broke when the recorded selectors stopped matching. The 2026 robot is a binary whose match logic does not depend on a single selector at all; it depends on the live accessibility tree. A button that moves five pixels to the left in a software update does not break this robot. A panel that reorders does not break this robot. Only a change deep enough to flip the role and the name and the automation id and the visible text and the parent window all at once will. That is the modern meaning of the word: a tireless worker, yes, but one whose definition of “the same button” is much more robust than a stored selector.

The other thing that changed is who can read the source. In 2003 the robot was inside a licensed installer; the runtime was a proprietary black box. In the Mediar build the runtime is published under MIT. Anyone can clone the repository, run cargo build -p executor, and audit how the bot makes a click. That is not a footnote. Compliance teams in regulated industries care about it more than the runtime performance, because an opaque robot is a robot they cannot sign off on.

Word two

“Process”: the process used to be a flowchart. Now it is a typed file.

In 2003 the “process” in robotic process automation came from BPM software, and it inherited BPM's vocabulary. A process was a flowchart drawn in a studio application. Each shape on the canvas represented a UI action (click here, type there) or a control flow node (decide on this value, loop until that condition). The studio saved the diagram as a custom XML format that the runtime knew how to interpret. The shape of the process file was a tree of UI actions and conditional edges, and the inputs to the process were named parameters defined in the studio. That format made sense for diagrams but was hostile to source control: a one-line change might rewrite the entire XML file because the studio renumbered nodes when you saved.

In 2026 the “process” in the Mediar build is a TypeScript file. Every step in that file is a structured object with eight named fields, defined upstream by the recording pipeline: step_title, step_summary, events_that_happened, how_content_changed, results_if_any, what_was_clicked, what_was_typed, and user_intent. You can read the file in your editor. You can diff it against the previous version. You can review it in a pull request. You can leave a comment on a single step. The process is, in the most literal sense, a piece of code.
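The eight fields can be written down as a type. This is a sketch inferred from the field names listed above; the exact type definitions in the Mediar repository may differ, and the example step is invented.

```typescript
// Sketch of one workflow step, using the eight field names from the article.
// The concrete types (string vs. string[]) are assumptions for illustration.

interface WorkflowStep {
  step_title: string;
  step_summary: string;
  events_that_happened: string[];
  how_content_changed: string;
  results_if_any: string;
  what_was_clicked: string;
  what_was_typed: string;
  user_intent: string;
}

// An invented example of a step a recording pipeline might emit:
const step: WorkflowStep = {
  step_title: "Open vendor record",
  step_summary: "Navigated to the vendor detail screen",
  events_that_happened: ["click", "window_focus_changed"],
  how_content_changed: "Detail panel replaced the search results list",
  results_if_any: "Vendor detail view visible",
  what_was_clicked: "Row 'Acme Corp' in search results",
  what_was_typed: "",
  user_intent: "Inspect the vendor before posting the invoice",
};
```

A step in this shape diffs line by line in a pull request, which is exactly the property the studio-era XML formats lacked.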

That changes what the word “process” admits. In 2003 a process was a unit you visualized first and reasoned about second. In 2026 it is a unit you reason about first; visualization is a renderer over the file, not the source of truth. That has real consequences: a security team can run a static analyzer over a directory of workflow files and ask “which workflows touch a banking core system?” and get an answer. The studio-canvas era could not answer that question without a custom integration into the studio.
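A minimal sketch of the kind of static check described above, assuming workflow files are plain TypeScript text; the function name, file contents, and the “BankingCore” marker are all invented for illustration. A production analyzer would parse the AST rather than pattern-match source text.

```typescript
// Given a set of workflow files (name -> source text), return the names of
// the ones that reference a target system. A plain text scan is enough to
// illustrate the point the article makes about auditability.

function workflowsTouching(
  files: Record<string, string>,
  systemMarker: RegExp,
): string[] {
  return Object.entries(files)
    .filter(([, source]) => systemMarker.test(source))
    .map(([name]) => name)
    .sort();
}

// Invented example inputs:
const files = {
  "post_invoice.ts": 'openApp("BankingCore"); click("Post");',
  "onboard_vendor.ts": 'openApp("VendorPortal"); type("Acme");',
};
const hits = workflowsTouching(files, /BankingCore/);
```

The same question asked of a directory of studio-era XML exports needs a vendor-specific parser before the grep can even start.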

The other layer of meaning is what counts as one process. In 2003 a process was almost always a single end-to-end workflow with a single canvas. In 2026 a process is whatever unit the recording pipeline synthesized, which is sometimes a single workflow and sometimes several smaller workflows the synthesis stage extracted from one long session. The granularity is no longer dictated by the developer's patience for drawing flowcharts; it is dictated by the natural seams the AI authoring layer found in the recording. That is a quieter shift, but it shows up in the average length of a generated workflow file: shorter, with more of them, each tied to a specific business event rather than a sprawling end-to-end macro.

Word three

“Automation”: the guarantee used to be brittle. Now it is layered.

In 2003 the “automation” in robotic process automation guaranteed exactly one thing: if the recorded selector matches the live element, the step runs. If not, the run fails. There was no fallback. The product was honest about that limit; it just built a service practice around fixing the misses. A typical enterprise rollout sized its support team to the expected number of selector breaks per quarter. Selector maintenance was a line item.

In 2026 the “automation” in the Mediar build is layered. The runtime tries four strategies in order before it gives up. First it tries the recorded automation id. If that fails it tries window plus bounds. If that fails it tries the visible text. If all three fail it falls back to focusing the parent window and asks the next step to retry. That cascade is implemented in pure Rust at apps/desktop/src-tauri/src/focus_state.rs. None of the four strategies call a model; all four operate on the live UI Automation tree.
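The cascade can be sketched as an ordered list of matchers. The real implementation is Rust in focus_state.rs; this TypeScript sketch only mirrors the order described above, and every name in it is illustrative. Note that no strategy calls a model: each one inspects a snapshot of the live accessibility tree.

```typescript
// Sketch of a four-strategy element-match cascade, in the order the article
// describes. Types and names are invented; the shapes of the recorded step
// and of candidate elements are assumed identical for simplicity.

interface Bounds { x: number; y: number; w: number; h: number }

interface ElementFacts {
  automationId: string;
  window: string;
  bounds: Bounds;
  text: string;
}

type MatchResult =
  | { kind: "matched"; strategy: string; element: ElementFacts }
  | { kind: "focus_parent_and_retry" };

function overlaps(a: Bounds, b: Bounds): boolean {
  return a.x < b.x + b.w && b.x < a.x + a.w && a.y < b.y + b.h && b.y < a.y + a.h;
}

function matchElement(recorded: ElementFacts, tree: ElementFacts[]): MatchResult {
  const strategies: Array<[string, (e: ElementFacts) => boolean]> = [
    // 1. the recorded automation id
    ["automation_id", (e) => e.automationId !== "" && e.automationId === recorded.automationId],
    // 2. window plus bounds
    ["window_plus_bounds", (e) => e.window === recorded.window && overlaps(e.bounds, recorded.bounds)],
    // 3. the visible text
    ["visible_text", (e) => e.text !== "" && e.text === recorded.text],
  ];
  for (const [strategy, matches] of strategies) {
    const element = tree.find(matches);
    if (element !== undefined) return { kind: "matched", strategy, element };
  }
  // 4. no direct match: focus the parent window and let the next step retry.
  return { kind: "focus_parent_and_retry" };
}
```

The value of the ordering is that the most specific signal wins when it survives a UI change, and the next-most-specific signal takes over when it does not.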

What that does to the meaning of the word is push it from binary to graceful. The 2003 automation was either-or: the script ran or it didn't. The 2026 automation is a cascade with a known failure point: the worker tried the four most informative strategies the accessibility tree exposes, and only after all four failed did the run stop. That is a different semantic guarantee. It is also why automation rates above 95 percent on legacy desktop apps become possible without an army of developers fixing selectors after every release.

One more nuance is worth naming. The 2026 automation does not promise that a model will recover when all four strategies fail. There is no “LLM falls back and improvises a click.” The runtime is deterministic; if the cascade exhausts, the step is queued for re-recording the next time a human walks that path, and the AI authoring layer takes over to write a fresh step. That separation is intentional and it preserves the original promise of the word “automation”: predictable, repeatable execution. A model in the hot path would compromise that, and most enterprise compliance teams will not accept it. So the modern definition of automation is not “the bot figures out what to do” but “the bot has more ways to find the same element, and a clean re-author path for the cases none of them survive.”

The phrase “robotic process automation” was coined to describe a Windows scripting runtime. In 2026 the runtime that ships under the same phrase has zero references to gemini, claude, openai, or any inference library in its production crate. The architectural meaning shifted; the user-facing promise did not.

[Chart: LLM imports in crates/executor (Mediar production runtime): 0]

The split, side by side

The two architectures share a name and a promise. They differ on how the robot is built, what a process file looks like, and how automation handles a UI that has shifted since the original recording. Here is a row-by-row read of where they diverge today.

Feature | Blue Prism style (2003 lineage) | Mediar (2026 build)
What the robot is | A scripting runtime that records selectors (XPath, image hash, control id) and replays them through Win32 SendInput | An 871-line Rust binary in crates/executor/src/services/typescript_executor.rs that calls MCP execute_sequence against Windows UI Automation
What a process is | A list of UI actions: click(x,y), type('value'), wait(2s). Branches and inputs are encoded in custom workflow XML | A TypeScript file with eight semantic fields per step (step_title, user_intent, what_was_clicked, what_was_typed, etc.) checked into source control
What automation guarantees | If the recorded selector matches at runtime, the step runs; otherwise the bot fails or proceeds against the wrong element | A four-strategy match cascade in focus_state.rs (automation id, window+bounds, visible text, parent window) absorbs typical UI shifts
How the script gets written | A developer in studio software drags actions onto a canvas, picks selectors from a recorder, debugs for weeks | A Gemini Vertex AI pipeline reads the recording session and emits the TypeScript file in minutes; the AI is gone after that
When the UI changes | Selectors miss, the run fails, a developer rebuilds the broken steps in studio | The cascade usually absorbs the change. If not, the failed step is queued for re-recording the next time a human walks that path
Where the runtime is published | Closed-source desktop runtime distributed as a licensed installer | Open source as terminator-rs on crates.io and github.com/mediar-ai/terminator under MIT

So what is the right modern definition?

A useful one-line answer: robotic process automation is software that performs a defined business task by interacting with the same applications a human would, through the same surfaces a human uses, without supervision. That is what Blue Prism meant in 2003 and it is what Mediar means in 2026. All three words still pull their weight; they just resolve to different implementations than they did originally.

When you read a vendor page using these three words, the question to keep in mind is which architecture they describe. Selector-based runtimes are still a valid product category and they still ship workflows that run; they just carry more maintenance overhead. Accessibility API runtimes, with AI doing the authoring, are the newer shape. Both are honestly called robotic process automation, and both deliver the original Blue Prism pitch: a tireless worker that runs a business process unattended. The difference is what the worker is made of and how a process gets written in the first place.

If a definition is supposed to do work, this one should: it tells you what to expect from the category (unattended business task execution), it admits that the implementation has forked, and it points at the file you can open if you want to see how one of the two forks actually does the job.

Why bother with the etymology

A reader who cares about the meaning of a phrase usually has a downstream decision behind the question. In this case the decision is often whether a particular product belongs in the “real RPA” bucket or whether it is a different category in disguise. The word-by-word reading above gives you a way to answer that without a marketing argument. Three checks: is there a software bot that runs unattended? Is there a process file that encodes the business task? Is there a runtime guarantee that the bot executes the file predictably? If yes to all three, the product is robotic process automation, regardless of how the file got authored or what the bot is implemented in.
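The three checks collapse into a small predicate. This is an illustrative restatement of the decision rule from the paragraph above, not code from any product.

```typescript
// The three-question test for "is this product RPA?", as a predicate.
// Field names are invented; they correspond one-to-one to the three checks.

interface ProductClaims {
  unattendedBot: boolean;        // is there a software bot that runs unattended?
  processFile: boolean;          // is there a process file encoding the business task?
  deterministicRuntime: boolean; // does the runtime execute the file predictably?
}

function isRpa(p: ProductClaims): boolean {
  // How the file got authored, and what the bot is implemented in,
  // deliberately do not appear in this test.
  return p.unattendedBot && p.processFile && p.deterministicRuntime;
}
```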

That is the version that survives a second read. It also explains why the category has not died despite a decade of think-pieces predicting otherwise. The promise was always unattended business task execution. The technology underneath has shifted twice already, and it will probably shift again. The phrase will stay because the promise has stayed.

See an AI-authored, deterministic-runtime RPA bot run on your own workflow

Bring a Windows process you would have built in UiPath. We will record it live, show the TypeScript file the AI emits, and run the deterministic replay against your environment in the same call.

Frequently asked questions

What does robotic process automation actually mean today?

It means software that performs a defined business task by interacting with the same applications a human would, through the same surfaces a human uses (screen, keyboard, mouse, accessibility APIs). The word 'robotic' is metaphorical: there is no physical robot. The word 'process' refers to a business process, not an OS process. The word 'automation' refers to programmatic, repeatable execution. The phrase was coined by Blue Prism around 2003 and originally described a Windows-only scripting runtime. In 2026 it covers two very different architectures that share the name.

Why did Blue Prism coin the phrase in 2003?

Blue Prism needed a label that distinguished its product from older categories. Screen scrapers had existed since the 1990s, BPM (business process management) software existed, and macro recorders existed. Blue Prism's pitch was that it had a runtime that could string those primitives into end-to-end business processes that ran unattended on a server. It needed a phrase that captured “automation of business processes by software bots”. The word “robotic” was the marketing flourish; the rest was the technical claim. The category exploded a decade later when UiPath and Automation Anywhere productized similar runtimes.

Why are there two architectures sharing the same name now?

Because the original architecture, selector-based scripting against the rendered UI, has well known limits that compounded over time: months to author, brittle when UIs change, expensive at enterprise scale, and unable to handle workflows where the input format keeps changing. AI-authored alternatives use the same accessibility tree the original products use, but they let a frontier model write the workflow from a recording instead of asking a developer to author it by hand. The runtime is still deterministic; only the authoring step is intelligent. Both styles are honestly called 'robotic process automation' because both deliver the original promise: software bots that run a business process unattended.

Is RPA the same thing as an AI agent?

No. An AI agent calls a model on every step at runtime to pick the next action. RPA, in either of the two architectures above, runs a deterministic script. Mediar AI is a confusing case because it uses an LLM during authoring (a Gemini call processes the recording) and then strips the LLM out of the runtime entirely. The production executor crate at crates/executor in the open-source repository has zero references to gemini, claude, openai, or any inference library; it only talks to a Windows session via MCP. So the product is technically RPA at runtime and AI at authoring time. Most enterprises that need predictable per-step behavior want this exact split.

Does 'process' in RPA refer to a Linux process or a Windows process?

Neither. It refers to a business process, which is the unit of work an enterprise tracks: a vendor onboarding, a claims intake, an invoice posting, a patient registration. The system process the bot runs inside is incidental. Most RPA bots run inside a regular Windows user session, sometimes attended (next to a human) and sometimes unattended (on a dedicated VM), but the 'process' in the name does not refer to that. This is a common source of confusion when engineers first encounter the category.

What does the open-source executor in Mediar look like?

It is a Rust crate at crates/executor in the github.com/mediar-ai/terminator repository. The runtime entry point is crates/executor/src/services/typescript_executor.rs, which is 871 lines and does one thing: pull a workflow off the queue, build MCP arguments (file URL, secrets, trace id), and call the MCP execute_sequence tool against a Windows session. The MCP server in turn wraps the Terminator SDK, which is the part that actually talks to Windows UI Automation. There is no model in this loop. The 'robot' in robotic process automation is, in this codebase, that 871-line Rust binary plus the SDK underneath it.

How is the Mediar executor different from a generic Win32 macro recorder?

A macro recorder stores keystrokes and mouse coordinates and replays them. The Mediar executor stores no coordinates. It stores semantic descriptions of what the user did (what was clicked, what was typed, what changed in the accessibility tree) and at replay time it walks the live tree to find the matching element. That tree is the same one screen readers consume, so the executor inherits decades of accessibility plumbing inside Windows applications. Coordinates are recomputed on every run, which is why a window resize, a DPI change, or a small UI tweak does not break the workflow.

Is RPA dying because of AI agents?

The category is consolidating, not dying. Pure selector-based RPA has the limits described above and the largest vendors are building AI-authoring layers on top of their selector runtimes to address them. New entrants like Mediar started with the AI-authoring layer and built a deterministic accessibility-API runtime underneath. The end state across both vendors is the same shape: an AI compiles a recording into a workflow file, a deterministic runtime executes the file. The phrase 'robotic process automation' will keep describing this whole stack because the user-facing promise (software bots run a business process unattended) has not changed.