A buyer's argument
Workflow automation tools split by surface, not by feature count.
Every guide on this topic is a listicle. They rank Zapier and Make and n8n and Power Automate and Workato by how many connectors each one publishes, and they leave the buyer with a feature matrix that does not answer the only question that decides whether a tool is the right one. This piece is the inverse argument. The first decision dimension is not feature count or even price. It is the runtime surface the tool can physically reach: cron triggers, public APIs, rendered browser DOM, or the OS-level desktop event stream. The surface is what determines which workflows are representable in the tool at all. Every other decision is downstream of that one.
The four surfaces, named honestly
Every tool sold under the “workflow automation” label reaches into one or two of four surfaces. The four categories below are the honest taxonomy that listicles paper over with a unified comparison grid.
Cron and polled triggers. The smallest surface. The tool fires on a schedule or on the appearance of a file or an email. Apache Airflow, GitHub Actions cron, n8n's schedule trigger, and the “new email matches X” trigger inside Zapier are the same shape. Anything that can be expressed as “run this script when this thing happens on the clock or in a known queue” lives here.
Public API orchestration. The next surface up. The tool's input vocabulary is the union of the connectors its vendor has built. A workflow is a directed graph of API calls plus the conditional logic between them. Zapier, Make, n8n, Workato, Pipedream, and Power Automate Cloud all live here. The defining constraint is that every step has to map to an HTTP request the vendor has wrapped, and the cardinality of the connector library is the load-bearing differentiator within the category.
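A minimal sketch of that shape, with illustrative type and field names rather than any vendor's actual schema. The point of writing it as a type is that the constraint becomes visible: a step that is not an HTTP request the vendor has wrapped cannot be constructed at all.

```rust
// Illustrative data model for the API-orchestration surface; the names
// here are assumptions, not any vendor's schema.
struct ConnectorStep {
    connector: &'static str, // must exist in the vendor's connector library
    method: &'static str,    // the wrapped HTTP verb, e.g. "POST"
    endpoint: &'static str,
}

struct Workflow {
    steps: Vec<ConnectorStep>,
    // Conditional edges: run `to` only if the predicate over the previous
    // step's raw response body holds.
    edges: Vec<(usize, usize, fn(&str) -> bool)>,
}

fn main() {
    let wf = Workflow {
        steps: vec![
            ConnectorStep { connector: "gmail", method: "GET", endpoint: "/messages" },
            ConnectorStep { connector: "salesforce", method: "POST", endpoint: "/leads" },
        ],
        edges: vec![(0, 1, |body| body.contains("\"unread\":true"))],
    };
    assert_eq!(wf.steps.len(), 2);
    assert_eq!(wf.edges.len(), 1);
}
```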
Browser DOM and rendered web. The surface that comes into play when an action requires a rendered Chromium tab. Skyvern, Browser Use, CloudCruise, and the “web agents” flavor of recent AI products live here. The bundled cost is the anti-bot fabric (CAPTCHA solving, residential proxies, geo-targeting). The unit of work is one tab step.
OS-level desktop event stream. The widest surface, and the one most listicles either skip or fold under “RPA” without explanation. The tool reads what the operating system exposes through the same accessibility interfaces a screen reader uses. UiPath, Automation Anywhere, Blue Prism, Power Automate Desktop, and Mediar live here. The unit of work is a captured user action: a click, a typed value, a hotkey. This is the surface that lets a workflow include a SAP GUI window, an Oracle Forms session, an Epic Hyperspace chart inside a Citrix shell, or a Jack Henry green-screen, because the surface is the OS itself rather than any one application's API.
The keystroke-level spec that defines a step
Surfaces sound like marketing language until you ask each tool to publish what it thinks an automatable step is. An API-orchestration tool answers with a connector list. A browser-native agent answers with a list of DOM events. An OS-level recorder has a harder question to answer, because the input it receives is a raw human interface device stream and most of that stream is noise (mouse moves, modifier-less keystrokes that are part of a longer typed word, repeated key-down events from auto-repeat). The recorder has to publish a spec for what survives the filter.
Mediar's spec is in apps/desktop/src-tauri/src/workflow_recorder.rs, in a function called is_meaningful_event that lives at lines 252-316. The function is small enough to read whole, and reading it whole is the right move, because the honest answer to “what counts as a step” is exactly this match expression and nothing more.
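The verbatim function belongs to the repo; what follows is a sketch of its shape, reconstructed from the spec this section describes. The variant and field names are assumptions; the four key codes and the filter logic are the published behavior.

```rust
// Sketch of the shape is_meaningful_event takes, reconstructed from the
// spec described here. Variant and field names are assumptions; the
// authoritative version is the function in workflow_recorder.rs.
enum RecordedEvent {
    Click { x: i32, y: i32 },
    TextInputCompleted { text: String },   // aggregated alphanumeric typing
    Hotkey { keys: Vec<u32> },
    ClipboardAction { content: String },
    ApplicationSwitch { app: String },
    BrowserTabNavigation { url: String },
    KeyPress { vk_code: u32, modifiers: Vec<u32> },
    MouseMove { x: i32, y: i32 },          // noise: dropped by the filter
}

// The four virtual-key codes that count as a step on their own.
const STANDALONE_KEYS: [u32; 4] = [
    0x0D, // Enter  - commits a form
    0x2E, // Delete - removes a row
    0x1B, // Escape - dismisses a modal
    0x09, // Tab    - moves focus
];

fn is_meaningful_event(event: &RecordedEvent) -> bool {
    match event {
        RecordedEvent::Click { .. }
        | RecordedEvent::TextInputCompleted { .. }
        | RecordedEvent::Hotkey { .. }
        | RecordedEvent::ClipboardAction { .. }
        | RecordedEvent::ApplicationSwitch { .. }
        | RecordedEvent::BrowserTabNavigation { .. } => true,
        // A bare keypress survives only with a modifier held, or as one of
        // the four standalone keys.
        RecordedEvent::KeyPress { vk_code, modifiers } => {
            !modifiers.is_empty() || STANDALONE_KEYS.contains(vk_code)
        }
        RecordedEvent::MouseMove { .. } => false,
    }
}
```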
Read the array. The four virtual-key codes are the keys that count as a step on their own, with no modifier. Enter is 0x0D because it commits a form. Delete is 0x2E because it removes a row. Escape is 0x1B because it dismisses a modal. Tab is 0x09 because it moves focus, which on a SAP GUI screen is the difference between filling the next field and committing the wrong record. The letter a is not in the array because the recorder treats alphanumeric typing as part of a longer aggregated event called TextInputCompleted. Mouse moves and ordinary key-up events are dropped on the floor before they ever reach the matcher.
The point is that this is a published vocabulary the buyer can hold in their head. Seven event variants, four standalone keys, any modifier combo. That is the full spec for what an OS-level recorder on this product treats as “a step the user took.” A workflow on Mediar is a sequence of those, and nothing else. An API-orchestration tool literally has no way to write down that sequence, because the keystroke stream is not in its input channel. That gap is the entire reason a buyer with desktop workflows ends up on a different tool than a buyer with HTTP workflows.
“Claims intake at one mid-market carrier went from 30 minutes per claim to 2 minutes. The savings were $750,000 a year, drawn straight from the AP-team headcount math.”
Mediar deployment, mid-market insurance carrier (llms.txt)
The four surfaces, mapped onto a buying decision
The grid below restates the surface taxonomy as a head-to-head between an API-orchestration tool (the modal answer in this category) and an OS-level desktop tool (Mediar). The point of the grid is not to claim one always wins. It is to make explicit the axes that listicle comparisons usually elide.
| Feature | API orchestration (Zapier-shape) | OS-level desktop (Mediar) |
|---|---|---|
| Trigger surface | Cron, polled file/email, third-party API webhook. Time and external events only. | OS-level event stream: clicks, aggregated typing, hotkeys, clipboard, application switch, browser tab navigation. |
| What counts as a step | An API call or a connector node. Each step is a request the vendor has built a connector for. | Seven event variants plus four standalone keys (Enter, Delete, Escape, Tab) and any modifier combo. Defined in is_meaningful_event. |
| Apps without a public API | Out of scope. If the connector list does not include the app, the tool cannot describe the workflow. | In scope by design. The accessibility tree is the integration surface. SAP GUI, Oracle EBS, Jack Henry, Fiserv, FIS, Epic, Cerner, Excel. |
| Failure mode when the UI changes | UI changes are invisible to the tool; breakage comes from API versioning and connector deprecation instead, and a human edits the connector. | Role plus visible name plus tree path are stable across renames and reflows. No selector to break. |
| Cost unit | Per task, per zap, per connector, per seat. Charged whether the work was 200 ms or 12 seconds. | $0.75 per minute of runtime, drawn against a $10,000 prepay that converts to credits. |
Read the grid as a pair of constraints, not a scorecard. The API-orchestration column wins when every step is an HTTP call you have credentials for. The desktop column wins when at least one step is a window the user has to look at to do the work.
Where each category stops working
The honest version of a workflow automation guide names the failure modes by category. The dishonest version implies a feature matrix can rank one tool above another in a way that survives the user's first non-trivial workflow.
Cron and polled triggers fail when the trigger event is not on the clock and is not in a queue the tool can poll. A workflow that fires when a CSR drops a paper form on the front-office scanner is not a cron workflow. The right answer is to lift the trigger up to a higher surface (a polled file watcher on the share, an email rule, an OS-level recorder watching the application that opens the scan).
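The first of those lifts is small enough to sketch. A minimal polled file watcher, assuming an illustrative share path and poll interval, that turns “a scan appeared on the share” into an event a cron-surface tool can see:

```rust
// A minimal polled file watcher: the "lift the trigger up" move described
// above. The share path and poll interval are illustrative.
use std::{collections::HashSet, fs, path::PathBuf, thread, time::Duration};

fn main() -> std::io::Result<()> {
    let share = PathBuf::from(r"\\front-office\scans"); // hypothetical share
    let mut seen: HashSet<PathBuf> = HashSet::new();

    // Prime the set so files already on the share do not fire on startup.
    for entry in fs::read_dir(&share)? {
        seen.insert(entry?.path());
    }

    loop {
        thread::sleep(Duration::from_secs(30)); // polled, not event-driven
        for entry in fs::read_dir(&share)? {
            let path = entry?.path();
            if seen.insert(path.clone()) {
                // A new scan appeared: hand off to the workflow here.
                println!("new scan: {}", path.display());
            }
        }
    }
}
```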
API-orchestration tools fail on any application that does not have a public API the buyer is willing to use. Most of the canonical legacy enterprise targets fall into this bucket: SAP GUI, Oracle EBS, Jack Henry, Fiserv, FIS, Epic Hyperspace, Cerner, mainframe terminal emulators. The API exists in some cases, but it costs more to license, requires a vendor engagement, or is gated behind a credentialing flow that the operations team does not have authority over. The API-orchestration tool is the wrong tool for that workflow not because it is bad software but because the surface is unreachable from where the tool stands.
Browser-DOM agents fail on every workflow that involves a window that is not Chromium. Skyvern, Browser Use, and CloudCruise are excellent on a hardened public web portal where the bundled anti-bot fabric is the load-bearing cost. They have no Windows desktop runtime, no Citrix runtime, no mainframe terminal connector. A workflow whose first step is a SAP order-entry screen is unrepresentable on a browser-DOM agent regardless of how much credit the buyer is willing to spend.
OS-level desktop tools fail on applications that genuinely expose nothing useful through Windows UI Automation: a small handful of custom OpenGL surfaces, some Java Swing apps in the default theme, a few legacy terminal emulators that paint their cells with raw GDI. Computer-vision RPA can sometimes reach those surfaces by reading pixels directly. The honest move is to admit the failure mode and pick the right tool. Most legacy systems Mediar's customers care about (SAP GUI, Oracle EBS, Jack Henry, Fiserv, FIS, Epic, Cerner) do publish dense and stable accessibility trees, because their vendors have to maintain them for assistive-technology compliance reasons.
A short selection rule, written down
Walk the workflow with a stopwatch. Label every step by the surface it lives on. Add up the time per surface. The right tool is the one whose surface covers the largest time-weighted slice of the workflow, with two adjustments. First, if any single step is on a surface only one category can reach, the tool has to be in that category, no matter what the time-weighted breakdown says. Second, if you find yourself building a Rube-Goldberg pipeline that hands the workflow off between three tools, the right answer is almost always to move the entire workflow up to the wider surface and let the wider tool handle the cheaper steps too. Wider tools have always been able to imitate narrower ones; the reverse is structurally impossible.
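The time-weighting and the first adjustment are mechanical enough to write down as code. A sketch with illustrative types, not any product's API; the second adjustment stays a human judgment:

```rust
use std::collections::HashMap;

// Illustrative types for the selection rule; not any product's API.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
enum Surface { Cron, Api, BrowserDom, OsDesktop }

struct Step {
    surface: Surface,
    seconds: u64,       // stopwatch time for this step
    only_surface: bool, // true if no other surface can reach this step
}

fn pick_surface(steps: &[Step]) -> Option<Surface> {
    // Adjustment 1: a step only one surface can reach forces the category,
    // no matter what the time-weighted breakdown says.
    if let Some(forced) = steps.iter().find(|s| s.only_surface) {
        return Some(forced.surface);
    }
    // Otherwise: sum time per surface, take the largest time-weighted slice.
    let mut totals: HashMap<Surface, u64> = HashMap::new();
    for s in steps {
        *totals.entry(s.surface).or_insert(0) += s.seconds;
    }
    totals.into_iter().max_by_key(|&(_, t)| t).map(|(s, _)| s)
    // Adjustment 2 is a human judgment: if the answer implies a three-tool
    // pipeline, move the whole workflow up to the wider surface instead.
}
```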
That is the whole rule. It is shorter than every comparison matrix on the topic, and it is the only one that survives the first time a buyer's actual workflow does not fit the matrix the listicle published.
Walk through your workflow on the OS-level surface
Bring the three or four steps that you suspect are the load-bearing ones, and we will tell you whether the OS-level recorder can describe them or whether you are on the wrong surface.
Questions a buyer eventually asks
What are workflow automation tools, in one sentence that survives a sales-engineering round?
Software that captures a procedure once, in some representation, and then replays it on a recurring schedule or on a trigger. The honest definition stops there. Everything past it (low-code vs no-code, AI-assisted vs deterministic, hosted vs self-hosted) is a layer of vendor positioning that does not actually decide whether a given tool can automate your workflow. The decision dimension is what the tool's input stream is allowed to contain. A tool whose only input is an HTTP webhook can never represent a workflow whose first event is a user pressing Tab inside a SAP GUI window. A tool whose input is the OS event stream can. That is the boundary, and it is the only one that holds at the architectural layer.
Are Zapier, Make, n8n, Power Automate, and Workato all the same category?
Almost. Zapier, Make, n8n, and Workato all sit on the API-orchestration surface: they describe a workflow as a chain of HTTP requests, and a step is something the tool has a connector for. Power Automate is the one that splits, because Microsoft ships two products under the same name. Power Automate Cloud is on the same API-orchestration surface as Zapier. Power Automate Desktop is on the OS-level desktop surface, the same surface as UiPath and Mediar. The cloud-vs-desktop distinction is buried inside Microsoft's product taxonomy, but it matters more than any feature on the comparison grid because it changes which workflows are even representable.
How do I tell which surface my workflow lives on without running a pilot?
Walk through the workflow with a stopwatch and label every action by where the data physically moves. If an action is one system writing to another via an HTTP call you have credentials for, it lives on the API surface. If an action requires a person to be looking at a rendered web page in a Chromium tab, it lives on the browser DOM surface. If an action requires a person to be looking at a window that is not Chromium (a SAP GUI, an Oracle Forms session, an Epic Hyperspace patient chart, an Excel workbook), it lives on the OS desktop surface. If your workflow has actions on two surfaces, the right tool is the one that covers the wider surface, because a tool on the wider surface can cheaply bolt on the narrower steps, while a tool on the narrower surface cannot reach the wider one at all.
Why does the keystroke-level spec in workflow_recorder.rs matter for a buying decision?
Because the spec is the answer to a question every buyer eventually asks: what does this tool think a step is? An API-orchestration tool answers that with a connector list. A browser-only agent answers with a list of DOM events. An OS-level recorder has to publish a more specific answer, because its input is a raw HID stream and most of that stream is noise. Mediar's answer is the seven-variant match plus the four-element special-keys array (Enter at 0x0D, Delete at 0x2E, Escape at 0x1B, Tab at 0x09). That spec is what the tool is willing to commit to capturing across a SAP order-entry flow, an Epic chart-review pass, and a Jack Henry teller batch. If you want to know whether a desktop automation tool will hold up on your workflow, the question to ask is: what does its is_meaningful_event look like?
What about the tools that promise 'AI agents that just figure it out'?
They split into two camps based on what they read. The vision-LLM camp reads pixels, runs OCR or a multimodal model over them, and asks the model where to click next. The accessibility-tree camp reads what the OS exposes through the interfaces screen readers use. The vision approach is more flexible on apps that expose nothing useful through accessibility (some custom OpenGL, some Java Swing surfaces in the default theme), and weaker everywhere else, because pixel patterns are less stable than a role-plus-name pair. The accessibility approach is more reliable on the canonical legacy desktop targets (SAP GUI, Oracle EBS, mainframe terminal emulators on Windows, Epic, Cerner, Jack Henry, Fiserv, FIS) because those vendors maintain their accessibility trees across releases for compliance reasons. The honest answer is to ask which surface your workflow lives on, then pick the camp whose primitives match.
How does the cost shape change between an API-orchestration tool and an OS-level desktop tool?
API tools price per task or per connector node. The cost is bounded by the number of integrations you wire up and the cardinality of the events you process. Per-task pricing is honest for steady, high-volume webhook chains. It punishes occasional but expensive workflows because the price unit does not contain the running time. OS-level desktop tools price per minute of runtime. A three-minute SAP order-entry pass is $2.25 of meter time at Mediar's $0.75 per minute. The unit is wall-clock time on the OS, which is what the tool actually consumes. The price shape that makes sense is the one whose unit matches the work the tool does. Per-task is right for orchestration; per-minute is right for desktop replay; per-step (the model Skyvern retired in January 2026) is the awkward middle.
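The arithmetic, written out. The $0.75-per-minute figure is from this piece; the per-task rate is a hypothetical placeholder for an orchestration tool:

```rust
// Worked comparison of the two price shapes. The per-minute rate is from
// this piece; the per-task rate is a hypothetical placeholder.
fn main() {
    let per_minute_rate = 0.75; // Mediar: $ per minute of OS runtime
    let sap_pass_minutes = 3.0; // one SAP order-entry pass
    println!("per-minute: ${:.2}", sap_pass_minutes * per_minute_rate); // $2.25

    // Per-task pricing charges the same unit whether the task ran 200 ms
    // or 12 seconds: running time never enters the formula.
    let per_task_rate = 0.05;   // hypothetical orchestration rate
    let tasks_per_month = 10_000.0;
    println!("per-task:   ${:.2}", tasks_per_month * per_task_rate); // $500.00
}
```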
Do I need an open-source option, and what does that get me on this surface?
Open source matters when you need to extend the input vocabulary the recorder publishes, or when your security team needs to audit the runtime end-to-end, or when you intend to embed the runtime in a product you ship. Mediar's runtime is published as the Terminator SDK at github.com/mediar-ai/terminator, with TypeScript bindings on the Rust core. The SDK is what you pull when you want to add a custom event type, integrate a different vision source, or call the recorder from inside another agent loop. If your team is happy with the published seven-variant vocabulary and the no-code recorder, the SDK is optional and most customers never touch it.
Where does Mediar refuse to be the answer?
Three places. First, pure browser-tab work where every screen is a public web app and the bundled cost is anti-bot fabric (CAPTCHA solving, residential proxies, geo-targeting). A browser-native agent like Skyvern is sized for that surface. Second, pure API-orchestration work where every step is an HTTP call you have credentials for. Zapier, Make, and n8n are sized for that. Third, applications that genuinely expose nothing through Windows UI Automation (a small handful of custom OpenGL surfaces, some legacy Java Swing themes). Computer-vision RPA can sometimes work on those. The honest tool selection names the failure mode and picks the right tool. The wrong move is to force a desktop tool onto an HTTP problem or an HTTP tool onto a desktop problem, which is exactly what happens when the buyer scores tools on feature count instead of surface.
Related
Adjacent reading from the same product
RPA agent UI input layer: accessibility tree vs pixels, and the bet each one is making
The companion piece on the input-layer choice. Walks through how the accessibility-tree approach trades a volatile selector for a captured tree, and where computer-vision RPA still wins.
What robotic process automation is, in three numbers: six event types, four stages, four match strategies
The numbers behind the surface choice. Six event types in the recorder, four stages in the runtime, four match strategies at replay time.
Skyvern pricing 2026: forecasting spend, and the boundary the credit unit cannot cross
The same surface argument applied to a budget read on a browser-native agent. Where the credit unit stops being the right number to forecast against.