RPA, but only inside the browser

Skyvern as RPA: what it automates, and where the browser boundary falls

Skyvern is one of the better AI takes on RPA. It automates web workflows by looking at the page and reasoning about it, not by replaying a brittle recorded script. That genuinely fixes the maintenance problem classic RPA tools have. The part the existing write-ups skip is the boundary: Skyvern drives a Chromium browser, and a large share of enterprise RPA does not live in a browser.

Matthew Diakonov
7 min read

Direct answer

Yes, Skyvern is RPA: open-source, Y Combinator-backed, and built to automate browser workflows using an LLM and computer vision instead of CSS selectors. Because it drives a real Chromium browser, it covers anything that runs on the web and does not cover desktop thick-client apps (SAP GUI, mainframe terminals, Epic Hyperspace), which have no browser surface to act on.

Verified June 18, 2026 against github.com/Skyvern-AI/skyvern and skyvern.com.

What Skyvern actually is

Skyvern is an open-source browser-automation agent. Under the hood it is a Playwright-compatible runtime with a model layer bolted on: you describe a task in plain language, and instead of running a recorded click path, it reads the rendered page (both the DOM and the pixels) and decides what to do next. It handles the messy parts of real web work, including captchas and 2FA, and exposes both an SDK for developers and a no-code workflow builder for everyone else.

The reason this counts as RPA, and not just a chat toy, is that it executes end-to-end unattended runs against live websites. The reason people reach for it over UiPath or Automation Anywhere is the maintenance math: a vision-and-reasoning agent does not snap the moment a button moves twelve pixels or a field gets renamed. That is a real and useful difference. It is also entirely a story about the web surface.
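The maintenance argument can be made concrete with a toy model. The sketch below is not Skyvern's implementation or UiPath's; `Button`, `click_recorded`, and `click_by_label` are hypothetical names invented for illustration. It only shows why a click replayed at saved coordinates misses after a twelve-pixel shift, while a lookup keyed on the visible label still lands.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Button:
    """A hypothetical on-screen button: a label plus a bounding box."""
    label: str
    x: int
    y: int
    width: int = 80
    height: int = 10

    def contains(self, px: int, py: int) -> bool:
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def click_recorded(page: list[Button], px: int, py: int) -> Optional[str]:
    """Replay a recorded click: hit whatever sits at the saved coordinates."""
    for button in page:
        if button.contains(px, py):
            return button.label
    return None


def click_by_label(page: list[Button], wanted: str) -> Optional[str]:
    """Find the target by what it says, wherever it currently sits."""
    for button in page:
        if button.label.lower() == wanted.lower():
            return button.label
    return None


# The same page before and after the button moves twelve pixels down.
before = [Button("Download statement", x=100, y=200)]
after = [Button("Download statement", x=100, y=212)]

recorded = (110, 205)  # coordinates captured when the script was recorded

print(click_recorded(before, *recorded))            # Download statement
print(click_recorded(after, *recorded))             # None
print(click_by_label(after, "download statement"))  # Download statement
```

A real agent does far more than an exact label match, but the failure mode on the recorded side is the same one described above.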

The benchmark behind the claims

When Skyvern talks about state-of-the-art performance, the number is from WebVoyager. Read it carefully, because it tells you precisely what the tool is good at.

85.85%

Skyvern 2.0 score on the WebVoyager web-agent benchmark

643

tasks in the benchmark, across live websites

15

public consumer-web sites, every one a browser page

Fifteen sites: shopping, maps, travel, search, the kind of public web properties you reach from a stock browser. There is no SAP screen in the suite, no green-screen terminal, no Citrix-published desktop client, because those are not web pages. The 85.85% is an honest measure of browser competence. It says nothing about the desktop, and it was never meant to.
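As a quick sanity check on what the percentage means in task counts, here is a small calculation. It assumes the score is a plain pass rate over all 643 tasks; that is an assumption for illustration, since the exact scoring method is not described here.

```python
TOTAL_TASKS = 643      # WebVoyager task count cited on this page
SCORE_PERCENT = 85.85  # Skyvern 2.0 score cited on this page

# Assumption: score = passed tasks / total tasks, with no weighting.
implied_passes = round(TOTAL_TASKS * SCORE_PERCENT / 100)
implied_failures = TOTAL_TASKS - implied_passes
recomputed = round(100 * implied_passes / TOTAL_TASKS, 2)

print(implied_passes)    # 552
print(implied_failures)  # 91
print(recomputed)        # 85.85
```

Under that assumption, the score corresponds to roughly 552 of 643 browser tasks completed, all of them on web pages.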

How a Skyvern run executes

Watching one run go by makes the boundary obvious. Every step lives in a browser tab.

One Skyvern task, start to finish

1. Goal goes in

You give Skyvern a task in natural language (“log in, download the March statement, extract the line items”) or build it in the no-code workflow editor. No selectors, no XPaths.

The anchor: there is no desktop runtime

Here is the single fact that the ranking guides on this topic leave out. The word “RPA” historically covers desktop work: data entry into SAP GUI, posting to a mainframe terminal, keying a claim into an insurance thick client. Skyvern’s entire action space is the browser. No tier, free or enterprise, ships a Windows desktop runtime that can click a button in a native application that is not a web page.

That is not a knock on Skyvern. A browser agent that tried to also be a desktop agent would be two products. It is just the line you need to know before you scope a project. If your target app opens in Chrome, Skyvern is in play. If it opens as a Windows executable, a Citrix session, or a terminal emulator, the browser agent has nothing to grab onto, and you need a different mechanism entirely.

That different mechanism is the OS accessibility tree, the same interface screen readers use. Every Windows control exposes a role, a name, and a value to the accessibility API whether it lives in a browser or in SAP GUI. Mediar reads that tree to drive the desktop apps a browser agent cannot see, and it self-heals on a UI change for the same reason Skyvern does on the web: it matches on labels and roles, not on pixel coordinates or recorded selectors. The open-source Terminator SDK is how teams script that layer directly.
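To make the role-and-name idea concrete, here is a minimal sketch of looking up controls in an accessibility-style tree. It is a toy model, not the Terminator SDK or the Windows accessibility API: the `Node` class, `walk`, and `find` are hypothetical names, and the sample tree is invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class Node:
    """A hypothetical accessibility node: role, name, value, children."""
    role: str
    name: str
    value: str = ""
    children: list["Node"] = field(default_factory=list)


def walk(node: Node) -> Iterator[Node]:
    """Depth-first traversal of the tree."""
    yield node
    for child in node.children:
        yield from walk(child)


def find(root: Node, role: str, name: str) -> Optional[Node]:
    """Return the first node matching role and name, ignoring position."""
    for node in walk(root):
        if node.role == role and node.name == name:
            return node
    return None


original = Node("Window", "Create Sales Order", children=[
    Node("Edit", "Customer", value="ACME-001"),
    Node("Button", "Save"),
])

# The same controls after a UI change: regrouped into panes and reordered.
redesigned = Node("Window", "Create Sales Order", children=[
    Node("Pane", "Toolbar", children=[Node("Button", "Save")]),
    Node("Pane", "Header", children=[
        Node("Edit", "Customer", value="ACME-001"),
    ]),
])

for tree in (original, redesigned):
    customer = find(tree, "Edit", "Customer")
    save = find(tree, "Button", "Save")
    print(customer.value, save is not None)  # ACME-001 True (both layouts)
```

Because the lookup keys on role and name, the regrouped layout resolves to the same two controls without any change to the script.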

Same idea, two different surfaces

Both tools replace brittle selectors with something more durable. They just point that durability at different layers of the stack.

Feature | Skyvern | Mediar
Where it runs | Inside a Chromium browser tab | On the Windows desktop, any app with an accessibility tree
Public consumer web (shop, book, log in, scrape) | Strong. This is what it is built for | Works, but not where Mediar earns its keep
SAP GUI, Oracle EBS, mainframe green-screens | Out of scope. No desktop runtime ships in any tier | Native. Reads fields via the OS accessibility API
Epic / Cerner thick clients, Jack Henry, Fiserv, FIS | Out of scope unless a pure web portal exists | Supported on the desktop client directly
How it survives a UI change | Vision model re-reads the page each run | Reads accessibility labels and roles, not pixels or coordinates
Open source | Yes, AGPL core on GitHub | Terminator SDK on GitHub for custom desktop workflows

None of this table is a verdict on Skyvern's quality. For pure web workflows it is one of the stronger AI-RPA options. The split is about surface, not capability.

So how do you choose?

Do an honest inventory of where the work is stuck. If your backlog is SaaS dashboards, web portals, public lookups, and e-commerce flows, a browser-native agent is the right tool and Skyvern is a credible pick. If the bottleneck is a Windows desktop app with no usable API (SAP GUI, an Oracle EBS form, a Jack Henry or Fiserv core, an Epic or Cerner thick client, or a mainframe green-screen), then no amount of browser cleverness reaches it, and that is exactly the layer Mediar was built for.
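One way to run that inventory is to tag each workflow by how its target application opens and bucket the results. The sketch below is illustrative only: the `Surface` enum, the `triage` function, the launch categories, and the sample backlog are all hypothetical, chosen to mirror the browser-versus-desktop rule described in this article.

```python
from enum import Enum


class Surface(Enum):
    BROWSER = "browser agent"
    DESKTOP = "desktop accessibility agent"
    UNKNOWN = "needs a closer look"


# How each target application opens, mapped to the surface that can reach it.
LAUNCH_TO_SURFACE = {
    "browser": Surface.BROWSER,
    "windows executable": Surface.DESKTOP,
    "citrix session": Surface.DESKTOP,
    "terminal emulator": Surface.DESKTOP,
}


def triage(backlog: dict[str, str]) -> dict[Surface, list[str]]:
    """Group workflows by the surface their target application lives on."""
    buckets: dict[Surface, list[str]] = {surface: [] for surface in Surface}
    for workflow, opens_as in backlog.items():
        key = opens_as.strip().lower()
        buckets[LAUNCH_TO_SURFACE.get(key, Surface.UNKNOWN)].append(workflow)
    return buckets


# Invented sample backlog: workflow name -> how the target app opens.
backlog = {
    "Download vendor invoices from supplier portal": "browser",
    "Post journal entries in SAP GUI": "windows executable",
    "Key claims into a 3270 green-screen": "terminal emulator",
    "Update records in a published thick client": "Citrix session",
    "Reconcile balances in a legacy tool": "unsure",
}

result = triage(backlog)
for surface, workflows in result.items():
    print(f"{surface.value}: {len(workflows)}")
```

Anything that lands in the unknown bucket is worth opening by hand before scoping, since the answer decides which kind of agent can touch it at all.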

Plenty of real automation programs need both. The mistake is assuming a browser agent’s benchmark score transfers to the desktop. It does not, because the desktop was never on the benchmark.

Stuck on the desktop half of your RPA backlog?

Bring the workflows a browser agent cannot reach and we will tell you, honestly, whether the accessibility-tree approach fits.

Frequently asked questions

Is Skyvern an RPA tool?

Yes, but a specific kind. Skyvern is AI-driven RPA for the browser. It automates web workflows (logins, form fills, data extraction, checkouts) using an LLM and computer vision instead of the recorded selectors and brittle click paths that classic RPA tools like UiPath and Automation Anywhere rely on. The catch is the surface: it drives a Chromium browser, so it is RPA for anything that lives on the web, not for desktop thick-client apps.

What is the 85.85% number Skyvern cites?

That is Skyvern 2.0's score on WebVoyager, a public web-agent benchmark of 643 tasks spread across exactly 15 real consumer websites (search, shopping, maps, travel, and similar). It is a strong result for browser navigation. It is worth reading literally: every one of those 15 sites is a public web property reachable from a stock browser, so the score measures web competence, not desktop competence.

Can Skyvern automate SAP GUI or a mainframe terminal?

Not the desktop application. If your SAP or mainframe workflow runs through a pure web portal, a browser agent can drive it. But the SAP GUI Windows client, Oracle EBS forms, Citrix-published thick clients, and 5250/3270 green-screens are not web pages, and no Skyvern tier ships a Windows desktop runtime. That layer needs an agent that reads the OS accessibility tree, which is the gap Mediar fills.

How is Skyvern different from UiPath?

UiPath records and replays explicit steps and selectors; it breaks when the UI shifts and needs maintenance. Skyvern uses a model to look at the page and decide the next action, so it tolerates layout changes a recorded script would not. The trade is scope: UiPath has decades of desktop, Citrix, and mainframe connectors, while Skyvern is browser-native. The two are good at different halves of the RPA problem.

Does Skyvern being open source mean I can self-host it?

Yes. The Skyvern core is open source on GitHub (Skyvern-AI/skyvern) and can be self-hosted, with a paid cloud offering on top. Mediar similarly ships an open-source SDK called Terminator (mediar-ai/terminator) for teams that want to script desktop automation directly. Open source on its own does not change the surface either tool can reach.

When should I pick a browser agent like Skyvern over Mediar?

If the workflows you need to automate live entirely on the public or internal web (SaaS dashboards, web portals, e-commerce, public lookups), a browser-native agent is the right fit and Skyvern is one of the better ones. Pick Mediar when the work is stuck on Windows desktop apps with no usable API: SAP GUI, Oracle EBS, banking cores, EHR thick clients, or mainframe terminals.
