Sola RPA, explained
Sola RPA reads the screen with computer vision. The interesting question is what software your data lives in.
Direct answer · verified 2026-06-17
Sola is an AI-native RPA platform. You record a workflow once on your screen, and Sola rebuilds it as a bot using, in its own words, a combination of large language models and computer vision, replaying the work at the UI level across browser and desktop apps. It sells on time-to-value and a no-code editor, and it does not publish pricing.
Source: sola.ai, checked 2026-06-17. The rest of this page is about the one mechanism that decides where a vision-based bot holds up and where reading the accessibility tree is the only thing that works.
What Sola is, without the spin
Sola is a credible, modern take on RPA. The old generation (UiPath, Automation Anywhere, Blue Prism) made you build a workflow by hand in a designer, wiring up selectors and image anchors step by step. Sola flips that: you do the task once, it watches, and it produces a bot. Large language models read the intent of what you did, computer vision locates the elements on screen, and the result runs without API access across the tools an office actually uses. It is a no-code editor pointed at business users, and it leans hard on getting from a recording to a working bot in minutes.
Every guide that currently ranks for this topic stops there: the recording UX, the no-code editor, the customer logos, the time-to-value pitch. None of them name the thing that actually decides whether a bot survives in production, which is what surface it reads to find a field. That is the whole subject below.
Two ways to read a screen
There are only two places a UI automation tool can look to find the box it needs to type into. It can look at the pixels, the rendered image of the screen, and recognize the field visually. Or it can look at the accessibility tree, the structured list of controls every well-behaved application publishes for screen readers and other assistive tech. Sola is in the first camp. Mediar is in the second.
Pixels (vision + recording)
Recognize the field by how it looks
- Works without any API, on almost anything that renders to a screen.
- Strong on modern web apps with clear, consistent visual layouts.
- Re-recognition needed when the layout, theme, or resolution shifts.
- Blind to anything not currently painted in the viewport.
Accessibility tree (Mediar)
Look the field up by its role and name
- Works without any API, by reading what the app already exposes.
- Strong exactly where pixels are weak: dense legacy desktop UIs.
- Self-heals on relabeled fields and moved buttons, no re-recording.
- Sees off-screen and collapsed controls the viewport never paints.
On a clean modern SaaS app, both approaches land in roughly the same place. The divergence shows up the moment your workflow touches software written before APIs were the default.
The anchor fact: how a Mediar step actually resolves a field
Here is the part you can verify yourself. When Mediar runs a recorded step against a SAP GUI screen, it never asks “which pixels look like a text box?” It asks the accessibility tree for the control whose name is Vendor Code, gets the node back, sets its value, and then re-reads the tree to confirm the state changed before moving on.
One recorded step, executed via the accessibility tree
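The find, set, and verify loop described above can be sketched in a few lines. This is a toy in-memory tree, not the Terminator SDK: the `Node` class, `find_by_name`, `run_step`, and the sample values are all hypothetical names made up for illustration, and the real engine's API may look quite different.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Node:
    """Toy stand-in for one accessibility-tree node (hypothetical, not the Terminator API)."""
    role: str
    name: str
    value: str = ""
    children: List["Node"] = field(default_factory=list)

    def walk(self) -> Iterator["Node"]:
        """Yield this node and every descendant, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


def find_by_name(root: Node, role: str, name: str) -> Optional[Node]:
    """Return the first node matching role and name, or None if the tree has no such control."""
    for node in root.walk():
        if node.role == role and node.name == name:
            return node
    return None


def run_step(root: Node, role: str, name: str, new_value: str) -> bool:
    """Resolve the control, set its value, then re-read the tree to confirm the change."""
    target = find_by_name(root, role, name)
    if target is None:
        return False
    target.value = new_value
    # Re-resolve from the root instead of trusting the handle we just wrote to.
    confirmed = find_by_name(root, role, name)
    return confirmed is not None and confirmed.value == new_value


# A made-up screen with three controls; no coordinates appear anywhere.
screen = Node("window", "Create Purchase Order", children=[
    Node("edit", "Vendor Code"),
    Node("edit", "Posting Date"),
    Node("button", "Save"),
])

ok = run_step(screen, "edit", "Vendor Code", "V-10042")   # control exists
missing = run_step(screen, "edit", "Cost Center", "4100")  # control is not in the tree
```

The step fails cleanly when the named control is absent, which is the same signal you would look for when inspecting a real application's tree before automating it.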
You do not have to take this on faith. The engine that reads and drives that tree is open source: the Terminator SDK at github.com/mediar-ai/terminator resolves a control by role and name, not by coordinates. And before you automate anything, you can dump the same tree Mediar will read using Microsoft's Accessibility Insights or the classic Inspect.exe, and see for yourself whether the field you care about is named in the tree or only visible as pixels.
Four places a vision-based bot loses the thread on legacy desktop
None of these are knocks on Sola for the surface it was built for. They are simply the spots where recognizing a field from its rendered image is the harder bet, and where reading the published tree is the safer one.
SAP GUI and mainframe green-screens
Computer vision has to recognize a field by how it looks. A SAP GUI transaction screen, or a 3270 terminal emulator, is a dense grid of monospace text with none of the visual affordances a vision model was trained on. The accessibility tree names every field SAP already publishes to assistive technology, so the agent reads them as data, not as a picture of data.
Quarterly UI refreshes
When a button moves forty pixels or a label shifts color, a pixel or vision matcher has to be re-recorded or re-trained. An accessibility-tree agent looks the control up by its role and name, so a button that moved is still the same node.
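A small sketch of why a moved button is still the same node. The `Control` type and both lookup functions are hypothetical, and the exact-coordinate lookup is a deliberately crude stand-in: a real vision matcher tolerates more movement than this, so treat it as an illustration of the failure mode rather than a model of any product.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Control:
    """Hypothetical control record: what it is, what it is called, where it is drawn."""
    role: str
    name: str
    x: int
    y: int


def find_at(controls: List[Control], point: Tuple[int, int]) -> Optional[Control]:
    """Position-based lookup: matches only a control drawn at the recorded point."""
    for control in controls:
        if (control.x, control.y) == point:
            return control
    return None


def find_named(controls: List[Control], role: str, name: str) -> Optional[Control]:
    """Role-and-name lookup: ignores position entirely."""
    for control in controls:
        if control.role == role and control.name == name:
            return control
    return None


before = [Control("button", "Post", 400, 300), Control("edit", "Vendor Code", 120, 80)]
# After a UI refresh, the Post button sits forty pixels to the right.
after = [Control("button", "Post", 440, 300), Control("edit", "Vendor Code", 120, 80)]

recorded_point = (400, 300)                    # captured against the old layout
by_point = find_at(after, recorded_point)      # nothing is drawn there any more
by_name = find_named(after, "button", "Post")  # same node, new position
```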
Off-screen and scrolled content
A vision pass only sees what is painted in the viewport. Fields below the fold, collapsed panels, and virtualized lists are invisible until you scroll them into frame. The tree exposes those controls whether or not they are currently rendered.
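The viewport limit is easy to see with a little geometry. The sketch below assumes an application that publishes its off-screen controls in the tree; the `Field` type, the pixel positions, and the 768-pixel viewport are invented for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Field:
    """Hypothetical form field with its vertical position in the full form, in pixels."""
    name: str
    top: int
    height: int


def painted(fields: List[Field], scroll_y: int, viewport_height: int) -> List[str]:
    """Names of fields that overlap the visible viewport, i.e. what a screenshot can contain."""
    bottom = scroll_y + viewport_height
    return [f.name for f in fields if f.top < bottom and f.top + f.height > scroll_y]


def in_tree(fields: List[Field]) -> List[str]:
    """Names of every published field, regardless of scroll position."""
    return [f.name for f in fields]


form = [
    Field("Vendor Code", top=100, height=24),
    Field("Amount", top=400, height=24),
    Field("Tax Code", top=1500, height=24),  # below the fold
]

visible = painted(form, scroll_y=0, viewport_height=768)
all_names = in_tree(form)
```

With the form scrolled to the top, `visible` omits Tax Code while `all_names` still includes it. A screenshot-based pass would need to scroll first to reach that field.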
Citrix and RDP sessions
Inside a published Citrix or RDP window, the pixels a vision model would read are a re-encoded, compressed video stream of a remote desktop. Running as a Windows agent inside the session lets Mediar read the application's own tree and sidestep the image entirely.
What this is worth when the legacy layer is the bottleneck
The reason any of this matters is that the expensive, stuck workflows in most enterprises are precisely the ones sitting on no-API desktop software. These are real deployments, with the workflow named and the math shown, not press-release numbers.
“We moved an F&B chain running SAP B1 off UiPath. Their CFO told the board they are now saving 70 percent on costs.”
Mediar deployment, F&B chain on SAP Business One
For a longer read on the line where browser and pixel automation stops and the desktop tree begins, see why legacy desktop apps with no API are the real moat and the maintenance math in what brittle selectors cost a UiPath program.
When Sola is the right call
If your workflows live in modern web apps with clean layouts, and your team wants a no-code recorder that a business user can own end to end, Sola's vision-plus-recording model is a reasonable choice, and its time-to-value pitch is real on that terrain. The honest dividing line is not which tool is “better” in the abstract. It is whether the systems you are trying to automate publish a usable accessibility tree, and whether your pixels are stable enough to recognize from one quarter to the next.
If your data is stuck in SAP GUI, a mainframe terminal, a banking core like Jack Henry or Fiserv or FIS, or an EHR like Epic or Cerner, the accessibility-tree approach is not a nicety. It is usually the only thing on screen that is reliable.
Bring one legacy workflow to the call
Show us the SAP GUI screen or green-screen terminal that broke your last RPA project, and we will tell you on the call whether the accessibility tree can drive it.
Questions people ask about Sola RPA
What is Sola RPA?
Sola (sola.ai) is an AI-native robotic process automation platform. You record a workflow once on your screen and Sola converts that recording into a bot. By its own description it interprets what you did using a combination of large language models and computer vision, then replays the work at the UI level across browser and desktop applications. It is positioned on time-to-value (recording to a working bot in minutes) and a no-code editor so business users can build and maintain their own automations.
How does Sola actually drive an application?
At the UI level, the way a person would: it looks at the screen and replays clicks and keystrokes. Sola's site describes bots that visually interact with screens and applications, using LLMs plus computer vision to interpret behavior. That is a vision-and-recording approach, which is why it works without API access and across many off-the-shelf tools.
How much does Sola cost?
Sola does not publish pricing. As of 2026-06-17 there is no public pricing page on sola.ai; it is sold as an enterprise product with custom quotes. If you need a published per-unit number to compare against, Mediar lists $0.75 per minute of runtime with a $10,000 turn-key program fee that converts to credits, and no per-seat licensing.
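To put the listed Mediar numbers in runtime terms, here is a rough back-of-the-envelope calculation. It assumes the program fee converts to runtime credits at face value, which the figures above imply but do not spell out, and the sample workload is hypothetical.

```python
RATE_PER_MINUTE = 0.75    # USD per minute of runtime, as listed above
PROGRAM_FEE = 10_000.00   # USD turn-key program fee, as listed above

# Assumption: the fee converts to runtime credits at face value.
credit_minutes = PROGRAM_FEE / RATE_PER_MINUTE
credit_hours = credit_minutes / 60


def monthly_runtime_cost(minutes_per_run: float, runs_per_month: int) -> float:
    """Runtime cost in USD for a workflow at the listed per-minute rate."""
    return minutes_per_run * runs_per_month * RATE_PER_MINUTE


# Hypothetical workload: a 4-minute workflow run 1,000 times a month.
example_cost = monthly_runtime_cost(4, 1_000)

print(f"Credits cover about {credit_minutes:,.0f} minutes ({credit_hours:,.1f} hours)")
print(f"Example workload: ${example_cost:,.2f} per month")
```

Under that assumption the fee covers roughly 13,333 runtime minutes, or about 222 hours.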
How is Mediar different from Sola?
The mechanism. Sola reads the screen with computer vision over pixels. Mediar reads the Windows accessibility tree, the same interface a screen reader uses, where every field and button the app publishes is named data rather than an image to be recognized. On modern web SaaS the two approaches look similar. On no-API legacy desktop software (SAP GUI, mainframe terminals, Jack Henry, Fiserv, FIS, Epic, Cerner, Oracle EBS) the tree is usually present where pixels are unreliable.
Does the accessibility-tree approach work when there is no API at all?
Yes, that is the point of it. The accessibility tree is published by the application to assistive technology regardless of whether the app exposes an API, an SFTP feed, or any integration. Mediar reads the fields and buttons already in that tree and types into them. You can confirm a given app has a usable tree yourself with Microsoft's Accessibility Insights or Inspect.exe before you automate anything.
What happens when the interface changes?
An agent that locates controls by role and name self-heals on relabeled fields and moved buttons, because there are no recorded pixel coordinates or brittle selectors to break. A vision or screen-recording bot generally needs to be re-recorded or re-trained when the layout shifts. This is the maintenance line where RPA programs usually leak budget over time.
Is Mediar open source?
The engine is. The Terminator SDK that reads and drives the accessibility tree is open source at github.com/mediar-ai/terminator, so a team can read exactly how a control is resolved and extend it for custom workflows. Mediar itself ships as a Windows desktop application plus a no-code web app at app.mediar.ai/web.
Keep reading
Legacy desktop apps with no API are the real moat
Why the workflows nobody can automate are the ones sitting on software written before APIs existed.
What brittle selectors cost a UiPath program
The maintenance line where traditional RPA quietly leaks its budget, and where self-healing changes the math.
RPA on legacy desktop with accessibility-tree agents
How agents that read the OS accessibility tree drive apps that expose no API at all.