OS-level reference

OS-level accessibility automation, for the enterprise apps that have no API.

When an enterprise says "OS-level" in this market, the specific layer being named is Microsoft UI Automation (UIA), the accessibility framework Windows publishes for every UI thread. UIA is what Narrator reads. It is also the surface a deterministic automation runtime addresses to drive SAP GUI, mainframe terminals, Jack Henry, Fiserv, FIS, Epic, Cerner, and the long tail of internal Win32 admin tools that never got a public interface. This page walks through the actual primitives an enterprise workflow is built on, taken straight from the open-source Selector API at github.com/mediar-ai/terminator.

Matthew Diakonov, Written with AI

Published May 10, 2026, 11 min read

Direct answer (verified 2026-05-10)

OS-level accessibility automation for enterprise is automation that drives Windows desktop applications through the operating system's UI Automation (UIA) framework, the same layer screen readers consume. A workflow reads structured properties Windows already publishes for every UI control (Name, ControlType, AutomationId, ClassName, plus a small set of patterns) and acts on the resolved element. The mechanism is documented at learn.microsoft.com/en-us/windows/win32/winauto/entry-uiauto-win32. The reason it matters at enterprise scale: SAP GUI, SAP B1, Oracle EBS, mainframe 3270 emulators, Jack Henry, Fiserv, FIS, Epic, and Cerner all publish UIA, and UIA is identical inside RDP and Citrix. One primitive set covers every desktop the function uses.

The OS already publishes the contract. The runtime just reads it.

A standard Windows desktop app does not need to do anything special to be addressable. When it draws a button or an editable field on top of the standard control palette (Win32, .NET WinForms, WPF, MFC, Delphi VCL), the operating system generates a UI Automation node for it automatically. That node carries ControlType, Name, AutomationId, and a set of patterns describing what you can do with it (read its value, set its value, click it, expand it, select an item from it). The form designer who built the app twenty years ago did not opt in. UIA existed at runtime regardless.

What inspect.exe shows you, hovering over the Customer field on a SAP B1 A/R Invoice form, is the contract the OS has already published for you. An OS-level automation runtime keys on those properties; that is the entire idea.

inspect.exe -- live UIA dump

Every line in that dump is a property Windows publishes for free. The runtime never has to look at a screenshot to know that this rectangle is an Edit control named Customer. The vendor never had to expose an API. The form definition the SAP team wrote against the WinForms control library is what populates Name and AutomationId, and that stays stable across patches because the form definition is what does not move.
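To make the idea of a published contract concrete, here is a minimal sketch of one UIA node as plain data. The UiaNode type, the pattern names as string literals, and the allowedActions helper are hypothetical and exist only for illustration; they are not Terminator or Windows types, and the property values reuse the examples on this page rather than a real inspect.exe dump.

```typescript
// Hypothetical in-memory model of one UIA node; not a Terminator or Windows type.
type UiaPattern = "Value" | "Invoke" | "ExpandCollapse" | "Selection" | "Toggle";

interface UiaNode {
  controlType: string;  // UIA ControlType, e.g. "Edit"
  name: string;         // UIA Name, the label a screen reader announces
  automationId: string; // UIA AutomationId assigned by the form definition
  className: string;    // Win32 window class name
  patterns: UiaPattern[];
}

// Illustrative values only; a real inspect.exe dump will differ.
const customerField: UiaNode = {
  controlType: "Edit",
  name: "Customer",
  automationId: "49152",
  className: "WindowsForms10.EDIT.app.0.bf7d44_r6_ad1",
  patterns: ["Value"],
};

// Which actions each pattern allows (assumed mapping for this sketch).
const ACTIONS_BY_PATTERN: Record<UiaPattern, string[]> = {
  Value: ["read value", "set value"],
  Invoke: ["click"],
  ExpandCollapse: ["expand", "collapse"],
  Selection: ["select item"],
  Toggle: ["toggle"],
};

// Collect the actions a workflow may perform, based only on published patterns.
function allowedActions(node: UiaNode): string[] {
  const actions: string[] = [];
  for (const pattern of node.patterns) {
    actions.push(...ACTIONS_BY_PATTERN[pattern]);
  }
  return actions;
}

console.log(allowedActions(customerField));
```

Nothing in this model carries pixel data, which is the point: the decision about what can be done to a control comes from properties and patterns alone.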

The seven element primitives an enterprise workflow is built from.

The Selector API in Terminator (open source, MIT) wraps the UIA property surface as seven element primitives plus three structural scopes. This is the entire authoring surface. There is no DSL, no per-app SDK. The same primitives address fields on SAP B1, on a Reflection 3270 emulator, on the Epic chart, and on a homegrown Win32 admin tool.

Element primitives, in resolution-priority order

Selector.name("Customer")

Match by the UIA Name property. This is the human-visible label Windows hands to screen readers. For SAP GUI, Oracle Forms, mainframe terminals, and banking cores, the Name comes from the underlying form definition, so it stays stable across themes, DPI, and most patches.

Selector.role("Edit", "Customer")

Match by ControlType plus optional Name. ControlType maps to UIA's typed surface: Edit, Button, ComboBox, DataItem, Document, ListItem, Pane, Tab, Window, plus the rest of the UIA control palette. Role is the property that survives a vendor reskin where the label moved but the control kind did not.

Selector.nativeId("49152")

Match by the UIA AutomationId. This is the runtime identifier the form definition assigned to the control. WinForms apps publish stable AutomationIds across patches, which is why a B1 9.3 to 10.0 upgrade can reflow the line item grid and a workflow keyed on AutomationId still resolves the right field.

Selector.className("WindowsForms10.EDIT.app.0.bf7d44_r6_ad1")

Match by the Win32 window class name. Useful as a tiebreaker when a form has many controls with the same Name (line item grids, repeating subforms). For Win32, .NET WinForms, WPF, and MFC apps the class name is what the framework registered with the OS.

Selector.text("Posting Date")

Match by displayed text. Different from Name when the control has no accessible label but the rendered text is what the user reads. Used as a fallback for older Win32 controls where the form designer never populated AccessibleName.

Selector.path("/Pane[1]/Tab[2]/Edit[3]")

Match by an XPath-style structural path through the UIA tree. Used when a control has no stable Name or AutomationId at all. This is the most brittle primitive, so the runtime prefers Name and AutomationId and falls back to path only when both are empty.

Selector.attributes({ HelpText: "Customer code" })

Match by an arbitrary UIA property bag. UIA exposes HelpText, AcceleratorKey, AccessKey, ItemType, ItemStatus, and several dozen other properties. Any of them can be the discriminator when Name and AutomationId are not enough.

The runtime resolves in the order shown. Selector.name and Selector.role are the workhorses; Selector.nativeId is the tiebreaker that survives upgrades; Selector.className, Selector.text, Selector.path, and Selector.attributes are fallbacks for old C++/MFC and DirectX-flavored apps where the higher-level properties are not populated. You can read every one of these factory methods at github.com/mediar-ai/terminator.
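The priority ladder can be sketched against an in-memory list of nodes. This is not the code in the Terminator repo: the UiaNode and Primitive shapes are hypothetical, only four of the seven primitives are modeled, and the rule "accept the first primitive that resolves to exactly one node" is an assumption made for this example. The second AutomationId ("49153") is invented to create an ambiguous pair.

```typescript
// Hypothetical node and selector shapes for illustration; not Terminator types.
interface UiaNode {
  controlType: string;
  name: string;
  automationId: string;
  className: string;
}

type Primitive =
  | { kind: "name"; name: string }
  | { kind: "role"; controlType: string; name?: string }
  | { kind: "nativeId"; automationId: string }
  | { kind: "className"; className: string };

function matches(node: UiaNode, p: Primitive): boolean {
  switch (p.kind) {
    case "name":
      return node.name === p.name;
    case "role":
      return node.controlType === p.controlType &&
        (p.name === undefined || node.name === p.name);
    case "nativeId":
      return node.automationId === p.automationId;
    case "className":
      return node.className === p.className;
  }
}

// Assumed rule: walk the ladder in order and accept the first primitive
// that resolves to exactly one node. Ambiguous or empty results fall through.
function resolve(nodes: UiaNode[], ladder: Primitive[]): { node: UiaNode; via: string } | null {
  for (const p of ladder) {
    const hits = nodes.filter((n) => matches(n, p));
    const only = hits[0];
    if (hits.length === 1 && only !== undefined) {
      return { node: only, via: p.kind };
    }
  }
  return null;
}

// Two grid cells share Name and ControlType; only AutomationId tells them apart.
const cls = "WindowsForms10.EDIT.app.0.bf7d44_r6_ad1";
const grid: UiaNode[] = [
  { controlType: "Edit", name: "Customer", automationId: "49152", className: cls },
  { controlType: "Edit", name: "Customer", automationId: "49153", className: cls },
];

const resolved = resolve(grid, [
  { kind: "name", name: "Customer" },
  { kind: "role", controlType: "Edit", name: "Customer" },
  { kind: "nativeId", automationId: "49152" },
]);
console.log(resolved ? resolved.via : "unresolved");
```

In this example Name and Role each match two nodes, so the ladder falls through to nativeId, which is the tiebreaker role described above.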

Three structural scopes that make the workflow safe in production.

A field by itself is not enough. An enterprise desktop typically has twelve windows open across six processes; an automation that reads the wrong tree will type a customer code into Excel because the user tabbed away. The Process and Window scopes are how the workflow stays pinned to the right tree, and Chain composes them so the final locator is unambiguous.

Scopes, every workflow uses these

Selector.process("sap.exe")

Scope the search to a specific Windows process. The first link in any enterprise locator chain. Pinning to a process by its executable name stops a workflow from accidentally reading another app's tree when the user has SAP B1, Oracle, and Excel open at the same time.

Selector.window("A/R Invoice")

Scope to a specific top-level window inside that process. Internally this is a Role("Window") with a Name match. Required when the same process owns multiple form windows that share field names (B1's A/R Invoice and A/P Invoice both have a Customer/Vendor field with the same Name).

.chain(...)

Compose Process to Window to Element. The chain is what makes a workflow safe in a multi-window enterprise desktop, including RDP and Citrix sessions where a single Windows account is hosting several long-lived applications. Each step in the chain narrows the UIA search; the final step is the field you act on.

The combined locator that an AP function actually writes against B1 looks like this in the open-source SDK:

// AP cash posting on SAP B1 A/R Invoice, Customer field
const target = Selector
  .process("sap.exe")
  .chain(Selector.window("A/R Invoice"))
  .chain(Selector.role("Edit", "Customer"));

await desktop.locator(target).typeText(invoice.customerCode, {
  clearBeforeTyping: true,
});

That fragment is what survives a B1 9.3 to 10.0 upgrade where the A/R Invoice line item grid was reflowed. The Customer field kept its Name, kept its AutomationId, kept its ControlType. The pixel coordinates moved by 84 pixels. The selector did not.
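Why the chained locator ignores an 84-pixel move can be shown with a small in-memory tree. This is a sketch under stated assumptions: TreeNode, ProcessTree, locate, and makeDesktop are hypothetical names, and the tree is a toy stand-in for what UIA would publish, not output from a real SAP B1 session.

```typescript
// Hypothetical tree model for illustration; not Terminator types.
interface TreeNode {
  controlType: string;
  name: string;
  x: number; // left edge in pixels; deliberately never read by the matcher
  children: TreeNode[];
}

interface ProcessTree {
  processName: string;
  windows: TreeNode[];
}

// Depth-first search for the first descendant with the given ControlType and Name.
function findDescendant(root: TreeNode, controlType: string, name: string): TreeNode | null {
  for (const child of root.children) {
    if (child.controlType === controlType && child.name === name) return child;
    const deeper = findDescendant(child, controlType, name);
    if (deeper !== null) return deeper;
  }
  return null;
}

// Process -> Window -> Element: each step narrows the search space.
function locate(
  desktop: ProcessTree[],
  processName: string,
  windowName: string,
  controlType: string,
  name: string,
): TreeNode | null {
  const proc = desktop.find((p) => p.processName === processName);
  if (proc === undefined) return null;
  const win = proc.windows.find((w) => w.controlType === "Window" && w.name === windowName);
  if (win === undefined) return null;
  return findDescendant(win, controlType, name);
}

// Build a toy desktop where three windows all contain an Edit named "Customer".
function makeDesktop(arInvoiceCustomerX: number): ProcessTree[] {
  const edit = (x: number): TreeNode => ({ controlType: "Edit", name: "Customer", x, children: [] });
  const win = (name: string, kids: TreeNode[]): TreeNode =>
    ({ controlType: "Window", name, x: 0, children: kids });
  return [
    { processName: "excel.exe", windows: [win("Book1", [edit(40)])] },
    {
      processName: "sap.exe",
      windows: [win("A/P Invoice", [edit(999)]), win("A/R Invoice", [edit(arInvoiceCustomerX)])],
    },
  ];
}

const beforeUpgrade = locate(makeDesktop(120), "sap.exe", "A/R Invoice", "Edit", "Customer");
const afterUpgrade = locate(makeDesktop(204), "sap.exe", "A/R Invoice", "Edit", "Customer");
console.log(beforeUpgrade?.x, afterUpgrade?.x);
```

The same four arguments resolve the A/R Invoice field in both layouts, and the Excel and A/P Invoice fields with the same Name are never candidates because the process and window scopes exclude them first.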

What the OS publishes for the enterprise stack you actually run.

The reason the same primitive set works across the enterprise stack is that all of these vendors render through the standard Windows control palette. The OS publishes UIA for every one of them, with differing levels of completeness on Name and AutomationId, but all deterministic enough that a workflow can resolve the field without looking at pixels.

Enterprise apps with first-class UIA coverage

SAP GUI / SAP B1

WinForms-based clients publish full UIA tree with stable AutomationIds.

Oracle EBS Forms

Java-based EBS forms expose the Java Access Bridge, which Windows surfaces via UIA.

Mainframe 3270 / 5250

Most modern emulators (Reflection, Rumba, BlueZone) ship UIA providers for screen fields.

Jack Henry / Fiserv / FIS

Banking core teller and middle-office screens render through Win32, fully UIA-addressable.

Epic / Cerner / eClinicalWorks

EHR Hyperspace and PowerChart panes publish ControlType and Name for chart fields.

Active Directory MMC

MMC snap-ins are pure UIA; AutomationId on tree nodes survives Windows Server upgrades.

Excel / Word / Outlook

Office apps expose UIA for cells, ribbon, and message panes alongside their COM model.

Custom Win32 / .NET / WPF

Anything built on the standard control palette gets a tree the OS publishes for free.

Anchor fact

Seven element primitives, three scopes, four prefixes for the fallback path.

That is the whole authoring surface. You can read it in a single file: src/selector.rs in the open-source Terminator repo, roughly 158 lines. The runtime supports a clustered tree at apps/desktop where the SDK fuses UIA with optional DOM, Omniparser, and Gemini Vision detection, prefixed in the dump as #u, #d, #p, and #g respectively. A workflow targets #u (UIA) by default and only uses the vision prefixes when UIA returns nothing.

The point is that the surface is small enough to audit before a security review signs off, and the prefix system makes the non-deterministic vision fallback visible in every step trace, not buried.
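The "UIA first, vision only when UIA returns nothing" rule can be sketched as a small selection function. The ClusteredElement shape and the id format ("#u1", "#p7") are assumptions based on the prefixes named above, not the SDK's actual data structures, and the sketch leaves the #d (DOM) prefix out of the fallback decision.

```typescript
// Hypothetical element record; the id format is an assumption for this sketch.
type Source = "u" | "d" | "p" | "g"; // UIA, DOM, Omniparser, Gemini Vision

interface ClusteredElement {
  id: string;    // e.g. "#u12"
  label: string; // text or Name associated with the element
}

interface Target {
  element: ClusteredElement;
  deterministic: boolean; // true when the element came from UIA
}

function sourceOf(el: ClusteredElement): Source {
  const m = /^#([udpg])\d+$/.exec(el.id);
  if (m === null) throw new Error(`unrecognized element id: ${el.id}`);
  return m[1] as Source;
}

// Prefer a UIA element; use a vision element only when UIA has no candidate.
function pickTarget(elements: ClusteredElement[], label: string): Target | null {
  const candidates = elements.filter((e) => e.label === label);
  for (const e of candidates) {
    if (sourceOf(e) === "u") return { element: e, deterministic: true };
  }
  for (const e of candidates) {
    const s = sourceOf(e);
    if (s === "p" || s === "g") return { element: e, deterministic: false };
  }
  return null;
}

const clustered: ClusteredElement[] = [
  { id: "#p7", label: "Customer" },
  { id: "#u1", label: "Customer" },
  { id: "#g2", label: "Signature" },
];

console.log(pickTarget(clustered, "Customer"));
console.log(pickTarget(clustered, "Signature"));
```

Carrying the deterministic flag alongside the element is one way to keep the vision fallback visible in a step trace, which is the property the prefix system is described as providing.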

$750K/yr

“Claims intake at one mid-market carrier went from 30 minutes per claim to 2 minutes. That is $750K per year, not a press-release number, that is their AP-team headcount math.”

Mediar deployment, mid-market insurance carrier, 2025-2026

Why this matters for the RPA Center of Excellence specifically.

A typical RPA Center of Excellence has between 40 and 200 production bots, half of which are flagged red on the maintenance dashboard at any given time because a vendor patch reflowed a form. The cost is not the license; it is the FTEs assigned to keep the bots green. An OS-level layer changes the maintenance shape. A Name plus AutomationId locator does not break when the Customer field moves from row 4 to row 6 of the A/R Invoice grid; the pixel matcher does. A UIA tree is identical inside an RDP session and on the host; the coordinate-driven script is not. The audit log records typed UIA calls with named elements, not click-at-(x, y) events, so a compliance reviewer can read it in hours instead of weeks.

The honest framing is that OS-level is not a marketing claim. It is a literal property of where the runtime lives in the stack. UIA is a named, public Windows API documented by Microsoft. Either your runtime addresses it, or it addresses pixels and recorded paths. Two different things. Most enterprise outages on classical RPA come from conflating them.

Mediar runs at the UIA layer, with vision as a marked fallback. The execution layer is open source. The cost is $0.75 per minute of runtime, with a $10,000 turn-key program fee that converts to credits. If you want to walk through your specific stack (which apps, which RDP/Citrix shape, which Center of Excellence maintenance load), book a call below.

Walk through your enterprise stack on a 30-minute call.

Bring two or three workflows you would want to ship in the first month. We will walk through what the UIA tree looks like for each app, where Name and AutomationId are populated, and where the vision fallback would kick in.

Frequently asked questions

What does "OS-level accessibility automation" actually mean?

It means the runtime drives Windows applications by reading the UI Automation (UIA) tree the operating system publishes for every UI thread, then acting on the resolved element. UIA is the same accessibility framework Narrator and other screen readers consume. The tree exposes ControlType, Name, AutomationId, ClassName, BoundingRectangle, IsKeyboardFocusable, and a set of patterns (ValuePattern, SelectionPattern, InvokePattern, TogglePattern). An automation runtime that addresses elements through UIA reads structured data the OS already exposes, instead of inferring structure from pixels or recorded coordinates. Microsoft documents the framework at learn.microsoft.com/en-us/windows/win32/winauto/entry-uiauto-win32.

Why does enterprise specifically need this layer? What about browser agents and pixel matchers?

Enterprise workflows live on apps that have no public API: SAP GUI, SAP B1, Oracle EBS, mainframe 3270 emulators, Jack Henry, Fiserv, FIS, Epic, Cerner, eClinicalWorks, custom Win32 admin tools. Browser-based agents cannot see those apps at all. Pixel matchers can, but they break on every theme change, DPI change, and minor UI patch, and they cannot survive a Citrix session resize. UIA is the layer that exists for all of these apps, and the tree is identical regardless of resolution or scaling. That is what makes the same workflow run inside an RDP window, on a 4K monitor, on a published Citrix app, and on a developer laptop with no rewrites.

How is this different from RPA tools like UiPath or Power Automate Desktop?

Both UiPath and Power Automate Desktop have UIA backends; the difference is what they make you key your workflow on. The classical RPA recorder generates a recorded path through the UIA tree (visual ancestor chain plus index) and falls back to image matching when the path breaks. That is why a B1 9.3 to 10.0 upgrade typically requires re-recording. The Mediar approach keys on Name plus AutomationId first, falls back to ClassName, and only uses a structural path when nothing else is populated. The Selector primitives that express this are open source under MIT at github.com/mediar-ai/terminator/blob/main/src/selector.rs, so the resolution logic the runtime uses against your enterprise apps is auditable.

What if a control has no AutomationId or empty Name? Some legacy apps are sloppy.

Real answer: most enterprise Win32 and WinForms apps publish populated AutomationIds because the form designer auto-assigned them at compile time. The exceptions are old C++/MFC apps with hand-coded HWNDs and apps that render into a DirectX or canvas surface and never register a UIA provider. For sloppy-but-present cases, the runtime falls back through the primitive ladder: Name, then Role plus Name, then ClassName, then Path, then Attributes (HelpText, ItemStatus). For pure render-only apps, the clustered tree in the open-source SDK fuses UIA with optional vision detection from Omniparser or Gemini Vision; UIA elements get the #u prefix, vision elements get #p or #g. The workflow can target a vision element only when UIA returns nothing, instead of using vision for everything.

Does this work in RDP and Citrix?

Yes, that is one of the load-bearing reasons enterprises use this layer. UIA is published by the OS hosting the application, not by the client connecting to it. When a workflow runs inside the RDP or Citrix session (which is how Mediar deploys for banks and insurance carriers), it reads the UIA tree of the apps inside the session. The Citrix HDX wrapper does not introduce a pixel layer the runtime has to interpret. This is why pixel-matcher RPA struggles in Citrix farms: the same workflow that works on the RDP host fails on a published app because the framing pixels changed. UIA is identical in both.

What does the audit trail look like for SOC 2 and HIPAA reviews?

Every step the executor runs lands in a structured StepResult: step_id, tool_name (the UIA primitive), status (Pending, Running, Success, Failed, Skipped, Retrying), the structured result payload, the error string if any, duration in milliseconds, and retry count. The execution row stores the full step list as JSON with screenshots per step, the trace_id, the client_id, and an error_category that distinguishes Infrastructure failures from WorkflowLogic failures. For an auditor, the value of an OS-level layer is that every step is a typed UIA call against a named element, not a click at (x, y). The audit replay reads the same Name and AutomationId an automation engineer would have written into the workflow, which is what compliance reviewers can actually verify against a process narrative.
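The record described above can be written down as types. This is a sketch, not the production schema: field names follow the list in the answer where one is given (step_id, tool_name, trace_id, client_id, error_category), while result, error, duration_ms, retry_count, and steps are assumed spellings, and the per-step screenshot is omitted. The sample values are illustrative.

```typescript
// Status and category values as listed in the text.
type StepStatus = "Pending" | "Running" | "Success" | "Failed" | "Skipped" | "Retrying";
type ErrorCategory = "Infrastructure" | "WorkflowLogic";

interface StepResult {
  step_id: string;
  tool_name: string;      // the UIA primitive that ran
  status: StepStatus;
  result: unknown;        // structured payload (assumed field name)
  error: string | null;   // assumed field name
  duration_ms: number;    // assumed field name
  retry_count: number;    // assumed field name
}

interface ExecutionRow {
  trace_id: string;
  client_id: string;
  error_category: ErrorCategory | null;
  steps: StepResult[];    // assumed field name for the JSON step list
}

// Render one line per step, the way a reviewer might scan an execution.
function summarize(row: ExecutionRow): string[] {
  return row.steps.map((s) => {
    const base = `${row.trace_id} ${s.step_id} ${s.tool_name} ${s.status} ${s.duration_ms}ms retries=${s.retry_count}`;
    return s.error === null ? base : `${base} error="${s.error}"`;
  });
}

// Illustrative execution; not real audit data.
const sampleRow: ExecutionRow = {
  trace_id: "trace-001",
  client_id: "client-a",
  error_category: "WorkflowLogic",
  steps: [
    { step_id: "step-1", tool_name: "type_into_element", status: "Success",
      result: { element: "Customer" }, error: null, duration_ms: 120, retry_count: 0 },
    { step_id: "step-2", tool_name: "click_element", status: "Failed",
      result: null, error: "element not found", duration_ms: 4000, retry_count: 0 },
  ],
};

for (const line of summarize(sampleRow)) console.log(line);
```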

How do I confirm my own enterprise apps publish a usable UIA tree before any commitment?

30 seconds per app. inspect.exe ships with the Windows SDK at C:\Program Files (x86)\Windows Kits\10\bin\10.0.x.x\x64\inspect.exe. Open it, set Hover Mode, launch your app, walk through the form your team uses every day. For most enterprise apps you will see ControlType.Edit on editable fields, populated Name properties pulled from the form definition, and AutomationId values on most controls. If those properties are populated, any UIA-based runtime can drive them deterministically. Where Name is empty and AutomationId is missing on a control you need, that is the part where you would write a Selector.attributes or Selector.path fallback, or escalate to the vision overlay. The pre-pilot inspect.exe walk takes an hour and tells you exactly what surface area is going to be deterministic versus probabilistic.
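The result of an inspect.exe walk can be tallied with a few lines of code. The ObservedControl record and the two-bucket rule are assumptions for this sketch: a control counts as deterministic when either Name or AutomationId is populated, and as needing a fallback when both are empty. The sample controls are illustrative, not taken from a real application.

```typescript
// Hypothetical record of one control noted during an inspect.exe walk.
interface ObservedControl {
  form: string;
  controlType: string;
  name: string;         // empty string when inspect.exe shows no Name
  automationId: string; // empty string when inspect.exe shows no AutomationId
}

type Bucket = "deterministic" | "needs-fallback";

// Assumed rule: either populated property is enough to key a locator on.
function classify(c: ObservedControl): Bucket {
  return c.name !== "" || c.automationId !== "" ? "deterministic" : "needs-fallback";
}

function coverage(controls: ObservedControl[]): { deterministic: number; needsFallback: number } {
  let deterministic = 0;
  for (const c of controls) {
    if (classify(c) === "deterministic") deterministic += 1;
  }
  return { deterministic, needsFallback: controls.length - deterministic };
}

// Illustrative walk of one form.
const walk: ObservedControl[] = [
  { form: "A/R Invoice", controlType: "Edit", name: "Customer", automationId: "49152" },
  { form: "A/R Invoice", controlType: "Edit", name: "Posting Date", automationId: "" },
  { form: "A/R Invoice", controlType: "Pane", name: "", automationId: "" },
];

console.log(coverage(walk));
```

The needsFallback count is the part of the surface where a Selector.attributes or Selector.path locator, or the vision overlay, would be required.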

Where does the open-source piece end and the commercial piece begin?

The execution layer is open source under MIT at github.com/mediar-ai/terminator. That is the package that wraps UIA, exposes the Selector primitives shown on this page, and runs the deterministic actions (type_into_element, click_element, set_value, get_text, run_command). A team can drive any Windows desktop app through Terminator alone with no Mediar cloud. The recording app, the orchestrator, the no-code workflow builder at app.mediar.ai/web, the SOC 2 control plane, and the audit trail are commercial. Pricing is $0.75 per minute of runtime and a $10,000 turn-key program fee that converts to credits.

What can break this approach?

Three failure modes, in order of frequency. First, an app that renders into pure DirectX, GPU canvas, or an embedded Chromium with no accessibility bridge enabled, and never registers a UIA provider. The fallback is the vision overlay, which is the worst case but rarely the whole workflow. Second, an app that publishes UIA but with empty Name and missing AutomationId on the field you care about. The fix is one of the lower primitives (ClassName, Path, Attributes) or a re-record against the parent window. Third, a major version upgrade that renames fields across the board, which is a different event from a layout-only change such as the B1 9.3 to 10.0 reflow of the line item grid. The runtime emits structured failures so a human can re-record that one branch in the recording app, instead of silently retrying. For posting workflows like AP cash, silent retry is disabled by default because half-posting an invoice is worse than not posting it.
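The "structured failure instead of silent retry" behavior can be sketched as a small step runner. This is an illustration of the policy described above, not the executor's code: runStep, StepFailure, and the posting flag are hypothetical names, and the runner is synchronous to keep the example short.

```typescript
// Hypothetical shapes for this sketch; not the executor's types.
interface StepFailure {
  step_id: string;
  reason: string;
  attempts: number;
}

type Outcome =
  | { ok: true; attempts: number }
  | { ok: false; failure: StepFailure };

interface RunOptions {
  posting: boolean;   // true for steps that post data, e.g. AP cash
  maxRetries: number; // retries allowed for non-posting steps
}

// Posting steps get zero retries; other steps retry up to maxRetries.
// Either way, exhaustion returns a structured failure rather than throwing.
function runStep(stepId: string, action: () => void, opts: RunOptions): Outcome {
  const allowedRetries = opts.posting ? 0 : opts.maxRetries;
  let attempts = 0;
  let lastReason = "";
  while (attempts <= allowedRetries) {
    attempts += 1;
    try {
      action();
      return { ok: true, attempts };
    } catch (e) {
      lastReason = e instanceof Error ? e.message : String(e);
    }
  }
  return { ok: false, failure: { step_id: stepId, reason: lastReason, attempts } };
}

// An action that fails on its first call and succeeds afterwards.
function makeFlakyAction(): () => void {
  let calls = 0;
  return () => {
    calls += 1;
    if (calls === 1) throw new Error("element not found");
  };
}

const readOutcome = runStep("read-balance", makeFlakyAction(), { posting: false, maxRetries: 2 });
const postOutcome = runStep("post-invoice", makeFlakyAction(), { posting: true, maxRetries: 2 });
console.log(readOutcome, postOutcome);
```

The same transient error is absorbed for the read step and surfaced for the posting step, which is the asymmetry that keeps an invoice from being half-posted.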

How is this priced for an enterprise pilot?

$0.75 per minute of workflow runtime, no per-seat licensing, no per-bot licensing. The $10,000 turn-key program fee covers a dedicated implementation engineer for the pilot and converts to runtime credits with a bonus, so it is effectively prepaid usage. The math an enterprise can run before the call: pick the workflow with the highest weekly volume that lives on a UIA-addressable app, multiply minutes saved per run by runs per week, convert to hours, multiply by your current FTE-hour cost, and compare to runtime cost at $0.75 per minute. The deployments where this approach has shipped (insurance claims at 30 minutes to 2 minutes per claim for $750K per year, bank onboarding at 8 weeks to 2 weeks, F&B SAP B1 chains at 70 percent versus prior UiPath spend) all started from that arithmetic.
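The pre-call arithmetic can be written out as a short function. This sketch assumes saved minutes are converted to hours and priced at the FTE hourly cost, then compared against runtime billed at $0.75 per minute. The 30-to-2-minute figures come from the claims example above; the 600 runs per week and the $40 hourly cost are hypothetical inputs chosen only to make the numbers easy to follow.

```typescript
// Inputs for one workflow; names are illustrative.
interface PilotInputs {
  minutesSavedPerRun: number;   // manual minutes minus automated minutes
  runtimeMinutesPerRun: number; // billable automation minutes per run
  runsPerWeek: number;
  fteHourlyCost: number;        // fully loaded USD per hour
}

const RUNTIME_RATE_USD_PER_MINUTE = 0.75; // rate stated on this page

function weeklyMath(i: PilotInputs): { laborSaved: number; runtimeCost: number; net: number } {
  // Labor value of the time saved, in USD per week.
  const hoursSaved = (i.minutesSavedPerRun * i.runsPerWeek) / 60;
  const laborSaved = hoursSaved * i.fteHourlyCost;
  // Runtime spend, in USD per week.
  const runtimeCost = i.runtimeMinutesPerRun * i.runsPerWeek * RUNTIME_RATE_USD_PER_MINUTE;
  return { laborSaved, runtimeCost, net: laborSaved - runtimeCost };
}

// 30 minutes manual vs 2 minutes automated; volume and hourly cost are hypothetical.
const example = weeklyMath({
  minutesSavedPerRun: 30 - 2,
  runtimeMinutesPerRun: 2,
  runsPerWeek: 600,
  fteHourlyCost: 40,
});

console.log(example);
```

With these inputs the function reports 11,200 USD of labor value against 900 USD of runtime per week. Swapping in real volume and cost figures is the whole exercise.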

Adjacent reading on the same accessibility-tree input layer.

Architecture

AI agents for legacy desktop systems with no API

Why the accessibility tree is the universal fallback when an enterprise app never got a public interface, and what that means for the architecture.


Function-by-function

AI for enterprise functions, through the accessibility tree

Same UIA layer, mapped to AP, Treasury, Claims, Patient Intake, Bank Onboarding, and IT operations as concrete locator triplets.


Governance

Enterprise AI agent governance on legacy systems

What an audit trail looks like when every step is a typed UIA call: trace_id, error_category, screenshot per step, and the StepResult shape behind them.

