Function-by-function reference

AI for enterprise functions, through the accessibility tree.

The orthodox answer for "automate enterprise functions" is one bot army per function. AP gets a UiPath fleet against SAP. Treasury gets a Power Automate flow against the banking core. Claims gets a separate set of selectors for the payer portal and another for Epic. Patient intake gets a vision model. Bank onboarding gets three vendor-specific RPA suites for Jack Henry, Fiserv, and FIS. The unifying observation is boring: every one of those legacy Windows apps publishes the same Microsoft UI Automation tree, and an AI agent reading that tree can drive the field directly the way a screen reader does. One input layer, every function.

Matthew Diakonov
12 min

Direct answer (verified 2026-05-07)

An AI agent serves an enterprise function by reading the same Microsoft UI Automation tree a Windows screen reader uses, then acting on the resolved element. ControlType (role) plus Name plus AutomationId is the addressing triplet. The tree is published by the OS for every UI thread, which is why one approach covers AP, Treasury, Claims, Patient Intake, Bank Onboarding, and IT ops without per-function selectors. The mechanism is documented at learn.microsoft.com/en-us/windows/win32/winauto/entry-uiauto-win32.

One input layer, not one bot per function

The reason most enterprise automation programs end up with a fleet of per-function bots is that each function's vendor speaks a different protocol. SAP has SAP GUI Scripting, but only on the ECC and S/4 frontend. The SAP Business One client does not have it. Jack Henry exposes batch interfaces but not the teller and middle-office screens ops actually uses. Epic has APIs for some operations and not others, and the chart UI is the union surface. So the historical answer was one specialist bot per surface, written by a different team.

What the accessibility tree changes is that the OS itself publishes a uniform structure for all of them. Open inspect.exe (it ships with the Windows SDK), hover any control inside any of those apps, and you will see ControlType, Name, AutomationId, BoundingRectangle, and IsKeyboardFocusable. The values populate for the SAP B1 A/R Invoice, for the Jack Henry teller window, for the Epic chart, for the Cerner patient header, for the Oracle EBS form, and for the Active Directory MMC. They populate because Windows generates the tree from the standard control palette every desktop app draws on top of.

That is the part competitor pages keep skipping. They argue for "AI for finance" or "AI for HR" without ever saying which input layer the agent reads from. If the agent reads from screenshots, the agent breaks at every DPI change and theme switch. If the agent reads from the accessibility tree, every function shares the same primitive contract: read role plus Name plus AutomationId, type or click, read back the result. AP, Treasury, and Claims look the same to the runtime.
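That shared contract can be sketched in a few lines of Rust. Everything here is illustrative (the `Locator` and `Step` names, and values like `txtVendor`, are not the Terminator API); the point is that an AP step and a Treasury step differ only in locator values, not in shape.

```rust
// Illustrative sketch only: Locator/Step are hypothetical, not the Terminator API.
#[derive(Debug, Clone, PartialEq)]
struct Locator {
    role: &'static str,                  // UIA ControlType, e.g. "Edit"
    name: &'static str,                  // UIA Name, e.g. "Vendor"
    automation_id: Option<&'static str>, // UIA AutomationId, when populated
}

#[derive(Debug)]
enum Step {
    TypeInto { locator: Locator, text: String },
    Click { locator: Locator },
    GetText { locator: Locator },
}

// An AP step and a Treasury step have the same shape; only the locator
// values differ per backend app.
fn ap_step() -> Step {
    Step::TypeInto {
        locator: Locator { role: "Edit", name: "Vendor", automation_id: Some("txtVendor") },
        text: "C20000".into(),
    }
}

fn treasury_step() -> Step {
    Step::GetText {
        locator: Locator { role: "StatusBar", name: "Confirmation", automation_id: None },
    }
}

fn main() {
    println!("{:?}", ap_step());
    println!("{:?}", treasury_step());
}
```

The runtime never needs to know which function owns a step; it only resolves a triplet and acts.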

THREE FUNCTIONS, ONE INPUT LAYER

[Diagram: AP, Treasury, and Claims all route through the accessibility tree to their backend systems. AP: read role:Edit name:Vendor (SAP), type 'C20000' into the A/R Invoice, DocNum on the StatusBar. Treasury: read role:Edit name:Account (Jack Henry), post wire transfer, confirmation id. Claims: read role:Edit name:Member ID (Epic), submit claim line.]

What the tree exposes, function by function

The matrix below is what you would actually see in inspect.exe for each function's primary app, with the locator that an accessibility-API runtime would use. The left column is the historical per-function-bot answer. The right column is the same workflow expressed in role plus Name plus AutomationId.

AP function: invoice lines into the SAP A/R Invoice or SAP B1 line item grid
Per-function RPA bot: one bot per form, one selector library per SAP version; breaks when the 9.3 to 10.0 upgrade reflows the matrix.
Accessibility-tree runtime: ControlType.Edit nodes named Customer, Posting Date, Item No., Quantity; AutomationId stable across patches.

Treasury function: cash position and wire posting on a banking core
Per-function RPA bot: pixel matchers against a fixed-size teller window, OCR for the confirmation number.
Accessibility-tree runtime: ControlType.Edit nodes for Account, Amount, Beneficiary; ControlType.StatusBar for the confirmation id, read with get_text.

Claims function: claim line submission against a payer portal or Epic chart
Per-function RPA bot: browser automation for the portal, RDP plus screen scraping for the chart; two separate bots.
Accessibility-tree runtime: same UIA input layer, role:Edit and role:ComboBox in both the chart and the portal; one workflow file.

Patient intake: demographics from a PDF into the EHR scheduling form
Per-function RPA bot: form recognizer plus pixel-coordinate type-into; fragile across DPI and theme changes.
Accessibility-tree runtime: PDF extraction step, then locator(name:First Name|role:Edit).typeText, AutomationId-first resolution.

Bank onboarding: KYC packet across Jack Henry, Fiserv, FIS terminals
Per-function RPA bot: three vendor-specific RPA suites, three license stacks, three teams of certified developers.
Accessibility-tree runtime: one Mediar workflow per terminal, all driven through the same UIA primitives, no vendor SDKs.

IT operations: user provisioning across an Active Directory MMC plus a homegrown Win32 admin app
Per-function RPA bot: selector-based RPA against MMC, custom HTTP scripting for the admin app.
Accessibility-tree runtime: both render through Windows UIA, both addressable as role + Name + AutomationId; one bot.

The anchor: every step is a UIA call with a structured result

The reason the function-by-function story holds together at runtime is that every step the executor runs is a single UIA primitive, and every step produces a StepResult that the audit layer can read. Below is the recorder log shape (what the agent captured during the user's one-shot recording) and the StepResult shape (what the executor stores per step at runtime). Both come straight out of the public repos.

terminator + executor (Rust)

The recorder log captures the field_name (the UIA Name property of the control under focus), the field_type (UIA ControlType), the text_value, and the keystroke_count. None of those come from the keystroke stream alone; they come from joining keystrokes against the accessibility node that owns the focus. That join is what makes the recorded workflow portable across DPI changes and theme switches.
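A hedged sketch of that join, with illustrative names (this is not the recorder's actual code): keystrokes alone carry no identity, so the recorder attaches them to the accessibility node that owned the focus when they were typed.

```rust
// Hypothetical shape of one recorder log entry, following the fields the
// text describes; names are illustrative, not the exact repo types.
#[derive(Debug, Clone)]
struct RecordedEvent {
    field_name: String,   // UIA Name of the focused control
    field_type: String,   // UIA ControlType, e.g. "Edit"
    text_value: String,   // text attributed to that control
    keystroke_count: u32, // raw keystrokes consumed by this field
}

// The join: attach the keystroke run to the focused accessibility node.
// This is what makes the recording portable across DPI and theme changes.
fn join_keystrokes(focused_name: &str, focused_type: &str, keys: &str) -> RecordedEvent {
    RecordedEvent {
        field_name: focused_name.to_string(),
        field_type: focused_type.to_string(),
        text_value: keys.to_string(),
        keystroke_count: keys.chars().count() as u32,
    }
}

fn main() {
    let ev = join_keystrokes("Vendor", "Edit", "C20000");
    println!("{ev:?}");
}
```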

At runtime, every step the executor runs lands in a StepResult row with step_id, tool_name (which UIA primitive ran), status, the structured result, the error string, the duration in milliseconds, and the retry count. The execution row itself stores trace_id, client_id, error_category (Infrastructure or WorkflowLogic), and a screenshot per step. For an AP function moving cash entries, the audit team can replay any execution as a sequence of role plus Name interactions and confirm no step touched a field outside the workflow's declared schema.
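A simplified sketch of those two rows, consistent with the fields listed above but not the exact definitions in the executor crate (timestamps and per-step screenshots omitted):

```rust
// Simplified sketch, not the real struct from the executor crate.
#[derive(Debug, Clone, PartialEq)]
enum StepStatus { Pending, Running, Success, Failed, Skipped, Retrying }

#[derive(Debug, Clone, PartialEq)]
enum ErrorCategory { Infrastructure, WorkflowLogic }

#[derive(Debug, Clone)]
struct StepResult {
    step_id: String,
    tool_name: String,      // which UIA primitive ran, e.g. "type_into_element"
    status: StepStatus,
    result: Option<String>, // structured result (jsonb in the real row)
    error: Option<String>,
    duration_ms: u64,
    retry_count: u32,
}

#[derive(Debug, Clone)]
struct ExecutionRow {
    trace_id: String,
    client_id: String,
    error_category: Option<ErrorCategory>,
    steps: Vec<StepResult>, // plus timestamps and a screenshot per step in the real schema
}

fn main() {
    let step = StepResult {
        step_id: "step-1".into(),
        tool_name: "type_into_element".into(),
        status: StepStatus::Success,
        result: Some(r#"{"typed":"C20000"}"#.into()),
        error: None,
        duration_ms: 142,
        retry_count: 0,
    };
    let exec = ExecutionRow {
        trace_id: "trace-ap-001".into(),
        client_id: "carrier-42".into(),
        error_category: None,
        steps: vec![step],
    };
    println!("{exec:#?}");
}
```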

$750K/yr

Claims intake at one mid-market carrier went from 30 minutes per claim to 2 minutes. That is $750K per year. Not a press-release number; that is their claims-team headcount math.

Mediar deployment, mid-market insurance carrier, 2025-2026

What this looks like when you cross two functions in one workflow

The interesting case is when a workflow legitimately crosses functions. A claims-intake automation reads a PDF from email (information services), parses fields out of the PDF (a model step), enters the claim into the carrier's claims system (claims function), then writes the resulting claim id back into a finance tracker spreadsheet (AP function), then drops a Slack message into the carrier's ops channel. Three classical RPA platforms would need three connectors and a glue script. With the accessibility tree as the shared input layer, every desktop step is the same primitive: resolve a UIA element by Name plus role, type or click, read the result.
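The five-step pipeline above can be sketched as data. Tool names echo the MCP primitives this page mentions elsewhere; "model_extract" and every target string are hypothetical illustrations, not real workflow syntax.

```rust
// The claims-intake workflow as an ordered step list; all values illustrative.
fn claims_intake_steps() -> Vec<(&'static str, &'static str)> {
    vec![
        ("run_command",       "fetch claim PDF from the mailbox"),          // information services
        ("model_extract",     "parse fields out of the PDF"),               // model step (hypothetical tool name)
        ("type_into_element", "role:Edit name:Member ID (claims system)"),  // claims function
        ("type_into_element", "role:Edit name:Claim ID (finance tracker)"), // AP function
        ("run_command",       "post Slack message to the ops channel"),     // notification
    ]
}

fn main() {
    for (tool, target) in claims_intake_steps() {
        println!("{tool:<17} -> {target}");
    }
}
```

Both desktop steps resolve through the same UIA primitive, even though one belongs to Claims and the other to AP.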

The same is true for bank onboarding across Jack Henry, Fiserv, and FIS. A typical onboarding packet touches all three: the customer record sits in the core, the IRA sits in a Fiserv module, the loan sits in FIS. Historically that is three vendor-specific bots, three license stacks, three teams. With UIA, the runtime treats them as three windows in the same tree. The role plus Name resolution does not care which vendor compiled the binary that drew the form.

What changes for the buyer is the org-chart unit that owns the workflow. Instead of routing every cross-function automation through an RPA Center of Excellence that has to load-balance specialists, the function head can own the workflow end to end. The AP lead writes the AP step. The claims lead writes the claims step. They share a runtime and a recording app. The CoE shifts from being a bottleneck to being a platform team.

Walk me through one of your enterprise functions

Bring an app, a recorded workflow, or a screenshot of inspect.exe against your form. We will tell you in 20 minutes whether the accessibility tree covers your function and what the locator strategy would look like.

Frequently asked questions

What does 'enterprise functions' mean here, and why does accessibility matter for them?

By enterprise functions I mean the org-chart units that own a class of repetitive desktop work: AP, Treasury, Claims, Patient Intake, Bank Onboarding, IT Operations, HR. Each one has at least one legacy Windows desktop app at the center of its workflow, and most of those apps never got a public API. Accessibility matters because Windows publishes a UI Automation tree for those apps automatically. The tree is what screen readers consume. It exposes ControlType (role), Name, AutomationId, and patterns like ValuePattern and SelectionPattern. An AI agent reading off the same tree can drive the field deterministically, regardless of which function the workflow belongs to. That is why one input layer can replace one bot army per function.

Is this just RPA with a different name?

No. Traditional RPA tools resolve elements through brittle selectors against a recorded path or, when those break, through pixel matchers and OCR. Both approaches assume the agent's input layer is the rendered surface of the app. The accessibility-tree approach assumes the input layer is the published structure underneath the rendered surface, the same structure a Windows screen reader uses. The practical difference is what survives a patch. If a B1 9.3 to 10.0 upgrade reflows the line item matrix, a pixel-matcher bot breaks. The UIA element with Name 'Item No.' and its AutomationId still resolves on the new layout, because the form definition keeps the field's identity even when the geometry changes.
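That resolution order can be sketched over a mock tree (the node shape and function names are illustrative, not the runtime's types): try the AutomationId first, then fall back to the Name + ControlType pair.

```rust
// Illustrative sketch of AutomationId-first resolution with a
// Name + ControlType fallback; not the runtime's actual types.
#[derive(Debug, Clone)]
struct UiNode {
    control_type: String,
    name: String,
    automation_id: Option<String>,
}

fn resolve<'a>(tree: &'a [UiNode], automation_id: &str, name: &str, role: &str) -> Option<&'a UiNode> {
    // 1. AutomationId is the most stable handle: try it first.
    tree.iter()
        .find(|n| n.automation_id.as_deref() == Some(automation_id))
        // 2. Fall back to Name + ControlType, which survives geometry
        //    changes like a grid reflow between major versions.
        .or_else(|| tree.iter().find(|n| n.name == name && n.control_type == role))
}

fn main() {
    // Simulate the post-upgrade tree: same identities, new layout.
    let tree = vec![
        UiNode { control_type: "Edit".into(), name: "Item No.".into(), automation_id: Some("mtx_itemno".into()) },
        UiNode { control_type: "Edit".into(), name: "Quantity".into(), automation_id: None },
    ];
    let hit = resolve(&tree, "mtx_itemno", "Item No.", "Edit").unwrap();
    println!("resolved {}", hit.name);
}
```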

Why not just use APIs for each enterprise function?

Where APIs exist, use them. The question this page answers is what to do when they do not. AP into SAP B1 on Microsoft SQL Server has no Service Layer. Mainframe terminals running 3270 emulators have no REST. Jack Henry, Fiserv, and FIS expose batch interfaces but the day-to-day teller and middle-office screens that ops actually uses do not. Epic has APIs for some operations and not others, and the chart UI is the union surface for the ones that are missing. The accessibility tree is the universal fallback because Windows publishes it for every UI thread, whether the app shipped in 1997 or last week.

What does the AI part actually do?

Two things, in different parts of the system. At record time, an LLM watches the user perform the workflow once, attaches semantic field names to the captured UIA events, generates the input schema, and writes the workflow file. At runtime, the executor mostly does deterministic UIA calls (type_into_element, click_element, set_value, get_text). The model only re-enters the loop when something deviates: a new dialog appears, a field name changed, a validation message blocked the path. That is when the agent reads the current tree, decides which branch to take, and proceeds or escalates. The deterministic core is what makes the runtime auditable, and the model layer is what lets the bot self-heal instead of failing on every UI change.

How is each step audited?

Every step the executor runs produces a StepResult: step_id, tool_name (the UIA primitive that ran), status (Pending, Running, Success, Failed, Skipped, Retrying), the structured result, the error string if any, duration in milliseconds, and the retry count. The struct is defined at crates/executor/src/models/execution.rs lines 75-95. The execution row stores the full step list as jsonb under execution_logs alongside a screenshot per step, the trace_id, the client_id, the started_at and completed_at timestamps, and the error_category (Infrastructure or WorkflowLogic). For an AP function moving cash entries, the audit team can replay any execution as a sequence of role plus Name interactions, compare to the original recording, and confirm that no step touched a field outside the workflow's declared schema.
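The schema check at the end can be sketched as a set difference over the replayed step list (shapes illustrative, not the audit layer's actual code): any step whose target field is outside the workflow's declared schema gets flagged.

```rust
use std::collections::HashSet;

// Illustrative audit check: flag steps that touched a field outside
// the workflow's declared schema.
fn out_of_schema<'a>(steps: &[(&'a str, &'a str)], declared: &HashSet<&str>) -> Vec<&'a str> {
    steps
        .iter()
        .filter(|(_, field)| !declared.contains(*field))
        .map(|(_, field)| *field)
        .collect()
}

fn main() {
    let declared: HashSet<&str> = ["Vendor", "Posting Date", "Amount"].into_iter().collect();
    let steps = [("type_into_element", "Vendor"), ("type_into_element", "Amount")];
    assert!(out_of_schema(&steps, &declared).is_empty());
    println!("all steps within declared schema");
}
```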

Which functions actually pay back the investment first?

The honest order, from real deployments: insurance claims intake (one mid-market carrier went from 30 minutes per claim to 2 minutes, $750K per year), bank onboarding across Jack Henry or Fiserv terminals (8 weeks to 2 weeks per onboard), patient intake on Epic or Cerner (one regional health system at $210K per year), AP cash posting in SAP B1 (an LG-customer F&B chain saving 70 percent on costs versus their previous UiPath spend). The pattern is: high volume, high-touch, no good API, and a function head who can name the dollar value of an hour saved. Treasury and IT ops are real but tend to ship later because the volume per workflow is lower.

Can my team verify the accessibility surface for our function before we commit?

Yes, in 30 seconds per app. inspect.exe ships with the Windows SDK at C:\Program Files (x86)\Windows Kits\10\bin\10.0.x.x\x64\inspect.exe. Open it, set Hover Mode, launch your app, and walk through the form your function uses. For most enterprise apps you will see ControlType.Edit on the editable fields, populated Name properties pulled from the form definition, and AutomationId values on most controls. If those properties are populated, any UIA-based runtime can drive them deterministically. Where a node shows ControlType.Unknown, the runtime falls back to position relative to a parent window plus visible label, which is the only path that resembles classic OCR.

Where does the open-source piece end and the commercial piece begin?

The execution layer is open source under MIT at github.com/mediar-ai/terminator. It exposes the MCP tools (type_into_element, click_element, set_value, get_text, run_command) that the workflow file references. A team can drive any Windows desktop app through Terminator alone, no Mediar cloud, no orchestrator, just the UIA primitives and a Rust binary. The recording app, the cloud orchestrator, the no-code workflow builder at app.mediar.ai/web, the SOC 2 control plane, and the audit trail are commercial. The split is intentional: an enterprise security team can read the source of the runtime that touches their desktops before signing the order form.

How does this compare to the other Mediar pages on accessibility?

The sibling pages on this site go deep on a single surface. The SAP Business One page walks through inspect.exe against the B1 client and shows the locator strategies for ControlType.Edit and the line item matrix. The legacy desktop systems page argues why the accessibility tree is the right fallback when no API exists. This page is the function-level synthesis: the same accessibility tree, applied to AP, Treasury, Claims, Patient Intake, Bank Onboarding, and IT ops, with the role plus Name plus AutomationId triplet you would actually use for each. Read this if you are mapping enterprise function ROI; read the other pages if you are doing the implementation for one specific app.

What can break this approach?

Three things, in order of frequency. First, an app that renders into a pure DirectX or canvas surface and never registers a UIA provider; you fall back to OCR for that app, which is the worst case but rarely the whole workflow. Second, an app that publishes UIA but with empty Name and missing AutomationId on the field you care about; you have to use position relative to the parent window, which is more brittle than role plus Name. Third, a major version that renames the field across the board (B1 9.3 to 10.0 reflowed the line item grid). The runtime emits structured failures on the unresolved step so a human can re-record that branch in the recording app, instead of silently retrying. For a posting function like AP cash, silent retry is disabled by default because half-posting an invoice is worse than not posting it.