Eight questions for the controller's desk

What a finance AI agent actually does on a legacy Windows desktop

You saw a tweet. You run AP, AR, treasury, or the close. Most of your workflows still live in SAP GUI, Oracle EBS, or a Jack Henry teller window that has no real API. Here is what an accessibility-tree agent reads on each of those screens, what one posting costs on the meter, and which workflows you should not give it.

Matthew Diakonov, Written with AI

Published May 7, 2026 · 9 min read

Direct answer (verified 2026-05-07)

Yes, an AI agent can run the recording-shaped repetitive parts of your finance team's legacy desktop work today. Cash app, vendor invoices, vendor master, GL pulls, three-way match, ACH origination, teller entries: an accessibility-tree agent reads and writes the exact same fields a clerk would, on the exact same SAP, Oracle, or Jack Henry windows, deterministically, at roughly $0.05 to $1.50 of runtime per posting at list rate.

It does not replace judgement work: exception triage, controller signoff, fraud screening, or anything where the right answer depends on knowing why a number looks off. The reference implementation, including the deterministic runtime, is open source under MIT at github.com/mediar-ai/terminator. The billing pipeline that turns runtime telemetry into a per-workflow unit cost is one route handler at apps/web/src/app/api/billing/usage/route.ts in the production source.

70%

“we moved an LG-customer F&B chain from UiPath to Mediar; their CFO told the board they're now saving 70% on costs”

Internal pilot, F&B operator running SAP Business One

What the agent literally does on one MIRO posting

The simplest concrete unit. A vendor PDF lands in the queue. The agent reads the SAP MIRO window through Windows UI Automation, sets three or four fields, presses Post, and writes one row to the execution table. Everything else, including the bill at the end of the month, comes from that row.

[Diagram: one AP invoice from PDF to posted document]

The interesting line is the last one. The agent inserts a single execution row with execution_duration_seconds, workflow_id, and status. That row is what the bill multiplies by the rate. There is no allocation step between license seats and process P&L; the row is the unit. The route handler is roughly 80 lines of Postgres aggregation at apps/web/src/app/api/billing/usage/route.ts, grouped by workflow and month, joined to deployed_workflows on tags @> ['prod'].
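To make the shape of that aggregation concrete, here is a minimal Python sketch. It is not the production route handler, which is Postgres aggregation in a TypeScript file. The `completed_at` field used to pick the month, the `prod_workflow_ids` set standing in for the join on the `prod` tag, and the sample rows are assumptions for illustration. Whether failed runs are billed is not stated here, so the sketch does not filter on status.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal

RATE_PER_MINUTE = Decimal("0.75")  # public list rate quoted in this article


@dataclass(frozen=True)
class Execution:
    workflow_id: str
    status: str
    execution_duration_seconds: Decimal
    completed_at: datetime  # assumed here only to derive the billing month


def usage_by_workflow_and_month(rows, prod_workflow_ids):
    """Sum runtime per (workflow, month) and price it at the list rate."""
    seconds = defaultdict(Decimal)
    for row in rows:
        if row.workflow_id not in prod_workflow_ids:
            continue  # stands in for the join to deployed workflows tagged prod
        month = row.completed_at.strftime("%Y-%m")
        seconds[(row.workflow_id, month)] += row.execution_duration_seconds
    # dollars = seconds * rate per minute / 60
    return {key: total * RATE_PER_MINUTE / 60 for key, total in seconds.items()}


# Hypothetical rows: two production MIRO posts and one undeployed draft.
rows = [
    Execution("ap_invoice_post_miro", "success", Decimal("4.2"), datetime(2026, 5, 7)),
    Execution("ap_invoice_post_miro", "success", Decimal("5.8"), datetime(2026, 5, 8)),
    Execution("draft_workflow", "success", Decimal("60"), datetime(2026, 5, 8)),
]
bill = usage_by_workflow_and_month(rows, {"ap_invoice_post_miro"})
```

The point of the sketch is that the bill is a pure function of the execution rows. Nothing else enters the calculation, which is what lets finance audit it.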

One execution log, line by line

What the executor writes to its log on a 4.2-second MIRO post. Every field is named by its accessibility role and label, not by a pixel coordinate or a brittle selector. That naming is what makes the run deterministic and what makes an SAP support pack stop being a P1.

[Execution log: ap_invoice_post_miro // production // run 5100000947]

At the public list rate of $0.75 per minute, that 4.2-second run costs about 5.25 cents on the meter (4.2 ÷ 60 × $0.75). The cost line is exactly the duration multiplied by the rate; it is not a per-seat allocation. The same shape feeds the customer's internal cost-per-process dashboard.
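The arithmetic is small enough to check yourself. A minimal sketch, assuming only the $0.75 per minute list rate quoted above; the function name `runtime_cost` is made up for this example.

```python
from decimal import Decimal

LIST_RATE_PER_MINUTE = Decimal("0.75")  # dollars per minute, from this article


def runtime_cost(duration_seconds, rate_per_minute=LIST_RATE_PER_MINUTE):
    """Dollar cost of one run: duration in seconds times rate per minute, over 60."""
    return Decimal(str(duration_seconds)) * rate_per_minute / Decimal(60)


cost = runtime_cost(4.2)
print(f"${cost:.4f} = {cost * 100:.2f} cents")  # $0.0525 = 5.25 cents
```

Decimal is used instead of float so that the meter and the dashboard agree to the last digit.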

The seven finance workflows, with the screen and the cost

These are the seven recording-shaped workflows finance teams run most often on Windows desktop apps with no usable API. For each one, the screen, the accessibility-tree nodes the agent touches, and the unit cost we observe at list rate. Costs are runtime only; the document understanding step (OCR, field map) is part of authoring for clean documents and a separate rate-card line for messy ones.

MIRO

Invoice receipt for materials (AP)

$0.05 to $1.13 per invoice depending on line count

Vendor invoice from a PDF lands as a posted document against the right PO and tax code.

Nodes touched: [Edit] 'Reference' value='', [ComboBox] 'Tax Code', [Button] 'Post'

F-28

Post incoming payments (cash app)

$0.30 to $0.90 per remittance

Bank lockbox file or remittance email converts into open-AR clearings against customer accounts.

Nodes touched: [Edit] 'Customer', [Edit] 'Amount', open-items grid checkboxes per invoice

FB60

Vendor invoice without PO

$0.45 per invoice on a clean run

Non-PO invoices (utilities, professional services, tax notices) post against an expense account and cost center.

Nodes touched: [Edit] 'Vendor', [Edit] 'G/L Account', [Edit] 'Cost Center', [Button] 'Post'

XK01 / XK02

Vendor master create / change

$1.00 to $1.50 per vendor (multi-tab walk)

New supplier intake from a vendor questionnaire PDF, or routine bank-detail changes from secure email.

Nodes touched: [TabPanel] 'General Data' -> [Edit] 'Name 1', 'Country', 'Tax Number 1'; [TabPanel] 'Payment Transactions' -> [Edit] 'IBAN'

FBL3N

G/L line items pull

$0.75 per pull (most of the time is the export)

Pull a clean ledger extract for a controller, filtered by company code, account, and period.

Nodes touched: [Edit] 'G/L Account', [Edit] 'Posting Date', [Button] 'Execute', export grid -> XLSX

Oracle EBS

Three-way match in Invoice Workbench

$0.60 to $1.40 per invoice

Invoice gets compared to PO and receipt; matched lines move to Validated, breaks land in a hold queue.

Nodes touched: [Window] 'Invoice Workbench' -> [Edit] 'Invoice Number'; [Button] 'Actions...' -> 'Match' -> [Window] 'Find PO'

Jack Henry / Fiserv

Teller cash letter and ACH origination

$0.50 to $1.20 per batch entry

Onboarded customer files, ACH batches, and cash-letter entries replay into the green-screen teller window.

Nodes touched: [Window] 'Teller' -> [Edit] 'Account #'; mainframe terminal field events via UI Automation
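The "Nodes touched" lines above identify each control by role and label. A minimal sketch of that lookup on a toy tree follows. This is not the Terminator SDK or the Windows UI Automation API. The `Node` class, the `find` helper, the 'Basic data' pane, and the invoice reference value are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Node:
    role: str   # e.g. "Edit", "ComboBox", "Button"
    name: str   # the accessible label a screen reader would announce
    value: str = ""
    children: List["Node"] = field(default_factory=list)


def walk(node: Node) -> Iterator[Node]:
    """Depth-first traversal, parent before children."""
    yield node
    for child in node.children:
        yield from walk(child)


def find(root: Node, role: str, name: str) -> Optional[Node]:
    """Return the first node whose role and label both match, else None."""
    return next((n for n in walk(root) if n.role == role and n.name == name), None)


# Toy stand-in for the MIRO window; the pane name is an assumption.
miro = Node("Window", "MIRO", children=[
    Node("Pane", "Basic data", children=[
        Node("Edit", "Reference"),
        Node("ComboBox", "Tax Code"),
    ]),
    Node("Button", "Post"),
])

reference = find(miro, "Edit", "Reference")
reference.value = "INV-0001"  # hypothetical invoice reference
```

Because the lookup never mentions a coordinate or a position in the layout, moving the 'Reference' field to another pane would not change the result.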

The two real numbers from production

Two outcomes worth carrying forward, both verified internally and both publishable in the public brief at mediar.ai/llms.txt.

One: a mid-market insurance carrier moved claims intake from 30 minutes per claim to about 2 minutes by replaying the FNOL desktop forms through the same accessibility tree we showed above. The CFO's AP-team headcount math, not a press release, put the run-rate savings at $750,000 per year. That number is the team's measured cycle-time savings, multiplied by their fully loaded rate, minus the runtime meter.

Two: a community bank cut new-customer onboarding from 8 weeks to 2 weeks by automating the cross-system data entry between their origination system, Jack Henry SilverLake, and their KYC vendor portal. The Jack Henry side is a green-screen teller window. The accessibility surface UI Automation publishes for that window is richer than what GUIScripting exposes, which is why the tree-based agent is the right tool there and a vision-based agent is not.
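The savings formula in the insurance example (cycle-time savings, multiplied by the fully loaded rate, minus the runtime meter) can be sketched as below. Every input in the example call is hypothetical except the 28 minutes saved per claim (30 down to 2) and the $0.75 per minute list rate. It does not reproduce the carrier's $750,000 figure.

```python
from decimal import Decimal


def annual_net_savings(items_per_year, minutes_saved_per_item,
                       loaded_rate_per_hour, runtime_seconds_per_item,
                       rate_per_minute="0.75"):
    """Labor value of the time saved, minus what the runtime meter charges."""
    items = Decimal(str(items_per_year))
    labor = items * Decimal(str(minutes_saved_per_item)) * Decimal(str(loaded_rate_per_hour)) / 60
    meter = items * Decimal(str(runtime_seconds_per_item)) * Decimal(rate_per_minute) / 60
    return labor - meter


# Hypothetical volume, loaded rate, and runtime; only the 28 minutes is from the text.
net = annual_net_savings(
    items_per_year=10_000,
    minutes_saved_per_item=28,
    loaded_rate_per_hour=60,
    runtime_seconds_per_item=120,
)
```

With these made-up inputs the labor side is $280,000 and the meter is $15,000, so the net is $265,000. Swap in your own volume and loaded rate before quoting a number to anyone.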

Where this stops being the right answer

Be honest about the boundary. The accessibility-tree agent stops earning its place in three regimes.

First, anywhere a stable HTTP API exists for the same data. If your bank publishes an ACH origination API and your accounting system publishes a journal entry API, an integration beats both desktop and vision agents on cost and reliability. Use the agent only where the GUI is the integration surface.

Second, on judgement work. Exception triage, fraud screening on first-time vendors, $40K invoice variances, audit-finding response. The right answer depends on context the recording has never seen. A deterministic replay is the wrong tool. Send those to a human with better tooling, not a faster agent.

Third, on workflows that need free-form discovery in screens the recorder has never seen. Production agents in this category are recording-shaped. The model lives at authoring time, not at runtime. If your workflow needs the model in the loop on every step, you are looking at a different product category, and the unit economics will be different.

Bring one finance workflow to a 30-minute call

Pick the screen that hurts most (MIRO, F-28, three-way match, teller entry). We will record it live, show you the accessibility tree behind it, and price the per-execution runtime against your current cycle time.

Common questions from finance and RPA leads

What does 'finance AI agent on legacy desktop' actually do today, in production?

It sits in a Windows session next to a clerk's normal SAP GUI, Oracle EBS, or Jack Henry teller window, and replays a workflow that a human walked through once. The replay reads and writes the same fields the clerk would: invoice date, vendor, amount, tax code, cost center, reconciliation account. The runtime sequencer is deterministic. The model is not in the loop at execution time. Authoring is the part the model does, once, when the human walks the workflow with the recorder open. After that the workflow is a file on disk and the runtime replays it the same way every run.

Which Windows desktop screens does it actually touch?

On the SAP side: MIRO (invoice receipt), F-28 (incoming payments), FB60 (vendor invoice non-PO), XK01/XK02 (vendor master), FBL3N (G/L line items), F-44 (clear vendor). On the Oracle EBS side: Invoice Workbench (three-way match), Cash Receipts Workbench, the GL_INTERFACE upload form. On the banking side: Jack Henry SilverLake teller, Fiserv Premier and Signature, FIS IBS. The screens are the same screens the AP, AR, and treasury teams open every morning. The agent reads them through the OS accessibility surface (UI Automation on Windows), so it does not matter whether SAP GUIScripting is on or off: the surface is published either way.

What does it cost per invoice or per posting, honestly?

List rate is $0.75 per minute of executor runtime. A clean MIRO post averages about 4 to 6 seconds of clicks once the OCR has the values; that is roughly $0.05 to $0.075 per invoice on the meter. A vendor master create with multi-tab walks is closer to a minute and a half, $1.10 to $1.30. Cash app in F-28 is in the middle at $0.30 to $0.90 per remittance file. Where teams spend more is the document understanding step before the workflow starts: a complex utility bill with twelve cost centers takes more parsing time than a clean PO-matched invoice. The bill is the runtime telemetry multiplied by the rate. There is no per-seat license, no Robot fee, no Studio fee.
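You can also run the meter backwards to see what runtime a quoted price implies. A small sketch, assuming the $0.75 list rate; the helper name is made up for this example.

```python
from decimal import Decimal


def implied_runtime_seconds(cost_dollars, rate_per_minute="0.75"):
    """Seconds of executor runtime that a given per-posting price corresponds to."""
    return Decimal(cost_dollars) * 60 / Decimal(rate_per_minute)


f28_range = (implied_runtime_seconds("0.30"), implied_runtime_seconds("0.90"))
vendor_master_range = (implied_runtime_seconds("1.10"), implied_runtime_seconds("1.30"))
```

At list rate, $0.30 to $0.90 per remittance works out to 24 to 72 seconds of runtime, and $1.10 to $1.30 per vendor to 88 to 104 seconds, which matches "closer to a minute and a half".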

Why doesn't UiPath already do this?

UiPath does some of this; the difference is in cost, time to production, and what happens when the SAP support pack lands. A typical UiPath enterprise deployment for the same SAP B1 chain we replaced was running about $267K per quarter in licenses (Studio Pro, Attended Robot, Unattended Robot, Orchestrator, Document Understanding, AI Center, Premium Support) plus a separate $185K implementation SOW. The CFO told the board they are saving 70% on costs after switching. The other piece is the maintenance economics. UiPath records an XPath-style selector against the GUI control; when SAP renames the field or moves the panel, the selector breaks and a developer has to re-record. The accessibility-tree approach the F&B chain moved to has a four-strategy fallback that absorbs most label renames and panel reorders without a re-record. That is what shifts the maintenance cost curve.
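The four strategies are not listed in this piece, so the sketch below shows only the general shape of an ordered fallback chain. The four strategy functions, the `automation_id` and `index` fields, and the sample controls are all invented for illustration and should not be read as the product's actual strategies.

```python
from difflib import SequenceMatcher


def by_role_and_name(nodes, target):
    return [n for n in nodes
            if n["role"] == target["role"] and n["name"] == target["name"]]


def by_automation_id(nodes, target):
    wanted = target.get("automation_id")
    return [n for n in nodes if wanted and n.get("automation_id") == wanted]


def by_fuzzy_name(nodes, target, threshold=0.8):
    def close(a, b):
        return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold
    return [n for n in nodes
            if n["role"] == target["role"] and close(n["name"], target["name"])]


def by_role_and_index(nodes, target):
    same_role = [n for n in nodes if n["role"] == target["role"]]
    i = target.get("index")
    return [same_role[i]] if i is not None and i < len(same_role) else []


# Hypothetical ordering: strictest match first, loosest last.
STRATEGIES = [by_role_and_name, by_automation_id, by_fuzzy_name, by_role_and_index]


def resolve(nodes, target):
    """Try each strategy in order; accept only an unambiguous single match."""
    for strategy in STRATEGIES:
        matches = strategy(nodes, target)
        if len(matches) == 1:
            return matches[0], strategy.__name__
    raise LookupError(f"no unambiguous match for {target['name']!r}")


# Recorded before an upgrade that renames 'Cost Center' to 'Cost Centre'.
recorded = {"role": "Edit", "name": "Cost Center"}
after_upgrade = [
    {"role": "Edit", "name": "Vendor"},
    {"role": "Edit", "name": "G/L Account"},
    {"role": "Edit", "name": "Cost Centre"},
]
node, used = resolve(after_upgrade, recorded)
```

In this toy run the exact match fails, there is no recorded ID to fall back on, and the fuzzy label match finds the renamed field. Requiring exactly one match keeps the fallback from guessing when two fields look alike.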

Does the AI run during every posting?

No, and this is the part most explainers get wrong. At authoring time, a human walks the workflow once with the recorder on. A model converts the recording into a deterministic workflow file: a sequence of accessibility-tree node references, conditions, and field bindings. At runtime, the executor replays that file with zero LLM calls in the hot path. You can grep the open-source executor at crates/executor/src/services/typescript_executor.rs in github.com/mediar-ai/terminator for 'gemini', 'openai', 'claude', or 'anthropic' and find zero matches. That is on purpose. Auditors and SOX reviewers do not have to certify a model's behavior; they certify a deterministic replay file. The model is a compiler, not the runtime.
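A deterministic replay is easy to picture with a toy example. The workflow format, the action names, and the dictionary standing in for the screen are assumptions made for this sketch, not the actual file format or executor. Conditions are left out for brevity.

```python
import json

# Hypothetical workflow file: node references plus field bindings.
WORKFLOW = json.loads("""
{
  "id": "ap_invoice_post_miro",
  "steps": [
    {"action": "set",    "role": "Edit",     "name": "Reference", "bind": "reference"},
    {"action": "set",    "role": "ComboBox", "name": "Tax Code",  "bind": "tax_code"},
    {"action": "invoke", "role": "Button",   "name": "Post"}
  ]
}
""")


def replay(workflow, screen, inputs):
    """Apply each recorded step in order; fail loudly instead of guessing."""
    log = []
    for step in workflow["steps"]:
        key = (step["role"], step["name"])
        if key not in screen:
            raise LookupError(f"node not found: [{step['role']}] '{step['name']}'")
        if step["action"] == "set":
            screen[key] = inputs[step["bind"]]
        elif step["action"] == "invoke":
            screen[key] = "invoked"
        else:
            raise ValueError(f"unknown action: {step['action']}")
        log.append(f"{step['action']} [{step['role']}] '{step['name']}'")
    return log


def fresh_screen():
    return {("Edit", "Reference"): "", ("ComboBox", "Tax Code"): "", ("Button", "Post"): ""}


inputs = {"reference": "INV-0001", "tax_code": "V1"}  # hypothetical values
first = replay(WORKFLOW, fresh_screen(), inputs)
second = replay(WORKFLOW, fresh_screen(), inputs)
assert first == second  # same file, same inputs, same trace
```

There is no network call and no sampling anywhere in `replay`. That property is what a reviewer certifies when they sign off on the file rather than on a model.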

What about audit, SOX, segregation of duties?

Every run writes a row to workflow_executions in Postgres with workflow_id, status, started_at, completed_at, and execution_duration_seconds. The same row drives the bill (apps/web/src/app/api/billing/usage/route.ts groups by workflow and month and multiplies by the rate constant) and the audit trail. The agent runs as a named service account inside the customer's existing security perimeter, on a Windows session that the customer's IT controls. Mediar is SOC 2 Type II certified and HIPAA compliant. For SoD, the typical pattern is the same as any RPA control: the agent has the privileges of one role (AP clerk, cash app clerk), the controller approves above thresholds, and the workflow file plus the execution row plus the SAP document number form the three-way reconciliation an auditor will ask for. On-prem deployment is available for teams whose finance data is not allowed to traverse a vendor cloud.
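The three-way reconciliation described above can be sketched as a simple cross-check. How an execution row links to an SAP document number is not described here, so the `sap_document_number` and `run_id` fields, the `success` status value, and the sample data are all assumptions for illustration.

```python
def reconcile(executions, workflow_files, sap_documents):
    """Return a list of findings; an empty list means all three sources agree."""
    findings = []
    documents_seen = set()
    for run in executions:
        if run["workflow_id"] not in workflow_files:
            findings.append(f"run {run['run_id']}: unknown workflow {run['workflow_id']}")
        if run["status"] != "success":
            continue  # failed runs should not have produced a document
        doc = run.get("sap_document_number")
        if doc in sap_documents:
            documents_seen.add(doc)
        else:
            findings.append(f"run {run['run_id']}: no SAP document {doc}")
    for doc in sorted(sap_documents - documents_seen):
        findings.append(f"SAP document {doc}: no execution row")
    return findings


# Hypothetical data: one clean post, one failed run, one unexplained document.
executions = [
    {"run_id": "r1", "workflow_id": "ap_invoice_post_miro",
     "status": "success", "sap_document_number": "5100000001"},
    {"run_id": "r2", "workflow_id": "ap_invoice_post_miro",
     "status": "failed", "sap_document_number": None},
]
findings = reconcile(
    executions,
    workflow_files={"ap_invoice_post_miro"},
    sap_documents={"5100000001", "5100000002"},
)
```

Here the only finding is the document with no execution row behind it, which is the kind of break an auditor would ask about.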

What workflows should we NOT hand to this kind of agent?

Three categories. First, judgement work: exception triage on a $40K invoice variance, controller signoff, fraud screening on first-time vendors, anything where the right answer depends on knowing why a number looks off. Send those to a human with better context, not a faster agent. Second, free-form discovery in unfamiliar UIs the recorder has never seen; agents in production are recording-shaped, not exploratory. Third, workflows that already have a clean API. If your bank publishes an ACH origination API and your accounting system publishes a journal entry API, integrate them directly. The accessibility-tree agent is the right answer when the data lives behind a Windows GUI that has no API, which is most of legacy enterprise finance and almost none of the SaaS finance stack.

How does a finance team get started without a $500K commitment?

Two paths. The open path: the Terminator SDK at github.com/mediar-ai/terminator is MIT licensed. A finance-systems engineer can record a single workflow, ship it on one Windows session, and prove the unit cost on their own time. No procurement cycle. The sales-led path: the turn-key program is a $10K fee that converts to runtime credits with a small bonus, plus $0.75 per minute of execution. Most teams start with one workflow (cash app or MIRO post are the two most common entry points), measure the per-execution cost against the cycle-time savings, and only widen the rollout once the unit economics are on a dashboard the controller can read. We typically have a workflow in production inside two weeks.

Adjacent reading

If this was useful

Architecture

AI agents on legacy desktop systems with no API

The accessibility tree is the API. The format an agent feeds the model, the four-strategy fallback when SAP support packs ship, and why pixel vision is the wrong default for audited workloads.


Unit economics

Enterprise AI ops vs the finance gap

Why the UiPath bill cannot answer the CFO's question and why a runtime-meter bill can. The 30 lines of route handler that turn execution telemetry into per-workflow unit economics finance can audit.


Deep dive

Automate SAP data entry: the failure taxonomy

The clicks are easy. The exception classifier and retry policy are what decide whether the AP queue is unattended overnight or pages a human at 3am. Field-by-field on a real SAP run.
