Buyer's argument

Document workflow automation software stops one step short.

Document workflow automation software, in the way the category sells itself, is a closed loop. A document arrives, classification picks its type, an extractor turns it into JSON, a human approves, and the result is archived or sent out for e-signature. That description is honest for steps one through four: capture, classification, extraction, and approval. Step five, disposition, is where the parsed data has to actually post into a system of record. It is the step the documented dollar savings come from, and it is the step most platforms in this category leave to someone else.

Matthew Diakonov
10 min

Direct answer (verified 2026-05-10)

Document workflow automation software is a category of platforms that capture incoming documents (scan, email, EDI, upload), classify them, extract structured data with OCR or IDP, route the extracted record through approval, and dispose of the result by archiving, e-signing, or posting to a downstream system. Common named products in the category include DocuWare, Laserfiche, M-Files, Hyland OnBase, Kissflow, Process Street, and Adobe Acrobat Sign. Most platforms cover capture through approval inside their own UI; the disposition step into a legacy desktop system of record (SAP GUI, Oracle EBS, Epic, Cerner, Jack Henry, Fiserv, FIS, mainframe terminals) requires a separate desktop-automation runtime, because those destinations have no public write API. Mediar is in that desktop-automation tier and captures the document open as a first-class workflow event alongside the destination keystrokes.

Why the category description ends at approval

Read three vendor pages on document workflow automation. The shape of every page is the same. A capture stage feeds into a classification stage, which feeds into an extraction stage, which feeds into a review and approval stage, which feeds into a disposition stage. The disposition stage is described in two ways: archive the document in a repository, or e-sign it and route to a counterparty. Both are honest descriptions of what the platform does inside its own UI.

The mismatch arrives when the buyer reads carefully. A real AP team is not measuring success by whether an invoice is archived. A real claims team is not measuring success by whether a claim form is e-signed. They are measuring success by whether the AP general ledger now reflects the invoice, or whether the claim is now assigned in the carrier's policy administration system. Those systems are SAP, Oracle, Guidewire, or whatever the back office actually runs on. The disposition step in the vendor diagram does not connect to any of those by default.

On a public-API destination, the gap is closed with a connector. A modern SaaS ERP exposes a write endpoint, a webhook, or both, and the document workflow vendor lists it on a connectors page. On a legacy Windows desktop destination, the gap is closed by either writing a custom CSV importer (when the destination cooperates) or hiring a clerk to type the JSON output into the destination's UI. The clerk is the line item that the documented dollar savings tend to remove. The clerk is also the part of the workflow that no standard document automation platform writes copy about.

The closed-loop diagram vs. the desktop-included diagram

The closed-loop version lives entirely inside the vendor's UI. A scanned invoice arrives, an OCR/IDP pass returns JSON, a reviewer approves, and the result is archived in the vendor's repository or e-signed. The integration with the system of record is a flat-file export, an SFTP drop, or a list of REST endpoints that exist only for the vendor's preferred cloud destinations.

  • Capture, classify, extract, route inside the platform UI
  • Archive and e-sign as the canonical end state
  • Posting to SAP, Oracle, Epic, or Jack Henry is left to a separate RPA tool
  • If the destination has no API, the IT team writes a CSV importer or hires a clerk

The recording event that makes the desktop-included version real

The thing that makes a desktop-automation runtime work for document workflows, instead of feeling like a bolt-on, is whether the recording loop treats the document arrival as a meaningful event in the same vocabulary as the destination keystrokes. If the recorder treats "file opened" as an OS notification and ignores it, the synthesis stage cannot label the document arrival together with the SAP click that follows, and the resulting workflow file reads as two disconnected halves. If the recorder treats it as a first-class event, the synthesis stage produces one workflow that reads the document and writes the destination.

The decision is encoded in a single function in Mediar's recorder. The function is is_meaningful_event_type in apps/desktop/src-tauri/src/recording_processor.rs. It returns true for exactly six event types. The function is the gate between the raw OS event stream (hundreds of thousands of records per recording hour, mostly mouse moves and individual keystrokes) and the analyzer that produces a labeled workflow.

recording_processor.rs

Five of the six are the obvious ones: a native desktop click, a browser click, a complete text input (the recorder aggregates keystrokes into a single text-input-completed event when focus leaves the field), a browser tab navigation, and an application switch (alt-tab or its equivalent). The sixth is file_opened. That is the one that matters for document workflow automation, because it is the event the recorder fires when the operator opens the source document in Acrobat, in Edge, in Excel, or in whatever else they read invoices and claim forms in.
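The gate described above can be sketched in a few lines of Rust. This is not the contents of recording_processor.rs; it is a minimal illustration that assumes event types are compared as plain strings. The six strings are the ones this page names, while the helper meaningful_events and the noise events mouse_move and key_press are hypothetical.

```rust
/// Sketch of the meaningful-event gate. The six event-type strings are the
/// ones this page names; the function body is an assumption, not Mediar's source.
fn is_meaningful_event_type(event_type: &str) -> bool {
    matches!(
        event_type,
        "button_click"
            | "browser_click"
            | "text_input_completed"
            | "browser_tab_navigation"
            | "application_switch"
            | "file_opened"
    )
}

/// Hypothetical helper: keep only the events the analyzer would see,
/// preserving their original order.
fn meaningful_events<'a>(raw: &[&'a str]) -> Vec<&'a str> {
    raw.iter()
        .copied()
        .filter(|event_type| is_meaningful_event_type(event_type))
        .collect()
}
```

Fed a raw stream such as mouse_move, file_opened, key_press, application_switch, button_click, the helper keeps file_opened, application_switch, and button_click in that order. Because file_opened passes the gate, the document open stays in the same sequence as the destination interactions.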

The wire format is in the sibling module. When the recorder produces a WorkflowEvent::FileOpened, the ingestion layer maps it to a request with a string event type of "file_opened" and an embedded FileOpenedEvent payload that carries the filename. The mapping lives at the FileOpened arm of the match in event_ingestion.rs.

event_ingestion.rs
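The mapping described above might look roughly like the following. Only the names WorkflowEvent::FileOpened, FileOpenedEvent, and the "file_opened" string come from this page. The IngestionRequest struct, its fields, the to_ingestion_request function, and the second enum variant are hypothetical stand-ins so the sketch compiles on its own.

```rust
/// Payload carried by a file-open event. The page says it carries the
/// filename; the exact field name is an assumption.
#[derive(Debug, Clone, PartialEq)]
struct FileOpenedEvent {
    filename: String,
}

/// Hypothetical two-variant stand-in for the recorder's event enum.
#[derive(Debug, Clone, PartialEq)]
enum WorkflowEvent {
    FileOpened(FileOpenedEvent),
    ApplicationSwitch { to_app: String },
}

/// Assumed shape of the request the ingestion layer produces.
#[derive(Debug, Clone, PartialEq)]
struct IngestionRequest {
    event_type: &'static str,
    file_opened: Option<FileOpenedEvent>,
}

fn to_ingestion_request(event: &WorkflowEvent) -> IngestionRequest {
    match event {
        // The FileOpened arm: a string event type plus the embedded payload.
        WorkflowEvent::FileOpened(payload) => IngestionRequest {
            event_type: "file_opened",
            file_opened: Some(payload.clone()),
        },
        // Other events carry no file payload in this sketch.
        WorkflowEvent::ApplicationSwitch { .. } => IngestionRequest {
            event_type: "application_switch",
            file_opened: None,
        },
    }
}
```

The point of the shape is that the filename travels with the event, so a later stage can label the step with the document that was opened rather than with a bare OS notification.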

The result is that a recording of a real workflow (open invoice.pdf in Acrobat, alt-tab to SAP B1, click into the AR invoice screen, type the line items, save) becomes a file_opened event, followed by an application_switch event, followed by a series of button_click and text_input_completed events. The synthesis stage assigns one label to the file open (read invoice) and one label to each destination interaction (enter line item 1, enter line item 2, save). That is the integration the standard document workflow vendor leaves to a clerk.

What counts as a document, in the recorder's vocabulary

  • PDF invoices and remittance advices opened in Acrobat, Edge PDF, or a browser tab
  • Scanned claim forms and EOBs opened from a network share or an inbox attachment
  • Excel and CSV exports treated as the source the user reads from
  • Word and email body text the user copies fields out of
  • Vendor portal HTML pages, when the user prints to PDF first or copies values directly
  • Faxed documents converted to TIFF or PDF by the multifunction device

$750K/year

One mid-market insurance carrier moved claims intake from 30 minutes per claim to 2 minutes per claim by routing a recorder over the IDP output and into the policy administration screens. The number is the AP-team headcount math, not a press release figure.

Mediar customer reference, 2025-2026

The destinations the desktop tier exists for

The list of legacy destinations is short and stable. They are the systems where the operator's last action is a click in a desktop client, not an HTTP request. Each one is a place where document workflow software hands the operator a JSON output and walks away. Each one is a place where a recording loop with a meaningful file_opened event can pick the workflow back up and finish it.

Eight desktop systems where document workflow software ends and the desktop tier begins

Where the parsed data has to land

SAP GUI

Order entry, vendor master, and journal posting in classic SAP GUI for Windows.

SAP Business One

AR invoice, AP invoice, sales order, and inventory transactions on B1 Windows client.

Oracle EBS

Forms-based modules for procurement, payables, and order management on Oracle E-Business Suite.

Epic

Patient registration, charge entry, and prior-auth screens in the Epic Hyperspace client.

Cerner

PowerChart and Revenue Cycle screens that expose an accessibility tree but no public write API.

Jack Henry

SilverLake and Symitar teller, account-open, and loan-origination screens on the Windows client.

Fiserv

Premier and DNA screens for deposits, loans, and exception handling on the Windows desktop.

FIS

IBS and Horizon teller and back-office screens that share a green-screen lineage.

The honest counterargument

There are workflows where this whole argument is wrong, and they should be named on the page so the buyer can recognize their own situation.

When the destination is a modern SaaS app with a usable public write API (Bill.com, Coupa, Workday for many flows, NetSuite for most flows, current Salesforce modules), the right pattern is the standard document workflow vendor's connector. A direct JSON-to-API call is faster, cheaper, and more reliable than driving any UI. A team that buys a desktop-automation runtime for a workflow that ends in a public API is paying twice.

When the destination is a custom OpenGL surface or a legacy Java Swing theme that publishes no useful accessibility tree, the recording loop has nothing to read. The selector vocabulary cannot describe an element the OS does not expose. Computer-vision RPA can sometimes work on these surfaces, with the cost shape that comes with treating the screen as pixels.

When the volume is genuinely low, say a few documents a week, the right answer is a person. The fixed cost of any automation platform, including this one, is hard to amortize against a workflow that adds up to fifteen minutes a day. The case for adding a desktop layer to a document workflow stack starts at roughly 100 repetitive workflows per week, in the buyer profile where the destination is one of the eight systems above.
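The three cases above amount to a short decision rule. The sketch below encodes it under stated assumptions: the struct, the enum, and the order of the checks are hypothetical, and the 100-per-week constant is the rough threshold from this page, not a hard cutoff. Anything under that threshold is treated as the low-volume case, which is a simplification.

```rust
/// Hypothetical description of one candidate workflow.
struct Workflow {
    destination_has_public_write_api: bool,
    destination_exposes_accessibility_tree: bool,
    runs_per_week: u32,
}

#[derive(Debug, PartialEq)]
enum Recommendation {
    VendorConnector,
    ComputerVisionRpa,
    KeepAPerson,
    DesktopAutomationTier,
}

/// Rough threshold taken from this page; treat it as a starting point.
const MIN_RUNS_PER_WEEK: u32 = 100;

fn recommend(w: &Workflow) -> Recommendation {
    if w.destination_has_public_write_api {
        // A direct API call beats driving any UI.
        Recommendation::VendorConnector
    } else if !w.destination_exposes_accessibility_tree {
        // Nothing for a selector to read; pixels are the only option.
        Recommendation::ComputerVisionRpa
    } else if w.runs_per_week < MIN_RUNS_PER_WEEK {
        // Fixed platform cost is hard to amortize at low volume.
        Recommendation::KeepAPerson
    } else {
        Recommendation::DesktopAutomationTier
    }
}
```

A buyer can run their own workflows through the same three questions by hand. The code only makes the ordering explicit: the API question comes first because it rules out the desktop tier regardless of volume.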

What to ask a document workflow vendor before signing

Three concrete questions that surface whether the platform plans for the last keystroke or hands it off.

  • Show me the exact integration with [our destination], not the connector list. If the answer is a flat-file export to SFTP, the platform stops at parsed data and your AP team is going to spend the savings on reconciling the import. If the answer is a write API that does not exist on your destination, the platform is selling a connector to a different system than the one you operate.
  • What does the audit trail look like across the parsing step and the posting step? If the document workflow tool ends at parsed data and a separate RPA tool does the posting, the audit trail is two artifacts. The compliance team has to reconcile them. The desktop-included pattern produces one trace for the same recording session.
  • When the destination UI changes (a SAP transport, an Epic upgrade, a Jack Henry service pack), what breaks and who fixes it? Connector-based answers fail silently when a vendor renames a field. Selector-based runtimes that publish their selector vocabulary (Name, Role, Id, Text, Path, NativeId, ClassName, Attributes, plus Chain) give you an honest list of where the replay engine can and cannot adapt.

The point of all three questions is to surface what the vendor actually owns and what they are handing back to the buyer. The last keystroke is the one that determines whether the dollar savings show up on the next quarterly report.

Watch the recorder open a document and finish in your system of record

Bring one document type and one destination screen. We record the operator doing it once, and you leave the call with a YAML workflow that reads the document and posts the data into SAP, Oracle, Epic, or Jack Henry on replay.

Frequently asked questions

What is document workflow automation software, in plain terms?

It is a software category that handles the lifecycle of an incoming document: capture (scanner, email, upload, EDI feed), classification (what kind of document is this), extraction (OCR or IDP turns the document into structured data), routing (review and approval steps), and disposition (archive, e-sign, or post to a downstream system). Most vendors in the category cover steps one through four inside their own UI and call the loop closed. The fifth step, posting the data into a system of record, is where the documented dollar savings live for enterprise deployments and is also the step most platforms hand off to a separate runtime.

Where do most document workflow tools stop, and why does it matter?

They stop at parsed data. The vendor returns a JSON object with the invoice number, the vendor id, the line items, and a confidence score per field. The next step, posting that JSON into the buyer's ERP, is left to whatever integration is convenient: a flat file dropped to SFTP, a list of REST endpoints that exist only on the vendor's preferred destination apps, or a manual export the AP team imports by hand. In an environment where the destination is a public-API SaaS app (Coupa, Bill.com, NetSuite for some flows), this is fine. In an environment where the destination is SAP GUI, Oracle EBS, Jack Henry, Fiserv, FIS, or Epic, the buyer is paying for the first 80 percent of the workflow and re-keying the last 20 percent.
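For readers who have not seen an IDP result, the parsed output described above can be pictured as the following data structure. Every type and field name here is hypothetical, since each IDP vendor defines its own schema. The lowest_confidence function is an illustrative way a reviewer queue might use the per-field scores, not a feature of any named product.

```rust
/// One extracted value with its per-field confidence (0.0 to 1.0).
#[derive(Debug, Clone, PartialEq)]
struct ExtractedField {
    value: String,
    confidence: f64,
}

#[derive(Debug, Clone, PartialEq)]
struct LineItem {
    description: ExtractedField,
    amount: ExtractedField,
}

/// Hypothetical shape of the parsed invoice a vendor hands back.
#[derive(Debug, Clone, PartialEq)]
struct ParsedInvoice {
    invoice_number: ExtractedField,
    vendor_id: ExtractedField,
    line_items: Vec<LineItem>,
}

/// Small constructor to keep examples short.
fn field(value: &str, confidence: f64) -> ExtractedField {
    ExtractedField { value: value.to_string(), confidence }
}

/// Lowest confidence across all fields, a possible trigger for human review.
fn lowest_confidence(invoice: &ParsedInvoice) -> f64 {
    invoice
        .line_items
        .iter()
        .flat_map(|item| [item.description.confidence, item.amount.confidence])
        .chain([invoice.invoice_number.confidence, invoice.vendor_id.confidence])
        .fold(f64::INFINITY, f64::min)
}
```

Whatever the schema, this record is where the document workflow tool's responsibility usually ends. Nothing in it says how the values reach the destination screen.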

What is the literal answer to 'how does the parsed data get into the legacy desktop app'?

Three options. First, write a custom CSV importer for the destination app, if the destination supports CSV import (most do not, in their newer modules). Second, hire a clerk to type the JSON output into the destination UI. Third, use a desktop-automation runtime that drives the destination UI on the operator's behalf. UiPath, Power Automate Desktop, Automation Anywhere, and Mediar are all in the third category. Mediar's edge is that the runtime captures the document open and the destination keystrokes as one continuous recording, instead of asking the buyer to wire a JSON-to-keystroke bridge.

What is the verifiable detail that makes this approach different?

The recording loop's meaningful-event filter. In apps/desktop/src-tauri/src/recording_processor.rs the function is_meaningful_event_type returns true for exactly six event types, and file_opened is one of them. The other five are button_click, browser_click, text_input_completed, browser_tab_navigation, and application_switch. The filter sits between the raw OS event stream (hundreds of thousands of records per recording hour) and the LLM step analyzer (which sees only meaningful events). Because file_opened is in the filter, opening a PDF in Acrobat is captured as a labeled boundary alongside the destination's clicks and text inputs, and the synthesis stage produces a single workflow that reads the document and writes the system of record. The mapping from FileOpened to the wire format lives in event_ingestion.rs, at the FileOpened arm of the match, and carries a FileOpenedEvent payload.

Does this replace IDP tools like Rossum, Hypatos, Ocrolus, or Adobe Acrobat AI?

No. Those tools are good at the parsing step and Mediar does not try to compete with them on extraction quality. The honest pattern is to use whichever IDP your team already trusts to turn a document into JSON, and use Mediar's runtime for the posting step into the legacy desktop app. The recorder picks the workflow up at the point where the operator opens the parsed JSON or the IDP's review screen, and follows the operator into the destination. The two layers are complementary, not competitive.

Where does this approach refuse to be the answer?

Two places. First, when the destination is a modern SaaS app with a clean public API. A direct JSON-to-API call is faster, cheaper, and more reliable than driving a UI, and that is the right answer for any flow where the destination is Bill.com, Coupa, Workday, or a current Salesforce module. Second, when the destination is a custom OpenGL or Java Swing surface that publishes no useful accessibility tree. Computer-vision RPA can sometimes work there. The accessibility-tree approach is sized for the legacy Windows desktop layer where SAP GUI, Oracle EBS, Epic, Cerner, Jack Henry, Fiserv, and FIS live, because those vendors maintain a useful accessibility tree for screen-reader compliance reasons.

How is this different from the data entry automation page on this site?

The data entry automation page is a buyer's argument about the four-stage shape of a data-entry pipeline (document arrives, extract, validate, enter), aimed at someone weighing OCR vendors against RPA vendors. This page is narrower: it names the specific event in Mediar's recording loop that lets the document open and the destination keystrokes be captured as one workflow, with the file path and function name to back it. The two pages are independent reads.

What does this cost compared to a traditional document workflow stack?

Mediar's runtime is $0.75 per minute of execution, drawn against a $10,000 turn-key program fee that converts to credits. A three-minute SAP B1 invoice posting costs $2.25 of meter time. The number to compare is not the per-minute rate but the line item that gets removed from the existing budget: the AP clerk hour, the UiPath maintenance contract, or the offshored-keying line on the BPO invoice. Documented numbers from real deployments include 70 percent cost reduction at an LG-customer F&B chain that moved from UiPath to Mediar, $750,000 a year saved on insurance claims intake at one mid-market carrier, and $210,000 a year on patient intake at a regional healthcare provider.
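The meter arithmetic above can be checked with a few lines of integer math. The rate and the program fee are the figures from this page. The sketch assumes the fee converts to credits dollar for dollar and that runs are measured in whole minutes; neither detail is stated here, and the function names are hypothetical.

```rust
/// Figures from this page, held in cents to avoid floating-point rounding.
const RATE_CENTS_PER_MINUTE: u64 = 75; // $0.75 per minute of execution
const PROGRAM_FEE_CENTS: u64 = 1_000_000; // $10,000 program fee

/// Meter cost of one run, in cents.
fn run_cost_cents(minutes: u64) -> u64 {
    minutes * RATE_CENTS_PER_MINUTE
}

/// Whole runs the fee would cover, assuming it converts to credits
/// dollar for dollar. Returns None for a zero-minute run.
fn runs_covered_by_fee(minutes_per_run: u64) -> Option<u64> {
    PROGRAM_FEE_CENTS.checked_div(run_cost_cents(minutes_per_run))
}
```

For a three-minute posting, run_cost_cents(3) gives 225 cents, which matches the $2.25 figure. Under the dollar-for-dollar assumption, runs_covered_by_fee(3) gives 4,444 postings before the program fee is used up.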

Is the recording loop open source so we can audit it?

The Terminator runtime that powers the loop, including the Selector vocabulary and the Locator with timeout-and-fallback semantics, is published as an SDK at github.com/mediar-ai/terminator with TypeScript bindings on the Rust core. The product UI on top of it (the no-code recorder, the workflow synthesis pipeline, the cloud executor) is closed source. The split is the honest answer for a security review that needs to inspect the runtime end to end, since the part that touches the OS accessibility tree is the part that is open.