A category map, not a vendor pitch

Document automation tools split into four layers. Most lists fuse them.

The phrase “document automation tools” covers four distinct jobs: capture (move the doc off paper or email), extract (parse the fields), route (review, approve, sign), and post (write the parsed data into the system of record). Nothing in the market does all four well, and the layer where the buyer has a real bottleneck decides which vendor is even relevant. This guide separates the layers, names the tools in each, and lingers on the fourth layer most published lists pretend doesn’t exist.

Matthew Diakonov, Written with AI

Published May 12, 2026 (10 min read)

Direct answer (verified 2026-05-12)

Document automation tools split into four layers: capture (move the doc), extract (parse it into JSON), route or sign (approve and archive), and post (write the parsed data into the system of record). Pick by the layer where you actually have a bottleneck.

If the destination is NetSuite, Workday, Bill.com, or Coupa, the posting layer collapses into an API call and you only need layers one through three. If the destination is SAP GUI, Oracle EBS, Jack Henry, Fiserv, FIS, Epic, or Cerner, the posting layer is its own tool category. Mediar lives in that fourth category. The recording loop is documented in the public source at github.com/mediar-ai/terminator.

The four layers, side by side

Each layer has its own job, its own vendor pool, and its own failure mode. A buyer who has a layer-2 problem (parsing accuracy) and a layer-4 problem (legacy desktop posting) needs two tools, not one. A buyer who has only a layer-2 problem is wasting cycles reading about RPA platforms.

1. Capture

Move the document off the scanner, the email attachment, the upload, the EDI feed, the fax. The job is to land a digital artifact (PDF, TIFF, image, JSON) somewhere a downstream tool can read it.

2. Extract

Turn the artifact into structured fields. OCR is the old word, IDP is the new one. The output is a JSON object with the invoice number, the vendor id, the line items, plus a confidence score per field.

3. Route, approve, sign

Decide what happens to the parsed data: who reviews it, who approves it, whether it needs an e-signature, where the audit copy is archived. This is the part most buyers see in a demo.

4. Post into the system of record

Get the parsed JSON into the ERP, the EHR, the banking core, the policy admin system. If the destination has a clean API, this is a one-liner. If the destination is SAP GUI, Oracle EBS, Jack Henry, Fiserv, FIS, Epic, Cerner, or a green-screen terminal, this layer is keystrokes typed into the client.

The end-to-end pipeline, named by layer

Doc arrives

PDF, scan, email, EDI

Capture tool

Lands in an inbox or queue

IDP tool

Returns parsed JSON

Route tool

Reviewer approves

Post tool

Types into the system of record

Layer one: capture tools

Move the document from paper, email, fax, or upload into a digital queue downstream tools can read. The job is largely solved in mid-market and enterprise: ECM and IDP vendors ship capture front ends, and most modern scanners write to email or SFTP out of the box. The capture layer rarely sells on its own anymore; it is bundled into the IDP or the ECM.

Named tools, layer 1

Where documents are first landed

ABBYY FlexiCapture

Scan plus classify, old-school IDP cornerstone.

Hyland OnBase Capture

Inbound capture inside the Hyland ECM stack.

OpenText Capture Center

Capture front-end for OpenText archive estates.

DocuWare

Mid-market capture plus archive in one box.

If a buyer’s primary problem is capture, the conversation is usually about hardware (multifunction devices), inbound channels (shared mailboxes, SFTP, EDI), and classification rules. Adding an IDP usually subsumes this layer for free.

Layer two: extract tools (OCR and IDP)

Turn the captured artifact into a structured JSON object. The modern IDPs use a mix of OCR, layout models, and LLMs to pull invoice numbers, vendor ids, line items, totals, and dates, with a confidence score per field. The output is usually a JSON object plus a copy of the document with field highlights. This layer is where the deep-learning hype landed.
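To make the output shape concrete, here is a minimal Python sketch of a parsed invoice with a confidence score per field, plus a check that flags low-confidence fields for human review. The field names, the values, and the 0.90 threshold are assumptions for illustration; no vendor's actual schema is implied.

```python
import json

# Hypothetical IDP output: each field carries a value and a confidence score.
parsed_invoice = {
    "invoice_number": {"value": "INV-1042", "confidence": 0.99},
    "vendor_id": {"value": "V-2207", "confidence": 0.97},
    "total": {"value": "1840.50", "confidence": 0.84},
    "invoice_date": {"value": "2026-04-30", "confidence": 0.93},
}

REVIEW_THRESHOLD = 0.90  # assumed cut-off; tune per document family


def fields_needing_review(doc, threshold=REVIEW_THRESHOLD):
    """Return the names of fields whose confidence is below the threshold."""
    return [name for name, field in doc.items() if field["confidence"] < threshold]


if __name__ == "__main__":
    print(json.dumps(parsed_invoice, indent=2))
    print(fields_needing_review(parsed_invoice))
```

In this sketch only the total falls below the cut-off, so only that field would go to a reviewer before the document moves on to routing.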

Named tools, layer 2

Where documents become structured data

Rossum

Cloud IDP focused on invoices and remittance docs.

Hypatos

Deep-learning IDP for accounting documents.

Ocrolus

Bank statement and pay-stub extraction for lending.

Adobe Acrobat AI Assistant

Direct PDF read and field extract inside Acrobat.

Google Document AI

Hosted IDP with parsers per document family.

AWS Textract

OCR plus table and form extraction as a service.

Azure Document Intelligence

Formerly Form Recognizer; prebuilt invoice and receipt models.

Tungsten (Kofax)

Enterprise IDP with on-prem and cloud deployments.

The honest accuracy bar in production is 90 to 98 percent at the field level for common invoice families, and lower for messy handwriting or scanned remittance advice. The published list of parsers is no longer where vendors differ. What differs is the volume of training data each vendor has on your specific document family and the quality of the review tooling for field corrections.
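A field-level figure can look better than it feels in practice, because a document only skips review when every field on it is right. The Python sketch below shows the arithmetic under two loud assumptions that are not from any vendor's data: field errors are independent, and a document carries ten extracted fields.

```python
def all_fields_correct_rate(field_accuracy, fields_per_doc):
    """Share of documents with every field correct, assuming independent errors."""
    return field_accuracy ** fields_per_doc


if __name__ == "__main__":
    FIELDS_PER_DOC = 10  # assumed; count the fields on your own document family
    for accuracy in (0.90, 0.95, 0.98):
        rate = all_fields_correct_rate(accuracy, FIELDS_PER_DOC)
        print(f"field accuracy {accuracy:.2f} -> all-fields-correct rate {rate:.3f}")
```

Under those assumptions, 98 percent per field works out to roughly 82 percent of documents with no field to correct, and 90 percent per field to roughly 35 percent. Real errors tend to cluster on bad scans, so treat this as a rough bound, not a forecast.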

Layer three: route, approve, sign

Once the document is parsed, decide who reviews and approves the fields, who signs the final artifact, and where the audit copy is archived. This is the layer most generalist buyers think of when they hear “document automation” because it has the most visible UI: an approval queue, an e-signature box, an archive folder.

Named tools, layer 3

Where humans interact with the parsed document

DocuSign CLM

Contract lifecycle plus e-signature in one platform.

PandaDoc

Proposals, quotes, contracts with built-in signing.

Adobe Acrobat Sign

Enterprise e-signature, deep ties to Adobe documents.

Conga

Document generation and CLM, originally a Salesforce add-on.

Nintex

Workflow and approval routing across forms and documents.

Laserfiche

ECM with built-in approval routing for regulated industries.

The output of layer three is approval state and a signed document. That state has to go somewhere: an ERP, an EHR, a policy admin system, a banking core. That final write is layer four.

70%

“An LG-customer F&B chain moved its SAP B1 invoice posting from UiPath to Mediar. The CFO told the board they are saving 70 percent on costs at the posting layer.”

Documented deployment, layer-4 swap

Layer four: post into the system of record

The last keystroke. The parsed JSON from layer two needs to land in the ERP, the EHR, the banking core, or the policy admin system. If the destination publishes a clean write API, this is a one-line HTTP call and the layer is invisible. If the destination is SAP GUI, Oracle EBS, Jack Henry, Fiserv, FIS, Epic, Cerner, or a green-screen terminal, the layer is keystrokes. The posting layer is the part most public roundups gloss over: on screen it looks like the route layer (a tool clicking through a UI), but the work is different, because the tool is writing a transaction into the system of record and not moving a document through an approval queue.
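For the API case, the sketch below shows what "a one-line HTTP call" amounts to in Python using only the standard library. The endpoint URL, the bearer-token scheme, and the payload fields are placeholders invented for illustration; each real ERP has its own endpoint and authentication rules.

```python
import json
import urllib.request


def build_post_request(parsed, endpoint, token):
    """Build (but do not send) a JSON POST carrying the parsed document."""
    body = json.dumps(parsed).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
    )


if __name__ == "__main__":
    request = build_post_request(
        {"invoice_number": "INV-1042", "total": "1840.50"},
        "https://erp.example.com/api/invoices",  # hypothetical endpoint
        "REPLACE_ME",
    )
    print(request.get_method(), request.full_url)
    # Sending is the one-liner: urllib.request.urlopen(request)
```

Nothing in this sketch has an equivalent when the destination is a desktop client with no write API, which is why the keystroke case becomes its own tool category.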

Named tools, layer 4

Where parsed data becomes a posted transaction

UiPath

Enterprise RPA, the historical default for legacy desktop posting.

Automation Anywhere

RPA platform aimed at large enterprises with COE programs.

Microsoft Power Automate Desktop

Microsoft's desktop RPA layer, bundled with Windows 11.

Blue Prism (SS&C)

Long-running RPA platform, strong in banking.

Mediar

Accessibility-API agent, captures the document open and the destination keystrokes as one workflow.

Four of the five names in that list are general-purpose RPA. The fifth, Mediar, is the same job done with OS accessibility APIs and an AI agent recording loop. The reason the fifth approach matters for documents specifically is in the next section.

The anchor fact: how a PDF open and a destination keystroke get tied into one labeled workflow

Every layer-4 tool clicks and types. What separates them is how the authored workflow gets built. Mediar’s recording loop watches the operator do the job once, then synthesizes a labeled workflow file. The non-obvious mechanic is how the recorder labels a single step in the context of the surrounding interactions, which is what lets a PDF open get linked to the destination keystrokes as one named step. The code lives in apps/desktop/src-tauri/src/recording_processor.rs in the Mediar product monorepo.

How the labeling pipeline names one step in the context of eleven

Raw event stream lands at the recorder

An hour of work produces tens of thousands of OS-level events: window focus, mouse moves, key down, ui_tree snapshots, file system notifications. Most are not useful for labeling intent.

is_meaningful_event_type filters to six types

In apps/desktop/src-tauri/src/recording_processor.rs the function returns true for exactly six event types: button_click, browser_click, text_input_completed, browser_tab_navigation, application_switch, and file_opened. Everything else is dropped from the analysis spine. file_opened is the entry point for documents.

Each meaningful event gets a step analysis in parallel

Once a meaningful event lands, check_analysis_gate returns true immediately and the event is queued for parallel step analysis. The analyzer reads the event, the last three step analyses for context, and the surrounding ui_tree snapshots.

check_labeling_gate waits for an 11-event window

Labeling event N requires analyses [N-5, N+5] to all be complete. That is an 11-event sliding context window around any single labeled step. A PDF open at index N gets labeled in the context of the five interactions before and after it, which is what lets the synthesis stage say 'open PDF, then type vendor name into SAP B1 vendor master, then post' as one step.

Synthesis stage emits a labeled workflow

After step_analysis and labeling complete, synthesis produces a typed workflow file. The labels tie the document open to the destination keystrokes, so the generated playback drives both as a single audited run.

The two functions that make this work

is_meaningful_event_type is the filter that keeps the analysis spine small. It returns true for exactly six event types: button_click, browser_click, text_input_completed, browser_tab_navigation, application_switch, and file_opened. The presence of file_opened in that list is what makes the document layer first-class for the recorder.
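The filter is easy to picture as a set-membership test. The Python sketch below mirrors the behavior described above; it is not the Rust source in recording_processor.rs, and the non-meaningful event names in the sample stream (mouse_move, key_down, window_focus) are assumed spellings for illustration.

```python
# The six event types the article names as meaningful.
MEANINGFUL_EVENT_TYPES = frozenset({
    "button_click",
    "browser_click",
    "text_input_completed",
    "browser_tab_navigation",
    "application_switch",
    "file_opened",
})


def is_meaningful_event_type(event_type):
    """Return True only for the six event types kept on the analysis spine."""
    return event_type in MEANINGFUL_EVENT_TYPES


if __name__ == "__main__":
    raw_stream = [
        "mouse_move", "file_opened", "key_down", "application_switch",
        "text_input_completed", "window_focus", "button_click",
    ]
    spine = [e for e in raw_stream if is_meaningful_event_type(e)]
    print(spine)
```

In this sample the PDF open survives the filter alongside the application switch, the completed text input, and the button click, so all four land in the same ordered stream.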

check_labeling_gate is the function that decides when an event can be labeled. It walks from index N minus 5 to N plus 5 and requires every analysis in that window to be complete. That eleven-event window is what gives the synthesizer enough surrounding context to label a PDF open and the destination keystrokes as one workflow step instead of three unrelated ones.
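A minimal Python sketch of that gate follows. It is a reading of the description above, not the Rust implementation. In particular, the source does not say what happens near the start or end of a recording, so clamping the window to the available indices is an assumption, as are the parameter names.

```python
WINDOW_RADIUS = 5  # five events before and five after: an 11-event window


def check_labeling_gate(n, completed_analyses, total_events):
    """Return True when every analysis in [n-5, n+5] is complete.

    completed_analyses is a set of event indices whose step analysis is done.
    The window is clamped to [0, total_events - 1] (assumed edge behavior).
    """
    first = max(0, n - WINDOW_RADIUS)
    last = min(total_events - 1, n + WINDOW_RADIUS)
    return all(i in completed_analyses for i in range(first, last + 1))


if __name__ == "__main__":
    done = set(range(0, 13))  # analyses 0..12 have finished
    print(check_labeling_gate(7, done, total_events=20))  # window 2..12
    print(check_labeling_gate(8, done, total_events=20))  # window 3..13
```

With analyses 0 through 12 finished, event 7 can be labeled because its whole window is complete, while event 8 has to wait for analysis 13.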

No other layer-4 tool publishes a comparable mechanic. UiPath, Power Automate Desktop, Automation Anywhere, and Blue Prism all author at the activity level, where the document open and the destination keystrokes are separate activities the developer has to wire together by hand. The recorder loop here authors them together by construction.

A four-question checklist for picking by layer

The right move when a buyer asks “which document automation tool should we buy” is to walk back to the bottleneck. Four questions usually settle it.

Find the layer where the minutes actually go

1. Where does the doc come from

If the answer is 'a scanner in a field office,' you need a capture tool.

2. What is the field-level accuracy bar

If the answer is 'four 9s on invoice totals,' you need an extract tool, not RPA.

3. Does the destination have a stable API

If yes, your post layer is one HTTP call. If no, your post layer is keystrokes.

4. Who signs and audits

If signatures and approval chains are the bottleneck, your gap is at the route layer.
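The four questions can be read as a small decision function. The Python sketch below is one way to encode them; the field names and the returned labels are invented for illustration, and a real scoping call weighs more than four booleans.

```python
from dataclasses import dataclass


@dataclass
class Answers:
    docs_arrive_on_paper: bool        # question 1
    extraction_accuracy_is_gap: bool  # question 2
    destination_has_stable_api: bool  # question 3
    approvals_are_bottleneck: bool    # question 4


def layers_to_address(answers):
    """Map the checklist answers to the layers that need attention."""
    layers = []
    if answers.docs_arrive_on_paper:
        layers.append("capture")
    if answers.extraction_accuracy_is_gap:
        layers.append("extract")
    if answers.approvals_are_bottleneck:
        layers.append("route")
    # Posting always happens; the API answer decides what kind of work it is.
    if answers.destination_has_stable_api:
        layers.append("post via API call")
    else:
        layers.append("post via keystrokes")
    return layers


if __name__ == "__main__":
    buyer = Answers(False, True, False, False)
    print(layers_to_address(buyer))
```

The sample buyer has a parsing gap and a destination with no stable API, which is the two-tool case: one extract tool and one posting tool.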

Two of the four questions are about the destination, not the document. That is on purpose: the document side of the pipeline is largely commoditized between IDP vendors, and the differentiation left in the category is what happens after the JSON is produced.

The legacy desktop destinations the posting layer actually has to serve

A layer-4 tool only earns its keep if the destination has no first-class API for the action the buyer is paying a clerk to do. The list is concentrated in a small number of vendors with long tails of regulated buyers, which is part of why the category gets underweighted in public roundups.

The layer-4 destination list

Where the keystrokes have to land

SAP GUI

Order entry, journal posting, vendor master, no REST API for many transactions.

SAP Business One

Windows client, AR and AP invoice screens, no public B1 Service Layer for several flows.

Oracle EBS

Forms-based modules for AP and procurement on the Java client.

Epic Hyperspace

Patient registration, charge entry, prior auth, accessibility tree exposed for screen readers.

Cerner PowerChart

Documentation and orders, no public write API in many flows.

Jack Henry SilverLake

Teller, account open, loan origination, Windows desktop client.

Fiserv Premier and DNA

Deposit, loan, exception handling screens on the legacy desktop.

FIS IBS and Horizon

Green-screen lineage, back-office teller and operations screens.

All eight expose an accessibility tree (the same one screen readers use) for compliance reasons, and that tree is the surface accessibility-based layer-4 tools drive. Pixel-template RPA can be made to work here too, but it breaks on any UI patch, dark mode flip, or DPI change. The accessibility tree is the durable one.
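The durability argument can be shown with a toy model. In the Python sketch below, the tree is a plain nested dictionary and the roles, names, and bounds are made up for illustration; real accessibility trees are richer and are read through OS APIs, which this sketch does not call.

```python
def find_by_role_and_name(node, role, name):
    """Depth-first search of a toy accessibility tree for a matching element."""
    if node.get("role") == role and node.get("name") == name:
        return node
    for child in node.get("children", []):
        found = find_by_role_and_name(child, role, name)
        if found is not None:
            return found
    return None


def point_hits(node, x, y):
    """Return True if pixel (x, y) falls inside the node's (left, top, w, h) bounds."""
    left, top, width, height = node["bounds"]
    return left <= x < left + width and top <= y < top + height


def make_window(field_top):
    """Build a toy window; field_top moves the field to simulate a UI patch."""
    return {
        "role": "window", "name": "Vendor master", "bounds": (0, 0, 800, 600),
        "children": [
            {"role": "edit", "name": "Vendor code", "bounds": (100, field_top, 200, 24)},
            {"role": "button", "name": "Post", "bounds": (100, field_top + 40, 80, 24)},
        ],
    }


if __name__ == "__main__":
    before, after = make_window(70), make_window(140)  # layout shifts down
    recorded_pixel = (120, 80)  # where a pixel-based recording clicked
    for label, tree in (("before patch", before), ("after patch", after)):
        field = find_by_role_and_name(tree, "edit", "Vendor code")
        print(label, "selector found:", field is not None,
              "pixel still hits:", point_hits(field, *recorded_pixel))
```

The role-and-name lookup finds the field in both layouts, while the recorded pixel only hits it before the layout shifts.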

Pick the layer first, then the tool

If you can name the layer where the minutes go, we can usually tell you in a 20-minute call whether layer-4 work is the right swap for you. We say no to about half the calls because the bottleneck is upstream.

Frequently asked questions

Quick definition: what are document automation tools?

Software that handles part or all of a document's journey from arrival to settled state in a system of record. The category covers four layers in practice: capture (scanner, email, upload), extract (OCR or IDP turns the PDF into JSON), route and sign (approval, e-signature, archive), and post (the parsed data is written into the system of record). Most public listicles call any of the four 'document automation,' which is true at the marketing level and unhelpful at the buying level.

Why does it matter how I split the categories?

Because nothing in the market does all four well, and the layer where you have a bottleneck decides which vendor is even relevant. A team whose destination is NetSuite or Coupa (clean APIs) has no posting problem, so an IDP plus a route tool is the whole stack. A team whose destination is SAP GUI, Jack Henry, or Epic has the IDP figured out and a posting problem worth six figures a year. Lists that fuse the layers will recommend an IDP for a posting bottleneck or an RPA platform for a parsing bottleneck, and the buyer wastes a cycle.

What is the literal answer to 'which document automation tool should I pick'?

Pick by the layer where you have actual minutes-of-human-work to remove. Capture bottleneck (scanning, classification): ABBYY, Hyland, DocuWare, OpenText. Extraction bottleneck (parsing): Rossum, Hypatos, Ocrolus, Google Document AI, AWS Textract, Azure Document Intelligence, Tungsten. Route or sign bottleneck: DocuSign, PandaDoc, Adobe Sign, Conga, Nintex, Laserfiche. Posting bottleneck where the destination is legacy desktop: UiPath, Automation Anywhere, Power Automate Desktop, Blue Prism, Mediar. The right answer is usually two tools, one at the bottleneck layer and one that picks up the rest.

Where do most published comparison guides go wrong?

They list ten vendors side by side without naming which layer each one operates at, so a buyer with a posting bottleneck reads about Rossum's classification accuracy and a buyer with an extraction bottleneck reads about UiPath's enterprise licensing. The layer split is rarely visible in the chart. The honest version is a layered category map first, then a tool comparison inside the layer that matches the bottleneck.

Why is the posting layer the one most lists pretend doesn't exist?

Three reasons. First, it is unglamorous: typing into a 1990s-era Windows desktop client or a Hyperspace screen is not what most marketing wants to lead with. Second, the modern SaaS world (NetSuite, Workday, Bill.com) does have APIs, so the posting layer collapses into one HTTP call and stops being a category. Third, the buyers with the posting problem are concentrated in F&B on SAP B1, banking on Jack Henry and Fiserv, healthcare on Epic and Cerner, and insurance on legacy policy admin systems. They are not the audience writing the public tools roundups. The result is a category gap.

Verifiable detail: how does Mediar tie a document arrival to a destination keystroke?

In apps/desktop/src-tauri/src/recording_processor.rs the function is_meaningful_event_type returns true for exactly six event types: button_click, browser_click, text_input_completed, browser_tab_navigation, application_switch, and file_opened. Opening a PDF lands as a labeled boundary in the same stream as the destination clicks and types. The labeling stage then waits on check_labeling_gate, which requires analyses for indices [N-5, N+5] to be complete before labeling event N. That eleven-event window is what lets the synthesizer label 'PDF open, then type vendor code into SAP B1 vendor master, then save' as one step, instead of three unrelated ones. No other layer-4 tool surfaces that structure.

Do I still need an IDP if I use a layer-4 tool?

Yes, for any document that needs structured extraction. The honest pattern is to keep whichever IDP your team already trusts (Rossum, Hypatos, Ocrolus, Adobe Acrobat AI Assistant, Google Document AI, AWS Textract, Azure Document Intelligence, Tungsten) and pair it with a layer-4 tool for posting. The two layers are complementary. Mediar specifically does not compete on extraction accuracy; the recorder picks the workflow up at the point where the operator opens the parsed file or the IDP's review screen and continues into the destination.

When is the posting layer the wrong layer to invest in?

When the destination is a modern SaaS application with a public write API. Posting through a UI is slower and less reliable than a direct API call for any flow that lands in NetSuite, Workday, Salesforce (current), Bill.com, Coupa, or HubSpot. A team where every destination is in that list does not need a layer-4 tool at all. The posting category is sized for legacy Windows desktop systems where SAP GUI, Oracle EBS, Epic, Cerner, Jack Henry, Fiserv, and FIS live, because those vendors do not publish write APIs for many of the screens the operators use every day.

What does layer-4 work cost compared to a posting clerk or a UiPath license?

Mediar's executor is $0.75 per minute of execution drawn against a $10,000 turn-key program fee that converts to credits. A three-minute SAP B1 invoice posting costs $2.25 of meter time. The honest comparison is to the line item that gets removed: an AP clerk hour, a UiPath maintenance contract for the workflow, or the offshore-keying line on a BPO invoice. Documented numbers from real deployments include 70 percent cost reduction at an LG-customer F&B chain that moved from UiPath, $750,000 a year saved on insurance claims intake at one mid-market carrier, and $210,000 a year on patient intake at a regional healthcare provider.
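The meter arithmetic above is simple enough to check by hand, and the Python sketch below does it. The rate, the fee, and the three-minute run come from the paragraph above; treating the full $10,000 as converting to credits at the same per-minute rate is an assumption, so confirm the conversion terms before budgeting on it.

```python
RATE_PER_MINUTE = 0.75   # dollars per minute of execution (from the text)
PROGRAM_FEE = 10_000     # dollars, converts to credits (from the text)


def run_cost(minutes):
    """Metered cost in dollars of a single run of the given length."""
    return minutes * RATE_PER_MINUTE


def runs_covered(minutes_per_run, credits=PROGRAM_FEE):
    """Whole runs the credits cover, assuming a one-to-one dollar conversion."""
    return int(credits // run_cost(minutes_per_run))


if __name__ == "__main__":
    print(f"3-minute posting: ${run_cost(3):.2f}")
    print(f"postings covered by the fee: {runs_covered(3)}")
```

Under that assumption the fee covers a little over 4,400 three-minute postings, which is the number to set against the clerk hours or license line being removed.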

How is this guide different from the workflow software guide on this site?

The document workflow automation software guide is angled at the recording loop, focusing on file_opened as a meaningful event and how the recorder treats it. This guide is the wider category map: it places the recording loop inside the broader four-layer landscape, lists the named tools in each layer, and explains how to pick by bottleneck. Read this one first if you are scoping a buying decision and the workflow page next if you have already narrowed to the posting layer.

Is any of this open source so I can audit it?

The Terminator SDK that powers the runtime is published at github.com/mediar-ai/terminator. The selector primitives, the locator with timeout-and-fallback semantics, and the workflow executor are all in that repo. The hosted product UI on top (no-code recorder, workflow synthesis pipeline, cloud executor) is closed source. The split is the honest answer for a security review that needs to inspect the part of the stack touching the OS accessibility tree; the OS-facing part is open.

Adjacent reads

Keep going

Loop detail

Document workflow automation software, audited at the recording loop

Deeper read on the recording loop: how file_opened is captured alongside destination clicks and types, with file paths to verify. Companion to this category map.


Buyer argument

Data entry automation: the four stages and where each tier stops

The buyer argument on document, extract, validate, enter. Names the OCR layer, the validation rules layer, and the destination-UI layer as separate problems with separate vendors.


Architecture

AI agents on legacy desktop systems with no API

Why the accessibility tree is the API for the posting layer, what the format looks like to the model, and why pixel-vision agents are the wrong default here.

