An argument
AI form recognition and compliance: which step is the AI, and what does the auditor read?
The phrase “AI form recognition” covers two different pipelines that compliance teams keep treating as one. One pipeline recognizes fields in a source document. The other recognizes fields in a destination application. Under HIPAA, SOX, KYC, and NYDFS the shape of the artifact a regulator opens at the end is so different between the two that a single compliance answer is always partly wrong. This page draws the line, then traces what a deterministic runtime puts in front of an auditor on the destination side.
Direct answer (verified 2026-05-05)
Is AI form recognition compliant for regulated industries? Depends entirely on which step of the pipeline is the AI.
Vision and OCR on the source PDF is probabilistic. It is allowed under HIPAA and SOX as long as the entity using it keeps confidence scores per field, runs a human review queue for low-confidence rows, maintains a model card, and re-reviews a blind sample on a documented cadence. The model itself is part of what has to be explained on audit day.
Recognition of fields in the destination application (the Epic registration screen, the Fiserv onboarding flow, the SAP GUI form) is a different question. There is no source image to OCR. The deterministic answer is to read the Windows UI Automation accessibility tree directly, the same surface a screen reader consumes, and to ship the resulting workflow file as the audit artifact. The runtime is then a Rust program executing that file, with no LLM call between read and write.
Authoritative sources for the two halves: Azure AI Document Intelligence overview for the OCR/vision side, and github.com/mediar-ai/terminator for the accessibility-tree runtime, where the executor crate has zero references to any inference library (verifiable by ripgrep).
The two pipelines
Strip the marketing off the category and there are two pipelines hiding under one phrase.
Pipeline one is source recognition. A PDF arrives. A scan arrives. A photo of a paper claim arrives. A model converts pixels and text into a structured record. This is where Azure AI Document Intelligence, AWS Textract, Google Document AI, Rossum, Docparser, Affinda, and Nanonets sit. The thing being recognized is the source. The output is JSON or a row in a queue. The compliance question on this pipeline is the standard one for any AI inference: confidence thresholds, human review for the long tail, model cards, retraining cadence, blind sampling. The HIPAA rules at 45 CFR Part 164 do not name OCR specifically, but the Office for Civil Rights audits the process the covered entity wraps around it.
Pipeline two is destination recognition. A claims rep needs a value typed into screen four of a Guidewire first notice of loss. A patient access coordinator needs a field set in an Epic registration session. A bank associate needs a record opened in Fiserv DNA. The thing being recognized is the live form in the desktop application. There is no source image here; the form is the Win32 control. The compliance question on this pipeline is different: what reads the form, what writes to it, and what audit artifact does the writer leave behind. A model that hallucinates a field name is a much bigger problem here than in the source pipeline because there is no human reviewer staring at the JSON in between.
What changes when the AI moves from the source to the destination
The comparison below is the same data, recognized two ways. An OCR pipeline reads a CMS-1500 PDF and emits JSON; an accessibility-tree pipeline reads the corresponding fields in the live Epic registration screen and types values into them. Same patient, same record, two completely different audit shapes.
OCR on the source PDF vs UIA on the destination form
A vision model reads a CMS-1500 scan. It returns a structured record with patient_last_name, member_id, date_of_service, ICD-10 code, and a per-field confidence score. Anything below the threshold is queued for a human reviewer. The artifact a HIPAA auditor opens is the configuration plus the review queue plus the model card.
- Probabilistic at every field; confidence per row.
- Requires a human-in-the-loop queue for the long tail.
- Audit artifact is a model card plus a review log.
- Two identical scans can produce different JSON.
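In concrete terms, the record the source pipeline hands downstream looks roughly like the sketch below. The interface, field names, confidence values, and threshold are illustrative assumptions for this page, not the response schema of any particular OCR product.

```typescript
// Illustrative sketch only: the names and the 0.9 threshold are assumptions,
// not the actual output schema of Document Intelligence, Textract, or any
// other extractor named on this page.
interface ExtractedField {
  value: string;
  confidence: number; // per-field confidence reported by the vision model
}

interface Cms1500Extraction {
  patient_last_name: ExtractedField;
  member_id: ExtractedField;
  date_of_service: ExtractedField;
  icd10_code: ExtractedField;
}

// Hypothetical cutoff; the real threshold is set by the entity's documented policy.
const REVIEW_THRESHOLD = 0.9;

// Anything below the threshold is routed to the human review queue rather
// than written downstream automatically.
function fieldsNeedingReview(record: Cms1500Extraction): string[] {
  return Object.entries(record)
    .filter(([, field]) => field.confidence < REVIEW_THRESHOLD)
    .map(([name]) => name);
}
```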
On the other side of the comparison, an accessibility-tree pipeline reads the corresponding fields in the live Epic registration screen through Windows UI Automation and types the values in. Each field is addressed by its automation id, each write is a step in a TypeScript workflow file, and the run leaves a deterministic trace.
- Deterministic at every field; no confidence score to threshold.
- No human-in-the-loop queue in the hot path; a step that cannot resolve fails loudly and queues for re-recording.
- Audit artifact is the workflow file plus the execution trace.
- Two identical runs produce identical writes.
The point is not that one pipeline is better than the other. The point is that they have different audit shapes, and the answer to “is AI form recognition compliant?” depends on which one a buyer is actually choosing.
Why HIPAA and SOX care about determinism on the destination side
The regulator is not auditing the model. The regulator is auditing whether the same input always produced the same record. That property is cheap to argue when the writer is a SQL stored procedure, a payroll calculation, or a cron job. It is harder when the writer is a probabilistic model that picks the next click on the fly. Two identical claims can produce two different field values. Two identical KYC packets can produce two different beneficiary records. Two identical journal entries can produce two different account postings. None of those are illegal by themselves; all of them are difficult to defend in a control walkthrough.
A deterministic runtime restores the property. The workflow file is the testable unit, the same way a database migration is the testable unit for a schema change. A reviewer reads it, redlines one step, runs it against a test environment, and signs it. Whatever model wrote that file in the first place is gone by the time the reviewer opens the artifact.
“The production executor crate has zero references to gemini, claude, openai, or any inference library. The model runs once during recording. The runtime is deterministic Rust calling Windows accessibility APIs.”
Zero LLM call sites in crates/executor (verifiable via ripgrep on github.com/mediar-ai/terminator)
The audit artifact
What an auditor opens on the destination side, in literal form, is a TypeScript file. Each step is a structured record with the field names the live application uses. A reviewer reads it, diffs it, and signs it. The runtime executes that file byte for byte on each run.
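A minimal sketch of that shape, using the eight semantic fields described on this page but with property names that are assumptions rather than the repo's actual createWorkflow schema:

```typescript
// Hedged sketch of one recorded step, not the actual format the recorder
// emits; the real property names and types may differ.
interface RecordedStep {
  stepTitle: string;        // human-readable title of the step
  userIntent: string;       // what the operator was trying to accomplish
  clicked: string | null;   // what was clicked, if anything
  typed: string | null;     // what was typed, if anything
  targetElement: string;    // UIA automation id or fallback descriptor of the field
  parentWindow: string;     // window the element was resolved under
  screenshotId: string;     // screenshot captured at recording time
  sideEffect: string;       // observed result of the step
}

// One step of a hypothetical registration workflow, as a reviewer would
// read, diff, and sign it.
const step: RecordedStep = {
  stepTitle: "Set member ID",
  userIntent: "Enter the member ID from the claim into the registration screen",
  clicked: null,
  typed: "<member id from the extracted record>",
  targetElement: "edtMemberId",        // hypothetical automation id
  parentWindow: "Patient Registration",
  screenshotId: "shot-0041",
  sideEffect: "Member ID field shows the typed value",
};
```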
The locator resolver lives in apps/desktop/src-tauri/src/focus_state.rs. It walks four strategies in order: the recorded automation id, the window handle plus bounds, the visible text content, and the parent window as a last fallback. Three of those four do not depend on absolute pixel position, so a routine UI tweak (a button moves down a row, a panel reorders, a form gains a tab) usually still resolves through strategies one, two, or three. Only when all four miss does the step fail loudly into the trace. The failure is queueable for re-recording, not silently retried with a guessed element.
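To make the cascade concrete, here is an illustrative TypeScript paraphrase of that resolution order. The production resolver is Rust in focus_state.rs; every name in this sketch is an assumption made for illustration, not the repo's API.

```typescript
// Four strategies, tried in the recorded order; the find callbacks stand in
// for real UI Automation queries.
type UiaElement = { automationId?: string; name?: string };
type Strategy = { label: string; find: () => UiaElement | null };

function resolveElement(strategies: Strategy[]): UiaElement {
  for (const strategy of strategies) {
    const element = strategy.find();
    if (element) return element; // first strategy that resolves wins
  }
  // All strategies missed: fail loudly so the step lands in the trace and
  // the re-recording queue instead of being retried with a guessed element.
  throw new Error("element not resolved by any recorded strategy");
}

const element = resolveElement([
  { label: "recorded automation id",    find: () => null /* UIA lookup by id */ },
  { label: "window handle plus bounds", find: () => null /* handle + rect match */ },
  { label: "visible text content",      find: () => null /* text match */ },
  { label: "parent window fallback",    find: () => ({ name: "Patient Registration" }) },
]);
```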
The split, in numbers
Counts come from the open-source Mediar repo. Zero LLM call sites in crates/executor (a single ripgrep on github.com/mediar-ai/terminator confirms it). Four locator strategies in apps/desktop/src-tauri/src/focus_state.rs. Four call sites for vertex_ai::call_vertex_ai inside recording_processor.rs, all of which run in the offline recording pass. Eight semantic fields per step in the recorded workflow format. The split between the offline AI and the runtime is mechanical, not aspirational.
What the auditor walkthrough actually looks like on each side
The two checklists below are the two control walkthroughs an internal auditor or external assessor would do for the two pipelines. They are not interchangeable. A team that runs an OCR pipeline on the source and skips the second walkthrough still has a regulated form in their system of record being filled by something. A team that runs a UIA pipeline on the destination and skips the first walkthrough still has a vision model on the source PDF.
OCR / vision walkthrough on the source pipeline
- Per-document confidence score retained alongside the extracted record.
- Human-in-the-loop queue for any field below the confidence threshold.
- Model card or system card kept current as the model is retrained.
- Periodic blind sample re-review by a clinician, adjuster, or analyst.
- Documented retraining cadence, with regression testing on the last quarter of forms.
These items are left open deliberately: each one is a real obligation the buyer takes on when choosing an OCR-based source pipeline. They are not failures; they are the price of the probabilistic step.
UIA walkthrough on the destination pipeline
- Recognize that the workflow file (TypeScript) is the artifact, not the model.
- Diff each step the way you would diff a stored procedure.
- Run the runtime against a test environment and capture a deterministic trace (a sketch of one trace follows this checklist).
- Confirm the executor crate has zero LLM call sites (a single ripgrep on github.com/mediar-ai/terminator).
- Sign off the file, not the model. The runtime executes that file byte for byte on each run.
The items here read differently for a reason: every one is something the workflow file already exposes, so the obligation on the buyer is to read and sign the file, not to maintain a separate model governance layer.
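The deterministic trace asked for in the third item above could look like the sketch below. The field names are hypothetical; the real trace format belongs to the runtime, and the only claim being illustrated is that each step either resolves and writes or fails loudly.

```typescript
// Hypothetical trace shape, assumed for illustration. The two possible
// outcomes are the point: written via a named strategy, or failed loudly
// and queued for re-recording. Nothing is silently retried.
type StepTrace =
  | { step: number; outcome: "written"; strategy: string; field: string }
  | { step: number; outcome: "failed"; reason: "no_strategy_resolved"; queuedForReRecording: true };

const trace: StepTrace[] = [
  { step: 1, outcome: "written", strategy: "automation id", field: "edtMemberId" },
  { step: 2, outcome: "failed", reason: "no_strategy_resolved", queuedForReRecording: true },
];
```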
Counterargument: when an OCR-shaped answer is the right one
The honest case for an OCR-only pipeline is real and worth naming. If the only thing the buyer needs is to turn a stack of PDFs into a CSV that lands in a data warehouse, the destination pipeline does not exist as a problem. A vision model on the source, a confidence threshold, a review queue for the long tail, and a documented retraining cadence is a complete answer. Azure AI Document Intelligence, AWS Textract, Google Document AI, Rossum, and Nanonets all do this work well. The tooling is mature and the audit shape is well understood.
The mistake is buying that tool and assuming it has also recognized the form on the destination side. It has not, because the surface it reads, a static document image, does not exist for a live Win32 form. The two pipelines want different products. A reasonable stack uses one tool from each side and is honest about the audit shape of both.
Resolution
So back to the literal question. Is AI form recognition compliant? On the source, yes, with the standard OCR governance package: per field confidence, human review queue, model card, blind sampling, retraining cadence. On the destination, the question is the wrong shape: there is no probabilistic step to govern, because the deterministic answer is to read the Windows UI Automation tree directly and ship the workflow file as the artifact. The model is gone before the auditor walks in.
A compliance team that buys one tool and ignores the other layer ends up with either a beautiful PDF extraction and a regulated form being filled by something they cannot describe, or a beautiful workflow file and a backed-up queue of source documents. Neither half on its own is a form-recognition strategy under regulation. The right answer is one tool from each layer, with a clean hand-off, and a clear story for what the auditor reads at each end.
See the destination pipeline filled in your own environment
Bring one regulated form (Epic, Cerner, Fiserv, Jack Henry, Guidewire, Oracle EBS, SAP GUI, or any Win32 app under audit) and we will record it live, show the TypeScript file the recording emits, and run the deterministic replay against your test environment in the same call.
Frequently asked questions
What does 'AI form recognition' usually mean, and why is the compliance answer never one answer?
In most articles 'AI form recognition' means OCR plus a layout model on a source document: an invoice, a CMS-1500, a 1099, a referral letter. Azure AI Document Intelligence (formerly Azure Form Recognizer), AWS Textract, Google Document AI, Rossum, and Docparser all sit there. The compliance answer is never one answer because the same phrase also gets used for recognizing fields in a destination application (Epic, Fiserv, Jack Henry, Guidewire, SAP GUI). The pipelines are different, the failure modes are different, and what an auditor opens at the end is different. A page that gives one compliance answer is silently picking one of the two and ignoring the other.
Is OCR-based form recognition compliant under HIPAA?
It can be, but only with explicit guardrails: per-document confidence scores retained with the record, a human-in-the-loop queue for low-confidence fields, a model card kept current, periodic blind sample re-review, and a documented retraining cadence. The HHS Office for Civil Rights does not certify OCR engines; it audits the process the covered entity wraps around them. The accountable party is the entity, the artifact under audit is the configuration plus the review queue, and the model itself is part of what has to be explained.
Where does AI form recognition fail compliance most often in practice?
Two places. First, at the boundary between the OCR output and the destination system, when teams paste an extracted JSON into a desktop app without preserving which fields came from which page coordinates of the source. The chain of custody breaks. Second, at the destination itself: an OCR pipeline that ends with a CSV import is fine when the destination accepts a CSV import, but the regulated forms in Epic, Cerner, Guidewire, and SAP do not. Treating those forms as if they were a CSV import is the most common audit finding I have seen on this kind of project.
How does Mediar treat the destination form differently from an OCR pipeline?
Mediar reads and writes the destination through the Windows UI Automation accessibility tree, the same tree a screen reader consumes. Each field is identified by its UIA automation id (or a four-strategy fallback), each value is set by calling EditPattern.SetValue on that node, and the whole sequence is captured in a TypeScript workflow file checked into source control. There is no image template, no pixel match, and no LLM call between read and write. The runtime is deterministic at the OS layer, and the workflow file is what compliance signs off, not the model.
Where does the AI live, then, if it is not in the runtime?
It lives in the offline recording pass. When an operator records a workflow, a vision-capable model reads the captured event stream and rendered pages, classifies the steps, and writes the TypeScript file. That call site is in apps/desktop/src-tauri/src/recording_processor.rs and runs once per workflow. After that file is reviewed and checked in, the runtime executes the file deterministically. A reviewer can confirm there is no LLM in the runtime by running ripgrep on the production executor crate at crates/executor in github.com/mediar-ai/terminator: zero references to gemini, claude, openai, or any inference library. That zero is the entire architectural bet for regulated industries.
What does an auditor actually open?
The TypeScript workflow file. Each step is a createWorkflow entry with a structured semantic record (step title, user intent, what was clicked, what was typed, target element, parent window, screenshot id, side effect observed). A reviewer can diff that file the way they would diff a SQL stored procedure, redline a specific step, run the runtime against a test environment, and capture a deterministic trace. If a step fails the four-strategy match, the runtime emits a failure record into the same trace. Nothing is silently retried with a different element.
Why does determinism matter so much for HIPAA, SOX, KYC, and NYDFS?
Because the regulator is not auditing your model. The regulator is auditing whether the same input always produced the same record. A probabilistic step in the runtime breaks that property: two identical claims can produce two different field values, two identical KYC packets can produce two different beneficiary records, two identical journal entries can produce two different account postings. A deterministic runtime makes the workflow itself the testable unit, the same way a SQL migration is the testable unit for a database change. Two different artifacts to govern, two different stories to tell on audit day.
Does Mediar use vision models at all when reading the source PDF?
Yes. For unstructured input (a PDF, a scanned referral, an emailed quote), the recording processor uses a vision model in the offline pass to extract the fields the destination workflow named. The extraction schema is shaped by the destination, not by a generic invoice or claim ontology. So the model returns JSON whose keys are exactly the field names the recorded form uses, which removes the human remapping step that most OCR walkthroughs end with. That extraction is still a model call, with the usual confidence and review obligations. The hand-off between the extracted JSON and the deterministic runtime is where the audit shape changes.
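As a sketch of what a destination-shaped schema means in practice: the keys below are hypothetical automation ids from an imagined recorded workflow, not fields from any actual Epic screen or from Mediar's real extraction schema.

```typescript
// Hypothetical destination-shaped extraction schema: keys mirror the fields
// the recorded workflow writes to, not a generic invoice or claim ontology.
const extractionSchema = {
  edtPatientLastName: { type: "string" },
  edtMemberId: { type: "string" },
  edtDateOfService: { type: "string", format: "date" },
  edtDiagnosisCode: { type: "string" },
} as const;

// The offline vision pass is asked to return JSON with exactly these keys,
// so the extracted record plugs into the workflow steps without a human
// remapping pass in between. Values below are made up for the example.
type ExtractedRecord = { [K in keyof typeof extractionSchema]: string };

const extracted: ExtractedRecord = {
  edtPatientLastName: "DOE",
  edtMemberId: "A12345678",
  edtDateOfService: "2026-04-12",
  edtDiagnosisCode: "E11.9",
};
```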
How does this hold up when a vendor ships a UI change?
Better than pixel-based or selector-based RPA. The locator resolver in apps/desktop/src-tauri/src/focus_state.rs walks four strategies: the recorded automation id, the window handle plus bounds, the visible text content, and the parent window. Three of those four do not depend on absolute position, so a routine UI tweak (a button moves down a row, a panel reorders, a form gains a tab) usually still resolves. Only when all four strategies miss does the runtime mark the step for re-recording. That failure is loud and queueable, not silent and statistical.
Is the form-fill runtime open source enough to verify these claims?
The execution layer is. The Terminator SDK that performs the UIA calls and the four-strategy element resolution is published as terminator-rs on crates.io and lives at github.com/mediar-ai/terminator under MIT. A compliance team can clone the repo, ripgrep for inference libraries in the executor crate, read the four-strategy match cascade, and confirm the runtime shape independently. The orchestration layer, the cloud workflow runner, and the recording pipeline are commercial. A team that wants to wire form-fill primitives into their own queue can build directly on Terminator.
More on the architecture
Adjacent reading
AI tools for filling complex compliance forms
The questionnaire-AI tools (Vanta, Drata, Sprinto) and the system-of-record automation tools answer two different questions. This page draws the line.
AI data entry from PDF
Most guides stop when the extractor returns JSON. This one traces the rest of the round trip into the destination desktop form, schema-shaped and field-by-field.
Enterprise AI agent governance for legacy systems
How to govern an AI agent whose runtime is deterministic and whose model is gone by inference time. Different shape from governing an OCR model in the hot path.