Field guide for compliance officers

AI tools that pre-fill regulatory forms are easy to ship. Keeping the values they type out of your own logs is the hard part.

The category page for this topic is full of vendors that promise HIPAA or GDPR compliance the way a SaaS startup promises uptime. The audit question that survives a healthcare privacy officer, a bank's model risk team, a pharma quality lead, or a regulated customer-service desk is more specific. When the AI types a member id, an MRN, an EU national identifier, or a regulated case number into a desktop form, where exactly does that string travel afterward? This page names the piece of code that decides the answer for Mediar.

Matthew Diakonov
14 min

Direct answer (verified 2026-05-08 against github.com/mediar-ai/terminator)

Can AI safely pre-fill regulatory forms with PHI or PII?

Yes, but only if the pre-fill runtime never persists the values it typed. In Mediar that guarantee is one function: redact_secrets_from_output in crates/executor/src/services/secrets.rs. It runs after the workflow finishes typing and before the output JSON lands in the database, replacing every occurrence of an org-secret value with [REDACTED:NAME]. The ciphertext stays in the org_secrets Postgres table, encrypted under AES-256-GCM with a key derived by SHA-256 hashing the master secret and a 12-byte nonce stored at the front of each blob.

That single mechanism is what makes the difference between a vendor that says “HIPAA compliant” and a runtime a healthcare privacy officer or a GDPR data protection officer can sign off on. Everything else on this page is a consequence of that one primitive.

Four regulators, one runtime question

The keyword that brought you here lumps four regulated environments together. They have different statutes and different auditors, but they ask the runtime the same question.

Customer service. A regulated contact center handling refunds, disputes, or complaints has to log enough to satisfy the regulator (Reg E disputes in banking, telecom CPNI handling under FCC rules, GDPR rights requests in EU markets) without retaining more identifying data than necessary in a place customer-service supervisors can browse. The forms live in Salesforce Service Cloud, Zendesk, Genesys, or a vintage Win32 CRM.

Healthcare. The forms are inside Epic, Cerner, Athena, eClinicalWorks, NextGen, or Allscripts. The regulator is HHS OCR. The acronym is HIPAA. The critical fields are MRN, member id, DOB, ICD-10 codes, NDC, NPI, and claim number. The minimum-necessary rule (45 CFR 164.502(b)) applies to every typed value.

Finance. Reg E and Reg Z deadlines drive bank dispute pre-fill in Jack Henry SilverLake, Fiserv DNA, FIS Profile, and Symitar. SOX governs the controls. NYDFS Part 500 governs the cybersecurity posture. KYC and BSA/AML requirements drive onboarding. The values pre-filled are SSNs, account numbers, balances, and transaction descriptors that double as PII under state laws.

Pharma. Adverse event triage and case intake land in Veeva Vault Safety, ArisGlobal LifeSphere, or an EudraVigilance E2B(R3) submission. The regulators are FDA, EMA, and the local health authority. The standards are 21 CFR Part 11 (electronic records and signatures), GVP Module VI (EU adverse event reporting), and ICH E2B(R3). The values pre-filled include patient initials, narrative text, MedDRA terms, and product batch numbers.

The mechanism, in code

The relevant code is in crates/executor/src/services/secrets.rs. Two lines do the heavy lifting for compliance: the length floor on line 178 and the descending-length sort on line 197. Both are subtle, and both are checkable by any reviewer with ripgrep.

The four-character minimum (v.len() >= 4) is a false-positive guard. Without it, an org that defines a one-character flag value as a secret would see every occurrence of that letter in the output replaced with [REDACTED:NAME], corrupting the trace. The values that matter for regulatory pre-fill (MRNs, member ids, NPIs, claim numbers, account numbers, EU national identifiers) are well above the floor, so the redaction triggers reliably.

The descending-length sort handles the case where one regulated value is a substring of another. If a workflow uses PATIENT_NAME with the value “Maria” and PATIENT_FULL_NAME with “Maria Garcia”, an unlucky iteration order would partial-match the shorter value first and leave the longer one partially exposed. Sorting descending by length makes that impossible.
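The two guards are easy to sketch in isolation. What follows is not the code in secrets.rs; it is a minimal, standard-library-only illustration of the behavior described above. The function name redact_text, the MIN_SECRET_LEN constant, and the tie-break on secret name are assumptions made for this example.

```rust
use std::collections::HashMap;

/// Hypothetical constant mirroring the `v.len() >= 4` floor described above.
const MIN_SECRET_LEN: usize = 4;

/// Replace every occurrence of a known secret value with `[REDACTED:NAME]`.
/// Illustrative only: the real function walks JSON, this one takes a flat string.
fn redact_text(text: &str, secrets: &HashMap<String, String>) -> String {
    // Guard 1: ignore values too short to match safely.
    let mut candidates: Vec<(&str, &str)> = secrets
        .iter()
        .filter(|(_, value)| value.len() >= MIN_SECRET_LEN)
        .map(|(name, value)| (name.as_str(), value.as_str()))
        .collect();

    // Guard 2: longest value first, so a short secret cannot split a longer one.
    // The name tie-break is an assumption that keeps the output deterministic.
    candidates.sort_by(|a, b| b.1.len().cmp(&a.1.len()).then(a.0.cmp(b.0)));

    let mut out = text.to_string();
    for (name, value) in candidates {
        out = out.replace(value, &format!("[REDACTED:{name}]"));
    }
    out
}
```

With PATIENT_NAME = "Maria", PATIENT_FULL_NAME = "Maria Garcia", and a one-character FLAG = "Y", the input "Typed Maria Garcia, flag Y" comes back as "Typed [REDACTED:PATIENT_FULL_NAME], flag Y". The full name is replaced as a unit, and the short flag value is left alone instead of corrupting the trace.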

Where the typed value travels, step by step

Six events sit between “the workflow is about to type into Epic” and “the audit log is on disk”. The raw value exists only in executor memory during the first five, and only the last one writes to the long-term store, after redaction has run.

1. Decrypt the secret values for this org

On the executor side, AES-256-GCM ciphertext from the org_secrets table is decrypted into a plain HashMap<String, String>. The master key is derived by SHA-256 hashing SECRETS_ENCRYPTION_KEY, and the nonce is the first 12 bytes of the stored blob.

2. Substitute placeholders into the workflow input

The workflow file references secrets as ${PATIENT_MRN} or ${TAX_ID}. Those placeholders are expanded into real values inside the input JSON, just before the runtime starts typing into the desktop form.

3. Type into the destination form via UIA

The single MCP tool type_into_element in apps/desktop/src-tauri/src/mcp_converter.rs invokes Windows UI Automation ValuePattern.SetValue on the matched node. No pixels, no selectors. The locator walks four strategies in focus_state.rs before failing.

4. Capture the workflow output

The runtime returns a JSON output describing what happened, what was clicked, and what was typed. At this point the JSON contains the raw PHI or PII the workflow used.

5. Redact secret values from the output

Before the output is persisted to the database or shown in the dashboard, redact_secrets_from_output recursively walks every string in the JSON and replaces any occurrence of a known secret value with [REDACTED:NAME]. Only values four characters or longer are considered, sorted by length descending so long values cannot be partial-matched by short ones.

6. Write the sanitized output to ClickHouse and Postgres

The version that lands in the audit store has the redaction markers, not the original values. A compliance reviewer querying the trace sees [REDACTED:PATIENT_MRN] in place of the actual MRN, and the AES-256-GCM ciphertext stays in org_secrets where it started.
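Steps 2, 4, and 5 can be sketched end to end with the standard library alone. This is an illustration under stated assumptions, not Mediar's implementation: the Json enum is a hypothetical stand-in for a real JSON type (an executor would more likely use a JSON library), and expand_placeholders, ordered_secrets, and redact_json are names invented for the example.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical, reduced JSON model used only for this sketch.
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Null,
    Num(f64),
    Str(String),
    Arr(Vec<Json>),
    Obj(BTreeMap<String, Json>),
}

/// Step 2: expand `${NAME}` placeholders from the decrypted secret map.
fn expand_placeholders(template: &str, secrets: &HashMap<String, String>) -> String {
    let mut out = template.to_string();
    for (name, value) in secrets {
        // `${{{name}}}` renders as the literal text `${NAME}`.
        out = out.replace(&format!("${{{name}}}"), value);
    }
    out
}

/// Prepare secrets for step 5: drop values under four characters, longest first.
fn ordered_secrets(secrets: &HashMap<String, String>) -> Vec<(String, String)> {
    let mut ordered: Vec<(String, String)> = secrets
        .iter()
        .filter(|(_, value)| value.len() >= 4)
        .map(|(name, value)| (name.clone(), value.clone()))
        .collect();
    ordered.sort_by(|a, b| b.1.len().cmp(&a.1.len()).then(a.0.cmp(&b.0)));
    ordered
}

/// Step 5: recursively visit every string in the output and redact in place.
fn redact_json(value: &mut Json, ordered: &[(String, String)]) {
    match value {
        Json::Str(s) => {
            for (name, secret) in ordered {
                if s.contains(secret.as_str()) {
                    *s = s.replace(secret.as_str(), &format!("[REDACTED:{name}]"));
                }
            }
        }
        Json::Arr(items) => items.iter_mut().for_each(|v| redact_json(v, ordered)),
        Json::Obj(map) => map.values_mut().for_each(|v| redact_json(v, ordered)),
        Json::Null | Json::Num(_) => {}
    }
}
```

The design point is the ordering of calls rather than any one function: the raw value produced by expand_placeholders lives only in memory, and redact_json mutates the output before anything is handed to a writer. Object keys are deliberately left untouched in this sketch; whether the real function also scans keys is not stated in the text.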

Runtime path of a regulated value

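Participants: Executor, org_secrets, Win32 form, Audit store

Executor → org_secrets: decrypt org_secrets row
org_secrets → Executor: HashMap<name, value>
Executor → Win32 form: type_into_element(MRN field)
Win32 form → Executor: ValuePattern.SetValue OK
Executor (internal): build output JSON (raw values)
Executor (internal): redact_secrets_from_output
Executor → Audit store: write [REDACTED:MRN] trace
Audit store → Executor: row id, no PHI persisted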

The arrow that crosses from the executor to the audit store carries a redaction marker, not the value. The original ciphertext stays in the org_secrets Postgres row where it started, and the destination Win32 form holds the typed value as part of the system of record (Epic, Fiserv, Veeva), where the customer's existing controls already govern access.

Why this matters more for desktop pre-fill than for browser pre-fill

Most AI form-fill products live in the browser. They watch a tab, they type into HTML inputs, and the pre-fill rarely involves a value the regulator is going to ask about because the SaaS form is already a regulated artifact owned by a vendor with its own audit story.

The pre-fills that compliance officers are actually nervous about are the ones into legacy Windows desktop apps with no API and no DOM. The form that records a HIPAA grievance lives in Epic. The form that captures a Reg E dispute decision lives in Jack Henry SilverLake. The form that records a serious adverse event lives in Veeva Vault Safety or LifeSphere. The form that closes a regulated customer complaint lives in a Salesforce or Zendesk record that has to feed a quarterly regulator filing. In many regulated deployments, the automation has no API it is allowed to write through for these screens. The only way an AI can pre-fill them is by typing into the UI through Windows accessibility primitives, which is exactly where Mediar runs.

That architectural choice changes the privacy story. The values are not flowing through a vendor API where the vendor owns the audit posture. They are typed locally, on a Windows machine, into a system of record the customer already owns. The only place the pre-fill tool can leak them is its own log. So the only question that matters is what happens to the workflow output before it is persisted, which is the question redact_secrets_from_output answers.

What the audit conversation looks like in practice

A healthcare privacy officer reviewing this for a HIPAA pre-fill workflow does three things. They read the TypeScript workflow file and check that the typed values are minimum-necessary for the task. They run a sample workflow against a non-production EHR and pull the execution trace from the dashboard, then confirm every field that should be redacted shows [REDACTED:NAME]. Finally they clone github.com/mediar-ai/terminator and search the executor crate with ripgrep for gemini, claude, and openai to confirm there is no model call in the hot path. Three steps, all checkable, none of which requires trusting a marketing claim.

A bank's model risk team running through SR 11-7 on the same deployment cares about a different question: is the system deterministic, and is the deterministic part documented? Yes to both, because the workflow file is the documented artifact and the executor has no inference call between input and output. The model risk team signs off on the recording-time model the same way they sign off on a batch-time analytics pipeline, separately from the runtime.

A pharma quality lead reviewing for 21 CFR Part 11 and EU GVP Module VI cares about who, what, when, and immutability. The orchestration layer captures who. The eight-field semantic record per step captures what and when. Immutability is a property of the audit store (commonly an append-only ClickHouse instance with role-gated writes), and the redacted trace makes the audit reviewable without exposing patient identifiers to the reviewer.

Frequently asked questions

What does 'regulatory form pre-fill' actually mean for a HIPAA-bound team?

Two screens at once. There is the source document, which is usually a referral, a prior authorization PDF, an insurer denial letter, or a structured payload from a payer portal. And there is the destination form inside the system of record: a registration screen in Epic, a charge entry in Cerner, a claim in eClinicalWorks, an authorization screen in Salesforce Health Cloud. The pre-fill is the AI typing values from the source into the destination, in the right order, respecting the destination's validation rules. HIPAA does not care what the source looks like. It cares that the values typed (MRN, DOB, diagnosis codes, insurer member id) never appear in a place a non-treatment-team person can read them. That is what makes the runtime mechanism more important than the marketing posture.

And for a GDPR-bound team in customer service or pharma?

The same shape, with a different threat model. The values are now data subjects' names, addresses, dates of birth, telephone numbers, and identifiers a regulator can tie back to a person under Article 4(1). For a pharma post-market safety team, the destination form is a vigilance case in Veeva Vault Safety or ArisGlobal LifeSphere, or an EudraVigilance E2B(R3) submission. For customer service, it is a CRM record, a refund authorization, or a regulator-facing complaint log. GDPR adds two specific demands: data minimization (Article 5(1)(c), only type the fields you actually need) and the right to be forgotten (Article 17, the data subject can ask you to erase the values that landed in the destination system). A pre-fill tool that keeps copies of the typed values in its own logs creates a second erasure surface the data controller now has to manage.

How does Mediar prevent the typed values from landing in its own logs?

By redacting them in the executor, before the workflow output is persisted. The function is redact_secrets_from_output in crates/executor/src/services/secrets.rs. After the workflow runs, the executor takes the JSON output, walks every string recursively, and replaces any occurrence of a known secret value with the marker [REDACTED:SECRET_NAME]. The version that lands in the dashboard, the audit store, and ClickHouse is the redacted one. The original ciphertext stays in the org_secrets Postgres table where it started, encrypted under AES-256-GCM with a 96-bit nonce.
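The storage side of that answer can be sketched as well. Assuming the blob layout described earlier on this page (the nonce occupies the first 12 bytes of the stored blob), splitting a stored value into its parts needs only the standard library. The BlobParts type, the split_blob name, and the assumption that the 16-byte GCM authentication tag is stored in the same blob after the ciphertext are all hypothetical; the decryption call itself would need an AES-GCM library and is not shown.

```rust
/// 96-bit nonce, as stated in the text.
const NONCE_LEN: usize = 12;
/// Standard AES-GCM tag size; assumed here to be stored after the ciphertext.
const TAG_LEN: usize = 16;

/// Hypothetical view over one stored org_secrets blob.
struct BlobParts<'a> {
    nonce: &'a [u8],
    ciphertext_and_tag: &'a [u8],
}

/// Split a stored blob into nonce and remainder, rejecting blobs that are
/// too short to contain both a nonce and a tag.
fn split_blob(blob: &[u8]) -> Option<BlobParts<'_>> {
    if blob.len() < NONCE_LEN + TAG_LEN {
        return None;
    }
    let (nonce, rest) = blob.split_at(NONCE_LEN);
    Some(BlobParts { nonce, ciphertext_and_tag: rest })
}
```

Storing a fresh nonce with each ciphertext, rather than deriving it from the key, matters for GCM: reusing a nonce under the same key undermines both confidentiality and authenticity.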

Why is the four-character minimum (v.len() >= 4) on line 178 of secrets.rs there?

False-positive safety. If the org defines a one-character or two-character secret value (a status code, a Y/N flag), the redactor would otherwise match every occurrence of that value in the output, turning every Y into [REDACTED:STATUS]. The four-character floor is a pragmatic guard. For the regulatory values that matter (MRNs, member ids, tax ids, NPIs, EU national identifiers, claim numbers, policy numbers, device serial numbers), the floor is well below the natural length, so the redaction triggers reliably.

What does the length-descending sort on line 197 protect against?

Nested secret values. Imagine a workflow that uses two secrets: PATIENT_NAME = "Maria" and PATIENT_FULL_NAME = "Maria Garcia". Without the sort, an unlucky pass order would replace "Maria" first, leaving the output reading "[REDACTED:PATIENT_NAME] Garcia" and never matching the longer secret at all. By sorting secrets by length descending and substituting the longest first, the redactor handles the full name as a unit and the shorter secret never has a chance to partial-match. A reviewer verifies this on github.com/mediar-ai/terminator with a single ripgrep.

Is there a model in the loop while the workflow is typing into Epic, Fiserv, or Veeva?

No. The production executor crate at crates/executor in github.com/mediar-ai/terminator has zero references to gemini, claude, openai, or any inference library. A compliance reviewer can verify that with one ripgrep. The model only runs once, during the offline recording-processing pass that authors the workflow file. After the file is checked in, the runtime is deterministic. That is what lets a healthcare compliance officer or a pharma quality lead sign off on the workflow itself the way they would sign off on a SQL stored procedure.

What does the runtime do when the destination form changes (Epic upgrade, Veeva release, Salesforce Health Cloud rollout)?

The locator resolver in apps/desktop/src-tauri/src/focus_state.rs walks four strategies before declaring a step failed. It tries the recorded automation id first, then the window handle plus bounds, then the visible text content, then the parent window as a last fallback. Three of those four strategies (all but window handle plus bounds) do not depend on absolute screen position, so a quarterly UI tweak (a panel reorders, a tab gets renamed, a dialog gains a row) usually resolves through one of the position-independent strategies. If all four miss, the step fails into the deterministic trace and a human queues that step for re-recording. Nothing is silently retried with a different element.
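The fallback order described above can be sketched as a fixed chain that stops at the first hit and fails loudly when every strategy misses. This is a hypothetical model, not the contents of focus_state.rs: the Strategy and Resolution types, the resolve function, and the closure-based lookup are assumptions made for illustration.

```rust
/// The four strategies named in the text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Strategy {
    AutomationId,
    WindowHandleAndBounds,
    VisibleText,
    ParentWindow,
}

/// Fixed order in which the strategies are tried.
const ORDER: [Strategy; 4] = [
    Strategy::AutomationId,
    Strategy::WindowHandleAndBounds,
    Strategy::VisibleText,
    Strategy::ParentWindow,
];

/// Outcome of resolving one step's target element.
#[derive(Debug, PartialEq, Eq)]
enum Resolution {
    Found { element: String, via: Strategy },
    /// Every strategy missed: the step fails and is queued for re-recording.
    Failed { tried: Vec<Strategy> },
}

/// Try each strategy in order; `lookup` stands in for the real UI query.
fn resolve<F>(mut lookup: F) -> Resolution
where
    F: FnMut(Strategy) -> Option<String>,
{
    let mut tried = Vec::new();
    for strategy in ORDER {
        tried.push(strategy);
        if let Some(element) = lookup(strategy) {
            return Resolution::Found { element, via: strategy };
        }
    }
    Resolution::Failed { tried }
}
```

Recording which strategy matched (the via field here) is one way to make a trace show a reviewer that a step resolved through a fallback rather than the recorded automation id. Whether Mediar's trace records this is not stated in the text.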

How does this interact with HIPAA's 'minimum necessary' rule?

The TypeScript workflow file is the artifact that defines minimum necessary. Each step has the eight-field semantic record (step_title, user_intent, what_was_clicked, what_was_typed, target_element, parent_window, screenshot_id, side_effect_observed). A privacy officer reads the file the same way they would read a SOP. If the workflow types a diagnosis code into a billing system, the privacy officer sees that step and can challenge it. If the workflow types only the policy number into a claims system, that is also visible. The audit artifact is code, not a black-box model trace.
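To make the review concrete, the eight-field record can be modeled as a plain struct, with a helper that lists only the steps that type something. The workflow files themselves are TypeScript; this Rust model is a hypothetical mirror for illustration. The field names come from the list above, while the field types, the typed_values helper, and the idea that what_was_typed holds a placeholder such as ${POLICY_NUMBER} are assumptions.

```rust
/// One recorded step, using the eight field names listed in the text.
/// Field types are assumptions; the text gives only the names.
#[derive(Debug, Clone)]
struct StepRecord {
    step_title: String,
    user_intent: String,
    what_was_clicked: Option<String>,
    what_was_typed: Option<String>,
    target_element: String,
    parent_window: String,
    screenshot_id: String,
    side_effect_observed: Option<String>,
}

/// Return (step_title, typed value) for every step that types something,
/// which is the list a minimum-necessary review needs to challenge.
fn typed_values(steps: &[StepRecord]) -> Vec<(&str, &str)> {
    let mut out = Vec::new();
    for step in steps {
        if let Some(typed) = step.what_was_typed.as_deref() {
            out.push((step.step_title.as_str(), typed));
        }
    }
    out
}
```

A reviewer working from such a list sees each typed field next to the step that typed it, and can ask of every entry whether the task needs that value at all.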

What about GDPR's right of access and right to erasure?

Two surfaces matter. The first is the destination system of record (Epic, Veeva, Fiserv): erasure there is the system owner's responsibility and is unaffected by Mediar. The second is the workflow execution log itself. Because redact_secrets_from_output replaces every typed value that is registered as an org secret (and meets the four-character floor) with [REDACTED:NAME] before the log is persisted, an Article 17 request does not require purging those values from Mediar's audit store; they were never written there in the first place. The ciphertext in org_secrets is what an erasure request operates on, and that row can be deleted by org admins through the standard secrets API.

What about pharma 21 CFR Part 11 and GxP audit trails?

Part 11 cares about who did what, when, and whether the record can be modified after the fact. The TypeScript workflow file plus the deterministic execution trace give a Part 11 reviewer two of those for free: who (the user that triggered the run, captured at the orchestration layer) and what (the eight-field semantic record per step). The third, immutability, is a property of where the trace is stored, not of the runtime. Most teams write the trace to an append-only ClickHouse instance and gate write access behind a separate role. The trace contains the redacted values, so a Part 11 review of a vigilance case pre-fill in Veeva Vault Safety can run without exposing patient identifiers to the reviewer.

Where is the boundary with questionnaire AI tools (Drata, Vanta, Sprinto, Inventive)?

Those tools auto-draft answers in vendor security questionnaires, RFPs, SIG, CAIQ, NIST 800-171. The destination is a SaaS questionnaire portal with HTML inputs. They do real work and pair well with Mediar, but they do not type into the system-of-record forms a regulator audits in healthcare, finance, or pharma. Treating the two as the same product is the most common category mistake a compliance team makes when shopping for this. The pairing pattern is straightforward: questionnaire AI handles the SIG a prospect sends, system-of-record automation handles the SOX, HIPAA, NYDFS, GDPR, or 21 CFR Part 11 evidence trail that has to be written back into Epic, Fiserv, Salesforce, or Veeva.

Is the executor open source so a compliance team can verify the redaction themselves?

The execution layer is. The Terminator SDK that performs the UIA calls and the four-strategy element resolution is published as terminator-rs on crates.io and lives at github.com/mediar-ai/terminator under MIT. The redact_secrets_from_output function in crates/executor/src/services/secrets.rs is in scope. A compliance team can clone the repo, run cargo test on the redact_recursive tests, and confirm by reading the source that no LLM is called between the typed value and the log write. The orchestration layer, the cloud workflow runner, and the recording pipeline are commercial.