Audit-the-loop guide
Workflow automation software, audited at the recording loop.
Workflow automation software is a recording loop plus a replay engine. The loop captures user actions on some surface; the replay engine reads what was recorded and acts on the same surface. The choice that determines whether a tool can automate a given workflow is which input vocabulary the loop is allowed to capture, and at what cadence. This piece is the audit of one specific recording loop, with the constants and gate functions named in the source tree, because that is the level the buying decision actually lives on.
Direct answer (verified 2026-04-29)
Workflow automation software records repetitive tasks on a specific surface (HTTP webhooks for Zapier and n8n; browser DOM for Skyvern, Browser Use, and CloudCruise; pixel patterns for vision-based RPA; OS-level accessibility events for UiPath, Power Automate Desktop, and Mediar) and replays them on a schedule or a trigger. Mediar's recording loop sits on the OS-level surface and its capture cadence is published verbatim in apps/desktop/src-tauri/src/constants.rs: focus debounce 500 ms, property debounce 200 ms, queue cap 50 events, batch flush at 12 events or 15 s, six meaningful-event variants, and a [N-5, N+5] labeling window. The published constants are the answer to “what does ‘AI watches your workflow once’ actually do?” Source code lives at github.com/mediar-ai/terminator.
The 'AI watches once' claim, broken down
Most pages selling workflow automation software repeat a single line: the AI watches you do the workflow once, and then the workflow runs by itself. That sentence has been on a marketing page for every recent agentic-RPA vendor. The sentence is a compression of a four-stage pipeline that has to be defined precisely, or the recording produces a workflow that does not generalize.
Stage one is event ingestion. The recorder hooks the OS event stream and writes every input event to a circular buffer. The buffer is dense, hundreds of thousands of records per recorded hour, because mouse moves at 60 Hz and key downs and key ups and focus changes all show up. Most of the buffer is noise. Stage two is the meaningful-event filter. This is where most of the noise gets dropped and only the events that count as a step survive. Stage three is step analysis: the surviving events get a Gemini pass that turns each one into a StepAnalysis (what was clicked, what was typed, what content changed). Stage four is labeling and synthesis: the analyses get grouped into substeps and steps, and the synthesis output becomes a runnable workflow file in YAML or TypeScript.
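For orientation, this is roughly the shape a stage-three result takes. The field names below are illustrative, not the repo's actual StepAnalysis definition; the only grounded part is that the analysis captures what was clicked, what was typed, and what content changed, keyed to the recording session.

```rust
/// Illustrative sketch only; the real StepAnalysis in the Mediar codebase
/// may look different. It mirrors the three questions the Gemini pass is
/// described as answering for each meaningful event.
struct StepAnalysis {
    /// The recording session the analysis belongs to; the same id keys the
    /// raw buffer in stage one and the workflow file in stage four.
    session_id: String,
    /// What was clicked, if the event was a click.
    clicked_element: Option<String>,
    /// What was typed, if the event was a completed text input.
    typed_text: Option<String>,
    /// What on-screen content changed as a result.
    content_change: Option<String>,
}
```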
The stages share a recording session. The session id flows from stage one (where the buffer is keyed by it) through stage four (where the workflow file is written under the same id). Inside each stage there are gate functions that decide when the next stage can start, and a small set of integer constants that decide which events survive each filter. Those gate functions and those constants are the load-bearing details. The rest of this piece is a walk through the ones the recorder publishes.
The constants file, end to end
Every integer the loop relies on lives in one file: apps/desktop/src-tauri/src/constants.rs. There are nineteen constants spread across six sections (API configuration, analytics, LLM processing, event queue manager, workflow recorder, app context manager). The recording-loop ones are below.
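For reference, here is how the recording-loop subset reads when collected in one place. The values are the published ones; the `pub const` declarations, the integer types, and the grouping comments are ours, so treat this as a sketch rather than a verbatim excerpt of constants.rs.

```rust
// Workflow recorder: what counts as one user action.
pub const FOCUS_DEBOUNCE_MS: u64 = 500;
pub const PROPERTY_DEBOUNCE_MS: u64 = 200;

// Event queue manager: the cadence at which raw events become candidate steps.
pub const MAX_QUEUE_SIZE: usize = 50;
pub const MIN_BATCH_SIZE: usize = 12;
pub const BATCH_TIMEOUT_SECONDS: u64 = 15;

// App context manager: cross-step context snapshots.
pub const MAX_STATES_PER_APP: usize = 2;
pub const MAX_TOTAL_STATES: usize = 25;
pub const MIN_CAPTURE_INTERVAL_SECONDS: u64 = 15;

// LLM processing: trailing context handed to the step analyzer.
pub const RECENT_EVENTS_LIMIT: usize = 5;
```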
A few of these are worth a paragraph each.
FOCUS_DEBOUNCE_MS = 500. On Windows, a single click in a complex control like a SAP grid fires several focus events in rapid succession (parent grid, row, cell, in-cell editor). Without a debounce, the recorder treats each one as a separate event and emits four near-duplicate steps. The 500 ms threshold is wide enough that a normal click sequence collapses to one focus event, narrow enough that a deliberate click-pause-click stays as two.
PROPERTY_DEBOUNCE_MS = 200. Some controls fire multiple property changes per logical edit (a numeric input echoes both the value and the formatted-display property; a list view echoes both the selected-item and the highlighted-item). Coalescing them inside a 200 ms window keeps the recorded event count down to one per logical edit.
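A minimal sketch of what a 500 ms focus debounce can look like. The recorder's real implementation is not published beyond the constant, and whether it keeps the first or the last event of a burst is an assumption here; the sketch keeps the first.

```rust
use std::time::{Duration, Instant};

/// Hypothetical debounce state; only the 500 ms threshold is published.
struct FocusDebouncer {
    last_focus: Option<Instant>,
}

impl FocusDebouncer {
    /// A focus event is recorded only if at least FOCUS_DEBOUNCE_MS has
    /// passed since the previous one, so the burst a single SAP-grid click
    /// fires (parent grid, row, cell, in-cell editor) collapses to one,
    /// while a deliberate click-pause-click stays as two.
    fn accept(&mut self, now: Instant) -> bool {
        let keep = self
            .last_focus
            .map_or(true, |t| now.duration_since(t) >= Duration::from_millis(500));
        if keep {
            self.last_focus = Some(now);
        }
        keep
    }
}
```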
MAX_QUEUE_SIZE = 50, MIN_BATCH_SIZE = 12, BATCH_TIMEOUT_SECONDS = 15. The event queue manager either flushes when twelve events have arrived or when fifteen seconds have passed since the last flush, whichever comes first. The queue is capped at fifty so a long stall in the LLM pass cannot run the recorder out of memory on a long session. These three are the cadence at which raw events become candidate steps.
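A sketch of the flush decision those three constants imply, assuming the queue manager tracks the time of its last flush; the struct fields are hypothetical.

```rust
use std::time::{Duration, Instant};

/// Assumed shape of the queue state; only the three constants are published.
struct EventQueue<E> {
    events: Vec<E>,     // never allowed to exceed MAX_QUEUE_SIZE = 50
    last_flush: Instant,
}

impl<E> EventQueue<E> {
    /// Flush when twelve events have arrived or fifteen seconds have passed
    /// since the last flush, whichever comes first.
    fn should_flush(&self) -> bool {
        self.events.len() >= 12                                     // MIN_BATCH_SIZE
            || self.last_flush.elapsed() >= Duration::from_secs(15) // BATCH_TIMEOUT_SECONDS
    }
}
```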
MAX_STATES_PER_APP = 2, MAX_TOTAL_STATES = 25, MIN_CAPTURE_INTERVAL_SECONDS = 15. The app context manager captures application-level state snapshots (the current window's full UI tree at the moment of an interesting transition) at most twice per app, at most twenty-five times across the session, with a fifteen-second floor between consecutive captures of the same app. The point is to give the synthesis stage enough cross-step context to label a step correctly without ballooning the recording size on a workflow that spends an hour inside a single application.
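The same pattern sketched for the app context manager's three limits; the state the manager keeps is an assumption, only the limits themselves are published.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical bookkeeping for the app context manager.
struct AppContextManager {
    per_app_counts: HashMap<String, usize>,
    total_count: usize,
    last_capture: HashMap<String, Instant>,
}

impl AppContextManager {
    /// A snapshot of `app` is taken only if all three limits allow it:
    /// at most 2 per app, at most 25 per session, and at least 15 s since
    /// the previous capture of the same app.
    fn should_capture(&self, app: &str) -> bool {
        self.per_app_counts.get(app).copied().unwrap_or(0) < 2       // MAX_STATES_PER_APP
            && self.total_count < 25                                  // MAX_TOTAL_STATES
            && self
                .last_capture
                .get(app)
                .map_or(true, |t| t.elapsed() >= Duration::from_secs(15)) // MIN_CAPTURE_INTERVAL_SECONDS
    }
}
```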
RECENT_EVENTS_LIMIT = 5. The Gemini step-analysis pass receives the previous five meaningful events as context for each new event. Five is the floor that lets the analyzer disambiguate a click that depends on the prior fields being filled in. It is the same five that appears on the trailing side of the labeling gate.
The meaningful-event filter
Stage two of the loop is where the noise drops out. The recorder stores every event for replay-time fidelity, but the analyzer only sees the ones that pass is_meaningful_event_type. There are exactly six types in the match.
The six are: a native desktop click on a button or button-like control, a click captured inside a browser via the chrome extension, a complete text input (the recorder aggregates keystrokes into a single text-input-completed event when focus leaves the input or after a typing-idle timeout), a browser tab navigation, an application switch (alt-tab or the equivalent), and a file open. Anything not in this list (mouse moves, hover focus changes, scroll wheel ticks, individual keystrokes that have not yet aggregated into a text-input-completed) does not get analyzed. It still ends up in the raw log and is available to the replay engine if the saved workflow needs it, but it is invisible to Gemini.
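Written out as code, the filter is one match over the event type. The enum below is reconstructed from the six names quoted in the FAQ further down (button_click, browser_click, text_input_completed, browser_tab_navigation, application_switch, file_opened); the variant and function names here are not copied from the repo.

```rust
/// Reconstruction of the event vocabulary from the names quoted on this
/// page; not the repo's actual enum.
enum RecordedEventType {
    ButtonClick,          // native desktop click on a button-like control
    BrowserClick,         // click captured via the Chrome extension
    TextInputCompleted,   // keystrokes aggregated on blur or typing-idle timeout
    BrowserTabNavigation,
    ApplicationSwitch,    // alt-tab or equivalent
    FileOpened,
    MouseMove,            // examples of raw events that never reach the analyzer
    Keystroke,
    Scroll,
    HoverFocusChange,
}

/// Only the first six variants survive into the LLM analysis queue.
fn is_meaningful_event_type(t: &RecordedEventType) -> bool {
    matches!(
        t,
        RecordedEventType::ButtonClick
            | RecordedEventType::BrowserClick
            | RecordedEventType::TextInputCompleted
            | RecordedEventType::BrowserTabNavigation
            | RecordedEventType::ApplicationSwitch
            | RecordedEventType::FileOpened
    )
}
```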
The six-variant list is not arbitrary. It is the set of events that the four-stage pipeline can label without inventing context. Adding a seventh variant (say, a hover) would require a label schema that names what a hover means, and the synthesis stage would need to decide whether a hover-then-click counts as one step or two. Six is the floor that keeps the workflow file readable. Vendors who ship a richer event vocabulary tend to also ship more unreviewable steps; vendors who ship a smaller vocabulary tend to miss steps. Six is the published answer.
The labeling gate
The third gate function in the loop is the labeling gate. It is the one that explains why a workflow recording does not finish processing the moment the user stops recording. The labeler waits for a sliding window of analyses to complete before it assigns a label to any step, because labels depend on trailing context.
The window is five events on each side. Labeling for event N waits until events N-5 through N+5 have all completed step analysis. Labels assigned without trailing context tend to be wrong on a predictable set of cases. A click on a Save button in a SAP transaction sometimes triggers a confirmation dialog and sometimes finalizes the transaction directly; the right label depends on what the next two or three events look like. A type in a vendor-id field is sometimes followed by a tab and sometimes by an enter; the right label depends on whether the next event was a focus change or a submit. Without the window the labeler races ahead and gets these wrong.
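The arithmetic of the gate is simple enough to sketch. The real check_labeling_gate in recording_processor.rs will differ in its data structures; the window logic below is the published part, and the clamp at the end of the slice is what the trailing pass described next relies on.

```rust
/// Event `n` may be labeled only when step analysis has completed for every
/// event from n-5 through n+5, clamped to the bounds of the recording.
/// `analyzed` is assumed to hold one completion flag per meaningful event.
fn labeling_gate_open(n: usize, analyzed: &[bool]) -> bool {
    const WINDOW: usize = 5; // the same 5 as RECENT_EVENTS_LIMIT
    let start = n.saturating_sub(WINDOW);
    // Once recording stops, the clamp lets the last events be labeled as soon
    // as everything after them is analyzed; that is the trailing pass.
    let end = (n + WINDOW).min(analyzed.len().saturating_sub(1));
    (start..=end).all(|i| analyzed.get(i).map_or(false, |&done| done))
}
```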
The reason this matters for a buyer is that recording processing time is not just the duration of the recording. There is an additional pass at the end where the trailing events of the last ten or so steps get labeled. On a forty-step recording the trailing pass adds maybe ninety seconds of LLM time. Vendors who claim “your workflow is ready the instant you stop recording” are either skipping the trailing labels (and paying for that with worse step naming) or running the labeler locally without the trailing window (and paying for that with mis-labeled steps in the workflow file).
The selector vocabulary the replay engine actually has
The other half of any workflow automation software is the replay engine. The replay engine reads the saved workflow and finds the element on the screen that the recorder originally captured a click against. The vocabulary the engine has for finding elements is what decides whether the saved workflow still works after the UI changes. Mediar's vocabulary, defined in the public Terminator SDK at github.com/mediar-ai/terminator, is eight selector factories plus a chain combinator.
- Name. Match by the accessibility name of the element. The most stable selector when the underlying widget keeps its accessible label across releases.
- Role plus optional Name. Match by accessibility role (Button, Edit, ComboBox) and optionally narrow to a specific name. The default workhorse, because role is the most-stable accessibility property and name is the second-most-stable.
- Id. Match by accessibility id. The strongest selector when the application cooperates by stamping stable ids; useless when the ids are session-generated.
- Text. Match by the text the element displays. Useful when the visible label is a stable copy and the accessibility name is not.
- Path. An XPath-like string describing the position in the accessibility tree. The fallback when nothing else uniquely identifies the target.
- NativeId. The platform-specific automation id (AutomationID on Windows). When present, the most precise selector; when absent (legacy controls without UIA support), Role plus Name takes over.
- ClassName. The widget class. Mostly used to disambiguate look-alike controls inside a single window.
- Attributes. An attribute map. The escape hatch for custom-control attributes the other selectors cannot reach.
- Chain. Combine two selectors with the within-relationship. Used when the target is a Role plus Name pair that only resolves uniquely under a specific parent.
The Locator on top of the selectors carries a default timeout of ten seconds and a wait-condition pattern. When the replay engine hits a step, it does not just call “click element X.” It calls “wait for element matching selector S to satisfy condition C, with timeout T, then click.” That is the layer that absorbs almost all of the brittleness an RPA tool would otherwise have, because the screen is allowed to take its time to render before the engine commits to a click.
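To make the vocabulary concrete, here is a small Rust model of it. These are not the Terminator SDK's actual types or method signatures; the nine variants and the ten-second default timeout are the published parts, everything else is illustrative.

```rust
use std::time::Duration;

/// A model of the published selector vocabulary; not the SDK's real types.
enum Selector {
    Name(String),
    Role { role: String, name: Option<String> },
    Id(String),
    Text(String),
    Path(String),
    NativeId(String),                    // AutomationId on Windows
    ClassName(String),
    Attributes(Vec<(String, String)>),
    Chain(Box<Selector>, Box<Selector>), // within-relationship: second resolved inside first
}

/// The replay pattern the Locator layer implies: wait for an element
/// matching the selector to satisfy a condition within the timeout, then act.
struct Locator {
    selector: Selector,
    timeout: Duration, // default 10 s per the SDK description above
}

impl Locator {
    fn new(selector: Selector) -> Self {
        Locator { selector, timeout: Duration::from_secs(10) }
    }
}
```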
“Per-minute pricing matches the unit the desktop replay engine actually consumes. The $10K turn-key fee converts to credits with a bonus, so the cost shape is prepaid usage, not a license.”
Mediar pricing — mediar.ai/pricing
What the published constants tell a buyer
The actual buyer's question is not which tool has the prettiest graph view, the largest connector library, or the loudest demo. It is whether the tool can be made to do the thing on a real workflow that lives inside a real Windows desktop, and whether the recording will still work three months later when the IT team upgrades a SAP transport or a Jack Henry teller workstation picks up a service pack.
The signal that the tool can answer that question honestly is whether the recording loop is documented at the level of integer constants and gate functions. A vendor who publishes FOCUS_DEBOUNCE_MS = 500 has decided what its recorder believes is one click, and has committed to that decision in a way that the buyer can audit. A vendor who refuses to publish the equivalent number is asking the buyer to trust the demo.
The two camps have different failure modes. The published-loop camp ships occasional brittleness around edge cases that the constants do not yet cover (a custom control with five focus events instead of four), and the path to fix those is to propose a new constant. The closed-loop camp ships brittleness around edge cases that nobody can see, and the path to fix those is to wait for the next release. The first failure mode is the one a serious ops team can plan around. The second is the one that produces stalled UiPath rollouts.
Where this approach refuses to be the answer
Two cases. First, applications that genuinely expose nothing through Windows UI Automation. A small handful of custom OpenGL surfaces and certain legacy Java Swing themes do not publish a useful accessibility tree. The selector vocabulary cannot describe an element the OS does not expose, and the recording loop cannot match meaningful events on a window whose role is always “Pane” and whose names are always empty. Computer-vision RPA can sometimes work on those.
Second, pure browser-tab work where every screen is a public web app and the bundled cost of automation is anti-bot fabric (CAPTCHA solving, residential proxies, geo-targeting). A browser-native agent like Skyvern, Browser Use, or CloudCruise is sized for that surface. Mediar's runtime can drive Chromium in addition to desktop apps via the Chrome extension, but the cost shape is wrong if every step lives in a browser tab and the workflow needs dedicated proxy infrastructure.
Outside those two cases, the OS-accessibility surface is the right one for the canonical legacy desktop targets: SAP GUI, Oracle EBS, mainframe terminal emulators on Windows, Epic, Cerner, eClinicalWorks, Jack Henry, Fiserv, FIS, Excel-based workflows, and the long tail of vendor-specific Windows desktop apps that 100+ repetitive workflows per week tend to live inside. That is the surface the recording loop documented above is sized for.
See the recording loop run on your workflow
A 30-minute pilot call. We watch you do the workflow once on a SAP, Oracle, Epic, or Jack Henry screen, and you leave with the YAML or TypeScript file the loop produced, with the constants and gates above visible in the trace.
Frequently asked questions
What is workflow automation software, in concrete terms?
Workflow automation software is a recording loop plus a replay engine. The loop captures user actions on some surface (HTTP webhooks for tools like Zapier and n8n, browser DOM for tools like Skyvern, OS accessibility events for tools like Mediar, UiPath, and Power Automate Desktop). The replay engine reads what was recorded and acts on the same surface. The choice that determines whether a tool can automate a given workflow is which input vocabulary the loop is allowed to capture; everything else is downstream of that. Mediar's recording loop captures Windows accessibility events with the constants and gates listed on this page.
Why do the FOCUS_DEBOUNCE_MS and PROPERTY_DEBOUNCE_MS values matter for buyers?
Because they decide what the recorder believes is one user action. Without a focus debounce, a single click that fires four focus events (one for the parent grid, one for the row, one for the cell, one for the editor inside the cell) becomes four steps in the recorded workflow. Mediar sets FOCUS_DEBOUNCE_MS to 500 ms and PROPERTY_DEBOUNCE_MS to 200 ms in apps/desktop/src-tauri/src/constants.rs. Vendors who refuse to publish their debounce numbers are leaving you to discover them experimentally on a live workflow. The values are not load-bearing on their own; the discipline of publishing them is.
What does the [N-5, N+5] labeling gate actually do?
It is a sliding window. When the recording loop has captured N meaningful events and a stage analyzer has produced a StepAnalysis for each, the next stage (label assignment) cannot run on event N until the analyzer has finished events N-5 through N+5. The reason is that labels are not local; the label for 'click Save' depends on what happened after, because some 'Save' clicks are followed by a confirmation dialog and others are not. Without the window, the labeler races ahead and assigns labels that get overwritten when the trailing context arrives. With the window, the labeler waits for the trailing five events. The function is check_labeling_gate in recording_processor.rs.
Why is meaningful_event_type only six strings? Other tools list dozens.
Because most of what a user does is not a step. A workflow recorded across a SAP order-entry pass produces hundreds of thousands of low-level events: mouse moves at 60 Hz, every individual keystroke, hover focus changes, scroll wheel ticks, and tooltips. The recorder ingests all of them, but is_meaningful_event_type returns true for exactly six event types: button_click, browser_click, text_input_completed (an aggregation of keystrokes), browser_tab_navigation, application_switch, and file_opened. Everything else stays in the raw event log for replay-time fidelity but does not get pushed into the LLM analysis queue. The shorter list is what makes the LLM step explainable; if it had to reason about every event, the cost per recording would balloon and the resulting workflow would be unreadable.
How is this different from the Mediar pages on workflow automation tools, workflow automation platform, and workflow and automation?
Those pages each take a different slice. The tools page covers the runtime surface choice (cron, public APIs, browser DOM, OS-level desktop). The platform page compares the YAML and the TypeScript output formats and where each one is loaded from. The workflow-and-automation page walks through the trigger configuration types (Cron, Manual, Webhook). This page is about the recording loop itself, the constants and gate functions that determine what gets captured and when. The four pages are independent reads; you can pick whichever question you have first and the others are linked at the bottom of each.
Is the recorder open source?
The Terminator runtime that powers it, including the Selector vocabulary, the Locator with timeout-and-fallback semantics, and the accessibility-tree readers, is published as an SDK at github.com/mediar-ai/terminator with TypeScript bindings on the Rust core. The product UI on top of it (the no-code recorder, the workflow synthesis pipeline, the cloud executor) is closed-source. The split matters when your security team needs to audit the runtime end-to-end, when you want to embed the recorder in a different agent loop, or when you want to add a custom event variant alongside the published six.
What does Mediar charge per minute and how does that compare to per-task tools?
$0.75 per minute of runtime, drawn against a $10,000 turn-key program fee that converts to credits. The unit is wall-clock time on the OS, which is what the desktop replay actually consumes. API-orchestration tools charge per task or per connector node, which is the right unit for steady high-volume webhook chains and the wrong unit for occasional but expensive desktop replays. A three-minute SAP order-entry pass is $2.25 of meter time at Mediar; the same workflow run as a per-task chain on a connector-based tool is priced by the number of API hops, which can be larger or smaller depending on the task graph. The honest read is to match the price unit to the work the tool does, not to pick the smaller per-unit number.
Where does this approach refuse to be the answer?
Two places. First, applications that genuinely expose nothing through Windows UI Automation. Some custom OpenGL surfaces and certain legacy Java Swing themes do not publish a useful accessibility tree, and the Selector vocabulary (Name, Role plus Name, Id, Text, Path, NativeId, ClassName, Attributes, plus chains) cannot describe an element that the OS does not expose. Computer-vision RPA can sometimes work on those. Second, pure browser-tab work where every screen is a public web app and the bundled cost is anti-bot fabric. A browser-native agent like Skyvern is sized for that surface. Mediar's runtime is sized for the desktop surface where SAP GUI, Oracle EBS, Jack Henry, Fiserv, FIS, Epic, and Cerner live, and where the accessibility tree exists because vendors maintain it for compliance reasons.
On the same recording-and-replay pipeline
The other audits
Workflow automation tools split by surface, not by feature count
The companion buyer's argument. Names the four runtime surfaces (cron, public API, browser DOM, OS desktop) and walks through the keystroke-level spec that decides which one a workflow lives on.
Workflow automation platform: the YAML and TypeScript formats Mediar persists
The output side of the same pipeline. What a recorded workflow looks like once the synthesis pass has run, where it gets stored, and how the YAML and TypeScript variants differ at replay time.
Trigger configuration in Mediar: Cron, Manual, Webhook, and the jitter knob
The runtime side: how a saved workflow gets fired. The TriggerConfig enum walks through Cron with optional jitter, Manual, and Webhook with an optional path.