Mediar AI, opened up

The AI in “Mediar AI” runs once, at authoring time. Then it leaves the room.

Almost every description of this product reads the same way: “an AI agent watches you do a task once, then runs it 24/7.” The first half of that sentence is correct. The second half quietly implies an LLM is in the loop on every replay, and that is not how the runtime actually works. The model authors the workflow during recording. The runtime that executes it a million times after that is plain Rust calling Windows accessibility APIs, and it never opens a socket to a model. This page walks through that split using the source.

Matthew Diakonov · 9 min

A small note on which Mediar this is

The name is shared with at least three unrelated companies (an ad-sales platform, a Boston biotech, and a Brazilian retail-analytics firm). This page is about Mediar AI, the Y Combinator-backed company at mediar.ai building Windows desktop automation. The open-source executor underneath ships as terminator-rs. If you want the company background instead of the architecture, the company page is over here.

The thesis: two phases, one model boundary

Look at the lifecycle of one workflow as a sequence diagram. There are two distinct actors, the desktop recorder and the production executor, and the LLM only talks to one of them. Everything to the left of the boundary is intelligence. Everything to the right is mechanical.

Where the model lives, and where it does not

[Sequence diagram. Actors, left to right: You · Desktop recorder · Vertex AI Gemini · Workflow file · Production executor · Windows app.]
Authoring time, once per workflow: Record once → Step JSON (Flash) → 8 fields → Re-label (Pro) → Context label → Emit .ts.
Every run thereafter: Read on every run → UIA calls (deterministic) → Tree + result.

The first six messages happen once per workflow, at authoring time. The last three repeat every time the workflow runs. Notice that the production executor never sends anything to Vertex AI. That is the part most write-ups about this product skip, and it is the only part that matters when a CTO asks about latency, predictability, or what happens when a model provider has an outage.

Evidence one: the AI calls in the recording path

The desktop agent calls Vertex AI in exactly one file: apps/desktop/src-tauri/src/recording_processor.rs. That file imports crate::vertex_ai and calls vertex_ai::call_vertex_ai at four points. Each call is one stage of the four-stage processing pipeline described in recording_prompts.rs: step analysis, label suggestion, workflow synthesis, and per-step generation.

ripgrep over the recording path
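An invocation along these lines reproduces the count (illustrative; exact flags and line numbers will vary with the checkout):

```sh
$ rg -n "call_vertex_ai" apps/desktop/src-tauri/src/recording_processor.rs
# four matches: step analysis, label suggestion, workflow synthesis, per-step generation
```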

Two models do that work. The default for step analysis is gemini-2.5-flash, because the volume is one call per meaningful click and Flash is fast enough to keep up with a recording session. The re-labeling pass that turns “Clicked Submit” into “Submitted the new vendor master record” runs on gemini-2.5-pro because the prompt is reading three neighboring steps at once. Both are wired up in vertex_ai.rs with an explicit allowlist of model names; an unknown name falls back to Flash.
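The routing rule is small enough to sketch. Here it is in TypeScript for readability; the production version is Rust in vertex_ai.rs, and the names below are illustrative, not the real API:

```typescript
// Illustrative sketch of the model allowlist described above. The real
// routing lives in apps/desktop/src-tauri/src/vertex_ai.rs (Rust).
const ALLOWED_MODELS = new Set(["gemini-2.5-flash", "gemini-2.5-pro"]);
const DEFAULT_MODEL = "gemini-2.5-flash"; // unknown names fall back to Flash

function resolveModel(requested: string): string {
  return ALLOWED_MODELS.has(requested) ? requested : DEFAULT_MODEL;
}
```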

Evidence two: the AI calls in the runtime path

Here is the same exercise on the production runtime crate. This is the binary that picks a workflow off a queue, executes it against a Windows session, and writes the result back to the database. It runs in the cloud, it runs on customer infrastructure, and on heavy days it runs the same workflow tens of thousands of times. If a model were involved in the hot path, its name would show up here.

ripgrep over the runtime path
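The same search, pointed at the executor crate (again illustrative; the term list matches the FAQ below):

```sh
$ rg -in "gemini|claude|openai|llm|inference" crates/executor/src
# no matches
```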

Zero matches. The executor crate is a Rust binary whose only network surface is to the internal database, the MCP server that wraps Terminator, and a workflow downloader for fetching the .ts file from cloud storage. Replace the model layer above with a different provider tomorrow and this binary does not change a line. That decoupling is what makes the system survive an OpenAI outage, an Anthropic price hike, or a frontier-model deprecation notice. The model only matters during recording.

0
LLM calls in the production executor crate

Compliance teams in banking and healthcare cannot sign off on a workflow whose action sequence is non-deterministic. Pulling the model out of the runtime is not a performance optimization. It is the reason regulated industries can use this at all.

What the recording path actually does

Zoom in on the left side of the boundary. While you record a workflow, the desktop agent is quietly turning every meaningful event into a structured step. The pipeline runs as soon as the session ends, not in real time, so a long recording takes a couple of minutes to process and a short one is ready before you have closed the lid.

Recording path: the AI side of the boundary

1. Watch: capture click + UI tree
2. Strip: drop coordinates and sizes
3. Gemini 2.5 Flash: 8-field step JSON
4. Gemini 2.5 Pro: re-label with neighbors
5. Synthesize: group into workflows
6. Emit .ts file: createWorkflow + steps

The output is a TypeScript file that imports from @mediar-ai/workflow. You can read it, you can edit it, you can put it in source control. The model has compiled your behavior into something that looks like the kind of code an RPA developer would have written by hand, only without the months of selector debugging.
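To make that concrete, here is a hand-written sketch of the shape such a file could take. Only the @mediar-ai/workflow import and the createWorkflow call are attested above; every field name below is illustrative, not the real emitted schema.

```typescript
// Hypothetical workflow file in the style the recorder emits.
// Field names are invented for illustration.
import { createWorkflow } from "@mediar-ai/workflow";

export default createWorkflow({
  name: "Submit the new vendor master record",
  steps: [
    {
      // Semantic label authored by Gemini during the re-labeling pass
      label: "Open the vendor intake form",
      action: "click",
      // Selector hints the runtime resolves against the live UIA tree
      selector: { role: "Button", name: "New Vendor", automationId: "btnNewVendor" },
    },
    {
      label: "Enter the vendor legal name",
      action: "type",
      text: "{{vendor.legalName}}",
      selector: { role: "Edit", name: "Legal name", automationId: "txtLegalName" },
    },
  ],
});
```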

What the runtime path actually does

The right side of the boundary is the same shape but a different kind of machine. The executor reads the .ts file, hands each step to Terminator over MCP, and Terminator does what Terminator does: walks the live UIA tree, picks the matching element, and dispatches a real Win32 input event.
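The loop is worth sketching just to show what is absent from it. The sketch below is TypeScript for readability; the real binary is Rust in crates/executor, and every name here except the execute_sequence tool is invented:

```typescript
// Illustrative executor loop. Note what never appears: no model client,
// no prompt, no network call except the queue and the MCP server.
interface Job { workflowId: string }
interface McpClient { callTool(name: string, args: object): Promise<unknown> }
interface Queue {
  claimNext(): Promise<Job>;
  writeResult(id: string, result: unknown): Promise<void>;
}

async function runOnce(
  queue: Queue,
  mcp: McpClient,
  loadWorkflow: (id: string) => Promise<{ steps: object[] }>,
): Promise<void> {
  const job = await queue.claimNext();                 // pick a run off the queue
  const workflow = await loadWorkflow(job.workflowId); // fetch the compiled .ts artifact

  // One deterministic pass over the steps via the Terminator MCP server.
  const result = await mcp.callTool("execute_sequence", { steps: workflow.steps });

  await queue.writeResult(job.workflowId, result);
}
```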

Runtime path: the deterministic side

1. Load .ts: MCP execute_sequence
2. Walk UIA tree: role + name + automationId
3. 4-strategy match: id, bounds, text, window
4. Click / type: native Win32 events
5. Verify after: re-read tree, assert state

The four-strategy match in step three is the single most important deterministic primitive in the runtime. It is implemented in apps/desktop/src-tauri/src/focus_state.rs and tries, in order: the stored automation ID, the recorded window plus bounds, the visible text content, and finally the parent window. The first three are what absorb a typical UI tweak (a button moves five pixels left, a panel reorders). The fourth is a fallback that asks the next step to retry from a known-good window. None of those four strategies call a model.
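The cascade is simple enough to sketch end to end. The version below is TypeScript for readability, with invented names; the production implementation is pure Rust in focus_state.rs:

```typescript
// Illustrative sketch of the four-strategy cascade described above.
type Rect = { x: number; y: number; w: number; h: number };
type UIElement = { automationId?: string; windowTitle?: string; text?: string; bounds?: Rect };
type RecordedStep = { automationId?: string; windowTitle: string; bounds: Rect; visibleText?: string };

function overlaps(a: Rect, b: Rect): boolean {
  return a.x < b.x + b.w && b.x < a.x + a.w && a.y < b.y + b.h && b.y < a.y + a.h;
}

function resolveElement(step: RecordedStep, tree: UIElement[]): UIElement | null {
  // 1. Stored automation ID: the strongest signal, survives layout changes.
  const byId = tree.find((e) => e.automationId !== undefined && e.automationId === step.automationId);
  if (byId) return byId;

  // 2. Recorded window plus bounds: absorbs renames and small moves.
  const byBounds = tree.find(
    (e) => e.windowTitle === step.windowTitle && e.bounds !== undefined && overlaps(e.bounds, step.bounds),
  );
  if (byBounds) return byBounds;

  // 3. Visible text content: absorbs reordering inside the window.
  const byText = tree.find((e) => e.text !== undefined && e.text === step.visibleText);
  if (byText) return byText;

  // 4. Parent-window fallback: no direct match, so hand back the recorded
  //    window as a known-good anchor for the next step to retry from.
  return tree.find((e) => e.windowTitle === step.windowTitle) ?? null;
}
```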

For the deeper walkthrough of the recording-to-replay primitive, the inside-look guide opens up the eight-field semantic record per step and the YAML accessibility-tree snapshot that backs it.

Counter-evidence: where the AI does come back

The argument is “the model leaves the room after recording,” not “the model is never invoked again.” Three honest exceptions are worth naming, because absolutist claims of this shape rarely survive contact with reality.

First, on a failed step. If the four-strategy cascade in focus_state.rs cannot find a matching element, the runtime can mark the step for re-recording. The next time a human walks that part of the workflow, the recording pipeline runs, the model authors a fresh step, and the workflow updates in place. The model is invoked, but not on the hot path; it is invoked on what is effectively a code patch.

Second, in the no-code workflow builder. The web app at app.mediar.ai/web has an AI assistant that can write or edit a step from a natural-language description. That is a tool inside the editor, not a runtime dependency. The .ts file it produces still ships to the same deterministic executor.

Third, on parts of the input itself. If a workflow needs to extract structured data from a PDF or a scanned document, the extractor is a model call. That happens at the workflow boundary, not inside the click loop, and it is declared explicitly in the workflow file as an input transform. Compliance teams treat it the same way they treat OCR: it is a documented step, not an emergent action.
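A sketch of what declaring the extractor at the boundary can look like in the workflow file. The article establishes only that the extraction is an explicit input transform; the shape below is invented:

```typescript
// Illustrative only: a model-backed extractor declared as an input
// transform, outside the deterministic click loop. Names are invented.
import { createWorkflow } from "@mediar-ai/workflow";

export default createWorkflow({
  name: "Claims intake from a scanned PDF",
  inputs: {
    claim: {
      // The one sanctioned model call: documented, diffable, at the boundary.
      transform: { kind: "extract", source: "claim.pdf", schema: "ClaimFields" },
    },
  },
  steps: [
    // Deterministic UIA steps consume {{claim.*}} fields from here on.
  ],
});
```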

Why this split is the actual product

The reason “AI watches your workflow” gets repeated everywhere is that it is the half of the story that fits on a homepage. The half that is harder to put on a homepage, and that does most of the heavy lifting, is the deterministic runtime. Three real consequences fall out of the split.

Latency stays in the milliseconds. A UIA tree walk plus a Win32 click is roughly 50 to 200 milliseconds on modern hardware. A Gemini call would add 1 to 4 seconds per step. Across a 40-step claims-intake workflow, that is the difference between a few seconds and a few minutes per claim, and across a day's queue it is the difference between minutes and hours.

Cost decouples from volume. Recording is a one-time cost in tokens. Replay cost is dominated by the Windows session, not by an LLM provider. That is the reason Mediar can quote $0.75 per minute of execution honestly: there is no per-step inference bill underneath that has to be subsidized at scale.

Auditability is preserved. The .ts file checked into source control IS the workflow. A reviewer can read it, a security team can diff it, and a compliance officer can sign off on it. That is hard to do when the action sequence is generated fresh by a model every run.

The split, in numbers

- 4 Vertex AI call sites in the recording path
- 0 LLM call sites in the production executor
- 2 Gemini models used (2.5 Flash and 2.5 Pro)
- 1 time each workflow is processed by an LLM

Counts come from a fresh ripgrep over the current monorepo: four call sites in the recording path, zero in the executor crate, two model identifiers used in production, and one authoring pass per workflow. Numbers are reproducible if you clone the repo.

The resolution

So what is the right one-line description of Mediar AI? Something closer to: a desktop automation platform that uses Gemini to compile a recorded workflow into a deterministic TypeScript program, then runs that program on Windows with no model in the loop. The product is genuinely an AI product, in the sense that no traditional RPA tool could be built without a frontier-class model authoring the steps. It is also genuinely a deterministic product, in the sense that the runtime would not change if every model provider on earth went offline tomorrow.

The two halves of that sentence are the entire pitch. Most write-ups keep the first and drop the second, which is how you end up with the “AI watches and runs 24/7” line. The version that survives a second read of the source is more boring and more useful: a model writes the script once, and a Rust binary reads the script every other time.

See the recording-to-runtime split with your own workflow

Bring a Windows process and we will record it live, show the .ts file the AI emits, and run the deterministic replay against your environment in the same call.

Frequently asked questions

Where exactly does the AI run in Mediar AI?

Inside the desktop recorder, after a session ends. Specifically in `apps/desktop/src-tauri/src/recording_processor.rs`, which imports `crate::vertex_ai` and calls `vertex_ai::call_vertex_ai` from four points in the file. Those four call sites cover step analysis, context-aware re-labeling, workflow synthesis, and the per-substep generation pass. Once the file emitted from that pipeline is saved, the LLM is done with that workflow.

Which models does Mediar use?

Two Gemini models on Vertex AI. Step analysis defaults to `gemini-2.5-flash` because the volume is high (one call per meaningful event) and the schema is tight enough that Flash is reliable. The labeling pass uses `gemini-2.5-pro` because it is reading three steps at once and writing a single descriptive label that stands in for a generic action. Both are routed through `apps/desktop/src-tauri/src/vertex_ai.rs`, which also has a path for `gemini-pro-latest` (Gemini 3 Pro) for projects that want to opt into it.

If the AI is gone at runtime, what is doing the work when a workflow runs?

A Rust binary in `crates/executor`. It pulls a workflow from the database, talks to a local MCP server that wraps the Terminator SDK, and the SDK in turn calls Windows UI Automation (UIA) to find elements and dispatch input events. There is no model in this loop. You can verify that the executor crate has zero references to gemini, claude, openai, llm, or inference in any source file. It is shipped as a self-contained binary that never opens a network socket to a model provider.

How does it find the right button if the UI changed since the recording?

Through a four-strategy cascade in `focus_state.rs` that runs in pure Rust. It tries the recorded automation ID first, then window handle plus bounds, then visible text, then the parent window as a fallback. None of those strategies need a model. They need a Windows accessibility tree and a stored description from the original session. If all four fail, the runtime surfaces the failure and the step gets queued for re-recording, which is when the AI comes back into play for that one specific step.

Is this what people mean when they say 'AI agent'?

Not quite. Most products marketed as AI agents call a model on every step at runtime, which is why they are slow, expensive, and probabilistic. Mediar AI calls a model once, during authoring, then turns the result into deterministic Rust calls. So the AI is doing the same job a human RPA developer does: watching a workflow and writing code that captures it. The runtime does what compiled code does. That is the entire architectural bet, and it is the reason a workflow that costs a few dollars to author can run for months at fractional cents per execution.

Why split the AI out of the runtime instead of running an LLM live like Adept or CUA?

Three reasons. First, latency: a Gemini call adds 1 to 4 seconds per step, and an enterprise workflow has dozens of steps. Second, cost: a model call per step makes per-execution economics terrible at the volume RPA targets (hundreds of thousands of runs a month). Third, repeatability: an LLM in the hot path means the same input can produce two different action sequences. Compliance teams in banking and healthcare cannot sign off on that. Pulling the model out of the loop and into the authoring pass is what makes Mediar look like UiPath in operation while looking like an AI product during build.

Is the production runtime open source?

Most of it. The Terminator SDK that performs the UIA calls and the four-strategy element resolution is published as `terminator-rs` on crates.io and lives at github.com/mediar-ai/terminator under MIT. The orchestration layer, the database queues, and the cloud workflow runner are commercial. A team that wants to run the executor on their own infrastructure can build on Terminator directly and wire it into whatever queue they already have.

Does the AI ever come back at runtime, even as a fallback?

Only on a re-record. If a workflow step fails the four-strategy match, the runtime can flag the step for re-recording. The next time a human runs that section, the recorder captures the new tree, sends it through the same Gemini pipeline, and replaces the failed step. The runtime itself does not phone home to a model to pick the next click. That separation is intentional: it keeps the production path inspectable, auditable, and free of vendor lock-in to whatever frontier model is current that quarter.