The act, not the diagram
Automation of workflow is a five-step pipeline. Here is the file that runs it.
Every guide on this topic explains the same diagram. A trigger fires on the left, a condition is checked in the middle, an action runs on the right. The diagram is correct. It is also the easy half. The harder half is how a workflow stops being a thing a person does in their head and starts being a JSON file with a trigger string, a terminator string, and a hierarchical graph of steps. That conversion has a specific shape, named in code, and you can read it end-to-end in roughly three hundred lines of Python.
Why the standard explainer is the easy half
Pull up the top results for this topic and read them side by side. Atlassian, Zapier, Workato, Hyland, Outsystems, Zoho, TechTarget, Creatio. Every one of them converges on the same model. A workflow is a sequence of steps. Automation is what happens when something fires that sequence without a human present. The bridge between the two is a trigger. They differ on which connectors they list, which industry they pitch, and how aggressively they badge themselves as AI. They agree on the trigger-condition-action diagram.
The diagram is correct. It describes the runtime. It does not describe how the workflow got there. The first step in every one of those guides reads roughly “define the workflow you want to automate,” which is the equivalent of a recipe book that starts with “first, decide what you are cooking.” The interesting work, the work a real ops team has to actually do, is taking a person’s actual day on a Windows desktop and turning it into a structured object that can be scheduled, monitored, and audited.
That conversion is the act referenced by the phrase “automation of workflow.” In the Mediar monorepo it is one Python file, one decorated function, and five sequential POST requests against five Next.js API routes. The rest of this guide opens that file and walks each step.
The orchestrator: one function, three hours, five endpoints
The whole pipeline is a single Modal function. It is a sales-led pilot tool, not a fire-and-forget batch job, which is why it is decorated with keep_warm=1 (one container kept hot to avoid a cold-start pause) and timeout=10800 (three hours, in seconds, the outer wall on the entire run).
# modal_apps/workflow_synthesis_orchestrator.py
@app.function(
    secrets=[modal.Secret.from_name("supabase-secret")],
    timeout=10800,  # 3 hours = 10,800 seconds
    keep_warm=1,
)
def orchestrate_workflow_synthesis(
    user_id: str,
    model: str,
    start_date: str,
    end_date: str,
    user_instructions: str = "",
) -> Dict[str, Any]:
    """
    Run all 5 workflow synthesis steps by calling Vercel
    endpoints sequentially.
    """
Inside the function body, five sequential requests.post calls hit the production Vercel deployment at app.mediar.ai. Each one carries its own thirty-minute HTTP timeout. The pipeline is a straight line, not a DAG, and the order is fixed.
orchestrate_workflow_synthesis (modal_apps/workflow_synthesis_orchestrator.py):
initiate (step 1 of 5) → refine (step 2 of 5) → boundaries (step 3 of 5) → synthesize (step 4 of 5) → timeline (step 5 of 5)
The five labels on the diagram correspond to five endpoint folders under apps/web/src/app/api/: initiate-workflow-analysis, refine-workflow-list, define-workflow-boundaries, synthesize-workflow, and analyze-raw-timeline-events. The remainder of this guide walks them in that order.
Step 1 of 5
Initiate. Look at the raw recording and produce candidate workflow names.
The first endpoint, /api/initiate-workflow-analysis, takes a user id, a model name, a start date, and an end date. It pulls the user’s low-level event analyses from Supabase for that window, runs them through a Vertex AI prompt called PROMPT_SYNTHESIZE_CONTEXT, and streams back a Server-Sent Events response with two pieces of state: a workflowContext object (essentially a high-level read on what the user does, what industry, what tools they live in) and a workflowNames list (a first-pass guess at the recurring workflows visible in the recording).
The output is intentionally rough. It is the model’s first guess at the universe of automatable workflows, with no deduplication and no boundary information. A typical first-pass list at this stage might contain ten workflow names where four turn out to be the same workflow under four different names and two turn out to be one-off events that happened to fire during the recording window. That is fine. The next step exists precisely to prune the list.
The streaming response shape matters here. Steps one and five are the only two of the five that use SSE rather than a single JSON response. The orchestrator parses the SSE chunks with a small parse_sse_response helper, scanning for lines that start with data: and accumulating the final state into one dict. The streaming format exists because the synthesis prompt at this step can run for several minutes, and the desktop UI uses the intermediate progress events to keep the user informed instead of showing a spinner that looks frozen.
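A minimal sketch of that parsing loop, in the spirit of the helper; the [DONE] sentinel and the silent skip of malformed chunks are assumptions, not confirmed behavior:

# Sketch in the spirit of parse_sse_response: scan for data:
# lines and fold each JSON chunk into one accumulated dict.
import json

def parse_sse_response(raw_text: str) -> dict:
    state: dict = {}
    for line in raw_text.splitlines():
        if not line.startswith("data:"):
            continue  # ignore event:, comment, and keep-alive lines
        chunk = line[len("data:"):].strip()
        if not chunk or chunk == "[DONE]":
            continue  # assumed end-of-stream sentinel
        try:
            state.update(json.loads(chunk))  # later chunks win
        except json.JSONDecodeError:
            pass  # partial or non-JSON progress event
    return state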
Step 2 of 5
Refine. Take the rough list and prune to the workflows that are actually recurring.
The second endpoint, /api/refine-workflow-list, takes the workflowContext and draft_workflow_names from step one and runs them against PROMPT_REFINE_WORKFLOWS_AND_CONTEXT. The model is told its job is to dedupe the list, throw out one-shot events that are not real workflows, merge synonyms, and return a refined_workflow_names list that has been cleaned of the first-pass noise.
This is the step that most explainers gloss as “decide what to automate.” In source it is forty lines of TypeScript plus a Vertex call with a structured-output schema named WORKFLOW_REFINEMENT_SCHEMA. The same 1,000-row analysis window from step one is re-fetched (the Supabase query lives in the route file, not in shared state) so the refinement model is looking at the same evidence the initiation model was. The per-step timeout is thirty minutes, the same as every other step. In practice the refinement call returns in under five minutes on a representative recording, but the cap exists so a stuck Vertex response cannot block the rest of the pipeline.
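To make the pruning concrete, here is an invented before-and-after using the step-two key names; workflowContext, draft_workflow_names, and refined_workflow_names come from the route, and every value is illustrative:

# Invented example of step two's job: dedupe, merge synonyms,
# drop one-offs. Key names follow the route; values are made up.
refine_request = {
    "workflowContext": {"industry": "accounting",
                        "primary_tools": ["Outlook", "QuickBooks"]},
    "draft_workflow_names": [
        "Process vendor invoice",
        "Handle incoming invoices",         # synonym of the first
        "Vendor invoice payment",           # synonym of the first
        "One-off printer troubleshooting",  # not a recurring workflow
    ],
}

refine_response = {
    "refined_workflow_names": ["Process vendor invoice"],
}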
The output of step two is still just a list of names. No triggers. No actions. The next step is where the workflows start becoming candidates for automation rather than labels on a list.
Step 3 of 5
Boundaries. Find the trigger and the terminator for each workflow.
Step three is the moment a workflow becomes automatable. The third endpoint, /api/define-workflow-boundaries, takes the refined names and runs them against WORKFLOW_BOUNDARIES_PROMPT, with the structured-output schema forcing every workflow in the response to carry three required fields. The schema is short and worth printing in full.
// apps/web/src/lib/prompts.ts (lines 472-489)
// The schema that turns a workflow name into something automatable.
export const WORKFLOW_BOUNDARIES_SCHEMA = {
  type: "object",
  properties: {
    workflows: {
      type: "array",
      items: {
        type: "object",
        properties: {
          workflow_name: { type: "string" },
          trigger: { type: "string" },
          terminator: { type: "string" }
        },
        required: ["workflow_name", "trigger", "terminator"]
      }
    }
  },
  required: ["workflows"]
};
Three required fields. workflow_name carries forward from step two unchanged. The two new fields are where the value is. trigger is a sentence describing the concrete event that starts the workflow. terminator is the concrete event that ends it. The example in the docstring of WORKFLOW_BOUNDARIES_PROMPT uses an invoice workflow:
trigger: “The workflow begins when an email with ‘invoice’ in the subject arrives from a known vendor.”
terminator: “The workflow ends when the payment status for the corresponding invoice is marked as ‘Scheduled’ in the accounting software.”
Read those two strings carefully. They are written in business language, not in code. They name a concrete observable event on one side and a concrete observable end-state on the other. With the trigger string in hand, a downstream system can answer “is this workflow starting right now?” against any arriving event. With the terminator string, it can answer “did this workflow finish?” The boundary pair is what every other guide hand-waves when it tells you to “define the workflow.” In Mediar’s pipeline the pair is named in a JSON schema, the schema is enforced by structured output, and the result is checked into the database.
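Filled into the schema, the step-three output for that invoice workflow looks like this; the workflow name is illustrative, and the two boundary strings are the prompt docstring’s own example:

# The shape WORKFLOW_BOUNDARIES_SCHEMA enforces, filled in with
# the invoice example. The workflow name is invented.
boundaries_response = {
    "workflows": [
        {
            "workflow_name": "Process vendor invoice",
            "trigger": ("The workflow begins when an email with "
                        "'invoice' in the subject arrives from a "
                        "known vendor."),
            "terminator": ("The workflow ends when the payment status "
                           "for the corresponding invoice is marked as "
                           "'Scheduled' in the accounting software."),
        }
    ]
}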
One detail worth pulling out: step three accepts a userInstructions string in its context payload. If the user typed something like “treat anything in Outlook as out-of-scope” into the desktop UI, that string flows into the prompt at this step (and at step four), and the model uses it to override its default reading of the boundaries. The override works by passing the user’s notes through to the prompt, not by adding a configuration flag, which is the Mediar shape: prompt engineering as a first-class user-facing surface.
“Five named API endpoints fired in fixed order by one Modal function, with a three-hour outer wall and a thirty-minute per-step inner wall. The trigger and terminator pair, the schema field that defines the start and end of an automatable workflow, emerges only at step three. Every step writes a structured JSON response under a Vertex AI structured-output schema.”
modal_apps/workflow_synthesis_orchestrator.py and apps/web/src/app/api
Step 4 of 5
Synthesize. Turn the boundaries into a hierarchical, runnable workflow object.
The fourth endpoint, /api/synthesize-workflow, is the largest of the five route files (around 640 lines). It takes the workflows-with-boundaries from step three, the same 1,000-row analysis window, up to 500 transcript rows from agent_live_transcriptions, the user’s high-level workflowContext, and any userInstructions, and runs them against WORKFLOW_SYNTHESIS_PROMPT. The output schema is large enough that printing it would fill the rest of this page; the interesting part is the rules the model is held to.
// apps/web/src/lib/prompts.ts, WORKFLOW_SYNTHESIS_PROMPT
// (excerpt of the rules the model is held to at step 4)
//
// 5. Synthesize Hierarchical Steps: Each workflow breaks into
//    high-level steps. Each step breaks into granular substeps.
// 6. Detail Each Sub-step: For every sub-step you must define
//    its inputs (what triggers it), outputs (what results from
//    it), and business_logic (the rules governing it).
// 7. Define Workflow Variations (Types): Identify and define
//    different paths the workflow can take. Describe the
//    conditions for each type.
// 8. Identify Concrete Examples (Instances): Extract specific,
//    concrete examples of the workflow being executed.
// 11. Do Not Hallucinate: Base all synthesized information
//     directly on the provided context, event data, and
//     conversations.
Read the numbers. Each workflow becomes a tree: high-level steps, sub-steps under each step. Every sub-step gets three required string fields, inputs (what triggers it), outputs (what results from it), and business_logic (the rules governing it). The branching paths the workflow can take become workflow_types, with a name, a description, and a conditions object. The concrete past executions of the workflow become workflow_instances, with a descriptive name and a structured instance_data dict.
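A hedged sketch of that tree; only the field names come from the prompt rules above, while the nesting depth and every value are illustrative:

# Sketch of the object step four produces. Field names follow the
# prompt rules; structure depth and all values are invented.
synthesized_workflow = {
    "workflow_name": "Process vendor invoice",
    "steps": [
        {
            "name": "Capture the invoice",
            "sub_steps": [
                {
                    "name": "Open the vendor email",
                    "inputs": "An email with 'invoice' in the subject",
                    "outputs": "The attached invoice PDF, saved locally",
                    "business_logic": "Only emails from known vendors qualify",
                },
            ],
        },
    ],
    "workflow_types": [
        {"name": "Known vendor",
         "description": "Vendor already exists in the accounting software",
         "conditions": {"vendor_on_file": True}},
    ],
    "workflow_instances": [
        {"name": "Acme Corp invoice, March 4",
         "instance_data": {"vendor": "Acme Corp", "amount": "1,240.00"}},
    ],
}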
The reason step three exists separately is exactly so the model can spend its full reasoning budget on synthesis here. The step three call has one job: fill in two strings per workflow. The step four call has the much larger job of generating a tree with ten or twenty leaves, each carrying three string fields. Splitting them lets each call use a focused prompt, a small schema, and a constrained context window. Trying to do both in one call produces structured outputs that mis-fill or skip the deeper fields, which is a failure mode anyone who has done structured-output prompt engineering will recognize.
The synthesized workflow lands as JSON on the response, gets written back to the orchestrator’s results dict under step_results.step4, and is also persisted to Supabase by the route file before the response returns. After step four, the workflow exists as a structured object that the desktop UI can render, the Rust executor can run, and a human reviewer can read.
Step 5 of 5
Timeline. Map the synthesized workflow back to the raw evidence.
The final endpoint, /api/analyze-raw-timeline-events, is the part most explainers do not have an analogue for. It takes the synthesized workflows and walks the original chronological event log, labeling each segment with which workflow it belongs to. The output is a workflow_mappings list: a sequence of (workflow_name, time_range, supporting_event_ids) triples that lets the desktop UI overlay the synthesized workflows onto the user’s actual recorded day.
The audit value is the point. After step four, the workflow is a definition that claims to describe what the user did. After step five, every claim in that definition can be traced back to a row in low_level_workflow_analyses, with a timestamp and an analysis id. A reviewer can open any sub-step in the synthesized object, click through to the supporting events, and verify that the workflow as written matches the workflow as observed.
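An invented example of one mapping entry, in the triple shape named above:

# One illustrative workflow_mappings entry: which workflow, over
# which time range, backed by which analysis rows. All values
# are invented.
workflow_mappings = [
    {
        "workflow_name": "Process vendor invoice",
        "time_range": {"start": "2024-03-04T09:02:00Z",
                       "end": "2024-03-04T09:19:00Z"},
        "supporting_event_ids": ["evt_1041", "evt_1042", "evt_1057"],
    },
]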
This is the step that turns the pipeline from “impressive AI demo” into “something the internal compliance team can sign off on.” In a regulated industry (banking core systems on Jack Henry, Fiserv, FIS; healthcare on Epic or Cerner; insurance claims) the audit trail is not optional. The pipeline ships the audit trail as step five rather than as a bolt-on, which is the difference between a product that has thought about the deployment surface and one that has not.
The bounds the pipeline enforces on itself
Six numbers, each named in source, each a policy decision rather than a tunable knob. They are the right shorthand to take into a buying conversation about how a vendor turns observation into an executable workflow.
5 endpoints
The orchestrator fires exactly five named POST routes in fixed order. Adding a sixth means editing the orchestrator function, the queue worker, and the desktop progress UI in one pull request.
10,800 seconds
The Modal function decorator caps the whole orchestration at three hours. After that the function is killed and the desktop sees an error, never a half-finished workflow.
1,800 seconds per step
Each individual HTTP POST against the Vercel endpoints carries a 30-minute timeout. A stuck Vertex AI call cannot block the rest of the pipeline indefinitely.
1,000 analyses
Steps 2 through 4 each pull at most 1,000 rows from low_level_workflow_analyses for the chosen window. The cap is the working set every prompt is allowed to see in one round; the shape of the capped query is sketched just after this list.
500 transcript items
If the user has live conversation transcripts in the chosen window, steps 3 and 4 mix in up to 500 of them. Conversation context is treated as evidence with the same time filter as screen activity.
keep_warm=1
One Modal container is kept warm at all times so the first request does not pay container start. The orchestrator is a long-running, sales-led pilot tool, not a fire-and-forget batch job.
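The 1,000-row cap is a query-level limit. An approximate sketch of that capped fetch, written with supabase-py purely for illustration; the real query is TypeScript inside each route file, and the column names here are assumptions:

# Approximate shape of the capped analysis fetch, in supabase-py
# terms. Table name comes from the text; user_id and created_at
# are assumed column names.
import os
from supabase import create_client

supabase = create_client(os.environ["SUPABASE_URL"],
                         os.environ["SUPABASE_KEY"])

user_id, start_date, end_date = "u_123", "2024-03-01", "2024-03-08"  # example window

rows = (
    supabase.table("low_level_workflow_analyses")
    .select("*")
    .eq("user_id", user_id)
    .gte("created_at", start_date)
    .lte("created_at", end_date)
    .limit(1000)  # the working set every prompt is allowed to see
    .execute()
    .data
)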
The three-hour wall, in one number
The outermost bound is timeout=10800 on the Modal function decorator, in seconds: the wall the orchestrator function refuses to cross. Three hours. After that, the Modal function is killed, regardless of which step is in flight. The five-step pipeline runs to completion in under twenty minutes on a representative recording, but the wall exists so an unusually slow Vertex day cannot leave a pilot run open for longer than an afternoon.
What this means if you are evaluating tools
The reason it is worth opening one orchestrator file rather than reading another high-level explainer is that the five steps and the six bounds tell you, more honestly than any feature list, what the vendor is actually doing when they say they will “automate your workflow.” If the vendor cannot name the equivalent of step three (the moment a workflow becomes automatable, the moment a trigger and a terminator string are named), then they are selling the runtime diagram, not the conversion that produces something for the runtime to run.
UiPath has Task Mining. Microsoft has Process Insights. Celonis has a process discovery suite. Each has analogues of these five steps. The Mediar version is open about its shape because the orchestrator is one Python file you can read at modal_apps/workflow_synthesis_orchestrator.py, every API route is one route file you can read under apps/web/src/app/api, and every prompt is one named export under apps/web/src/lib/prompts.ts. The runtime executor that consumes the output is the Rust Terminator project on github.com/mediar-ai/terminator, published under MIT.
That openness is the difference between evaluating the pipeline and trusting a vendor slide that summarizes it. Five steps, three required fields at step three, one tree per workflow at step four, an audit trail at step five, six bounds. The shape of how a workflow gets automated, in less than a screen of numbers.
Bring a workflow you have been trying to automate. We will run it through these five steps live.
Pick one workflow from a recent recording. We will fire the orchestrator on it, name its trigger and terminator together, and walk you through the synthesized object the same way you would read it after a real pilot.
Questions a buyer eventually asks
Why not call this 'workflow automation' the way every other guide does?
Because the phrase 'automation of workflow' is the act, and the act is what the standard guides skip. They reach for the runtime metaphor (a trigger fires, a condition is checked, an action runs) and explain that diagram in five different ways. The diagram is correct. It is also the easy half. The harder half, the half a buyer trying to evaluate Mediar versus UiPath versus Power Automate eventually asks about, is how a real workflow stops being a thing a person does in their head and starts being a JSON file with a trigger field, a terminator field, and a graph of steps. That conversion is what 'automation of workflow' means in this codebase, and it has a specific shape: five named API endpoints fired by one Modal function in a fixed order.
Where does the five-step pipeline live in the repo?
One Python file: modal_apps/workflow_synthesis_orchestrator.py. The orchestrating function is orchestrate_workflow_synthesis, decorated with @app.function(timeout=10800, keep_warm=1) on lines 41 through 45. Inside the function body, five sequential requests.post calls hit five Next.js API routes under apps/web/src/app/api: initiate-workflow-analysis, refine-workflow-list, define-workflow-boundaries, synthesize-workflow, and analyze-raw-timeline-events. There is no plugin registry, no DAG engine. The pipeline is a five-element sequence by design.
What does step three actually produce that the first two do not?
A trigger and a terminator for each workflow. Steps one and two produce a list of workflow names. Step three opens define-workflow-boundaries, which calls Vertex AI with the WORKFLOW_BOUNDARIES_SCHEMA, and the schema's required fields are workflow_name, trigger, and terminator. The trigger string is a sentence describing the concrete event that starts the workflow ('the workflow begins when an email with invoice in the subject arrives from a known vendor'). The terminator is the concrete event that ends it ('the workflow ends when the payment status is marked as Scheduled in the accounting software'). Without those two strings, there is nothing to schedule and nothing to detect completion against. Step three is the moment a workflow becomes a candidate for automation rather than a label on a list.
Why does step four need its own dedicated endpoint when step three already named the workflow?
Because the boundary is the contract and the synthesis is the implementation. Step three writes the trigger and terminator strings. Step four reads those plus the same 1,000-row analysis window plus up to 500 transcript items plus the user's high-level context, and produces a hierarchical workflow object: high-level steps, sub-steps under each step, inputs and outputs and business_logic strings on every sub-step, workflow_types describing the branching paths, workflow_instances naming concrete past executions. The synthesis schema is roughly fifteen times the size of the boundaries schema. Splitting the call in two lets the model spend its full reasoning budget on each task in isolation, instead of trying to fit both into one structured output.
What is the fifth step doing if the workflow is already synthesized at step four?
Mapping the synthesized workflow back to the raw evidence. Step five hits analyze-raw-timeline-events, which walks the original timeline of screen events and labels each segment with which workflow it belongs to. The output is workflow_mappings: a list of (workflow_name, time_range, supporting_event_ids) triples that lets the desktop UI draw the workflow on top of the user's actual recorded day. The earlier steps produce the workflow definition. Step five is the audit trail that proves the definition came from real evidence, which is the part that matters in any deployment that has to clear an internal compliance review.
What stops a stuck step from holding up the rest of the pipeline?
Two timeouts that work together. The Modal decorator on the orchestrator function carries timeout=10800, which kills the entire function after three hours regardless of what step it is on. The requests.post call on every step inside the function carries timeout=1800, which kills any individual HTTP request after thirty minutes. If step three hangs on a slow Vertex AI response, the per-step timeout fires first and the orchestrator catches the exception, marks step three as failed in the results dict, and returns. The whole pipeline never silently hangs because both layers refuse to wait forever, and they have different upper bounds.
Is the AI in the loop at runtime, or only at authoring time?
Only at authoring time. The five endpoints all call Vertex AI through callVertexWithStructuredOutput to convert raw events into the workflow definition. Once that definition lands on disk, the runtime that fires it is the Rust scheduler in apps/desktop/src-tauri/src/workflow_scheduler.rs and the Terminator executor that drives Windows accessibility APIs. There is no model call at 9:13 in the morning when the workflow runs. The split (AI synthesis at authoring time, deterministic execution at runtime) is the design choice that lets a workflow pass a SOC 2 audit and still benefit from a frontier model during the writing step.
Can the same five-step pipeline run for any tool, or is it specific to Mediar?
The shape is not specific to Mediar, the inputs are. Any orchestrator that turns observed user behavior into an executable workflow has to do roughly these five things: discover candidate workflows from raw events, prune the list to the ones that are actually recurring, find each candidate's start and end conditions, generate a hierarchical step graph with branching variations, and re-anchor the result back to the source evidence. UiPath Task Mining, Microsoft Process Insights, and Celonis Process Mining all contain analogues of these steps. The Mediar version is open about its shape because the orchestrator is one Python file you can read, every API route is one route file you can read, and each Vertex prompt is one named export in apps/web/src/lib/prompts.ts. That openness is the difference between evaluating the pipeline and trusting a vendor slide that summarizes it.
What happens when the pipeline fails halfway through?
The orchestrator catches the exception, fills in error and end_time on the results dict, and returns the partial state. The results dict already has steps_completed (an integer between 0 and 5) and step_results (a dict keyed by step1, step2, step3, step4, step5). A failed run that completed steps one through three returns with steps_completed=3, the partial step results filled in, and the error string from whichever step blew up. The caller can decide whether to resume from step four against the same time window or to throw the whole run away. Resumption is not automatic because resuming against shifted state is an unsafe default; failure is loud and explicit.
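Concretely, a run that died at step four might return partial state like this; the field names come from the answer above, and every value is invented:

# Invented example of the partial state a failed run returns.
failed_run = {
    "steps_completed": 3,  # steps one through three finished
    "step_results": {
        "step1": {"workflowNames": ["Process vendor invoice"]},
        "step2": {"refined_workflow_names": ["Process vendor invoice"]},
        "step3": {"workflows": [{"workflow_name": "Process vendor invoice",
                                 "trigger": "The workflow begins when ...",
                                 "terminator": "The workflow ends when ..."}]},
    },
    "error": "step4: read timed out after 1800 seconds",  # invented string
    "end_time": "2024-03-04T13:45:12Z",
}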
Related
Adjacent reading on this product
Workflow and automation, traced through one Rust scheduler
The runtime side of the same story. Three trigger types in TriggerConfig, sixty minutes of jitter, a thirty-minute hard execution cap, fifty rows of rolling history. The shape of unattended automation in Mediar's executor.
Workflow automation tools split by surface, not by feature count
The category-level argument. Cron triggers, public APIs, browser DOM, OS-level desktop event stream. Which surface your workflow lives on is the only decision dimension that survives a non-trivial pilot.
What robotic process automation actually is, traced through the source
The mechanical answer in three layers: a six-event capture filter, a four-stage synthesis pipeline, and a four-strategy replay cascade. Each layer walked with the open-source files that implement it.