An argument, with code references

Runtime anomaly detection for RPA only works if the runtime emits the right per-step row.

You can run an isolation forest over Orchestrator logs and catch the obvious outliers. You can pull bot durations into Splunk and alert on a duration spike. Both are fine; neither is what supply chain workflow owners actually want. The thing you want is a SQL row per tool call, with fixed columns for which app the tool opened, which window it touched, which element it interacted with, how long it took, and whether it succeeded. That row does not exist in most RPA runtimes. When it does, anomaly detection in the supply chain becomes a SELECT, not a forensic exercise. Here is the shape, the queries that sit on it, and the wiring back to OpenTelemetry that makes the deeper drill-down work.

Matthew Diakonov · 8 min

Direct answer (verified 2026-05-12)

Runtime anomaly detection in supply chain RPA needs a canonical per-step record, not free-form activity-pack logs. The record needs at least these columns per tool call: tool_name, app_name, window_title, element_path, duration_ms, retry_count, status, client_id, workflow_id, and one OpenTelemetry trace_id that ties the row back to the full distributed trace. With that shape in place, every interesting anomaly class collapses to a SQL WHERE clause; without it, you are regexing over strings and writing one ad-hoc query per incident.

The reference implementation is open source under MIT at github.com/mediar-ai/terminator. StepResult is at crates/executor/src/models/execution.rs:75; the 15-column step_pool row is in migration 20261115000000_create_user_step_pool.sql; the trace_id column is added in 20261124000000_add_trace_id_to_executions.sql.

The thing every page about this skips

Most writing on anomaly detection in supply chain operations is about the data, not the runtime. Benford's Law on invoice amounts. Isolation forests over supplier delivery times. LSTMs predicting demand. All of that is useful and none of it tells you whether the bot that just posted a purchase order to your ERP did something it has never done before. The model is at the wrong altitude. The question a security team actually needs to answer is: for this run, on this workflow, did any tool call touch a window or an element outside the set this workflow has ever touched in production?

The other category of writing is about runtime protection for credentials and screens (Stealth Mode, Input Lock, Time Limit on a UiPath robot). Those are real controls and they prevent a class of exfiltration. They do not detect anomalies. They prevent specific attacks under specific configurations.

The unspoken precondition for any anomaly detection program that targets the bot itself is a per-step audit shape with stable column names. Once that exists, the model layer is downstream and easy. Until it exists, the model layer is fitting noise.

The numbers that matter: a fixed set of fields on every StepResult row, 15 columns on every step_pool row, one OTel trace_id per execution, and zero free-form activity-pack log lines.

What the runtime actually has to emit

Three artifacts, each emitted by the executor without any per-workflow configuration. The shape is fixed, the column names are public, the retention is in your control.

Per-tool-call canonical row

The shape that anomaly detection sits on. The 15-column user_step_pool row carries tool_name, arguments, result, error, duration_ms, succeeded, app_name, window_title, element_path, workflow_id, step_id, step_name, client_id, created_at, and tags. Every tool call writes one. Queries are SQL.

Per-execution audit row

The workflow_executions row carries status, execution_params, results, error_message, started_at, completed_at, execution_duration_seconds, modal_call_id, client_id, client_ip, compute_cost_cents, and trace_id. One row per scheduled run, one row per ad-hoc invocation. The trace_id ties this back to OTel spans in ClickHouse for any deeper drill-down.
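As an illustrative sketch only (column types, constraints, and the id and workflow_id columns below are assumptions; the repo migration is authoritative), that per-execution shape looks roughly like:

-- Sketch of the workflow_executions shape described above.
-- Types and the status values are assumptions, not the repo's migration.
CREATE TABLE workflow_executions (
    id UUID PRIMARY KEY,
    workflow_id INTEGER,
    status TEXT NOT NULL,                 -- e.g. queued | running | success | failed (assumed values)
    execution_params JSONB,
    results JSONB,
    error_message TEXT,
    started_at TIMESTAMPTZ,
    completed_at TIMESTAMPTZ,
    execution_duration_seconds NUMERIC,
    modal_call_id TEXT,
    client_id TEXT,
    client_ip INET,
    compute_cost_cents INTEGER,
    trace_id TEXT                         -- OTel trace id, the join key into ClickHouse
);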

OpenTelemetry trace, end to end

Every workflow run gets one trace_id; every tool call inside it is a span; every HTTP call the tool makes is a child span. crates/executor/src/telemetry.rs configures the OTLP exporter to a ClickHouse-backed collector with Sampler::AlwaysOn so nothing is dropped. The anomaly query is a JOIN on trace_id, not a regex over a free-form log file.

// crates/executor/src/models/execution.rs:75
pub struct StepResult {
    pub step_id: String,
    pub tool_name: String,
    pub status: StepStatus,        // Pending | Running | Success | Failed | Skipped | Retrying
    pub result: Option<Value>,
    pub error: Option<String>,
    pub duration_ms: Option<u64>,
    pub retry_count: Option<u32>,
}

-- supabase/migrations/20261115000000_create_user_step_pool.sql
CREATE TABLE user_step_pool (
    id UUID PRIMARY KEY,
    user_id TEXT NOT NULL,
    session_id TEXT NOT NULL,
    client_id TEXT,
    tool_name TEXT NOT NULL,
    arguments JSONB,
    result JSONB,
    error JSONB,
    duration_ms INTEGER,
    succeeded BOOLEAN,
    workflow_id INTEGER,
    workflow_name TEXT,
    step_id TEXT,
    step_name TEXT,
    app_name TEXT,
    window_title TEXT,
    element_path TEXT,
    created_at TIMESTAMPTZ,
    -- ... pool ordering, tags, retention columns
);

trace_id

Every workflow execution gets one OTel trace_id at start, stored on workflow_executions.trace_id and propagated to ClickHouse via the OTLP collector. That column is the join key between the per-step Postgres rows and the deeper trace tree, including outbound HTTP.

crates/executor/src/telemetry.rs and migration 20261124000000_add_trace_id_to_executions.sql
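A minimal sketch of the drill-down, assuming the collector lands spans in a ClickHouse table called otel_traces with trace_id, span_name, and timing columns (those table and column names are assumptions for illustration, not the exporter's actual schema):

-- Step 1 (Postgres): find the trace for one execution.
SELECT trace_id
FROM workflow_executions
WHERE id = $1;

-- Step 2 (ClickHouse): pull the span tree for that trace.
-- Table and column names below are assumptions.
SELECT span_name, duration_ns, attributes
FROM otel_traces
WHERE trace_id = $trace_id
ORDER BY start_time_ns;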

The anomaly classes that map to a WHERE clause

Six anomaly classes that come up over and over again in production supply chain RPA. Each one collapses to a query against the per-step row plus the trace, or a query that joins the row against a frozen baseline emitted on the day the workflow was promoted. None of these require an ML model to be useful as a continuous control. An ML layer on top is reasonable, but only after the schema underneath is fixed.

1. Compromised marketplace component

A previously well-behaved tool_name starts opening apps it has never opened on this workflow before, or starts writing to a window_title that no prior run in the trace has touched. A full query sketch for this class follows the list.

GROUP BY tool_name; HAVING COUNT(DISTINCT app_name) > baseline.app_count OR COUNT(DISTINCT window_title) > baseline.window_count
2. Silent republish of an activity pack

duration_ms or retry_count for a tool drifts by more than 2 sigma versus its rolling 30-day median, with no corresponding change in arguments.

WHERE duration_ms > stats.median * 2 AND argument_hash = previous.argument_hash
3. Vendor UI patch in the supply chain

A specific element_path or automation_id begins failing across many independent executions in a tight window. Drift, not compromise, but the runtime sees both the same way.

WHERE element_path = $p AND status = 'failed' AND created_at > now() - interval '4 hours' GROUP BY trace_id
4. Credential misuse

A bot identity that has historically only ever touched one app_name suddenly fires steps against a second app or a sensitive window_title outside the workflow's declared scope.

WHERE client_id = $bot AND app_name NOT IN (SELECT app_name FROM allowlist WHERE workflow_id = $w)
5. Workflow-level scope drift

The total number of distinct (tool_name, app_name) pairs in a workflow run grows beyond the count emitted on the day the workflow was promoted to production.

HAVING COUNT(DISTINCT (tool_name, app_name)) > workflow.frozen_tool_app_count
6. Outbound network anomaly

An OTel span on the workflow trace shows an HTTP request to a host that is not on the workflow's declared outbound allowlist. Caught at the trace layer in ClickHouse, joined back to workflow_executions on trace_id.

JOIN otel_spans USING (trace_id) WHERE http.host NOT IN (SELECT host FROM outbound_allowlist WHERE workflow_id = $w)
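The full query sketch promised under class 1, assuming a workflow_tool_baseline table frozen at promotion time (that table is hypothetical, not part of the repo schema; the 24-hour window is illustrative):

-- Which tools in the last 24 hours opened an app this workflow's
-- baseline has never recorded? workflow_tool_baseline is a hypothetical
-- table written once on the day the workflow was promoted.
SELECT s.tool_name, s.app_name, COUNT(*) AS hits
FROM user_step_pool s
WHERE s.workflow_id = $1
  AND s.created_at > now() - interval '24 hours'
  AND s.app_name NOT IN (
      SELECT b.app_name
      FROM workflow_tool_baseline b
      WHERE b.workflow_id = s.workflow_id
        AND b.tool_name = s.tool_name
  )
GROUP BY s.tool_name, s.app_name;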

The trace, end to end

One execution, one trace_id, two persistence layers. The per-step rows live in Postgres for SQL anomaly queries. The full distributed trace lives in ClickHouse for cross-tool joins and outbound HTTP inspection. Both are reachable from the same trace_id.

One run, one trace_id, two storage layers

[Sequence diagram: Workflow run → Executor → Step pool / OTel collector → Anomaly query. execute_sequence(trace_id); for each step, run tool; insert StepResult row (15 cols); emit span (tool, app, duration); return WorkflowResult { step_results[] }; anomaly query: SQL WHERE app NOT IN allowlist, JOIN spans on trace_id, 5 anomalous rows returned.]

The trace layer is where the outbound network anomaly class lives. A compromised activity pack that decides to POST clipboard contents to a publisher endpoint shows up as an HTTP span on the same trace_id as the workflow execution. The query is a JOIN on trace_id, with a WHERE on http.host. No regex over log files.
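A sketch of that query on the ClickHouse side, assuming a span table named otel_spans with a map-typed attributes column and an outbound_allowlist mirrored alongside it (all three names are assumptions, not the collector's real schema):

-- ClickHouse side; otel_spans and outbound_allowlist are assumed names.
-- $trace_id comes from workflow_executions in Postgres; run it per
-- execution or over the day's trace_ids.
SELECT trace_id,
       attributes['http.host'] AS http_host,
       count() AS calls
FROM otel_spans
WHERE trace_id = $trace_id
  AND attributes['http.host'] NOT IN (
      SELECT host FROM outbound_allowlist WHERE workflow_id = $w
  )
GROUP BY trace_id, http_host;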

What other RPA runtimes emit instead

A short tour of the three platforms most enterprise RPA estates run on today, framed around the single question that matters here: is there a stable per-tool-call event schema that a security team can query for anomalies, or is per-step telemetry whatever the activity pack's developer chose to write into the log line?

UiPath

Activity packs emit whatever log lines their developer chose to write, into the Orchestrator log and a per-bot Windows event log. Schema is per-pack. A SQL query like 'how many tools opened a window outside the allowlist' has no canonical column to join on; it lives in free-form strings inside Robot logs. The new platform-level Insights product surfaces aggregate KPIs (queue volume, success rate, average duration) but not a per-tool-call event shape that a security team can query for anomalies.

Automation Anywhere

Bot Insight and the audit log are session-scoped. The unit of telemetry is the bot run, not the tool call. Per-control behavior (which app, which window, which element_path) lives in the bot's own custom log lines if the developer wrote them. Anomaly detection is a forensic exercise, not a query.

Power Automate Desktop

Action logs are per-flow. The Data Loss Prevention surface is configured against connectors; the per-action runtime trail is action name plus a developer-written status line. You can stream to Application Insights, which gives you a trace tree, but the per-step schema is still 'whatever name and properties the action chose to emit'.

None of these are wrong choices for an RPA platform. They are optimized for execution and visual debugging, not for a continuous security control on per-step behavior. The point of this page is that the gap is upstream of any ML model you might want to deploy.

The counterargument: can you ML over what you already have

The reasonable objection is that you can stream Orchestrator logs into Splunk, run an isolation forest on duration and success rate, and catch gross outliers without changing your runtime. That is true, and worth doing as a first cut. The limits are real.

First, the model has no canonical input for the most important features: which app the tool opened, which window it touched, which element it interacted with. Those exist as substrings of log lines, parseable per-pack but not uniformly. The model trains on whatever your team can extract, not on the underlying signal.

Second, the most interesting anomalies look successful. A marketplace component that adds an extra outbound HTTP call and still returns success under the duration threshold is invisible to duration-and-status models. The trace layer would catch this; the flat log layer cannot.

Third, the false-positive rate of an unsupervised model on free-form logs is high enough that security teams stop reading the alerts. The fix is not a better model; the fix is a tighter schema. Fix the schema first, then put ML on top: a much shorter path than ML over noise.

What this actually buys an operations team

Three concrete uses, all of which our customers have run in production for at least one supply chain workflow.

Pre-deployment scope freeze. On the day a workflow is promoted, freeze the distinct set of (tool_name, app_name, window_title) triples it emits across its golden test suite. Every future run that emits a triple outside the frozen set pages a human. For an SAP B1 receipts workflow at a customer running on a Jack Henry plus SAP loop, this caught a vendor-driven Jack Henry UI patch within 30 minutes of the patch rolling out, hours before the AP team noticed wrong values landing in SAP.
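A sketch of both halves, assuming a hypothetical frozen_workflow_scope table written once on promotion day (table name, window, and the golden-suite time bounds are illustrative):

-- Freeze the scope on promotion day (one-time insert; hypothetical table).
INSERT INTO frozen_workflow_scope (workflow_id, tool_name, app_name, window_title)
SELECT DISTINCT workflow_id, tool_name, app_name, window_title
FROM user_step_pool
WHERE workflow_id = $1
  AND created_at BETWEEN $golden_suite_start AND $golden_suite_end;

-- Page a human on any triple outside the frozen set.
SELECT s.tool_name, s.app_name, s.window_title, s.created_at
FROM user_step_pool s
LEFT JOIN frozen_workflow_scope f
  ON  f.workflow_id  = s.workflow_id
  AND f.tool_name    = s.tool_name
  AND f.app_name     = s.app_name
  AND f.window_title = s.window_title
WHERE s.workflow_id = $1
  AND s.created_at > now() - interval '1 hour'
  AND f.workflow_id IS NULL;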

Continuous duration baseline per tool. Rolling 30-day median plus 2-sigma band per tool_name, evaluated on every run. A medical claims intake workflow at a mid-market carrier surfaced a slow drift on the claims-portal-login tool one week before the upstream portal switched to a CAPTCHA flow; the duration trend was visible and the workflow was rerouted to a manual-fallback step before the portal cut the bot off entirely.
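A sketch of the baseline evaluation in plain Postgres, with illustrative windows and thresholds:

-- Rolling 30-day baseline per tool, then flag the last day's steps that
-- land above median + 2 * stddev. Windows and threshold are illustrative.
WITH baseline AS (
    SELECT tool_name,
           percentile_cont(0.5) WITHIN GROUP (ORDER BY duration_ms) AS median_ms,
           stddev_samp(duration_ms) AS sigma_ms
    FROM user_step_pool
    WHERE workflow_id = $1
      AND created_at > now() - interval '30 days'
    GROUP BY tool_name
)
SELECT s.tool_name, s.step_id, s.duration_ms, b.median_ms, b.sigma_ms
FROM user_step_pool s
JOIN baseline b USING (tool_name)
WHERE s.workflow_id = $1
  AND s.created_at > now() - interval '1 day'
  AND s.duration_ms > b.median_ms + 2 * b.sigma_ms;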

Outbound HTTP allowlist per workflow. Declare the set of hosts a workflow is allowed to reach. Any OTel span on the workflow's trace_id that targets a host outside the list fires a security alert. This is the runtime backstop for the supply chain risk of a marketplace component republishing with new outbound behavior; the install-time review is the front door, the trace-time allowlist is the lock.

Walk a real workflow under the per-step row shape

A 30-minute call on cal.com. Bring a workflow you would want to alert on. We will walk the per-step rows, the trace, and the queries you would write against your supply chain RPA in the first week.

Frequently asked, before the call

What is RPA runtime anomaly detection in a supply chain context?

It is the practice of inspecting per-step telemetry from an RPA workflow at runtime, while a bot is executing, and flagging behaviors that diverge from a known baseline. The supply chain layer adds two things on top: first, the workflows under inspection often run against an upstream system you do not control (a supplier portal, an ERP at a partner, a 3PL's order-entry app), so the source of an anomaly can be a vendor UI patch or a credential change at the supplier side, not your bot. Second, the workflow itself is a supply chain: every activity pack, every connector, every community-contributed sample is third-party code that runs with the bot user's privileges. A useful runtime anomaly detection program treats both as the same problem: what is the bot actually doing right now, and does the SQL row match what we promoted to production.

Why is most RPA telemetry the wrong shape for anomaly detection?

Most RPA platforms emit two layers of telemetry. The first is per-execution: started_at, completed_at, status, an error string, often a screenshot. That layer is fine for dashboards but cannot tell you that a single tool inside the workflow started opening apps it never opened before. The second layer is per-step, but the schema is whatever the activity pack's developer chose to write into the log line. There is no canonical tool_name column, no app_name column, no element_path column. A security team that wants to ask 'show me every step in the last 24 hours where the bot opened a window outside this workflow's allowlist' has to regex over free-form log lines. That works for one incident, not as a continuous control.

What does Mediar emit per step at runtime?

Two artifacts. First, the executor's StepResult struct (crates/executor/src/models/execution.rs lines 75 to 84) carries step_id, tool_name, status, result, error, duration_ms, retry_count for every tool call in every workflow. Every run returns a WorkflowResult with the full step_results array. Second, the user_step_pool table (supabase/migrations/20261115000000_create_user_step_pool.sql) writes a 15-column row per tool call with tool_name, arguments, result, error, duration_ms, succeeded, app_name, window_title, element_path, workflow_id, workflow_name, step_id, step_name, client_id, and tags. The schema is fixed, the column names are stable, and the rows survive in Postgres with row-level security policies that scope reads to the owning user or org.

How does the trace_id column connect runtime steps to OpenTelemetry traces?

Every workflow execution gets a trace_id at start, derived from the OpenTelemetry context, and the trace_id column on workflow_executions stores it (migration 20261124000000_add_trace_id_to_executions.sql). The executor exports spans and logs to an OTLP collector via crates/executor/src/telemetry.rs, which writes them into a ClickHouse-backed store. To investigate any execution end to end, you SELECT trace_id FROM workflow_executions WHERE id = $1, then JOIN that trace_id against the ClickHouse trace store. Anomaly queries that need to reach beyond Postgres (outbound HTTP destinations, sub-tool latency, native API call counts) sit on that join.

What concrete anomaly classes does this shape let you detect?

A useful starter set, in increasing order of subtlety. (1) A compromised marketplace component: a tool_name starts opening apps it has never opened on this workflow before. WHERE tool_name = $t AND app_name NOT IN (historic_apps_for_tool). (2) A silent republish of an activity pack: duration_ms drifts by more than 2 sigma vs. its rolling baseline with no change in arguments. (3) A vendor UI patch upstream: a specific element_path begins failing across many independent executions in a 4-hour window. (4) Credential misuse: a client_id that historically touched only one app fires steps against a second app outside the workflow's declared scope. (5) Workflow-level scope drift: the distinct (tool_name, app_name) pair count grows beyond the count emitted at promotion time. (6) Outbound network anomaly: an OTel HTTP span goes to a host outside the workflow's allowlist, JOINed back via trace_id.

Will ML on existing UiPath or Automation Anywhere logs solve this?

Partially. You can pull the Orchestrator logs into Elasticsearch or Splunk and run an isolation forest over duration and success rate. That catches gross outliers and is worth doing. What it cannot do well is detect a tool that did the wrong thing successfully. If a marketplace component starts copying clipboard contents to an extra destination but still returns success and stays under the duration threshold, the unsupervised model has nothing to anchor on. The reason is the per-step schema is too thin: there is no canonical app_name or window_title to train against, only free-form log strings. The fix is upstream of the model: emit the right shape per step, then ML on top of that becomes a reasonable second layer. Without the shape, the model is fitting noise.

Does this work for workflows that span across a supplier or 3PL system?

Yes, and that is the case the canonical shape was designed for. A typical mid-market scenario: your bot logs into a 3PL portal, downloads a CSV, switches to your SAP B1 instance, posts the rows as receipts. Each leg is a separate app_name. Each window is a separate window_title. The supplier patches their portal in week 6 and a field shifts. The element_path on that step fails across every run in the next 30 minutes, the SQL alert fires, the workflow is paused before the bad data lands in SAP. Without the canonical shape, you find this out from the AP team a day later. With it, the anomaly query is one SELECT.
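A sketch of that alert, straight off the step pool (the 30-minute window and failure threshold are illustrative):

-- The same element_path failing across many independent runs in a short
-- window points at an upstream UI change rather than a one-off flake.
SELECT element_path,
       COUNT(*)                   AS failures,
       COUNT(DISTINCT session_id) AS distinct_runs
FROM user_step_pool
WHERE workflow_id = $1
  AND succeeded = false
  AND created_at > now() - interval '30 minutes'
GROUP BY element_path
HAVING COUNT(*) >= 5;   -- threshold is illustrative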

What governs who can read the per-step rows?

Row-level security policies on user_step_pool (the same migration that creates the table) scope SELECT to auth.uid() = user_id or to a user's organization_id. Inserts are limited to auth.uid() = user_id. The schema also carries a retention default of 30 days on the expires_at column for active pool steps, with a cleanup_expired_pool_steps() function callable from cron. For SOC 2 customers, the policies and retention can be tuned per org; the audit shape of who-saw-what is itself a row in the same Postgres database.
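A sketch of what such policies look like in Supabase-flavored Postgres; the real policies live in the same migration and may differ in detail (the ::text cast, the policy names, and the omission of the organization_id path are assumptions here):

ALTER TABLE user_step_pool ENABLE ROW LEVEL SECURITY;

-- Reads scoped to the owning user. The cast assumes user_id is TEXT,
-- as in the table excerpt above; the real migration also covers the
-- organization_id path.
CREATE POLICY step_pool_select_own ON user_step_pool
    FOR SELECT
    USING (auth.uid()::text = user_id);

-- Inserts limited to the owning user.
CREATE POLICY step_pool_insert_own ON user_step_pool
    FOR INSERT
    WITH CHECK (auth.uid()::text = user_id);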

Can I see this shape without paying for a pilot?

Yes. The Terminator SDK is MIT-licensed at github.com/mediar-ai/terminator. The StepResult struct is in crates/executor/src/models/execution.rs. The telemetry plumbing is in crates/executor/src/telemetry.rs. The user_step_pool migration is in the public supabase/migrations directory of the same repo's web app. Clone it, run the executor against a workflow, point an OTLP-compatible collector at it, and you will see the spans and the per-step rows arrive without any Mediar cloud account. The cloud product wraps the same code with a managed Postgres, SOC 2 Type II controls, and a $0.75 per-minute runtime billing meter; the runtime emission shape is identical.
