An argument, with the executor source open
To automate SAP data entry, you mostly need to get the failure path right. The clicks are the easy part.
A team that has run an SAP automation queue in production for a quarter knows this in their bones, and a team that has not is about to find out: typing into F-02 is the cheap part of the system. Deciding what the queue does when the type fails, when SAP returns a status that does not match the recording, when the application server bounces during a save, that is the part that determines whether the automation pays back its build cost or whether it becomes a 4am pager. This page reads the exception-handling layer of a shipping Mediar SAP workflow as it actually exists in the executor crate, and argues that the failure taxonomy is the interesting object, not the click recording.
The thesis, plainly
Most existing material on this topic answers the question "how do I get a robot to type into SAP". That is the question to ask once. After it is answered, the question that decides whether your automation queue runs unattended or burns an analyst hour every morning is "how does the queue behave when the type fails".
The shape of an SAP queue is that ninety-something percent of runs succeed without anyone looking at them, and the bottom few percent contain the entire operating cost of the system. If those failures all wake a human, you have not automated SAP data entry; you have built a ticket factory. If those failures get retried indiscriminately, you eventually post a duplicate journal and have a real audit conversation. The middle path is a classifier that knows which kind of failure is which.
That classifier exists in the Mediar codebase as a 230-line Rust file at crates/executor/src/config/retry.rs. The rest of this page reads it.
The classifier, in one function
The function is called classify_error. It takes one argument, the lowercase error message, and returns one of three enum variants: Infrastructure, WorkflowLogic, or Unknown. Two arrays of substring patterns drive the match. There is no regex, no model, no protocol-specific wrapper. The whole policy is legible in a single screen of code.
That choice is deliberate. SAP returns text into a status bar, the Windows accessibility runtime returns text from a node, and the MCP transport layer returns its own. A classifier that operates on the flattened lowercase string handles all three without a wrapper per source. The cost is that adding a pattern is a code change, which is the right side of that trade for an executor that has to be conservative.
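A minimal sketch of that shape, with the pattern arrays abridged and the function body reconstructed from the description above rather than copied from retry.rs:

```rust
/// Sketch of the classifier described above; the pattern lists are abridged
/// and the real arrays live in crates/executor/src/config/retry.rs.
#[derive(Debug, PartialEq)]
enum ErrorCategory {
    Infrastructure, // transient: retry with backoff
    WorkflowLogic,  // SAP said no: never retry
    Unknown,        // novel: stop and escalate
}

const INFRASTRUCTURE_PATTERNS: &[&str] = &[
    "connection refused", "503", "mcp unavailable", "vm is down",
    "name resolution failed", "deadline exceeded", "out of memory",
];

const WORKFLOW_LOGIC_PATTERNS: &[&str] = &[
    "validation failed", "record not found", "permission denied",
    "duplicate", "step failed", "assertion failed", "parse error",
];

/// Substring match over the flattened, lowercased error string.
/// No regex, no model, no per-protocol wrapper.
fn classify_error(error: &str) -> ErrorCategory {
    // The caller passes an already-lowercased string; lowering here keeps the sketch standalone.
    let error = error.to_lowercase();
    if INFRASTRUCTURE_PATTERNS.iter().any(|p| error.contains(p)) {
        ErrorCategory::Infrastructure
    } else if WORKFLOW_LOGIC_PATTERNS.iter().any(|p| error.contains(p)) {
        ErrorCategory::WorkflowLogic
    } else {
        ErrorCategory::Unknown
    }
}
```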
- Infrastructure (retried, with backoff): connection refused, 503, MCP unavailable, VM is down, name resolution failed, deadline exceeded, out of memory. Each pattern is a substring inside the matched string. The retry schedule is 30 seconds, then doubled, capped at 600 seconds, up to three attempts on default config.
- WorkflowLogic (never retried): validation failed, record not found, permission denied, duplicate, step failed, assertion failed, parse error. Hammering the queue would not change the answer. The run is marked failed and surfaced; the next job does not start until a human acks it.
- Unknown (treated as do-not-retry): anything that did not match either pattern set. The conservative default is to stop. A half-posted journal is worse than no journal, and the cost of a manual triage is lower than the cost of a duplicate post you have to back out of the G/L.
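Continuing the sketch above, this is roughly how the executor can act on that classification. The config field is the max_infrastructure_retries knob named later on this page; the function name and struct shape are illustrative, not the verbatim crate API:

```rust
/// How the category maps to an action. ErrorCategory comes from the sketch
/// above; field and function names here are illustrative.
struct RetryConfig {
    max_infrastructure_retries: u32, // three attempts on the default config
}

fn should_retry(category: &ErrorCategory, attempts_so_far: u32, cfg: &RetryConfig) -> bool {
    match category {
        // Transient: retry silently with backoff, up to the cap.
        ErrorCategory::Infrastructure => attempts_so_far < cfg.max_infrastructure_retries,
        // SAP itself said no: retrying cannot change the answer.
        ErrorCategory::WorkflowLogic => false,
        // Novel: stop cold and hand the run to the escalation layer.
        ErrorCategory::Unknown => false,
    }
}
```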
The categories, mapped to SAP failures
Reading the classifier as patterns is one thing. Reading it as the failure modes a working SAP queue actually produces is another. Here is the map: every left-column entry below corresponds to a substring branch in retry.rs, and every right-column entry is the SAP-shaped failure that surfaces it.
| Failure taxonomy | What each branch means in the SAP context |
| --- | --- |
| Connection refused / reset | Classified Infrastructure. Retried with 30s, 60s, 120s backoff (capped 600s). Common when a SAP application server bounces. |
| 503 / 504 / VM is down | Infrastructure. Same retry posture. Distinguishes between the runtime VM and SAP's own gateway by inspecting the error string. |
| MCP service unavailable | Infrastructure. The runtime communicates with SAP through an MCP HTTP endpoint, and a missing endpoint is treated as transient. |
| Validation failed / invalid input | WorkflowLogic. Never retried. The journal entry has a bad value (a posting date in a closed period, an unknown company code). |
| Record not found / permission denied | WorkflowLogic. Never retried. SAP itself rejected the operation. Hammering the queue would not change the answer. |
| Step failed / assertion failed | WorkflowLogic. Never retried. The recording asserted a state SAP did not return (e.g. expected document number after save). |
| Element not found | Falls through to Unknown by default, which is a deliberate do-not-retry posture. The escalation layer takes over from here. |
| Anything else | Unknown. Treated as do-not-retry. Conservative on purpose: a half-posted journal is worse than no journal. |
One real run, traced through the layer
What the classifier does is easier to read in a trace than in prose. The output below is the shape of an SAP run that hits two different failure classes back to back: an Infrastructure error that the queue retries through, and an Unknown error that the queue refuses to retry and hands to the analyzer instead.
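A schematic of that trace, with step names, timings, and log wording invented for the illustration rather than taken from the executor's real output:

```text
step 07  fill F-02 header fields                    ok
step 08  press Save                                 error: connection reset by peer
         classify_error -> Infrastructure           retry 1/3 in 30s
step 08  press Save (retry 1)                       ok            <- nobody paged
step 09  assert status bar shows posted document    error: element not found
         classify_error -> Unknown                  no retry
         run marked failed -> analyze-error route   waiting on operator ack
```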
Two things to note. First, the retries on the Infrastructure branch are silent: nobody is paged for a connection reset that self-heals on the second attempt. Second, the Unknown branch does not retry at all. An element-not-found that appears once the connection is healthy again is the signal that something changed in SAP, and pounding the queue at it would not help. The executor stops, the analyzer takes over, and a human ack is required before the queue moves.
What an SAP error actually flows through
The runtime sits between the SAP-side error sources and three possible terminal outcomes. The diagram below is the live shape: three input channels surface text into a single classifier, and the classifier hands the run to one of three downstream paths. Two of those paths are silent; the third pages a human.
classify_error: one function, three terminal paths
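A text approximation of that shape, with the labels taken from the surrounding prose:

```text
SAP GUI status bar text ---+
accessibility tree text ---+--> classify_error (crates/executor/src/config/retry.rs)
MCP transport errors    ---+          |
                                      +--> Infrastructure --> backoff and retry, nobody paged
                                      +--> WorkflowLogic  --> run marked failed, surfaced, queue waits for ack
                                      +--> Unknown        --> run marked failed, analyzer note, queue waits for ack
```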
The diagram exists because the two bits of code that decide which path a failure takes (the classifier in retry.rs, the analyzer route in apps/web/src/app/api/internal/analyze-error/route.ts) live in different services in the monorepo, and reading them in isolation hides the fact that they form one decision surface. Together they define the policy a SAP automation team operates under in production.
The model only enters on the failure path
The escalation layer runs in a separate route at apps/web/src/app/api/internal/analyze-error/route.ts. When the deterministic retries are exhausted on a SAP run, the executor posts the failed execution's error, logs, and result payload to that route. The route loads gemini-2.5-flash through Vertex AI, hands it a prompt scoped to the OneDrive-to-SAP workflow, and asks for a structured analysis with four named sections: Root Cause, Impact, Solution, and Prevention.
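The handoff itself is a plain HTTP POST from the executor's side. A sketch of what that can look like, where the field names and the exact request shape are assumptions (the page only states that the error, logs, and result payload are sent):

```rust
use serde::Serialize;

/// Hypothetical shape of the escalation payload. The route receives the
/// failed execution's error, logs, and result; the field names here are
/// assumptions, not the route's documented contract.
#[derive(Serialize)]
struct AnalyzeErrorRequest {
    execution_id: String,
    error: String,             // the string classify_error saw
    logs: Vec<String>,         // the frozen execution trace
    result: serde_json::Value, // whatever the failed step returned
}

async fn escalate(base_url: &str, req: &AnalyzeErrorRequest) -> Result<(), reqwest::Error> {
    // The analyzer writes a triage note to the dashboard; it never clicks
    // anything in SAP, so this call is fire-and-record, not a retry path.
    reqwest::Client::new()
        .post(format!("{base_url}/api/internal/analyze-error"))
        .json(req)
        .send()
        .await?
        .error_for_status()?;
    Ok(())
}
```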
The prompt itself names six common failure categories that the model is told to look for. Reading the prompt as a checklist of what an SAP automation actually fails on in the field is the most direct way to ground the categories above:
From the prompt sent to gemini-2.5-flash
- MCP connection timeouts (the runtime talks to SAP through an HTTP MCP endpoint).
- SAP session timeouts or login failures.
- Element not found errors (SAP UI changes).
- Data format mismatches (dates, amounts, account codes).
- Network connectivity issues.
- Azure VM/service availability.
Verbatim categories from the prompt body, lines 85 through 92 of the route file. The prompt also requires the model to name which exact MCP endpoint is failing and which SAP screen or element is problematic, which is what makes the resulting note useful at 3am instead of generic.
Two things make this design choice load-bearing. The model runs after the deterministic retries, not before any of them. And the model writes a note that lands in the dashboard next to the failure, but it does not click anything in SAP. It cannot retry a journal post on its own. The human is still the only path back into the queue, and that is on purpose: a regulated G/L is not the place to discover what an autonomous agent will do under pressure.
Why volatile-attribute stripping is the SAP-specific trick
One pattern in the classifier is conspicuously absent: position changed. SAP repaints constantly. A support pack nudges a field down a row, a screen variant rearranges a tab, a customer-specific variant adds a column. Selector-based RPA categorizes most of those as element not found because the recorded selector tied itself to a coordinate or a tree position. The result is a queue that fails for cosmetic reasons and pages a human for nothing.
The Mediar runtime sidesteps this with a small file at apps/desktop/src-tauri/src/dom_tree_diff.rs. Before two captured trees are compared, the function remove_volatile_dom_attributes walks the JSON and drops every x, y, width, and height field, plus the value field on input elements (which is captured separately in the event stream and would otherwise dominate the diff with noise). What is left is the structural shape: roles, ids, names, types, text content.
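A sketch of that walk over a serde_json tree. The attributes being dropped come from the description above; how an input element is recognized is an assumption made for the sketch:

```rust
use serde_json::Value;

/// Recursively drop layout coordinates everywhere, and the value field on
/// input elements, so the diff only sees structural change. A sketch of the
/// behavior described above, not the verbatim dom_tree_diff.rs.
fn remove_volatile_dom_attributes(node: &mut Value) {
    match node {
        Value::Object(map) => {
            for key in ["x", "y", "width", "height"] {
                map.remove(key);
            }
            // Input values are captured separately in the event stream and
            // would otherwise dominate the diff with noise. Recognizing an
            // input by its role field is an assumption for this sketch.
            if map.get("role").and_then(Value::as_str) == Some("input") {
                map.remove("value");
            }
            for child in map.values_mut() {
                remove_volatile_dom_attributes(child);
            }
        }
        Value::Array(items) => {
            for item in items {
                remove_volatile_dom_attributes(item);
            }
        }
        _ => {}
    }
}
```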
The SAP-specific consequence is that a layout shuffle does not fire a diff, which means it does not produce an Unknown error, which means it does not page anyone. The classifier never sees it. The queue keeps moving. That is the part of the system that makes the math work for an SAP shop running 200 posts a day on a vendor cadence that ships meaningless layout updates every two months.
What this buys an SAP team in practice
The argument has been technical so far. The operational version is shorter. An SAP automation queue with this exception-handling layer behaves like a junior analyst who follows three rules: retry the network quietly and never tell anyone, escalate anything SAP itself rejected, and stop cold on anything novel. That is the shape that turns an SAP automation pilot into a system the finance team trusts to run unattended overnight, because it surfaces the failures that matter and silences the noise without hiding the dangerous ones.
What it does not buy you is autonomy on the failure path. Designing for autonomy on a journal entry that touches the G/L would be a different product, and it would be a worse one for the regulated SAP buyer. The point of the layer is to make the remaining human attention as cheap as possible, not to remove it.
The honest place to start, if this argument is interesting, is to bring one SAP transaction and one realistic failure mode and watch the layer work on it.
When this design is the wrong fit
A few cases where the conservative do-not-retry posture is not what you want.
Read-only SAP scraping. If the workflow only reads from SAP (a daily extract for a dashboard), retrying liberally on Unknown errors is fine and the conservative default is overcautious. The classifier is configurable; bumping max_infrastructure_retries on a read workflow is a one-line change.
Idempotent posting against a unique key. If the upstream system supplies a deterministic external key and SAP rejects duplicates on it, an aggressive retry is safe. Most journal entries are not in this shape, but a few line-item interfaces are. For those, the do-not-retry default is leaving throughput on the table.
High-volume bulk loads. A 40,000-row chart-of-accounts load is what LSMW or a properly tuned BAPI queue is built for. The exception-handling shape on this page is built for the steady stream of event-driven posts (a PDF arrives, post it; a row appears, post it), and it is honest to say it is not the right answer for the once-a-quarter migration.
Bring one SAP failure mode and watch the queue handle it
On a 30-minute call we record one transaction against your test SAP system, then deliberately break it the way it actually breaks in production. You leave with the classifier output, the analyzer note, and a checked-in workflow file you can run yourself.
Frequently asked questions
Why does the retry classifier hard-code substring patterns instead of HTTP codes?
Because most of the errors that surface here are not HTTP errors. SAP GUI returns text into a status bar ("Posting period 2024 closed", "Document number range exhausted"), the Windows accessibility runtime returns text from a node ("Element not found"), and an MCP transport layer returns its own text. A single classifier that operates on the lowercase error string handles all three sources without a wrapper per protocol. The cost is that adding a new pattern is a one-line code change rather than a config edit, which is the right side of that trade for an executor that cares about not retrying the wrong thing.
What does the actual backoff look like for a SAP queue?
The default is 30 seconds, 60 seconds, 120 seconds, 240 seconds, and so on, doubled until capped at 600 seconds. The default max is three Infrastructure retries before the run is marked failed. So an SAP application server that hiccups for two minutes is invisible: the queue waits, retries, and posts. A SAP system that is down for a planned outage produces three retries inside about three and a half minutes, then a clean failure that the dashboard can group into one alert. The numbers come straight out of the RetryConfig::default impl in the same file (initial_delay_secs: 30, max_delay_secs: 600, backoff_multiplier: 2.0).
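A sketch of that arithmetic, using the constants quoted from the default impl (the function shape is illustrative; only the numbers come from the file):

```rust
/// Backoff schedule from the defaults quoted above: initial_delay_secs: 30,
/// max_delay_secs: 600, backoff_multiplier: 2.0, and three Infrastructure
/// attempts. The function is a sketch of the schedule, not the crate's code.
fn retry_delay_secs(attempt: u32) -> u64 {
    let initial_delay_secs = 30.0_f64;
    let max_delay_secs = 600.0_f64;
    let backoff_multiplier = 2.0_f64;
    (initial_delay_secs * backoff_multiplier.powi(attempt as i32)).min(max_delay_secs) as u64
}

fn main() {
    // Three attempts: 30s, 60s, 120s of waiting, about three and a half
    // minutes before the run is marked failed and grouped into one alert.
    for attempt in 0..3 {
        println!("retry {} after {}s", attempt + 1, retry_delay_secs(attempt));
    }
}
```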
What runs after the deterministic retries are exhausted?
A separate API route, apps/web/src/app/api/internal/analyze-error/route.ts, hands the failed execution to gemini-2.5-flash with a prompt scoped to the OneDrive-to-SAP workflow. The prompt asks for Root Cause, Impact, Solution, and Prevention sections, and explicitly seeds the model with six common categories: MCP connection timeouts, SAP session timeouts or login failures, element not found errors (SAP UI changes), data format mismatches (dates, amounts, account codes), network issues, and Azure VM availability. The model writes a markdown analysis that lands in the dashboard next to the failure, so the on-call operator sees a triaged note rather than a raw stack trace at 3am.
Why a model only after retry, not in the loop?
Two reasons. First, money: the executor processes a SAP post in 25 to 60 seconds, and gating every step on a model round trip would blow past that budget. Second, determinism: the same input has to produce the same posted document or the audit story falls apart. The model is on the failure path, not the success path. Two identical PDFs produce two identical posts; only the divergent run pays for the analysis.
Does this work for non-OneDrive SAP triggers too?
The classifier and retry layer are workflow-agnostic. They sit in crates/executor and operate on the error string regardless of what triggered the run. The gemini-2.5-flash analyzer is currently scoped to the OneDrive-to-SAP prompt, which is the highest-volume SAP workflow we run today. Other SAP triggers (a row in a Postgres queue, a webhook from a POS, an attachment in an inbox) reuse the classifier as-is and either inherit a generic analyzer prompt or get a workflow-specific prompt added when the volume justifies it.
How does the runtime tell an SAP UI tweak apart from a real failure?
The DOM tree diff helper at apps/desktop/src-tauri/src/dom_tree_diff.rs is the relevant code. Before comparing two captured trees, it strips x, y, width, and height from every node, plus the value attribute on input elements (which is noisier than it is informative). What survives the strip is the structural shape: roles, ids, names, types, and text content. An SAP support pack that nudges a field down a row triggers no diff because the fields it cares about are the same. A support pack that renames a label or removes a field shows up as a real change, and that flagged change is what feeds into the recording-side fix queue rather than the live execution path.
What happens when the queue gives up on a journal entry?
stop_on_error defaults to true on SAP workflows. The execution stops on the failed step, the run state is marked failed, the dashboard surfaces it, and the gemini-2.5-flash analyzer attaches its triage note. The next item in the queue does not start until a human acks the failure. That posture is louder than "retry forever" or "silently move on", which is the whole point: a half-posted journal flowing into the G/L is the worst possible outcome and the queue is configured so you cannot reach it without an explicit operator decision.
Where can I read the classifier myself?
The Terminator SDK and the executor crate are open source under MIT at github.com/mediar-ai/terminator. The classifier we have been describing is in crates/executor/src/config/retry.rs. The DOM tree diff is in apps/desktop/src-tauri/src/dom_tree_diff.rs in the same monorepo. The gemini-2.5-flash route is part of the commercial cloud product and is not in the open-source repo, but the prompt structure is straightforward to reproduce on top of any inference SDK against the same workflow output. A team running Terminator standalone gets the deterministic classifier and supplies its own escalation layer.
Is the runtime fully autonomous?
No, and this matters. The model writes the workflow during the offline recording-processing pass. The runtime that types into SAP is deterministic Rust with no LLM call sites in crates/executor. The only model in the live loop is the post-failure analyzer, and even that one runs against a frozen execution trace, not against a live SAP session. "Fully autonomous" overclaims a different shape of system; what we ship is a recording-driven workflow with a triage assistant on the failure path.
How do you price the failed runs?
Runtime is billed at $0.75 per minute regardless of outcome, because the SAP session and the desktop agent ran for that minute. The gemini-2.5-flash analyzer call is included in the platform fee, not metered separately. A run that fails after 20 seconds and takes another 10 seconds to triage costs you 30 cents in runtime and zero additional inference. The economics make the conservative do-not-retry posture cheap: a failed run that you investigate once is materially cheaper than a runaway retry loop pounding a SAP system that has already declined the post.
Three more pieces of the same architecture
Adjacent reading
SAP data entry automation: one journal entry, traced from PDF to F-02
The field-level companion to this piece. Traces the four-field zod schema for one shipping workflow from OneDrive trigger to the F-02 status bar.
Where the AI in Mediar AI actually lives (and where it does not)
The model writes the workflow once during recording. The runtime is a Rust binary calling Windows accessibility APIs with zero LLM calls in the hot path. Read alongside this page to see where the analyzer model fits in.
Meaning of robotic process automation, word by word
Why the same phrase covers a brittle selector tree and an AI-recorded workflow file, and which one a regulated SAP team can sign off on.