A walkthrough of the four T-codes
SAP order-to-cash automation is four transaction codes and one selector shape.
The honest answer to “how do you automate SAP O2C” is short: drive the four classic transaction codes by accessibility role and name, not by pixel coordinates or fragile XPaths. Use VA01 to enter the sales order, VL01N to create the outbound delivery and post goods issue, VF01 to create the billing document, and F-28 to post the incoming payment that clears it. Every click in a recorded run carries two selectors: a chained primary that walks process to window to field, and a generated scoped fallback that drops the process anchor. Targeting by role and name is what survives an SAP GUI theme repaint. Dropping the process anchor is what survives a move from SAP GUI for Windows to SAP GUI for HTML.
Direct answer (verified 2026-05-03)
The shipping shape of an SAP O2C automation is a recording that drives four T-codes in order: VA01 (Create Sales Order), VL01N (Create Outbound Delivery with Reference), VF01 (Create Billing Document), and F-28 (Post Incoming Payment). Each click in the recording targets a UI element by accessibility role and name, with a chained primary selector and a generated scoped fallback. The implementation that emits both selectors is open source at github.com/mediar-ai/terminator, in apps/desktop/src-tauri/src/mcp_converter.rs.
Why a transaction-code spine and not a process diagram
Every guide on this topic opens with a box diagram that says “Sales order → Delivery → Invoice → Cash application”. The shape of the diagram is fine. The problem is that the diagram does not say what gets clicked. An automation that ships has to drive the actual screens an operator drives, and on a steady-state SAP tenant those screens are reachable by transaction code, either from the SAP Easy Access menu or from the command field in the standard toolbar.
The four T-codes below are the ones that absorb the bulk of human time in an O2C cycle on SAP ECC and S/4HANA. Each one is an SAP GUI screen with named fields and a status bar that returns a document number. The recorder targets each field by role and name. The execution is deterministic. The output of one screen feeds the next.
The rest of this page walks each T-code, then shows the selector shape, then admits the cases where this approach is not the right answer.
The four T-codes, in order
VA01: Create Sales Order
The trigger is usually a customer PO arriving as a PDF. A document model parses the PDF into a typed object: order type, sales organization, distribution channel, division, sold-to party, ship-to party, PO reference, and a list of line items with material, quantity, and price. The recording opens VA01, types the header on the initial screen, presses Enter to advance to the order overview, and types each line item in the items grid. On save, the status bar returns “Standard Order 0000010247 has been saved.” The recording reads that message by role and text, parses out the document number, and holds it for the next step.
The validation shape of the parsed object catches malformed inputs before they touch the screen: a sold-to that is not in the customer master, a posting date in a closed period, a material that does not exist for the plant. SAP itself rejects these too, but rejecting on the input side avoids a half-typed order on the screen and a clean-up step.
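A minimal Python sketch of that input-side check, assuming the parsed PO arrives as a plain dataclass and that the customer master, the open posting periods, and each plant's materials have already been fetched into in-memory sets. The class and function names (SalesOrderInput, LineItem, validate_order) are hypothetical, and plant and posting_date are included only because the checks need them; the page describes the real check as a typed schema, not this code.

```python
from dataclasses import dataclass, field
from datetime import date
from decimal import Decimal


@dataclass
class LineItem:
    material: str
    plant: str
    quantity: Decimal
    price: Decimal


@dataclass
class SalesOrderInput:
    # Hypothetical shape of the parsed customer PO: header fields plus items.
    order_type: str
    sales_org: str
    distribution_channel: str
    division: str
    sold_to: str
    ship_to: str
    po_reference: str
    posting_date: date
    items: list[LineItem] = field(default_factory=list)


def validate_order(order, customer_master, open_periods, materials_by_plant):
    """Return a list of problems; an empty list means the order may be typed into VA01.

    customer_master: set of known customer numbers
    open_periods: set of (year, month) tuples that still accept postings
    materials_by_plant: dict mapping plant -> set of material numbers
    """
    problems = []
    if order.sold_to not in customer_master:
        problems.append(f"sold-to {order.sold_to} is not in the customer master")
    if (order.posting_date.year, order.posting_date.month) not in open_periods:
        problems.append(f"posting date {order.posting_date} falls in a closed period")
    if not order.items:
        problems.append("order has no line items")
    for n, item in enumerate(order.items, start=1):
        if item.material not in materials_by_plant.get(item.plant, set()):
            problems.append(f"item {n}: material {item.material} does not exist for plant {item.plant}")
        if item.quantity <= 0:
            problems.append(f"item {n}: quantity must be positive")
    return problems
```

Collecting every problem instead of stopping at the first one lets a single rejection report everything wrong with the PDF.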
VL01N: Create Outbound Delivery with Reference
VL01N is the first T-code that consumes the output of the previous one. The initial screen takes a shipping point and the sales order number from VA01. The next screen is the delivery overview, with a tab strip (Item Overview, Picking, Loading, Transport, Administration) and a line-item grid. The recording reaches each grid cell by role and column header rather than by column position, which is what survives a release that reorders columns. For a make-to-stock case, picked quantity defaults to ordered quantity, so the recorded step types nothing and proceeds. Post Goods Issue is a single click on the button with role:Button and text:Post Goods Issue.
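The column-header lookup can be sketched over a hypothetical grid snapshot (a header list plus rows). The snapshot shape and the find_cell name are illustrative, not the runtime's accessibility API. The point is that the column index is computed from the header text at playback time, so reordering columns between releases does not change which cell is returned.

```python
def find_cell(grid, item_number, header_text, item_header="Item"):
    """Return the cell for item_number under the column titled header_text.

    grid is a hypothetical snapshot: {"headers": [...], "rows": [[...], ...]}.
    Columns are located by header text at call time, never by a stored index.
    """
    headers = grid["headers"]
    for needed in (item_header, header_text):
        if needed not in headers:
            raise LookupError(f"no column titled {needed!r}")
    item_col = headers.index(item_header)
    col = headers.index(header_text)
    for row in grid["rows"]:
        if row[item_col] == item_number:
            return row[col]
    raise LookupError(f"no row for item {item_number!r}")


# Two releases with the same data but a different column order.
release_a = {"headers": ["Item", "Material", "Plant", "Storage location", "Picked qty"],
             "rows": [["10", "MAT-1", "1000", "0001", "5"]]}
release_b = {"headers": ["Item", "Picked qty", "Material", "Storage location", "Plant"],
             "rows": [["10", "5", "MAT-1", "0001", "1000"]]}
```

Both snapshots return the same Plant and Picked qty values for item 10, which is the property a position-based lookup would lose.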
VF01: Create Billing Document
VF01 is the shortest of the four. The initial screen takes a delivery (or order) reference. The recorder types the delivery number from VL01N, presses Enter, and the system pulls the line items, prices, and tax conditions from the delivery and the pricing procedure. The save click posts the billing document and writes it to FI as a customer open item. The status bar returns “Document 90123456 has been saved.” That number is parsed and held for F-28.
F-28: Post Incoming Payment
F-28 is where the cycle closes. The initial screen takes the bank account, the document date, the posting date, the company code, and the amount of the inbound wire. The recorder types each field, then enters the customer account, then advances to the open-item selection screen. On a single-item match the recorder selects the billing document by ID, confirms the residual is zero, and posts. The status bar returns the clearing document number, which is written to the cycle's audit row alongside the bank reference, the wire amount, and the posting date.
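A sketch of the single-item check, assuming open items arrive as dicts with hypothetical doc and amount keys, and that amounts are Decimal so a zero residual is an exact comparison rather than a floating-point one.

```python
from decimal import Decimal


def pick_single_item(open_items, billing_doc, wire_amount):
    """Return the open item for billing_doc if the wire clears it with zero residual.

    open_items: list of dicts with hypothetical "doc" and "amount" (Decimal) keys.
    Raises instead of guessing, so anything but an exact single match is routed
    to the allocation path rather than posted.
    """
    matches = [item for item in open_items if item["doc"] == billing_doc]
    if len(matches) != 1:
        raise LookupError(f"expected one open item for {billing_doc}, found {len(matches)}")
    residual = matches[0]["amount"] - wire_amount
    if residual != 0:
        raise ValueError(f"residual {residual} is not zero")
    return matches[0]
```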
On a partial-allocation case (one wire, three open items, an amount that is slightly off), the recorder does not invent the allocation. An upstream matching pass produces a proposal with allocated amounts and residuals, and the F-28 recording takes that proposal as a structured input and drives the screen to apply it. The apply happens on the screen because the apply button is on the screen.
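A sketch of the sanity checks a recorded F-28 pass might run on an incoming proposal before applying it. The Allocation type and check_proposal are hypothetical. The matching logic that produces the proposal is deliberately out of scope, as described above; this code only refuses a proposal that cannot be posted as stated.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Allocation:
    billing_doc: str
    open_amount: Decimal
    allocated: Decimal

    @property
    def residual(self):
        return self.open_amount - self.allocated


def check_proposal(wire_amount, allocations):
    """Validate an upstream allocation proposal; return residuals per billing document.

    Allocations must sum exactly to the wire, and each one must be positive
    and no larger than the open amount it is applied to.
    """
    total = sum((a.allocated for a in allocations), Decimal("0"))
    if total != wire_amount:
        raise ValueError(f"allocations sum to {total}, wire is {wire_amount}")
    for a in allocations:
        if not Decimal("0") < a.allocated <= a.open_amount:
            raise ValueError(f"allocation for {a.billing_doc} is out of range")
    return {a.billing_doc: a.residual for a in allocations}
```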
The four T-codes, threaded into one cycle
A clean cycle reads like a sequence diagram. The clerk drops a PDF into the inbox; the runtime drives four T-codes in order; four document numbers come back; the audit row is written.
One cycle: PDF in, four documents out
The sequence is interesting only because the typing in the middle is deterministic. Every click is a role-and-name selector against the accessibility tree of the SAP GUI window. There is no pixel matcher and there is no LLM in the hot path.
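The hand-off between the four T-codes can be sketched as a plain function. Here drive is a hypothetical stand-in for the runtime: it plays one recorded T-code with the given inputs and returns the document number read from the status bar. The input key names are illustrative.

```python
def run_cycle(po, wire, drive):
    """Thread document numbers through the four T-codes and return the audit row.

    drive(tcode, inputs) is a hypothetical stand-in for the runtime. Each call's
    output is the next call's input, so a failure stops the chain at that step.
    """
    sales_order = drive("VA01", {"po": po})
    delivery = drive("VL01N", {"sales_order": sales_order})
    billing_doc = drive("VF01", {"delivery": delivery})
    clearing_doc = drive("F-28", {"billing_doc": billing_doc, "wire": wire})
    return {
        "sales_order": sales_order,
        "delivery": delivery,
        "billing_doc": billing_doc,
        "clearing_doc": clearing_doc,
    }
```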
The selector shape, on one click
The point of the page. Every click step in the recording carries two selectors. The primary is a chained selector that walks process to window to field. The fallback is a generated scoped selector that drops the process anchor and starts at the window. Both target the same field. The primary times out at 3000 ms before the fallback is tried.
Below is one step in a real recording, described field by field: the Sold-to party field on the VA01 initial screen.
The shape, in words
- Primary (chained): `process:saplogon.exe >> role:Window && text:Create Sales Order: Initial Screen >> role:Edit && text:Sold-to party`
- Fallback (scoped): `role:Window && text:Create Sales Order: Initial Screen >> role:Edit && text:Sold-to party`
- Operator: the `>>` separates a parent scope from a child match. Each segment is a role-and-text predicate joined by `&&`. The selector engine resolves left to right.
- Timeout: 3000 ms on the primary. If the primary fails to resolve, the fallback is tried with the same budget.
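A minimal sketch of the selector grammar described in that list: `>>` splits scopes, `&&` joins predicates, and each predicate is `key:value` split at its first colon, so a window title that contains a colon survives. The scoped_fallback helper derives the fallback by dropping a leading `process:` segment. This only illustrates the shape; the Rust converter generates its fallback from the recorded element (generate_scoped_selector), not by string manipulation, and this sketch does not handle values that themselves contain `>>` or `&&`.

```python
def parse_selector(selector):
    """Split 'a:b && c:d >> e:f' into [[('a','b'), ('c','d')], [('e','f')]].

    Each predicate is split at its first colon, so a window title such as
    'Create Sales Order: Initial Screen' keeps its own colon.
    """
    segments = []
    for raw_segment in selector.split(">>"):
        predicates = []
        for raw_predicate in raw_segment.split("&&"):
            key, _, value = raw_predicate.strip().partition(":")
            predicates.append((key.strip(), value.strip()))
        segments.append(predicates)
    return segments


def scoped_fallback(selector):
    """Drop a leading process: segment and keep the window >> field chain."""
    segments = [s.strip() for s in selector.split(">>")]
    if segments and segments[0].startswith("process:"):
        segments = segments[1:]
    return " >> ".join(segments)
```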
Emission lives in apps/desktop/src-tauri/src/mcp_converter.rs. The chained-vs-scoped branch is around lines 553 to 609. The scoped generator (generate_scoped_selector, generate_window_selector, generate_element_selector) is around lines 1772 to 1888.
What the fallback actually catches
Three real cases show what the fallback catches and where it stops. Each one is a single difference between the recording moment and the playback moment.
One: SAP GUI for Windows to SAP GUI for HTML. A team records on the Windows client (saplogon.exe) and later moves a fleet of operators to SAP GUI for HTML in S/4HANA. The window title (Create Sales Order: Initial Screen) is identical in both, but the host process is now Chrome or Edge. The chained primary breaks at the first segment. The scoped fallback drops the process anchor and starts at role:Window with the title; it resolves.
Two: an SAP GUI for Windows version upgrade. A future client release that runs under a different host process name (saplogon64.exe, sapgui.exe, and so on) breaks any selector tied to the process. The fallback does not name the process and is unaffected.
Three: a parent title shift on a future SAP release. The window title for an SAP transaction is generally stable, but a customer-modified screen title or a localized title on a non-English tenant can shift in a way the chained primary did not anticipate. The fallback also anchors on the title, so it does not help here; this is the case the recording side has to fix. Detecting it once at recording time, rather than silently misrouting clicks at runtime, is the deliberate design choice.
The fallback does not attempt to be clever. It does one specific thing (drop the process anchor) for one specific class of failure (a process binary or host change). That is what makes it cheap and predictable.
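A sketch of how a two-tier step resolves, replaying case one against a hypothetical in-memory tree: the step was recorded on saplogon.exe, but the window now lives under chrome.exe. The tree layout, the step dict keys, and the polling loop are assumptions; only the order (primary first, then the fallback with the same budget) comes from this page. The block includes its own minimal selector parser so it runs on its own.

```python
import time


def parse(selector):
    # Minimal parser: '>>' splits scopes, '&&' joins predicates, 'key:value' splits at the first colon.
    return [
        [tuple(s.strip() for s in pred.strip().split(":", 1)) for pred in seg.split("&&")]
        for seg in selector.split(">>")
    ]


def walk(nodes):
    # Depth-first over a list of nodes and all their descendants.
    for node in nodes:
        yield node
        yield from walk(node.get("children", []))


def resolve(segments, roots):
    # First segment may match anywhere; each later segment must match inside the previous hit.
    candidates = list(walk(roots))
    hit = None
    for predicates in segments:
        hit = next((n for n in candidates if all(n.get(k) == v for k, v in predicates)), None)
        if hit is None:
            return None
        candidates = list(walk(hit.get("children", [])))
    return hit


def resolve_with_budget(selector, roots, budget_s, poll_s=0.05):
    # Poll until the selector resolves or the budget runs out.
    deadline = time.monotonic() + budget_s
    segments = parse(selector)
    while True:
        hit = resolve(segments, roots)
        if hit is not None or time.monotonic() >= deadline:
            return hit
        time.sleep(poll_s)


def resolve_step(step, roots, budget_s=3.0):
    # Primary first; the fallback gets the same budget only if the primary misses.
    for tier in ("primary", "fallback"):
        hit = resolve_with_budget(step[tier], roots, budget_s)
        if hit is not None:
            return tier, hit
    return None, None
```

Against a tree where the window sits under chrome.exe, the primary misses at its first segment and resolve_step returns the fallback tier; against the original saplogon.exe tree, the primary resolves and the fallback is never tried.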
What stays stable across an SAP release, and what does not
The reason the role-and-name approach holds up is that SAP GUI keeps its accessibility surface stable across releases, because that surface is what assistive technology (screen readers, keyboard-only navigation) depends on. Field roles and field names on classic transaction screens like VA01 are part of that contract. The position of a field on screen is not. Pixel coordinates are not. Cell widths inside a grid are not.
The runtime captures the accessibility tree at recording time and again at playback time. Volatile attributes (x, y, width, height, and the value attribute on inputs at the moment of capture) are stripped before comparison. The roles, names, and tab orders are what the recording matches against. A theme repaint that moves a field down a row produces no diff. A field rename produces a real diff and is flagged at the recording side.
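A sketch of the volatile-attribute stripping over hypothetical dict-shaped nodes (the tab_index key in the example data is an assumption). Removing x, y, width, height, and value before comparison means a repaint that only shifts geometry compares equal, while a changed name does not.

```python
VOLATILE = {"x", "y", "width", "height", "value"}


def normalize(node):
    """Copy a node without volatile attributes, recursing into children."""
    kept = {k: v for k, v in node.items() if k not in VOLATILE and k != "children"}
    kept["children"] = [normalize(child) for child in node.get("children", [])]
    return kept


def has_real_diff(recorded, current):
    """True when roles, names, tab order, or structure differ after stripping."""
    return normalize(recorded) != normalize(current)
```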
When this approach is the wrong answer
Three honest cases.
A tenant fully on EDI plus a clean BAI2 file every morning. If every customer is on EDI 850/856/810/820 and the bank file auto-matches every wire, the IDoc layer and the standard cash application job are doing the work. The recorded line has very little to do.
A one-time migration of forty thousand sales orders. That is what LSMW or the S/4HANA Migration Cockpit is built for. The recorder is for steady-state event-driven O2C cycles, one PDF and one wire at a time. It is not the right shape for a once-a-quarter bulk load.
A tenant where O2C lives in S/4HANA Fiori transactional apps with a coherent OData layer. Some Fiori apps expose a clean API and the right answer is to call that API. The recorder is the answer for the SAP GUI for Windows or SAP GUI for HTML surface, not for every screen SAP ships.
Bring one O2C cycle and we will record VA01 to F-28 against your tenant
On a 30-minute call we open your SAP GUI window, record the four T-codes against one of your real customers, and hand back a checked-in workflow file you can run yourself. The selector shape is the part you can read first.
Frequently asked questions
Why the four T-codes specifically? Where do VOFM, VKM1, FBL5N, OVA8 fit?
VA01, VL01N, VF01, and F-28 are the four screens that every O2C cycle touches at a company that ships physical goods. VOFM (formula maintenance) is configuration, not transaction work; it happens once per pricing change. VKM1 (credit hold release) is part of the cycle but only fires for orders blocked by the credit limit check, which is a fraction of total volume. FBL5N (customer line item display) is a read screen, not a posting screen, and its automation case is reporting rather than O2C. OVA8 (automatic credit control settings) is configuration. The four T-codes are the part whose recording compounds across cycles. Anything else is conditional, low-frequency, or read-only and can be added once the spine is shipping.
Does this work the same on SAP GUI for Windows and SAP GUI for HTML in S/4HANA?
Largely. The role-and-name selector shape is the same on both; only the process segment differs. SAP GUI for Windows exposes its surface via Microsoft UI Automation, the standard Windows accessibility framework that screen readers like NVDA and JAWS use. SAP GUI for HTML renders the same screens inside a browser pane, and Microsoft UI Automation walks the embedded document tree the same way it walks a native window. The role names line up: an Edit field is role:Edit in both, a Window is role:Window in both, a button is role:Button in both. The difference is in the first segment of the chain. On the Windows client the parent is process:saplogon.exe with the screen title (`Create Sales Order: Initial Screen`). On HTML it is process:chrome.exe (or msedge.exe, etc.) with the same screen title nested inside the browser tab. A chained primary recorded on one client therefore breaks at its process segment on the other; the scoped fallback handles both because it parent-scopes the field by the screen title, not by the binary that hosts it.
What does the two-tier selector actually look like, and where in the code does it get emitted?
Each click step in a recording carries two selectors. The primary is a chained selector that walks from the process down through the window down to the field, separated by `>>`: `process:saplogon.exe >> role:Window && text:Create Sales Order: Initial Screen >> role:Edit && text:Sold-to party`. The fallback is a generated scoped selector that drops the process anchor and starts at the window: `role:Window && text:Create Sales Order: Initial Screen >> role:Edit && text:Sold-to party`. The converter writes both into the JSON step. If the chained selector exists at recording time it becomes the primary; the generated scoped selector becomes the fallback. The implementation lives in apps/desktop/src-tauri/src/mcp_converter.rs around the chained-vs-scoped branch (lines 553 to 609 in the open-source repo).
Why two selectors instead of one? What does the fallback actually catch?
Two real cases, plus one it deliberately does not cover. First, an SAP GUI for Windows version upgrade can change how the host process exposes itself; the chained selector starts at process:saplogon.exe, but a future client might run under a different host name, in which case the chain breaks before reaching the window. The fallback drops the process anchor and starts at role:Window, which still resolves. Second, when a customer moves the same recording from the Windows client to SAP GUI for HTML, the window is now nested in a Chrome tab, so the process binary is different but the window title is identical; the fallback resolves. Third, a future SAP release, a customer modification, or a localization that retitles the screen breaks both selectors, because the scoped fallback also anchors on the window title; that change shows up as a real diff and is fixed on the recording side rather than guessed at runtime. The primary times out at 3000 ms before the fallback is tried. In practice the primary resolves in tens of milliseconds and the fallback is rarely invoked, but it is the part that keeps the recording alive across an SAP GUI client upgrade or a move to SAP GUI for HTML.
What about VL01N's create-with-reference flow? It is not a flat form like VA01.
VL01N starts with the shipping point and the sales document number on the initial screen, then routes to the picking and goods-issue screens. The recording captures three screen contexts in one T-code: the initial screen (shipping point and order number), the delivery overview (a tab strip plus a line-item grid), and the Post Goods Issue button. Each click in the overview grid is recorded with the role of the cell (role:DataItem inside role:DataGrid) plus the column header text (Plant, Storage location, Picked qty), which is what survives a release that nudges the column order. The Post Goods Issue button has its own selector with role:Button and text:Post Goods Issue. The same two-tier shape applies to each click: chained primary, scoped fallback.
How does the runtime handle the SAP GUI status bar, where SAP returns the document number you need to feed into the next T-code?
The status bar is a real accessibility element with role:StatusBar and a text child that contains the message (`Standard Order 0000010247 has been saved`). The recording captures the status text with a role-and-name selector and parses the document number out with a small regex in a zod schema. That parsed value becomes the input for the next T-code's input schema. Because the status bar is addressable by role, the parser does not depend on a screenshot or a known pixel region, and a theme change that recolors the status bar does not affect it.
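A Python sketch of that status-bar parse, using the standard re module in place of a zod schema. Zod is a TypeScript library, so this translates the idea rather than reproducing the runtime's code. It only covers the “… has been saved” wording quoted on this page; messages with other wording need their own pattern.

```python
import re

# Matches the "<object> <number> has been saved" wording quoted on this page.
SAVED_MESSAGE = re.compile(r"^(?P<kind>.+?) (?P<number>\d+) has been saved\.?$")


def parse_saved_message(text):
    """Return (object kind, document number) from a status-bar text, or raise."""
    match = SAVED_MESSAGE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized status bar text: {text!r}")
    return match.group("kind"), match.group("number")
```

Raising on an unrecognized message, rather than returning None, keeps a changed status text from silently feeding an empty value into the next T-code.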
F-28 has a partial-allocation flow when a wire does not match a single open invoice. Is that automated too?
Partly. F-28 has two dominant cases. The first is a wire that auto-matches a single open item: the recording opens the screen, types the bank account, types the customer, types the amount, and posts. That is fully recorded and runs on every cycle. The second is a wire that covers parts of three open items: the recording does not replace the judgment about which open items the wire covers. Instead, an upstream matching pass produces a structured allocation proposal (open items, allocated amounts, residuals) and the recorded F-28 pass takes that proposal as input and drives the screen to apply it. The apply happens on the screen because the apply button is on the screen. The proposal is what the human reviews; the typing is what the recording does.
Can the same recording run unattended on a Citrix or RDP session, or only on a real desktop?
Both. The desktop agent runs on any Windows machine that hosts the SAP client: a real workstation, a Citrix XenApp session, an Azure Virtual Desktop session, or a dedicated Windows VM. The accessibility tree is exposed by the process that hosts the SAP GUI window; the session layer around it (Citrix, RDP, or a native desktop) does not change the tree. A common production shape is a small fleet of Windows VMs, each running one operator's signed-in SAP session, with the agent processing queue items against the open SAP GUI window. Audit logging and the SOC 2 / HIPAA posture apply to the runtime regardless of where the SAP client runs.
How long does a full O2C cycle take through this when the selectors all resolve cleanly?
A typical cycle (one PDF in, four documents out) runs in 90 to 120 seconds end to end on a warm SAP GUI for Windows session against an on-premise S/4HANA tenant. The breakdown is roughly 25 to 35 seconds in VA01 (header plus eight to twelve line items), 15 to 20 seconds in VL01N (reference, picking, post goods issue), 10 to 15 seconds in VF01 (reference, save), and 25 to 40 seconds in F-28 when the wire is a single-item match. Pricing is $0.75 per minute of runtime. The number every CFO actually asks about is the comparison to the operator alternative: the same cycle takes a trained AR clerk twelve to fifteen minutes when there is a queue of PDFs to type in. The compounding is in the volume.
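The arithmetic behind those numbers, as a short script. Every input comes from the paragraph above; the helper names are illustrative.

```python
from decimal import Decimal

RATE_PER_MINUTE = Decimal("0.75")  # USD per minute of runtime, as quoted
STEP_SECONDS = {"VA01": (25, 35), "VL01N": (15, 20), "VF01": (10, 15), "F-28": (25, 40)}


def runtime_cost(seconds):
    """Unrounded runtime cost in USD for a run of the given length."""
    return RATE_PER_MINUTE * Decimal(seconds) / Decimal(60)


def step_total_range(steps=STEP_SECONDS):
    """Sum the low and high ends of the per-T-code ranges."""
    return sum(lo for lo, _ in steps.values()), sum(hi for _, hi in steps.values())


def speedup_range(clerk_minutes=(12, 15), agent_seconds=(90, 120)):
    """Conservative and optimistic ratios of clerk time to agent time."""
    conservative = clerk_minutes[0] * 60 / agent_seconds[1]
    optimistic = clerk_minutes[1] * 60 / agent_seconds[0]
    return conservative, optimistic
```

The per-T-code ranges sum to 75 to 110 seconds, so the 90-to-120-second end-to-end figure leaves some time that is not attributed to any single screen. At the stated rate a cycle costs $1.125 to $1.50 of runtime, and the clerk comparison works out to roughly six to ten times faster.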
What breaks this approach? When is the recorded line the wrong answer?
Three honest cases. First, a tenant that already has every customer on EDI 850/856/810/820 and a clean BAI2 file every morning has very little O2C left for a recorder; the IDoc layer is doing the job. Second, a one-time data migration of forty thousand sales orders is the wrong shape for an event-driven recorder; that is what LSMW or the S/4HANA Migration Cockpit is for. Third, a tenant where O2C runs through Fiori transactional apps (the Fiori apps proper, not SAP GUI for HTML) exposes a different surface; some Fiori apps have a coherent OData layer, and the right answer is to call that layer instead. The recorder is the right shape for steady-state O2C with PDFs in, single-item or proposal-driven matches, and SAP GUI for Windows or SAP GUI for HTML as the surface the operators actually use.
Where is the runtime open source, and what is the file path I want to read?
Terminator and the desktop agent are open source under MIT at github.com/mediar-ai/terminator. The selector emission for clicks lives in apps/desktop/src-tauri/src/mcp_converter.rs in the convert_click and convert_application_switch paths. The chained-vs-scoped selector branch is around lines 553 to 609. The scoped-selector generator (generate_scoped_selector, generate_window_selector, generate_element_selector) is around lines 1772 to 1888. The retry classifier and backoff schedule are in crates/executor/src/config/retry.rs.
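A sketch of the kind of classification that retry file is described as holding, using only the behavior stated in the companion piece on SAP data entry listed at the end of this page: connection resets and HTTP 503s retry with a 30/60/120-second backoff, and validation failures stop immediately. The error dict shape and the function names are hypothetical, and treating unrecognized errors as stop-immediately is an assumption of this sketch, not a description of retry.rs.

```python
BACKOFF_SECONDS = (30, 60, 120)


def classify(error):
    """Return 'retry' for transient failures, 'stop' for everything else.

    error is a hypothetical dict such as {"kind": "connection_reset"} or
    {"http_status": 503}. Validation failures, and anything unrecognized,
    stop immediately (the second part is this sketch's assumption).
    """
    if error.get("kind") == "connection_reset" or error.get("http_status") == 503:
        return "retry"
    return "stop"


def next_delay(error, attempt):
    """Seconds to wait before retry `attempt` (1-based), or None to stop."""
    if classify(error) == "stop" or not 1 <= attempt <= len(BACKOFF_SECONDS):
        return None
    return BACKOFF_SECONDS[attempt - 1]
```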
Three more pieces of the same SAP architecture
Adjacent reading
SAP Business ByDesign order-to-cash automation: where the OData line ends and the WorkCenter line begins
ByDesign-specific seam map for the OData v2 + SOAP layer vs the WorkCenter screens. Reads alongside this page if your tenant is the cloud product, not classic SAP.
Automate SAP data entry: the failure taxonomy a working SAP queue lives or dies by
The error classifier behind a shipping SAP workflow. Connection resets and 503s retry with a 30/60/120s backoff; validation failures stop immediately. Same runtime that drives the four T-codes here.
SAP data entry automation: one journal entry, traced from PDF to F-02
Field-level companion piece. Traces a four-field zod schema for one shipping workflow from a OneDrive trigger to the F-02 status bar.