Skyvern-AI/skyvern @ d1de195
What is Skyvern commit d1de19556efa2de6ab3420de0aa908d732a62024?
Short version: it is the commit where Skyvern stopped being wired to a single model. Below is exactly what it changed, the router package it introduced, and the one thing the diff quietly tells you about the ceiling of any browser agent.
Direct answer · verified 2026-06-21
Commit d1de19556efa2de6ab3420de0aa908d732a62024 in Skyvern-AI/skyvern is “Implement LLM router (#95)”, authored by Kerem Yilmaz (GitHub ykeremy) and dated 2024-03-17. It changed 16 files (+485 / -308), deleted the hardcoded OpenAI client, and added a provider-agnostic LLM router that resolves to OpenAI, Anthropic, or Azure.
What the diff actually contains
Before this commit, Skyvern talked to exactly one place: skyvern/forge/sdk/api/open_ai.py. Model choice was not a setting; it was a file. The d1de195 change deleted that file (and a small chat_completion_price.py helper) and replaced both with a new package, skyvern/forge/sdk/api/llm/. Two of the added files carry the weight: api_handler_factory.py builds the call handler, and config_registry.py holds the table of which model keys are allowed.
File list and stats are from the GitHub commit API for this revision. The two deletions are open_ai.py and chat_completion_price.py; the six additions are the files of the new llm/ package.
The registry: three providers, three enable flags
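The interesting code is in config_registry.py. Each provider is wrapped in a settings check, so a model key only exists if you turned its provider on. At this revision the registry defines four keys: OPENAI_GPT4_TURBO and OPENAI_GPT4V behind ENABLE_OPENAI, ANTHROPIC_CLAUDE3 behind ENABLE_ANTHROPIC, and AZURE_OPENAI_GPT4V behind ENABLE_AZURE.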
Three of those keys (GPT4V, Claude 3, and the Azure deployment) are registered as vision-capable, which matters because Skyvern reasons over a screenshot of the rendered page, not just the DOM. The router does not pick a model for you; it just guarantees the one you named in config is wired and reachable.
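The gating idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the contents of config_registry.py: the LLMConfig class, the build_registry function, and the plain-dict settings are hypothetical, and the Azure model name is a placeholder because it depends on your deployment. Only the key names, model names, enable flags, and vision flags come from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMConfig:
    """Hypothetical record for one registered model key."""
    model_name: str        # identifier handed to the provider
    supports_vision: bool  # whether screenshots can be sent to this model


def build_registry(settings: dict[str, bool]) -> dict[str, LLMConfig]:
    """Return only the model keys whose provider flag is switched on."""
    registry: dict[str, LLMConfig] = {}
    if settings.get("ENABLE_OPENAI"):
        registry["OPENAI_GPT4_TURBO"] = LLMConfig("gpt-4-turbo-preview", supports_vision=False)
        registry["OPENAI_GPT4V"] = LLMConfig("gpt-4-vision-preview", supports_vision=True)
    if settings.get("ENABLE_ANTHROPIC"):
        registry["ANTHROPIC_CLAUDE3"] = LLMConfig(
            "anthropic/claude-3-opus-20240229", supports_vision=True
        )
    if settings.get("ENABLE_AZURE"):
        # Placeholder model name: the real value depends on the Azure deployment.
        registry["AZURE_OPENAI_GPT4V"] = LLMConfig("azure/<your-deployment>", supports_vision=True)
    return registry


# With only Anthropic enabled, the OpenAI and Azure keys never exist.
enabled = build_registry({"ENABLE_ANTHROPIC": True})
print(sorted(enabled))
```

The design point is that a disabled provider leaves no trace in the table, so a misconfigured key fails at lookup time rather than at the first network call.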
One router, three providers
How a request resolves after this commit
Once the router lands, the agent no longer imports a provider. It asks the factory for a handler keyed by the configured model name, the factory looks the key up in the registry, and only then does a real provider get called. If the key is not registered (because its enable flag is off), you get a clean configuration error instead of a deep stack trace from inside a vendor SDK.
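The lookup path described above might look roughly like the sketch below. It assumes a registry that maps model keys to callables; the names get_handler, UnregisteredModelKeyError, and fake_claude are hypothetical and are not taken from api_handler_factory.py. The stub provider stands in for a real API call so the example runs offline.

```python
from typing import Callable

# A handler takes a prompt and returns the model's reply.
Handler = Callable[[str], str]


class UnregisteredModelKeyError(Exception):
    """Hypothetical error raised when the configured key is not in the registry."""


def get_handler(llm_key: str, registry: dict[str, Handler]) -> Handler:
    """Resolve the configured key to a handler, or fail with a readable message."""
    handler = registry.get(llm_key)
    if handler is None:
        raise UnregisteredModelKeyError(
            f"LLM key {llm_key!r} is not registered; enabled keys: {sorted(registry)}"
        )
    return handler


def fake_claude(prompt: str) -> str:
    # Stub standing in for a real provider call.
    return f"claude saw: {prompt}"


registry = {"ANTHROPIC_CLAUDE3": fake_claude}

# The agent only knows the key from config, never the provider module.
handler = get_handler("ANTHROPIC_CLAUDE3", registry)
print(handler("login page screenshot"))
```

Asking for a key whose provider is disabled, such as OPENAI_GPT4V here, raises the configuration error before any vendor SDK is involved.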
Resolving a completion through the router
This is a clean abstraction and it did exactly what it set out to do. But notice what it leaves structurally unchanged: skyvern/webeye/actions/handler.py was modified in the same commit, yet it is still the part that turns a model decision into an action, and every action there is a browser action. Swapping OpenAI for Claude changes the brain. It does not change the hands.
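A toy model of that split, as a sketch only: the action names, run_step, and the three stand-in models below are hypothetical and do not mirror handler.py. It shows that whichever model produces the decision, execution is limited to the actions the handler layer knows about.

```python
from typing import Callable

# Hypothetical action vocabulary; the article only says every action is a browser action.
BROWSER_ACTIONS = {"navigate", "click", "type"}

Decision = dict[str, str]
Model = Callable[[str], Decision]


def run_step(decide: Model, page_state: str) -> str:
    """Ask the model for a decision, then execute it only if the action layer supports it."""
    decision = decide(page_state)
    if decision["action"] not in BROWSER_ACTIONS:
        raise ValueError(f"no handler for action {decision['action']!r}")
    return f"{decision['action']} -> {decision['target']}"


def model_a(page_state: str) -> Decision:
    # Stand-in for one provider's model.
    return {"action": "click", "target": "#submit"}


def model_b(page_state: str) -> Decision:
    # Stand-in for a different provider's model.
    return {"action": "type", "target": "#email"}


def model_c(page_state: str) -> Decision:
    # A model that asks for something outside the browser.
    return {"action": "press_native_button", "target": "SAP GUI toolbar"}


print(run_step(model_a, "login page"))
print(run_step(model_b, "login page"))
```

Passing model_c to run_step raises ValueError: the decision may be sensible, but the action layer has no path to carry it out.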
Why this commit is a good lens on the browser boundary
People treat “which model” as the big lever in agent automation. The d1de195 commit is a useful reminder that it is not. Skyvern is a genuinely good open-source browser agent (AGPL-3.0, 20k+ stars), and the router made it model-portable. But the reach of the system is set by the action layer, and that layer speaks Chromium. The line below is the line the router cannot move.
| Feature | Skyvern (browser agent) | Mediar (accessibility tree) |
|---|---|---|
| Where actions land | Inside a Chromium tab: DOM + rendered pixels | OS accessibility tree, the layer screen readers use |
| SAP GUI, mainframe green screens | No DOM to read, out of reach | Native controls exposed via accessibility APIs |
| Jack Henry, Fiserv, FIS, Epic, Cerner | Thick desktop clients, no browser surface | Read and drive the same controls a user sees |
| Model choice | Router resolves the configured model: OpenAI / Anthropic / Azure | Model-agnostic too; the difference is the action layer |
| What changing the LLM fixes | Reasoning quality on a reachable page | Same, but the reachable surface is the desktop, not a tab |
This is not a knock on Skyvern. For automating a web app, a browser agent is the right tool and the router commit made it a better one. The point is narrower: if your stalled workflow lives in a thick Windows client with no API, no model on the router's list will reach it, because the action handler never had a path there. That gap is the reason the accessibility-API approach exists. We dig into that input layer in "Accessibility tree vs pixels" and trace Skyvern's exact run loop in "Skyvern as RPA".
Stuck where the browser agent stops?
If your workflow lives in SAP GUI, a mainframe, or a banking core, a 20-minute call shows whether the accessibility-tree approach reaches it.
Frequently asked questions
What is Skyvern commit d1de19556efa2de6ab3420de0aa908d732a62024?
It is the commit titled 'Implement LLM router (#95)', authored by Kerem Yilmaz (GitHub ykeremy) and merged into Skyvern-AI/skyvern on 2024-03-17. It changed 16 files (+485 / -308) and replaced Skyvern's single hardcoded OpenAI client with a provider-agnostic LLM router that can resolve to OpenAI, Anthropic, or Azure.
What files did the commit add and remove?
It deleted skyvern/forge/sdk/api/open_ai.py and skyvern/forge/sdk/api/chat_completion_price.py, then added a new package at skyvern/forge/sdk/api/llm/ containing api_handler_factory.py, config_registry.py, models.py, utils.py, exceptions.py and an __init__.py. It also touched config.py, agent.py, app.py, handler.py, the env example, and the setup script.
Which model keys did the router register?
In that commit's config_registry.py the router registers OPENAI_GPT4_TURBO (gpt-4-turbo-preview), OPENAI_GPT4V (gpt-4-vision-preview), ANTHROPIC_CLAUDE3 (anthropic/claude-3-opus-20240229), and AZURE_OPENAI_GPT4V. Each block is gated behind an enable flag: ENABLE_OPENAI, ENABLE_ANTHROPIC, and ENABLE_AZURE respectively.
Does swapping the LLM change what Skyvern can automate?
No. The router changes which model reasons about the page, not what actions are reachable. Every Skyvern action is still bounded by a Chromium tab: navigate a URL, click a rendered element, type into a field, read the DOM and the pixels. A different model does not give the browser agent a handle on a native Windows control.
Why would someone search this exact commit hash?
Usually because they hit a reference to it: a dependency audit, a blame line, a changelog entry, or a tutorial that pinned the LLM abstraction to this revision. The short answer is that d1de195 is the historical moment Skyvern became multi-provider, so it is the natural anchor for anyone tracing how model selection works in the codebase.
Is this where browser automation stops being enough?
For browser-native SaaS, no. For data that lives in SAP GUI, a mainframe green screen, Jack Henry, Fiserv, FIS, Epic, or Cerner, yes. Those are native desktop controls with no DOM and no headless API. Mediar reads them through the OS accessibility tree, the same interface screen readers use, which is the layer the router commit's architecture cannot reach.
Keep reading
Skyvern as RPA: what it automates
The full run loop of a Skyvern browser agent, and where the browser boundary falls.
Accessibility tree vs pixels
The input layer an agent reads decides which apps it can touch. DOM, pixels, or the accessibility tree.
Where Power Automate Desktop hits SAP limits
The exact points a browser- or selector-bound tool stalls on SAP GUI.