
Skyvern, self-hostable: three containers, five ports, AGPL-3.0, and one boundary self-hosting does not move.

Skyvern is fully self-hostable. The runtime is AGPL-3.0 at github.com/Skyvern-AI/skyvern, and a single docker compose up -d brings the agent loop, the workflow builder, and a bundled Postgres up on your own hardware. What this page does, and what most other guides on the topic skim past, is name what you actually get in the public images, what stays proprietary in the cloud product, and the architectural boundary self-hosting does not change.

Matthew Diakonov · 9 min

Direct answer, verified 2026-05-07

Yes. Skyvern is self-hostable under AGPL-3.0. Two paths: pip install skyvern && skyvern quickstart for a single-user setup, or git clone github.com/Skyvern-AI/skyvern && docker compose up -d for a Docker setup. Docker brings up three containers (postgres, skyvern API + Chromium, skyvern-ui) on five exposed ports (8000 API, 6080 noVNC, 9222 CDP, 8080 UI, 9090 artifact API). Latest release tag at the time of writing is v1.0.34 (7 May 2026). What the cloud tier adds on top, and what self-hosting does NOT include, is named in the comparison further down: anti-bot fingerprinting, residential proxy network with geo-targeting, and integrated CAPTCHA solving. The agent loop itself (Planner, Actor, Validator) is identical in both.

The literal docker-compose, top to bottom.

The repo's docker-compose.yml declares three required services and two optional ones. The required services are postgres, skyvern (the API plus Playwright plus a headful Chromium, all in the same container), and skyvern-ui (the workflow builder web app). The optional services, shipped commented out, are Vaultwarden and a Bitwarden CLI REST server, used by teams that want vault-backed credential storage on the same box. Below is the literal sequence from a clean clone to a running stack.

skyvern, from clone to ready
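A minimal sketch of that sequence, using only the repo URL, env files, services, and ports this guide names; exact file names can differ between releases, so check the repo's README against this:

```shell
# From clean clone to a running stack -- a sketch, assuming the
# docker-compose.yml and .env.example described in this guide.
git clone https://github.com/Skyvern-AI/skyvern.git
cd skyvern
cp .env.example .env         # add your LLM provider key before starting
docker compose up -d         # pulls postgres, skyvern, skyvern-ui

# Confirm all three containers are up, then open the UI.
docker compose ps
# UI:    http://localhost:8080
# noVNC: http://localhost:6080   (watch the Actor click through a run)
```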

A clean run on a developer box takes 60 to 120 seconds: most of that is the initial image pull. The skyvern container boots a headful Chromium internally and exposes a noVNC WebSocket on port 6080, so you can open a browser tab against your own host and watch the Actor agent click through a workflow live. Port 9222 forwards the Chrome DevTools Protocol if you want to attach an external Playwright client; port 9090 is the artifact API the UI calls to fetch screenshots and DOM snapshots from past runs.

The three required services and what each one is for.

  • postgres · image postgres:14-alpine. Default credentials are skyvern / skyvern / skyvern (user, password, and database name). The host port is commented out by default, so the database only listens on the docker network. Persistent data lives at /var/lib/postgresql/data/pgdata inside the container, which the compose file mounts to a host volume. Pip-install setups (v1.0.31+) default to SQLite at ~/.skyvern/data.db instead, which is fine for one user and not fine for shared use.
  • skyvern · image public.ecr.aws/skyvern/skyvern:latest. Holds the API server, the Planner / Actor / Validator agent loop, Playwright, and a headful Chromium, all inside one container. Ports: 8000 (REST API), 6080 (noVNC WebSocket so you can watch the Actor live), 9222 (Chrome DevTools Protocol forwarding for an external Playwright client). Environment is driven by .env: LLM provider toggles, the database string, and BROWSER_TYPE=chromium-headful by default.
  • skyvern-ui · image public.ecr.aws/skyvern/skyvern-ui:latest. The workflow builder web app and run history viewer. Ports: 8080 (the UI itself, this is what you load in a browser) and 9090 (artifact API for screenshots and DOM snapshots from past runs).
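Condensed into compose terms, the three services above look roughly like this. This is a sketch of the shapes this guide describes, not the literal file from the repo; diff it against the repo's docker-compose.yml before relying on any key:

```yaml
# Sketch only -- service names, images, and port pairs from this guide.
services:
  postgres:
    image: postgres:14-alpine
    volumes:
      - ./postgres-data:/var/lib/postgresql/data   # survives container rebuilds

  skyvern:
    image: public.ecr.aws/skyvern/skyvern:latest
    ports: ["8000:8000", "6080:6080", "9222:9222"] # REST API, noVNC, CDP
    env_file: .env
    depends_on: [postgres]

  skyvern-ui:
    image: public.ecr.aws/skyvern/skyvern-ui:latest
    ports: ["8080:8080", "9090:9090"]              # UI, artifact API
    depends_on: [skyvern]
```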

The .env that actually matters.

The repo ships three example env files: the default .env.example, an env.litellm.example for routing through LiteLLM, and an env.ollama.example for fully local inference. Pick whichever matches your provider and layer the keys on top. The minimum viable shape is below.

.env
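A minimum viable .env, using only the variables this guide names; the DATABASE_STRING value shown is illustrative, since the repo's .env.example pre-fills the real one:

```shell
# Pick exactly one provider toggle and its matching key.
ENABLE_OPENAI=true
OPENAI_API_KEY=sk-...

# Pre-filled in .env.example to point at the bundled postgres service;
# the exact connection string here is illustrative, copy the shipped one.
DATABASE_STRING=postgresql+psycopg://skyvern:skyvern@postgres:5432/skyvern

BROWSER_TYPE=chromium-headful   # default; rarely needs changing
ENABLE_CODE_BLOCK=true          # opt in to custom code blocks in workflows
```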

One non-obvious point: LLM routing is provider-agnostic via LiteLLM. That means the same self-hosted instance can fan requests across multiple providers (Anthropic for hard pages, OpenAI for cheap routine ones, Ollama for offline). The runtime does not lock you to one vendor, which is the same property the cloud product has.

What the AGPL-3.0 image contains, and what it does not.

A common surprise: every public guide on running self-hosted Skyvern shows the docker compose command; almost none name what stays in the cloud. The agent loop itself is identical between self-hosted and cloud. What the cloud product adds, and what the AGPL public images deliberately do not include, is a fabric of proprietary capabilities that target hardened public-internet sites.

Stays proprietary, NOT in the public image

  • Anti-bot fingerprinting and behavior shaping (the layer that makes the headful Chromium look like a real user, not a Playwright session). Not in the public repo, not in the public ECR images.
  • Residential proxy network with country, state, city, and ISP targeting. Not bundled. If you self-host you wire your own proxies, or run direct from your egress.
  • CAPTCHA solving across image, audio, hCaptcha, reCAPTCHA, Cloudflare Turnstile. The cloud product solves these as part of the workflow. Self-hosted, the agent loop hits the CAPTCHA and stops.
  • Geo-targeting at the egress layer. Self-hosted runs from your IP; cloud runs from the proxy fabric.

On the other side, here is the full set of capabilities the public AGPL-3.0 image actually contains. This is the part most self-hosting guides understate.

Included in the AGPL-3.0 self-hosted runtime

  • The Planner agent that decomposes a plain-English goal into ordered steps.
  • The Actor agent that fires Playwright clicks, types, navigation, downloads, and dropdowns inside a managed Chromium tab.
  • The Validator agent that confirms each Actor step changed the page the way the Planner expected, and otherwise asks for retry or replan.
  • The full LLM provider matrix: OpenAI, Anthropic, Azure OpenAI, AWS Bedrock, Gemini, Ollama, OpenRouter, Groq. Provider-agnostic via LiteLLM.
  • The workflow DSL with email, HTTP, and custom code blocks (ENABLE_CODE_BLOCK=true).
  • The web UI at port 8080 for building workflows visually, watching runs over noVNC at port 6080, and inspecting artifacts at port 9090.
  • The MCP (Model Context Protocol) server, so a coding agent in your IDE can drive Skyvern as a tool.
  • JSON-schema-driven data extraction; pip-install path (skyvern quickstart) for non-Docker setups.
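As one concrete example of the JSON-schema-driven extraction, a task request against the self-hosted API might look like the sketch below. The endpoint path and field names (navigation_goal, data_extraction_goal, extracted_information_schema) are assumptions in this sketch; verify them against the API docs your own instance serves on port 8000 before relying on them.

```shell
# Hypothetical task payload -- endpoint and field names are assumptions;
# check your instance's API docs for the real contract.
curl -X POST http://localhost:8000/api/v1/tasks \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/store",
    "navigation_goal": "Find the cheapest in-stock widget",
    "data_extraction_goal": "Extract the product name and price",
    "extracted_information_schema": {
      "type": "object",
      "properties": {
        "name":  {"type": "string"},
        "price": {"type": "number"}
      }
    }
  }'
```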

Self-hosted vs. Skyvern cloud, line by line

The agent loop and the LLM matrix are identical. The fabric around them is what the cloud product is selling.

| Feature | Self-hosted (AGPL-3.0) | Skyvern Cloud |
| --- | --- | --- |
| Agent loop (Planner, Actor, Validator) | Self-hosted, AGPL-3.0, in the public repo | Same, hosted for you, no infra to run |
| LLM provider routing (OpenAI, Anthropic, Bedrock, Gemini, Ollama, OpenRouter, Groq, Azure) | BYO key, you pay the inference bill directly | BYO key, billed on top of credits |
| Anti-bot fabric (fingerprinting, behavior shaping) | Not in repo, not in image | Included, proprietary |
| Residential proxy network with country/state/city/ISP targeting | Not in repo, not in image | Included, proprietary |
| CAPTCHA solving (image, audio, hCaptcha, reCAPTCHA, Cloudflare Turnstile) | Not in repo, not in image | Included, proprietary |
| Vaultwarden / Bitwarden CLI for credential storage | Optional services, commented out in compose, you wire your own | Bundled, integrated |
| Browser session persistence and replay | Local volume on your host | Managed, with run history retention |
| Concurrency | Whatever your host can run; one Chromium per workflow | Plan-bounded, parallel runs |
| Audit log retention | Whatever your DB retention says | Plan-bounded, exposed via UI |

Self-hosting Skyvern gets you the agent loop and your own infra. It does not get you the anti-bot fabric or the proxy network. If your workflow lives behind a hardened portal, that gap is not a knob you can tune; it is a feature that is not in the box.

Internal Mediar engineering note, on the practical line between the AGPL image and the cloud product.

The boundary self-hosting does not move.

Self-hosting changes where the data plane runs. It does not change which surface the agent loop reads. Self-hosted Skyvern is still Playwright driving a managed Chromium tab. The Actor scores elements rendered inside that tab. The Validator reads a screenshot of that tab. Anti-bot detection (in the cloud product) and your own proxies (in self-hosted) operate at the tab edge.

That tight coupling is what makes the architecture good at what it is good at, and it is the reason the architecture stops where the tab stops. A SAP GUI window is not a tab; it is a Win32 process that publishes its UI through Microsoft UI Automation, a separate accessibility surface a Chromium-bound agent has no way to read. An Oracle Forms session is not a tab. A Jack Henry green-screen terminal is not a tab. An Epic Hyperspace patient chart inside a Citrix shell renders pixels the agent can see and an accessibility tree it cannot reach from inside Chromium.

Self-hosting solves data residency, cost at very high volume, and the ability to fork the agent loop. It does not solve the surface mismatch. For workflows that leave the browser tab into closed Windows desktop apps, the relevant surface is Microsoft UI Automation, the same accessibility tree screen readers consume, and the natural unit of work is wall-clock time on Windows controls, not browser-execution credits. That is the gap Mediar covers, at $0.75 per minute of runtime, with the Terminator SDK published openly at github.com/mediar-ai/terminator. The two tools live on different surfaces; most enterprise workflows actually need both.

If your workflow leaves the browser tab, the unit of work changes.

Twenty minutes is enough to walk through which steps of your workflow Skyvern self-hosted handles cleanly, and which steps need a different agent surface (Windows UI Automation, no DOM).

Frequently asked questions about Skyvern self-hosted

Is Skyvern self-hostable, and what license does it use?

Yes. The full agent runtime is open source under AGPL-3.0 at github.com/Skyvern-AI/skyvern. You can self-host either through pip (pip install skyvern, then skyvern quickstart) or through Docker Compose (git clone the repo, configure .env, run docker compose up -d). AGPL-3.0 lets you install, modify, and run it on your own hardware. The catch is the network-use clause: if you offer a modified Skyvern as a hosted service to other organizations, your modifications also have to be published under AGPL-3.0. For internal-only enterprise use, that clause does not bind you.

What exactly runs when I run docker compose up -d?

Three containers. (1) postgres:14-alpine, the bundled Postgres, with the credentials skyvern/skyvern/skyvern by default and persistent volume at /var/lib/postgresql/data/pgdata. (2) public.ecr.aws/skyvern/skyvern:latest, which is the API server plus a Playwright-driven Chromium tab in the same container, exposing 8000 (REST API), 6080 (noVNC WebSocket, watch the Actor live), and 9222 (CDP forwarding). (3) public.ecr.aws/skyvern/skyvern-ui:latest, the workflow builder web app, exposing 8080 (UI) and 9090 (artifact API). Two more services, Vaultwarden and a Bitwarden CLI REST server, ship commented out in the compose file for credential storage; uncomment them if you want vault integration on the box.

What is the minimum hardware I need to run Skyvern self-hosted?

The official documentation does not publish a hard minimum. In practice, every Actor step shoots a screenshot plus rendered DOM at a vision-capable LLM, so the heavy CPU and RAM costs sit at the inference provider, not your host. You need enough headroom for one headful Chromium per concurrent workflow plus a small Postgres, which lands at roughly 4 vCPU and 8 GB RAM as a comfortable floor for a single-user dev box. Concurrency scales linearly with Chromium instances. If you point Skyvern at a local Ollama model rather than a hosted API, your hardware floor jumps because inference now runs on your box too.

What does self-hosting NOT give me, compared to skyvern.com cloud?

Four things, each named explicitly in the comparison above. (1) The anti-bot fabric: the layer that makes the headful Chromium session look like a real human session to fingerprinting checks. Not in the public repo, not in the public ECR image. (2) The residential proxy network with country, state, city, and ISP targeting. Self-hosted runs from your egress IP. (3) Integrated CAPTCHA solving across image, audio, hCaptcha, reCAPTCHA, and Cloudflare Turnstile. Self-hosted hits a CAPTCHA and stops. (4) The credit-billed concurrency, plan-bounded run history, and managed audit log retention. Self-hosted, you keep whatever your host and Postgres retention allow. The agent loop itself, including all three agents and the full LLM provider list, is identical.

Can I run Skyvern fully offline, with a local model?

Yes, the runtime supports Ollama and any OpenAI-compatible local endpoint (the Skyvern repo ships env.ollama.example as a starting point). You set ENABLE_OLLAMA=true in .env, point it at the Ollama host, and the Planner-Actor-Validator loop runs through the local model. Two practical caveats. First, the vision pass on every step is the throughput bottleneck; small local vision models take seconds per step where a hosted Sonnet-class model takes a fraction of a second, and that compounds across long workflows. Second, hardened sites that lean heavily on anti-bot tooling will still trip on the missing cloud-only fabric, regardless of which model you swap in.
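A sketch of the offline shape. Beyond ENABLE_OLLAMA, the variable names below are assumptions; copy the real ones from the env.ollama.example the repo ships rather than from here:

```shell
ENABLE_OLLAMA=true
# Hypothetical names -- copy the real variables from env.ollama.example.
OLLAMA_SERVER_URL=http://host.docker.internal:11434
OLLAMA_MODEL=llama3.2-vision   # must be vision-capable: every Actor step sends a screenshot
```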

Which environment variables do I actually need to set in .env?

At minimum: an LLM provider toggle and the matching API key (ENABLE_OPENAI=true plus OPENAI_API_KEY, or the equivalent for Anthropic, Bedrock, Gemini, Azure OpenAI, OpenRouter, Groq, or Ollama). DATABASE_STRING is pre-filled to point at the bundled postgres service. BROWSER_TYPE=chromium-headful is the default and rarely needs changing. ENABLE_CODE_BLOCK=true if you want workflows to run custom code blocks. If you wire in Bitwarden vault integration, the matching BWS_ACCESS_TOKEN and vault host. The repo ships .env.example, env.litellm.example, and env.ollama.example as three working starting points; copy whichever matches your provider, then layer your keys on top.

How do I upgrade a self-hosted Skyvern when a new release ships?

On Docker, docker compose pull then docker compose up -d. The compose file pins both skyvern and skyvern-ui to the :latest tag, so a pull always grabs the most recent public ECR image. The latest tag at the time of writing is v1.0.34 (released 7 May 2026). On pip, pip install --upgrade skyvern. Migrations run on container start; the bundled Postgres volume is preserved across restarts. For production, pin the tag explicitly in docker-compose.yml (for example public.ecr.aws/skyvern/skyvern:v1.0.34) and bump deliberately rather than tracking latest.
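Pinning instead of tracking latest is a one-line change in your compose file; a sketch using the tag named above:

```yaml
services:
  skyvern:
    # Pin the release tag and bump deliberately instead of tracking :latest.
    image: public.ecr.aws/skyvern/skyvern:v1.0.34
```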

Where do my workflows and artifacts live on disk?

Inside the postgres container by default; the compose file mounts ./postgres-data on the host as the persistent volume for postgres data, which means your workflow definitions and run records survive container rebuilds. Skyvern artifacts (screenshots, DOM snapshots, downloaded files) are exposed via the artifact API on port 9090 and stored under the skyvern container's volume. If you want to back them up out of process, mount that directory to a host path or to S3 via a sidecar. The pip install path puts SQLite at ~/.skyvern/data.db (as of v1.0.31+) instead of Postgres, which is fine for a single user but not for shared deployments.
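For out-of-process backups of the bundled database, plain pg_dump through the running container works; a sketch assuming the default skyvern/skyvern credentials this guide lists:

```shell
# Dump workflow definitions and run records out of the postgres container.
docker compose exec postgres pg_dump -U skyvern skyvern > skyvern-backup.sql

# Restore into a fresh stack later.
docker compose exec -T postgres psql -U skyvern skyvern < skyvern-backup.sql
```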

Does self-hosting Skyvern help with workflows that touch SAP, Oracle Forms, or Citrix-rendered Epic?

No, and this is the boundary self-hosting does not move. Self-hosted Skyvern is still Playwright driving a Chromium tab. The Actor scores elements rendered inside that tab; the Validator reads a screenshot of that tab. A SAP GUI window is a Win32 process that publishes its UI through Microsoft UI Automation, not through the Chromium DOM. An Oracle Forms session is the same. A Jack Henry or Fiserv green-screen terminal is the same. An Epic Hyperspace patient chart inside Citrix renders pixels the agent can see and an accessibility tree the agent cannot read from inside a browser. Self-hosting moves the data plane onto your hardware; it does not change the surface the agent loop reads. For workflows that leave the browser tab, the natural unit of work is wall-clock time on Windows accessibility APIs, not browser-execution credits.

When does it make sense to self-host Skyvern instead of using the cloud?

Three honest cases. (1) Data residency: your workflow operates on data that cannot leave your VPC, so the screenshots and DOM snapshots cannot reach a third-party host. (2) Cost predictability at high volume: you have a workflow that runs millions of steps a month and the credit math for the cloud tier is worse than your own infra plus inference bill. (3) Custom fork: you need to modify the agent loop itself, ship a Validator behavior the open core does not have, or integrate a private model. Outside those three, the cloud product's bundled anti-bot, proxies, and CAPTCHA solving are usually worth more than the infra savings, especially for any workflow that touches a hardened portal.