Computer use models are vision-language models (VLMs) that operate a browser the way a person does: they look at a screenshot, decide what to do next, and emit a concrete action such as moving the mouse, clicking, typing, scrolling, or dragging. Kernel runs these agents on cloud browsers, so you don’t install or maintain anything locally, and it gives the model the low-level Computer Controls API it needs to see the screen and act on it.

How computer use works on Kernel

Every computer use integration runs the same action-observation loop:
  1. Capture a screenshot of the current browser state with the Computer Controls API.
  2. Predict the next action by sending that screenshot to your model.
  3. Execute the returned action (click, type, scroll, drag, or key press) through Computer Controls.
  4. Repeat until the task is complete.
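
The four steps above can be sketched as a small TypeScript loop. This is a minimal illustration, not Kernel's implementation: the `Action`, `Computer`, and `Model` shapes, the `runLoop` function, and the `maxSteps` cap are all hypothetical names chosen for this example, and the in-memory stand-ins exist only so the sketch runs without a browser or a model.

```typescript
// Hypothetical shapes for illustration; these are not the Kernel SDK's types.
type Action =
  | { type: "click"; x: number; y: number }
  | { type: "type"; text: string }
  | { type: "scroll"; x: number; y: number; deltaY: number }
  | { type: "drag"; from: [number, number]; to: [number, number] }
  | { type: "key"; keys: string[] }
  | { type: "done"; summary: string };

type ExecutableAction = Exclude<Action, { type: "done" }>;

interface Computer {
  screenshot(): Promise<Uint8Array>; // step 1: capture
  execute(action: ExecutableAction): Promise<void>; // step 3: execute
}

interface Model {
  // step 2: predict the next action from pixels plus the history so far
  nextAction(task: string, screenshot: Uint8Array, history: Action[]): Promise<Action>;
}

async function runLoop(
  task: string,
  computer: Computer,
  model: Model,
  maxSteps = 25,
): Promise<Action[]> {
  const history: Action[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const shot = await computer.screenshot();
    const action = await model.nextAction(task, shot, history);
    history.push(action);
    if (action.type === "done") return history; // step 4: stop when complete
    await computer.execute(action);
  }
  throw new Error(`Task not finished after ${maxSteps} steps`);
}

// In-memory stand-ins: a scripted "model" and a "computer" that records actions.
const executed: Action[] = [];
const fakeComputer: Computer = {
  screenshot: async () => new Uint8Array(),
  execute: async (action) => {
    executed.push(action);
  },
};
const script: Action[] = [
  { type: "click", x: 120, y: 48 },
  { type: "type", text: "kernel" },
  { type: "done", summary: "searched" },
];
const fakeModel: Model = {
  nextAction: async (_task, _shot, history) =>
    script[history.length] ?? { type: "done", summary: "script exhausted" },
};
```

Calling `runLoop("search for kernel", fakeComputer, fakeModel)` resolves after three model turns, with the click and the keystrokes recorded in `executed`. The step cap is a design choice for the sketch: it keeps a confused model from looping forever.
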
Computer Controls emulates native keyboard and mouse input at the OS level (with human-like Bézier curves by default) instead of driving the page over the Chrome DevTools Protocol (CDP). This keeps the loop close to real user input and reduces the automation signals that bot detection systems look for. The loop works with any VLM that predicts actions from pixels. The models below are the ones we maintain ready-to-deploy templates and guides for.
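
To give a feel for what a Bézier-shaped mouse movement looks like, here is a minimal sketch that samples a cubic Bézier path between two points. It only illustrates the general idea; how Kernel actually generates its curves is not described here, and `mousePath`, `steps`, and `bend` are hypothetical names for this example.

```typescript
type Point = { x: number; y: number };

// Evaluate a cubic Bézier curve at parameter t in [0, 1].
function cubicBezier(p0: Point, p1: Point, p2: Point, p3: Point, t: number): Point {
  const u = 1 - t;
  const a = u * u * u;
  const b = 3 * u * u * t;
  const c = 3 * u * t * t;
  const d = t * t * t;
  return {
    x: a * p0.x + b * p1.x + c * p2.x + d * p3.x,
    y: a * p0.y + b * p1.y + c * p2.y + d * p3.y,
  };
}

// Sample a curved path from `from` to `to`. The two control points sit at
// one third and two thirds of the way along the straight line, pushed
// sideways (perpendicular to it) by `bend` times the travel distance.
function mousePath(from: Point, to: Point, steps = 20, bend = 0.2): Point[] {
  const dx = to.x - from.x;
  const dy = to.y - from.y;
  const c1 = { x: from.x + dx / 3 - dy * bend, y: from.y + dy / 3 + dx * bend };
  const c2 = { x: from.x + (2 * dx) / 3 - dy * bend, y: from.y + (2 * dy) / 3 + dx * bend };
  const path: Point[] = [];
  for (let i = 0; i <= steps; i++) {
    path.push(cubicBezier(from, c1, c2, to, i / steps));
  }
  return path;
}
```

The path starts exactly at `from`, ends exactly at `to`, and bows away from the straight line in between, which is what distinguishes it from a cursor that jumps directly to its target.
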

Supported models

  • Anthropic: Claude’s computer use tool
  • Gemini: Google’s Gemini 2.5 Computer Use model
  • OpenAGI: OpenAGI’s Lux model
  • OpenAI: OpenAI’s computer-using agent (CUA)
  • Tzafon: Tzafon’s Northstar CUA Fast model
  • Yutori: Yutori’s Navigator n1.5 pixels-to-actions model

Using a model that isn’t listed here? Any VLM works; wire its predicted actions straight to the Computer Controls API and run the same loop.
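
When wiring up an unlisted model, it helps to validate its output before executing anything. The sketch below assumes a hypothetical model that replies with one JSON object per turn, such as `{"kind":"click","x":10,"y":20}`; the `ComputerAction` type, the `kind` field, and `parseAction` are illustrative names, not part of any Kernel or model-provider API.

```typescript
// Hypothetical action format; adapt the fields to what your model emits.
type ComputerAction =
  | { kind: "click"; x: number; y: number }
  | { kind: "type"; text: string }
  | { kind: "scroll"; x: number; y: number; deltaY: number }
  | { kind: "key"; keys: string[] };

// Parse and validate one model reply. Throws on anything malformed so that
// a bad prediction never reaches the browser.
function parseAction(raw: string): ComputerAction {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("action must be a JSON object");
  }
  const fields = data as Record<string, unknown>;

  const num = (name: string): number => {
    const value = fields[name];
    if (typeof value !== "number" || !Number.isFinite(value)) {
      throw new Error(`"${name}" must be a finite number`);
    }
    return value;
  };

  const kind = fields["kind"];
  switch (kind) {
    case "click":
      return { kind: "click", x: num("x"), y: num("y") };
    case "type": {
      const text = fields["text"];
      if (typeof text !== "string") throw new Error('"text" must be a string');
      return { kind: "type", text };
    }
    case "scroll":
      return { kind: "scroll", x: num("x"), y: num("y"), deltaY: num("deltaY") };
    case "key": {
      const keys = fields["keys"];
      if (!Array.isArray(keys) || !keys.every((k) => typeof k === "string")) {
        throw new Error('"keys" must be an array of strings');
      }
      return { kind: "key", keys: keys as string[] };
    }
    default:
      throw new Error(`unsupported action kind: ${String(kind)}`);
  }
}
```

Each validated action can then be handed to the matching Computer Controls call. Rejecting unknown kinds outright, rather than guessing, keeps the loop predictable when the model produces something unexpected.
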

Get started

Each model page includes a one-command template so you can deploy a working agent in minutes. For example, to scaffold the Anthropic integration:
kernel create --name my-computer-use-app --template computer-use
Pick a model above to get its template, then follow the deploy and invoke guides to run your agent on Kernel.

Build your own agent

To build a custom agent, use @onkernel/cua-agent, a TypeScript library that runs the loop against a Kernel browser for you. You point it at a model, give it a task, and it handles the screenshots, actions, and follow-up turns.
npm install @onkernel/cua-agent @onkernel/cua-ai @onkernel/sdk

import Kernel from "@onkernel/sdk";
import { CuaAgent } from "@onkernel/cua-agent";

// Authenticate with Kernel and start a cloud browser in stealth mode.
const client = new Kernel({ apiKey: process.env.KERNEL_API_KEY! });
const browser = await client.browsers.create({ stealth: true });

// Bind the agent to that browser and choose the model that drives it.
const agent = new CuaAgent({
  browser,
  client,
  initialState: {
    model: "anthropic:claude-opus-4-7", // swap to target another provider
    systemPrompt: "You are a careful browser automation agent.",
  },
});

// Run one task; the agent handles screenshots, actions, and follow-up turns.
await agent.prompt("Open news.ycombinator.com and summarize the top story.");

Switch providers by changing the model ref:
| Provider  | Model ref                        |
|-----------|----------------------------------|
| Anthropic | anthropic:claude-opus-4-7        |
| OpenAI    | openai:gpt-5.5                   |
| Gemini    | google:gemini-3-flash-preview    |
| Tzafon    | tzafon:tzafon.northstar-cua-fast |
| Yutori    | yutori:n1.5-latest               |
Set the matching provider key (ANTHROPIC_API_KEY, OPENAI_API_KEY, GOOGLE_API_KEY, TZAFON_API_KEY, or YUTORI_API_KEY) alongside KERNEL_API_KEY.
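
A missing key is easier to diagnose before the agent starts than halfway through a run. The helper below is a small sketch of such a check. It assumes the part of a model ref before the first colon identifies the provider, which matches the refs listed above; `PROVIDER_KEYS`, `requiredEnvVars`, and `missingEnvVars` are hypothetical names and are not part of @onkernel/cua-agent.

```typescript
// Provider prefix -> API key variable, taken from the key names listed above.
const PROVIDER_KEYS: Record<string, string> = {
  anthropic: "ANTHROPIC_API_KEY",
  openai: "OPENAI_API_KEY",
  google: "GOOGLE_API_KEY",
  tzafon: "TZAFON_API_KEY",
  yutori: "YUTORI_API_KEY",
};

// Return the environment variables a given model ref needs.
function requiredEnvVars(modelRef: string): string[] {
  const sep = modelRef.indexOf(":");
  if (sep <= 0) {
    throw new Error(`expected "provider:model", got "${modelRef}"`);
  }
  const provider = modelRef.slice(0, sep);
  const key = PROVIDER_KEYS[provider];
  if (key === undefined) {
    throw new Error(`unknown provider "${provider}"`);
  }
  return ["KERNEL_API_KEY", key];
}

// Return the required variables that are unset or empty in `env`.
function missingEnvVars(
  modelRef: string,
  env: Record<string, string | undefined>,
): string[] {
  return requiredEnvVars(modelRef).filter((name) => !env[name]);
}
```

In a Node.js script you could call `missingEnvVars("anthropic:claude-opus-4-7", process.env)` at startup and exit with a clear message if the result is not empty.
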

Benefits of using Kernel for computer use

  • No local browser management: Run computer use automations without installing or maintaining browsers locally
  • Scalability: Launch multiple browser sessions in parallel for concurrent AI agents
  • Stealth mode: Built-in anti-detection features for reliable web interactions
  • Session state: Maintain browser state across runs via Profiles
  • Live view: Debug your agents with real-time browser viewing
  • Cloud infrastructure: Run computationally intensive AI agents without local resource constraints

Next steps