
The Claude Agent SDK: build your own agent

The whole platform I run is built on the Claude Agent SDK. Not on the raw API with a tool loop I wrote myself, and not on Claude Code automated with a script - on the SDK that hands you the exact agent loop powering Claude Code, inside your own program. This post covers what the SDK actually is, a minimal agent you can run today, how tools and MCP servers plug in, and how to make the call between the SDK and just using Claude Code.

What the Claude Agent SDK is

The Claude Agent SDK is a library that runs the same agent loop, built-in tools, and context management that power Claude Code, driven from your own Python or TypeScript program. You give it a prompt and a set of tools, and it autonomously reads files, runs commands, calls tools, and works toward the goal until it's done. Anthropic renamed it from the Claude Code SDK in late 2025, and that rename is the clearest description of what it is: Claude Code, as a library.

The distinction that matters most is the one against the plain API. With the Anthropic client SDK you send a prompt, get back a request to call a tool, run the tool yourself, send the result, and loop - you own that whole machine. The Agent SDK owns it for you. You hand it the goal and the tools; it runs the loop. That single difference is why I built on it instead of wiring my own loop around the messages endpoint: the loop is the hard part, and it's the part that's already battle-tested in Claude Code.

The two packages, one per language:

# Python, needs 3.10+
pip install claude-agent-sdk

# TypeScript / Node
npm install @anthropic-ai/claude-agent-sdk

I'll use Python for the examples because that's what our platform runs on, but every example maps one-to-one to TypeScript - the option names just switch from snake_case to camelCase. You authenticate with an ANTHROPIC_API_KEY environment variable from the Console, or via Bedrock, Vertex, or Azure if your model lives there.

Your first agent in 10 lines

A working agent is one call to query() with a prompt and a list of allowed tools. You iterate over the messages it yields as it works. Here's a complete, runnable agent that finds and fixes a bug:

import asyncio
from claude_agent_sdk import query, ClaudeAgentOptions


async def main():
    async for message in query(
        prompt="Find and fix the bug in auth.py",
        options=ClaudeAgentOptions(allowed_tools=["Read", "Edit", "Bash"]),
    ):
        print(message)


asyncio.run(main())

That's not pseudocode. It runs. The agent reads auth.py with the Read tool, reasons about what's wrong, edits the file with Edit, and can run the tests with Bash to check itself - all without you implementing a single tool. The built-in tools (Read, Write, Edit, Bash, Glob, Grep, WebSearch, WebFetch) ship with the SDK and execute on your machine.
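
Printing every message is fine for a first run, but most programs only want the text the agent produced. Here is a minimal sketch of that filter. It assumes assistant messages expose a content list whose text blocks carry a text attribute; the helper name collect_text is mine, not part of the SDK. It is written duck-typed, with stand-in objects, so it can be exercised without an API key:

```python
from types import SimpleNamespace


def collect_text(message) -> str:
    """Join the text of every block on a message that has a string `text` attribute.

    Assumption: assistant messages carry a `content` list of blocks and text
    blocks expose `.text`. Anything else (tool calls, results) is skipped.
    """
    content = getattr(message, "content", None)
    if not isinstance(content, list):
        return ""
    return "".join(
        block.text
        for block in content
        if isinstance(getattr(block, "text", None), str)
    )


# Stand-in objects shaped like the messages the loop yields (not SDK types).
assistant = SimpleNamespace(
    content=[SimpleNamespace(text="Fixed the off-by-one in auth.py.")]
)
tool_call = SimpleNamespace(content=[SimpleNamespace(name="Edit", input={})])

print(collect_text(assistant))
print(repr(collect_text(tool_call)))
```

Inside the async for loop above you would call collect_text(message) and print only the non-empty results.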

The one trap worth naming early: allowed_tools is a real boundary, not a hint. If you leave Bash off that list, the agent physically cannot run a shell command, no matter what the prompt says. Treat that list as the permission set for the agent, because that's exactly what it is.
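
One way to keep that permission set honest is to name it. The sketch below is an assumption about how you might organize it, not something the SDK provides: the profile names and the tools_for helper are hypothetical, and only the tool names come from the built-in list above.

```python
# Hypothetical permission profiles built from the built-in tool names.
READ_ONLY = ["Read", "Glob", "Grep"]
EDITOR = READ_ONLY + ["Edit", "Write"]
OPERATOR = EDITOR + ["Bash"]

PROFILES = {"read_only": READ_ONLY, "editor": EDITOR, "operator": OPERATOR}


def tools_for(profile: str) -> list[str]:
    """Return a copy of a profile's tool list, failing closed on unknown names."""
    if profile not in PROFILES:
        raise ValueError(f"unknown profile: {profile!r}")
    return list(PROFILES[profile])


def can_run_shell(profile: str) -> bool:
    """True only if the profile includes the Bash tool."""
    return "Bash" in tools_for(profile)


print(tools_for("read_only"))
print(can_run_shell("editor"))
```

You would then pass the result straight through, for example ClaudeAgentOptions(allowed_tools=tools_for("read_only")), so the boundary is chosen by name in one place instead of retyped per agent.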

The agent loop you're not writing

The agent loop is the cycle the SDK runs for you: gather context, take an action with a tool, observe the result, decide the next action, repeat until the task is done or a limit is hit. This loop is the whole product. Everything else is configuration around it.

[Figure: The agent loop you're not writing. The model decides the next action; the SDK runs the tool (your tool or an MCP tool); the result is observed and the output fed back; if the task is not done and max_turns is not hit, the loop repeats.]
One turn of the loop. The SDK runs this cycle so you don't have to.

Concretely, one turn of the loop is: the model looks at the conversation and the tool results so far, decides whether to call a tool or to answer, and if it calls a tool the SDK executes it and feeds the output back. Then the model looks again. It keeps going - read a file, grep for a caller, edit, run the test, read the failure, edit again - until it reaches an answer. You can cap it with max_turns so a confused agent can't loop forever:

options = ClaudeAgentOptions(
    allowed_tools=["Read", "Glob", "Grep"],
    max_turns=8,
)
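
To make the cycle and the cap concrete, here is a toy version of the loop in plain Python. It is an illustration of the idea only, not the SDK's implementation: the scripted "model" functions, the run_loop name, and the action dictionaries are all stand-ins I made up for the sketch.

```python
def run_loop(decide, tools, goal, max_turns=8):
    """Toy agent loop: decide, act, observe, repeat until an answer or the cap."""
    history = [("goal", goal)]
    for turn in range(1, max_turns + 1):
        action = decide(history)  # the "model" picks the next action
        if action["type"] == "answer":
            return {"answer": action["text"], "turns": turn, "stopped": "done"}
        try:
            observation = tools[action["tool"]](**action["args"])
        except Exception as exc:  # a failing tool becomes an observation
            observation = f"tool error: {exc}"
        history.append((action["tool"], observation))  # feed the result back
    return {"answer": None, "turns": max_turns, "stopped": "max_turns"}


def scripted_model(history):
    """Stand-in for the model: read the file once, then answer."""
    if len(history) == 1:
        return {"type": "tool", "tool": "read", "args": {"path": "auth.py"}}
    return {"type": "answer", "text": f"Saw: {history[-1][1]}"}


def confused_model(history):
    """Stand-in for a stuck model that keeps asking for the same file."""
    return {"type": "tool", "tool": "read", "args": {"path": "auth.py"}}


FILES = {"auth.py": "def login(): ..."}
tools = {"read": lambda path: FILES[path]}

result = run_loop(scripted_model, tools, "Find the bug")
stuck = run_loop(confused_model, tools, "Find the bug", max_turns=3)
print(result)
print(stuck)
```

The second call is the case max_turns exists for: the loop never reaches an answer, so the cap is the only thing that ends it.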

A hand-rolled tool loop looks fine in a demo and then falls apart on the edges: the model returns two tool calls at once, a tool errors halfway, the context window fills and you have to decide what to drop. The SDK handles parallel tool calls, tool failures, and context compaction the same way Claude Code does, because it is the same code. Inheriting that is cheaper than re-debugging it.

Where context goes: the SDK manages the context window for you, compacting older turns when it fills. By default it loads your user, project, and local filesystem settings, including any CLAUDE.md files, matching the CLI. For an isolated agent, which is usually what you want in production or a multi-tenant setup, pass setting_sources=[] to load none of it. Getting this backwards catches people once.

Giving it tools and MCP servers

You extend an agent beyond the built-in tools by attaching MCP servers through the mcp_servers option. Each server advertises its own tools, and the agent can call them inside the same loop. This is how you give an agent a database, a browser, or your own API.

The Model Context Protocol is the standard the SDK speaks for external tools, so anything exposed as an MCP server plugs straight in. If you've never built one, my walkthrough on how to build an MCP server in Python covers the tool definitions and safety model the agent will be calling into. Here's an agent wired to the Playwright server so it can drive a real browser:

options = ClaudeAgentOptions(
    mcp_servers={
        "playwright": {"command": "npx", "args": ["@playwright/mcp@latest"]}
    },
    allowed_tools=["mcp__playwright__browser_navigate", "mcp__playwright__browser_snapshot"],
)

Two things to notice. The server is launched by command, so the SDK starts it as a subprocess and talks to it over stdio. And the tools it exposes are namespaced - every MCP tool name takes the form mcp__<server>__<tool>, which is also how you allow them. If you understand what an MCP server is, this is the payoff: the same server you point Claude Desktop at works unchanged inside an agent you wrote.
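
Because the naming rule is mechanical, it is easy to get wrong by hand and easy to generate. A small sketch, with helper names that are my own and not SDK functions, and assuming server names do not themselves contain a double underscore:

```python
def mcp_tool_name(server: str, tool: str) -> str:
    """Build the namespaced name for a tool exposed by an MCP server."""
    return f"mcp__{server}__{tool}"


def allow_mcp_tools(server: str, tools: list[str]) -> list[str]:
    """Build an allow-list covering several tools from one server."""
    return [mcp_tool_name(server, tool) for tool in tools]


def split_mcp_tool_name(name: str) -> tuple[str, str]:
    """Split a namespaced name back into (server, tool).

    Assumes the server name contains no double underscore.
    """
    prefix, sep, rest = name.partition("__")
    if prefix != "mcp" or not sep:
        raise ValueError(f"not an MCP tool name: {name!r}")
    server, sep, tool = rest.partition("__")
    if not sep or not server or not tool:
        raise ValueError(f"not an MCP tool name: {name!r}")
    return server, tool


allowed = allow_mcp_tools("playwright", ["browser_navigate", "browser_snapshot"])
print(allowed)
print(split_mcp_tool_name(allowed[0]))
```

The list it produces matches the allowed_tools value in the Playwright example, and the split helper is handy when you log tool calls and want to group them by server.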

Your own tools, in process

For tools specific to your app, you don't need a separate MCP server process. Define a function with the @tool decorator and bundle it with create_sdk_mcp_server, and it runs in the same process as your agent - no subprocess, no transport.

This is the option I reach for most, because most real tools are just "call this internal function with these typed arguments." The decorator takes a name, a description the model reads, and the argument schema:

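import asyncio

from claude_agent_sdk import tool, create_sdk_mcp_server, query, ClaudeAgentOptions


def lookup_order(order_id):
    # Placeholder so the example runs; swap in your real lookup
    return f"Order {order_id}: shipped"


# Arguments: tool name, the description the model reads, and the argument schema
@tool("order_status", "Look up the status of a customer order by id", {"order_id": str})
async def order_status(args):
    status = lookup_order(args["order_id"])
    # Tools return a content list; here, a single text block
    return {"content": [{"type": "text", "text": status}]}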


orders = create_sdk_mcp_server(name="orders", version="1.0.0", tools=[order_status])


async def main():
    options = ClaudeAgentOptions(
        mcp_servers={"orders": orders},
        allowed_tools=["mcp__orders__order_status"],
    )
    async for message in query(prompt="Where is order 10421?", options=options):
        print(message)


asyncio.run(main())

The model reads the description to decide when to call the tool, so write it like you're telling a new teammate what the function is for. A vague description is the single most common reason an agent ignores a tool you gave it. The argument schema is what the model fills in, and the SDK validates it before your function runs.
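
The other half of a good tool is what it returns when the lookup fails, because the model reads that too. A hedged sketch follows. The text_result and error_result helpers and the ORDERS data are hypothetical, and the is_error flag is my understanding of how a failed tool call is signalled, so check it against the current SDK reference before relying on it.

```python
import asyncio

# Hypothetical in-memory data standing in for a real order system.
ORDERS = {"10421": "shipped"}


def text_result(text: str) -> dict:
    """Wrap plain text in the content-list shape the tool example returns."""
    return {"content": [{"type": "text", "text": text}]}


def error_result(text: str) -> dict:
    """Same shape, flagged as a failure (is_error is an assumption, see above)."""
    return {"content": [{"type": "text", "text": text}], "is_error": True}


async def order_status_impl(args: dict) -> dict:
    """Body you could put behind the @tool decorator: never raises on a miss."""
    order_id = args["order_id"]
    status = ORDERS.get(order_id)
    if status is None:
        return error_result(f"No order found with id {order_id}")
    return text_result(f"Order {order_id} is {status}")


print(asyncio.run(order_status_impl({"order_id": "10421"})))
print(asyncio.run(order_status_impl({"order_id": "99999"})))
```

Returning a readable failure instead of raising gives the model something to act on, such as asking the user to recheck the id.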

What it gives you in production

In production the Agent SDK's real value is the parts you'd otherwise spend months hardening: the loop, tool execution, permission control, subagents, and per-call model choice. It runs inside your own process, on your own infrastructure, which means your tools execute in your environment and only what the agent puts into the conversation is sent to the model.

Across many tenants, a few things stop being theoretical. Permissions are the first. allowed_tools and permission_mode are the difference between an agent that can only read and one that can run shell commands, and that boundary is enforced by the SDK, not by hoping the prompt holds. For anything that writes, I gate the specific tool, not the whole agent.

Model routing is the second. The model option is per call, so you don't pick one model for the whole app. A cheap, fast model handles the simple turns; a stronger one handles the hard reasoning. Choosing per task instead of per app is the lever that keeps an agent platform affordable without making it dumb, and the SDK puts that choice in a single field.
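
A routing rule can be as small as one function. This is a sketch under stated assumptions, not how our platform does it: the model identifiers are placeholders to replace with the ones your account exposes, and the task kinds and size threshold are invented for illustration.

```python
# Placeholder identifiers; substitute real model names from your account.
FAST_MODEL = "fast-model-id"
STRONG_MODEL = "strong-model-id"

# Hypothetical task kinds that deserve the stronger model.
HARD_TASKS = {"debug", "refactor", "plan"}


def pick_model(task_kind: str, input_chars: int = 0) -> str:
    """Route hard or very large tasks to the strong model, the rest to the fast one."""
    if task_kind in HARD_TASKS or input_chars > 20_000:
        return STRONG_MODEL
    return FAST_MODEL


print(pick_model("classify"))
print(pick_model("debug"))
print(pick_model("summarize", input_chars=50_000))
```

The result goes into the single field mentioned above, for example ClaudeAgentOptions(model=pick_model("debug")), so the routing policy lives in one place and every call picks it up.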

Third is delegation. The SDK lets a main agent spawn subagents with their own focused instructions and their own narrower tool set, which is how you keep one agent from trying to hold an entire job in a single context window. And hooks let you run your own code at fixed points in the lifecycle - log every file write, block a tool call that fails a check, audit what the agent did. That audit trail matters a lot more when the agent is acting for someone else's business than when it's editing your own repo.
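
Here is roughly what "block a tool call that fails a check" can look like. Treat it as a sketch: the callback signature, the PreToolUse event name, HookMatcher, and the deny payload follow the SDK's hook examples as I understand them, so verify them against the current reference. The blocked patterns and function names are hypothetical.

```python
import asyncio

# Hypothetical deny-list; a real one would be stricter than substring checks.
BLOCKED_PATTERNS = ["rm -rf", "git push --force"]


async def block_dangerous_bash(input_data, tool_use_id, context):
    """Pre-tool-use hook: deny Bash commands that match a blocked pattern."""
    if input_data.get("tool_name") != "Bash":
        return {}  # not our concern, let it through
    command = input_data.get("tool_input", {}).get("command", "")
    for pattern in BLOCKED_PATTERNS:
        if pattern in command:
            return {
                "hookSpecificOutput": {
                    "hookEventName": "PreToolUse",
                    "permissionDecision": "deny",
                    "permissionDecisionReason": f"Blocked pattern: {pattern}",
                }
            }
    return {}


def build_options():
    """Wire the hook into agent options (imports the SDK only when called)."""
    from claude_agent_sdk import ClaudeAgentOptions, HookMatcher

    return ClaudeAgentOptions(
        allowed_tools=["Read", "Bash"],
        hooks={
            "PreToolUse": [HookMatcher(matcher="Bash", hooks=[block_dangerous_bash])]
        },
    )


decision = asyncio.run(
    block_dangerous_bash(
        {"tool_name": "Bash", "tool_input": {"command": "rm -rf /tmp/x"}}, None, None
    )
)
print(decision)
```

Keeping the check as a plain async function means you can unit test the policy without starting an agent, which is where most of the audit logic ends up living.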

When to use the SDK vs Claude Code

Use Claude Code for interactive work you drive yourself at a terminal. Use the Agent SDK when the agent has to run inside something else: a backend service, a CI pipeline, a product your users touch, anything that runs without a human typing each prompt. Same engine, different driver's seat.

The clean test is whether a person is in the loop. If you're sitting there reviewing each step, Claude Code is the better tool and you should not reach for the SDK. The moment the agent needs to run on a schedule, respond to an event, serve a request, or operate for many users at once, you need it embedded in your own program, and that's the SDK. Here's how I split it:

Situation                                     Reach for
Daily development at your terminal            Claude Code
One-off refactor or investigation             Claude Code
Agent triggered by an API request or event    Agent SDK
A step inside a CI/CD pipeline                Agent SDK
A product feature many users hit              Agent SDK
Anything multi-tenant or scheduled            Agent SDK

Plenty of teams use both, and that's the right answer, not a compromise. Claude Code for building, the SDK for shipping. The skills, prompts, and tool configs you settle on while using Claude Code translate straight across to the SDK, because underneath they're the same agent.

The Agent SDK is not a thin wrapper over the messages API. It's the agent loop itself, the same one that's already proven in Claude Code, with the slot left open for you to drop in your own tools, your own permissions, and your own program around it. Start with the ten-line example, give it one tool that matters to you, and you'll have a real agent running before you've finished deciding what to build.


Pavle Lazic is the founder of Scalably, where he builds and runs multi-tenant Claude agent platforms in production for real businesses, entirely on the Claude Agent SDK. He writes about the Agent SDK, MCP servers, and what it actually takes to put AI agents to work.