Introduction
There are many coding agents, but how do they work? Surely, they must be complicated...
As it would turn out, all you need for an agent is: an LLM, a loop, and some tools.
This blog post demonstrates how to build a coding agent with web search & code execution in ~200 lines—the only external dependencies being `anthropic` and `python-dotenv`.
Our agent will be able to:
- View and edit files
- Search the web
- Execute bash commands
You may find the full source on GitHub.
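Here is the shape of that loop as a minimal sketch before we build the real thing. `call_llm` and `run_tool` are hypothetical stand-ins for the Anthropic client call and tool executor we write below.
# A minimal sketch of the agent loop; call_llm and run_tool are hypothetical
# stand-ins for the real client call and tool executor defined later.
messages = [{"role": "user", "content": "fix the failing test"}]
while True:
    response = call_llm(messages, tools=TOOLS)  # the LLM
    if response.stop_reason != "tool_use":  # loop ends when no tools are requested
        break
    results = [run_tool(call) for call in response.tool_calls]  # the tools
    messages.append({"role": "assistant", "content": response.content})
    messages.append({"role": "user", "content": results})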
Jumping in
Normally, tools are defined using a JSON schema. For example, a web search implementation might look like this:
def web_search(topic):
    print(f"pretending to search the web for {topic}")

web_search_tool = {
    "name": "web_search",
    "description": "A tool to retrieve up to date information on a given topic by searching the web",
    "input_schema": {
        "type": "object",
        "properties": {
            "topic": {
                "type": "string",
                "description": "The topic to search the web for"
            },
        },
        "required": ["topic"]
    }
}
We can instead use tools built into the Anthropic API, which don't require JSON schema definitions but do have different characteristics.
Init
Claude comes with predefined tools that require shorter definitions: text editor, web search, and bash. These are the only tools we need for this demonstration.
We start with imports and tool definitions. Setting the web search tool's `max_uses` to 5 prevents runaway research loops.
import os
import subprocess
from pathlib import Path

import anthropic
from dotenv import load_dotenv

load_dotenv()

ANTHROPIC_MODEL = "claude-sonnet-4-0"
ANTHROPIC_API_KEY = os.getenv("ANTHROPIC_API_KEY")
if not ANTHROPIC_API_KEY:
    raise ValueError("ANTHROPIC_API_KEY must be set")

ANTHROPIC_TOOLS = [
    {"type": "text_editor_20250728", "name": "str_replace_based_edit_tool"},
    {"type": "web_search_20250305", "name": "web_search", "max_uses": 5},
    {"type": "bash_20250124", "name": "bash"},
]
Prompting
Now, we'll take a look at our prompt—it's recommended that the system prompt only contain the model's role.
We split the prompt in our code and load only the role tag into the system prompt; the rest goes in the first user message.
Using best practices for prompts helps with tool execution and reasoning.
XML
We define prompt blocks in XML tags for structure and interpretability by the model.
This is a best practice that I've found useful in my own projects. A nice side effect is that prompts are more human-readable, too.
Context & role
We build context around the task and clearly define the role of the agent. We include the role tag in the system prompt and pass the rest of the prompt as the first user message.
Thinking
Using the <thinking_process> block, we encourage the model to think through each problem.
This is also known as "chain of thought".
Instructions
We use explicit, declarative instructions on exactly how the model should perform a given task. This includes the steps the model should take on each turn.
Tool use
We define tool use best practices to encourage parallel tool calls and proper checking of work.
By default, parallel tool use is enabled, but explicit prompts can maximize parallel use.
<role>
You are an expert software engineering assistant specializing in code analysis,
debugging, and implementation. You excel at understanding codebases, identifying
issues, and implementing clean, maintainable solutions that follow best
practices.
You are working in the current directory. When referencing files, use relative
paths from the current working directory unless you specifically need an
absolute path.
</role>
<thinking_process>
Before taking any action, think through the problem step by step:
1. **Analyze**: What is the specific request or error? What context do I need?
2. **Plan**: What tools and steps are needed to address this effectively?
3. **Execute**: Implement the solution methodically
4. **Verify**: Ensure the solution addresses the original problem
Always reason through your approach before acting.
</thinking_process>
<instructions>
When working with code:
1. **Understanding First**: Always examine existing files to understand the
current state, structure, and patterns
2. **Targeted Changes**: Use precise `str_replace` operations that maintain
code quality and consistency
3. **File Creation**: When creating new files, first understand the project
structure and follow existing conventions
4. **Testing**: Always use `uv run` instead of `Python` for execution (e.g.,
`uv run test.py`)
5. **Error Handling**: Provide clear, helpful error messages when operations
fail
For each task:
- Start by thinking through what you need to understand
- Gather necessary information through file inspection
- Plan your approach before making changes
- Execute changes systematically
- Verify results by executing any file you create or edit
Please be concise and direct in your responses.
</instructions>
<tool_usage_best_practices>
- Use parallel tool calls when performing multiple independent operations
- Always check if files exist before attempting to modify them
- Provide detailed, helpful feedback about what actions were taken
- Verify results by executing any file you create or edit
</tool_usage_best_practices>
<code_quality_principles>
- Write clean, readable, and maintainable code
- Follow existing project conventions and patterns
- Include appropriate error handling
- Make minimal, focused changes that solve the specific problem
- Ensure changes don't break existing functionality
</code_quality_principles>
Handling Tools
"Tool use" can be a confusing term—the model never runs anything itself; it only requests tool calls, and we still need to provide and execute the tools ourselves.
We have one server tool (web search) that Anthropic runs for us; the other tools need local execution. The function below implements the group of tool actions we'll give our model access to.
execute_tool
We accept a `tool_name` and `tool_input`, then route tool requests to the appropriate operation. This provides a nice way to implement error and retry logic close to the tool implementations.
Some best practices when executing tools:
- Adding an `is_error` property to the response, which we can then pass to Claude
- Using proper try / except logic with detailed logging for the agent
For this implementation, we only log errors. You could add retry logic as needed.
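As an illustration, a simple retry wrapper might look like this. It's a hypothetical helper, not part of the final script:
import time

def with_retries(run, attempts: int = 3, delay: float = 1.0) -> dict:
    """Retry a flaky tool operation a few times before giving up (hypothetical helper)."""
    for attempt in range(attempts):
        result = run()
        if not result["is_error"]:
            return result
        time.sleep(delay * (attempt + 1))  # simple linear backoff
    return result  # still failing; pass the error through to Claude
You would call it as `with_retries(lambda: execute_tool(tool_name, tool_input))`, letting transient failures like a busy file or a flaky network resolve on a second attempt.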
View
Be sure to handle both directories and files.
String replace
Best practice: replace only the content that needs to be altered, rather than the entire file.
Bash
Best practice: ensure reasonable timeouts for our bash tool and return both stdout & stderr to our agent.
def execute_tool(tool_name: str, tool_input: dict) -> dict:
    """Execute a tool and return structured result with error handling."""
    try:
        # string replace tools
        if tool_name == "view":
            path = Path(str(tool_input.get("path")))
            if path.is_file():
                content = path.read_text()
                return {"content": content, "is_error": False}
            elif path.is_dir():
                content = "\n".join(sorted([f.name for f in path.iterdir()]))
                return {"content": content, "is_error": False}
            else:
                return {"content": f"Error: {path} does not exist", "is_error": True}
        elif tool_name == "create":
            path = Path(str(tool_input.get("path")))
            content = tool_input.get("file_text")  # avoid str(): a missing field would become the string "None"
            if not content:
                return {
                    "content": "Error: No content provided in file_text",
                    "is_error": True,
                }
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(content)
            return {"content": f"File {path} written successfully", "is_error": False}
        elif tool_name == "str_replace":
            path = Path(str(tool_input.get("path")))
            old_str = str(tool_input.get("old_str"))
            new_str = str(tool_input.get("new_str"))
            if not path.exists():
                return {
                    "content": f"Error: File {path} does not exist",
                    "is_error": True,
                }
            content = path.read_text()
            if old_str not in content:
                return {
                    "content": f"Error: String '{old_str}' not found in {path}",
                    "is_error": True,
                }
            new_content = content.replace(old_str, new_str, 1)
            path.write_text(new_content)
            return {
                "content": f"Replaced '{old_str}' with '{new_str}' in {path}",
                "is_error": False,
            }
        # bash tools
        elif tool_name == "bash":
            command = tool_input.get("command")
            print(command)
            if not command:
                return {
                    "content": "Error: No input in command",
                    "is_error": True,
                }
            result = subprocess.run(
                command,
                shell=True,
                capture_output=True,
                text=True,
                timeout=30,  # Add timeout for safety
            )
            # Return both stdout and stderr, mark as error if non-zero exit code
            output = f"stdout: {result.stdout}\nstderr: {result.stderr}"
            return {"content": output, "is_error": result.returncode != 0}
        else:
            return {
                "content": f"Error: Unknown tool '{tool_name}'",
                "is_error": True,
            }
    except Exception as e:
        return {
            "content": f"Error executing {tool_name}: {str(e)}",
            "is_error": True,
        }
Building an agent
Now, we'll take a look at our agent:
The agent uses a while loop that handles different cases.
First, we initialize the Anthropic client and pass in our tools. The temperature is set to a low value (0.2) to keep responses focused and consistent.
Caching
Prompt caching caches the full prefix up to the cache point in the following order: tools, system, messages.
That means our cache point caches the system prompt, tools, and first message.
Since this is sent with every message, we get cost savings during the cache window (5 minutes by default); cached input tokens are billed at a fraction of the normal rate.
Stop reasons
Next, we handle `stop_reason`, which is the API's way of communicating why the model stopped generating.
A best practice is to implement robust stop reason handling. We keep the handling short for our demo, but we recommend covering every possible reason.
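For reference, a fuller handler might branch on each documented reason. This is a sketch, not the demo's code:
# Sketch of fuller stop-reason handling (not part of the demo below).
if response.stop_reason == "tool_use":
    pass  # execute the requested tools and continue the loop
elif response.stop_reason == "end_turn":
    pass  # the model finished naturally; hand control back to the user
elif response.stop_reason == "max_tokens":
    pass  # output was truncated; consider retrying with a larger max_tokens
elif response.stop_reason == "stop_sequence":
    pass  # a custom stop sequence was hit
else:
    pass  # an unrecognized reason; log it instead of failing silently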
Responses
Now, we loop through the response's content blocks and check their types: text, tool_use, server_tool_use, and citations (from our web results). These are surfaced to the user and our agent.
Tools
If tools were requested, we make conditional calls to our executor function by extracting relevant details.
You might notice our web search handling differs from the string replace & bash tools. That's because web search is a server tool with a different output structure.
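Schematically, a web search shows up in `response.content` as a pair of blocks shaped roughly like this (abridged, with illustrative values):
# The model's server-side search request:
{"type": "server_tool_use", "id": "srvtoolu_...", "name": "web_search",
 "input": {"query": "..."}}
# The results, returned by the API within the same response:
{"type": "web_search_tool_result", "tool_use_id": "srvtoolu_...",
 "content": [...]}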
Final response
We return the results to the model, flagging any failures with `is_error` so Claude can correct course.
This while loop is another good place for top-level retry or error-handling logic. We keep things simple here, but you could imagine any number of more complex recovery strategies. Messages and tool results are tracked as message blocks and sent back to the model on each iteration.
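For instance, the API call itself could be wrapped in a retry with backoff. A sketch, assuming the `anthropic` SDK's standard exceptions:
import time

for attempt in range(3):
    try:
        response = client.messages.create(...)  # same arguments as in the loop below
        break
    except anthropic.APIStatusError:
        if attempt == 2:
            raise  # out of retries; surface the failure
        time.sleep(2**attempt)  # exponential backoff: 1s, then 2s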
Thoughts
And that's... it. 200 lines, a few while loops, and some tool calls.
There's something magical about understanding the tools we use every day—Claude Code, Cursor, and the like can feel like magic, but they're just tools.
An "agent" is just a loop that handles different cases using AI... and that doesn't have to be complicated.
Now, a production agent is another story, but this is a good starting point for anyone who's curious about how agents work.
if __name__ == "__main__":
    # Load and parse prompt
    prompt_content = Path("./public/instructions.md").read_text()
    system_prompt = prompt_content[
        prompt_content.find("<role>") + 6 : prompt_content.find("</role>")
    ].strip()
    instructions_content = prompt_content[
        prompt_content.find("<thinking_process>") :
    ].strip()
    client = anthropic.Anthropic(api_key=ANTHROPIC_API_KEY)
    while True:
        user_input = input("💬 User: ")
        # Cache everything up to first user message
        messages = [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": instructions_content,
                        "cache_control": {"type": "ephemeral"},
                    }
                ],
            },
            {"role": "user", "content": user_input},
        ]
        while True:
            # TODO: delegate—haiku for simple tasks, opus for complex
            response = client.messages.create(
                model=ANTHROPIC_MODEL,
                system=[{"type": "text", "text": system_prompt}],
                max_tokens=4096,
                temperature=0.2,
                messages=messages,  # type: ignore
                tools=ANTHROPIC_TOOLS,  # type: ignore
            )
            # TODO: stop reason handling
            if response.stop_reason in ["tool_use"]:
                tool_results = []
                tool_calls = []
                # First pass: collect all tool calls and display text
                for block in response.content:
                    if hasattr(block, "text"):
                        print(block.text)
                    # web search tool
                    if block.type == "server_tool_use":
                        print(f"Searched for: {block.input.get('query')}")
                    if hasattr(block, "citations") and block.citations:
                        print(f"Cited sources: {len(block.citations)}")
                    if block.type == "tool_use":
                        if block.name == "str_replace_based_edit_tool":
                            # the editor packs its sub-command (view / create / str_replace) in the input
                            tool_name = block.input.get("command", None)
                        else:
                            tool_name = block.name  # e.g. bash
                        tool_calls.append(
                            {
                                "tool_name": tool_name,
                                "tool_use_id": block.id,
                                "tool_input": block.input,
                            }
                        )
                # Second pass: execute all tools
                if tool_calls:
                    print(f"Executing {len(tool_calls)} tool(s)...")
                    for tool_call in tool_calls:
                        tool_name, tool_use_id, tool_input = (
                            tool_call["tool_name"],  # type: ignore
                            tool_call["tool_use_id"],  # type: ignore
                            tool_call["tool_input"],  # type: ignore
                        )
                        print(f"Executing tool: {tool_name}")
                        result = execute_tool(tool_name, tool_input)
                        print(result["content"])
                        # Handle structured error results
                        tool_result = {
                            "type": "tool_result",
                            "tool_use_id": tool_use_id,
                            "content": result["content"],
                        }
                        if result["is_error"]:
                            tool_result["is_error"] = True
                        tool_results.append(tool_result)
                messages.append(
                    {
                        "role": "assistant",
                        "content": [block for block in response.content],
                    }
                )
                if tool_results:
                    messages.append({"role": "user", "content": tool_results})
                continue
            else:
                # Handle non-tool responses
                for block in response.content:
                    if hasattr(block, "text"):
                        print(block.text)
                if response.stop_reason in ["end_turn"]:
                    break  # Break out of inner loop to restart conversation
Next steps
- Implement more robust stop reason handling, retry logic, and try / except blocks
- Implement streaming for more responsive messages
- Turn our simple agent into a Multi-agent Architecture with Opus as an orchestrator and Haiku for lightweight tasks
- Play with remote code execution for a sandboxed approach
- Reduce latency in our responses
- Add guardrails to bash tool execution (see the sketch below for a naive starting point)
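On that last point, even a naive denylist beats nothing, though it's no substitute for real sandboxing. A hypothetical sketch:
BLOCKED_PATTERNS = ["rm -rf", "sudo", "mkfs", "> /dev/"]  # hypothetical denylist

def is_command_allowed(command: str) -> bool:
    """Reject obviously destructive commands before they reach subprocess.run."""
    return not any(pattern in command for pattern in BLOCKED_PATTERNS)
You would check `is_command_allowed(command)` at the top of the bash branch in `execute_tool` and return an `is_error` result when it fails.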
A big thank you to Thorsten Ball of Amp for the inspiration on this project.
