Introduction
There are many coding agents, but how do they work? Surely, they must be complicated...
As it would turn out, all you need for an agent is: an LLM, a loop, and some tools.
This blog post demonstrates how to build a coding agent with web search & code execution in ~200 lines—the only external dependencies being `anthropic` and `python-dotenv`.
Our agent will be able to:
- View and edit files
- Search the web
- Execute bash commands
You may find the full source on GitHub.
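Here is the shape of that loop as a minimal sketch before we build the real thing. `call_llm` and `run_tool` are hypothetical stand-ins for the Anthropic client call and tool executor we write below.
# A minimal sketch of the agent loop; call_llm and run_tool are hypothetical
# stand-ins for the real client call and tool executor defined later.
messages = [{"role": "user", "content": "fix the failing test"}]
while True:
    response = call_llm(messages, tools=TOOLS)  # the LLM
    if response.stop_reason != "tool_use":  # loop ends when no tools are requested
        break
    results = [run_tool(call) for call in response.tool_calls]  # the tools
    messages.append({"role": "assistant", "content": response.content})
    messages.append({"role": "user", "content": results})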
Jumping in
Normally, tools are defined using a JSON schema. For example, a web search implementation might look like this:
def web_search(topic):
    print(f"pretending to search the web for {topic}")

web_search_tool = {
    "name": "web_search",
    "description": "A tool to retrieve up to date information on a given topic by searching the web",
    "input_schema": {
        "type": "object",
        "properties": {
            "topic": {
                "type": "string",
                "description": "The topic to search the web for"
            },
        },
        "required": ["topic"]
    }
}
We can instead use tools built into the Anthropic API, which don't require JSON schema definitions but do have different characteristics.
Init
Claude comes with predefined tools that require shorter definitions: text editor, web search, and bash. These are the only tools we need for this demonstration.
We start with imports and tool definitions. Setting the web search tool's `max_uses` to 5 prevents runaway research loops.
import os
import subprocess
from pathlib import Path

import anthropic
from dotenv import load_dotenv

load_dotenv()

ANTHROPIC_MODEL = "claude-sonnet-4-0"
ANTHROPIC_API_KEY = os.getenv("ANTHROPIC_API_KEY")
if not ANTHROPIC_API_KEY:
    raise ValueError("ANTHROPIC_API_KEY must be set")

ANTHROPIC_TOOLS = [
    {"type": "text_editor_20250728", "name": "str_replace_based_edit_tool"},
    {"type": "web_search_20250305", "name": "web_search", "max_uses": 5},
    {"type": "bash_20250124", "name": "bash"},
]
Prompting
Now, we'll take a look at our prompt—it's recommended that the system prompt only contain the model's role.
We split the prompt in our code and load only the role tag into the system prompt; the rest goes in the first user message.
Using best practices for prompts helps with tool execution and reasoning.
XML
We define prompt blocks in XML tags for structure and interpretability by the model.
This is a best practice that I've found useful in my own projects. A nice side effect is that prompts are more human-readable, too.
Context & role
We build context around the task and clearly define the role of the agent. We include the role tag in the system prompt and pass the rest of the prompt as the first user message.
Thinking
Using the <thinking_process> block, we encourage the model to think through each problem.
This is also known as "chain of thought".
Instructions
We use explicit, declarative instructions on exactly how the model should perform a given task. This includes the steps the model should take on each turn.
Tool use
We define tool use best practices to encourage parallel tool calls and proper checking of work.
By default, parallel tool use is enabled, but explicit prompts can maximize parallel use.
<role>
You are an expert software engineering assistant specializing in code analysis,
debugging, and implementation. You excel at understanding codebases, identifying
issues, and implementing clean, maintainable solutions that follow best
practices.
You are working in the current directory. When referencing files, use relative
paths from the current working directory unless you specifically need an
absolute path.
</role>
<thinking_process>
Before taking any action, think through the problem step by step:
1. **Analyze**: What is the specific request or error? What context do I need?
2. **Plan**: What tools and steps are needed to address this effectively?
3. **Execute**: Implement the solution methodically
4. **Verify**: Ensure the solution addresses the original problem
Always reason through your approach before acting.
</thinking_process>
<instructions>
When working with code:
1. **Understanding First**: Always examine existing files to understand the
current state, structure, and patterns
2. **Targeted Changes**: Use precise `str_replace` operations that maintain
code quality and consistency
3. **File Creation**: When creating new files, first understand the project
structure and follow existing conventions
4. **Testing**: Always use `uv run` instead of `Python` for execution (e.g.,
`uv run test.py`)
5. **Error Handling**: Provide clear, helpful error messages when operations
fail
For each task:
- Start by thinking through what you need to understand
- Gather necessary information through file inspection
- Plan your approach before making changes
- Execute changes systematically
- Verify results by executing any file you create or edit
Please be concise and direct in your responses.
</instructions>
<tool_usage_best_practices>
- Use parallel tool calls when performing multiple independent operations
- Always check if files exist before attempting to modify them
- Provide detailed, helpful feedback about what actions were taken
- Verify results by executing any file you create or edit
</tool_usage_best_practices>
<code_quality_principles>
- Write clean, readable, and maintainable code
- Follow existing project conventions and patterns
- Include appropriate error handling
- Make minimal, focused changes that solve the specific problem
- Ensure changes don't break existing functionality
</code_quality_principles>
Handling Tools
"Tool use" can be a confusing term—the model never runs anything itself; it only requests tool calls, and we still need to provide and execute the tools ourselves.
We have one server tool (web search) that Anthropic runs for us; the other tools need local execution. The function below implements the group of tool actions we'll give our model access to.
execute_tool
We accept a `tool_name` and `tool_input`, then route tool requests to the appropriate operation. This provides a nice way to implement error and retry logic close to the tool implementations.
Some best practices when executing tools:
- Adding an `is_error` property to the response, which we can then pass to Claude
- Using proper try / except logic with detailed logging for the agent
For this implementation, we only log errors. You could add retry logic as needed.
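As an illustration, a simple retry wrapper might look like this. It's a hypothetical helper, not part of the final script:
import time

def with_retries(run, attempts: int = 3, delay: float = 1.0) -> dict:
    """Retry a flaky tool operation a few times before giving up (hypothetical helper)."""
    for attempt in range(attempts):
        result = run()
        if not result["is_error"]:
            return result
        time.sleep(delay * (attempt + 1))  # simple linear backoff
    return result  # still failing; pass the error through to Claude
You would call it as `with_retries(lambda: execute_tool(tool_name, tool_input))`, letting transient failures like a busy file or a flaky network resolve on a second attempt.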
View
Be sure to handle both directories and files.
String replace
Best practice: replace only the content that needs to be altered, rather than the entire file.
Bash
Best practice: ensure reasonable timeouts for our bash tool and return both stdout & stderr to our agent.
def execute_tool(tool_name: str, tool_input: dict) -> dict:
    """Execute a tool and return structured result with error handling."""
    try:
        # string replace tools
        if tool_name == "view":
            path = Path(str(tool_input.get("path")))
            if path.is_file():
                content = path.read_text()
                return {"content": content, "is_error": False}
            elif path.is_dir():
                content = "\n".join(sorted([f.name for f in path.iterdir()]))
                return {"content": content, "is_error": False}
            else:
                return {"content": f"Error: {path} does not exist", "is_error": True}
        elif tool_name == "create":
            path = Path(str(tool_input.get("path")))
            content = tool_input.get("file_text")  # avoid str(): a missing field would become the string "None"
            if not content:
                return {
                    "content": "Error: No content provided in file_text",
                    "is_error": True,
                }
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(content)
            return {"content": f"File {path} written successfully", "is_error": False}
        elif tool_name == "str_replace":
            path = Path(str(tool_input.get("path")))
            old_str = str(tool_input.get("old_str"))
            new_str = str(tool_input.get("new_str"))
            if not path.exists():
                return {
                    "content": f"Error: File {path} does not exist",
                    "is_error": True,
                }
            content = path.read_text()
            if old_str not in content:
                return {
                    "content": f"Error: String '{old_str}' not found in {path}",
                    "is_error": True,
                }
            new_content = content.replace(old_str, new_str, 1)
            path.write_text(new_content)
            return {
                "content": f"Replaced '{old_str}' with '{new_str}' in {path}",
                "is_error": False,
            }
        # bash tools
        elif tool_name == "bash":
            command = tool_input.get("command")
            print(command)
            if not command:
                return {
                    "content": "Error: No input in command",
                    "is_error": True,
                }
            result = subprocess.run(
                command,
                shell=True,
                capture_output=True,
                text=True,
                timeout=30,  # Add timeout for safety
            )
            # Return both stdout and stderr, mark as error if non-zero exit code
            output = f"stdout: {result.stdout}\nstderr: {result.stderr}"
            return {"content": output, "is_error": result.returncode != 0}
        else:
            return {
                "content": f"Error: Unknown tool '{tool_name}'",
                "is_error": True,
            }
    except Exception as e:
        return {
            "content": f"Error executing {tool_name}: {str(e)}",
            "is_error": True,
        }
Building an agent
Now, we'll take a look at our agent:
The agent uses a while loop that handles different cases.
First, we initialize the Anthropic client and pass in our tools. The temperature is set to a low value (0.2) to keep responses focused and consistent.
Caching
Prompt caching caches the full prefix up to the cache point in the following order: tools, system, messages.
That means our cache point caches the system prompt, tools, and first message.
Since this is sent with every message, we get cost savings during the cache window (5 minutes by default); cached input tokens are billed at a fraction of the normal rate.
Stop reasons
Next, we handle `stop_reason`, which is the API's way of communicating why the model stopped generating.
A best practice is to implement robust stop reason handling. We keep the handling short for our demo, but we recommend covering every possible reason.
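For reference, a fuller handler might branch on each documented reason. This is a sketch, not the demo's code:
# Sketch of fuller stop-reason handling (not part of the demo below).
if response.stop_reason == "tool_use":
    pass  # execute the requested tools and continue the loop
elif response.stop_reason == "end_turn":
    pass  # the model finished naturally; hand control back to the user
elif response.stop_reason == "max_tokens":
    pass  # output was truncated; consider retrying with a larger max_tokens
elif response.stop_reason == "stop_sequence":
    pass  # a custom stop sequence was hit
else:
    pass  # an unrecognized reason; log it instead of failing silently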
Responses
Now, we loop through the response's content blocks and check their types: text, tool_use, server_tool_use, and citations (from our web results). These are surfaced to the user and our agent.
Tools
If tools were requested, we make conditional calls to our executor function by extracting relevant details.
You might notice our web search handling differs from the string replace & bash tools. That's because web search is a server tool with a different output structure.
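Schematically, a web search shows up in `response.content` as a pair of blocks shaped roughly like this (abridged, with illustrative values):
# The model's server-side search request:
{"type": "server_tool_use", "id": "srvtoolu_...", "name": "web_search",
 "input": {"query": "..."}}
# The results, returned by the API within the same response:
{"type": "web_search_tool_result", "tool_use_id": "srvtoolu_...",
 "content": [...]}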
Final response
We return the results to the model, flagging any failures with `is_error` so Claude can correct course.
This while loop is another good place for top-level retry or error-handling logic. We keep things simple here, but you could imagine any number of more complex recovery strategies. Messages and tool results are tracked as message blocks and sent back to the model on each iteration.
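For instance, the API call itself could be wrapped in a retry with backoff. A sketch, assuming the `anthropic` SDK's standard exceptions:
import time

for attempt in range(3):
    try:
        response = client.messages.create(...)  # same arguments as in the loop below
        break
    except anthropic.APIStatusError:
        if attempt == 2:
            raise  # out of retries; surface the failure
        time.sleep(2**attempt)  # exponential backoff: 1s, then 2s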
Thoughts
And that's... it. 200 lines, a few while loops, and some tool calls.
There's something magical about understanding the tools we use every day—Claude Code, Cursor, and the like can feel like magic, but they're just tools.
An "agent" is just a loop that handles different cases using AI... and that doesn't have to be complicated.
Now, a production agent is another story, but this is a good starting point for anyone who's curious about how agents work.
if __name__ == "__main__":
    # Load and parse prompt
    prompt_content = Path("./public/instructions.md").read_text()
    system_prompt = prompt_content[
        prompt_content.find("<role>") + 6 : prompt_content.find("</role>")
    ].strip()
    instructions_content = prompt_content[
        prompt_content.find("<thinking_process>") :
    ].strip()
    client = anthropic.Anthropic(api_key=ANTHROPIC_API_KEY)
    while True:
        user_input = input("💬 User: ")
        # Cache everything up to first user message
        messages = [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": instructions_content,
                        "cache_control": {"type": "ephemeral"},
                    }
                ],
            },
            {"role": "user", "content": user_input},
        ]
        while True:
            # TODO: delegate—haiku for simple tasks, opus for complex
            response = client.messages.create(
                model=ANTHROPIC_MODEL,
                system=[{"type": "text", "text": system_prompt}],
                max_tokens=4096,
                temperature=0.2,
                messages=messages,  # type: ignore
                tools=ANTHROPIC_TOOLS,  # type: ignore
            )
            # TODO: stop reason handling
            if response.stop_reason in ["tool_use"]:
                tool_results = []
                tool_calls = []
                # First pass: collect all tool calls and display text
                for block in response.content:
                    if hasattr(block, "text"):
                        print(block.text)
                    # web search tool
                    if block.type == "server_tool_use":
                        print(f"Searched for: {block.input.get('query')}")
                    if hasattr(block, "citations") and block.citations:
                        print(f"Cited sources: {len(block.citations)}")
                    if block.type == "tool_use":
                        if block.name == "str_replace_based_edit_tool":
                            # the editor packs its sub-command (view / create / str_replace) in the input
                            tool_name = block.input.get("command", None)
                        else:
                            tool_name = block.name  # e.g. bash
                        tool_calls.append(
                            {
                                "tool_name": tool_name,
                                "tool_use_id": block.id,
                                "tool_input": block.input,
                            }
                        )
                # Second pass: execute all tools
                if tool_calls:
                    print(f"Executing {len(tool_calls)} tool(s)...")
                    for tool_call in tool_calls:
                        tool_name, tool_use_id, tool_input = (
                            tool_call["tool_name"],  # type: ignore
                            tool_call["tool_use_id"],  # type: ignore
                            tool_call["tool_input"],  # type: ignore
                        )
                        print(f"Executing tool: {tool_name}")
                        result = execute_tool(tool_name, tool_input)
                        print(result["content"])
                        # Handle structured error results
                        tool_result = {
                            "type": "tool_result",
                            "tool_use_id": tool_use_id,
                            "content": result["content"],
                        }
                        if result["is_error"]:
                            tool_result["is_error"] = True
                        tool_results.append(tool_result)
                messages.append(
                    {
                        "role": "assistant",
                        "content": [block for block in response.content],
                    }
                )
                if tool_results:
                    messages.append({"role": "user", "content": tool_results})
                continue
            else:
                # Handle non-tool responses
                for block in response.content:
                    if hasattr(block, "text"):
                        print(block.text)
                if response.stop_reason in ["end_turn"]:
                    break  # Break out of inner loop to restart conversation
Next steps
- Implement more robust stop reason handling, retry logic, and try / except blocks
- Implement streaming for more responsive messages
- Turn our simple agent into a Multi-agent Architecture with Opus as an orchestrator and Haiku for lightweight tasks
- Play with remote code execution for a sandboxed approach
- Reduce latency in our responses
- Add guardrails to bash tool execution (see the sketch below for a naive starting point)
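On that last point, even a naive denylist beats nothing, though it's no substitute for real sandboxing. A hypothetical sketch:
BLOCKED_PATTERNS = ["rm -rf", "sudo", "mkfs", "> /dev/"]  # hypothetical denylist

def is_command_allowed(command: str) -> bool:
    """Reject obviously destructive commands before they reach subprocess.run."""
    return not any(pattern in command for pattern in BLOCKED_PATTERNS)
You would check `is_command_allowed(command)` at the top of the bash branch in `execute_tool` and return an `is_error` result when it fails.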
A big thank you to Thorsten Ball of Amp for the inspiration on this project.
