AI agents are one of the most overused terms in tech right now — and one of the most genuinely transformative ideas. But what does "AI agent" actually mean from a software engineering perspective? And how do you build one that works reliably in production?
Let's cut through the noise.
What Makes Something an "Agent"?
A traditional LLM call is stateless and single-shot: you send a prompt, you get a completion. An AI agent extends this with a loop:
- Observe — receive a goal and the current state of the world
- Think — reason about what action to take next
- Act — execute a tool or produce output
- Repeat — use the result to inform the next step
The key difference is that the LLM drives a multi-step process, deciding which tools to call and when to stop. The program doesn't just answer — it does things.
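The loop above can be sketched without any LLM at all, which makes its shape easier to see. This is a minimal illustration, not a real agent: `Policy`, `runLoop`, `scriptedPolicy`, and `demoTools` are hypothetical names, and the scripted policy stands in for the model's decision-making.

```typescript
// A decision is either a tool call or a final answer.
type Decision =
  | { kind: "tool"; name: string; input: string }
  | { kind: "final"; answer: string };

// Hypothetical stand-in for the LLM: it sees the goal and every observation so far.
type Policy = (goal: string, observations: string[]) => Decision;

type ToolMap = Record<string, (input: string) => string>;

function runLoop(goal: string, policy: Policy, tools: ToolMap, maxSteps = 10): string {
  const observations: string[] = []; // Observe: state carried between steps
  for (let step = 0; step < maxSteps; step++) {
    const decision = policy(goal, observations); // Think
    if (decision.kind === "final") return decision.answer; // The policy decides when to stop
    const tool = tools[decision.name]; // Act
    const result = tool ? tool(decision.input) : `Unknown tool: ${decision.name}`;
    observations.push(result); // Repeat: the result informs the next step
  }
  return "Stopped: step limit reached";
}

// Scripted policy (assumption for the demo): call one tool, then answer.
const scriptedPolicy: Policy = (_goal, observations) =>
  observations.length === 0
    ? { kind: "tool", name: "list_files", input: "./src" }
    : { kind: "final", answer: `Found: ${observations[0]}` };

// Fake tool that returns canned output instead of touching the file system.
const demoTools: ToolMap = {
  list_files: () => "index.ts, agent.ts",
};

const demoAnswer = runLoop("List the files in ./src", scriptedPolicy, demoTools);
// demoAnswer is "Found: index.ts, agent.ts"
```

In a real agent the policy is an LLM call, and the observations become messages in the conversation history.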
The Four Building Blocks
Every agent, regardless of framework, has the same four components:
| Component | What it is | Example |
|---|---|---|
| LLM | The reasoning engine | Claude, GPT-4, Gemini |
| Tools | Functions the agent can call | search(), write_file(), send_email() |
| Memory | Context the agent carries forward | Conversation history, retrieved docs |
| Action loop | The driver that keeps it running | A while loop + tool dispatch |
Your First Agent: File Summariser
Here's a minimal but complete agent using the Claude API. It can read files and answer questions about their contents:
```typescript
import Anthropic from "@anthropic-ai/sdk";
import fs from "fs";

// The client reads ANTHROPIC_API_KEY from the environment
const client = new Anthropic();

// Tool definitions the model can choose from
const tools: Anthropic.Tool[] = [
  {
    name: "read_file",
    description: "Read the contents of a file at the given path",
    input_schema: {
      type: "object",
      properties: {
        path: { type: "string", description: "Relative path to the file" },
      },
      required: ["path"],
    },
  },
  {
    name: "list_files",
    description: "List all files in a directory",
    input_schema: {
      type: "object",
      properties: {
        directory: { type: "string", description: "Directory path" },
      },
      required: ["directory"],
    },
  },
];

// Run the requested tool locally and return its output as text
function executeToolCall(name: string, input: Record<string, string>): string {
  switch (name) {
    case "read_file":
      return fs.readFileSync(input.path, "utf-8");
    case "list_files":
      return fs.readdirSync(input.directory).join("\n");
    default:
      return `Unknown tool: ${name}`;
  }
}

async function runAgent(goal: string): Promise<void> {
  const messages: Anthropic.MessageParam[] = [
    { role: "user", content: goal },
  ];

  while (true) {
    const response = await client.messages.create({
      model: "claude-opus-4-5",
      max_tokens: 4096,
      tools,
      messages,
    });

    // Add assistant turn to history
    messages.push({ role: "assistant", content: response.content });

    // Anything other than "tool_use" (end_turn, max_tokens, ...) means there
    // are no tool calls to answer, so print the final text and stop
    if (response.stop_reason !== "tool_use") {
      const text = response.content.find((b) => b.type === "text");
      if (text && text.type === "text") console.log(text.text);
      break;
    }

    // Process tool calls
    const toolResults: Anthropic.ToolResultBlockParam[] = [];
    for (const block of response.content) {
      if (block.type === "tool_use") {
        const result = executeToolCall(
          block.name,
          block.input as Record<string, string>
        );
        toolResults.push({
          type: "tool_result",
          tool_use_id: block.id,
          content: result,
        });
      }
    }

    // Feed results back
    messages.push({ role: "user", content: toolResults });
  }
}

// Usage
runAgent("List the TypeScript files in ./src and summarise what each one does")
  .catch(console.error);
```

Run this and Claude will typically call list_files, then read_file for each .ts file, then synthesise a summary, all without further input from you.
The ReAct Pattern
Many agents follow a pattern called ReAct (Reason + Act), in which the model writes out its reasoning before each action:

```text
Thought: I need to find out which files are in the src directory first.
Action: list_files(directory="./src")
Observation: index.ts, agent.ts, tools.ts, utils.ts
Thought: Now I should read each file to understand its purpose.
Action: read_file(path="./src/agent.ts")
Observation: [file contents...]
...
```

Writing out this "chain of thought" tends to improve reliability: the model is less likely to make hasty decisions when it reasons through each step explicitly. With the tools parameter, Claude often does this unprompted, emitting a short text block before a tool call.
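When a model emits ReAct as plain text rather than structured tool calls, the driver has to pull the tool name and arguments out of the Action line itself. The sketch below assumes exactly the `Action: name(key="value")` format from the trace; `parseAction` is a hypothetical helper, and real model output is usually messier than this, which is one reason structured tool calling is easier to work with.

```typescript
// One parsed "Action: tool(key="value", ...)" line.
interface ParsedAction {
  tool: string;
  args: Record<string, string>;
}

// Returns null for any line that is not an Action line in the assumed format.
function parseAction(line: string): ParsedAction | null {
  const match = /^Action:\s*(\w+)\((.*)\)\s*$/.exec(line.trim());
  if (!match) return null;

  const args: Record<string, string> = {};
  const argPattern = /(\w+)="([^"]*)"/g;
  const argText = match[2] ?? "";

  // Collect every key="value" pair inside the parentheses.
  let pair: RegExpExecArray | null;
  while ((pair = argPattern.exec(argText)) !== null) {
    args[pair[1] ?? ""] = pair[2] ?? "";
  }

  return { tool: match[1] ?? "", args };
}

const parsedAction = parseAction('Action: list_files(directory="./src")');
// parsedAction is { tool: "list_files", args: { directory: "./src" } }
```

Thought and Observation lines return null, so a driver can skip them and dispatch only on parsed actions.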
Common Pitfalls
Infinite loops: Always set a maximum number of iterations. Agents can get stuck retrying the same failing tool call.

```typescript
const MAX_TURNS = 20;
let turns = 0;
while (turns < MAX_TURNS) {
  // ... agent loop
  turns++;
}
```

Tool errors breaking the loop: Wrap tool execution in try/catch and return the error as a tool result rather than letting the exception propagate. The agent can then decide what to do with it.

No stopping condition: Make sure your task description includes clear success criteria. "Research this topic" has no obvious end point, so the agent may keep going until it hits a limit; "Research this topic and write a 500-word summary" gives the agent a concrete finish line.
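For the tool-error pitfall, one possible shape is a small wrapper that turns any thrown exception into an ordinary result. This is a sketch under assumptions: `safeToolResult` is a hypothetical helper, and the local `ToolResult` interface is modelled on the tool_result block used in the agent above, plus the optional `is_error` flag that the Messages API accepts on tool results.

```typescript
// Local stand-in for the tool_result shape, so the sketch runs without the SDK.
interface ToolResult {
  type: "tool_result";
  tool_use_id: string;
  content: string;
  is_error?: boolean;
}

// Runs a tool and converts a thrown exception into an error result
// instead of letting it escape and crash the agent loop.
function safeToolResult(toolUseId: string, run: () => string): ToolResult {
  try {
    return { type: "tool_result", tool_use_id: toolUseId, content: run() };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return {
      type: "tool_result",
      tool_use_id: toolUseId,
      content: `Error: ${message}`,
      is_error: true,
    };
  }
}

const okResult = safeToolResult("toolu_1", () => "file contents");
const failedResult = safeToolResult("toolu_2", () => {
  throw new Error("ENOENT: no such file");
});
// failedResult.content is "Error: ENOENT: no such file" and failedResult.is_error is true
```

In the agent loop, the call to executeToolCall would be passed in as the `run` callback, so a missing file becomes something the model can read and react to, for example by listing the directory again.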
When to Use Agents vs Simple LLM Calls
Not everything needs to be an agent. Use a plain LLM call when:
- The task fits in a single prompt
- You need consistent, deterministic output
- Latency is critical (agents add round-trips)
Use an agent when:
- The task requires multiple sequential steps
- You don't know upfront which steps are needed
- External data or actions are required mid-task
What's Next
In upcoming posts we'll cover multi-agent systems (agents calling other agents), persistent memory with vector stores, and building production-grade agent pipelines with proper error handling and observability.
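The code above is a working starting point rather than a finished production system. For real deployments, add the turn limit and tool error handling described under Common Pitfalls, along with logging, retry logic, and a state persistence layer so agents can resume after failures.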
