What Makes an Agent an Agent?
The word “agent” has been so thoroughly stretched by marketing that it risks becoming meaningless. For our purposes, an AI agent is a system where a language model is in the control loop — making decisions about what to do next, executing actions, observing results, and updating its plan based on what it finds.
This distinguishes an agent from a pipeline. A pipeline has a fixed sequence of steps determined at design time. An agent can determine its own sequence at runtime based on intermediate results. That flexibility is what makes agents powerful — and what makes them hard to build reliably.
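The difference can be sketched in a few lines. This is a minimal illustration, not a reference design: `fetch`, `summarise`, and `stub_policy` are hypothetical names, and `stub_policy` is a hand-written stand-in for the language model so the example runs without one.

```python
from typing import Callable

# Hypothetical steps; a real system would call external services here.
def fetch(state: dict) -> dict:
    return {**state, "docs": ["doc-a", "doc-b"]}

def summarise(state: dict) -> dict:
    return {**state, "summary": f"{len(state.get('docs', []))} docs summarised"}

STEPS: dict[str, Callable[[dict], dict]] = {"fetch": fetch, "summarise": summarise}

def run_pipeline(state: dict) -> dict:
    """Pipeline: the step order is fixed at design time."""
    for name in ("fetch", "summarise"):
        state = STEPS[name](state)
    return state

def stub_policy(state: dict) -> str:
    """Stand-in for the language model: picks the next step from the current state."""
    if "docs" not in state:
        return "fetch"
    if "summary" not in state:
        return "summarise"
    return "stop"

def run_agent(state: dict, policy: Callable[[dict], str], max_steps: int = 10) -> dict:
    """Agent: the policy chooses each step at runtime from intermediate results."""
    for _ in range(max_steps):
        action = policy(state)
        if action == "stop":
            break
        state = STEPS[action](state)
    return state
```

If the state already contains `docs`, `run_agent` skips the fetch step, while `run_pipeline` always runs both steps in order.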
Three capabilities define what a practical production agent can actually do: tool use, memory, and planning. Everything else is built on top of these primitives.
Tool Use in Practice
A tool is any function an agent can call: a web search, a database query, a file write, a code interpreter, an API call. Modern LLMs expose tool use via structured function-calling interfaces — the model outputs a JSON object specifying which tool to call and with what arguments, and the orchestrator executes it and returns the result.
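As a rough sketch of that round trip, the orchestrator side might look like the following. The `{"name": ..., "arguments": ...}` shape is an assumption chosen for illustration, since the exact schema differs between providers, and `get_weather` is a hypothetical tool that returns canned data.

```python
import json

def get_weather(city: str) -> dict:
    """Hypothetical tool; a real one would call an external API."""
    return {"city": city, "temp_c": 21}

# Registry mapping tool names to callables.
TOOLS = {"get_weather": get_weather}

def execute_tool_call(raw: str) -> str:
    """Parse a model-emitted tool call, run it, and return a JSON result string."""
    call = json.loads(raw)
    tool = TOOLS[call["name"]]
    result = tool(**call["arguments"])
    return json.dumps({"tool": call["name"], "result": result})

# What the model might emit (assumed shape, see above).
model_output = '{"name": "get_weather", "arguments": {"city": "Oslo"}}'
observation = execute_tool_call(model_output)
```

The `observation` string is what gets appended to the model's context for its next turn.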
The design of your tool interface matters enormously. Agents perform much better with tools that:
- Have clear, unambiguous names and descriptions (the model reads these to decide which tool to use)
- Return structured, parseable outputs rather than raw text
- Are idempotent wherever possible — safe to call multiple times with the same arguments
- Fail gracefully with informative error messages the model can act on
Tool description quality is a performance dial. We’ve seen 15–25% improvements in correct tool selection simply by rewriting tool descriptions to be more specific about when each tool should and shouldn’t be used.
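One way to put these properties into practice is a thin wrapper around each tool. The sketch below is an assumption about how that could look, not a standard interface: the `Tool` class, `lookup_order`, and the in-memory `_ORDERS` table are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Tool:
    name: str
    description: str  # the model reads this to decide when to call the tool
    fn: Callable[..., Any]

    def call(self, **kwargs: Any) -> dict:
        """Always return a structured result, including on failure."""
        try:
            return {"ok": True, "data": self.fn(**kwargs)}
        except Exception as exc:
            return {"ok": False, "error": type(exc).__name__, "message": str(exc)}

# Hypothetical in-memory data source for the example.
_ORDERS = {"A100": {"status": "shipped"}}

def lookup_order(order_id: str) -> dict:
    """Read-only lookup, so repeated calls with the same id are safe."""
    if order_id not in _ORDERS:
        raise LookupError(f"no order with id {order_id!r}; ids look like 'A100'")
    return _ORDERS[order_id]

order_tool = Tool(
    name="lookup_order",
    description=(
        "Fetch the current status of one order by its exact id. "
        "Use when the user supplies an order id. "
        "Do not use to search orders by customer name."
    ),
    fn=lookup_order,
)
```

The description states both when to use the tool and when not to, and a failed call returns an error message the model can act on instead of a stack trace.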
The Three Layers of Memory
Memory is what separates a single-turn chatbot from an agent that can work on a task over hours or days. In production systems, we work with three distinct layers:
- Working memory is the agent’s context window: everything it currently knows about the task in progress. It needs no retrieval step, so access is immediate, but it is finite and ephemeral. Structure it carefully: unstructured prose wastes tokens; well-formatted summaries pack far more usable information into the same space.
- Long-term memory lives in a vector database. The agent writes important facts, decisions, and summaries here and retrieves semantically relevant entries at the start of new tasks. This is how an agent can “remember” context from last week without burning its entire context window.
- Episodic memory is the immutable log of the agent’s actions — what it did, when, with what inputs and outputs. This serves as your debugging surface, your compliance audit trail, and potentially your training data for future fine-tuning.
The interplay between these layers is where most agent architecture design happens. When does the agent summarise vs. retain verbatim? What goes into long-term storage vs. staying in working memory? What’s worth retrieving at the start of each new run?
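The three layers can be sketched as one small class. This is a toy under stated assumptions: the `AgentMemory` name and its methods are hypothetical, a plain list with bag-of-words cosine similarity stands in for the vector database, and the episodic log is an append-only list, not truly immutable storage.

```python
import math
import time
from collections import Counter
from dataclasses import dataclass, field

def _embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real system would use an embedding model."""
    return Counter(text.lower().split())

def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

@dataclass
class AgentMemory:
    working: list = field(default_factory=list)    # current context
    long_term: list = field(default_factory=list)  # stand-in for a vector store
    episodic: list = field(default_factory=list)   # append-only action log

    def log(self, action: str, inputs: dict, output: str) -> None:
        """Record what the agent did, when, and with what inputs and outputs."""
        self.episodic.append(
            {"ts": time.time(), "action": action, "inputs": inputs, "output": output}
        )

    def remember(self, fact: str) -> None:
        self.long_term.append(fact)

    def recall(self, query: str, k: int = 2) -> list:
        """Return the k stored facts most similar to the query."""
        q = _embed(query)
        ranked = sorted(self.long_term, key=lambda f: _cosine(q, _embed(f)), reverse=True)
        return ranked[:k]

    def start_task(self, goal: str) -> None:
        """Seed working memory with the goal plus relevant long-term entries."""
        self.working = [f"GOAL: {goal}"] + [f"RECALLED: {f}" for f in self.recall(goal)]
```

The design questions above map onto specific choices here: what gets passed to `remember`, how large `k` is in `recall`, and how `start_task` formats what it loads into working memory.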
Planning and Reasoning
Planning is how an agent turns a high-level goal into a sequence of concrete actions. There are two broad approaches, each with distinct trade-offs:
ReAct-style planning (Reasoning + Acting) interleaves reasoning steps with tool calls. The model thinks out loud about what to do next, calls a tool, observes the result, and reasons again. It’s flexible and handles unexpected results gracefully, but it’s token-intensive and the reasoning chain can drift over long runs.
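A minimal sketch of that loop follows. The `scripted_model` function is a hard-coded stand-in for an LLM call so the example is runnable, and the `calculator` tool and the step dictionary shape are illustrative assumptions.

```python
from typing import Callable, Optional

def calculator(expression: str) -> str:
    """Hypothetical tool: adds integers written as 'a+b+c'."""
    return str(sum(int(part) for part in expression.split("+")))

TOOLS: dict[str, Callable[[str], str]] = {"calculator": calculator}

def scripted_model(transcript: list) -> dict:
    """Stand-in for an LLM: returns a thought plus either an action or a final answer."""
    observations = [line for line in transcript if line.startswith("Observation:")]
    if not observations:
        return {"thought": "I need to add the numbers.", "action": "calculator", "input": "2+3"}
    value = observations[-1].split(": ", 1)[1]
    return {"thought": "I have the sum.", "final": value}

def react(question: str, model: Callable[[list], dict], max_steps: int = 5):
    """Interleave reasoning and tool calls until the model gives a final answer."""
    transcript = [f"Question: {question}"]
    answer: Optional[str] = None
    for _ in range(max_steps):
        step = model(transcript)
        transcript.append(f"Thought: {step['thought']}")
        if "final" in step:
            answer = step["final"]
            break
        observation = TOOLS[step["action"]](step["input"])
        transcript.append(f"Action: {step['action']}[{step['input']}]")
        transcript.append(f"Observation: {observation}")
    return answer, transcript  # answer stays None if the step budget runs out
```

The transcript grows with every iteration, which is where the token cost comes from. The `max_steps` cap is one simple guard against a chain that drifts without converging.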
Plan-then-execute generates a structured plan upfront and then executes it step by step. It is faster and cheaper for well-defined tasks, but brittle when intermediate results differ from expectations. A hybrid approach (a structured plan with ReAct-style adaptation at decision points) works well for most production use cases.
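The hybrid idea can be sketched as an executor that runs a fixed plan and only consults a replanner when a step returns something unexpected. Everything here is an illustrative assumption: the step shape, the convention that a failed tool returns a string starting with `ERROR`, and the `fallback_replan` function, which stands in for a model call.

```python
from typing import Callable

def execute_plan(
    plan: list,
    tools: dict,
    replan: Callable[[dict, str, list], list],
    max_replans: int = 2,
) -> list:
    """Run a precomputed plan; on an unexpected result, ask `replan` for the remaining steps."""
    results = []
    queue = list(plan)
    replans = 0
    while queue:
        step = queue.pop(0)
        output = tools[step["tool"]](step["arg"])
        results.append(output)
        if output.startswith("ERROR"):  # decision point: reality diverged from the plan
            if replans >= max_replans:
                raise RuntimeError(f"gave up after {replans} replans at step {step}")
            replans += 1
            queue = replan(step, output, queue)
    return results

# Hypothetical tools for the example.
def read_source(name: str) -> str:
    return "ERROR: primary unavailable" if name == "primary" else f"data from {name}"

def notify(message: str) -> str:
    return f"notified: {message}"

TOOLS = {"read": read_source, "notify": notify}

def fallback_replan(failed: dict, output: str, remaining: list) -> list:
    """Stand-in replanner: retry a failed read against a mirror, then continue."""
    if failed["tool"] == "read":
        return [{"tool": "read", "arg": "mirror"}] + remaining
    return remaining

plan = [{"tool": "read", "arg": "primary"}, {"tool": "notify", "arg": "done"}]
results = execute_plan(plan, TOOLS, fallback_replan)
```

When no step fails, the replanner is never called and the run costs no more than plain plan-then-execute.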
The choice between these isn’t purely technical. Consider how often the real world will diverge from the plan. If the answer is “frequently,” invest in ReAct. If the answer is “rarely,” plan-then-execute will be faster and easier to audit.
Orchestrating It Together
In a working agent, these three primitives interact constantly. The planning step reads from working memory and long-term memory to generate a course of action. Tool calls update working memory with new information. Important results get written to long-term storage. The episodic log captures everything.
The orchestrator — the code that ties it all together — is responsible for maintaining the run state, enforcing timeouts, handling tool call failures, and deciding when to escalate to a human or terminate the run. It’s the least glamorous part of agent engineering and arguably the most important.
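Those responsibilities can be sketched as a small control loop. This is a simplified illustration with hypothetical names (`RunState`, `Status`, `orchestrate`), and it makes one notable assumption: the time budget is checked between steps, so it does not interrupt a tool call that hangs. Per-call timeouts would need threads, subprocesses, or async cancellation.

```python
import time
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Optional

class Status(Enum):
    RUNNING = "running"
    DONE = "done"
    ESCALATED = "escalated"              # handed off to a human
    BUDGET_EXCEEDED = "budget_exceeded"  # ran out of time or steps

@dataclass
class RunState:
    status: Status = Status.RUNNING
    steps: int = 0
    consecutive_failures: int = 0
    log: list = field(default_factory=list)

def orchestrate(
    next_action: Callable[[RunState], Optional[Callable[[], str]]],
    deadline_s: float = 30.0,
    max_steps: int = 20,
    max_failures: int = 3,
) -> RunState:
    """Drive the run: enforce budgets, record outcomes, escalate on repeated failure."""
    state = RunState()
    started = time.monotonic()
    while state.status is Status.RUNNING:
        if time.monotonic() - started > deadline_s or state.steps >= max_steps:
            state.status = Status.BUDGET_EXCEEDED
            break
        action = next_action(state)
        if action is None:  # the planner signals completion
            state.status = Status.DONE
            break
        state.steps += 1
        try:
            state.log.append(("ok", action()))
            state.consecutive_failures = 0
        except Exception as exc:
            state.log.append(("error", repr(exc)))
            state.consecutive_failures += 1
            if state.consecutive_failures >= max_failures:
                state.status = Status.ESCALATED
    return state
```

Here `next_action` stands in for the planning step: it inspects the run state and returns either a zero-argument callable to execute or `None` when the task is finished.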
A common mistake is to treat the orchestrator as a thin wrapper. In practice, the quality of your orchestration logic — how it handles ambiguous states, partial failures, and contradictory tool outputs — determines whether your agent is a research toy or a production system.
Conclusion
Tool use, memory, and planning are not new ideas — they’ve been core concepts in AI research for decades. What’s new is that modern LLMs have made all three practical to build with relatively small teams and modest infrastructure.
The engineering challenges aren’t conceptual; they’re in the details. How do you write tool descriptions that reliably guide correct selection? How do you decide what earns a place in long-term memory? How do you keep reasoning chains from drifting? These are the questions that separate agents that demo well from agents that run in production.
If you’re building your first production agent, start with the simplest possible version of each primitive and add sophistication where the data tells you to. A well-designed simple agent beats a poorly designed complex one every time.