A common mistake I see teams make when they first start running AI agents in production: they instrument the model and call it done. They set up LLM observability tooling — tracking token counts, latency, cost per API call — and think they have good visibility into their system.

They don't. They have model observability. What they need is agent observability. These are related but fundamentally different things.

What model observability gives you

Model observability tools — think LangSmith's basic features, Helicone, or even just LLM API dashboards — give you visibility into individual model calls:

- Input and output token counts per request
- Latency of each API call
- Cost per call and aggregate spend
- The prompt sent and the completion returned
- Error and rate-limit responses from the provider

This is genuinely useful for understanding your model spend and catching model API issues. But for an agent, it's only a slice of what's happening.

What model observability misses

An AI agent is not just a sequence of LLM calls. It's a program that uses LLM calls as one of many operations. When an agent fails, the failure is often not in the LLM call — it's in what the agent does with the LLM's output.

Consider a customer service agent that:

1. Calls the LLM to classify the customer's request
2. Queries the order database to pull the customer's records
3. Calls the LLM to choose a resolution
4. Calls the refund API to execute that resolution
5. Calls the LLM to draft the reply to the customer

Model observability gives you visibility into steps 1, 3, and 5. It gives you nothing about steps 2 and 4. And crucially, it gives you no visibility into the relationships between these steps — how the output of step 1 affected step 2, how the failure in step 4 cascaded to step 5.
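To make that concrete, here is a minimal sketch of such an agent as an actual program, with LLM calls (steps 1, 3, 5) interleaved with tool calls (steps 2, 4). Every name here (`call_llm`, `lookup_order`, `issue_refund`) is hypothetical, and the "LLM" is a stub:

```python
def call_llm(prompt: str) -> str:
    # Stand-in for a real model API call; routes on the prompt prefix.
    if prompt.startswith("classify"):
        return "refund_request" if "broken" in prompt else "question"
    if prompt.startswith("decide"):
        return "REFUND" if "refund_request" in prompt else "REPLY"
    return "Your refund is on its way."

def lookup_order(customer_id: str) -> dict:
    # Step 2: a tool call. No LLM involved, invisible to model dashboards.
    return {"order_id": "o-99", "item": "headphones"}

def issue_refund(order_id: str) -> str:
    # Step 4: another tool call, and a common failure point in practice.
    return f"refunded:{order_id}"

def handle_ticket(customer_id: str, message: str) -> str:
    intent = call_llm(f"classify: {message}")           # step 1: LLM
    order = lookup_order(customer_id)                   # step 2: tool
    decision = call_llm(f"decide: {intent} {order}")    # step 3: LLM
    receipt = issue_refund(order["order_id"]) if decision == "REFUND" else "no action"  # step 4: tool
    return call_llm(f"draft reply: {receipt}")          # step 5: LLM
```

If the refund call in step 4 throws, steps 1, 3, and 5 can still look perfectly healthy in a per-call model dashboard — which is exactly the blind spot.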

Model observability treats your agent as a bag of API calls. Agent observability treats it as what it is: a stateful program with branching logic, tool dependencies, and accumulated state.

The key differences

Unit of observation: Model observability's unit is the LLM API request. Agent observability's unit is the run — a complete execution from input to output that may span many LLM calls and many non-LLM operations.
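A sketch of what "the run as the unit" looks like in data: one run ID grouping every LLM call and tool call from a single execution, rather than three unrelated API log entries. The structure below is illustrative, not any particular vendor's schema:

```python
import time
import uuid
from dataclasses import dataclass, field

@dataclass
class Run:
    # One run_id ties together every step of a single execution.
    run_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    steps: list = field(default_factory=list)

    def record(self, kind: str, name: str, **data) -> None:
        self.steps.append({"kind": kind, "name": name, "ts": time.time(), **data})

run = Run()
run.record("llm", "classify", tokens=142)
run.record("tool", "lookup_order", latency_ms=38)
run.record("llm", "decide", tokens=260)

# The whole execution is queryable as one object.
print(run.run_id, len(run.steps))
```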

State: Model observability is stateless — each request is independent. Agent observability tracks state accumulated across an entire run, because state determines behavior.
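One way to capture that accumulated state is to snapshot it at every step, so you can later see exactly what the agent "knew" when it made each decision. A minimal sketch, with illustrative names:

```python
import copy

trace = []
state = {"customer": None, "order": None}

def step(name: str, updates: dict) -> None:
    state.update(updates)
    # Deep-copy the snapshot so later mutations don't rewrite history.
    trace.append({"step": name, "state": copy.deepcopy(state)})

step("identify_customer", {"customer": "c42"})
step("lookup_order", {"order": {"item": "headphones"}})

# At step 1 the agent had no order yet; by step 2 it did.
assert trace[0]["state"]["order"] is None
assert trace[1]["state"]["order"]["item"] == "headphones"
```

The deep copy is the important detail: logging a reference to a mutable state object gives you a trace where every snapshot silently shows the final state.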

Causality: Model observability shows you what happened. Agent observability shows you why it happened — because you can see the full decision chain, not just isolated data points.

Tool calls: Model observability has no concept of tool calls (except when the LLM uses function calling, and even then it captures only the decision to call the function, not the execution result). Agent observability captures tool execution: inputs, outputs, latency, errors.
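Capturing tool execution can be as simple as a decorator that records inputs, outputs, latency, and errors around each tool function. A sketch under assumed names (`TOOL_LOG`, `issue_refund` are hypothetical):

```python
import functools
import time

TOOL_LOG = []

def instrumented(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        entry = {"tool": fn.__name__, "args": args, "kwargs": kwargs}
        try:
            entry["output"] = fn(*args, **kwargs)
            return entry["output"]
        except Exception as exc:
            entry["error"] = repr(exc)  # failures are recorded, not lost
            raise
        finally:
            entry["latency_ms"] = (time.perf_counter() - start) * 1000
            TOOL_LOG.append(entry)
    return wrapper

@instrumented
def issue_refund(order_id: str) -> str:  # hypothetical tool
    return f"refunded:{order_id}"

issue_refund("o-99")
print(TOOL_LOG[-1]["tool"], TOOL_LOG[-1]["output"])
```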

Replay: Model observability cannot replay a run because it doesn't have enough information. Agent observability makes replay possible because it captures the complete run state.
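What makes replay possible is that the run record contains every non-deterministic output (LLM responses, tool results), so a re-execution can substitute recorded outputs for live calls. A deliberately tiny sketch with an invented record format:

```python
# A captured run: each step stores its operation, input, and output.
recorded_run = [
    {"op": "llm", "input": "classify: broken headphones", "output": "REFUND"},
    {"op": "tool", "input": "issue_refund(o-99)", "output": "refunded:o-99"},
]

def replay(run: list):
    # Walk the record, yielding each stored output instead of calling
    # the live model or tool, so the run is reproducible offline.
    for step in run:
        yield step["op"], step["output"]

assert list(replay(recorded_run)) == [
    ("llm", "REFUND"),
    ("tool", "refunded:o-99"),
]
```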

Do you need both?

Yes, but you should understand what each layer gives you. Model observability is valuable for cost management and model API health. Agent observability is what you need for debugging, reliability, and understanding your system's behavior in production.

Think of it like application performance monitoring (APM) vs. infrastructure monitoring. Infrastructure monitoring tells you your servers are healthy. APM tells you why your users are experiencing slow page loads. You need both, but they answer different questions.

Get the full picture — not just the model calls.

Connect your first agent — free