Ask any engineer who has deployed a production AI agent and they’ll eventually mention the same limitation: agents forget.

This isn’t a metaphor. It’s a technical reality. Most large language models operate within a context window: a finite span of text the model can consider at any moment. Anything that falls outside that window is simply gone.

For a system meant to execute long, multi-step tasks, this is a fundamental constraint.
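
To make the constraint concrete, here is a toy sketch of how a fixed window forces forgetting. The whitespace word count standing in for real tokenization is purely illustrative:

```python
# Toy illustration of a context window: a fixed token budget means the
# oldest turns are silently dropped. Word count stands in for a real
# tokenizer here purely for illustration.

def fit_to_window(messages: list[str], max_tokens: int = 8192) -> list[str]:
    """Keep the most recent messages that fit inside the budget."""
    kept: list[str] = []
    used = 0
    for msg in reversed(messages):      # walk newest-first
        cost = len(msg.split())         # crude stand-in for token count
        if used + cost > max_tokens:
            break                       # everything older is forgotten
        kept.append(msg)
        used += cost
    return list(reversed(kept))         # restore chronological order
```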

The Three Layers of Memory

When researchers talk about adding memory to agents, they typically mean one of three things:

In-context memory is the simplest: the agent’s current context window, containing recent conversation, tool outputs, and intermediate reasoning. It’s fast and accurate—but ephemeral and limited in size.
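
As a minimal sketch (the names are illustrative, not any particular framework’s API), in-context memory is nothing more than the prompt the agent assembles on each call:

```python
# In-context memory is just the prompt itself: recent turns, tool
# outputs, and scratch reasoning concatenated and re-sent every call.

from dataclasses import dataclass, field

@dataclass
class InContextMemory:
    turns: list[str] = field(default_factory=list)

    def add(self, role: str, content: str) -> None:
        self.turns.append(f"{role}: {content}")

    def render(self) -> str:
        # Everything the model "remembers" right now is this one string.
        return "\n".join(self.turns)

memory = InContextMemory()
memory.add("user", "Summarize last week's deploy incident.")
memory.add("tool", "incident_log: rollback completed at 14:02 UTC")
prompt = memory.render()  # gone for good once the window overflows
```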

External memory stores information in a retrieval system—a vector database, a document store, or a structured knowledge base. The agent queries this store when it needs to recall something. This extends what an agent can “know” dramatically, but introduces retrieval errors and latency.
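
A stripped-down sketch of the pattern, assuming a toy embed function in place of a real embedding model and a plain list in place of a vector database:

```python
# External memory: store text alongside an embedding, recall by cosine
# similarity. The hash-seeded `embed` below is a placeholder with no
# semantic meaning; retrieval quality comes entirely from the real
# embedding model a production system would use.

import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)        # unit-normalize for cosine math

class ExternalMemory:
    def __init__(self) -> None:
        self.entries: list[tuple[str, np.ndarray]] = []

    def store(self, text: str) -> None:
        self.entries.append((text, embed(text)))

    def recall(self, query: str, k: int = 3) -> list[str]:
        q = embed(query)
        # Dot product of unit vectors equals cosine similarity.
        ranked = sorted(self.entries, key=lambda e: -float(e[1] @ q))
        return [text for text, _ in ranked[:k]]

mem = ExternalMemory()
mem.store("User prefers terse, bullet-point answers.")
mem.store("Project Atlas shipped v2 on March 3.")
notes = mem.recall("How does this user like to be briefed?")
```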

Fine-tuned memory bakes knowledge into the model’s weights at training time. This is the most durable form of memory—it can’t be “forgotten” in the same way—but it’s also the least flexible, since updating it requires retraining.
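
A sketch of what “baking in” looks like in practice: knowledge is expressed as training pairs, and the weight update is the storage medium. The chat-style JSONL layout below is a common convention, not any specific vendor’s schema:

```python
# Fine-tuned memory: knowledge enters as training examples, and the
# resulting weight updates are the storage medium. Editing this kind
# of memory means editing the data and retraining.

import json

examples = [
    {"messages": [
        {"role": "user", "content": "What is our deploy-freeze policy?"},
        {"role": "assistant",
         "content": "No production deploys after 15:00 UTC on Fridays."},
    ]},
]

with open("finetune.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```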

Why This Matters for Agency

A truly capable agent needs all three. In-context memory handles immediate task state. External memory handles project history and user preferences. Fine-tuned memory provides domain expertise and behavioral norms.
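
Reusing the sketches above, orchestration can be as simple as layering the three sources into one prompt; call_model here is a hypothetical stand-in for the fine-tuned checkpoint:

```python
# Orchestrating the layers: external recall feeds the prompt, the
# prompt is the in-context layer, and the fine-tuned layer is implicit
# in whichever model checkpoint `call_model` invokes.

def call_model(prompt: str) -> str:
    # Hypothetical stand-in for an actual LLM call.
    return f"[model response to {len(prompt)} chars of context]"

def answer(question: str, ctx: InContextMemory, ext: ExternalMemory) -> str:
    retrieved = ext.recall(question)          # external memory
    ctx.add("user", question)                 # in-context memory
    prompt = "\n".join(["Relevant notes:", *retrieved, ctx.render()])
    reply = call_model(prompt)                # fine-tuned memory (weights)
    ctx.add("assistant", reply)
    return reply
```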

The systems being built today are learning to orchestrate these three layers. The results are early but compelling: agents that remember your preferred communication style, that recall the context of last week’s project, that know which tools did and didn’t work in similar situations.

We’re watching this space closely. The agents that solve memory will be qualitatively more capable—and more trustworthy—than what we have today.