
Designing Memory for AI Agents

By Justin Rahardjo on Apr 29, 2026
[Image: Minimal dark illustration of AI memory as structured information infrastructure, generated with gpt-image-2-medium]

AI agents feel more useful when they can remember things. Not just what we said five messages ago, but who we are, how we work, what decisions we have already made, and where the important source material lives.

But “memory” is a slippery word. When people talk about AI memory, they often mean something vague and human-like: the agent should remember me, remember our conversations, and build up context over time. That sounds right, but it hides a lot of hard design problems.

An AI agent should not simply remember everything. Some memories are wrong. Some become stale. Some are private. Some are only useful in a specific workflow. Some should be deleted. Some should never be treated as truth because they were only mentioned casually in conversation.

At the end of the day, AI memory is closer to a database than a brain. The goal is not to recreate human memory. Human memory is unreliable, lossy, emotional, and inconsistent. The goal is to build a memory system that is inspectable, correctable, private, and useful at the moment an agent needs it.

What should an AI agent remember?

The first mistake is treating all memory as one thing. There are several different types of information an agent might need to remember, and they should not all be stored or retrieved in the same way.

Conversation memory

This is the familiar version of memory: “Remember when we talked about X?” It includes previous conversations, decisions made during a session, unresolved questions, and context that helps the agent understand why we ended up where we are.

Conversation memory is useful, but it is also messy. Long conversations drift. They branch into unrelated topics. Sometimes the most important part is not the exact transcript, but the path that led to a decision.

User facts

These are specific data points about a person or organization. Things like a name, birthday, location, company, role, family member, or recurring commitment.

Facts sound simple, but they are not always clean. Someone might say “DOB”, “birthdate”, or “date of birth” and expect the same field. Someone might mention a fact casually, correct it later, or provide a more authoritative source somewhere else.

Preferences and decisions

Some memories are not facts, but preferences. For example: “When I create new files, I prefer kebab-case file names.” These memories are useful because they help the agent behave consistently without needing to be told the same thing repeatedly.

Preferences can also be workflow-specific. A coding agent might need to remember naming conventions, testing habits, or how a project is structured. A writing agent might need to remember tone, formatting style, or publishing constraints.

Source-of-truth data

Some information should not be treated as ordinary memory because there is a more authoritative source elsewhere. A calendar is a good example. If I previously told an agent I was free on Friday, but my calendar now says I have a meeting, the calendar should win.

The same applies to a codebase. A conversation about how the code worked last week is less authoritative than the current state of the files. Memory should help the agent find the right source of truth, not replace it.

Artifacts and files

Humans often remember pointers rather than full contents. I might not remember every line of a receipt, but I remember that I have a receipt for my mug and that it is in a particular folder.

AI agents need a similar model for PDFs, images, screenshots, videos, and other files. The agent does not always need to memorize the entire artifact. It may only need to remember that the artifact exists, where it lives, what it roughly contains, and when it should be retrieved.
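
A minimal sketch of such a pointer, with purely illustrative field names:

```python
from dataclasses import dataclass


@dataclass
class ArtifactPointer:
    """Remember that an artifact exists and where it lives, not its full contents."""
    path: str           # e.g. "receipts/2026/mug.pdf" (hypothetical location)
    kind: str           # "pdf", "image", "screenshot", "video", ...
    gist: str           # one-line summary of what it roughly contains
    retrieve_when: str  # hint for when fetching the full artifact is worth it
```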

Not all memories are equal

A useful memory system needs to understand the difference between a claim, a summary, a preference, and a source of truth.

For example, if John tells me Dave was born in February, that is a claim. If Dave later tells me he was actually born in December, that is a correction from a more authoritative source. The system should not simply keep both facts and retrieve whichever one has better semantic similarity. It should understand that one memory replaced another.

The same issue appears with mutable information. If a friend used to be blonde but is now brunette, the current fact is that they are brunette. The older memory might still be useful as history, but it should not be used as the current answer.

This suggests that memories need metadata:

  • Provenance: where did this memory come from?
  • Confidence: is it confirmed, inferred, uncertain, or contradicted?
  • Authority: does this come from a user, a calendar, a file, an API, or another agent?
  • Freshness: when was it created, updated, or last confirmed?
  • Scope: is this personal, project-specific, organization-wide, or session-only?
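
As a rough sketch, a memory record carrying this metadata might look like the following. The field names and enum values are illustrative, not a real schema:

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class Confidence(Enum):
    CONFIRMED = "confirmed"
    INFERRED = "inferred"
    UNCERTAIN = "uncertain"
    CONTRADICTED = "contradicted"


@dataclass
class MemoryRecord:
    content: str
    provenance: str              # e.g. "conversation:2026-04-12" or "file:receipts/mug.pdf"
    confidence: Confidence
    authority: str               # "user", "calendar", "file", "api", "agent"
    scope: str                   # "personal", "project", "organization", "session"
    created_at: datetime
    updated_at: datetime
    last_confirmed_at: datetime | None = None
    superseded_by: str | None = None  # id of the memory that replaced this one
```

The superseded_by field is what lets a correction replace a claim instead of merely coexisting with it.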

Without this metadata, memory retrieval becomes dangerous. The agent may confidently use stale or low-quality information because it matched the current prompt.

When should memory be used?

Memory should not be retrieved only when the model explicitly asks for it. There are several points in an agent workflow where memory can be useful.

Before sending a request to the model

Before the agent responds, it can retrieve relevant facts, preferences, recent decisions, and source-of-truth references. This reduces the need for the model to search blindly and helps it start with the right context.

This is probably the most obvious place to inject memory. If the user asks the agent to create a new file, the agent should already know naming preferences before it starts writing.
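
As a sketch, the pre-request step might assemble context like this. The memory helpers here are hypothetical, not an existing API:

```python
def build_context(user_message: str, memory) -> str:
    """Gather relevant memory before the model sees the request."""
    facts = memory.lookup_facts(user_message)          # structured fields: names, dates, roles
    prefs = memory.matching_preferences(user_message)  # e.g. "kebab-case file names"
    sources = memory.source_pointers(user_message)     # calendars, files, codebases to consult

    sections = []
    if facts:
        sections.append("Known facts:\n" + "\n".join(f"- {f}" for f in facts))
    if prefs:
        sections.append("User preferences:\n" + "\n".join(f"- {p}" for p in prefs))
    if sources:
        sections.append("Authoritative sources to check:\n" + "\n".join(f"- {s}" for s in sources))
    return "\n\n".join(sections)
```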

After the model responds

While waiting for the user to reply, the system can preload likely-needed context. If the model suggests doing X, Y, and Z, the memory system can prepare related files, previous decisions, or relevant documentation in the background.

This makes the next turn faster and more useful without forcing the model to predict every retrieval step up front.

During and after a session

After a certain number of messages, the system should scan the recent conversation and decide what needs to be preserved. This might include new facts, changed preferences, decisions, tasks, or unresolved questions.

The system should not wait until the end of a long conversation to compact everything. Long conversations often drift so far that the beginning no longer feels connected to the end. It may be better to continuously summarize, update, and restructure the session as it evolves.
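
One way to sketch that continuous compaction, with a hypothetical session and extractor:

```python
COMPACT_EVERY = 20  # arbitrary threshold; tune per workload


def maybe_compact(session, extractor) -> None:
    """Periodically distill the recent conversation instead of waiting for the end."""
    if len(session.messages) - session.last_compacted_at < COMPACT_EVERY:
        return
    recent = session.messages[session.last_compacted_at:]
    # Extract facts, preferences, decisions, tasks, and unresolved questions.
    for item in extractor.extract(recent):
        session.memory.propose(item)  # propose rather than silently commit
    session.last_compacted_at = len(session.messages)
```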

Creating memories

There are two broad approaches to creating memories.

The first is lossless storage: save everything as-is, then use search and embeddings to retrieve relevant material later. This is simple and preserves the original source, but it can become noisy and expensive. It also does not distinguish between a passing comment and an important fact.

The second is extraction: use another model or classifier to pull out facts, preferences, entities, decisions, and summaries. This makes memory more structured and easier to query, but it introduces another layer of interpretation. The extraction system can miss things, invent structure, or overfit to the wrong detail.

There may be a role for smaller local models or traditional neural networks here. Entity extraction, key phrase detection, and classification do not always require a large language model. A local system could make memory creation faster, more private, and more deterministic.

But the hardest part is not the extraction mechanics. It is deciding what deserves to become memory at all.

Some things should be remembered automatically. Some should be suggested to the user: “Should I remember this?” Some should be session-only. Some should be ignored entirely. If an agent remembers too aggressively, it becomes creepy and noisy. If it remembers too little, it feels stateless and repetitive.
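
Those four outcomes could be made explicit as a write policy. The rules below are purely illustrative, operating on a hypothetical extracted candidate:

```python
from enum import Enum


class WritePolicy(Enum):
    AUTO = "remember automatically"
    SUGGEST = "ask the user: should I remember this?"
    SESSION_ONLY = "keep for this session only"
    IGNORE = "do not store"


def decide_policy(candidate) -> WritePolicy:
    """Toy rules only; a real system would tune these carefully."""
    if candidate.is_sensitive:
        return WritePolicy.SUGGEST  # never store sensitive details without asking
    if candidate.explicitly_stated and candidate.kind in ("fact", "preference"):
        return WritePolicy.AUTO
    if candidate.kind in ("task", "working_context"):
        return WritePolicy.SESSION_ONLY
    return WritePolicy.IGNORE  # passing comments should not become memories
```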

Retrieving memories

Memory retrieval is not a single search problem. Different questions require different retrieval strategies.

Keyword search is useful when the user mentions a specific term, project, person, or file name. It is simple, explainable, and often good enough.

Semantic search is useful when the user refers to an idea rather than an exact phrase. This is especially helpful for conversation history, where the user might ask about something discussed earlier without using the same wording.

Semantic search needs ranking. Recency matters, but it should not be the only factor. An older memory may be more relevant than a newer one. A newer memory may supersede an older one. Retrieval should consider relevance, freshness, authority, and confidence together.
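
A sketch of such a ranking function, with made-up weights and string-valued metadata for brevity (timestamps are assumed timezone-aware):

```python
import math
from datetime import datetime, timezone

# Illustrative weights; a real system would tune or learn these.
AUTHORITY_WEIGHT = {"user": 1.0, "calendar": 0.9, "file": 0.8, "api": 0.8, "agent": 0.5}
CONFIDENCE_WEIGHT = {"confirmed": 1.0, "inferred": 0.7, "uncertain": 0.4, "contradicted": 0.0}


def score(memory, similarity: float, half_life_days: float = 90.0) -> float:
    """Blend semantic relevance with freshness, authority, and confidence."""
    if memory.superseded_by is not None:
        return 0.0  # a replaced memory should never win retrieval
    age_days = (datetime.now(timezone.utc) - memory.updated_at).days
    freshness = math.exp(-math.log(2) * age_days / half_life_days)  # halves every 90 days
    return (
        similarity
        * AUTHORITY_WEIGHT.get(memory.authority, 0.5)
        * CONFIDENCE_WEIGHT.get(memory.confidence, 0.4)
        * (0.5 + 0.5 * freshness)  # freshness adjusts the score but never zeroes it
    )
```

Multiplying rather than adding means a contradicted memory scores zero no matter how similar it is, while freshness only nudges the ranking instead of dominating it.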

Structured lookup

Some memories should be exact data points. If the user asks for a birthday, the system should not rely only on vector similarity. It should be able to map “DOB”, “birthdate”, and “date of birth” to the right structured field and return the exact value.

This may require semantic search over the schema itself. The user may not use the same field names as the database, so the system needs to understand what data exists and how to query it.
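
A toy version of the alias step; in practice the mapping could itself be a small semantic search over the schema rather than a hand-written table:

```python
# Map surface forms to canonical fields; names here are hypothetical.
FIELD_ALIASES = {
    "dob": "date_of_birth",
    "birthdate": "date_of_birth",
    "date of birth": "date_of_birth",
    "birthday": "date_of_birth",
}


def lookup_field(term: str, record: dict) -> str | None:
    """Resolve the user's wording to the exact stored value, not a fuzzy match."""
    field = FIELD_ALIASES.get(term.strip().lower())
    return record.get(field) if field else None


# lookup_field("DOB", {"date_of_birth": "1992-12-03"}) -> "1992-12-03"
```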

Source lookup

Sometimes the correct answer is not in memory at all. The memory system should know where to go. If the question concerns my schedule, check the calendar. If it concerns the current implementation, inspect the codebase. If it concerns a receipt, retrieve the file.

Good memory should route the agent to the right source of truth.
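
As a crude sketch, routing could start as a lookup table before graduating to anything smarter. The keywords and source names below are invented:

```python
# Memory as a router: point the agent at the source of truth
# instead of answering from stored text.
ROUTES = [
    ({"schedule", "meeting", "free", "availability"}, "calendar"),
    ({"implementation", "function", "module", "bug"}, "codebase"),
    ({"receipt", "invoice", "statement"}, "file_store"),
]


def route(question: str) -> str:
    words = set(question.lower().split())
    for keywords, source in ROUTES:
        if words & keywords:
            return source
    return "memory"  # fall back to stored memories only when no better source exists
```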

Updating and deleting memories

Memory needs a lifecycle. It is not enough to create and retrieve memories. The system also needs to update, archive, and delete them.

A basic lifecycle might look like this:

  • Create: store a new memory from a conversation, file, tool, or external system.
  • Update: replace stale information with newer or more authoritative information.
  • Deprecate: keep the old memory as historical context, but stop using it as current truth.
  • Archive: hide the memory from ordinary retrieval unless explicitly needed.
  • Delete: permanently remove the memory when the user asks, when privacy requires it, or when it should no longer exist.
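
The last three stages can be enforced as a small state machine. A sketch, with an active state standing in for freshly created or updated memories:

```python
from enum import Enum


class Status(Enum):
    ACTIVE = "active"
    DEPRECATED = "deprecated"  # kept as history, no longer current truth
    ARCHIVED = "archived"      # hidden from ordinary retrieval
    DELETED = "deleted"        # permanently removed


# Which transitions the lifecycle allows; anything else is rejected.
ALLOWED = {
    Status.ACTIVE: {Status.DEPRECATED, Status.ARCHIVED, Status.DELETED},
    Status.DEPRECATED: {Status.ARCHIVED, Status.DELETED},
    Status.ARCHIVED: {Status.ACTIVE, Status.DELETED},
    Status.DELETED: set(),  # deletion is final: no path back
}


def transition(memory, new_status: Status) -> None:
    if new_status not in ALLOWED[memory.status]:
        raise ValueError(f"cannot move {memory.status.value} -> {new_status.value}")
    memory.status = new_status
```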

Deletion is especially important. Humans forget naturally, but AI systems do not unless they are designed to. Some memories should fade because they are old. Some should be replaced because they are incorrect. Some should be removed because they are sensitive, traumatic, or no longer useful.

An audit log can help explain why a memory changed, but it should not become an excuse to keep everything forever. If the user wants something deleted, the system needs a real deletion path.

Privacy and local-first memory

Memory should be private by default. A memory system accumulates details over time: habits, relationships, documents, health, finances, work, location, and decisions. This makes it more sensitive than a single chat transcript.

Ideally, memory should be local-first. It should live on the user’s machine where possible, or in infrastructure the user controls. If memory is synced or shared, access should be explicit and limited.

Agents also need permissions. A personal assistant, coding agent, and finance agent should not automatically share all memory. Project memory, personal memory, organization memory, and session memory should have boundaries.

Privacy is not only about storage. It is also about visibility and control. Users should be able to inspect, edit, export, archive, and delete memories. They should be able to see what the agent knows and why it thinks it knows it.

Memory poisoning and trust

If an agent can write to memory, memory becomes an attack surface.

A malicious prompt could try to create false preferences, overwrite facts, or insert instructions that affect future behavior. Even without malicious intent, the agent might infer something incorrectly and store it as fact.

This is why provenance, confidence, and user control matter. A memory created from an untrusted document should not have the same authority as a direct user correction. A model-generated summary should not silently override a calendar, database, or current file state.

Trustworthy memory systems need rules for conflict resolution. When memories disagree, the agent should know which source wins, when to ask for clarification, and when to avoid using the memory at all.
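
A sketch of one such rule set, with invented authority and confidence rankings:

```python
AUTHORITY_RANK = {"user": 3, "calendar": 3, "file": 2, "api": 2, "agent": 1, "untrusted_doc": 0}
CONFIDENCE_RANK = {"confirmed": 2, "inferred": 1, "uncertain": 0}


def resolve(a, b):
    """Return the winning memory, or None to signal 'ask the user for clarification'."""
    rank_a = (AUTHORITY_RANK.get(a.authority, 0), CONFIDENCE_RANK.get(a.confidence, 0))
    rank_b = (AUTHORITY_RANK.get(b.authority, 0), CONFIDENCE_RANK.get(b.confidence, 0))
    if rank_a != rank_b:
        return a if rank_a > rank_b else b  # authority first, then confidence
    if a.updated_at != b.updated_at:
        return a if a.updated_at > b.updated_at else b  # newer wins an even tie
    return None  # genuinely ambiguous: clarify rather than guess
```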

Visualizing memory

Memory should not be invisible magic. It would be useful to see what happens as an agent works:

  • What did it retrieve?
  • What did it ignore?
  • What did it write?
  • Why was a memory considered relevant?
  • Which memories affected the final answer?

This is useful for users, but it is also useful for evaluation. Different memory systems could be benchmarked by showing what they retrieved, what they missed, and whether the remembered context improved the result.

Visualization makes memory debuggable. It turns “the agent remembered something” into something inspectable.
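
One concrete form this could take is a per-turn retrieval trace. The shape is illustrative:

```python
from dataclasses import dataclass, field


@dataclass
class RetrievalTrace:
    """One record per agent turn, so memory behavior can be inspected and benchmarked."""
    query: str
    retrieved: list[str] = field(default_factory=list)       # memory ids shown to the model
    ignored: list[str] = field(default_factory=list)         # candidates that scored too low
    written: list[str] = field(default_factory=list)         # new or updated memory ids
    used_in_answer: list[str] = field(default_factory=list)  # ids that shaped the final reply
    reasons: dict[str, str] = field(default_factory=dict)    # id -> why it was considered relevant
```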

The real goal

AI agents do not need memory that works like human memory. Human memory is full of gaps, distortions, and emotional shortcuts. That does not mean human memory is useless as a metaphor, but it should not be the blueprint.

The better model is memory as infrastructure: a system for storing, retrieving, updating, and deleting context with clear rules about authority, privacy, freshness, and relevance.

The goal is not for an AI agent to remember everything. The goal is for it to remember the right things, forget the right things, and explain why.