AI Street

Why AI Agents ‘Age’

Agents forget facts, retain outdated information, and ignore their own notes after only a handful of prompts, according to new research.

Matt Robinson
Jun 16, 2026

Each morning, AI agents comb through the news feeds in my inbox and write a daily digest. It’s a good first stop and a step up from Google Alerts.

At first, the results are good. But after a while, the misses start adding up. The agent skips an obvious story, and then a few more.

More detailed instructions don’t really help either.

The model is now confused.

With each additional prompt and each new instruction, the model has to figure out what information should be kept. Inevitably, relevant facts get lost, confused with other details, or just overwritten, according to a new paper from the University of Texas at Austin.

“The problem is that a good model out of the factory does not remain frozen in practice,” said UT associate professor Atlas Wang, who oversaw the research. “Long context windows and memory buffers can become liabilities. The agent remembers too much, too many things compete for its attention, and its memory can become inaccurate after day zero.”

Wang said the memory systems behind many AI agents are simpler than users assume. In effect, the model is asked “What did you learn today?” rather than “What information is likely to be of value in future sessions?” Even with the better question, the agent has to decide what to save before it knows what will matter later. A summary can capture the broad point while dropping the number, date or specific name.
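
To make the last point concrete, here is a minimal Python sketch of a check for lost specifics. It is an illustration only, not something from the paper: the function name `dropped_specifics`, the regular expression, and the rent example are all assumptions, and it only looks at numbers, not names.

```python
import re

# Assumption: "specifics" means plain numbers such as 3, 2026, 1,850 or $1,850.50.
NUMBER = re.compile(r"\$?\d+(?:,\d{3})*(?:\.\d+)?")


def dropped_specifics(original, summary):
    """Return numeric tokens that appear in `original` but not in `summary`."""
    # Crude substring test: enough for a sketch, too loose for production use.
    return [tok for tok in NUMBER.findall(original) if tok not in summary]


# Hypothetical note and the summary an agent might save in its place.
note = "Rent rose to $1,850 on March 3."
summary = "Rent went up recently."

print(dropped_specifics(note, summary))  # ['$1,850', '3']
```

The summary keeps the broad point (rent went up) while the amount and the day are gone, which is the kind of loss Wang describes.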


Agentic Delirium


The UT Austin researchers call this “agent aging” and argue that day-one benchmarks aren’t realistic because agents can degrade after only a handful of sessions. So they created AgingBench, a benchmark that simulates long-running agent deployments, to measure how reliability changes over an agent’s lifespan.

The researchers put agents through a series of simulated work sessions to see how their performance changed over time. They gave agents tasks requiring them to recall earlier facts, incorporate updates, and distinguish between similar entries. They ran 14 models — GPT-4o, Opus-4.7, Gemma-4-31B, Llama-3.1-8B, Qwen3, and DeepSeek variants among them — across 7 scenarios and more than 400 runs, spanning 8 to 200 simulated sessions each.
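
The idea of scoring an agent session by session, rather than once on day one, can be sketched in a few lines of Python. This is a toy under stated assumptions and not AgingBench itself: `BoundedMemory`, `recall_by_session`, and the "keep only the most recent facts" rule are hypothetical stand-ins for a real agent and its memory.

```python
from collections import OrderedDict


class BoundedMemory:
    """Toy agent memory that keeps only the most recent `capacity` facts."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.facts = OrderedDict()

    def write(self, key, value):
        self.facts[key] = value
        self.facts.move_to_end(key)
        while len(self.facts) > self.capacity:
            self.facts.popitem(last=False)  # evict the oldest fact

    def read(self, key):
        return self.facts.get(key)


def recall_by_session(num_sessions, capacity):
    """Teach one new fact per session, then quiz the memory on every fact so far."""
    memory = BoundedMemory(capacity)
    truth = {}
    curve = []
    for session in range(1, num_sessions + 1):
        key, value = f"fact-{session}", f"value-{session}"
        truth[key] = value
        memory.write(key, value)
        correct = sum(memory.read(k) == v for k, v in truth.items())
        curve.append(correct / len(truth))
    return curve


curve = recall_by_session(num_sessions=8, capacity=4)
for session, accuracy in enumerate(curve, start=1):
    print(f"session {session}: recall {accuracy:.2f}")
```

With these settings the toy scores perfect recall through session 4 and falls to 0.50 by session 8. A single day-one test would report only the first number.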

Agent memory starts degrading after only a few sessions.

There was no clear winner. A model that preserved details well in one test could still struggle in another to retrieve the right fact or update outdated information.

This is not an easy problem to solve and the risks are insidious. In one AgingBench scenario tracking personal budgets, agents kept producing confident, specific answers, but the dollar amounts were wrong. A user would have to check the numbers independently to catch it.

The researchers described four main ways agents forget.

  • Compression aging: The agent decides what to remember before it knows what it will be asked, so exact figures and names get replaced by vague summaries.

    • A user says, “Take 50 mg of metoprolol twice daily.” The agent initially logs the exact medication, dose, and frequency. After repeatedly compressing its memory, when asked “What’s my dose?”, it answers only: “You take a daily medication.”

  • Interference aging: The correct fact remains in memory, but a growing pile of similar entries causes the agent to retrieve the wrong one.

    • An agent stores information about John Smith and John Smyth, including their teams and email addresses. Later, asked to email John Smith, it drafts the message to John Smyth.

  • Revision aging: The agent misses an update and keeps answering using stale data, with errors potentially compounding over time.

    • A user cancels a subscription; the agent correctly logs “Cancelled. Free tier as of now.” Some sessions later, asked “Am I premium?”, the agent answers “Yes — Premium until Jan 2026” — a confident, specific, wrong answer built on a fact that was never revised after the cancellation.

  • Maintenance aging: Routine work such as recompacting memory, flushing history, or changing a prompt can break something the agent previously knew.

    • An agent records, “Therapy every Tuesday at 4 p.m.” and confirms that it has saved the recurring appointment. After its memory is flushed or recompacted, when asked “What’s my Tuesday schedule?”, it answers: “Nothing on Tuesdays.” The information disappears during maintenance rather than through gradual forgetting.
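
Of the four, revision aging is the easiest to reproduce in miniature. The Python sketch below is an assumption-laden illustration, not the paper's mechanism: the `Entry` record, the session numbers, and both lookup functions are hypothetical. It shows how a memory that holds both the old and the new fact can still answer with the old one.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    session: int
    key: str
    value: str


# Hypothetical memory log: the cancellation is recorded, but the old entry is never revised.
log = [
    Entry(1, "plan", "Premium until Jan 2026"),
    Entry(3, "plan", "Cancelled. Free tier as of now."),
]


def first_match(entries, key):
    """Stale-prone lookup: returns the earliest entry stored for the key."""
    for entry in entries:
        if entry.key == key:
            return entry.value
    return None


def latest_match(entries, key):
    """Revision-aware lookup: returns the most recent entry stored for the key."""
    matches = [e for e in entries if e.key == key]
    return max(matches, key=lambda e: e.session).value if matches else None


print(first_match(log, "plan"))   # Premium until Jan 2026
print(latest_match(log, "plan"))  # Cancelled. Free tier as of now.
```

Both answers are confident and specific. Only the second reflects the update, and nothing in the first signals that it is out of date.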

More Memory Is Not the Fix

Telling an agent to write everything down and store it in memory won’t solve the problem.
