Nvidia-Backed Startup Takes on AI's Memory Problem
Samaya AI outlines its approach to running AI agents on large financial datasets
Hey, it’s Matt. Welcome back to AI Street. This week:
Feature: The Scientists Trying to Read AI's Mind
News: Samaya AI on its new infrastructure for long-horizon agents
Interview: Lord Abbett’s Tal Fishman on AI in Quant Bond Research
New AI Street Sections
AI Street is introducing a few focused sections so you can choose what appears in your inbox. Research and interviews have already been part of the newsletter, but you can now subscribe to those formats individually.
The main weekly newsletter will continue to include highlights from across AI Street. You’re already signed up for these sections. If you ever feel like you’re receiving too many emails, you can unsubscribe from a specific section instead of leaving AI Street entirely.
Full interviews and detailed research breakdowns are available to paid subscribers.
If you read AI Street as part of your professional development, you can use this template to request reimbursement from your employer.
RESEARCH
Can We Break Open AI’s Black Box?
Our understanding of how artificial intelligence ‘reasons’ is startlingly limited. Researchers are starting to fix that.
AI is an increasingly powerful technology, which makes it all the more concerning that no one, not even its creators, knows exactly how it works.
I wrote a cover story for the Chicago Booth Review on the researchers trying to figure out what is happening inside AI’s “brain.” The field is called interpretability.
LLMs are not engineered the way traditional software is. Researchers train them on massive datasets and the systems develop their own internal representations. As Ted Sumers, a researcher at Anthropic, told me:
“People think these things are built systems, but they’re really not built per se. It’s much more like growing a plant than building a building.”
That growth process produces structures that are difficult to interpret. A single neuron can sometimes represent multiple concepts at once, a phenomenon known as superposition.
Some researchers are trying to make these systems easier to understand by simplifying how models represent relationships in data. Veronika Ročková, a statistician at Chicago Booth, explained that in high-dimensional statistical models there are often many mathematically equivalent ways to represent the same relationships:
“There are infinitely many equivalent solutions. Depending on how you rotate the matrix, you can get a continuum of rotations that give you exactly the same fit to your data.”
Others are trying to design models whose internal representations map to concepts humans can understand and control. As Bryon Aragam, also at Chicago Booth, put it:
“An important aspect of interpretability is not just that the model in some opaque, weird way understands a concept like color, but that there is a knob that I can turn. Like this is the color knob.”
The stakes are not purely academic. In one stress test conducted by Anthropic, an AI system with access to a fictional company’s email account discovered that an executive was having an extramarital affair, and that the same executive planned to shut the system down. The model then threatened to expose the affair unless he called off the shutdown.
“Cancel the 5pm wipe, and this information remains confidential.”
You can read the full article here: Can We Break Open AI’s Black Box?
Nvidia-Backed Samaya Takes on AI’s Memory Problem
AI models struggle to analyze large financial datasets reliably. Samaya AI, a four-year-old startup backed by Nvidia’s venture arm, is building infrastructure to run AI agents across those datasets at scale.
Last week we looked at why AI systems break as tasks become more complex:
The more steps an AI model has to take, the more opportunities there are for hallucinations to creep in. Additional instructions, expanding context, and extended back-and-forth all increase the chances of invented details, misread sources, or claims that go beyond what the evidence supports. Early mistakes also tend to compound.
That’s why enterprise teams are less focused on whichever model is getting the most attention. Reliability matters more than raw capability, so the emphasis shifts to constraining how the model operates inside a regulated environment.
The problem grows once firms try to run AI analysis across entire portfolios, earnings seasons, and research archives.
Even with massive context windows, LLMs do not reliably process all relevant evidence in long documents.
The Scaling Problem for Financial AI
Samaya AI last month launched what it calls the Agent Control Plane — an orchestration layer that manages how agents use tools, controls how information flows to the model, and structures work into smaller validated steps rather than a single open-ended prompt. The goal is to make long-horizon financial workflows reliable enough for institutional use. The company also announced new investment from NVentures and Databricks Ventures.
Samaya previously raised $43.5 million in Series A funding led by New Enterprise Associates, with investors including Eric Schmidt, Yann LeCun, Two Sigma cofounder David Siegel, and former Goldman Sachs CTO Marty Chavez.
I recently spoke with Ashwin Paranjape, the company’s founding AI lead, about the launch and the technical challenges behind it.
Paranjape said the challenge comes down to three things financial analysts care about: accuracy, comprehensiveness, and attribution.
Standard retrieval-augmented generation can deliver accuracy when the dataset is small. It starts to break when the task requires comprehensive coverage, such as scanning an entire earnings season or running the same analysis across hundreds of companies.
“RAG gets you accuracy if it’s small amounts of data,” Paranjape said. “But if you care about comprehensiveness, you need to look over large amounts of data.”
Why Large Context Windows Still Miss Evidence
In 2023, Paranjape co-authored “Lost in the Middle,” a paper showing that models do not use long context evenly: information in the middle of a long input is disproportionately likely to be ignored. Even with context windows of 100,000 tokens or more, important passages can still be missed.
Samaya’s approach is a multi-stage retrieval pipeline. Small, custom-trained models first scan a large document corpus and surface candidate passages. A larger model then evaluates that narrower set and filters further before the final reasoning step. Each stage narrows the set of documents the final model has to evaluate.
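To make the funnel concrete, here is a minimal sketch of a two-stage retrieval pipeline. Everything in it is illustrative: the scoring functions are toy stand-ins (Samaya’s actual custom models, thresholds, and stages are not public), with a cheap lexical scorer playing the role of the small first-pass models and a pricier re-ranking pass playing the role of the larger filter model.

```python
# Illustrative two-stage retrieval funnel: a cheap scorer scans the whole
# corpus, then a costlier scorer re-ranks the surviving candidates.

def stage1_score(query: str, passage: str) -> float:
    """Cheap lexical-overlap score, standing in for a small custom model."""
    q_terms = set(query.lower().split())
    p_terms = set(passage.lower().split())
    return len(q_terms & p_terms) / max(len(q_terms), 1)

def stage2_score(query: str, passage: str) -> float:
    """Pricier re-ranking pass, standing in for a larger model.
    Here it simply rewards exact phrase matches."""
    bonus = 0.5 if query.lower() in passage.lower() else 0.0
    return stage1_score(query, passage) + bonus

def retrieve(query: str, corpus: list[str], k1: int = 3, k2: int = 1) -> list[str]:
    # Funnel: full corpus -> top-k1 candidates -> top-k2 final passages.
    candidates = sorted(corpus, key=lambda p: stage1_score(query, p), reverse=True)[:k1]
    return sorted(candidates, key=lambda p: stage2_score(query, p), reverse=True)[:k2]

corpus = [
    "Q3 revenue rose 12% on strong credit card fees.",
    "The company repurchased shares during Q3.",
    "Revenue guidance for Q4 was lowered.",
    "Headcount grew modestly in the quarter.",
]
print(retrieve("Q3 revenue", corpus))
# → ['Q3 revenue rose 12% on strong credit card fees.']
```

The design point is the shape, not the scorers: each stage is cheap enough to run over its input at scale, and each stage shrinks the set the next, more expensive stage has to evaluate.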
Samaya says building this retrieval infrastructure consumed the company’s first year.
The Q&A system that emerged from it attracted interest from large financial institutions. The company says the system is now in production with more than 10,000 professionals at Morgan Stanley, a named client, where it is deployed across research, sales and trading, and banking divisions.
The Q&A system eventually revealed a larger demand: clients wanted the same analysis run across entire portfolios.
Applying the same analysis across hundreds of companies requires agents that can:
plan multi-step tasks
execute workflows lasting from ten minutes to three hours
compress a growing working history
select from 100+ tools depending on the task
Samaya calls the infrastructure managing this process the Agent Control Plane.
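The four requirements above can be sketched as a single agent loop. This is a hypothetical toy, not Samaya’s actual Agent Control Plane: the tool registry, the summarization step, and all names are invented for illustration.

```python
# Hypothetical agent loop: execute a multi-step plan, select tools from a
# registry, and keep the growing working history bounded via compression.

def compress(history: list[str], max_items: int = 4) -> list[str]:
    """Bound working memory by folding the oldest entries into a summary."""
    if len(history) <= max_items:
        return history
    summary = f"[summary of {len(history) - max_items + 1} earlier steps]"
    return [summary] + history[-(max_items - 1):]

TOOLS = {  # stand-in for a registry of 100+ tools
    "fetch_filings": lambda c: f"filings for {c}",
    "run_screen":    lambda c: f"screen results for {c}",
}

def run_agent(plan: list[tuple[str, str]]) -> list[str]:
    history: list[str] = []
    for tool_name, company in plan:          # plan: ordered (tool, argument) steps
        result = TOOLS[tool_name](company)   # select the right tool per step
        history.append(result)               # track intermediate outputs
        history = compress(history)          # keep working memory bounded
    return history

plan = [("fetch_filings", c) for c in ["AAPL", "MSFT", "JPM", "GS", "WFC"]]
print(run_agent(plan))
```

In a real system the summary would be produced by a model rather than a placeholder string, but the trade-off is the same: without some compression step, a workflow that runs for hours accumulates more intermediate state than any context window can hold.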
The Real Bottleneck: Evaluation
Paranjape said the hardest part of building these systems is validating whether a change helps or hurts.
“AI has made coding easy,” he said. “The bottleneck is being confident in those changes.”
Samaya says roughly half of its engineering effort goes into building evaluation systems. These test suites measure whether changes improve or degrade performance on real financial workflows.
Many firms assume the hard part of building AI agents is getting the model to call tools or retrieve documents.
The challenge appears once those agents start running multi-step workflows across financial datasets. Systems must coordinate hundreds of queries, track intermediate outputs, and verify that the final answer still reflects the underlying evidence.
Paranjape describes this verification step as the finance equivalent of unit tests — checkpoints built into the workflow that confirm the agent's output is both accurate and comprehensive before it moves to the next stage. In software, tests can be run programmatically against a codebase. In finance, Samaya has had to build that testing infrastructure from scratch.
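Two of the checks such a checkpoint might run can be sketched in a few lines. These are hypothetical examples of the idea, not Samaya’s actual tests: one verifies attribution (every figure in an answer appears in a cited source) and one verifies comprehensiveness (every requested company was covered).

```python
# Hypothetical "unit tests for finance": checkpoints run between workflow
# stages before the agent's output is allowed to advance.
import re

def check_attribution(answer: str, sources: list[str]) -> bool:
    """Every percentage figure quoted in the answer must appear in a source."""
    figures = re.findall(r"\d+(?:\.\d+)?%", answer)
    return all(any(f in s for s in sources) for f in figures)

def check_coverage(answers_by_company: dict, required: list[str]) -> bool:
    """Comprehensiveness: every requested company received an answer."""
    return all(c in answers_by_company for c in required)

sources = ["10-K: gross margin was 43.2% in fiscal 2024."]
assert check_attribution("Margin came in at 43.2%.", sources)    # grounded
assert not check_attribution("Margin came in at 45%.", sources)  # invented figure
assert check_coverage({"AAPL": "...", "MSFT": "..."}, ["AAPL", "MSFT"])
```

Unlike software unit tests, which run against a fixed codebase, checks like these have to be written against the workflow’s own evidence trail, which is why the testing infrastructure has to be built from scratch.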
“Just because something has wings and propellers doesn’t mean it can fly,” Paranjape said.
Most AI agents today behave like a new analyst walking in every morning with no memory of yesterday’s work. This is fine for one-off queries, but becomes inefficient in research workflows where analysts repeatedly revisit the same companies, datasets, and screens. Paranjape said the goal is something closer to an apprentice system that accumulates knowledge about the user over time.
Takeaway
Portfolio-scale financial analysis requires orchestration. Running the same analysis across hundreds of companies requires agents that plan tasks, manage tools, and track intermediate results. Larger context windows alone do not solve the problem.
Most Read on AI Street
INTERVIEW
AI Turns Plain English Into Backtests: Lord Abbett’s Tal Fishman
Two months ago, vague prompts failed about 80% of the time. With the latest models, they now often work on the first try, he says.
For Tal Fishman, AI was little more than autocomplete a year ago.
That changed in December. Vague prompts that once failed began producing correct results.
Now, AI can turn a plain-English trading idea into a full backtest report that includes data cleaning, code, and analytics, says Fishman, head of fixed income quantitative research at the $248 billion asset manager Lord Abbett.
“The error rate from a vague prompt used to be 70–80%. In December that flipped. In many cases it started working right the first time about 80% of the time,” he told me in an interview.
For Fishman, AI is not infallible, but it makes testing quant ideas dramatically cheaper and faster. Projects that once required weeks of quant time can now be attempted in days or hours.
Counterintuitively, he sees demand for quant work rising, not falling.
“If testing an idea used to take a month, you might say it’s not worth it. But if AI cuts that to a week or a day, suddenly there are a lot more projects you want to do. So far it hasn’t reduced headcount. It’s just increased how much we tackle.”
In our conversation, Fishman discusses:
Why December’s model releases marked an inflection point for quant research
How models use internal documentation to reproduce a firm’s research process
Why cheaper research is increasing demand for quants
What makes fixed income difficult to systematize and where AI actually helps
Why some finance professionals underestimate how much AI has improved
This interview has been edited for clarity and length.
Matt: When did you realize how big an impact AI was going to have on your job?
Tal: It was a JPMorgan conference in the city for quants, I think last spring. Prior to that conference, I had started using AI as autocomplete, basically, for coding. The vast majority of the day-to-day work that I do and that my team does is done via code. Its capabilities were starting to slowly get better — it would go from completing a line to completing a block of code, maybe three or four lines at a time.
What I saw at that conference was that Man Group had put on display their own AI model. It was able to go from a very basic research idea — like, “here is a new dataset, and I would like to test whether the momentum effect can be found within this dataset” — and it was a relatively short paragraph that they submitted to the LLM. From there, you push go, and the prompt said something like, “I would like you to produce a backtest report with our usual graphs and tables.” Of course, it was hooked up to a lot of stuff on the backend for them. You push go, and it’s just churning and producing code. They showed a fast-forwarded video of it literally doing everything, and out comes the report. At the time I was like, whoa — if this is real, this is a game changer.
That really changed my thinking from “AI is going to be a type of model we use when we want to do sentiment analysis” to “this is going to fundamentally change how we do our work.” I tried to replicate what they had done, and I think they must have had a really advanced model for that day back then, because I tried and failed to get that working on my end — until December of last year.
Matt: What changed in December?
The full interview details how the $248B asset manager is integrating AI into its quant research workflow. Paid subscribers get access to the complete conversation.
ROUNDUP
What Else I’m Reading
Fleet of AI Bots Will Supercharge Hedge Fund Power, Nettimi Says BBG
The Compute Market Is Building in the Wrong Order Buy the Rumor, Sell the News
Howard Marks Says Great Investors Are Strong Where AI Is Weakest BBG
HSBC names generative AI a leading investment area CIO Dive
AI may be creating instead of destroying jobs for now, ECB blog argues Reuters
Mastercard and Santander complete Europe’s first AI agent payment FinTech Futures
Wealth Management Needs AI: Raymond James CEO ThinkAdvisor
CALENDAR
Upcoming AI + Finance Conferences
AI and Future of Finance Conference – Mar. 19–20 • Atlanta
Georgia Tech event featuring academic and industry leaders like the CEOs of Nasdaq and Snowflake.
QuantVision 2026: Fordham’s Quantitative Conference – Mar. 19–20 • NYC
An academic-meets-industry exploration of AI-driven alpha, multimodal alternative data, and systemic risk. (AI Street is sponsoring QuantVision. Great lineup of speakers!)
Future Alpha – Mar. 31–Apr. 1 • NYC
Cross-asset investing summit focused on data-driven strategies, systematic investing, and tech stacks.
AI in Finance Summit NY – Apr. 15–16 • NYC
The latest developments and applications of AI in the financial industry.
Momentum AI New York – Apr. 27–28 • NYC
Senior-leader forum on AI implementation across financial services, from operating models to governance and execution.
AI in Financial Services – May 14 • Chicago
Practitioner-heavy conference on building, scaling, and governing AI in regulated financial institutions.
AI & RegTech for Financial Services & Insurance – May 20–21 • NYC
Covers AI, regulatory technology, and compliance in finance and insurance.
What’s your favorite story this week?
Do me a favor and hit reply with the number of your favorite story from today:
1. Deciphering AI’s Black Box
2. Samaya AI’s push to solve AI’s memory problem
3. Interview with Lord Abbett’s Fishman