Don’t Expect Chatbots to Beat the Market
LLMs struggle with investing, hedge fund engineers lean on AI coding tools, and Anthropic expands further into Wall Street
Hey, it’s Matt. You’re reading AI Street, where I report on how Wall Street uses AI.
NEWS
LLMs Make Poor Stock Pickers
Two stories this week looked at how ChatGPT struggles with stock picking. My question is: why would you expect it to be good in the first place? These are large language models, trained on text, not financial data.
That said, these stories are important. There’s a lot of confusion about what AI can actually do.
The WSJ’s Gunjan Banerji tested ChatGPT as a hypothetical adviser for a $1 million portfolio and found it could explain risks, but struggled with actual investment calls. It gave a reasonable long-term allocation, but made a basic arithmetic error, drifted into market timing and picked a trade-war stock basket that rose about 5.5%, trailing the S&P 500’s roughly 8% gain.
Banerji highlighted AI’s sycophantic streak, which is annoying in general and a real risk in investing:
“At times, it felt like ChatGPT responded with what I wanted to hear.”
Bloomberg’s Justina Lee took the question one step further, looking at trading competitions that pit major LLMs against each other, a topic I wrote about in November.
In Nof1’s Alpha Arena, eight models including Claude, Gemini, ChatGPT and Grok traded US tech stocks with $10,000 each across four competitions. The results were mostly ugly: the overall portfolio lost about a third of its capital, and only six of 32 model results finished in profit.
“LLMs can’t really make money by themselves,” said Jay Azhang, founder of Nof1.
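For a rough sense of scale, here is a back-of-the-envelope sketch of the Alpha Arena numbers. The model count, competition count, starting stake and number of profitable results come from the reporting above. The assumption that every model started each competition with a fresh $10,000, and treating "about a third" as exactly one third, are mine, so read the dollar figures as illustrative only.

```python
# Figures as reported above
MODELS = 8
COMPETITIONS = 4
STARTING_CAPITAL = 10_000   # dollars per model
PROFITABLE_RESULTS = 6

# Assumption (not from the reporting): "about a third" is treated as exactly 1/3
LOSS_FRACTION = 1 / 3

# Assumption (not from the reporting): each model restarted every
# competition with a fresh $10,000 stake
total_results = MODELS * COMPETITIONS
total_capital = total_results * STARTING_CAPITAL

win_rate = PROFITABLE_RESULTS / total_results
approx_loss = total_capital * LOSS_FRACTION

print(f"Model results: {total_results}")
print(f"Share finishing in profit: {win_rate:.1%}")
print(f"Capital deployed (assumed): ${total_capital:,}")
print(f"Approximate loss (assumed): ${approx_loss:,.0f}")
```

Under those assumptions, fewer than one in five model results ended in the green.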
Ashwin Paranjape, founding AI lead at Samaya AI, told me that LLMs can gather financial data, but stock picking requires higher-order skills: judging materiality, forecasting metrics and connecting signals across industries.
“Beating the market relies on information and reasoning asymmetry,” he said. “Eventually picking stocks will look like: ‘My AI knowing what I know, beats your AI knowing what you know.’”
These models are getting better, and there are some early results suggesting multi-agent setups can perform better than a single model acting alone. But that still does not make a chatbot a trading system.
Training models directly on market data to find signals is different. But that is a much harder, more expensive problem than asking a $20 chatbot what stocks to buy.
A NOTE FROM OUR SPONSOR
When your agent reads Bloomberg or Reuters, is it finding an edge?
On Tesla’s Q1 earnings, Goldman held at $375, TD Cowen reiterated Buy at $490, JPMorgan stayed at $145. You and your competitors are reading the same call.
The dispersion across sell-side targets. The reasoning behind each one. The hedge buried in the fifth paragraph of an operator quote.
That’s where the analytical signal is. It lives in the paragraphs your agent isn’t getting.
Typical retrieval looks fine. The agent doesn’t know what it’s missing, and neither do you.
Seltz returns full context in hundreds of milliseconds, every result traceable to source. Built for workflows where deep research matters more than the headline.
If you’re running agents on financial news, Seltz will run an eval on your setup.
Email CEO Antonio Mallia at antonio@seltz.ai or ask me for an introduction.
AI Is Narrowing the Hedge Fund Tech Gap
Craig Whiting, a hedge fund tech headhunter, wrote about recent conversations he’s had with four different engineers across Wall Street. They said they are writing less code than they did a year ago and increasingly work alongside tools like Cursor and Claude Code to draft code, review output and catch bugs.
The bar at any firm worth working for is now: you write good prompts, you read AI output critically, you know when to override it.
Whiting also points out how many firms are getting value out of relatively straightforward use cases like cutting down email volume by structuring unstructured text, an unsexy topic that I’ve written about before.
He also mentioned how AI is making it easier for smaller shops to compete:
“The new reality: the gap between a £2bn credit fund and a $60bn multi-strat is collapsing because the tooling is cheap and the workflow is portable. Cursor is cheap. Claude Code is cheap. The bottleneck is not budget anymore. It is engineering culture, leadership willingness, and how legacy your stack is.”
Anthropic’s Busy Week
Anthropic announced three initiatives across Wall Street this week:
Anthropic and FIS Target AML
Anthropic announced it’s embedding Claude inside bank compliance departments through a partnership with fintech giant FIS, starting with anti-money laundering. Today, investigators spend most of their time manually pulling records from disconnected systems before any analysis can begin — the agent handles that assembly automatically, then flags cases by risk level. FIS claims it compresses investigations from days to minutes; BMO and Amalgamated Bank are the first pilot customers, with broader availability planned for H2 2026.
Anthropic Forms $1.5B JV with Blackstone, Goldman for PE Portfolio Companies
Anthropic is forming a $1.5 billion joint venture with Blackstone, Hellman & Friedman, Goldman Sachs, and several other Wall Street firms to embed Claude inside mid-sized companies, particularly PE portfolio companies, that can’t staff frontier AI deployments on their own, according to the WSJ. Anthropic, Blackstone, and H&F are each putting in roughly $300 million and Goldman is contributing $150 million, with General Atlantic, Apollo, Leonard Green, GIC, and Sequoia rounding it out. OpenAI is reportedly building a rival structure.
Anthropic Releases Ten Pre-Built Finance Agents
Anthropic released ten pre-built agent templates for financial services work — pitchbooks, KYC screening, month-end close, earnings review — deployable as plugins or as autonomous scheduled jobs on the Claude Platform. Claude also now runs inside Excel, PowerPoint, and Word via Microsoft 365 add-ins, carrying context between applications. Eight new data connectors went live alongside a Moody’s MCP app covering more than 600 million companies; Claude’s broader finance connector ecosystem includes FactSet, S&P Capital IQ, and PitchBook.
Merger Arb Funds on Using AI
The FT has a story on how merger arbitrage hedge funds are using AI to get a quicker read on deal dynamics by reading dense legal documents.
Traditionally, reading through the complex, lengthy deal documents — which often stretch to over 100 pages — would take an investment professional over an hour. Even a quick review would take 15 to 20 minutes. But the use of AI has now reduced the process to seconds.
“We think of AI as a very fast, very thorough intern who is brilliant at analysing big datasets,” says Daniel Caplan, chief executive of London-based Sand Grove.
This is the kind of use case that makes sense to me: asking ChatGPT or Claude, “Where are the most important disclosures I should focus on in this document?” An actual merger arb specialist would ask a more sophisticated question than this, but you get the idea.
ROUNDUP
What Else I’m Reading
BMO Turns to AI and Quantum Computing to Predict Earthquakes BBG
Former Citadel Chief Technology Officer Joining Motive Partners BBG
Lloyds in tie-up with Google to build AI agents City AM
Subquadratic claims 1,000x AI efficiency gain with SubQ model VentureBeat
Bessent Warns of Threat of AI-Powered Bank Account Hacks PYMTS
Back in New York
I’ll be back in New York to attend STAC Summit on May 20. Sessions include discussions on memory bottlenecks in inference, AI research at BlackRock, extracting structured data from SEC filings with LLMs, and deploying models into trading systems and engineering workflows. Registration is free for end users. Come say hi!
I’ll be at the Women in Quant Finance conference the next day, May 21.
This Week in AI Street
CALENDAR
Upcoming AI + Finance Conferences
AI in Financial Services – May 14 • Chicago
Practitioner-heavy conference on building, scaling, and governing AI in regulated financial institutions.
STAC Summit – May 20 • NYC
Trading and analytics infrastructure, applied AI for research and execution, scaled deployment, and benchmark-driven insights across quant workflows. ← I’m attending.
AI & RegTech for Financial Services & Insurance – May 20–21 • NYC
Covers AI, regulatory technology, and compliance in finance and insurance.
Women in Quantitative Finance – May 21 • NYC
Quants discussing current work in asset pricing, trading, risk, and portfolio construction. ← I’m attending.
Thanks for reading!
I’m always happy to receive comments, questions, and feedback.
Connect with me on LinkedIn, or
Send an email to matt [at] ai-street.co







