<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[AI Street : Research ]]></title><description><![CDATA[Analysis of the latest AI research with practical implications for trading, risk, and operations. ]]></description><link>https://www.ai-street.co/s/research</link><image><url>https://substackcdn.com/image/fetch/$s_!ezC3!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a4fc97a-4b2d-4478-92be-ea095be05d61_800x800.png</url><title>AI Street : Research </title><link>https://www.ai-street.co/s/research</link></image><generator>Substack</generator><lastBuildDate>Mon, 20 Apr 2026 01:26:01 GMT</lastBuildDate><atom:link href="https://www.ai-street.co/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Matt Robinson]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[matt@ai-street.co]]></webMaster><itunes:owner><itunes:email><![CDATA[matt@ai-street.co]]></itunes:email><itunes:name><![CDATA[Matt Robinson]]></itunes:name></itunes:owner><itunes:author><![CDATA[Matt Robinson]]></itunes:author><googleplay:owner><![CDATA[matt@ai-street.co]]></googleplay:owner><googleplay:email><![CDATA[matt@ai-street.co]]></googleplay:email><googleplay:author><![CDATA[Matt Robinson]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[What 16,000 SEC Filings Say About AI Adoption on Wall Street]]></title><description><![CDATA[Analysis of ADV filings reveals how financial firms report AI adoption, governance policies, and related costs.]]></description><link>https://www.ai-street.co/p/what-16000-sec-filings-say-about</link><guid 
isPermaLink="false">https://www.ai-street.co/p/what-16000-sec-filings-say-about</guid><dc:creator><![CDATA[Matt Robinson]]></dc:creator><pubDate>Wed, 15 Apr 2026 15:31:33 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/d2ed5c0f-85e2-4e74-b41d-eea806a5387e_2190x1369.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Matt&#8217;s note: This post was updated on Friday April 17 with more granular data from the ADV analysis.</em> </p><div><hr></div><p>As I&#8217;ve said before, AI excels at organizing tedious, repetitive material that no one would review by hand. Form ADV brochures fit that description. They&#8217;re long, dense, and there are thousands of them.</p><p>So I ran them through a local model. </p><p>I downloaded roughly 16,000 Form ADV filings filed in March, the annual disclosures registered investment advisers file with the SEC describing how they run their businesses. Of those, roughly 15,000 were standalone Part 2A brochures, covering about 12,600 advisory firms across hedge funds, private equity, venture, real estate, and traditional RIAs.</p><h3><strong>What the Filings Show</strong></h3><p>I scanned every filing for AI-related keywords. About 5,800, just under 40%, mentioned AI at all. In most cases, that meant boilerplate risk disclosures warning about potential impacts on portfolio companies or financial markets.</p><p>More than 1,200 firms described using AI in their operations. Fewer than 320 named a specific product. 
Separately, more than 450 firms disclosed a formal internal AI policy, and 88 said AI-related costs were being charged directly to clients or fund investors.</p><div class="callout-block" data-callout="true"><p>&#8226; <strong>Point72 Asset Management</strong> &#8212; Uses generative AI and large language models &#8220;in the <a href="https://files.adviserinfo.sec.gov/IAPD/Content/Common/crd_iapd_Brochure.aspx?BRCHR_VRSN_ID=1037764">operation of its business</a>, including in connection with investment and non-investment processes.&#8221; </p><p>&#8226; <strong>Rexford Capital </strong>&#8212; &#8220;Rexford Capital <a href="https://files.adviserinfo.sec.gov/IAPD/Content/Common/crd_iapd_Brochure.aspx?BRCHR_VRSN_ID=1038825">subscribes exclusively</a> to enterprise-grade, paid versions of major AI platforms, including offerings from OpenAI, Anthropic, and Microsoft. This matters for clients: unlike free consumer-facing versions of these tools, our enterprise subscriptions include enhanced data privacy protections, end-to-end encryption, and contractual assurances.&#8221;</p><p>&#8226; <strong>Jupiter Asset Management</strong> &#8212; Names Aladdin, FactSet, Northfield, ICE, Bloomberg, and Style Analytics. &#8220;Artificial Intelligence<a href="https://files.adviserinfo.sec.gov/IAPD/Content/Common/crd_iapd_Brochure.aspx?BRCHR_VRSN_ID=1035287"> is not used to generate </a>investment decisions but may be used as a tool in the broader analytical approach deployed.&#8221;</p></div><p>The detailed disclosures are concentrated at the top. The largest managers &#8212; those with more than $100 billion under management &#8212; were meaningfully more likely to describe specific AI use cases and governance frameworks than smaller firms. Building an AI governance framework requires lawyers, compliance staff, and engineers working in concert. Most smaller firms don&#8217;t have that infrastructure. 
Explicit references to AI driving investment decisions are less common, and usually more carefully qualified.</p><p>Where firms do describe AI in concrete terms, it&#8217;s mostly operational. Concentric Capital Strategies, with $3.1 billion in regulatory assets under management, <a href="https://files.adviserinfo.sec.gov/IAPD/Content/Common/crd_iapd_Brochure.aspx?BRCHR_VRSN_ID=1024799">discloses</a> the use of LLMs, such as ChatGPT, within its investment research and business processes.</p><p>Caveat: Form ADV filings capture what firms consider material enough to disclose. Routine or limited AI use may not appear at all. </p><div class="callout-block" data-callout="true"><h2><strong>Data &amp; Methodology</strong></h2><p>Each ADV was converted from PDF to plain text and scanned for AI-related keywords (artificial intelligence, machine learning, large language model, generative AI, and close variants). Filings with at least one hit were passed to Gemma 4 for structured extraction. Gemma read the most AI-relevant passage from each filing &#8212; typically drawn from the section with the highest density of AI-related language. It returned a set of flags covering own-use vs. portfolio theme, investment vs. operational use, tool names, governance policy, human oversight language, and cost disclosures.</p></div><div><hr></div><h3><strong>A Closer Look at the Largest Managers</strong></h3><p>I selected 100 of the world&#8217;s largest and most recognizable money managers &#8212; firms spanning traditional asset management, hedge funds, private equity, and venture capital &#8212; and scanned their 2026 ADV filings with Gemma. </p><p>Seventy-five of the 100 disclosed some form of AI use. Twenty-four named a formal governance policy. Thirteen disclosed that AI-related costs may be passed to investors. 
</p><div class="callout-block" data-callout="true"><h3><strong>AI Street Data</strong> </h3><p>I&#8217;m still working out what to do with the underlying dataset &#8212; governance flags, named tools, and cost-charging disclosures across firms. If that would be useful to you, reply and let me know how you&#8217;d use it.</p></div><p>The following is for paid subscribers and examines the firms that named governance policies and what those policies actually say, the private equity pattern of charging AI infrastructure costs to fund investors, and three large managers that went from no AI disclosure in 2025 to named frameworks in 2026.</p><div><hr></div><h3><strong>The Governance Tier</strong></h3>
      <p>
          <a href="https://www.ai-street.co/p/what-16000-sec-filings-say-about">
              Read more
          </a>
      </p>
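The scan-and-extract pipeline from the methodology note above can be sketched in a few lines. The keyword list, window size, and output fields below are illustrative stand-ins, not the exact ones used in the analysis:

```python
import re

# Illustrative keyword list; close variants are handled by the alternation.
AI_TERMS = re.compile(
    r"artificial intelligence|machine learning|large language model|generative AI",
    re.IGNORECASE,
)

def scan_filing(text: str, window: int = 400) -> dict:
    """Flag a filing and pull the densest AI-related passage for LLM extraction."""
    hits = [m.start() for m in AI_TERMS.finditer(text)]
    if not hits:
        return {"mentions_ai": False, "passage": None}
    # Pick the hit with the most neighbors inside the window: a rough
    # stand-in for "section with the highest density of AI language".
    densest = max(hits, key=lambda p: sum(abs(p - q) < window for q in hits))
    start = max(0, densest - window)
    return {"mentions_ai": True, "passage": text[start : densest + window]}

sample = (
    "The Adviser may use artificial intelligence and machine learning "
    "tools in its operations. Use of generative AI is governed by an "
    "internal policy with human oversight."
)
print(scan_filing(sample)["mentions_ai"])  # True
```

A filing that passes this filter would then go to the model with a fixed flag schema (own-use vs. portfolio theme, tool names, governance policy, cost disclosure) rather than a free-form prompt.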
   ]]></content:encoded></item><item><title><![CDATA[Ex-BlackRock Exec Ang Details 50-Agent Investment Process]]></title><description><![CDATA[Ang and Altbridge researchers lay out an architecture for autonomous portfolio management]]></description><link>https://www.ai-street.co/p/ex-blackrock-exec-ang-details-50</link><guid isPermaLink="false">https://www.ai-street.co/p/ex-blackrock-exec-ang-details-50</guid><dc:creator><![CDATA[Matt Robinson]]></dc:creator><pubDate>Wed, 08 Apr 2026 15:31:11 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/1ea40c45-c2cf-48ab-bbf8-065e1318cff8_1884x1094.png" length="0" type="image/png"/><content:encoded><![CDATA[<p>Financial firms have been moving away from the one-size-fits-all approach when it comes to AI. </p><p>Companies are using smaller, specialized systems because they&#8217;re easier to test and control, and are generally more consistent. </p><p>Some examples: </p><ul><li><p>Capital One <a href="https://static.rainfocus.com/nvidia/gtc26/sess/1769189647938001MN4L/FinalPresPDF/EX82362_1773262728080001E7iG.pdf">replaced</a> a single LLM with a multi-agent system for call summaries, where agents interpret, reason, cross-check, and document the interaction before finalizing it.</p></li><li><p><a href="https://daloopa.com/?utm_source=www.ai-street.co&amp;utm_medium=newsletter&amp;utm_campaign=the-rise-of-ai-market-models&amp;_bhlid=058e7cc3b93a6457a395b9cb5f03dcf3e1f2d537">Daloopa</a> built dozens of narrow models, each trained on structured financial data and focused on one task. 
&#8220;Our models have an IQ of 250 on one task and 2 on something else,&#8221; <a href="https://www.linkedin.com/in/thomas-li-a6189245/?utm_source=www.ai-street.co&amp;utm_medium=newsletter&amp;utm_campaign=the-rise-of-ai-market-models&amp;_bhlid=c02b2dd5c6da4e3c42c28a74ec8b9746466bf949">Thomas Li</a>, CEO and co-founder of Daloopa, <a href="https://www.ai-street.co/p/the-rise-of-ai-market-models">told me</a> in September. </p></li><li><p>At BlackRock, researchers <a href="https://www.ai-street.co/i/183582065/blackrock-researchers-develop-ai-agent-system-for-stock-picks-study">broke stock screening</a> into specialized agents&#8212;fundamentals, sentiment, and valuation&#8212;that debate and cross-check each other before reaching a final decision.</p></li></ul><p>A new paper from <a href="https://www.linkedin.com/in/andrew-ang-a9a65a89/">Andrew Ang</a>, a former BlackRock executive, <a href="https://www.linkedin.com/in/nazym/">Nazym Azimbayev</a>, a sovereign wealth fund CIO, and <a href="https://www.linkedin.com/in/kimandrik/">Andrey Kim</a>, PhD, a Deutsche Bank quant, takes the BlackRock debate architecture further.</p><p>The paper, <a href="https://arxiv.org/pdf/2604.02279">the Self-Driving Portfolio: Agentic Architecture for Institutional Asset Management</a>, asks: if autonomous driving is here, why not autonomous investing? 
</p><p>Their answer is a 50-agent pipeline that runs the process and documents each step of its reasoning.</p><div><hr></div><h3><strong>Manage Email Preferences</strong></h3><p>If you prefer to receive one weekly email with all AI Street content, deselect Research and Interviews.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.ai-street.co/account&quot;,&quot;text&quot;:&quot;Manage How Often You Receive AI Street&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.ai-street.co/account"><span>Manage How Often You Receive AI Street</span></a></p><div><hr></div><p>They view running a strategic asset allocation process at an institutional level as a bandwidth problem as much as an analytical one. A CIO can supervise maybe 10 to 15 investment departments. A research team can realistically cover 20 to 30 asset classes before the process bottlenecks, and an investment committee meets quarterly.</p><p>To speed this process up, they built a 50-agent pipeline that produced a documented strategic asset allocation with capital market assumptions, portfolio construction, peer review, and a board memo.</p><h3><strong>Here&#8217;s what they built </strong></h3><p>The pipeline is organized around the Investment Policy Statement (IPS), which governs the whole system the way it would govern human portfolio managers. Every agent reads it; the chief risk officer agent checks compliance for every portfolio candidate; the final output must satisfy it. 
</p><p>To illustrate how the pipeline works in practice, the authors ran it in March 2026 against the following mandate: 18 liquid asset classes (6 equity, 8 fixed income, 4 alternatives), a target real return of CPI +3&#8211;4%, a volatility band of 8&#8211;12%, a maximum drawdown of &#8722;25%, and a tracking error ceiling of 6% relative to a 60/40 benchmark.</p><ul><li><p><strong>Macro agent</strong>: Classifies the current economic regime &#8212; expansion, late-cycle, recession, or recovery &#8212; using macro data, market indicators, and web searches for real-time readings. Output flows downstream to every other agent.</p></li><li><p><strong>Asset class agents: </strong>Agents run in parallel, one per asset class. For equity classes, each estimates expected returns using six different methods, then blends them into a seventh composite. An LLM-as-judge step reads all seven alongside the current macro regime and valuations, and selects a final estimate with explicit weights and a written rationale.</p></li><li><p><strong>Portfolio construction agents</strong>: 20 agents each build a portfolio using a different method, ranging from simple rules of thumb to more sophisticated approaches. A 21st researcher agent scans the academic literature and proposes methods not yet in the pipeline. A separate adversarial diversifier, one of the original 20, deliberately constructs the portfolio most different from the consensus of all the others.</p></li><li><p><strong>Strategy review:</strong> Each agent reviews two others &#8212; one using a similar approach, one using a different one &#8212; and all reviews are released simultaneously. Agents then vote, ranking their top five and flagging a bottom pick. 
Votes are combined with a performance score, and the final shortlist must include methods from at least three of the four broad categories.</p></li><li><p><strong>CIO agent: </strong>Combines the top candidates using seven different aggregation methods and selects the one best suited to the current environment. Produces a board memo written for non-technical stakeholders.</p></li><li><p><strong>Meta-agent: </strong>After each rebalancing cycle, compares past forecasts against realized returns, identifies systematic weaknesses, and updates both the code and instructions governing the other agents. All changes are logged.</p><div><hr></div></li></ul><h3><strong>Here&#8217;s what they found </strong></h3><ul><li><p>The macro agent classified the current environment as late-cycle with stagflationary risk.</p></li><li><p>When each asset class agent settled on its return forecast, the pattern was consistent: the more expensive the market, the more the agent discounted historical estimates. US Growth stocks had their forecast cut 2.0 percentage points below the composite; US Large Cap was cut 1.1 points; Emerging Markets were barely adjusted. 
The agents weren&#8217;t pessimistic across the board &#8212; they were specifically skeptical of backward-looking estimates for the assets where current prices already implied low future returns.</p></li><li><p>The same reasoning surfaced in the portfolio construction vote. In a late-cycle environment where return forecasts are uncertain, the agents collectively favored methods that lean on historical volatility and correlation data rather than return predictions. Maximum Diversification &#8212; a method that spreads risk across assets without relying heavily on return forecasts &#8212; ranked first. The portfolio that was deliberately constructed to be as different as possible from all the others came last, which was expected: its value is in the final blending step, not as a standalone recommendation.</p></li><li><p>The final portfolio came out modestly underweight stocks (44.9% vs. 60% in a standard balanced portfolio), roughly in line on bonds (41.7%), with an 8.1% cash position. Over a backtest from 1996 to 2026, it produced nearly the same return profile as a 60/40 portfolio &#8212; but with a peak-to-trough loss of 25.6% versus 34.3%.</p></li></ul><p>To be sure, this is a proof of concept, not an investment strategy. One run producing a sensible-looking portfolio doesn&#8217;t tell you much given the short time horizon.</p><p>I asked the paper&#8217;s authors about the results and received email responses from Azimbayev, who is also CEO of <a href="https://www.altbridge.ai/">Altbridge</a>, which describes itself as an AI-native hedge fund. </p>
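The strategy-review vote described above can be sketched as a ranked-ballot tally. The Borda-style points, bottom-pick penalty, and performance weights here are illustrative assumptions; the paper does not publish its exact scoring. The method and category names are hypothetical stand-ins:

```python
from collections import defaultdict

def combine_votes(ballots, bottom_picks, perf):
    """Tally top-5 ballots Borda-style, penalize flagged bottom picks,
    and blend in a performance score. Weights are illustrative."""
    score = defaultdict(float)
    for ballot in ballots:
        for rank, method in enumerate(ballot):
            score[method] += 5 - rank          # 1st place = 5 points
    for method in bottom_picks:
        score[method] -= 3                     # bottom-pick penalty
    for method, s in perf.items():
        score[method] += s                     # performance blend
    return sorted(score, key=score.get, reverse=True)

def diversify(ranked, categories, k=5, min_cats=3):
    """Shortlist the top k, swapping in lower-ranked methods until at
    least `min_cats` broad categories are represented."""
    picked = ranked[:k]
    for m in ranked[k:]:
        cats = {categories[x] for x in picked}
        if len(cats) >= min_cats:
            break
        if categories[m] not in cats:
            picked[-1] = m                     # replace the weakest pick
    return picked

ballots = [
    ["MaxDiversification", "RiskParity", "EqualWeight", "MinVariance", "Momentum"],
    ["RiskParity", "MaxDiversification", "MinVariance", "EqualWeight", "AdversarialDiversifier"],
]
bottom = ["AdversarialDiversifier", "AdversarialDiversifier"]
perf = {"MaxDiversification": 0.5}
cats = {
    "MaxDiversification": "risk-based", "RiskParity": "risk-based",
    "MinVariance": "risk-based", "EqualWeight": "heuristic",
    "Momentum": "trend", "AdversarialDiversifier": "adversarial",
}
ranked = combine_votes(ballots, bottom, perf)
print(ranked[0], ranked[-1])  # MaxDiversification AdversarialDiversifier
```

In this toy run the risk-based method wins and the adversarial diversifier lands last, echoing the outcome the paper reports; the category check then guarantees the shortlist isn't all one family of methods.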
      <p>
          <a href="https://www.ai-street.co/p/ex-blackrock-exec-ang-details-50">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[JPMorgan Taught AI the Language of Markets]]></title><description><![CDATA[Researchers apply the architecture behind ChatGPT to create a model that simulates market behavior.]]></description><link>https://www.ai-street.co/p/jpmorgan-taught-ai-the-language-of</link><guid isPermaLink="false">https://www.ai-street.co/p/jpmorgan-taught-ai-the-language-of</guid><dc:creator><![CDATA[Matt Robinson]]></dc:creator><pubDate>Tue, 31 Mar 2026 15:31:45 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/6ef821e9-186b-4139-a1d8-7b9fafa98b34_2816x1536.png" length="0" type="image/png"/><content:encoded><![CDATA[<p>Much of the AI conversation is focused on the latest capabilities of Anthropic&#8217;s Claude or ChatGPT, which deserve our attention, but this is a narrow view of the power of the transformer breakthrough. </p><p>Transformers began in text, but researchers are adapting the architecture to other kinds of sequential data. With enough data, transformer models can learn the patterns of &#8220;language&#8221; in that dataset in ways that traditional models missed. </p><p>For example, AlphaFold is a transformer-based system trained on protein data to predict how proteins fold into their 3D shapes from amino acid sequences, which determine how they function. 
It effectively solved the protein folding problem. (The work later contributed to a Nobel Prize in Chemistry awarded to its creators, including Demis Hassabis, who is <a href="https://www.nobelprize.org/prizes/chemistry/2024/hassabis/facts/">not a chemist</a>.)</p><p>As I&#8217;ve written before, no one knows <em>exactly</em> how these models work. They&#8217;re grown rather than built, as the CEO of Anthropic likes to <a href="https://www.darioamodei.com/essay/the-adolescence-of-technology">say</a>. We didn&#8217;t know how aspirin worked for roughly 70 years, but we knew it was effective. </p><p>This brings us to a new paper from JPMorgan researchers, who trained a transformer model on market data.</p><h2>The market as a language</h2><p>Every buy, sell, order submission, or cancellation leaves a trace: what happened, how much size was involved, how far from the market mid-price it was placed, and when it occurred. Multiply that across thousands of stocks and millions of events per day, and you get a massive stream of sequential data.</p><p>Their model, called TradeFM, is a 524-million-parameter model trained on 10.7 billion tokens drawn from more than 9,000 U.S. equities, using data spanning 368 trading days from February 2024 to September 2025.</p><p>Instead of predicting the next word in a sentence, TradeFM predicts the next event in a sequence: its timing, size, price depth, and direction.</p><p>Trading data is messy. Stocks trade at different prices. A $5 move on a $2 stock is massive. A $5 move on one that&#8217;s $500 isn&#8217;t news.</p><p>If you feed those raw numbers into a model, it can&#8217;t really compare one stock to another, so it struggles to learn general patterns.</p><p>So the researchers adjusted the data before training. 
They expressed price-related features in relative terms, compressed volumes so large and small trades are easier to compare, and measured time as the gap between events.</p><p>That puts different stocks on a common scale, so moves are comparable whether it&#8217;s a $2 stock or a $200 stock.</p><p>They then discretized each event&#8217;s features and combined timing, price depth, volume, side, and action type into a single composite token. The result was a vocabulary of 16,384 trade event tokens.</p><div><hr></div><h2><strong>Related Research</strong></h2><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;77034d90-899a-4487-9e57-caede79c7bda&quot;,&quot;caption&quot;:&quot;Hey, it&#8217;s Matt. Welcome back to AI Street. This week:&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;HRT Trains AI Models on Trading Data&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:227819155,&quot;name&quot;:&quot;Matt Robinson&quot;,&quot;bio&quot;:&quot;I write AI Street &#8212; how Wall Street uses AI from trading floors to the C-suite. 
Former Bloomberg News reporter &quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JhAn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b2b35a3-1ee4-4f02-8d99-c6019ea474eb_1181x1181.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-01-15T16:30:37.344Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/830b51cb-b61b-4a72-8e80-e9c20b92157f_2456x1378.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.ai-street.co/p/hrt-trains-ai-models-on-trading-data&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:184024628,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:12,&quot;comment_count&quot;:3,&quot;publication_id&quot;:4098119,&quot;publication_name&quot;:&quot;AI Street &quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!ezC3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a4fc97a-4b2d-4478-92be-ea095be05d61_800x800.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;212077be-05a8-400c-b3be-7e59b4dfbd78&quot;,&quot;caption&quot;:&quot;RESEARCH&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Treating Trading Data As \&quot;Language\&quot; &quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:227819155,&quot;name&quot;:&quot;Matt Robinson&quot;,&quot;bio&quot;:&quot;I write AI Street &#8212; how Wall Street uses AI from trading floors to the C-suite. 
Former Bloomberg News reporter &quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JhAn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b2b35a3-1ee4-4f02-8d99-c6019ea474eb_1181x1181.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-01-02T14:06:00.000Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f2de16a0-c5fd-4e8b-aa1a-55900366048c_1280x720.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.ai-street.co/p/treating-trading-data-as-language&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:183581943,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:5,&quot;comment_count&quot;:0,&quot;publication_id&quot;:4098119,&quot;publication_name&quot;:&quot;AI Street &quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!ezC3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a4fc97a-4b2d-4478-92be-ea095be05d61_800x800.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div><hr></div><h2><strong>What they found</strong></h2><p>The researchers tested the model inside a simulated exchange, where it predicts trades in a continuous loop. The resulting data reproduces core patterns seen in real markets, including clustered volatility and large price swings. Across 9 stocks, 3 liquidity tiers, and 9 months of held-out data, it matched those patterns 2 to 3 times more closely than a standard baseline known as a Compound Hawkes process.</p><p>What&#8217;s most interesting is that <a href="https://arxiv.org/html/2602.23784v1">TradeFM</a>&#8217;s behavior extends beyond the U.S. data it was trained on. 
JPMorgan tested the model, without any adjustments, on trading data from China and Japan, where market structure differs meaningfully. Japan uses batch auctions at the open. China imposes 10% daily price limits. Spreads in both markets are several times wider than in the U.S. Despite those differences, the model&#8217;s performance degraded only moderately. It had never seen these markets, yet it still captured their core dynamics.</p><p>The model appears to be learning structure that carries across markets.</p><p><a href="https://www.linkedin.com/in/armankhaledian/?utm_source=www.ai-street.co&amp;utm_medium=newsletter&amp;utm_campaign=ai-startup-filters-out-the-noise-in-financial-news&amp;_bhlid=821bb636d94ce5dd3099b83433064009ba97b0ab">Arman Khaledian</a>, PhD, a former quant at Millennium and now CEO of <a href="https://zanista.ai/">Zanista AI</a>, said: &#8220;That&#8217;s not a toy result. It means the model is picking up something real about how markets work at a structural level.&#8221;</p>
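One way to picture the composite-token step described above: discretize each normalized feature and pack the buckets into a single integer id. The bin counts below are an assumption chosen so their product matches the paper's 16,384-token vocabulary; the actual bin layout is not disclosed:

```python
import math

# Illustrative bin sizes; their product is 16,384 (the paper's vocab size),
# but the actual feature ranges and bin counts are assumptions.
TIME_BINS, DEPTH_BINS, VOL_BINS, SIDES, ACTIONS = 8, 16, 16, 2, 4

def bucket(x, lo, hi, n):
    """Clip x to [lo, hi] and map it to one of n integer buckets."""
    x = min(max(x, lo), hi)
    return min(int((x - lo) / (hi - lo) * n), n - 1)

def encode_event(dt_sec, depth_ticks, volume, side, action):
    """Turn one order-book event into a single composite token id.
    Features are expressed in relative terms (ticks from mid, log volume,
    inter-event time gap) so different stocks share a common scale."""
    t = bucket(math.log1p(dt_sec), 0.0, math.log1p(60.0), TIME_BINS)
    d = bucket(depth_ticks, -8, 8, DEPTH_BINS)
    v = bucket(math.log1p(volume), 0.0, math.log1p(1e5), VOL_BINS)
    # Mixed-radix packing: one id per (time, depth, volume, side, action).
    return (((t * DEPTH_BINS + d) * VOL_BINS + v) * SIDES + side) * ACTIONS + action

VOCAB = TIME_BINS * DEPTH_BINS * VOL_BINS * SIDES * ACTIONS
print(VOCAB)  # 16384
```

The model then trains on sequences of these ids exactly the way a language model trains on word tokens, which is what makes the cross-market transfer result possible.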
      <p>
          <a href="https://www.ai-street.co/p/jpmorgan-taught-ai-the-language-of">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Tracking Shifts in Earnings Call Narratives]]></title><description><![CDATA[In earnings calls, LLMs are better at spotting changes in the metrics companies highlight.]]></description><link>https://www.ai-street.co/p/tracking-shifts-in-earnings-call</link><guid isPermaLink="false">https://www.ai-street.co/p/tracking-shifts-in-earnings-call</guid><dc:creator><![CDATA[Matt Robinson]]></dc:creator><pubDate>Tue, 24 Mar 2026 15:30:52 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/93a8630f-e724-42ce-90e2-234b4865e62e_2584x1476.png" length="0" type="image/png"/><content:encoded><![CDATA[<p>Regulatory rules dictate how companies report performance, but there are virtually no rules governing what management chooses to talk about on an earnings call. </p><p>The C-suite can choose what numbers to highlight &#8212; revenue per user, lifetime value, total addressable market, etc. </p><p>These selected metrics support the corporate narrative, but they&#8217;re not static. When a number is strong, it gets airtime. When it softens, it tends to quietly disappear from the script, replaced by whatever metric tells a better story that quarter.</p><p>Shifting corporate narratives happens so often it has a name: &#8220;moving targets.&#8221; The measure tracks the fraction of previously highlighted metrics that go missing from the next comparable earnings call. Research showed that firms with high metric turnover tend to underperform in subsequent months. The more a company reshuffles the numbers it talks about, the worse its stock tends to do.</p><p>The challenge is detection. Most approaches rely on keyword matching across transcripts, comparing terms quarter over quarter. 
You need to be able to link &#8220;revenue growth&#8221; to &#8220;top-line expansion.&#8221; Keyword search can&#8217;t distinguish &#8220;North America cloud revenue&#8221; from &#8220;revenue.&#8221;</p><p>At scale, this becomes difficult to track. Following the metrics that appear and disappear across thousands of earnings calls is not feasible to do consistently by hand. This is where LLMs fit: extracting and standardizing how companies describe performance over time. This is tedious work for humans and easy to scale with AI.</p><p>A group of researchers at MIT, BlackRock, and J.P. Morgan asked whether LLMs could close this detection gap.</p><p><strong>Here&#8217;s what they did:</strong></p><ul><li><p>Instead of scanning transcripts for predefined terms, they use an LLM to extract full phrases with context. Where keyword methods pull &#8220;revenue,&#8221; the model pulls &#8220;North America cloud revenue.&#8221; Where it grabs &#8220;dividends,&#8221; the model also captures &#8220;cash flow,&#8221; &#8220;share repurchases,&#8221; and &#8220;cash flow from operations.&#8221;</p></li><li><p>They then compare metrics across quarters using semantic similarity rather than exact matches. Instead of forcing a binary match, they allow for an &#8220;ambiguous&#8221; range where similarity is scaled.</p></li><li><p>They apply this across firms listed in the S&amp;P 100 index from January 2010 to December 2024, yielding 5,615 firm-quarter observations across 64 quarters.</p></li><li><p>To test it, they sort stocks by how much their metrics shift and compare returns, then run cross-sectional regressions with standard controls for size, valuation, and prior returns.</p></li></ul><p><strong>Here&#8217;s what they found:</strong></p>
      <p>
          <a href="https://www.ai-street.co/p/tracking-shifts-in-earnings-call">
              Read more
          </a>
      </p>
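<p>The core of the matching step described above, semantic similarity with a scaled &#8220;ambiguous&#8221; band instead of a binary match, can be sketched in a few lines. This is an illustrative toy only: the paper uses LLM extraction and embedding-based similarity, while the sketch substitutes Python&#8217;s built-in string similarity, and the thresholds are my assumptions, not the authors&#8217; values.</p>

```python
from difflib import SequenceMatcher

# Illustrative thresholds, not the paper's values.
MATCH, MISS = 0.85, 0.40

def metric_persistence(prev_metrics, curr_metrics):
    """Fraction of previously highlighted metrics that survive into the
    next comparable call: 1 = clear match, 0 = missing, scaled in between."""
    scores = []
    for m in prev_metrics:
        best = max(SequenceMatcher(None, m, c).ratio() for c in curr_metrics)
        if best >= MATCH:
            scores.append(1.0)                             # metric kept
        elif best <= MISS:
            scores.append(0.0)                             # metric went missing
        else:
            scores.append((best - MISS) / (MATCH - MISS))  # ambiguous band
    return sum(scores) / len(scores)

q1 = ["north america cloud revenue", "free cash flow"]
q2 = ["north america cloud revenue", "total addressable market"]
turnover = 1.0 - metric_persistence(q1, q2)  # the "moving targets" fraction
print(round(turnover, 2))
```

<p>In this framing, a high turnover score flags a firm that is reshuffling the numbers it highlights.</p>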
]]></content:encoded></item><item><title><![CDATA[AI Still Falls Short in Excel: Study]]></title><description><![CDATA[Even the best model wouldn't make it on Wall Street]]></description><link>https://www.ai-street.co/p/ai-still-falls-short-in-excel-study</link><guid isPermaLink="false">https://www.ai-street.co/p/ai-still-falls-short-in-excel-study</guid><dc:creator><![CDATA[Matt Robinson]]></dc:creator><pubDate>Wed, 18 Mar 2026 15:30:10 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/1904cc40-4d75-4d8c-b38f-8c7c2880e786_2816x1536.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>AI for Excel has improved dramatically over the last six months, but it still wouldn&#8217;t make it as a junior analyst on Wall Street. </p><p>The three leading AI providers, OpenAI, Anthropic and Google, have all released updated Excel capabilities and are reporting higher accuracy on benchmarks. </p><p>For example, OpenAI <a href="https://openai.com/index/chatgpt-for-excel/">said</a> this month that performance on its internal investment banking benchmark jumped to 87.3% (GPT-5.4 Thinking) from 43.7% (GPT-5). </p><p>While these are <em>material</em> improvements, AI still can&#8217;t be relied on without supervision. &#8220;Mostly right&#8221; is not good enough. </p><p>A recent benchmark on AI in Excel points to the same issue: performance drops sharply as tasks get more complex. </p><p><a href="https://arxiv.org/pdf/2603.07316">FinSheet-Bench</a> tests how models handle real-world private equity workbooks with messy layouts, multiple funds, and non-standard formatting. Across 10 models from OpenAI, Google, and Anthropic, the best result came from Gemini 3.1 Pro at 82.4% accuracy, followed closely by GPT-5.2 with reasoning and Claude Opus 4.6 with thinking, both around 80%.</p><p>On simple lookups, top models exceed 90% accuracy.</p><p>The gap widens further on large, realistic files. 
On the most complex workbook tested, with 152 companies across eight funds, average accuracy was worse than a coin flip. </p><p>One reason: models don&#8217;t actually &#8220;see&#8221; the spreadsheet. They operate on a text-serialized version that strips out layout, formatting, and visual structure.</p><p>So this: </p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ZLyT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70430b79-2093-4a06-98ec-f82a359a66fe_1588x282.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ZLyT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70430b79-2093-4a06-98ec-f82a359a66fe_1588x282.png 424w, https://substackcdn.com/image/fetch/$s_!ZLyT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70430b79-2093-4a06-98ec-f82a359a66fe_1588x282.png 848w, https://substackcdn.com/image/fetch/$s_!ZLyT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70430b79-2093-4a06-98ec-f82a359a66fe_1588x282.png 1272w, https://substackcdn.com/image/fetch/$s_!ZLyT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70430b79-2093-4a06-98ec-f82a359a66fe_1588x282.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ZLyT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70430b79-2093-4a06-98ec-f82a359a66fe_1588x282.png" width="1456" height="259" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/70430b79-2093-4a06-98ec-f82a359a66fe_1588x282.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:259,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:28265,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.ai-street.co/i/191054535?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70430b79-2093-4a06-98ec-f82a359a66fe_1588x282.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ZLyT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70430b79-2093-4a06-98ec-f82a359a66fe_1588x282.png 424w, https://substackcdn.com/image/fetch/$s_!ZLyT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70430b79-2093-4a06-98ec-f82a359a66fe_1588x282.png 848w, https://substackcdn.com/image/fetch/$s_!ZLyT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70430b79-2093-4a06-98ec-f82a359a66fe_1588x282.png 1272w, https://substackcdn.com/image/fetch/$s_!ZLyT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70430b79-2093-4a06-98ec-f82a359a66fe_1588x282.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Becomes this </p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!xdTj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1147bf08-47b5-4356-83d7-cc9a8269d66e_1444x174.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xdTj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1147bf08-47b5-4356-83d7-cc9a8269d66e_1444x174.png 424w, https://substackcdn.com/image/fetch/$s_!xdTj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1147bf08-47b5-4356-83d7-cc9a8269d66e_1444x174.png 848w, https://substackcdn.com/image/fetch/$s_!xdTj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1147bf08-47b5-4356-83d7-cc9a8269d66e_1444x174.png 1272w, https://substackcdn.com/image/fetch/$s_!xdTj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1147bf08-47b5-4356-83d7-cc9a8269d66e_1444x174.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xdTj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1147bf08-47b5-4356-83d7-cc9a8269d66e_1444x174.png" width="1444" height="174" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1147bf08-47b5-4356-83d7-cc9a8269d66e_1444x174.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:174,&quot;width&quot;:1444,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:29273,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.ai-street.co/i/191054535?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1147bf08-47b5-4356-83d7-cc9a8269d66e_1444x174.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!xdTj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1147bf08-47b5-4356-83d7-cc9a8269d66e_1444x174.png 424w, https://substackcdn.com/image/fetch/$s_!xdTj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1147bf08-47b5-4356-83d7-cc9a8269d66e_1444x174.png 848w, https://substackcdn.com/image/fetch/$s_!xdTj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1147bf08-47b5-4356-83d7-cc9a8269d66e_1444x174.png 1272w, https://substackcdn.com/image/fetch/$s_!xdTj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1147bf08-47b5-4356-83d7-cc9a8269d66e_1444x174.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Spokespeople for OpenAI, Anthropic and Google didn&#8217;t respond to requests for comment on FinSheet-Bench&#8217;s results. 
</p><p>What I&#8217;d like to see from model providers, rather than the latest numbers from internal benchmarks, is basic operating data: real error rates, how often outputs need to be corrected, and what level of reliability users should expect.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.ai-street.co/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.ai-street.co/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h2><strong>ICYMI Research</strong></h2><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;ba5156c8-82c9-4f90-a93b-52cd3a45665a&quot;,&quot;caption&quot;:&quot;AI looks impressive when you ask a narrow question about a single company filing, such as revenue last quarter.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Why AI Struggles With Real Analyst Work &quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:227819155,&quot;name&quot;:&quot;Matt Robinson&quot;,&quot;bio&quot;:&quot;I write AI Street &#8212; how Wall Street uses AI from trading floors to the C-suite. 
Former Bloomberg News reporter &quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JhAn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b2b35a3-1ee4-4f02-8d99-c6019ea474eb_1181x1181.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-02-25T12:03:54.382Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/959d5a45-b276-47ad-aeef-accd61e7d236_1786x1076.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.ai-street.co/p/why-ai-struggles-with-real-analyst&quot;,&quot;section_name&quot;:&quot;Research &quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:188912344,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:9,&quot;comment_count&quot;:0,&quot;publication_id&quot;:4098119,&quot;publication_name&quot;:&quot;AI Street &quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!ezC3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a4fc97a-4b2d-4478-92be-ea095be05d61_800x800.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;f8f08081-b32f-4e3f-b1b1-124526ed7d44&quot;,&quot;caption&quot;:&quot;&#8220;The price of intelligence is going to zero.&#8221;&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;AI Boosts Retail Trading Volume: Research&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:227819155,&quot;name&quot;:&quot;Matt Robinson&quot;,&quot;bio&quot;:&quot;I write AI Street &#8212; how Wall Street uses AI from trading floors to the C-suite. 
Former Bloomberg News reporter &quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JhAn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b2b35a3-1ee4-4f02-8d99-c6019ea474eb_1181x1181.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2026-02-12T12:02:23.938Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9f75fbf1-7f11-4f4e-9aa5-69c4d6ba896c_1794x1686.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.ai-street.co/p/ai-boosts-retail-trading-volume-research&quot;,&quot;section_name&quot;:&quot;Research &quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:187608461,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:3,&quot;comment_count&quot;:0,&quot;publication_id&quot;:4098119,&quot;publication_name&quot;:&quot;AI Street &quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!ezC3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a4fc97a-4b2d-4478-92be-ea095be05d61_800x800.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;dd01ac51-eb74-41fd-a866-03e6746d3c70&quot;,&quot;caption&quot;:&quot;Hey, it&#8217;s Matt. This week on AI Street:&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;BlackRock Study Tests AI Agents for Stock Picks &quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:227819155,&quot;name&quot;:&quot;Matt Robinson&quot;,&quot;bio&quot;:&quot;I write AI Street &#8212; how Wall Street uses AI from trading floors to the C-suite. 
Former Bloomberg News reporter &quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!JhAn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b2b35a3-1ee4-4f02-8d99-c6019ea474eb_1181x1181.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-08-21T15:30:00.000Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/646fa0b1-ec1c-485d-aec9-8b5085615c69_1200x630.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.ai-street.co/p/blackrock-tests-multi-agent-ai-for-stock-picks&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:183582065,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:6,&quot;comment_count&quot;:0,&quot;publication_id&quot;:4098119,&quot;publication_name&quot;:&quot;AI Street &quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!ezC3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a4fc97a-4b2d-4478-92be-ea095be05d61_800x800.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div><hr></div><p>The problems of AI in Excel echo the broader challenges of AI: when a model has to traverse thousands of pages of documents or dozens of spreadsheet tabs, accuracy plummets. </p><p>Over the last few weeks here, we&#8217;ve <a href="https://www.ai-street.co/p/what-actually-makes-ai-reliable">covered</a> how there&#8217;s a premium on reliable models over the most powerful ones. </p>
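<p>The serialization issue above can be made concrete with a short sketch of the flattened, text-only view a model receives. The sheet contents here are invented, and real pipelines (and each vendor) serialize differently; the point is only that merged cells, colors, and layout disappear.</p>

```python
# A toy 'workbook' as nested lists; real files would come via a library
# such as openpyxl, but the flattening effect is the same.
sheet = [
    ["Fund III", None, None],          # a merged header cell in the real file
    ["Company", "Invested", "MOIC"],
    ["Acme Co", 12.5, "2.1x"],
    ["Beta LLC", 8.0, "1.4x"],
]

def serialize(rows):
    """Row-major text serialization, roughly what an LLM is given:
    cell addresses and values, no borders, colors, or merged ranges."""
    lines = []
    for r, row in enumerate(rows, start=1):
        for c, val in enumerate(row, start=1):
            if val is not None:
                lines.append(f"{chr(64 + c)}{r}: {val}")
    return "\n".join(lines)

print(serialize(sheet))
```

<p>Everything the analyst&#8217;s eye uses to parse the workbook, the merged fund header, the visual grouping of rows, is gone by the time the model sees it.</p>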
      <p>
          <a href="https://www.ai-street.co/p/ai-still-falls-short-in-excel-study">
              Read more
          </a>
      </p>
]]></content:encoded></item><item><title><![CDATA[AI Replicates Human Investor Biases]]></title><description><![CDATA[A study of 48 models found framing and sunk-cost effects distort AI investment decisions.]]></description><link>https://www.ai-street.co/p/ai-replicates-human-investor-biases</link><guid isPermaLink="false">https://www.ai-street.co/p/ai-replicates-human-investor-biases</guid><dc:creator><![CDATA[Matt Robinson]]></dc:creator><pubDate>Wed, 11 Mar 2026 13:02:45 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/abacd065-4c26-4f56-b2f5-3ef472a43783_1408x768.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>AI hallucinations get a lot of attention. But another risk is bias. </p><p>The way you write a prompt can steer a model toward different answers, even when the underlying question is exactly the same. I <a href="https://www.ai-street.co/i/183581991/how-simple-word-choices-lead-ai-astray">wrote</a> in November: </p><blockquote><p>If you asked a (human) financial analyst whether Microsoft or Apple is the better investment, the answer wouldn&#8217;t depend on whether you said <em>Microsoft or Apple</em> or <em>Apple or Microsoft.</em> For LLMs, that word order matters, according to new research. </p></blockquote><p>This risk is harder to detect since an answer isn&#8217;t necessarily <em>wrong</em>; the model just chooses to highlight a different point. </p><p>Individual investors, and I suspect some institutional ones as well, are likely exposed to this risk when they ask AI for research ideas and stock-picking guidance. </p><p><a href="https://www.etoro.com/news-and-analysis/etoro-updates/retail-investors-flock-to-ai-tools-with-usage-up-46-in-one-year/">More and more retail investors</a> are relying on AI tools, and almost three-quarters of millennials do so, according to an October eToro survey. 
</p><p>To investigate how widespread this issue is, researchers at Auburn University and the University of Tulsa evaluated 48 large language models across investment-style decision tasks.</p><p>They presented identical financial scenarios twice, changing only how the information was framed, such as wording risk as a gain versus a loss, adding a prestigious source, or mentioning prior spending. Many of the same biases have long been documented in human investors. The difference is that AI systems can reproduce them consistently and at scale.</p><h3>What they did</h3><ul><li><p>Showed each model the same scenario twice: once neutral, once with a subtle wording or context change</p></li><li><p>Tested 11 well-known investor errors, including framing, anchoring, herding, narrative appeal, and sunk costs</p></li><li><p>Ran 25 scenario pairs per error across all 48 models</p></li><li><p>Evaluated mitigation methods such as debiasing instructions and prompt rewriting</p></li></ul><h3>Results</h3><ul><li><p><strong>Framing alone moved ratings by 1.62 points on a 10-point scale</strong>, enough to flip decisions around common thresholds</p></li><li><p><strong>Narrative cues dominated fundamentals:</strong> describing founders as fitting a familiar archetype raised ratings by 65%, as did attributing the analysis to a Nobel laureate. </p></li></ul>
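<p>The headline framing statistic is mechanical once you have paired ratings. Here is a minimal sketch; the rating pairs below are invented for illustration on the study&#8217;s 10-point scale, not the paper&#8217;s data, and the 7-point decision threshold is an assumption.</p>

```python
from statistics import mean

# Invented ratings: each pair is the same scenario scored twice,
# (neutral framing, gain/loss framing), on a 10-point scale.
pairs = [(6.2, 7.9), (5.0, 6.4), (7.1, 8.8), (4.8, 6.5)]

def framing_shift(pairs):
    """Average absolute rating change caused by framing alone."""
    return mean(abs(framed - neutral) for neutral, framed in pairs)

def flips(pairs, threshold=7.0):
    """Count pairs where framing moves the rating across a decision threshold."""
    return sum((n < threshold) != (f < threshold) for n, f in pairs)

print(framing_shift(pairs))
print(flips(pairs))
```

<p>With pairs like these, the average shift lands near the 1.62-point effect the study reports, and one pair crosses the 7-out-of-10 threshold, which is exactly the &#8220;enough to flip decisions&#8221; concern.</p>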
      <p>
          <a href="https://www.ai-street.co/p/ai-replicates-human-investor-biases">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Why AI Struggles With Real Analyst Work ]]></title><description><![CDATA[Due diligence requires evidence across firms and time, where current systems fail, according to new research.]]></description><link>https://www.ai-street.co/p/why-ai-struggles-with-real-analyst</link><guid isPermaLink="false">https://www.ai-street.co/p/why-ai-struggles-with-real-analyst</guid><dc:creator><![CDATA[Matt Robinson]]></dc:creator><pubDate>Wed, 25 Feb 2026 12:03:54 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/959d5a45-b276-47ad-aeef-accd61e7d236_1786x1076.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>AI looks impressive when you ask a narrow question about a single company filing, such as revenue last quarter.</p><p>Ask it to compare two firms&#8217; risk disclosures or track strategy over several years, and performance drops fast. Fin-RATE, a <a href="https://arxiv.org/abs/2602.07294">new benchmark</a> from researchers at Yale and Goldman Sachs, measures that gap and identifies where the model breaks.</p><p>Most financial benchmarks reduce SEC filings to lookup tasks: find a number in a 10-K and repeat it back accurately. That design misses how analysts actually work. Real due diligence requires synthesizing disclosures across companies, time periods, and filing types simultaneously. A pass/fail system doesn&#8217;t tell you whether errors came from retrieval, hallucination, or broken reasoning chains.</p><p><strong>Here&#8217;s what they did:</strong></p><ul><li><p>Built a body of 15,311 document segments from 2,472 SEC filings (10-K, 10-Q, 8-K, DEF 14A, and others) covering 43 companies across 36 industries, 2020&#8211;2025. 
Sourced from EDGAR, segmented at official SEC item boundaries, converted to structured Markdown.</p></li><li><p>Designed three task types</p><ul><li><p>Single-document questions </p></li><li><p>Cross-company comparisons </p></li><li><p>Multi-year analysis within one firm</p></li></ul></li><li><p>Created 7,500 question-answer pairs with numbers manually verified against source filings. </p></li><li><p>Evaluated 17 models, including closed-source systems, major open-source models, and finance-tuned variants</p></li><li><p>Tested performance with passages provided directly versus retrieved using four RAG methods. </p></li></ul><p><strong>The findings:</strong></p>
      <p>
          <a href="https://www.ai-street.co/p/why-ai-struggles-with-real-analyst">
              Read more
          </a>
      </p>
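<p>One concrete piece of the pipeline above, splitting filings at official SEC item boundaries, is easy to illustrate. A minimal sketch, assuming plain text: the filing snippet is invented, and the paper&#8217;s actual corpus is pulled from EDGAR and converted to structured Markdown, which this toy regex does not attempt.</p>

```python
import re

# Invented 10-K snippet; real filings come from EDGAR full-text files.
filing = """Item 1. Business
We make widgets.
Item 1A. Risk Factors
Widget demand may fall.
Item 7. Management's Discussion and Analysis
Revenue grew 4%.
"""

# Split at item headings like "Item 1A.", keeping the heading via the group.
ITEM = re.compile(r"^(Item\s+\d+[A-Z]?\.)", re.MULTILINE)

def segment(text):
    parts = ITEM.split(text)
    # parts = [preamble, heading, body, heading, body, ...]
    return [(h.strip(), b.strip()) for h, b in zip(parts[1::2], parts[2::2])]

for heading, body in segment(filing):
    print(heading, "->", body.splitlines()[0])
```

<p>Segmenting at item boundaries like this is what lets a benchmark ask targeted questions, such as comparing two firms&#8217; Item 1A risk factors, rather than treating each filing as one undifferentiated blob.</p>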
   ]]></content:encoded></item><item><title><![CDATA[Can We Break Open AI’s Black Box?]]></title><description><![CDATA[Researchers are developing new frameworks to break open AI&#8217;s "black box," building interpretability and causal reasoning into models from the start.]]></description><link>https://www.ai-street.co/p/can-we-break-open-ais-black-box</link><guid isPermaLink="false">https://www.ai-street.co/p/can-we-break-open-ais-black-box</guid><dc:creator><![CDATA[Matt Robinson]]></dc:creator><pubDate>Tue, 24 Feb 2026 11:10:23 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/a377d549-74e8-464f-ae78-380508a80338_1744x1672.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>This is an excerpt from an article I wrote that was originally <a href="https://www.chicagobooth.edu/review/can-we-break-open-ais-black-box">published</a> in the Chicago Booth Review, a publication of the University of Chicago Booth School of Business.</em></p><div><hr></div><p>Demis Hassabis is not a chemist, yet he was one of three recipients of the 2024 Nobel Prize in Chemistry. The prize recognized major contributions to the study of protein structures. Hassabis, a computer scientist who runs Google&#8217;s AI research lab DeepMind, and his fellow honoree John Jumper, who also works at DeepMind, developed an AI prediction model that the chair of the Nobel committee said fulfilled &#8220;a 50-year-old dream: predicting protein structures from their amino acid sequences.&#8221; Another committee member called it &#8220;one of the really first big scientific breakthroughs of AI.&#8221;</p><p>For decades, uncovering the shape of a single protein meant spending months, even years, of painstaking lab work and hundreds of thousands of dollars toward research and development with no guarantee of success.</p><p>With DeepMind&#8217;s deep-learning model AlphaFold2, revealing these structures takes minutes, not months. 
The DeepMind team trained AlphaFold2 with data from lab-determined protein shapes, along with extra examples it created on its own from patterns found in huge protein-sequence databases. The model examined protein shapes and amino acid sequences to determine the physical and evolutionary constraints dictating protein structure.</p><p>The team has since predicted more than 200 million protein structures and made them freely available, creating a global resource for scientific research.</p><p>AlphaFold2 is one in a growing list of scientific breakthroughs driven by AI. It also represents a new paradigm in scientific discovery: AI models that achieve breakthroughs in ways their creators can&#8217;t fully explain. While traditional science builds understanding through hypotheses we can test and verify, these AI systems are discovering solutions by finding patterns in data that remain opaque to human analysis.</p><p>There is currently no easy way to examine what AlphaFold2 learned about protein evolution. Its inner workings, and those of other AI systems making important contributions to science and society, remain hidden.</p><p>As these models get better, the gap between their performance and our understanding of them is only widening.</p><p>Nonetheless, AI adoption is racing ahead. Modern AI works incredibly well. The latest models can perform tasks that, 10 years ago, sounded like science fiction: generating movie-quality videos from a few lines of text, writing entire codebases for working apps, even driving cars without human input.</p><p>These advances have quickly entered our personal and professional lives. 
But this rapid deployment of black-box systems creates a fundamental tension in our relationship with AI: We&#8217;re becoming dependent on tools that have reasoning we can&#8217;t verify or build upon.</p><div class="pullquote"><p>&#8220;You can go as crazy as you want and build the biggest, deepest neural network and still have interpretability baked in from the beginning.&#8221;<br>&#8212; Bryon Aragam</p></div><p>Even the architects of modern AI admit to being troubled by their lack of insight.</p><p>Dario Amodei, a cofounder of the AI lab Anthropic and the company&#8217;s CEO, wrote in April 2025: &#8220;People outside the field are often surprised and alarmed to learn that we do not understand how our own AI creations work. They are right to be concerned: this lack of understanding is essentially unprecedented in the history of technology.&#8221;</p><p>This has made interpretability, the science of cracking open AI&#8217;s &#8220;mind,&#8221; a pressing priority, and a new wave of research is taking a novel approach. AI-interpretability research has long been a form of detective work done <em>after </em>an AI system has already been trained and deployed. 
By then, the AI has already &#8220;decided&#8221; which data matter most when making its predictions.</p><p>Instead of trying to work backward to understand AI models after they&#8217;re built, scientists are now using new research frameworks to build interpretability into the training process from the start&#8212;a notion considered impossible just a few years ago.</p><h2><strong>Reverse engineering AI</strong></h2><p>When researchers try to parse the reasoning of an AI model after it has already been fully developed, they are essentially trying to reverse engineer a system that, in many ways, built itself, and attempting to uncover the internal patterns and definitions it formed along the way.</p><p>&#8220;People think these things are built systems, but they&#8217;re really not built per se,&#8221; says Ted Sumers, a researcher at Anthropic. &#8220;It&#8217;s much more like growing a plant than building a building.&#8221;</p><p>Understanding how a model &#8220;grows&#8221; has become a central focus for researchers.</p><p>One branch of this work, called mechanistic interpretability, maps which neurons activate when a user asks AI a question, and traces how information flows through the network&#8217;s intricate layers.</p><p>Anthropic, a rival to OpenAI, has been at the vanguard of this type of approach, dissecting neural networks by studying the roles of individual neurons and circuits.</p><p>This has yielded practical results. Teams can, without damaging overall performance, identify and remove specific circuits that lead to biased or unwanted outputs. They can also locate the exact parts of a model that enforce safety rules&#8212;like refusal to answer harmful queries&#8212;and adjust those directly. Since the techniques go down to the neuron level, they offer a way to audit whether a model is memorizing sensitive data. 
Together, these advances make models easier to edit, test, and trust as they continue to grow more capable.</p><p>Still, it&#8217;s like peering into a house through a keyhole.</p><h3><strong>The struggle to understand AI</strong></h3><p>Unlike traditional software, which relies on top-down, hard-coded rules, a neural network&#8212;a type of artificial-intelligence model that&#8217;s often described as resembling the structure of a human brain&#8212;learns from the bottom up, ingesting training data and making internal adjustments based on what it observes. Such models learn patterns from massive datasets, some with trillions of data points.</p><p>For example, to learn to identify pictures of dogs, a neural network reviews millions of labeled images of the animals rather than relying on a fixed set of definitions.</p><p>During training, the model guesses what each image shows and compares its answer to the correct label. If it guesses &#8220;not dog&#8221; for an image labeled &#8220;dog,&#8221; it recognizes the mistake and adjusts its internal settings to reduce the error. This process repeats again and again.</p><p>After enough examples, it becomes very good at identifying dogs. But it does so in a way that&#8217;s fundamentally different from how humans process and recall information. AI relies on statistical analysis to identify patterns, rather than mental imagery.</p><p>AI doesn&#8217;t &#8220;see&#8221; the way we do.</p><p>It sees the world through numerical representations of data. All types of data that AI works with&#8212;whether text, images, or audio&#8212;are converted into numbers that the system can mathematically manipulate. 
For example, the sentence, &#8220;AI sees the world through numerical representations of data&#8221; is converted into: [17527, 27432, 290, 2375, 1819, 57979, 63700, 328, 1238] according to OpenAI&#8217;s <a href="https://platform.openai.com/tokenizer">tool</a>, which displays how a piece of text might be tokenized by a language model. (Different models tokenize the same words differently.)</p><p>Turning data into strings of numbers makes them usable by AI models. Computers may not be able to see or read in the traditional sense, but they can run mathematical operations on numbers. That&#8217;s how AI detects patterns, compares inputs, and ultimately learns from data instead of relying on fixed rules.</p><p>These numbers aren&#8217;t stored in a database or an Excel spreadsheet. They exist in what&#8217;s called a high-dimensional space.</p><p>We can visualize and understand the difference between two and three dimensions. Schoolchildren are taught that a rectangle has two dimensions&#8212;length and width. A cube adds another dimension: depth. It&#8217;s much harder for us to grasp a fourth dimension.</p><p>But AI can understand hundreds, even thousands, of dimensions.</p><p>To navigate these vast high-dimensional spaces, a model learns during training which direction matters most by adjusting its internal weights&#8212;think of them as groups of dials that turn up or down to chart a course through this mathematical terrain. As training progresses, turning up weights for &#8220;furry&#8221; and &#8220;four legs&#8221; steers it deeper into dog country, while dialing down irrelevant features such as &#8220;fire hydrants&#8221; prevents the model from wandering into dead ends.</p><p>Through training, the model groups together the features that typically appear in pictures of dogs without being told what exactly a dog is. 
However, there is no index, as you might find in the back of a book, that you can consult to find the exact &#8220;dial&#8221; or weight corresponding to doglike features; those features are intertwined across the model&#8217;s complex architecture. Researchers have to go find them.</p><p>This gets at the core challenge of AI interpretability. Researchers know how to build and train these models. But they often can&#8217;t see what, exactly, in an image causes the model to adjust one specific dial out of billions.</p><p>Understanding how a model makes its predictions can help illuminate how much we can trust it&#8212;or, if necessary, how to fix it when its behavior deviates from what we want or expect.</p><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.chicagobooth.edu/review/can-we-break-open-ais-black-box&quot;,&quot;text&quot;:&quot;Continue Reading at CBR&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.chicagobooth.edu/review/can-we-break-open-ais-black-box"><span>Continue Reading at CBR</span></a></p>]]></content:encoded></item><item><title><![CDATA[AI Stock Picks Beat Benchmark in Live Market Test: Study ]]></title><description><![CDATA[AI autonomously searched the web, scored all Russell 1000 stocks, and constructed a daily portfolio.]]></description><link>https://www.ai-street.co/p/ai-stock-picks-beat-benchmark-in</link><guid isPermaLink="false">https://www.ai-street.co/p/ai-stock-picks-beat-benchmark-in</guid><dc:creator><![CDATA[Matt Robinson]]></dc:creator><pubDate>Thu, 19 Feb 2026 13:03:26 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/a4f6311a-176e-42a7-a92f-e514f1828268_1762x1566.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h6><strong>RESEARCH </strong></h6><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!9eZZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d86ddd9-3f63-4ceb-9244-b9603a5205f7_1162x606.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9eZZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d86ddd9-3f63-4ceb-9244-b9603a5205f7_1162x606.png 424w, https://substackcdn.com/image/fetch/$s_!9eZZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d86ddd9-3f63-4ceb-9244-b9603a5205f7_1162x606.png 848w, https://substackcdn.com/image/fetch/$s_!9eZZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d86ddd9-3f63-4ceb-9244-b9603a5205f7_1162x606.png 1272w, https://substackcdn.com/image/fetch/$s_!9eZZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d86ddd9-3f63-4ceb-9244-b9603a5205f7_1162x606.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9eZZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d86ddd9-3f63-4ceb-9244-b9603a5205f7_1162x606.png" width="1162" height="606" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9d86ddd9-3f63-4ceb-9244-b9603a5205f7_1162x606.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:606,&quot;width&quot;:1162,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:116087,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.ai-street.co/i/188476725?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d86ddd9-3f63-4ceb-9244-b9603a5205f7_1162x606.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!9eZZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d86ddd9-3f63-4ceb-9244-b9603a5205f7_1162x606.png 424w, https://substackcdn.com/image/fetch/$s_!9eZZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d86ddd9-3f63-4ceb-9244-b9603a5205f7_1162x606.png 848w, https://substackcdn.com/image/fetch/$s_!9eZZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d86ddd9-3f63-4ceb-9244-b9603a5205f7_1162x606.png 1272w, https://substackcdn.com/image/fetch/$s_!9eZZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d86ddd9-3f63-4ceb-9244-b9603a5205f7_1162x606.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Many AI + investing research papers suffer from the same problem: the models were trained on historical internet data that often contains the outcomes they are asked to predict. Ask a model today what happened to a stock in 2022, and it may already know. </p><p>This look-ahead bias makes me skeptical of many papers with &#8220;big&#8221; conclusions. But since the models have become ubiquitous, researchers can test their theories in real time. </p><p>Two Peking University researchers, Zefeng Chen and Darcy Pu, did just that. They ran a live, nine-month experiment asking a frontier AI model to pick stocks every night across the Russell 1000. </p><p><strong>Here&#8217;s what they did:</strong></p><ul><li><p>Every night from April 2025 through January 2026, they queried a leading U.S. 
frontier AI model via its web interface with live search enabled and no pre-selected news or filings fed to it. The model autonomously searched the web, synthesized what it found, and returned a score (&#8722;5 to +5) for each Russell 1000 stock.</p></li><li><p>Signals were generated after the 4 p.m. close and before the next open. Portfolios were entered at the opening auction and exited at the following open.</p></li><li><p>They ranked the roughly 1,000 stocks by the model&#8217;s daily score, built a portfolio of the top 20 weighted by market value, and tested its performance using standard factor models.</p></li></ul><p><strong>Here&#8217;s what they found:</strong></p>
      <p>
          <a href="https://www.ai-street.co/p/ai-stock-picks-beat-benchmark-in">
              Read more
          </a>
      </p>
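<p>The ranking-and-weighting step described above is simple enough to sketch. The snippet below is a minimal illustration, not the authors' code: the tickers, scores, and market caps are hypothetical, and the study applied this each night to the full Russell 1000.</p>

```python
# Sketch: pick the top-N names by model score, then value-weight them.
# All data below is hypothetical; scores mimic the paper's -5 to +5 scale.

def build_portfolio(scores, market_caps, top_n=20):
    """Return value weights for the top_n tickers by model score."""
    ranked = sorted(scores, key=scores.get, reverse=True)  # best score first
    picks = ranked[:top_n]
    total_cap = sum(market_caps[t] for t in picks)
    # Each pick's weight is its share of the selected names' total market cap
    return {t: market_caps[t] / total_cap for t in picks}

scores = {"AAA": 4.0, "BBB": 3.5, "CCC": -2.0, "DDD": 1.0}       # model scores
caps = {"AAA": 300.0, "BBB": 100.0, "CCC": 500.0, "DDD": 100.0}  # made-up caps

weights = build_portfolio(scores, caps, top_n=2)
print(weights)  # {'AAA': 0.75, 'BBB': 0.25}
```

<p>Note the cap-weighting: a high-scoring mega-cap dominates the book, which makes the strategy's factor exposures easier to compare against a cap-weighted benchmark.</p>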
]]></content:encoded></item><item><title><![CDATA[AI Boosts Retail Trading Volume: Research]]></title><description><![CDATA[Evidence from Italy&#8217;s ChatGPT ban shows AI expands the set of assets retail investors trade.]]></description><link>https://www.ai-street.co/p/ai-boosts-retail-trading-volume-research</link><guid isPermaLink="false">https://www.ai-street.co/p/ai-boosts-retail-trading-volume-research</guid><dc:creator><![CDATA[Matt Robinson]]></dc:creator><pubDate>Thu, 12 Feb 2026 12:02:23 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/9f75fbf1-7f11-4f4e-9aa5-69c4d6ba896c_1794x1686.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>&#8220;The price of intelligence is going to zero.&#8221;</p><p>I&#8217;ve thought of this quote often since interviewing <a href="https://www.youtube.com/watch?v=CGACi1Elh4c&amp;t=5s">Tharsis Souza</a>, now at Citadel, in a podcast episode in January last year. </p><p>Souza presciently highlighted the diminishing costs of expert analysis. (Check out the recording <a href="https://www.youtube.com/watch?v=CGACi1Elh4c&amp;t=5s">here</a>.) </p><p>I&#8217;ve seen more and more examples of this in my own life, like trying to sort out a hairy Italian-U.S. tax issue with the help of ChatGPT. </p><p>More empirically, studies are coming out showing how much investors are using AI to broaden their investing universe. </p><h2><strong>When Italy Banned ChatGPT, Retail Traders Narrowed Their Bets</strong></h2><p>In March 2023, Italian regulators forced OpenAI to suspend service over data privacy concerns. 
</p><p>This 28-day blackout created a natural experiment, revealing how the sudden loss of a leading AI tool&#8212;available to neighboring countries but denied to Italians&#8212;altered trading behavior.</p><h3>The Study</h3><p>A research team led by Omri Even-Tov (UC Berkeley) analyzed granular, account-level data from the brokerage platform <strong>eToro</strong>.</p><ul><li><p><strong>The Scope:</strong> 3 million accounts across 100+ countries, covering stocks, crypto, ETFs, and commodities. The paper&#8217;s final analysis sample consists of 24,185 investors and 169,295 investor-month observations from Italy and neighboring control countries between January and July 2023.</p></li><li><p><strong>The Method:</strong> They compared <strong>Italian investors</strong> (the treatment group) against peers in <strong>France, Switzerland, Austria, and Slovenia</strong> (the control group).</p></li></ul><h3>The Key Finding: Narrowed Horizons</h3><p>The researchers used the <strong>Herfindahl-Hirschman Index (HHI)</strong> to measure trade concentration. While the average retail investor typically sticks to 2&#8211;3 assets, the ban caused Italian portfolios to shrink even further.</p><ul><li><p><strong>The Spike:</strong> During the ban, Italian trade concentration <strong>rose by ~3.1%</strong> relative to the control group.</p></li><li><p><strong>The Proof:</strong> The data showed a &#8220;dynamic&#8221; shift&#8212;the concentration spike appeared <em><strong>only</strong></em> during the month of the ban.</p></li><li><p><strong>The Rebound:</strong> Once the ban was lifted, Italian investors did not immediately return to their pre-ban behavior, even showing a brief &#8220;overcorrection&#8221; as they explored new assets to make up for lost time.</p></li></ul><p>The ban also reduced the likelihood that investors would initiate trades in assets they hadn&#8217;t previously touched. The odds of adding a new asset declined by roughly 10.3% during the ban. 
And trading shifted toward popular assets (the top 50, top 100, top 200 by aggregate retail volume) while pulling away from the long tail.</p><h3><strong>Not Just Behavior &#8212; Prices Moved Too</strong></h3><p>Even-Tov&#8217;s study captures what happened at the account level. But a separate team had already used the same Italy ban to measure something different: what happened to stock prices.</p><p>That team finds that Italian firms with higher exposure to generative AI underperformed by roughly 9 percent during the ban. Bid-ask spreads widened, and analysts based in Italy issued fewer forecasts.</p><p>That&#8217;s two studies exploiting the same 28-day shock, measuring different outcomes, and arriving at the same directional conclusion: removing access to GenAI compressed both participation and market quality.</p><h3><strong>Where the Effect Was Strongest</strong></h3><p>Researchers sliced the data by asset class, investor characteristics, and firm characteristics.</p><p>By asset class, the results were driven by stocks and cryptocurrencies. ETFs, commodities, and currencies showed no significant effect. That&#8217;s consistent with the information-processing story: stocks and crypto involve large volumes of unstructured or hard-to-verify information. ETFs and commodities track standardized benchmarks. There&#8217;s less for an AI tool to help with.</p><p>By investor type, the effect was strongest among investors with lower income or less trading experience (consistent with those investors facing higher information-processing costs) and among investors in technology-related occupations or students (consistent with a higher likelihood of actually using ChatGPT in the first place).</p><p>By firm characteristics, the decline in trading activity was steeper for hard-to-value stocks. 
Firms with lower profitability or higher volatility saw the largest drop in Italian investor participation during the ban. The effect also intensified around earnings announcements, when investors face concentrated releases of financial data.</p><h3><strong>The Portfolio Consequences</strong></h3><p>Higher trade concentration has downstream effects. During the ban, Italian investors&#8217; portfolios exhibited more return comovement across holdings and higher portfolio volatility. Fewer distinct positions, more overlap in the stocks they held, less diversification.</p><p>One thing to note: the ban had no detectable effect on risk-adjusted trading performance. Investors weren&#8217;t making worse picks during the ban. They were just making fewer of them, and those picks looked more alike. GenAI appears to broaden the set of assets retail investors consider, but there&#8217;s no evidence (at least in this 28-day window) that it helps them choose better ones.</p><div class="pullquote"><p> Investors weren&#8217;t making worse picks during the ban. They were just making fewer of them, and those picks looked more alike.</p></div><h3><strong>A Pattern, Not a One-Off</strong></h3>
      <p>
          <a href="https://www.ai-street.co/p/ai-boosts-retail-trading-volume-research">
              Read more
          </a>
      </p>
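<p>A technical aside on the concentration measure used above: the Herfindahl-Hirschman Index is the sum of squared shares, so trading dominated by one or two assets scores near 1, while trading spread evenly over N assets scores 1/N. A minimal sketch, with made-up trading volumes (the study computed this per investor-month):</p>

```python
# Herfindahl-Hirschman Index (HHI) of trade concentration: sum of squared
# per-asset shares. The volumes below are hypothetical.

def hhi(volumes):
    """HHI of per-asset trading volumes (any positive units)."""
    total = sum(volumes)
    return sum((v / total) ** 2 for v in volumes)

diversified = [10, 10, 10, 10, 10]  # five assets traded equally
concentrated = [40, 10]             # one dominant asset

print(hhi(diversified))   # 5 * 0.2**2  -> 0.2
print(hhi(concentrated))  # 0.8**2 + 0.2**2 -> 0.68
```

<p>The ~3.1% rise in this index that the paper reports for Italian investors during the ban corresponds to trading concentrating into slightly fewer, larger positions.</p>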
   ]]></content:encoded></item></channel></rss>