AI Street

Research

AI Finds What Markets Miss in News: Study

Bryan Kelly, at AQR and Yale, says AI can strip out the boilerplate in financial news and find signals investors are slow to price.

Matt Robinson
Jun 24, 2026

Hey, I’m Matt. I’m a former Bloomberg News reporter, and you’re reading AI Street, where I report on how Wall Street uses AI.


AI doesn’t “read” text. Models turn words into numbers, perform calculations on those numbers, and then turn them back into words. So if AI sees the world at all, it sees it a little like Neo in The Matrix.

A key breakthrough behind LLMs came from a 2017 Google paper on machine translation. The paper introduced the transformer architecture, which helped models capture more nuance in text by letting every word weigh its relationship to every other word in a sentence or document.

Prior to transformers, machines struggled to incorporate context. A bank can lend money. It can distribute blood. Or it can be the place you sit by a river. Transformers pick up on this nuance because they do not treat each word’s meaning as fixed but rely on context.
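To make the "bank" example concrete, here is a minimal sketch of the core idea, assuming made-up two-number word vectors and a stripped-down version of dot-product attention. Real transformers add learned query, key and value projections, scaling and many layers, none of which appear here. The names STATIC and contextualize are hypothetical.

```python
import math

# Toy 2-d static vectors (made-up numbers, not from any trained model).
# Axis 0 is roughly "finance-ness", axis 1 is roughly "nature-ness".
STATIC = {
    "bank":  [1.0, 1.0],   # ambiguous on its own
    "loan":  [2.0, 0.0],
    "river": [0.0, 2.0],
}

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def contextualize(words, target):
    """Return an attention-weighted mix of the sentence's vectors for `target`."""
    query = STATIC[target]
    vectors = [STATIC[w] for w in words]
    # Score every word in the sentence by its dot product with the target word.
    scores = [sum(a * b for a, b in zip(query, v)) for v in vectors]
    weights = softmax(scores)
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(len(query))]

bank_finance = contextualize(["bank", "loan"], "bank")
bank_river = contextualize(["bank", "river"], "bank")
print(bank_finance, bank_river)
```

The same word ends up with a different vector in each sentence: it leans toward the finance axis next to "loan" and toward the nature axis next to "river."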

One important kind of numerical representation AI models use is called an embedding. An embedding is a little like GPS coordinates for text: a location in mathematical space. Those coordinates can be compared, searched, clustered and fed into statistical models. One classic illustration: take the embedding for “king,” subtract “man,” add “woman,” and you land near “queen.”
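The king and queen arithmetic can be sketched in a few lines. This is a toy under loud assumptions: the vectors are hand-picked two-number stand-ins (one axis for royalty, one for gender), not output from any real model, and the helper names cosine and analogy are hypothetical.

```python
import math

# Hand-picked 2-d vectors for illustration only: [royalty, gender].
VECTORS = {
    "king":   [0.9, 0.8],
    "queen":  [0.9, -0.8],
    "man":    [0.1, 0.8],
    "woman":  [0.1, -0.8],
    "prince": [0.8, 0.7],
    "apple":  [-0.5, 0.1],
}

def cosine(a, b):
    """Cosine similarity: 1.0 means the two vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def analogy(a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = [x - y + z for x, y, z in zip(VECTORS[a], VECTORS[b], VECTORS[c])]
    candidates = (w for w in VECTORS if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(target, VECTORS[w]))

nearest = analogy("king", "man", "woman")
print(nearest)
```

Comparing, searching and clustering embeddings all rest on the same move: measure how close two sets of coordinates are.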

Creating Embeddings from News

Embeddings are not limited to text. You can create them out of almost any messy data. And once you have them, you can ask more complex questions beyond king minus man, including questions where you don’t already know what relationships you’re looking for. In other words, embeddings can surface relationships that are hard to see with ordinary categories, keyword searches or traditional metrics.

In one paper, researchers uncovered relationships missed by traditional financial metrics after creating embeddings tied to portfolio data. I wrote about that for the Chicago Booth Review here as well as a newsletter version here:


AI Finds Hidden Links Driving Stock Moves (Matt Robinson, September 18, 2025)

A new paper takes that same framework — embeddings of text — and points it at financial news.

The researchers, including AQR’s Bryan Kelly, turn news articles into embeddings and test whether those representations contain information the market has not fully priced. Kelly is also a professor at Yale, and the paper is academic research, not an AQR report. His co-authors are Antoine Didisheim and Hanqing Tian of the University of Melbourne, and Yale’s Mo Pourmohammadi. I reached out to the authors for comment but did not hear back by publication time.

“News” isn’t always new. Tech stories typically have broader context about hardware and software. Younger companies draw more coverage about growth. Heavily levered companies draw more detail about debt covenants and credit ratings. In other words, a certain portion of news stories is boilerplate, according to the authors in a recent VoxEU column summarizing their working paper. As they write:

“True news is the part that could not have been written before the article appeared: the unexpected guidance cut, the precise earnings surprise, the segment whose growth slowed. The same logic applies to Boeing aircraft-order stories, biotech clinical-trial reports, cybersecurity disclosures, and regulatory announcements. Each article has a predictable layer that follows from the firm’s profile, and a residual layer that contains the genuine surprise.”

So, given that there’s a predictable part of news stories that doesn’t tell you much, the researchers cut the boilerplate out to isolate what they call “pure news.”

Arman Khaledian, PhD, a former quant at Millennium and now CEO of Zanista AI, put it this way in an email: “It’s like a factor model, but instead of prices you’re running it on vectorised news, stripping out the predictable part to see what’s actually moving things.”

Here’s what they did:

  • The authors collected 6.7 million Reuters articles tied to single U.S. stocks from January 1996 to December 2022.

  • They turned each article into a 4,096-number embedding using E5-Mistral-7B.

  • They averaged those article embeddings into one monthly news signal for each stock.

  • Then they asked how much of that news signal could have been predicted from the stock itself: its size, value, profitability, leverage, industry and other characteristics.

  • They subtracted that predictable part.
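The last three steps above can be sketched in plain Python. This is a rough illustration, not the authors' code: the data are invented, the embeddings have 2 numbers instead of 4,096, a single made-up characteristic (leverage) stands in for the full set, and the prediction step is assumed here to be an ordinary least-squares regression, which the summary above does not specify. The names monthly_signal and residualize are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: (ticker, month, article embedding).
articles = [
    ("AAA", "2020-01", [1.0, 0.0]),
    ("AAA", "2020-01", [3.0, 2.0]),
    ("BBB", "2020-01", [4.0, 1.0]),
    ("CCC", "2020-01", [7.0, 0.0]),
]
# One made-up characteristic per stock (stand-in for size, value, leverage, ...).
leverage = {"AAA": 1.0, "BBB": 2.0, "CCC": 3.0}

def monthly_signal(rows):
    """Average article embeddings into one vector per (ticker, month)."""
    grouped = defaultdict(list)
    for ticker, month, emb in rows:
        grouped[(ticker, month)].append(emb)
    return {key: [mean(col) for col in zip(*embs)] for key, embs in grouped.items()}

def residualize(signals, characteristic):
    """Regress each embedding dimension on the characteristic (OLS with an
    intercept) and keep only what the regression cannot explain."""
    keys = list(signals)
    x = [characteristic[ticker] for ticker, _ in keys]
    x_bar = mean(x)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    dims = len(next(iter(signals.values())))
    residuals = {key: [] for key in keys}
    for d in range(dims):
        y = [signals[key][d] for key in keys]
        y_bar = mean(y)
        slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
        intercept = y_bar - slope * x_bar
        for key, xi, yi in zip(keys, x, y):
            residuals[key].append(yi - (intercept + slope * xi))
    return residuals

signals = monthly_signal(articles)
pure_news = residualize(signals, leverage)
print(pure_news)
```

By construction, the leftover vectors are uncorrelated with the characteristic used in the regression, which is the sense in which the predictable layer has been removed.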

Here’s what they found:

  • A small slice (~8%) of what’s packed into an article’s embedding can be guessed just from knowing the company’s profile — its sector, size, leverage, valuation.

  • Once you cut out that boilerplate, the “pure news” drives price moves.
