AI Street


Tracking Shifts in Earnings Call Narratives

LLMs are better at tracking which metrics leadership spotlights—and which ones they’ve quietly stopped talking about.

Matt Robinson
Mar 24, 2026

Regulatory rules dictate how companies report performance, but there are virtually no rules governing what management chooses to talk about on an earnings call.

The C-suite can choose what numbers to highlight — revenue per user, lifetime value, total addressable market, etc.

These selected metrics support the corporate narrative, but they’re not static. When a number is strong, it gets airtime. When it softens, it tends to quietly disappear from the script, replaced by whatever metric tells a better story that quarter.

This kind of metric reshuffling is common enough to have a name: “moving targets,” a measure that tracks the fraction of previously highlighted metrics that go missing from the next comparable earnings call. If a company spotlighted ten metrics last quarter and four are absent this quarter, its score is 0.4. Research has shown that firms with high metric turnover tend to underperform in subsequent months: the more a company reshuffles the numbers it talks about, the worse its stock tends to do.

The challenge is detection. Most approaches rely on keyword matching across transcripts, comparing terms quarter over quarter. But you need to be able to link “revenue growth” to “top-line expansion,” and keyword search can’t tell “North America cloud revenue” apart from plain “revenue.”

At scale, this becomes hard to do. Following the metrics that appear and disappear across thousands of earnings calls isn’t feasible to do consistently by hand. That’s where LLMs fit: extracting and standardizing how companies describe performance over time is tedious work for humans and easy to scale with AI.

A group of researchers at MIT, BlackRock, and J.P. Morgan asked whether LLMs could close this detection gap.

Here’s what they did:

  • Instead of scanning transcripts for predefined terms, they use an LLM to extract full phrases with context. Where keyword methods pull “revenue,” the model pulls “North America cloud revenue.” Where they grab “dividends,” the model also captures “cash flow,” “share repurchases,” and “cash flow from operations.” (A sketch of this extraction step follows the list.)

  • They then compare metrics across quarters using semantic similarity rather than exact matches. Instead of forcing a binary match, they allow for an “ambiguous” range where the similarity score is scaled. (See the matching sketch below.)

  • They apply this across firms listed in the S&P 100 index from January 2010 to December 2024, yielding 5,615 firm-quarter observations across 64 quarters.

  • To test it, they sort stocks by how much their metrics shift and compare subsequent returns, then run cross-sectional regressions with standard controls for size, valuation, and prior returns. (A toy version of the sort is sketched below.)
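
The paper doesn’t publish its prompt, but the extraction step is straightforward to picture. Here’s a minimal sketch using the OpenAI Python SDK; the prompt wording, the gpt-4o-mini model choice, and the extract_metrics helper are illustrative assumptions, not the authors’ actual setup.

```python
# Minimal sketch of the extraction step: pull full metric phrases, with their
# qualifiers, out of one earnings call transcript. Prompt wording and model
# choice are illustrative assumptions, not the paper's exact setup.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

EXTRACTION_PROMPT = """You are reading a quarterly earnings call transcript.
List every performance metric that management highlights, keeping the full
qualifying phrase (e.g. "North America cloud revenue", not just "revenue").
Return one metric phrase per line."""

def extract_metrics(transcript_text: str) -> list[str]:
    """Return the metric phrases highlighted in a single transcript."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model choice
        messages=[
            {"role": "system", "content": EXTRACTION_PROMPT},
            {"role": "user", "content": transcript_text},
        ],
        temperature=0,
    )
    lines = response.choices[0].message.content.splitlines()
    return [line.strip("-• ").strip() for line in lines if line.strip()]
```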
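
The quarter-over-quarter comparison can be sketched in the same spirit. This toy version scores each of last quarter’s metrics against the current list using sentence embeddings (via sentence-transformers, an assumption; the paper may use a different model), and treats similarities between two illustrative thresholds as the “ambiguous” range, scaling the match instead of forcing a yes/no.

```python
# Sketch of quarter-over-quarter metric matching via embedding similarity.
# The embedding model and the 0.6 / 0.85 thresholds are illustrative assumptions.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def match_score(prev_metric: str, current_metrics: list[str],
                low: float = 0.6, high: float = 0.85) -> float:
    """Score how well a previously highlighted metric is still covered this quarter.

    1.0  -> clearly still discussed (best similarity >= high)
    0.0  -> clearly dropped         (best similarity <  low)
    in between -> the "ambiguous" band, scaled linearly between the thresholds.
    """
    if not current_metrics:
        return 0.0
    prev_emb = model.encode(prev_metric, convert_to_tensor=True)
    curr_emb = model.encode(current_metrics, convert_to_tensor=True)
    best = util.cos_sim(prev_emb, curr_emb).max().item()
    if best >= high:
        return 1.0
    if best < low:
        return 0.0
    return (best - low) / (high - low)

def moving_targets(prev_metrics: list[str], current_metrics: list[str]) -> float:
    """Fraction of last quarter's highlighted metrics that effectively went missing."""
    if not prev_metrics:
        return 0.0
    covered = sum(match_score(m, current_metrics) for m in prev_metrics)
    return 1.0 - covered / len(prev_metrics)
```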
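
The return test is, at its core, a sort. A toy quintile sort in pandas might look like the following; the column names and the five-bucket split are assumptions, and the study’s actual tests add cross-sectional regressions with the controls mentioned above.

```python
# Illustrative quintile sort on the moving-targets score (column names are assumptions).
import pandas as pd

def quintile_sort(panel: pd.DataFrame) -> pd.DataFrame:
    """panel: one row per firm-quarter with columns
    ['quarter', 'ticker', 'moving_targets', 'next_quarter_return']."""
    panel = panel.copy()
    # Rank firms within each quarter by how much their highlighted metrics shifted.
    panel["bucket"] = (panel.groupby("quarter")["moving_targets"]
                            .transform(lambda s: pd.qcut(s, 5, labels=False,
                                                         duplicates="drop")))
    # Average subsequent return per bucket: a gap between bucket 4 (high turnover)
    # and bucket 0 (low turnover) is the pattern the study looks for.
    return (panel.groupby("bucket")["next_quarter_return"].mean()
                 .rename("avg_next_quarter_return").to_frame())
```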

Here’s what they found:
