DATA
AI Trails Humans on Wall Street

AI may eventually surpass the average Wall Street analyst, but for now, general-purpose models like ChatGPT perform investing tasks poorly.
A third-party benchmark shows how far they still have to go.
Vals AI, a San Francisco startup that evaluates AI models across industries ranging from healthcare to finance, found that the best models today score around 55 percent accuracy on a test designed to mimic the work of an entry-level financial analyst. Most don’t even reach 50 percent.
The benchmark is designed to measure how well AI agents handle real-world finance tasks. Vals AI collaborated with investment bankers, hedge fund analysts, and professionals from a major global bank to help create the 537 test questions.
The questions cover nine types of financial tasks, from simple information lookup to complex financial modeling. Each question was reviewed for accuracy.
The researchers gave the AI models helpful tools to answer these questions, including access to Google Search and the EDGAR database (where companies file their official financial documents).
Some questions were easy:
What was the quarterly revenue of Salesforce (NYSE:CRM) for the quarter ended December 31, 2024?
Others were more challenging:
What is Lemonade Insurance’s Adjusted EBITDA for the year ended December 31, 2024?
And still others required multiple steps:
Which Geographic Region has Airbnb (NASDAQ: ABNB) experienced the most revenue growth from 2022 to 2024?
Models with extended reasoning performed better than their standard counterparts. Claude Sonnet 4.5 (Thinking) achieved 55.3% accuracy, outpacing the non-thinking version at 49.3%. Claude Opus 4.1 (Thinking) followed the same pattern, scoring 50.9% versus 46.1% for the non-thinking model.
Extended reasoning comes at a cost in processing time. Claude Sonnet 4.5 (Thinking) required 166.9 seconds to achieve its top 55.3% accuracy rate, while the faster Sonnet 4.5 variant completed tasks in 122.7 seconds but scored only 49.3%. GPT-5, the slowest model at 504.1 seconds, managed just 46.9% accuracy, while o3 delivered 48.3% accuracy in 180.1 seconds.
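To make the trade-off concrete, here is a minimal Python sketch that uses only the accuracy and timing figures quoted above. The `results` table and the `tradeoff` helper are illustrative names, not part of the Vals AI benchmark, and the seconds-per-point figure is a rough back-of-the-envelope measure rather than anything the benchmark reports.

```python
# Figures quoted in the article: (accuracy in percent, seconds per task).
results = {
    "Claude Sonnet 4.5 (Thinking)": (55.3, 166.9),
    "Claude Sonnet 4.5": (49.3, 122.7),
    "GPT-5": (46.9, 504.1),
    "o3": (48.3, 180.1),
}


def tradeoff(thinking: str, standard: str) -> tuple[float, float, float]:
    """Return (accuracy gain in points, extra seconds, extra seconds per point)."""
    acc_t, sec_t = results[thinking]
    acc_s, sec_s = results[standard]
    gain = acc_t - acc_s
    extra = sec_t - sec_s
    return round(gain, 1), round(extra, 1), round(extra / gain, 1)


# Compare the thinking and non-thinking versions of Sonnet 4.5.
gain_pts, extra_seconds, seconds_per_point = tradeoff(
    "Claude Sonnet 4.5 (Thinking)", "Claude Sonnet 4.5"
)

# Rank the four models by accuracy, best first.
ranked = sorted(results, key=lambda name: results[name][0], reverse=True)

print(f"Gain: {gain_pts} points for {extra_seconds} extra seconds")
print(f"About {seconds_per_point} extra seconds per accuracy point")
print("Ranking by accuracy:", ranked)
```

On these numbers, thinking buys Sonnet 4.5 six points of accuracy for about 44 extra seconds per task. GPT-5 shows that the relationship does not run the other way: it is the slowest of the four and also the least accurate.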
Top-performing Claude models command higher per-test costs ranging from $1.28 to $4.40, correlating with improved accuracy on financial analyst tasks. Budget alternatives like o3 cost $0.74 per test with competitive speed and reasonable accuracy, though users sacrifice performance compared to market leaders.
The data suggests that extended thinking (the ability to work through problems methodically rather than generating answers immediately) genuinely improves financial analysis performance.
That improvement comes at a price: extended thinking models cost more and take significantly longer to run.
Compared to humans, though, AI is a lot cheaper and a lot faster. The benchmark said humans took about 17 minutes to answer a question for a cost of $25.66 per query.
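A quick sketch of that arithmetic, using the human figures above alongside the o3 and Claude numbers quoted earlier in this piece. The variable names are illustrative, and "about 17 minutes" is treated as exactly 1,020 seconds, so the ratios are approximate.

```python
# Human baseline from the benchmark: about 17 minutes and $25.66 per question.
HUMAN_SECONDS = 17 * 60  # "about 17 minutes", treated here as exactly 1,020 seconds
HUMAN_COST = 25.66

# Model figures quoted earlier in the article.
O3_SECONDS = 180.1
O3_COST = 0.74
CLAUDE_COST_RANGE = (1.28, 4.40)  # low and high per-test cost for the top Claude models

# How many times faster and cheaper o3 is than a human, per question.
speedup_o3 = round(HUMAN_SECONDS / O3_SECONDS, 1)
cost_ratio_o3 = round(HUMAN_COST / O3_COST, 1)

# Human cost divided by the low and high ends of the Claude cost range.
claude_cost_ratios = tuple(round(HUMAN_COST / cost, 1) for cost in CLAUDE_COST_RANGE)

print(f"o3: {speedup_o3}x faster, {cost_ratio_o3}x cheaper than a human")
print(f"Claude models: {claude_cost_ratios[1]}x to {claude_cost_ratios[0]}x cheaper")
```

On these figures, o3 is roughly 35 times cheaper and nearly 6 times faster than a human, and even the priciest Claude run costs about a sixth of the human rate. The comparison covers cost and speed only; it says nothing about how often each answer is right.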
But no one is going to rely on a model that is right only about half the time.
Interestingly, once a model reaches about a dollar per task, each incremental gain in accuracy comes at exponentially higher cost. Beyond a certain point, more data and parameters yield diminishing analytical value.
This suggests that models won’t clear these accuracy hurdles through more computing power alone.
Closing the sizable accuracy gap will require models that reason better, not just grow larger. Systems that can plan, verify, and cross-check their work will likely outperform the next trillion-parameter model.
Takeaway
AI struggles with multi-step, complex tasks, but performs far more reliably on narrow, single-step use cases. So prompts should focus the model on one clear action at a time, not broad requests like “What is your analysis of X?”


