
Goldman CDO: AI Can Clean Its Own Data

Hey, it’s Matt. This week on AI Street:

🧑‍💼 The Role of Data in AI: Goldman CDO

📄 BlackRock, Snowflake Back Data Standards

🎙️ Q&A: The AI Hedge Fund That Updates Itself

📏 California Passes First AI Safety Law 

Forwarded this? Subscribe here. Join readers from McKinsey, Citadel, BlackRock & more.

SPONSOR

A top hedge fund ran 1,000+ web scrapers — and had engineers spending 120 hours a week maintaining them.

They switched to Kadoa.

In six weeks, they migrated 300+ scrapers. Kadoa’s self-healing technology cut maintenance to nearly zero, expanded coverage 5x, and reduced costs by 40% compared to in-house development. Data now flows directly into their warehouse and trading systems.

Kadoa is used by leading hedge funds and asset managers to monitor SEC filings, news sites, company websites, and more — without the engineering bottleneck.

AI Street readers can get a free trial of Kadoa’s full platform. Contact Tavis Lochhead at [email protected] or ask me to make an introduction.

BACKEND AI

The Role of Data in AI, According to Goldman’s Chief Data Officer

Chief Data Officer Neema Raphael said on a recent firm podcast:

AI can help organize data itself, creating a feedback loop in which AI agents assist with data cleansing, normalization, and linking, speeding up the process of making enterprise data AI-ready.

So absolutely in the same way where we're seeing software being created by these agents, there's also a feedback loop of data cleansing and normalization and wrangling.

—Neema Raphael
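To make that feedback loop concrete, here is a minimal, purely illustrative sketch (not Goldman’s pipeline; the records and helper names are invented) of what cleansing, normalization, and record-linking passes look like in plain Python:

```python
from difflib import SequenceMatcher

# Hypothetical raw vendor records with inconsistent names and formatting.
raw = [
    {"entity": "Goldman Sachs Group", "revenue": "46,254", "period": "FY2023"},
    {"entity": "GOLDMAN SACHS GRP.", "revenue": "46254", "period": "fy2023"},
    {"entity": "Morgan Stanley", "revenue": "54,143", "period": "FY2023"},
]

def normalize(rec):
    """One 'cleansing' pass: canonicalize casing, strip formatting."""
    return {
        "entity": rec["entity"].upper().rstrip("."),
        "revenue": float(rec["revenue"].replace(",", "")),
        "period": rec["period"].upper(),
    }

def link(records, threshold=0.8):
    """One 'linking' pass: drop records whose names are near-duplicates."""
    canon = []
    for rec in records:
        for c in canon:
            if SequenceMatcher(None, rec["entity"], c["entity"]).ratio() >= threshold:
                break  # close enough to an existing record; treat as the same entity
        else:
            canon.append(rec)
    return canon

cleaned = link([normalize(r) for r in raw])
print(len(cleaned))  # the two Goldman variants collapse into one record
```

Raphael’s point is that an agent can run passes like these at scale and feed the results back in, instead of analysts hand-writing every normalization rule.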

We've already exhausted public data sources, but vast amounts of valuable enterprise data remain untapped behind corporate firewalls—this proprietary data will be key to creating differentiated business value.

A lot of trapped enterprise data still has not been harnessed.

—Neema Raphael

Data quality directly determines AI output quality. Organizations must invest in cleaning, normalizing, and creating clear semantic links between their data to unlock value beyond consumer-level AI capabilities.

Whatever data patterns you are feeding this machine is what it's going to learn and what it's going to extrapolate from.

—Neema Raphael

Takeaway:

The supply of public AI training data is drying up, but enterprise data remains largely untapped. Big firms hold an advantage with large proprietary datasets, but only if that data is cleaned, structured, and usable first. (See the item below on BlackRock and Snowflake backing a data standard to get at this standardization problem.)

FRONTEND AI

JPM Uses AI for Pitch Decks, Tests Agents

  • About half of the 250,000 bank employees with access to JPMorgan’s LLM Suite use it daily.

  • AI created an investment banking deck in 30 seconds, replacing hours of analyst work, according to a demo seen by CNBC.

  • The bank has begun deploying agentic AI to handle complex, multistep tasks for employees, beyond simple drafting or summarization.

  • Operations staff in the bank’s consumer banking division will shrink by at least 10% over the next five years due to AI use.

Chief Analytics Officer Derek Waldron told CNBC he sees a future in which AI is woven into the fabric of the company:

Every employee will have their own personalized AI assistant; every process is powered by AI agents, and every client experience has an AI concierge.

—Derek Waldron

Takeaway:

It’s still very early days for AI adoption, but JPMorgan, with its $18 billion tech budget, is pointing to what the future may hold: every employee with a personal AI assistant.

DATA

BlackRock, Snowflake, and Salesforce Back Open Data Standards

BlackRock, Snowflake, and Salesforce are backing an initiative to standardize data metrics across companies, tackling one of AI's biggest bottlenecks: conflicting data definitions.

The problem: when an AI agent tries to answer “What’s our Q3 revenue growth?”, it needs to know which definition of revenue to use. If the Finance department defines it one way and the Sales team defines it another, the AI either gives conflicting answers or picks the “wrong” one.

As we talked about last week, AI previously couldn’t “understand” analogies like “Paris → France, Berlin → ______.” Models didn’t know those were related concepts, or that both were city–country pairs; they just saw strings of text.

This wasn't super critical before. When data lived in siloed reports, conflicting definitions stayed contained. But now that AI can actually use these semantic relationships, inconsistent definitions get amplified across every query and analysis. Without standard terms, it’s hard to scale AI.
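The idea behind a shared semantic layer can be sketched in a few lines. This is purely illustrative (the metric names and resolver are invented, not any vendor’s API): without an agreed-upon standard, a query for a metric with two competing definitions has no safe answer.

```python
# Hypothetical: two teams define "Q3 revenue growth" differently,
# and a shared semantic layer pins one canonical definition.
definitions = {
    "finance": {"q3_revenue_growth": "GAAP revenue, year over year"},
    "sales":   {"q3_revenue_growth": "bookings, quarter over quarter"},
}

def resolve(metric, semantic_layer=None):
    """Return the definition an AI agent should use for a metric."""
    if semantic_layer and metric in semantic_layer:
        return semantic_layer[metric]  # one agreed-upon standard wins
    candidates = {d[metric] for d in definitions.values() if metric in d}
    if len(candidates) > 1:
        # No standard and the teams disagree: refusing beats guessing.
        raise ValueError(f"ambiguous metric: {metric!r} has {len(candidates)} definitions")
    return candidates.pop()

standard = {"q3_revenue_growth": "GAAP revenue, year over year"}
print(resolve("q3_revenue_growth", semantic_layer=standard))
```

The standards initiative is essentially an attempt to agree on that `standard` dictionary across companies, so every agent resolves a metric name the same way.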

Getting all of these companies to agree on a shared nomenclature won’t be easy or fast, but with backers this large, they may be able to get everyone to the table.

Takeaway:

AI is only as good as the data it interprets. Different definitions of “revenue” aren’t an AI model problem; they’re a data problem.

Related:

I interviewed Snowflake's Jonathan Regenstein last fall on how bad data holds back AI adoption.

ICYMI INTERVIEW

Meet the AI Hedge Fund That Updates Itself

Aric Whitewood runs a systematic hedge fund that “evolves” with markets.

XAI Asset Management's self-updating AI framework strengthens useful signals, fades out noise, and adapts with limited human oversight.

Unlike many systematic shops, XAI is built as a closed-loop system, designed to learn continuously while keeping human intervention minimal. Whitewood calls it a “causal reasoning engine,” drawing on his background in radar systems, signal processing, and machine learning at Credit Suisse.

The emerging manager's recent results:

  • 2022: +38.6%

  • 2023: +0.2%

  • 2024: +16.3%

  • 2025 YTD: +20.3%

In our Q&A, he explains:   

  • How XAI’s AI system “evolves” with markets through a closed-loop, Bayesian approach   

  • Why LLMs are just one tool among many—not the center of intelligence

  • The risks of chasing scale over efficiency in AI research   

  • How the same "causal reasoning engine" could be applied beyond finance

Matt: Tell me about your fund.

Aric: The vision of the firm is to create a kind of multi-strat, but with AI creating all the pods. I know other people have claimed, ‘Oh, we have LLM traders, they do everything for you,’ but I’m not convinced by that. What we have is a real track record—actual trading of real assets, with double-digit returns over multiple years.

We’ve done it for macro assets, to some extent for stocks, and we’re now looking at options and other asset classes. The idea is to have pods, but all powered by a very consistent underlying framework—what I call a causal reasoning platform.

This platform pulls together elements of signal processing from my aerospace days, combined with AI and ML. It handles regime shifts and uncertainty. In fact, it embraces uncertainty. That was one of my early realizations: many quants see uncertainty as a nuisance. They widen their distributions, or they avoid it altogether because it doesn’t fit neatly into an equation.

But in signal processing, uncertainty is everywhere. In radar systems, you’re trying to detect targets with imperfect data, constant noise, and competing signals. Sometimes even your own radar system interferes with what you’re trying to see. In finance, the signal-to-noise ratio is just as bad; worse, it changes over time. That’s the challenge, but also the opportunity.

Our system makes uncertainty a feature, not a bug. It’s fundamentally Bayesian in nature. When you fly drones, you often use Markov decision processes to control them. The environment is uncertain, you never fully know what’s going on, but as you observe more data, you refine your understanding. That’s exactly what we’re doing in financial markets: continuously observing, updating, and adapting as prices come in and regimes shift.
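Whitewood doesn’t disclose the machinery, but the “observe, update, adapt” idea he describes is textbook Bayesian updating. Here is a generic beta-binomial sketch (not XAI’s actual system; the signal and weights are invented) of how a belief about a signal’s hit rate strengthens or fades as trades resolve:

```python
# Not XAI's system: a textbook beta-binomial update illustrating how
# signals that keep working get strengthened and those that don't fade.
# alpha/beta act as pseudo-counts of observed wins and losses.
def update(alpha, beta, hit):
    """Bayesian update of a signal's hit-rate belief after one trade."""
    return (alpha + 1, beta) if hit else (alpha, beta + 1)

def mean(alpha, beta):
    """Posterior mean hit rate; could drive the signal's weight in a book."""
    return alpha / (alpha + beta)

# Start agnostic (uniform prior), then observe 8 hits and 2 misses.
a, b = 1, 1
for hit in [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]:
    a, b = update(a, b, hit)

print(round(mean(a, b), 3))  # posterior mean hit rate: 0.75
```

A regime shift would show up as a run of misses dragging the posterior back down, which is the sense in which such a system “evolves” with markets rather than being refit by hand.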

SPONSORSHIPS

Reach Wall Street’s AI Decision-Makers

Advertise on AI Street to reach a highly engaged audience of decision-makers at firms including JPMorgan, Citadel, BlackRock, Skadden, McKinsey, and more. Sponsorships are reserved for companies in AI, markets, and finance. Email me ([email protected]) for more details.

REGULATION

California Passes Law for AI Safety Measures

California Governor Gavin Newsom signed the first state law aimed at frontier AI systems. The law:

  • Requires large labs such as OpenAI, Anthropic, Meta, and DeepMind to report serious incidents, including cybersecurity failures, misuse, and safety risks, within 15 days.

  • Extends whistleblower protections, with fines up to $1M per violation.

  • Authorizes state oversight to monitor compliance and investigate violations.

This makes California one of the first jurisdictions to mandate public disclosure of AI safety practices, and could set a baseline for national standards.

At the federal level, a bipartisan bill from Sens. Josh Hawley and Richard Blumenthal would require AI companies to submit risk data to the Energy Department before releasing advanced systems, blocking deployment until the agency signs off.

Takeaway:

California's law is among the first to regulate frontier AI, requiring companies to disclose risk management practices rather than restricting training. Meanwhile, financial regulators have issued few AI-specific rules for Wall Street. This will eventually change as AI adoption picks up, but that’s a long way off.

WHAT ELSE I’M READING
  • OpenAI Test Shows Agents Closing Gap With Humans (Ethan Mollick)

  • Anthropic Says New Model Can Code for 30 Hours Straight (BBG)

  • OpenAI, Stripe Enable Purchases Inside ChatGPT (Finextra)

  • AI Researchers Turn to Virtual Worlds to Advance Learning (WSJ)

  • HSBC Says Quantum Computing Trial Beat Wall Street Rivals (BBG)

  • Moby Raises $5M Seed Round to Deliver AI-Powered Investing (PR)

CALENDAR

Upcoming AI + Finance Conferences

  • Open Source in Finance Forum - Oct. 21-22 • New York

    Finance and tech leaders tackle how open source and AI can be governed, scaled, and applied in financial services.

  • AIFin Workshop at ECAI 2025 – October 26, 2025 • Bologna, Italy

    One-day academic workshop on AI/ML in finance, covering trading, risk, fraud, NLP, and regulation.

  • AI in Finance 2025 – October 27–30, 2025 • Montréal

    Academic event covering ML in empirical asset pricing and risk.

  • ACM ICAIF 2025 – November 15–18, 2025 • Singapore

    Top-tier academic/industry conference on AI in finance and trading.

  • AI for Finance – November 24–26, 2025 • Paris

    Artefact’s AI for Finance summit, focused on generative AI, the future of finance, digital sovereignty, and regulation.

  • NeurIPS Workshop: Generative AI in Finance – Dec. 6–7 • San Diego

    One-day academic workshop at NeurIPS focused on generative AI applications in finance, organized by ML researchers.

How did you like today's newsletter?
