Hey, it’s Matt. This week on AI Street:

🎙 Man Group’s Ziang Fang on AlphaGPT

📃 Sorting SEC Data at Scale: Datamule Update

🗞 Latest AI & Wall Street News Rundown

Forwarded this? Subscribe here. Join readers from McKinsey, JPMorgan, BlackRock & more.

INTERVIEW

Inside Man Group’s AlphaGPT

Ziang Fang, senior portfolio manager at Man Numeric, on building AI for systematic investing

Man Group has built an internal AI system that generates trade ideas and subjects them to the same internal review as human research.

The system, AlphaGPT, proposes signals, writes the code, and runs backtests before any human sees the output. Only after that does it enter Man Numeric’s standard research and investment committee process.

The $214 billion hedge fund says the edge is speed and scale. AlphaGPT can produce viable research concepts in minutes rather than days, allowing researchers to evaluate far more investing ideas than would be feasible with a human-only process.

I spoke with Ziang Fang, senior portfolio manager at Man Numeric, about his recent article detailing AlphaGPT’s architecture, how Man controls for lookahead bias and data mining, and the limits of AI in systematic research.

This interview has been edited for clarity and length.

Q: What has been the reaction to the AlphaGPT article?

A: I think it’s been very well received. The article on Man Institute, “What AI Can (and Can’t Yet) Do for Alpha,” is one of the most read pieces we’ve published recently. At the same time, especially among our client base, larger organizations and allocators are thinking hard about how to adopt AI in their daily workflows.

A lot of people have been using AI as a chatbot for different things. But using it systematically, automating it, and applying it end-to-end is different. Many are interested in how we’ve built the process and how we bring it up to standard for delivering products and research outcomes.

Q: How did Man Group’s AI adoption evolve?

A: We did a really good job making AI accessible to everyone. Once ChatGPT became available, Man Group quickly rolled it out broadly. Once people had access, they started experimenting.

Suddenly, it felt new. You could bounce ideas off it or use it to prototype code. Before, you’d have to search online and dig through posts, which was slow. Now you put your idea in and get a prototype, even if it doesn’t work perfectly.

Last year, we started thinking about bringing everything together. If AI was already used across the research process, why not think about an end-to-end, integrated adoption?

Q: Why does the reasoning model matter for quantitative research?

A: Quant researchers want to show something that works, at least on paper. But you need a lot of vetting to understand the research process versus the final backtest. What matters is whether it works in live trading, not whether it looks good on paper.

The reasoning model gives us full transparency. At every step, when an agent makes a decision, it logs why it made that choice. That level of visibility is something you don’t always get from a human-driven process.

Q: What challenges did you encounter building this system?

A: Along the way we ran into a lot of issues—hallucination, lookahead bias, multiple testing, and many other things. One exciting part about AI is that as humans, we take a lot for granted in our daily work. Now we have to step back and reevaluate everything. You ask, why do we do things this way? It created another opportunity for internal debate about what the right approach actually is.

One interesting thing is that because the language model isn't part of the group, it doesn't develop the same blind spots. When you work alongside colleagues, you eventually start thinking alike. The model learns from us but doesn't sit next to us, so it can surface angles we might have missed.

Q: How does AI help with both volume and quality of ideas?

A: There’s been an explosion in data availability. No one can realistically go through thousands of alternative datasets, many of which are unstructured.

Previously, a researcher had to manually figure out how to handle all the alternative datasets, which took a long time and involved many repetitive steps. LLM-based agentic workflows provide an opportunity to automate those tasks and help systematic teams process information at much higher volume.

The full interview continues here.

DATA

Sorting Raw SEC Data for Financial AI

Datamule says its open-source tool beats LLM alternatives

The promise of LLMs in finance often hits a roadblock at the data ingestion layer: raw SEC filings.

While access to SEC filings is free, stitching millions of filings into clean, queryable datasets is expensive.

This is a problem John Friedman ran into at UCLA. One of his PhD classmates couldn’t pursue a research project because the dataset they needed cost $35,000. Friedman built an alternative for a fraction of that cost.

His company, Datamule, began as an open source project and has grown into production infrastructure used by several large companies and roughly a hundred startups.

I wrote earlier this year about Friedman’s effort to make aggregated SEC data cheaper and easier to use. A LinkedIn post about Datamule gained traction and led to free cloud credits. I recently followed up with Friedman over email.

LLMs can help turn messy filings into structured data, but it’s an expensive and slow process. Friedman released an open source deterministic parser that uses the SEC filings’ underlying tree structure instead of large language models. He says his algorithm is much faster. It can parse about 5,000 pages per second on a laptop versus one page every six seconds for an LLM.
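To make the idea of deterministic, tree-based parsing concrete, here is a minimal sketch in Python. It is not Datamule's parser: the `SectionParser` class, the `parse_sections` helper, and the sample document are all hypothetical, and the sketch assumes a filing whose sections are marked with HTML heading tags, which real filings often are not. It only shows how a document's markup can be walked once, with no model calls, to group text under headings.

```python
from html.parser import HTMLParser


class SectionParser(HTMLParser):
    """Group a document's text under its nearest preceding heading.

    Hypothetical illustration only; assumes sections use h1-h3 tags.
    """

    HEADINGS = {"h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.sections = {}          # heading text -> list of text chunks
        self._current = "preamble"  # bucket for text before the first heading
        self._in_heading = False
        self._heading_parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self._in_heading = True
            self._heading_parts = []

    def handle_endtag(self, tag):
        if tag in self.HEADINGS and self._in_heading:
            # The heading is complete: it becomes the active section.
            self._in_heading = False
            self._current = " ".join(self._heading_parts).strip()
            self.sections.setdefault(self._current, [])

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._in_heading:
            self._heading_parts.append(text)
        else:
            self.sections.setdefault(self._current, []).append(text)


def parse_sections(html):
    """Return a dict mapping each heading to the text that follows it."""
    parser = SectionParser()
    parser.feed(html)
    parser.close()
    return {name: " ".join(chunks) for name, chunks in parser.sections.items()}


# Made-up sample, not taken from a real filing.
SAMPLE = """
<html><body>
<h2>Item 1. Business</h2><p>We make widgets.</p>
<h2>Item 1A. Risk Factors</h2><p>Demand may fall.</p><p>Costs may rise.</p>
</body></html>
"""

sections = parse_sections(SAMPLE)
```

Because the same input always produces the same output and no tokens are consumed, this kind of approach is cheap to run at scale. Taking the quoted figures at face value, 5,000 pages per second against one page every six seconds works out to roughly a 30,000-fold difference in throughput.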

“Some large companies are using it to save on token cost,” Friedman said in an email.

With the cloud credits, Friedman launched a low-cost API for researchers and startups. Last month, it served 20 terabytes of data across 131 million requests. A total of 108 customers have used the service, often downloading the full corpus to vectorize it for AI-native search, which allows AI systems to search the data by meaning rather than keywords.
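For readers unfamiliar with searching by meaning, the toy Python sketch below may help. Everything in it is an assumption for illustration: the three-number vectors are invented by hand, whereas real embeddings come from an embedding model and have hundreds or thousands of dimensions, and the `cosine` and `search` helpers are hypothetical names, not part of any Datamule API.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hand-made, hypothetical embeddings keyed by the text they represent.
DOCS = {
    "CEO resigned effective immediately": [0.9, 0.1, 0.0],
    "Quarterly revenue rose 12%": [0.0, 0.2, 0.9],
    "Board appointed a new chief executive": [0.8, 0.3, 0.1],
}


def search(query_vec, docs, top_k=2):
    """Return the top_k document texts most similar to the query vector."""
    ranked = sorted(docs, key=lambda d: cosine(query_vec, docs[d]), reverse=True)
    return ranked[:top_k]


# Pretend this vector is the embedding of the query "leadership change".
results = search([0.85, 0.2, 0.05], DOCS)
```

The two results returned are the resignation and the appointment, even though neither contains the words "leadership" or "change". A keyword search for that phrase would have matched nothing.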

Demand has expanded beyond raw filings. Companies have asked for datasets that identify every person named across the SEC corpus, vector embeddings for all 10-K filings, all Item 5.02 disclosures from 8-Ks, explanations for late filings, and extracted audit reports. Friedman says many of these datasets can be generated within hours at relatively low cost.

After turning down investment and acquisition offers, Friedman plans to spend the next few months making Datamule more robust and meeting enterprise needs. Longer term, he plans to apply machine learning techniques to better analyze the underlying data.

Friedman asked that AWS and Cloudflare be credited for supporting the project. He specifically noted Melissa Kargiannakis for helping Datamule join Cloudflare for Startups.

Takeaway

While LLMs have amazing capabilities, they don’t really work unless the data is accurate. There’s still a lot of work in the unglamorous task of wrangling data.

Further Reading

  • Open-Source Code That’s Making SEC API Calls Cheap | AI Street

NEWS

AI & Wall Street News Brief

AI Adoption

Junior research jobs ‘will be gone’ because of AI, says DBS

The head of investment banking at DBS said junior equity research roles “will be gone” within the next few years due to artificial intelligence, as south-east Asia’s largest bank doubles down on the use of the tech. The Banker $

BlackRock’s Talent Head on How AI Is Changing Hiring

Nigel Williams, BlackRock's global head of talent acquisition, said that AI is shifting his hiring priorities, and that fluency with the technology is now key to any strong application. Business Insider

Vanguard pushes back on AI job-loss fears

Vanguard research finds that roles most exposed to AI are still seeing job and wage growth in line with other occupations. The narrative that AI adoption will lead to massive job losses has not been supported by much data. Vanguard

AI Investment to Keep Rising and Boost World Economy: OECD

The OECD says the surge in AI investment supporting global growth is set to continue, with rising tech spending helping offset trade uncertainty after the group raised forecasts for major economies including the US. BBG

AI Regulation

Trump Order Puts DOJ in Charge of Battling State AI Laws

Trump issued an executive order centralizing AI oversight at the Justice Department, creating a task force to challenge state AI laws and threatening to withhold federal broadband funding from states deemed noncompliant.

The tech industry backs a single national framework, while states including Colorado and California are preparing legal challenges, arguing the federal government is overstepping and that states have the right to regulate AI risks such as discrimination and privacy. BBG, Reuters

SPONSORSHIPS

Reach Wall Street’s AI Decision-Makers

Advertise on AI Street to reach a highly engaged audience of decision-makers at firms including JPMorgan, Citadel, BlackRock, Skadden, McKinsey, and more. Sponsorships are reserved for companies in AI, markets, and finance. Email me ([email protected]) for more details.

QUOTABLE

“Technology change is going to cause massive dislocation in the credit market… I don’t know whether that’s going to be enterprise software, which could […] benefit or be destroyed by this. As a lender, I’m not sure I want to be there to find out.”

Apollo’s Marc Rowan: FT

ROUNDUP

What Else I’m Reading

  • Why the A.I. Boom Is Unlike the Dot-Com Boom | NYTimes

  • Powell: AI isn’t a 'big part' of the job market’s story — for now | Yahoo

  • Senators Probe AI Data Centers in Rising Electricity Costs | NYTimes

  • Is There Enough Data Center Capacity for AI? | Goldman Sachs

  • BlackRock’s Crew of Quant PhDs Are On Track for a Record Year | BBG

  • AQR Roars Back With $179 Billion in Assets | BBG

  • Meet BlackRock's tech 'translator' spearheading agentic AI | BI

  • Arcesium adds new AI features for Aquata data platform | HedgeWeek

CALENDAR

Upcoming AI + Finance Conferences

I imagine I’m missing some events. If there’s one you think I should add, reply to this email or reach out: [email protected]
