The Hedge Fund Run by Machines Is Going Agentic
Numerai, the crowdsourced hedge fund, is moving from human quants to AI agents.
Richard Craib runs one of Wall Street’s most unconventional business models: a crowdsourced hedge fund. He also counts JPMorgan as his biggest backer.
Craib, a South African mathematician, launched Numerai in 2015 in San Francisco, far from the epicenter of finance in New York, with the goal of reinventing how hedge funds are built.
Numerai crowdsources stock market predictions from thousands of data scientists worldwide by providing encrypted financial data that obscures the underlying securities. It then aggregates those forecasts into a single trading strategy. Contributors stake the company’s cryptocurrency, Numeraire, on their models, earning rewards for strong performance and losing funds for poor results.
Despite its unconventional structure, Numerai manages real capital and in August secured a commitment of up to $500 million from JPMorgan Asset Management, potentially more than doubling the fund’s size. The investment followed a strong year for the fund, which reported a 25.45% net return in 2024 with a Sharpe ratio of about 2.75.
For much of its history, Numerai framed itself as a hedge fund built by machines but guided by humans.
Craib is now reworking Numerai for autonomous research. Last month, the firm outlined plans to redesign its system to support agents rather than just human data scientists, including a new Model Context Protocol interface that would give AI systems direct programmatic access. Under that framework, agents could create models, submit predictions, run validation tests and monitor performance on their own, effectively executing the full research cycle without manual intervention.
The shift reflects Craib’s view that advances in modern AI tools have changed who, or what, can participate. Human users are expected to move toward designing and supervising AI research assistants rather than building models themselves, while updated staking mechanisms would allow agents to manage financial exposure programmatically.
He expects agents to spread quickly across quantitative finance, potentially reshaping how ideas are generated, tested and traded.
In our chat, we discuss:
Why Numerai is redesigning its platform for autonomous AI agents, not just human quants
How large language models became capable of running the full research cycle with the right scaffolding
Why Craib believes future hedge funds will rely on “AI scientists” exploring vast idea spaces
How the JPMorgan investment came together and what it signals for institutional adoption
Why Craib thinks many traditional fund roles, and even star managers, could become obsolete
Here are some of my favorite quotes:
“I’m not the smart guy, but I made a website to be friends with all the smart people.”
“You’re just gonna see very quickly people feeling they’re doing it wrong if they’re not using agents.”
“The way I see it is more like these models are themselves AI scientists, and they weren’t a year ago.”
This interview has been edited for length and clarity.
Matt: You started Numerai about 10 years ago, when AI was not as prominent. Now you have JPMorgan investing. How were those first couple of years?
Richard: Actually, when I was starting it in 2015, I thought AI was a bubble. It felt that way. Google had acquired DeepMind for $500 million, which people thought was really extreme. There were a lot of different kinds of hype at that time, and I guess we were more in the machine learning space; we weren’t on LLMs yet. Then came AlphaGo in 2016, right when Numerai started. But it ended up not being a bubble at all. There was a lot more to come.
Matt: It’s still an unusual model for a hedge fund. Looking at your recent NumerCon announcements, it seems you are setting up the infrastructure for submissions that don’t necessarily come from humans.
Richard: We’ve actually always thought about it that way. When you signed up on Numerai in 2016, it didn’t say “enter your username,” it said “name your AI.” You weren’t the one doing anything, except setting up the learning algorithm to start learning; the AI would be the thing submitting. And now that’s become even more true, because even the code that you would write to generate the model can itself be written by AI. So, we just see it as another abstraction.
Put it this way: we were never asking data scientists to write machine learning algorithms in assembly code. They were using the most extreme abstractions, like scikit-learn in Python, or TensorFlow. And now there’s another layer of abstraction, which is that Claude can do TensorFlow, or PyTorch, for you.
It’s been natural for us since the beginning of ChatGPT, because it’s always known about Numerai. It knew how to make a basic model even in the first version, and then it got better and better. So users have always been using the chat interface, but we never fully enabled native agent support until NumerCon.
Matt: What made you decide to focus more on this approach? When did it click?
Richard: In November, there was a tipping point that everyone in Silicon Valley felt. Models like Claude and ChatGPT Codex became capable of doing almost anything if you provided the right scaffolding.
That was the moment where it was like, okay, well, now we should really just lean into this, because you get the feeling that everyone will be here soon.
In the beginning of Numerai, there was a popular statistical programming language called R; maybe half of our users used it. But then it moved almost completely to Python and PyTorch, and I think it’s the same thing with this. You’re just gonna see very quickly people feeling they’re doing it wrong if they’re not using agents.
Matt: So, you see this more as the scaffolding and architecture. From the B2C side, it’s about which model does what, but for enterprises, they’re more concerned with the framework—the tracks it runs on.
Richard: This is the key thing. It’s not super well understood, but if we were to hire a PhD who was super smart, he would still make basic errors in the first few weeks, because he wouldn’t know how to do a proper cross-validation backtest on financial time series data. And that’s the same with Claude. No matter how smart it gets, it doesn’t have Numerai Skills, which is the skills.md file we made. So, when you ask it to do a whole bunch of research on Numerai, within the first hour it’ll make three bad mistakes. It’s not really its fault, it’s not because it’s dumb, it just doesn’t quite know how we do things. Once it’s got access to the skills.md, it’ll go, oh, if I need to do that, I have to use the skills way of doing it. And that’s how the scaffolding gets defined really nicely.
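The “proper cross-validation backtest” Craib mentions is exactly the kind of step an unscaffolded model tends to get wrong. As a purely illustrative sketch (this is not Numerai’s actual validation code, and `purged_walk_forward_splits` is a hypothetical helper), a walk-forward split with an embargo gap avoids the look-ahead leakage that a naive random shuffle introduces on time series data:

```python
import numpy as np

def purged_walk_forward_splits(n_samples, n_splits=4, embargo=10):
    """Illustrative walk-forward splitter for financial time series.

    Unlike a random shuffle, each fold trains only on data that
    strictly precedes the test window, and an 'embargo' gap is left
    between train and test so overlapping targets do not leak.
    """
    fold = n_samples // (n_splits + 1)
    splits = []
    for i in range(1, n_splits + 1):
        train_end = i * fold
        test_start = train_end + embargo
        test_end = min(test_start + fold, n_samples)
        splits.append((np.arange(0, train_end),
                       np.arange(test_start, test_end)))
    return splits

# Example: 1,000 time-ordered rows, 4 folds, a 10-row embargo.
for train_idx, test_idx in purged_walk_forward_splits(1000):
    # Training data always ends before the embargoed test window starts.
    assert train_idx[-1] + 10 < test_idx[0]
```

The embargo parameter matters because financial targets often span multiple rows (e.g. 20-day forward returns), so even strictly earlier training rows can overlap the test period’s outcomes.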
Matt: So, it’s tailored directions. And it sounds like you’d rather have good scaffolding with a lesser model than a good model without it.
Richard: Yeah, exactly. And the 'Skills' feature was released by Anthropic basically only two or three months ago. If you don’t have that, it’s almost like you haven’t had onboarding. It’s like Citadel University where you spend a month before they let you do anything. You have a whole bunch of training, and so you’re building up skills about how the organization works, how they do things, before they let you touch any production code. So that’s a training camp for AIs.
Matt: Have you heard of Man Group’s AlphaGPT?
Richard: Yeah, I don’t know what it really is, but I’ve seen that they’ve made announcements about it.
Matt: I spoke with Man Group’s Ziang Fang about it. He described it as an end-to-end idea test machine. Whatever passes through certain thresholds, the humans look at it. They’ve said the AI-generated ideas are passing their human benchmarks on the tests—it just lets them test more ideas and do more than they could before. Is that a similar concept to Numerai?
Richard: It’s one thing to say we have good infrastructure to test ideas. But it’s really, can you get to the point where you can span the full space of possible ideas? That’s the trouble with a quant signal. If you think about how many ways there are to order 6,000 stocks—because that’s what Numerai users are doing, ranking best to worst—there are 6,000 factorial. That’s a number with more than 20,000 digits, vastly more permutations than there are atoms in the universe. So, that’s why we still need the crowdsourcing, because we don’t know what to ask, or what model to build in the first place.
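As a quick sanity check on the size of that search space, the order of magnitude of 6,000! can be computed with the log-gamma function, since the number itself is far too large to evaluate directly:

```python
import math

# Number of distinct ways to rank 6,000 stocks best to worst: 6000!
# Compute its order of magnitude via log10(n!) = lgamma(n + 1) / ln(10).
digits = math.lgamma(6000 + 1) / math.log(10)
print(f"6000! is roughly 10^{digits:.0f}")  # a number with over 20,000 digits
```

For comparison, the commonly cited estimate for the number of atoms in the observable universe is around 10^80.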
I think if you were 16 years old and you said, “Hey Claude, make me a quant fund from scratch,” and that’s the best prompt you could come up with, it would make you a very generic, mode of the distribution, quant fund infra. Whereas if you say, “I read this paper in 2023, which had these really strange ideas, and here’s the paper, and can you turn this into a Numerai idea in this way,” then that is a much more directed search into what’s possible. So that’s how we see users using this. They still have a role to play in, ‘This is the direction I want to go in,’ and I don’t want something average, because you don’t get paid for an average model on Numerai.
Matt: You’re looking for outliers.
Richard: We literally pay you for orthogonal alpha. So, if you make something crowded, that is the first thing Claude would come up with, and you’ll earn nothing.
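The idea of rewarding only the orthogonal component of a signal can be sketched with a single least-squares projection. This is a toy illustration, not Numerai’s actual scoring rule; `neutralize` is a hypothetical helper, and the real payout mechanics involve more than one projection:

```python
import numpy as np

def neutralize(preds, metamodel):
    """Return the component of `preds` orthogonal to `metamodel`.

    A crowded signal that merely replicates the metamodel has almost
    nothing left after the projection, so it would earn nothing under
    a pay-for-orthogonal-alpha scheme.
    """
    beta = preds @ metamodel / (metamodel @ metamodel)
    return preds - beta * metamodel

rng = np.random.default_rng(0)
meta = rng.standard_normal(500)            # stand-in for the metamodel
crowded = 0.9 * meta + 0.01 * rng.standard_normal(500)
novel = rng.standard_normal(500)           # an independent signal

# The crowded signal collapses after neutralization; the novel one survives.
print(np.linalg.norm(neutralize(crowded, meta)))  # small
print(np.linalg.norm(neutralize(novel, meta)))    # large
```

This is why, as Craib says, the first model Claude would come up with from a generic prompt earns nothing: whatever it finds is almost certainly already inside the metamodel.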
Matt: HRT has said at some academic conferences that they’re training foundational models on financial time series—basically the language of financial data. What do you think about that approach?
Richard: We haven’t done that. I don’t think it’s that important. If you just train on the price time series, it’s crazy to say all you need is the price to predict the future. That’s a very 90s thing to say. You are getting paid by the market for adding strange new information to it, not the most commonly known information. So, we have 2,000 features, and not that many of them are based on price. You could maybe cobble something together, but that’s not the be-all, end-all.
The way I see it is more like these models are themselves AI scientists, and they weren’t a year ago. So why not just now run the scientific method more and more?
We have built language models. We built something called Numerai Predictive LLM, and we made it read news and then come up with a prediction from the news. That was an 8-billion-parameter model; scaling it up in parameters doesn’t actually matter much. But that was a natural use case, because current language models can’t predict from news what will happen, because they’re not trained to.
One example we give is, if you have a company like NVIDIA, and they make a press release that says, “we’re being investigated by the Department of Justice for monopolistic practices,” and the second paragraph is, “our revenue grew 150% year over year.” You ask ChatGPT, is this good or bad for the company? And ChatGPT will say, well, it’s neutral, because there’s some good things in it, and there’s bad. And actually, it’s extremely positive for the stock if you’ve fine-tuned the model. So, our model gets that right. It says this is amazing news, and the other models don’t. That’s one place where we are internally building features with language models, but it’s not on the time series of price.
Matt: How did the JPMorgan investment come about?
Richard: The thing about hedge funds is there’s quite a lot of short-term thinking; people want the first 3 years to be amazing. But we said, let’s not raise LP money at all, let’s raise venture capital instead, and then build something that no one can compete with in the long term. That’s why it’s been more of a tech company.
JPMorgan, the first meeting with them was something like 2018, 2019, when we were probably below $100 million, maybe below $50 million. They’ve had a lot of success investing in cutting-edge stuff; they’ve invested in early machine learning funds like Voleon and Voloridge, I believe.
In the early days, it was more like us saying, what do you want us to do to be fully institutional and ready for you? And they told us all the things they like. They like the 3-year track record, not a zero-year track record. They, in some ways, helped us make the product something that was institutional grade.
They’re not the first institutional investor—we have 3 endowments and many others besides. But they are the biggest one in terms of capacity, they want to invest $500 million.
Matt: Has that opened other doors?
Richard: Yeah, a lot. We now have a 6-year track record. We look very good compared to peers, and we’re getting better and better. Whereas other peers, maybe they got too big, and now they’re struggling to put up good numbers. But we’re snowballing, where our data is growing, our users are getting smarter, and everything’s kind of getting better, and risk management is getting better. So, yes, after the JPMorgan announcement, many of the big players have been reaching out to us. And probably in the next month or two, there might be other announcements, and we should be over $1 billion quite soon. We’re almost at $600 million, and $600 million is maybe two checks away from a billion.
Matt: You also have a different level of auditability and transparency compared to a typical hedge fund. There’s a lot more detail about how models are submitted and staked.
Richard: Yeah, and it’s interesting, because we’ve even had an endowment investor make a user account and submit a model. He got to really see, okay, this makes sense, I didn’t understand this bit. And he also got to see that he didn’t win; there were people who were a lot better than him. He saw, okay, the talent here is very good.
Anyone can watch our performance. Another thing I like to watch is how well the metamodel, the combination of all models, is doing against the benchmark models, which are just the free benchmarks we give away. We’ve tried our best to make those very good; they’re our best internal models. And we said, here’s the baseline model, improve on it. Almost month after month, the edge widens, and it’s never looked as wide as it does right now, where the crowd, the stake-weighted metamodel, is crushing the best we can do internally. Because the reality is, we’re good at data science. We have good data scientists; we’ve hired some of the top Numerai users over the years. But we don’t know how to beat everyone, and we don’t think we ever will, even with infinite AI scientist assistance.
Matt: Is that just the wisdom of the crowd?
Richard: No, I don’t even like that term, because the wisdom of the crowd is almost saying that the individuals are dumb, but the crowd is smart. I actually think it’s the opposite. It’s much more like an open source project, where about 1% of the users who’ve ever signed up to Numerai are the core contributors. And then the next 5% is also very important. Numerai, by asking people to stake their models, we are making it hard, on purpose, to do well. If you do badly, you will get your stake destroyed. So, Numerai is more like an API to find the best thousand data scientists in the world, versus let’s all make dumb guesses, and it’ll average out to something good.
Matt: So it’s more like winnowing it down to the best?
Richard: Yeah.
Matt: You’ve been building this a while. What’s been the most surprising thing?
Richard: The one thing I do think is true, and it makes total sense, is that the venture capital industry in this country is just amazing. We raised from the best VCs. Very quickly, they saw a future where the way Millennium works is kind of gonna seem outdated in 2030: you hire all these people, pay them a lot, and then they read the newspaper and code. It’s weird. The Numerai way was this new thing, so it’s always been very easy for us to raise venture rounds.
But the asset allocators are more backward-looking. They say, well, Millennium has a 30-year track record, and we don’t trust AI yet. That, to me, was quite tiring, because to like Numerai you had to know about three things: blockchain, which no one knew about; then machine learning; and then quantitative finance. You had to have a lot of knowledge of all three, and most people were kind of 1 out of 3. Now people are getting to 3 out of 3, because those are the technologies du jour.
Matt: What about the hype cycle? You’ve seen Numerai get labeled different things over the years.
Richard: Yeah, we’ve been a little bit bubble-averse. There was a similar time where people were talking about us as a blockchain company and hyping up our cryptocurrency. And I was just trying to put some cold water on that, because I just don’t want people to be disillusioned. That’s not really a hedge fund style. It’s supposed to be risk-adjusted, long-run. It’s not gambling.
Matt: Are you seeing more submissions since NumerCon?
Richard: Yeah, we are. NumerCon was less than a month ago, and there are already 150 to 200 MCP connections, against only about 500 staked users, each running many models. That’s surprising.
Matt: Are you going to get to the point where an agent is working for you and you’re just on the beach?
Richard: That’s the dream, but everybody has agents, too, including Numerai’s peer competitors. I really think that there are pods at Millennium that are paid $100 million a year with code that could be replicated in 40 hours by Claude. And so I don’t know what they do. I think that’s part of Numerai’s mission. We want fewer human beings in the hedge fund management industry. And I think we’ll get there, and Claude is helping.
Matt: What about the broader disruption to white-collar work? A lot of it turns out to be white-collar manual labor.
Richard: I think it’ll be looked back on almost with disgust by our grandchildren. With trading, you don’t understand the human mind or intelligence if you think you can go out for breakfast at the St. Regis in New York, have coffee, and then on your walk to work think, “I should buy NVIDIA,” and then go and buy it. It’s crazy that you have no information except a sort of amorphous blob of human thought.
Matt: But that’s not just the retail trader. You’re talking about hedge fund managers doing the same thing.
Richard: I’m worried more that it’s actually hedge fund managers who would be that person having coffee at the St. Regis, and they’d buy $100 million of NVIDIA based on their vibes. And you’re like, you know there’s 2,000 dimensions of data that Numerai has?
Matt: The majority of money managers underperform the benchmark.
Richard: At what point do you realize you had a lucky call, and it had nothing to do with you in some way? It was just an apparition in your mind? We’re very vulnerable to that type of thing.
Matt: Everyone on Wall Street wants to be the smart guy.
Richard: I’m not the smart guy, but I made a website to be friends with all the smart people.
An earlier version of this interview misspelled NumerCon.


