HRT Trains AI Models on Trading Data
The quant firm has developed transformer-based models using decades of market microstructure data.
Hey, it’s Matt. Welcome back to AI Street. This week:
HRT on Building “Foundation Models for Automated Trading”
Top Papers on AI in Finance Q4 2025: SSRN
JPMorgan’s Dimon: banks must invest in AI or get left behind. + More News
Hudson River Trading is building foundation-style models trained on decades of global market data, applying techniques similar to those used in frontier language models for automated trading.
The firm is training these models on more than two decades of data spanning equities, futures, and cryptocurrencies, totaling over 100 terabytes. That translates into “something like trillions of tokens, in the same realm as what you train frontier language models on,” said Marc Khoury, a researcher on HRT’s AI team, speaking at an academic conference.
At a high level, HRT’s goal is to model markets as sequences of interactions. Electronic markets generate detailed streams of activity, including full limit order books, executed trades, and order-level events such as placements, cancellations, and fills. According to Khoury, much of the predictive signal lies in how these sequences evolve over time, especially during fast-moving conditions.
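To make “markets as sequences” concrete, here’s a minimal sketch of how order-level events might be represented before any modeling; the field names and types are my own illustration, not HRT’s schema.

```python
from dataclasses import dataclass
from enum import Enum


class EventType(Enum):
    PLACE = "place"    # new limit order added to the book
    CANCEL = "cancel"  # resting order removed
    FILL = "fill"      # order executed against


@dataclass
class MarketEvent:
    """One order-level event in the stream a sequence model would consume."""
    timestamp_ns: int      # exchange timestamp in nanoseconds
    event_type: EventType
    side: str              # "bid" or "ask"
    price: float
    size: int
    book_level: int        # depth in the limit order book (0 = best quote)


# A trading session is then just an ordered list of such events,
# much as a document is an ordered list of words.
session = [
    MarketEvent(1_700_000_000_000_000_000, EventType.PLACE, "bid", 100.01, 200, 0),
    MarketEvent(1_700_000_000_000_150_000, EventType.FILL, "ask", 100.02, 100, 0),
    MarketEvent(1_700_000_000_000_420_000, EventType.CANCEL, "bid", 100.00, 300, 1),
]
```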
(We’ve previously discussed how training transformers on specific domains, such as weather, payments, and grocery store sales, has shown promising results.)
Khoury showed charts indicating that more data and larger models boosted predictive performance. “As I increase the model size, the model continues to improve,” Khoury said, adding that the pattern mirrors what researchers see when scaling large language models.
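For reference, the scaling behavior he’s alluding to is usually summarized in language-model research as a power law in model size $N$ and dataset size $D$; the form below comes from published LLM work, not from HRT’s results.

$$L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad L(D) \approx \left(\frac{D_c}{D}\right)^{\alpha_D}$$

Here $L$ is the model’s loss and the constants are fit empirically; on a log-log plot, loss falls along a roughly straight line as model or data size grows.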
To train a transformer model on markets, trade data has to be converted into tokens. In language models, tokenization typically breaks text into subword pieces. See the example below from OpenAI’s tokenizer:
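Here’s a minimal sketch using OpenAI’s open-source tiktoken library; the sample sentence and the cl100k_base encoding are my own illustrative choices, not from the talk.

```python
import tiktoken  # pip install tiktoken

# Load one of OpenAI's public byte-pair encodings.
enc = tiktoken.get_encoding("cl100k_base")

text = "Hudson River Trading trains transformers on market data."
token_ids = enc.encode(text)

print(token_ids)                             # integer IDs the model actually sees
print([enc.decode([t]) for t in token_ids])  # the subword pieces they correspond to
```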
In market data, tokenization means mapping events into discrete units a model can learn from. One approach groups events into fixed time intervals, such as one-minute windows. Another bundles a fixed number of events together. HRT described both as active design choices, each with tradeoffs for model performance and cost.
“Language modeling people would call this tokenization,” Khoury said. In this context, each token represents a unit of market activity rather than a word. The goal is to preserve market structure while keeping training and inference computationally feasible.
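As a rough illustration of the two grouping strategies Khoury described, continuing from the event sketch above; the function names, window size, and event count are hypothetical choices, not HRT’s.

```python
from itertools import groupby


def group_by_time(events, window_ns=60_000_000_000):
    """Fixed time intervals: every one-minute window of events becomes one unit."""
    return [list(g) for _, g in groupby(events, key=lambda e: e.timestamp_ns // window_ns)]


def group_by_count(events, events_per_unit=32):
    """Fixed event count: every 32 consecutive events become one unit."""
    return [events[i:i + events_per_unit] for i in range(0, len(events), events_per_unit)]
```

Intuitively, fixed time windows keep a steady clock but swing widely in how much activity each unit contains, while fixed-count groups keep the amount of activity per unit steadier at the cost of stretching and compressing time across quiet and busy periods.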
For more background on tokenization of market data, check out my chat with Juho Kanniainen, a professor at Tampere University’s Data Science Research Centre, on treating limit order books as “language” here.
Transformer models offer one way to work with these event streams. Because they are designed to process long sequences of diverse inputs, transformers can adapt to changing conditions without relying entirely on fixed, hand-engineered features.
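A minimal sketch of what a sequence model over discretized market events could look like, built from off-the-shelf PyTorch pieces; the vocabulary size, dimensions, and every other choice here are placeholders and say nothing about HRT’s actual architecture.

```python
import torch
import torch.nn as nn


class MarketSequenceModel(nn.Module):
    """Toy next-token predictor over a discretized stream of market events."""

    def __init__(self, vocab_size=4096, d_model=128, n_heads=4, n_layers=2, max_len=1024):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer IDs for discretized market events
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        x = self.token_emb(token_ids) + self.pos_emb(positions)
        # Causal mask so each position only attends to past events.
        mask = nn.Transformer.generate_square_subsequent_mask(token_ids.size(1)).to(token_ids.device)
        h = self.encoder(x, mask=mask)
        return self.head(h)  # logits over the next event token


model = MarketSequenceModel()
logits = model(torch.randint(0, 4096, (2, 64)))  # batch of 2 sequences, 64 events each
print(logits.shape)  # torch.Size([2, 64, 4096])
```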
Training models at this scale requires substantial infrastructure. Khoury did not disclose how many GPUs the firm operates, but he said HRT runs its own state-of-the-art data center, and he joked that at one point the firm’s hardware purchases were large enough to have “bottlenecked” GPU deliveries on the U.S. East Coast.
The research effort comes as HRT posts record results. Bloomberg reported that the firm generated an estimated $12.3 billion in net trading revenue in 2025. The company has also been expanding beyond high-speed trading into longer-horizon strategies, while continuing to invest heavily in artificial intelligence and data-driven trading models. The firm accounts for roughly 10 percent of U.S. equity trading volume.
An HRT spokesperson declined to comment on the firm’s AI training efforts.