3 Comments
Guido

Interesting that the core bet here is treating market data as a 'language' to learn from scratch. What I'm curious about is the trade-off: pre-training from zero vs. continued pre-training on an open-source base model, then fine-tuning for trading objectives. The compute costs are orders of magnitude apart, but perhaps I'm failing to consider that hedge funds can afford expensive training runs, so...

Great article.

Matt Robinson

I think the costs are high even for hedge funds, but I'm also not sure that pre-training on time-series data in one domain transfers to another — so maybe that's why they start from scratch. HRT's Marc Khoury goes into pretty good detail here on their thinking around tokenizing trading data: https://slideslive.com/39043823/foundation-models-for-automated-trading
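For readers curious what "tokenizing trading data" can mean in practice, here's a minimal sketch of one common approach — quantile-binning log returns into a discrete vocabulary, so a sequence model can treat prices like tokens. This is an illustrative assumption, not HRT's actual scheme (the talk linked above covers their thinking); the function name and parameters are hypothetical.

```python
# Hedged sketch: turn a price series into discrete token ids by
# quantile-binning its log returns. NOT HRT's actual tokenizer --
# just one simple, commonly discussed way to do it.
import numpy as np

def tokenize_returns(prices, n_bins=16):
    """Map a price series to token ids in [0, n_bins - 1]."""
    prices = np.asarray(prices, dtype=float)
    log_ret = np.diff(np.log(prices))
    # Bin edges from the empirical quantiles of the returns themselves,
    # so each token id covers roughly equal probability mass.
    edges = np.quantile(log_ret, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(log_ret, edges)

# Synthetic random-walk prices just to demonstrate the shape of the output.
rng = np.random.default_rng(0)
prices = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, 500)))
tokens = tokenize_returns(prices)
print(len(tokens), tokens.min(), tokens.max())
```

One token per return means a 500-step price series becomes 499 tokens, and the "vocabulary" is just the bin count — the cross-domain-transfer question above is essentially whether a model trained on one asset's token distribution says anything useful about another's.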

Comment removed · Jan 30
Matt Robinson

Transformer models keep beating earlier architectures across more domains, so it’s hard to see GPU demand slowing.