Interesting that the core bet here is treating market data as a 'language' to learn from scratch. What I'm curious about is the trade-off: pre-training from zero vs. continued pre-training on an open-source base, then fine-tuning for trading objectives. The compute costs are orders of magnitude apart, but maybe I'm underestimating how willing hedge funds are to fund expensive training runs...
I think the costs are high even for a hedge fund, but I'm also not sure that pre-training on time-series data in one domain transfers to another, so maybe that's why they start from scratch? HRT's Marc Khoury goes into pretty good detail here on their thinking about tokenizing trading data: https://slideslive.com/39043823/foundation-models-for-automated-trading
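For anyone curious what "tokenizing trading data" can even mean, here's a toy sketch of one common approach: quantile-binning log returns into a discrete vocabulary so a sequence model can treat them like tokens. This is purely illustrative and assumes a simple binning scheme, not HRT's actual method (the talk covers their real design choices).

```python
import numpy as np

def tokenize_returns(prices, n_bins=16):
    """Map a price series to discrete tokens by quantile-binning log returns.

    A toy illustration of turning market data into a token 'vocabulary';
    the binning scheme here is a placeholder, not any firm's real method.
    """
    returns = np.diff(np.log(prices))
    # Interior bin edges from the empirical quantiles of the returns
    edges = np.quantile(returns, np.linspace(0, 1, n_bins + 1)[1:-1])
    # Each return becomes an integer token in [0, n_bins - 1]
    return np.digitize(returns, edges)

# Synthetic random-walk prices just to exercise the function
prices = np.cumprod(1 + np.random.default_rng(0).normal(0, 0.01, 500)) * 100
tokens = tokenize_returns(prices)
```

The resulting integer sequence is what you'd feed a transformer in place of word tokens; the open question the comments above raise is whether a model trained on one asset class's token statistics transfers to another's.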
Great article.
Transformer models keep beating earlier architectures across more domains, so it’s hard to see GPU demand slowing.