Member commentary: A perspective on Numerai
Luca Taroni is a member of our Exponential Do community. He shared the below take on Numerai with us in the Exponential Do Slack, and I thought it too good not to be shared more widely. Enjoy.
Hi, Luca Taroni here. Given my background in algorithmic trading and inspired by the announcement of the Numerai podcast, I’ve decided to dive deeper into Numerai. Here is a recap (the model, the dataset, market neutrality, users and incentives) and why I think this is a great project from a trader’s perspective, but not so great for users.
An ensemble-learning based meta-model
The core of the project is the proprietary meta-model, which builds on ensemble learning: basically, machine learning with multiple models. In its simplest form, we can use multiple instances of the same algorithm with different random seeds or data slices, and average them to achieve more robust and less overfitted results. In this case we only require each model to have a predictive ability better than chance (above 50% on a binary call). As meta-models get more complex we can use different kinds of algorithms, and we no longer strictly need each contributing model to do better than chance. We do need each model to contribute positively to the overall meta-model though.
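To make the "better than chance is enough" point concrete, here is a minimal simulation sketch. It is not Numerai's method: it assumes 101 hypothetical models that are each right 55% of the time on a binary up/down call and, crucially, that their errors are independent. The function name and all parameters are made up for illustration.

```python
import random


def simulate_majority_vote(n_models, p_correct, n_trials, seed=0):
    """Return (single-model accuracy, majority-vote accuracy) on simulated binary calls."""
    rng = random.Random(seed)
    single_hits = 0
    ensemble_hits = 0
    for _ in range(n_trials):
        # Assumption: each model is right independently with probability p_correct.
        votes = [rng.random() < p_correct for _ in range(n_models)]
        single_hits += votes[0]                      # track the first model on its own
        ensemble_hits += sum(votes) > n_models / 2   # majority of models got it right
    return single_hits / n_trials, ensemble_hits / n_trials


single_acc, ensemble_acc = simulate_majority_vote(n_models=101, p_correct=0.55, n_trials=5000)
print(f"single model: {single_acc:.3f}, majority vote: {ensemble_acc:.3f}")
```

Under the independence assumption the majority vote is right far more often than any single weak model. Real models trained on the same data are never fully independent, so the real-world gain is smaller.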
That’s exactly what Numerai is trying to do: they are more and more focused on rewarding the marginal contribution of each model to the meta-model. Let's say we have a crappy model which is outstanding only in rare market conditions of extreme volatility. On average this model runs at a heavy loss, but if the meta-model is smart enough to activate it only in the appropriate market conditions, we have a positive marginal contribution. Ensemble learning thrives on marginally successful models, as long as they are diverse. Diversity is core. You want your contributing models to be orthogonal, or as orthogonal as they can be.
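Why diversity matters can be shown with a small simulation. This is a sketch under stated assumptions, not a description of the Numerai meta-model: each hypothetical model's forecast is a weak signal plus noise, and `rho` is the assumed correlation between the noise of any two models (0 means orthogonal errors, values near 1 mean the models are near copies of each other).

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def averaged_forecast_correlation(n_models, rho, signal=0.1, n_obs=5000, seed=0):
    """Correlation between the target and the average of n_models noisy forecasts."""
    rng = random.Random(seed)
    targets, averaged = [], []
    for _ in range(n_obs):
        target = rng.gauss(0.0, 1.0)
        shared_noise = rng.gauss(0.0, 1.0)  # the part of the error all models have in common
        forecasts = [
            signal * target
            + math.sqrt(rho) * shared_noise
            + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)  # model-specific error
            for _ in range(n_models)
        ]
        targets.append(target)
        averaged.append(sum(forecasts) / n_models)
    return pearson(averaged, targets)


diverse = averaged_forecast_correlation(n_models=25, rho=0.0)
similar = averaged_forecast_correlation(n_models=25, rho=0.9)
print(f"orthogonal errors: {diverse:.3f}, highly correlated errors: {similar:.3f}")
```

Averaging cancels only the model-specific part of the error. The shared part survives no matter how many models are added, which is why 25 near-identical models are worth little more than one.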
Numerai uses this approach to manage a fund of financial time series across countries, markets and sectors. Their track record shows success and is improving over time.
Dataset
In machine learning, the training dataset is of the utmost importance. A clean, curated, coherent dataset to learn from is the necessary premise for every successful pattern discovery pipeline. What Numerai does here is new and, as far as I know, unprecedented: in order to provide an expensive and generally not freely distributable universal stock market dataset to every potential algo contributor, they normalised and anonymised a whole universe of world financial time series, financial data and indicators. Those are provided for free to model builders, who use the data to train their own models. But there is a huge caveat: the data is obfuscated. Users don’t know which time series or indicator they are working on, nor the timespan it covers. They just work on a massive number of chunks of obfuscated time series.
Market neutrality
Numerai is also, at the moment, trying to build a market-neutral model. This means they don’t want to be exposed to specific countries, indexes or sectors. This is their way to assess whether their meta-model is able to extract actual alpha from the data, that is, whether the model has an edge independent of market conditions. To achieve that, the meta-model coordinates user-provided models (forecasts, actually) and keeps a hedged position (you might for instance be both long and short the US energy sector, in dollar terms, by taking long positions on some specific energy stocks with some models, and short positions on different US energy stocks with different models). They try to achieve this neutrality across countries, markets, sectors - across everything possible. Definitely cool. And if you are a user (a strategy provider)?
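The "long and short the same sector in dollar terms" idea can be sketched in a few lines. This is only an illustration of sector neutrality in general, not Numerai's actual portfolio construction, which the fund does not disclose here. The tickers, scores and the function name are all hypothetical.

```python
from collections import defaultdict


def sector_neutral_weights(forecasts, sectors):
    """Turn raw forecast scores into weights whose dollar exposure nets to zero per sector.

    forecasts: dict of ticker -> score (higher means more bullish)
    sectors:   dict of ticker -> sector name
    """
    by_sector = defaultdict(list)
    for ticker, score in forecasts.items():
        by_sector[sectors[ticker]].append(score)
    sector_mean = {name: sum(vals) / len(vals) for name, vals in by_sector.items()}

    # Subtracting the sector average makes longs and shorts cancel within each sector.
    demeaned = {t: score - sector_mean[sectors[t]] for t, score in forecasts.items()}

    # Scale so that the absolute weights add up to 1 (gross exposure of 100%).
    gross = sum(abs(w) for w in demeaned.values())
    if gross == 0:
        return {t: 0.0 for t in demeaned}
    return {t: w / gross for t, w in demeaned.items()}


# Hypothetical universe: three energy stocks and two tech stocks.
forecasts = {"ENERGY_A": 0.8, "ENERGY_B": 0.5, "ENERGY_C": 0.2, "TECH_A": 0.9, "TECH_B": 0.3}
sectors = {"ENERGY_A": "energy", "ENERGY_B": "energy", "ENERGY_C": "energy",
           "TECH_A": "tech", "TECH_B": "tech"}
weights = sector_neutral_weights(forecasts, sectors)
net_energy = sum(w for t, w in weights.items() if sectors[t] == "energy")
net_tech = sum(w for t, w in weights.items() if sectors[t] == "tech")
print(weights, net_energy, net_tech)
```

The portfolio ends up long the better-ranked stocks and short the worse-ranked ones inside each sector, so a move of the whole sector should, to a first approximation, leave it unaffected. The same demeaning trick can be repeated for countries or any other grouping.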
Users and incentives
Numerai provides encrypted, anonymised financial data to users.
Users build their own model.
Users upload their predictions to Numerai, which provides feedback on a few metrics:
The actual ability of the model to forecast what it is trained to forecast, measured through a simple correlation (though users don’t know which assets they are forecasting).
The true contribution (TC), defined as the gradient of the portfolio return with respect to the amount staked by the user (more on staking later).
The meta-model contribution (MMC), some sort of marginal contribution to the whole portfolio.
Neutral Correlation, a measurement of how likely the model is to be overfitted to a small number of features rather than being a diversified, robust model.
Once users get an assessment of the potential (hypothetical, “past performance is no guarantee” etc.) of their model, they buy NMR tokens (Numerai's own tokens) and stake them, linking the stake to a specific model's forecasts.
Users can choose whether they want to be rewarded (in NMR tokens, let's remember this) by their model’s ability to forecast their targets (correlation, the easy one) or by their model’s TC. If their forecast is successful after a month, they get rewarded depending on how successful those forecasts are, and on how many NMR tokens they decided to stake. The same happens when choosing the TC metric, with the possibility to leverage rewards a bit.
If the model is not successful, users' tokens will be burned, so they lose money.
This is what happens if they use obfuscated data. Recently a new option was added: if users purchase their own data, they can upload specific forecasts for specific time series (let’s say Google, Apple, S&P 500 etc.). In that case they can again choose to be paid according to the correlation between their forecasts and the actual results, or by the MMC metric discussed previously, but mind you, the payment will again be proportional to the NMR tokens staked and will be in NMR tokens.
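The scoring-and-staking loop described above can be sketched roughly as follows. This is a simplified illustration, not Numerai's actual scoring code: the rank correlation assumes no tied values, and the payout rule (stake times score, with a hypothetical `multiplier` and a hypothetical `cap` on the monthly change) is an assumption made only to show that rewards scale with both the score and the stake, and that a negative score burns tokens.

```python
def ranks(values):
    """Rank positions (0 = smallest). Assumes there are no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order):
        result[idx] = rank
    return result


def rank_correlation(predictions, targets):
    """Spearman-style correlation: Pearson correlation of the two rank vectors."""
    rp, rt = ranks(predictions), ranks(targets)
    mean = (len(rp) - 1) / 2
    cov = sum((a - mean) * (b - mean) for a, b in zip(rp, rt))
    var = sum((a - mean) ** 2 for a in rp)  # identical for both vectors when there are no ties
    return cov / var


def stake_change(stake, score, multiplier=1.0, cap=0.05):
    """NMR gained (positive) or burned (negative). multiplier and cap are hypothetical."""
    clipped = max(-cap, min(cap, score * multiplier))
    return stake * clipped


predictions = [0.2, 0.9, 0.4, 0.7, 0.1]   # made-up forecasts for five obfuscated assets
targets = [0.1, 0.8, 0.5, 0.6, 0.3]       # made-up realised outcomes
score = rank_correlation(predictions, targets)
print(score, stake_change(100.0, score), stake_change(100.0, -0.02))
```

Whatever the exact rule, both the gain and the loss are denominated in NMR, so the fiat value of the outcome also depends on the token price.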
What I like
Using meta-models is nothing new. My first commercial model was a meta-model, and deep learning was not even around - yes, I’m that old! Having trader communities join forces is nothing new either. Proposing a blockchain token and a rather transparent mechanism to reward people contributing their models to the proprietary meta-model is, as far as I know, something new. Providing reliable, universal, normalised data in order to achieve this is definitely another novelty, with lots of caveats though.
Having achieved a positive track record with their real fund, and an improving one so far, is an outstanding success. Financial time series forecasting is the hardest job out there. Academically, it’s still debated whether it is even theoretically possible to forecast any market. Market conditions change constantly. A strategy that was once successful suddenly implodes and you find yourself starting from scratch. It’s like building a skyscraper on quicksand, but it’s a fascinating, intriguing, thought-provoking, addictive endeavour.
Challenges
I'm not a fan of Amazon's Mechanical Turk. It's just my opinion, but I find it exploitative. The skill set required to participate in Numerai is not trivial, but I wonder whether, conceptually, there's a substantial difference in the role of those who contribute their work to Numerai compared to Amazon’s Mechanical Turk. Actually they might be a bit worse off. Why?
Users spend tens to hundreds of hours designing a decent model. Let's imagine it works - it does better than random after costs and commissions (not a trivial task at all). They face a few hurdles, major ones from my perspective:
In order to earn any money, you need to buy the native NMR tokens, which have no utility aside from being used this way, and invest those tokens in your model by staking. I see a fundamental flaw here, as an unacceptable source of risk creeps into your investment strategy. Numerai claims that the average return of their model contributors is 12% a month. Looks nice, but what if I told you that in the last year the NMR token lost 73% of its value? Does it make any sense to put in such an effort to build a successful model and earn returns in a coin that is as volatile as any minor cryptocurrency? I wouldn’t do it.
Let’s imagine you don’t care about money and risk sources. You do it for the sake of the scientific challenge. If your model is successful, you earn some tokens whose value is, at best, unpredictable, but you created a working model! Well, you did, but you don't know what it works against as the data is anonymised. Basically you can't reproduce your own work outside of Numerai.
If you resort to using your own data and paying for it, why wouldn’t you invest directly using one of the many automated investing platforms out there, with the benefit of being paid in a stable currency? Why use Numerai?
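To put the two figures from the first hurdle side by side, here is a back-of-the-envelope calculation. It is a sketch under strong simplifying assumptions that are mine, not Numerai's: the 12% is earned every single month for a year, the whole stake is bought at the start, and the 73% drop is applied to the final token balance. Actual results depend on timing and on the path of the token price.

```python
monthly_return = 0.12       # average monthly return in NMR claimed for contributors
token_price_change = -0.73  # change in the NMR price over the same year
months = 12

# Token balance after a year, starting from 1 NMR.
tokens_compounded = (1 + monthly_return) ** months  # rewards restaked every month
tokens_simple = 1 + monthly_return * months         # rewards set aside, stake kept constant

# The same balances valued in fiat, relative to the fiat price paid for the first token.
fiat_compounded = tokens_compounded * (1 + token_price_change)
fiat_simple = tokens_simple * (1 + token_price_change)

print(f"tokens: {tokens_compounded:.2f} (compounded), {tokens_simple:.2f} (simple)")
print(f"fiat value: {fiat_compounded:.2f} (compounded), {fiat_simple:.2f} (simple)")
```

With full compounding, roughly 3.9 tokens per token staked are worth about 1.05 times the starting fiat outlay, so nearly all of a year of excellent model performance is eaten by the token. Without compounding, about 2.44 tokens are worth roughly 0.66 of the outlay, a loss in fiat terms.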
I believe that business-wise this is a clever and successful project; however, it has too many of the characteristics of web2 rather than web3. (Yes, I’m using an idealised vision of web.X, for the sake of simplicity.)
People contribute their ideas and help create a truly orthogonal and diversified portfolio strategy. They don’t benefit from the actual gains of the fund, though; they only benefit from potential rewards (or suffer losses) according to the NMR tokens they risk on their own model. Those rewards come in a crypto token which exists because they buy it, with tokenomics they don't control, and which has no other value or use case. Yes, they could hedge against it, but that adds costs, time and a further source of counterparty risk to their strategy. The real alpha of the project, on the other hand, lies in the clever and complex design of the meta-model, which is completely opaque to the contributor. Numerai is slowly moving towards reward metrics that are correlated with the marginal contribution of each model to the bigger meta-model. This makes perfect sense from Numerai’s perspective. Strategy providers, though, will find those metrics more and more obscure. This is already happening - a quick look at the forums would confirm that.
In the end, as an algo/quant trader I consider this a terrific project. As a naïve web3 enthusiast, I see it as web2, a chimaera: a “Mechanical Facebook”.
*Incidentally, I stumbled upon an outstanding algo-quant-trading book referenced in the Medium articles by Numerai, the first quality content in years on actual, serious algo finance. If you are a quant you might be aware of most of the content, but there might be a couple of things you’ve missed and they do have value. The discussion of the pitfalls of covariance matrices in portfolio optimisation is a must-read: Marcos López de Prado, “Advances in Financial Machine Learning”.