Systematic Portfolio Analysis and Construction with Time-series Data
A Guide to Successful Forecast Generation
Some of you might be familiar with traditional forecasting methods. Many common-practice models extrapolate over time, taking signals in a dataset, learning a function, and extrapolating it out into the future. The goal is to use all the available data from the history of the dataset to predict a single point in the future.
Here, we’ll focus on multiple-input, single-output time-series problems, and on how they can be generalised further as a data-driven approach to predictive modeling, signal discovery, and regression. We'll cover a multivariate time-series regression approach, which takes a slightly different stance: instead of predicting arbitrarily into the future, it focuses on one or more specific time points in the future and tries to predict those points well.
We'll address time-series and consider how the signals that we choose have a large influence.
Most importantly, we'll examine the importance of good signals for predicting uncertainties, and touch upon a few prediction methods based on forward-looking uncertainties provided by models rather than backward-looking, historical estimates.
So what do we mean by time-series data?
In the world of securities, we could be dealing with assets, returns, commodities: anything that varies over time, irrespective of the data's cadence.
We call this a time-series.
A univariate time-series only varies one parameter over time. So a good example of this could be the price of gold over time. As we go into the land of multivariate time-series, we extend our data to encapsulate more than one variable over time.
With gold prices, we have some fundamental data we can see. The price four weeks ago is a feature. The rolling variance is a feature. Any technical indicator is a feature. And so on.
With time-series regression, we first transform the problem from a time-series problem into a regression problem: each slice of history, summarised as features alongside a future target value, forms an observation we would use to train our model.
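As a minimal sketch of that transformation (synthetic prices and hypothetical window lengths, purely for illustration), lagged values and rolling statistics become the columns of a regression dataset:

```python
import numpy as np
import pandas as pd

# Synthetic weekly "gold" prices; the random walk stands in for real data
rng = np.random.default_rng(0)
prices = pd.Series(100 + np.cumsum(rng.normal(0, 1, 60)), name="gold")

# Each row of the feature matrix summarises the history up to that week
features = pd.DataFrame({
    "lag_1": prices.shift(1),               # price last week
    "lag_4": prices.shift(4),               # price four weeks ago
    "roll_var_8": prices.rolling(8).var(),  # 8-week rolling variance
})
target = prices.shift(-1)                   # next week's price is the label

# Drop rows where a lag or rolling window is incomplete
dataset = features.assign(target=target).dropna()
```

Each remaining row is one observation for a standard regression model; the time-series structure now lives entirely in the engineered columns.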
So let's move on to feature generation and what we mean by feature generation in a time-series sense.
Feature generation is all about being able to encapsulate, in the case of time-series, relevant and discovered information about the history of the data such that it maps to a single row in a dataset.
Developing a strategy for a data-driven time-series forecasting approach, we almost always start with some sort of a universe selection of instruments or securities.
We offer model backtests to get indications of how good the discovered signals are, and to see whether we can actually extract valuable signals and predictive power out of the time-series data.
A multivariate time-series regression approach generates bespoke feature sets and signal sets for each point in the future, from model backtesting of historical performance through to predicting into the future.
Feature generation can be approached differently depending on each domain. We generate features en masse, going through many stages to avoid overfitting and ballooning of dimensionality.
Some common methods involve detecting periods of high seasonality, running various scans, and computing statistics such as exponentially weighted moving averages.
With data coming in over time, we have numerous points in time representing the history of that data. We likely don't need all those points in order to be able to accurately reconstruct it. Detecting historical points of interest can be thought of in terms of how we choose as few points as possible to give an indication of the information contained within a signal.
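As one illustration of such summary statistics, an exponentially weighted moving average compresses recent history into a single value per time step, and comparing a fast and a slow average gives a crude way of flagging points of interest (spans and data are invented for this sketch):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
series = pd.Series(np.cumsum(rng.normal(size=200)))

# Exponentially weighted moving averages: recent points dominate, so each
# value is a compact summary of the series' recent history
ewma_fast = series.ewm(span=5).mean()
ewma_slow = series.ewm(span=40).mean()

# A crude "point of interest" flag: wherever the fast average crosses the slow
crossings = np.sign(ewma_fast - ewma_slow).diff().fillna(0) != 0
```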
Questions that must be considered at this point:
- How do we start from an initially selected universe of instruments?
- How do we create predictive models that predict each one either independently or jointly?
- How do we discover signals that are able to offer predictive information about them?
- How do we backtest our models?
- And ultimately, how might we start to think about it in a joint sense and build a predictive uncertainty based strategy around forecasting?
Uncertainty is provided by variances and the like, so it will often be conceptual.
Before we do any sort of portfolio construction, at least in this approach, we will predict each relevant data source independently. We'll formulate a data set, and then create a model that is able to forecast the point prediction and uncertainty. A lot of focus around this step must be on generating signals that are able to generate good uncertainties through robust subset selection.
Now, we actually want to predict a portfolio, so that we can perform eventual allocation, repeating this process for every security or index in our data set. We construct a dataset of historical time-series data that encapsulates fundamental and technical information about a number of different commodities.
In the case of gold data, we'll deal with roughly thirteen years' worth of weekly data, just to get an idea of sizes.
Before we go onto signal or feature generation, we should address data conditioning. With any sort of machine learning model, it's important to make sure that that data has been processed appropriately for the model at hand.
The number of times we, across our company, see people make mistakes here, or at least need guidance on using very complex models with ill-defined data, means it is worth highlighting the importance of data conditioning.
There are four things to discuss with time-series specifically. The first is detrending.
If we're just taking prices, they'll have various trends. It's rarely a good idea to predict trended values directly, especially if we're dealing with models that are highly nonlinear. We want to go through a detrending step and predict something that has been detrended.
In the simplest case, that might mean predicting returns rather than raw prices. More sophisticated models such as ETS and various other exponential-smoothing families explicitly remove the trend and predict the differences downstream, knowing that the data they model will not have any persistent linear or exponential trends.
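A minimal numeric sketch of the simplest detrending options mentioned above, returns and first differences (toy prices only):

```python
import numpy as np

prices = np.array([100.0, 102.0, 101.0, 105.0, 107.0])

# Simple returns remove the price level, and with it any persistent trend
returns = prices[1:] / prices[:-1] - 1.0

# First differences are the additive analogue of returns
diffs = np.diff(prices)
```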
To further data conditioning, we should also talk about stationarisation.
Stationarisation is incredibly important for nonlinear models. Nonlinear models are often very bad at extrapolating to values they have never seen before. Specifically, with time-series data where underlying statistics such as means and variances fluctuate over time, we place an assumption on the sort of lookback window that we'll calibrate to. We want to do everything we can to make the statistical drivers of the signal as time-independent as possible. There are a number of ways to measure stationarity.
Make sure to remove as much time dependence in the statistical signals as possible, and with stationarisation we can attempt to do that.
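One crude check of this, a sketch rather than a formal test such as the augmented Dickey-Fuller, is to compare summary statistics across consecutive windows: for stationary data the window means cluster, while for a random walk they typically drift apart.

```python
import numpy as np

rng = np.random.default_rng(2)
stationary = rng.normal(0, 1, 400)           # white noise: stable statistics
trending = np.cumsum(rng.normal(0, 1, 400))  # random walk: drifting statistics

def window_means(x, k=4):
    """Mean of each of k equal, consecutive windows of x."""
    return np.array([w.mean() for w in np.array_split(x, k)])

# Spread of the window means: typically small for stationary data,
# typically large for a drifting series
spread_stationary = np.ptp(window_means(stationary))
spread_trending = np.ptp(window_means(trending))
```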
Data alignment can often be problematic. With technical indicators, fundamental indicators, and alternative data combined, the data is very unlikely to all arrive at the same time. You might have some which is weekly, alongside some which is daily or hourly.
It's important to align your data appropriately, by forward filling, resampling, or more sophisticated methods. Similarly with encoding: data often comes in strange formats (tweets, for example), and we need to encode it into a binarised or other numeric representation that a machine learning model can take in.
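A small pandas sketch of the alignment step, with hypothetical daily prices and weekly fundamentals (all names and dates invented):

```python
import pandas as pd

# Hypothetical mixed-frequency data: daily prices, weekly fundamentals
daily = pd.Series(
    range(10),
    index=pd.date_range("2024-01-01", periods=10, freq="D"),
    name="price",
)
weekly = pd.Series(
    [1.0, 2.0],
    index=pd.date_range("2024-01-01", periods=2, freq="W-MON"),
    name="fundamental",
)

# Align on the daily grid, forward-filling the slower series
aligned = pd.concat([daily, weekly], axis=1).ffill()
```

Forward filling is the conservative default here: it only ever propagates values known at that point in time, so it cannot leak future information.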
Now, any one of those streams can generate thousands of signals: far too many for any sort of robustness, and with redundancies beyond what most machine learning models can diminish or manage.
There are a number of different approaches to reduce these—methods ranging from correlations, covariances, and information based approaches. It depends on the domain and the data, but often you can optimize and find the best approach for initial filtering to reduce signals down to something more manageable.
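As one simple instance of correlation-based filtering (synthetic signals and an arbitrary top-k, purely illustrative), rank candidates by absolute correlation with the target and keep the strongest:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
target = rng.normal(size=n)

# Hypothetical candidate signals: one informative, one noisy, one redundant
signals = {
    "informative": target * 0.8 + rng.normal(scale=0.5, size=n),
    "noisy": rng.normal(size=n),
}
signals["redundant"] = signals["informative"] + rng.normal(scale=0.01, size=n)

# Rank by absolute correlation with the target; keep the top two
corrs = {name: abs(np.corrcoef(x, target)[0, 1]) for name, x in signals.items()}
keep = sorted(corrs, key=corrs.get, reverse=True)[:2]
```

Note that pure target-correlation filtering keeps the redundant copy too; removing such redundancy is exactly what the later subset-selection stage is for.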
One thing that's incredibly important, especially if we want to get any idea of future uncertainty in terms of predictions, is that we need to make sure our input signals going into the model are sparse.
We want to be able to select a range of signals that are both stable and predictive. Everything we have done so far has been unbiased, in that it has not taken into account how good these signals are at predicting the target, or even its volatility.
This is actually a very computationally intensive task in many cases. Internally, we use an exhaustive feature-set optimisation. As we go from the hundreds into the dozens, the overall aim is to reach signals that are not only good at predicting, but also uncorrelated, with very few redundancies. To accomplish this, select signals which are inherently predictive rather than allowing that burden to fall on the model itself.
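The exhaustive optimisation itself isn't specified here, but a greedy sketch captures the spirit: repeatedly admit the most predictive candidate whose correlation with everything already selected stays below a threshold (the data and the 0.7 cutoff are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 400
base = rng.normal(size=n)
candidates = {
    "a": base + rng.normal(scale=0.3, size=n),
    "b": base + rng.normal(scale=0.3, size=n),  # highly correlated with "a"
    "c": rng.normal(size=n),                    # independent of "a" and "b"
}
target = base + 0.5 * candidates["c"] + rng.normal(scale=0.3, size=n)

def greedy_select(cands, y, max_corr=0.7):
    """Admit candidates in order of predictiveness, skipping any that are
    too correlated with an already-selected signal."""
    scored = sorted(cands, key=lambda k: -abs(np.corrcoef(cands[k], y)[0, 1]))
    chosen = []
    for name in scored:
        if all(abs(np.corrcoef(cands[name], cands[c])[0, 1]) < max_corr
               for c in chosen):
            chosen.append(name)
    return chosen

selected = greedy_select(candidates, target)
```

On this toy data, only one of the near-duplicates "a" and "b" survives, while the independent, predictive "c" is kept.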
So in the case of gold, if we're trying to predict the return of gold tomorrow while looking back three years, the selected signals should make fundamental sense; if they don't, that should raise questions.
That's the overlap between how we can adopt a fundamental view on the signals, especially if we're not using super complicated models to make the predictions and we favor sort of well principled and formulated feature sets.
Once we've got these signal sets, how do we test whether they're any good?
Much of the work around time-series regression is generating these features such that we encapsulate the time-series problem as a regression problem. We can then train whatever model we use in a general regression sense.
Take our feature sets, which we have carefully selected to be good at predicting the future at a given time point, and put those into a regression model. If we want to get predictive uncertainties out, we should make sure the model is able to form a well-principled uncertainty model.
There are a number of things we need to consider when modeling, and that's not just the capability of the model itself to provide uncertainty estimates, but actually how good was our signal set to begin with. Often, if we start off with a good signal set, the model itself, as long as it's well principled, doesn't need to be complex.
Taking gold as an example, the outputs of this model will be the predictive uncertainty on the predicted return of gold, and we will get that as a point prediction for any given point in time.
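One well-principled way to obtain both a point prediction and a predictive variance, shown here as a generic sketch rather than the specific method used in this approach, is Bayesian linear regression with a Gaussian prior and known noise, which has a closed form:

```python
import numpy as np

rng = np.random.default_rng(4)
n, d = 200, 3
X = rng.normal(size=(n, d))
true_w = np.array([0.5, -0.3, 0.1])          # synthetic ground-truth weights
y = X @ true_w + rng.normal(scale=0.2, size=n)

# Gaussian prior with precision alpha; known noise variance sigma2
alpha, sigma2 = 1.0, 0.04
A = X.T @ X / sigma2 + alpha * np.eye(d)     # posterior precision matrix
cov = np.linalg.inv(A)                       # posterior covariance of weights
mean_w = cov @ X.T @ y / sigma2              # posterior mean of weights

x_new = np.array([1.0, 0.0, -1.0])
pred_mean = x_new @ mean_w                   # point prediction
pred_var = sigma2 + x_new @ cov @ x_new      # noise + parameter uncertainty
```

The predictive variance decomposes into irreducible noise plus uncertainty about the weights, which is exactly the forward-looking quantity discussed here.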
How do we test this model on historical, real-world data, remaining cautious against overfitting or anything that will mislead us?
The crucial point here is that model backtesting is different from strategy backtesting. We test how well the model performs at predicting with the given signal sets on historical data. More specifically, we track performance against historical data that was not used to generate the initial signals in the first place. Usually, holding out around 25-30% of your historical data is a good starting point for backtesting.
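A minimal sketch of that holdout, assuming weekly data: the split must be chronological, since shuffling would leak future information into training.

```python
import numpy as np

n = 520                              # ~10 years of weekly observations
X = np.arange(n, dtype=float).reshape(-1, 1)
y = np.arange(n, dtype=float)

# Chronological split: the last ~25% is held out for the model backtest.
# Never shuffle time-series data before splitting.
split = int(n * 0.75)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]
```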
Any good model-backtesting metric should eventually incorporate not just the error term, but also any relevant uncertainty estimates around it.
With anything that we do, we want to know exactly why a model does what it does, even if it’s just a single output. In fact, if we're relying on uncertainty, it's even more important that we know this.
There are three ways to analyze the influence of the generated signals, specifically the smaller signal sets. A factor-based approach is probably the simplest of the lot, but it is also the least biased.
When we generated the thousands of features, we did some initial filtering to reduce them, not necessarily into the robust subsets that we knew were independent, but rather initial subsets in the ballpark of hundreds. We can then filter by looking at the ones which have the highest correlation or mutual information with the target points in the future.
Considering the gold return for one week in the future, we would actually map them back to the original time-series we uploaded.
That's incredibly useful for identifying influences that might actually be important factors for all predictive models, even if we separate predictive models and signal sets for our individual forecasts.
We then move on to signal-based influences—the robust subsets we were talking about earlier. We should be able to believe conceptually that the signals going into a model are not only independent, but also make fundamental sense. If it raises any alarm bells, we should surface that, and we certainly shouldn't assume that everything is fine.
Depending on the methods used to collect a robust subset of signals, this signal-based analysis could be biased or unbiased in the eyes of the model.
If we've used a model to get to that subset, of course, it is biased. If we're doing some other sort of statistical approach, or an unsupervised approach, then it's much less biased.
It’s worth looking at model-based signal influences. Once we train a model, what is the model-centric view of the signal importances? Just because we fed five signals into a model doesn't mean they're all going to be equally important. In fact, if it's a linear model, the outputs might even be just the relevant correlation factors with the target.
Central to this is the importance of the signals through the eyes of the predictive algorithm. It’s a biased method, but ultimately it tells us the decisioning of the model itself. It’s important that we monitor this as well; for algorithms that are complex enough, it should be reported.
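For the linear case mentioned above, the fitted coefficients on standardised inputs are the model-centric influences; a least-squares sketch on synthetic data (signal names and weights invented):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 300
X = rng.normal(size=(n, 3))            # three standardised input signals
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=n)  # signal 3 unused

# For a linear model, the fitted coefficients are the model-centric signal
# influences (inputs here share a scale, so magnitudes are comparable)
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
importance = np.abs(coefs)
ranking = np.argsort(importance)[::-1]  # most influential signal first
```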
Successfully generated forecasts
That completes our end-to-end process for generating forecasts. Through a robust method of generating features, filtering features, and selecting features in multiple ways, we can come up with good estimates using the right algorithms.
From a practitioner perspective, the takeaway here is that every predictive model should generate forward-looking uncertainty estimates. As long as we know that these uncertainty estimates came from signals that were statistically robust, uncorrelated, and independent, we can trust, through the backtests, that the uncertainties will prove to be accurate and useful.
That's arguably better at predicting variance than any sort of historic-based approaches. But, of course, this is all just for one indicator. The final crucial point is that we need to make sure that our predictive uncertainty is updated correctly with every prediction for every security. Whether that's done jointly as a bigger model or independently, the most important thing is having good signal sets at the start. We want to repeat this for every security that we're trying to predict.
We can generate a multi-input, multi-output forecast set with robust uncertainty estimates and see good returns over the last few years.
There are many merits in predictive uncertainties as a whole. If there's one takeaway message I would like to share, it’s that however we slice things, it all comes down to selecting robust signals.
Any type of forecast model should, in the end, be explainable.
I'm the product owner for Mind Foundry Horizon, which is our time-series offering for the quantitative finance user. We do three things in Horizon, starting with signal engineering, where you can upload your time-series data. Whether that's financial data, alternative data, or anything that varies over time really, it will automatically allow you to either generate your own features or scan and select statistically significant features to help in your predictive modeling. Whether you're quantitative or fundamental, we can generate signals that suit the relevant use case where you do forecasts.
How do you address overfitting in the context of predictive uncertainty?
Overfitting is obviously a big problem, and not just with time-series, but in data science and machine learning as a whole. Predictive uncertainties are not going to be of any use if your model doesn't have good protections against overfitting, or if you haven't selected your signals well enough and they are too redundant.
There are a number of ways in which we can protect against overfitting, and we've gone through at least three stages of dimensionality reduction. Crucially, data separation is important. Being able to select good uncorrelated signals is important. And thirdly, utilise a model with good shrinkage.
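Shrinkage in its simplest form is ridge regression, sketched here in closed form on synthetic data: the L2 penalty pulls the whole coefficient vector toward zero, which is the overfitting protection referred to above (the penalty strength is an arbitrary choice for this illustration).

```python
import numpy as np

rng = np.random.default_rng(7)
n, d = 50, 20                    # few observations, many signals: overfitting risk
X = rng.normal(size=(n, d))
y = X[:, 0] + rng.normal(scale=0.5, size=n)   # only the first signal matters

# Ridge regression in closed form: (X'X + lam*I)^-1 X'y
lam = 25.0
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
w_ols = np.linalg.lstsq(X, y, rcond=None)[0]  # unpenalised baseline

# The penalty strictly shrinks the coefficient norm relative to OLS
shrinkage = np.linalg.norm(w_ridge) / np.linalg.norm(w_ols)
```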
Could neural networks be used in this type of approach?
Certainly, we've seen a lot recently around LSTM-based approaches to forecasting, which can often be great. The places in which neural networks would fit best within this approach are actually in the feature discovery itself. That is to say, if we can use a neural network to generate good cross-sectional features, we don't need to rely on a complex neural network to make the predictions themselves. Part of that is because it's harder to get good error bounds out of neural networks, though it is possible. It very much depends on the problem that you're looking to solve.
From my experiences in the past, I've found that with this type of data where you don't need to look back too far in a lot of cases, and you're often relying on a good isolation between signals, from the transparency and explainability standpoint, I'd rather go for complex signals and humble models than vice versa.
How do you merge factors and signals in the model and the analysis?
Consider cross-sectional inputs that are inherently modelled by uploading or processing the cross-sectional dimensions. In terms of cross-security, cross-asset effects, we'll be able to model that. We then have the concept of how we model the predictive covariance or the historical covariance in a model; a more traditional panel-type approach would be required to do that. With fixed effects or random effects at one end, we can combine those with sophisticated feature generation to have signals that take into account a full cross-section of data, both temporally and in terms of parameters. In short, we've seen comparable performance between the two, certainly in terms of predicting returns. The long-term direction that will yield the best results is combining panel methods for feature generation rather than panel methods for the prediction itself.
Continue your journey in Part 2.