
Brandon Lau

Timely Considerations

Forecasting is one of the more complicated challenges to tackle when it comes to data, and before me lies such a challenge, sprawled out on the Python-embroidered divan of my notebook like one of Leo's proverbial French girls.
But unlike our maritime Romeo's task, time series forecasting isn't as simple as putting charcoal to paper to describe the dimly lit subject before you. Sequentially dependent data presents a number of difficulties that simpler datasets aren't concerned with.

Let's take a look at an example of such a series in the ever intriguing Bitcoin data.
[Figure: Bitcoin price over time]

This sort of graph isn't unfamiliar to anyone who's ever sat in a math class, and it turns out that much of the data in the world is time-dependent. The graph shows a significant amount of volatility, trends in both the positive and negative directions, and no clear periodicity. How would one tackle a forecasting problem using data like this?

Because of the auto-correlation between the current observation and past observations, several features become very important when approaching time series:

  • Level - What are the baseline values of my series if it were "flattened?"
  • Trend - Is there an overall positive/negative drift to my data over time?
  • Seasonality - Are there cyclical patterns in my data, either regular or irregular?
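To make these three components concrete, here is a rough sketch of an additive decomposition on synthetic data. The function name and the moving-average approach are my own illustration (and the moving average is biased at the series edges); in practice you'd reach for something like statsmodels' `seasonal_decompose`:

```python
import numpy as np

def naive_decompose(series, period):
    """Crude additive decomposition: trend via a moving average,
    seasonality via per-phase means of the detrended series."""
    series = np.asarray(series, dtype=float)
    # Moving average as the trend estimate (zero-padded at the edges,
    # so the first/last few trend values are biased low)
    kernel = np.ones(period) / period
    trend = np.convolve(series, kernel, mode="same")
    detrended = series - trend
    # Average each phase of the cycle to estimate the seasonal pattern
    seasonal = np.array([detrended[i::period].mean() for i in range(period)])
    seasonal = np.tile(seasonal, len(series) // period + 1)[: len(series)]
    level = series.mean()
    return level, trend, seasonal

# Synthetic series: level 10, upward trend, period-4 seasonality
t = np.arange(40)
y = 10 + 0.5 * t + 2 * np.sin(2 * np.pi * t / 4)
level, trend, seasonal = naive_decompose(y, period=4)
```

Each returned piece maps to one bullet above: `level` is the flattened baseline, `trend` is the drift, and `seasonal` is the repeating cycle.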

Possible Approaches

Model-less

The most naive take is a model-less approach: simply take the mean/median (depending on the structure and volatility of your data) at certain times in the past to make predictions for similar times in the future. While simplistic, this approach often provides reasonable results and can actually be difficult to surpass with more complicated methods.
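As a sketch of what that looks like in practice (the function name and data here are hypothetical), a seasonal-naive forecast predicts each future point as the mean of past observations at the same point in the cycle:

```python
import numpy as np

def seasonal_naive_forecast(history, season_length, horizon):
    """Model-less forecast: each future point is the mean of past
    observations at the same phase of the cycle (e.g., same weekday)."""
    history = np.asarray(history, dtype=float)
    forecast = []
    for h in range(horizon):
        phase = (len(history) + h) % season_length
        forecast.append(history[phase::season_length].mean())
    return np.array(forecast)

# Two weeks of daily values with a weekly pattern
history = [5, 6, 7, 9, 8, 3, 2, 5, 7, 7, 10, 8, 4, 2]
fc = seasonal_naive_forecast(history, season_length=7, horizon=3)
# fc is [5.0, 6.5, 7.0]: each entry is the mean of past same-weekday values
```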

ARIMA and its offspring

One of the simplest models for time series modeling is the ARIMA model. ARIMA stands for auto-regressive integrated moving average, and it is fairly adept at handling simple time series. However, ARIMA makes several assumptions that may not hold for real-world data, namely:

  • The data is stationary
  • The data is univariate

It also helps if your data is relatively free of missing values and extreme outliers. Obviously most real-world data violates these assumptions to one degree or another, though there are ways to manipulate your data to accommodate said violations. For stationarity you can difference the series, subtracting previous terms from current terms to remove seasonality and trend. For volatile data like the Bitcoin series shown above, you may be able to break the dataset into smaller periods, apply ARIMA over each sub-period, and later combine the results for a better overall prediction. There are also variations on the basic ARIMA structure, such as SARIMA (seasonal ARIMA), that attempt to account for data that does not naturally follow the assumptions of the base model.
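Differencing, the "I" in ARIMA, is easy to sketch by hand (in practice a library handles it through the model's order parameters). This hypothetical helper shows how a lag-1 difference flattens a linear trend and a seasonal-lag difference removes a repeating cycle:

```python
import numpy as np

def difference(series, lag=1):
    """Differencing: y'[t] = y[t] - y[t - lag].
    lag=1 removes a linear trend; lag=season_length removes seasonality."""
    series = np.asarray(series, dtype=float)
    return series[lag:] - series[:-lag]

# A series with a pure linear trend differences to a constant...
trend = np.arange(10, dtype=float) * 2.0
flat = difference(trend)              # every element is 2.0 -> stationary

# ...and seasonal differencing removes a repeating period-3 cycle
seasonal = np.tile([1.0, 3.0, 2.0], 5)
zeros = difference(seasonal, lag=3)   # every element is 0.0
```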

Nets

As the popularity of neural nets continues to grow, it is only natural that they would be applied to time series. As it turns out, there are several fields where NNs are the best approach, or provide significant boosts over classical approaches. Neural nets have the potential to interpret structures and meaning in our data that may be difficult to extract using classical methods.

  • Language - Sentences and the meaning of the words within them are inherently dependent on sequence, and neural nets have proven a powerful tool for capturing that information in tasks such as translation and word generation.
  • Audio - Whether it be music, sounds, or spoken word, audio data is also sequentially dependent. Each note determines, or at least heavily influences, the notes that come after it.
  • Visual - In computer vision, images can be processed as sequences of encoded pixels, transforming something that may not intuitively be a sequence into a time series. The order of the pixels determines the overall "meaning" of an image, and interpreting them out of order would ignore much of the inherent information in a picture. Neural nets are integral to this field, as classical methods were previously unable to make much headway.

While a well-constructed neural net may be able to provide results in any given application, neural nets are computationally and temporally expensive, and they may be wholly unnecessary if they provide only slight improvements over much simpler methods.

Considerations

As with any modeling approach there are less obvious factors that one must take into account when dealing with time series.

How much data do I have to work with?
Depending on the density/quantity of your data and how far into the future you are attempting to predict, this question can be vital. Generally, the more data you have at your disposal and the smaller the horizon of your prediction, the less of an issue this will be. But in the real world you are often forced to work with limited data and may be required to make predictions far enough out that you feel it better to just pray to RNG-sus for the answer. This can also make the traditional train-test split used to evaluate the efficacy of a model a significant challenge.
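One common answer to the splitting problem is a walk-forward (chronological) split, where each test point is predicted using only the observations that precede it. A minimal sketch with hypothetical names (libraries like scikit-learn offer a similar idea as `TimeSeriesSplit`):

```python
def walk_forward_splits(n_obs, n_test):
    """Yield (train_indices, test_index) pairs in chronological order:
    no shuffling, so future values never leak into the training set."""
    for i in range(n_obs - n_test, n_obs):
        yield list(range(i)), i

# Evaluate on the last 3 points of a 100-point series
splits = list(walk_forward_splits(100, n_test=3))
for train_idx, test_idx in splits:
    print(f"train on first {len(train_idx)} points, test on index {test_idx}")
```

Each step's training set grows by one observation, mimicking how the model would actually be used over time.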

What's the uncertainty of my model?
Sometimes the accuracy of a prediction is less important than the confidence you have in that prediction. Spot-on predictions are nearly impossible in the real world given the nigh-infinite influences that may be present within or outside of your data, and so the uncertainty of your predictions becomes an important detail. In the financial world, uncertainty is sometimes even the focus of your modeling, as opposed to a point forecast.
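A crude way to put numbers on that uncertainty is an empirical prediction interval built from past one-step changes. This is a sketch under a random-walk assumption (the function name and quantile-of-past-changes approach are illustrative, not a standard API):

```python
import numpy as np

def naive_interval(history, horizon=1, coverage=0.8):
    """Point forecast = last value; interval from the empirical
    quantiles of past one-step changes, widened by sqrt(horizon)
    under a random-walk assumption."""
    history = np.asarray(history, dtype=float)
    steps = np.diff(history)
    lo, hi = np.quantile(steps, [(1 - coverage) / 2, (1 + coverage) / 2])
    point = history[-1]
    return point + lo * np.sqrt(horizon), point, point + hi * np.sqrt(horizon)

# A synthetic random-walk "price" series
rng = np.random.default_rng(0)
prices = np.cumsum(rng.normal(0, 1, 500)) + 100
low, point, high = naive_interval(prices, horizon=5)
```

Note how the interval, not the point, is what widens as the horizon grows; for a volatile series the honest statement is the band, not the line.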

How often do I need to retrain my model?
Many time series problems involve data that is highly variable and shows significant differences in behavior over time. Image data is one instance where this is less of a factor, as most objects maintain their overall appearance over time (cats twenty years ago looked much like cats today, and probably like cats twenty years from now). Financial data, however, does not typically have this property. Effective models must therefore be retrained on new data on a regular basis. For the model to correct its outputs to reflect changes in behavior or new external pressures, it needs to see up-to-date info that wasn't present in your original training. Otherwise you end up with predictions that eventually shift to a constant, like this:
[Figure: model predictions flattening to a constant]
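The retraining idea can be sketched with a toy "model" (here just the mean of a recent window, purely illustrative): refit on the most recent data before every prediction, so forecasts track a level shift instead of freezing at the old level.

```python
import numpy as np

def rolling_retrain_forecast(series, window=50):
    """'Retrain' a trivial model (the window mean) on only the most
    recent `window` points before each one-step-ahead prediction."""
    series = np.asarray(series, dtype=float)
    preds = []
    for t in range(window, len(series)):
        model = series[t - window:t].mean()  # refit on recent data only
        preds.append(model)
    return np.array(preds)

# A series whose level jumps halfway through; a model trained once on
# the first half would keep predicting the old level of 10 forever
series = np.concatenate([np.full(100, 10.0), np.full(100, 50.0)])
preds = rolling_retrain_forecast(series, window=50)
# Early predictions are 10.0; after the shift they converge to 50.0
```

A real workflow would refit an actual model (ARIMA, a net) on a schedule or a trigger, but the shape of the loop is the same.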

All that said, time series are a relatively complicated challenge to tackle and require a different set of approaches that non-time-series datasets may not.
