Open Machine Learning Course. Topic 9. Part 1. Time series analysis in Python

We can approach the prediction task using different methods, depending on the required quality of the prediction, the length of the forecasted period, and, of course, the time we have to choose features and tune parameters to achieve the desired results.

As an example, let's use some real mobile game data on hourly ads watched by players and daily in-game currency spent. Before actually forecasting, let's understand how to measure the quality of predictions and have a look at the most common and widely used metrics. Excellent: now we know how to measure the quality of the forecasts, which metrics to use, and how to translate the results to the boss.
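
For instance, here is a minimal sketch of how such metrics can be computed in Python; the toy y_true / y_pred arrays and the hand-rolled MAPE helper below are illustrative assumptions, not values from the series above:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# hypothetical actual vs. predicted values, just to show the metric calls
y_true = np.array([120.0, 130.0, 125.0, 140.0])
y_pred = np.array([118.0, 133.0, 121.0, 138.0])

def mean_absolute_percentage_error(y_true, y_pred):
    """MAPE: average relative error, easy to report as a percentage."""
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

print(mean_absolute_error(y_true, y_pred))
print(mean_squared_error(y_true, y_pred))
print(mean_absolute_percentage_error(y_true, y_pred))
```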

Let's start with a naive hypothesis: "tomorrow will be the same as today". But instead of a model like ŷ(t)=y(t−1) (which is actually a great baseline for any time series prediction problem, and sometimes it's impossible to beat it with any model) we'll assume that the future value of the variable depends on the average of its n previous values, and therefore we'll use the moving average.
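
A minimal sketch of this moving-average baseline, assuming the ads data lives in a pandas Series with an hourly DatetimeIndex (the toy values below are made up):

```python
import pandas as pd

def moving_average(series, n):
    """Predict the next value as the plain mean of the last n observations."""
    return series.iloc[-n:].mean()

# toy hourly series standing in for the ads data
ads = pd.Series([121, 130, 128, 135, 140, 138],
                index=pd.date_range("2017-09-01", periods=6, freq="h"))
print(moving_average(ads, 4))   # average of the last 4 hours
```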

Smoothing by the last 4 hours: plotMovingAverage(ads, 4). Smoothing by the last 12 hours: plotMovingAverage(ads, 12). Smoothing by 24 hours, which gives us the daily trend: plotMovingAverage(ads, 24). As you can see, applying daily smoothing to the hourly data allowed us to clearly see the dynamics of ads watched.

And now let's take a look at what happens if, instead of weighting the last n values of the time series, we start weighting all available observations while exponentially decreasing the weights as we move further back in the historical data.
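
A sketch of single exponential smoothing along these lines, where the smoothing factor alpha controls how quickly older observations are forgotten; the sample series is illustrative:

```python
def exponential_smoothing(series, alpha):
    """Exponentially weighted average over the whole history.

    alpha in [0, 1]: the higher alpha, the faster old observations are forgotten.
    """
    result = [series[0]]  # the first smoothed value equals the first observation
    for t in range(1, len(series)):
        result.append(alpha * series[t] + (1 - alpha) * result[t - 1])
    return result

print(exponential_smoothing([3, 10, 12, 13, 12, 10, 12], alpha=0.3))
```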

Until now, all we could get from our methods, in the best case, was a single future point prediction (and also some nice smoothing). That's cool but not enough, so let's extend exponential smoothing so that we can predict two future points (of course, we also get some smoothing).

We've learned to predict the intercept (or expected series value) using the previous methods, and now we will apply the same exponential smoothing to the trend, believing, naively or perhaps not, that the future direction of the time series changes depends on the previous weighted changes.
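
One possible sketch of this double exponential smoothing (Holt's method), keeping a separately smoothed level and trend; the sample series and the alpha/beta values are illustrative:

```python
def double_exponential_smoothing(series, alpha, beta):
    """Holt's method: smooth the level (intercept) and the trend separately,
    which lets us forecast one step beyond the observed data."""
    level, trend = series[0], series[1] - series[0]
    result = [series[0]]
    for t in range(1, len(series) + 1):
        value = series[t] if t < len(series) else result[-1]  # last step: forecast
        last_level = level
        level = alpha * value + (1 - alpha) * (level + trend)
        trend = beta * (level - last_level) + (1 - beta) * trend
        result.append(level + trend)
    return result

print(double_exponential_smoothing([3, 10, 12, 13, 12, 10, 12],
                                   alpha=0.9, beta=0.9))
```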

Now we get a new system: the intercept now depends on the current value of the series minus the corresponding seasonal component, the trend stays unchanged, and the seasonal component depends on the current value of the series minus the intercept and on the previous value of the component.
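
A sketch of a single additive Holt-Winters update implementing the system described above; the function name and the sample numbers are assumptions for illustration:

```python
def holt_winters_step(value, level, trend, seasonal, alpha, beta, gamma):
    """One additive Holt-Winters update, mirroring the system described above.

    `seasonal` is the seasonal component estimated one full season (slen steps) ago.
    The one-step forecast is then new_level + new_trend plus the seasonal component
    of the step being forecast.
    """
    new_level = alpha * (value - seasonal) + (1 - alpha) * (level + trend)
    new_trend = beta * (new_level - level) + (1 - beta) * trend
    new_seasonal = gamma * (value - new_level) + (1 - gamma) * seasonal
    return new_level, new_trend, new_seasonal

print(holt_winters_step(value=150.0, level=140.0, trend=1.5, seasonal=5.0,
                        alpha=0.3, beta=0.05, gamma=0.1))
```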

Then, using cross-validation, we will evaluate our chosen loss function for the given model parameters, calculate the gradient, adjust the model parameters, and so forth, bravely descending towards the global minimum of error.

The question is how to do cross-validation on time series, because time series have a temporal structure and one simply can't randomly mix values in a fold without preserving that structure; otherwise all time dependencies between observations will be lost.

That's why we will have to use a slightly trickier approach to optimizing the model parameters. I don't know if there's an official name for it, but on CrossValidated, where one can find all the answers but the Answer to the Ultimate Question of Life, the Universe, and Everything, "cross-validation on a rolling basis" was proposed as a name.
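
A minimal sketch of such rolling cross-validation using scikit-learn's TimeSeriesSplit; the stand-in series is illustrative:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# "cross-validation on a rolling basis": every fold trains on the past and
# validates on the block that immediately follows it, so time order is preserved
series = np.arange(20)              # stand-in for the time series values
tscv = TimeSeriesSplit(n_splits=4)
for train_idx, test_idx in tscv.split(series):
    print(f"train: {train_idx[0]}-{train_idx[-1]}  test: {test_idx[0]}-{test_idx[-1]}")
```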

Now, knowing how to set up cross-validation, we will find the optimal parameters for the Holt-Winters model; recall that we have daily seasonality in the ads data, hence the slen=24 parameter. In the Holt-Winters model, as well as in the other exponential smoothing models, there is a constraint on how large the smoothing parameters can be: each of them lies in the range from 0 to 1. Therefore, to minimize the loss function, we have to choose an algorithm that supports constraints on model parameters; in our case, Truncated Newton conjugate gradient.
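
A hedged sketch of this constrained optimization with scipy's minimize and the TNC method; the cv_loss function below is a dummy placeholder standing in for the rolling cross-validation error of the Holt-Winters model:

```python
from scipy.optimize import minimize

def cv_loss(params):
    """Placeholder for the rolling cross-validation error of the Holt-Winters
    model as a function of (alpha, beta, gamma); a dummy quadratic stands in here."""
    alpha, beta, gamma = params
    return (alpha - 0.1) ** 2 + (beta - 0.02) ** 2 + (gamma - 0.3) ** 2

# Truncated Newton conjugate gradient supports the box constraints 0 <= param <= 1
opt = minimize(cv_loss, x0=[0.0, 0.0, 0.0], method="TNC",
               bounds=((0, 1), (0, 1), (0, 1)))
alpha_opt, beta_opt, gamma_opt = opt.x
print(alpha_opt, beta_opt, gamma_opt)
```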

If you take a look at the modeled deviation, you can clearly see that the model reacts quite sharply to changes in the structure of the series but then quickly returns the deviation to normal values, "forgetting" the past.

We'll apply the same algorithm to the second series which, as we know, has a trend and 30-day seasonality. The result looks quite adequate: the model has caught both the upward trend and the seasonal spikes, and overall fits our values nicely. Before we start modeling, we should mention an important property of time series: stationarity.

If a process is stationary, that means it doesn't change its statistical properties over time; namely, its mean and variance do not change over time (constancy of variance is also called homoscedasticity), and its covariance function does not depend on time (it should only depend on the distance between observations).
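
One common way to check stationarity in practice is the augmented Dickey-Fuller test from statsmodels; the white-noise and random-walk series below are synthetic examples, not the ads data:

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

# augmented Dickey-Fuller test: the null hypothesis is that the series has a
# unit root, i.e. is non-stationary; a small p-value lets us reject that null
np.random.seed(42)
white_noise = np.random.normal(size=500)   # stationary by construction
random_walk = np.cumsum(white_noise)       # non-stationary

for name, series in [("white noise", white_noise), ("random walk", random_walk)]:
    stat, p_value = adfuller(series)[:2]
    print(f"{name}: ADF statistic = {stat:.2f}, p-value = {p_value:.3f}")
```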

After attaching the last letter, we find out that instead of one additional parameter we get three in a row: (P, D, Q). Now, knowing how to set the initial parameters, let's have a look at the final plot once again and set the parameters: tsplot(ads_diff[24+1:], lags=60). Now we want to test various models and see which one is better. Let's inspect the residuals of the model: tsplot(best_model.resid[24+1:], lags=60). It's clear that the residuals are stationary and there are no apparent autocorrelations, so let's make predictions using our model. In the end, we got quite adequate predictions: our model was wrong by 4.01% on average, which is very good, but the overall costs of preparing the data, making the series stationary, and brute-forcing the parameter selection might not be worth this accuracy.
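
A sketch of fitting one SARIMA candidate with statsmodels; the synthetic hourly series and the (2, 1, 2)(1, 1, 1, 24) orders are illustrative assumptions, not the model selected in the text:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# synthetic hourly series with a 24-hour cycle, standing in for the ads data
rng = np.random.default_rng(0)
idx = pd.date_range("2017-09-01", periods=24 * 30, freq="h")
y = pd.Series(100 + 10 * np.sin(2 * np.pi * np.arange(len(idx)) / 24)
              + rng.normal(scale=2, size=len(idx)), index=idx)

# fit one SARIMA(p, d, q)(P, D, Q, s) candidate and compare candidates by AIC
model = sm.tsa.statespace.SARIMAX(y, order=(2, 1, 2),
                                  seasonal_order=(1, 1, 1, 24)).fit(disp=False)
print(model.aic)
forecast = model.get_forecast(steps=24).predicted_mean
print(forecast.head())
```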

That means some of the models will never be "production ready": they demand too much time for data preparation (for example, SARIMA), require frequent re-training on new data (again, SARIMA), or are difficult to tune (a good example being SARIMA). So it's very often much easier to select a couple of features from the existing time series and build a simple linear regression or, say, a random forest.

Useful features include lags of the time series, of course; window statistics; date and time features; target encoding; and forecasts from other models (though we can lose the speed of prediction this way). Let's run through some of these methods and see what we can extract from our ads series. Shifting the series n steps back, we get a feature column where the current value of the time series is aligned with its value at time t−n.
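
A minimal sketch of building such lag features with pandas; the stand-in series and the 6-to-24 lag range are illustrative choices:

```python
import numpy as np
import pandas as pd

# hypothetical stand-in for the hourly ads series
data = pd.DataFrame({"y": np.arange(100.0, 160.0)})

# lag_n holds the series value n steps back, so the model learns y(t) from y(t - n)
for n in range(6, 25):
    data[f"lag_{n}"] = data["y"].shift(n)

data = data.dropna()                    # drop rows without a full set of lags
X, target = data.drop("y", axis=1), data["y"]
print(X.shape)
```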

If it's undesirable to explode the dataset with tons of dummy variables, which can lead to the loss of information about the distance, and if they can't be used as real values because of conflicts like "0 hours < 23 hours", then it's possible to encode the variable by its mean target value instead, i.e. to use target encoding.

This problem can be approached in a variety of ways; for example, we can calculate the target encoding not on the whole train set but on some window instead. That way, encodings from the last observed window will probably describe the current state of the series better.
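
A sketch of such windowed target encoding for the hour of day; the synthetic frame and the one-week window are assumptions for illustration:

```python
import numpy as np
import pandas as pd

# windowed target encoding of the hour of day: each hour is encoded by the mean
# target over the last `window` observations only, not over the whole train set
rng = np.random.default_rng(0)
idx = pd.date_range("2017-09-01", periods=24 * 14, freq="h")
df = pd.DataFrame({"y": rng.normal(100, 10, size=len(idx))}, index=idx)
df["hour"] = df.index.hour

window = 24 * 7                                    # encode from the last week only
hour_means = df.iloc[-window:].groupby("hour")["y"].mean()
df["hour_average"] = df["hour"].map(hour_means)
print(df[["hour", "hour_average"]].tail())
```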

The second model is Lasso regression: here we add to the loss function not the squares but the absolute values of the coefficients. As a result, during the optimization process the coefficients of unimportant features may become exactly zero, so Lasso regression allows for automated feature selection.
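
A sketch of Lasso with a cross-validated regularization strength (scikit-learn's LassoCV) on a synthetic design matrix standing in for the scaled lag features; the data and dimensions are made up:

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import TimeSeriesSplit
from sklearn.preprocessing import StandardScaler

# synthetic design matrix: only the first two columns actually drive the target
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(scale=0.5, size=500)

X_scaled = StandardScaler().fit_transform(X)
lasso = LassoCV(cv=TimeSeriesSplit(n_splits=5)).fit(X_scaled, y)

# coefficients driven exactly to zero are the features Lasso dropped
print(lasso.coef_.round(2))
```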

First, let's make sure we have things to drop and that the data truly has highly correlated features. We can clearly see how the coefficients get closer and closer to zero (though they never actually reach it) as their importance in the model drops. Lasso regression turned out to be more conservative: it removed the 23rd lag from the most important features (and also dropped 5 features completely), which only made the quality of the prediction better.

As a good example, the SARIMA model mentioned here more than once can produce spectacular results after due tuning, but it might require many hours of tambourine-dancing time series manipulation, while a simple linear regression model can be built in 10 minutes and give more or less comparable results.

Excel - Time Series Forecasting - Part 1 of 3

How to perform exponential smoothing in Excel 2013

Time Series Analysis - 2 | Time Series in R | ARIMA Model Forecasting | Data Science | Simplilearn

This Time Series Analysis (Part 2) in R tutorial will help you understand what the ARIMA model is, what correlation and autocorrelation are, and you will also see a use ...

8. Time Series Analysis I

MIT 18.S096 Topics in Mathematics with Applications in Finance, Fall 2013 View the complete course: Instructor: Peter ..

FORECAST.ETS Function (Exponential Triple Smoothing) in Excel

This video demonstrates how to predict values with exponential triple smoothing using the FORECAST.ETS function in Microsoft Excel 2016. The “Forecast ...

Forecasting Trend and Seasonality

Using dummy variables and multiple linear regression to forecast trend and seasonality.

Maths Tutorial: Patterns and Trends in Time Series Plots (statistics)

VCE Further Maths Tutorials. Core (Data Analysis) Tutorial: Patterns and Trends in Time Series Plots. How to tell the difference between seasonal, cyclical and ...

Time Series

Time Series Analysis is used to identify characteristics and predict future values from a set of data observed with the passing of time. B-Box™ contains the ...

Create a predictive time-series forecast for planning data: SAP Analytics Cloud (version 2017.15.3)

In this video tutorial, you'll create a predictive time-series forecast to predict how planning values are going to trend based on historical data.

Tableau and R Forecasting

Tableau 8.1 includes enhancements to Tableau's native forecasting capability as well as the ability to connect Tableau with R for complex forecast analysis.