AI News, 7 Ways Time-Series Forecasting Differs from Machine Learning
- On Thursday, June 7, 2018
7 Ways Time-Series Forecasting Differs from Machine Learning
However, it is assumed that he or she has experience developing machine learning models (at any level) and handling basic statistical concepts.
It was a challenging, yet enriching, experience that gave me a better understanding of how machine learning can be applied to business problems.
In most cases, a prediction is a specific value, e.g., the kind of object in a picture, the value of a house, whether a mail is spam or not, etc.
You can think of this type of variable in two ways: If you have experience working in machine learning, you must make some adjustments when working with time series.
As a machine learning practitioner, you may already be used to creating features, either manually (feature engineering) or automatically (feature learning).
It is also possible to combine time series with feature engineering using time series components and time-based features. The first refers to the properties (components) of a time series, and the latter refers to time-related features, which have definite patterns and can be calculated in a deterministic way.
Time series components are highly important to analyzing the variable of interest in order to understand its behavior, what patterns it has, and to be able to choose and fit an appropriate time-series model.
Both time series components and features are key to interpreting the behavior of the time series, analyzing its properties, identifying possible causes, and more.
You may be used to feeding thousands, millions, or billions of data points into a machine learning model, but this is not always the case with time series.
But in reality, there are some benefits to having small- to medium-sized time series: This does not mean that you will not be working with huge time series, but you must be prepared and able to handle smaller time series as well.
Some of these datasets come from events recorded with a timestamp, systems logs, financial data, data obtained from sensors (IoT), etc.
Since a TSDB works natively with time series, it is a great opportunity to apply time series techniques to large-scale datasets.
One of the most important properties an algorithm needs in order to be considered a time-series algorithm is the ability to extrapolate patterns outside of the domain of training data.
While this is a default property of time series models, most machine learning models do not have this ability because they are not all based on statistical distributions.
While evaluation metrics help determine how close the fitted values are to the actual ones, they do not evaluate whether the model properly fits the time series.
As you are trying to capture the patterns of a time series, you would expect the errors to behave as white noise, as they represent what cannot be captured by the model.
If the model you built is unbiased, the mean of the residuals will be zero or close to zero, and therefore the sum of the residuals will be close to zero:
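Both properties can be checked with a few lines of code. The sketch below is a minimal illustration using only the Python standard library; the function names and the sample numbers are assumptions made for this example, not part of the original article.

```python
from statistics import mean

def residuals(actual, fitted):
    """Residual = actual value minus fitted value, point by point."""
    return [a - f for a, f in zip(actual, fitted)]

def autocorrelation(values, lag=1):
    """Sample autocorrelation of a series at the given lag."""
    m = mean(values)
    denom = sum((v - m) ** 2 for v in values)
    num = sum((values[i] - m) * (values[i - lag] - m)
              for i in range(lag, len(values)))
    return num / denom

# Hypothetical data: the fitted values miss an alternating pattern.
actual = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0]
fitted = [10.5, 11.5, 11.5, 12.5, 12.5, 13.5]
res = residuals(actual, fitted)

print(mean(res))             # zero here, so the fit is unbiased
print(autocorrelation(res))  # far from zero: structure is left in the errors
```

In this toy case the mean of the residuals is zero, yet the lag-1 autocorrelation is strongly negative, so the errors are not white noise even though the model is unbiased.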
Alternatively, you could choose to use the standard deviation of the residuals as the sample standard deviation, allowing the confidence intervals to be calculated using an appropriate distribution, like the normal or exponential.
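As a rough sketch of the normal-distribution case, the interval is the point forecast plus or minus a z-score times the sample standard deviation of the residuals. The function name, the 95% level, and the sample residuals below are assumptions chosen for illustration.

```python
from statistics import NormalDist, stdev

def normal_interval(forecast, residuals, level=0.95):
    """Symmetric interval around a point forecast, assuming normal errors."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% level
    spread = z * stdev(residuals)              # sample standard deviation
    return forecast - spread, forecast + spread

# Hypothetical point forecast and in-sample residuals.
low, high = normal_interval(100.0, [-2.0, 1.0, 0.5, -1.5, 2.0])
print(low, high)
```

This treats every forecast horizon the same; in practice intervals usually widen as the horizon grows, which this sketch does not model.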
For some models, e.g., neural networks, which are not based on probability distributions, you can run simulations of the forecasts and calculate confidence intervals from the distribution of the simulations.
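One simple way to run such simulations, sketched here under the assumption that past residuals are representative of future errors, is to resample the residuals, add them to the point forecast, and read the interval off the empirical percentiles. The function name and parameters are hypothetical.

```python
import random

def simulated_interval(point_forecast, residuals, n_sims=1000, level=0.95, seed=0):
    """Percentile interval from forecasts perturbed by resampled residuals."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    sims = sorted(point_forecast + rng.choice(residuals) for _ in range(n_sims))
    lo_idx = int((1 - level) / 2 * n_sims)
    hi_idx = min(int((1 + level) / 2 * n_sims), n_sims - 1)
    return sims[lo_idx], sims[hi_idx]

sim_low, sim_high = simulated_interval(100.0, [-2.0, -1.0, 0.0, 1.0, 2.0])
print(sim_low, sim_high)
```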
When this occurs, it is preferable to first evaluate the impact, and then, if required, update the forecasts using recent data after the event has passed.
What Is Time Series Forecasting?
Time series forecasting is an important area of machine learning that is often neglected.
Understanding a dataset, called time series analysis, can help to make better predictions, but is not required and can result in a large technical investment in time and expertise not directly aligned with the desired outcome, which is forecasting the future.
In descriptive modeling, or time series analysis, a time series is modeled to determine its components in terms of seasonal patterns, trends, relation to external factors, and the like.
In contrast, time series forecasting uses the information in a time series (perhaps with additional information) to forecast future values of that series —
Time series analysis involves developing models that best capture or describe an observed time series in order to understand the underlying causes.
The primary objective of time series analysis is to develop mathematical models that provide plausible descriptions from sample data —
The purpose of time series analysis is generally twofold: to understand or model the stochastic mechanisms that gives rise to an observed series and to predict or forecast the future values of a series based on the history of that series —
This is often at the expense of being able to explain why a specific prediction was made, confidence intervals and even better understanding the underlying causes behind the problem.
Perhaps the most useful of these is the decomposition of a time series into 4 constituent parts: All time series have a level, most have noise, and the trend and seasonality are optional.
For example, they may be added together to form a model as follows: Assumptions can be made about these components both in behavior and in how they are combined, which allows them to be modeled using traditional statistical methods.
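The formula itself did not survive in this excerpt. Given the four parts named above, an additive model is usually written as y(t) = level + trend + seasonality + noise. The sketch below builds a synthetic series that way; every default value in it is an assumption picked for illustration.

```python
import random

def additive_series(n, level=10.0, slope=0.5,
                    season=(2.0, 0.0, -2.0, 0.0), noise_sd=0.0, seed=0):
    """Build y(t) = level + trend(t) + seasonality(t) + noise(t)."""
    rng = random.Random(seed)
    series = []
    for t in range(n):
        trend = slope * t                  # linear trend component
        seasonal = season[t % len(season)] # repeating seasonal component
        noise = rng.gauss(0.0, noise_sd)   # random component
        series.append(level + trend + seasonal + noise)
    return series

print(additive_series(4))  # noise_sd=0.0 keeps the output deterministic
```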
This adds an honesty to time series forecasting that quickly flushes out bad assumptions, errors in modeling and all the other ways that we may be able to fool ourselves.
Open Machine Learning Course. Topic 9. Part 1. Time series analysis in Python
We can approach prediction task using different methods, depending on the required quality of the prediction, length of the forecasted period, and, of course, time we have to choose features and tune parameters to achieve desired results.
As an example, let’s use some real mobile game data on hourly ads watched by players and daily in-game currency spent. Before actually forecasting, let’s understand how to measure the quality of predictions and have a look at the most common and widely used metrics. Excellent, now we know how to measure the quality of the forecasts, which metrics we can use, and how to translate the results to the boss.
Let’s start with a naive hypothesis: “tomorrow will be the same as today”. However, instead of a model like ŷ(t)=y(t−1) (which is actually a great baseline for any time series prediction problem, and sometimes it’s impossible to beat with any model), we’ll assume that the future value of the variable depends on the average of its n previous values, and therefore we’ll use a moving average.
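A minimal sketch of that idea in plain Python follows. The function name is made up for this example and is not the helper used in the original course.

```python
def moving_average_forecast(series, n):
    """Forecast the next value as the mean of the last n observations."""
    if n <= 0 or n > len(series):
        raise ValueError("n must be between 1 and len(series)")
    window = series[-n:]
    return sum(window) / n

history = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(moving_average_forecast(history, 3))  # mean of 4, 5 and 6
print(moving_average_forecast(history, 1))  # n=1 is the naive baseline
```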
Smoothing by the last 4 hours: plotMovingAverage(ads, 4). Smoothing by the last 12 hours: plotMovingAverage(ads, 12). Smoothing by 24 hours, which gives the daily trend: plotMovingAverage(ads, 24). As you can see, applying daily smoothing to hourly data allowed us to clearly see the dynamics of ads watched.
And now let’s take a look at what happens if, instead of weighting the last n values of the time series, we start weighting all available observations while exponentially decreasing the weights as we move further back in historical data.
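This is simple exponential smoothing: each smoothed value is a blend of the current observation and the previous smoothed value. A minimal sketch, with alpha as the smoothing factor (the function name is an assumption for this example):

```python
def exponential_smoothing(series, alpha):
    """Smooth a series; alpha in [0, 1] is the weight on the newest value."""
    result = [series[0]]  # the first smoothed value is the first observation
    for value in series[1:]:
        result.append(alpha * value + (1 - alpha) * result[-1])
    return result

print(exponential_smoothing([1.0, 2.0, 3.0], 0.5))
```

The smaller alpha is, the more weight stays on older observations and the smoother the result.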
Until now all we could get from our methods in the best case was just a single future point prediction (and also some nice smoothing), that’s cool but not enough, so let’s extend exponential smoothing so that we can predict two future points (of course, we also get some smoothing).
We’ve learnt to predict intercept (or expected series value) using previous methods, and now we will apply the same exponential smoothing to the trend, believing naively or perhaps not that the future direction of the time series changes depends on the previous weighted changes.
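A sketch of this double exponential smoothing is shown below. The initialisation (level from the first value, trend from the first difference) is one common choice and an assumption here; alpha smooths the intercept and beta smooths the trend.

```python
def double_exponential_smoothing(series, alpha, beta, horizon=2):
    """Smooth level and trend, then extrapolate `horizon` steps ahead."""
    level, trend = series[0], series[1] - series[0]
    for value in series[1:]:
        last_level = level
        # Intercept: blend the new value with the previous level plus trend.
        level = alpha * value + (1 - alpha) * (level + trend)
        # Trend: blend the latest change in level with the previous trend.
        trend = beta * (level - last_level) + (1 - beta) * trend
    return [level + (h + 1) * trend for h in range(horizon)]

print(double_exponential_smoothing([1.0, 2.0, 3.0, 4.0, 5.0], 0.5, 0.5))
```

On a perfectly linear series the method simply continues the line, which is a handy sanity check.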
Now we get a new system: Intercept now depends on the current value of the series minus corresponding seasonal component, trend stays unchanged, and the seasonal component depends on the current value of the series minus intercept and on the previous value of the component.
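The three updates described above can be sketched as an additive Holt-Winters routine. This is a simplified illustration, not the course's implementation: the initialisation from the first two seasons and the choice to start updating after the first season are assumptions, and `slen` is the season length.

```python
def holt_winters_additive(series, slen, alpha, beta, gamma, horizon):
    """Additive triple exponential smoothing; returns `horizon` forecasts."""
    if len(series) < 2 * slen:
        raise ValueError("need at least two full seasons to initialise")
    # Initial level: mean of the first season.
    level = sum(series[:slen]) / slen
    # Initial trend: average per-step change between the first two seasons.
    trend = sum(series[slen + i] - series[i] for i in range(slen)) / slen ** 2
    # Initial seasonal components: first-season deviations from the level.
    seasonals = [series[i] - level for i in range(slen)]

    for t in range(slen, len(series)):
        value, season = series[t], seasonals[t % slen]
        last_level = level
        # Intercept: current value minus its seasonal component.
        level = alpha * (value - season) + (1 - alpha) * (level + trend)
        # Trend: same update as in double exponential smoothing.
        trend = beta * (level - last_level) + (1 - beta) * trend
        # Seasonal: current value minus intercept, blended with the old component.
        seasonals[t % slen] = gamma * (value - level) + (1 - gamma) * season

    n = len(series)
    return [level + (h + 1) * trend + seasonals[(n + h) % slen]
            for h in range(horizon)]

pattern = [1.0, 2.0, 3.0, 4.0] * 3  # purely seasonal toy series
print(holt_winters_additive(pattern, 4, 0.5, 0.5, 0.5, 4))
```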
Then using cross-validation we will evaluate our chosen loss function for given model parameters, calculate gradient, adjust model parameters and so forth, bravely descending to the global minimum of error.
The question is how to do cross-validation on time series, because, you know, time series do have time structure and one just can’t randomly mix values in a fold without preserving this structure, otherwise all time dependencies between observations will be lost.
That’s why we will have to use a slightly trickier approach to optimizing the model parameters. I don’t know if there’s an official name for it, but on CrossValidated, where one can find all the answers but the Answer to the Ultimate Question of Life, the Universe, and Everything, “cross-validation on a rolling basis” was proposed as a name.
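The idea can be sketched as follows: train on an initial block, test on the block that follows, then grow the training window and repeat, so no fold ever sees the future. The splitting rule below (equal blocks, remainder added to the last test fold) is an assumption for this sketch; scikit-learn offers a comparable utility in `TimeSeriesSplit`, whose fold boundaries may differ.

```python
def rolling_splits(n_samples, n_splits):
    """Yield (train_indices, test_indices) with a growing training window."""
    fold = n_samples // (n_splits + 1)
    if fold == 0:
        raise ValueError("not enough samples for the requested splits")
    for k in range(1, n_splits + 1):
        train_end = fold * k
        # The last fold absorbs any leftover observations.
        test_end = fold * (k + 1) if k < n_splits else n_samples
        yield list(range(train_end)), list(range(train_end, test_end))

splits = list(rolling_splits(10, 3))
for train, test in splits:
    print(train, test)
```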
Now, knowing how to set up cross-validation, we will find optimal parameters for the Holt-Winters model. Recall that we have daily seasonality in ads, hence the slen=24 parameter. In the Holt-Winters model, as well as in the other exponential smoothing models, there’s a constraint on how big the smoothing parameters can be: each of them is in the range from 0 to 1. Therefore, to minimize the loss function we have to choose an algorithm that supports constraints on model parameters, in our case the Truncated Newton conjugate gradient.
If you take a look at the modeled deviation, you can clearly see that the model reacts quite sharply to the changes in the structure of the series but then quickly returns deviation to the normal values, “forgetting” the past.
We’ll apply the same algorithm to the second series which, as we know, has a trend and 30-day seasonality. It looks quite adequate: the model has caught both the upward trend and the seasonal spikes, and overall fits our values nicely. Before we start modeling we should mention an important property of time series: stationarity.
If the process is stationary that means it doesn’t change its statistical properties over time, namely mean and variance do not change over time (constancy of variance is also called homoscedasticity), also covariance function does not depend on the time (should only depend on the distance between observations).
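A series with a trend or seasonality violates these conditions, and a common way to move it toward stationarity is differencing. The sketch below is a generic illustration with made-up numbers; a lag of 1 removes a linear trend, and a lag equal to the season length removes a stable seasonal pattern.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t where both exist."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]

trending = [1.0, 3.0, 5.0, 7.0, 9.0]
print(difference(trending))         # constant: the linear trend is gone

seasonal = [1.0, 5.0, 1.0, 5.0, 1.0, 5.0]
print(difference(seasonal, lag=2))  # zeros: the period-2 pattern is gone
```

Each differencing step shortens the series by `lag` points, which is why differenced series are usually sliced before plotting.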
After attaching the last letter we find out that instead of one additional parameter we get three in a row: (P, D, Q). Now, knowing how to set initial parameters, let’s have a look at the final plot once again and set the parameters: tsplot(ads_diff[24+1:], lags=60). Now we want to test various models and see which one is better. Let’s inspect the residuals of the model: tsplot(best_model.resid[24+1:], lags=60). Well, it’s clear that the residuals are stationary and there are no apparent autocorrelations, so let’s make predictions using our model. In the end we got quite adequate predictions: our model was wrong by 4.01% on average, which is very good, but the overall costs of preparing data, making the series stationary, and brute-force parameter selection might not be worth this accuracy.
That means some of the models will never be “production ready” as they demand too much time for the data preparation (for example, SARIMA), or require frequent re-training on new data (again, SARIMA), or are difficult to tune (good example — SARIMA), so it’s very often much easier to select a couple of features from the existing time series and build a simple linear regression or, say, a random forest.
- Lags of time series, of course
- Window statistics
- Date and time features
- Target encoding
- Forecasts from other models (though we can lose the speed of prediction this way)
Let’s run through some of the methods and see what we can extract from our ads series. Shifting the series n steps back we get a feature column where the current value of the time series is aligned with its value at time t−n.
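The shifting step can be sketched without any libraries. The function below pairs each observation with its lagged values; the name and the returned structure are assumptions for this example.

```python
def make_lag_features(series, lags):
    """Return (features, target) rows, where features are the lagged values."""
    max_lag = max(lags)
    rows = []
    # Start at max_lag so every requested lag exists for each row.
    for t in range(max_lag, len(series)):
        features = [series[t - lag] for lag in lags]
        rows.append((features, series[t]))
    return rows

for features, target in make_lag_features([1, 2, 3, 4, 5], [1, 2]):
    print(features, target)
```

The first `max(lags)` observations are dropped because their lagged values are unknown.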
If it’s undesirable to explode dataset by using tons of dummy variables that can lead to the loss of information about the distance, and if they can’t be used as real values because of the conflicts like “0 hours <
This problem can be approached in a variety of ways, for example, we can calculate target encoding not for the whole train set, but for some window instead, that way encodings from the last observed window will probably describe current series state better.
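A windowed target encoding could look like the sketch below, where each category (for example, the hour of day) is replaced by the mean target over only the most recent observations. The function name and the toy data are assumptions.

```python
from collections import defaultdict

def windowed_target_encoding(categories, targets, window):
    """Mean target per category, computed over the last `window` rows only."""
    sums, counts = defaultdict(float), defaultdict(int)
    for category, target in zip(categories[-window:], targets[-window:]):
        sums[category] += target
        counts[category] += 1
    return {category: sums[category] / counts[category] for category in sums}

hours = [0, 1, 0, 1, 0, 1]
values = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
print(windowed_target_encoding(hours, values, window=4))  # recent rows only
print(windowed_target_encoding(hours, values, window=6))  # whole train set
```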
Second model — Lasso regression, here we add to the loss function not squares but absolute values of the coefficients, as a result during the optimization process coefficients of unimportant features may become zeroes, so Lasso regression allows for automated feature selection.
First, make sure we have things to drop and that the data truly has highly correlated features. We can clearly see how coefficients are getting closer and closer to zero (though they never actually reach it) as their importance in the model drops. Lasso regression turned out to be more conservative and removed the 23rd lag from the most important features (and also dropped 5 features completely), which only made the quality of prediction better.
As a good example, the SARIMA model, mentioned here more than once or twice, can produce spectacular results after due tuning but might require many hours of tambourine-dancing time series manipulation, while at the same time a simple linear regression model can be built in 10 minutes, giving more or less comparable results.
Computational Intelligence and Neuroscience
This paper proposed a time-series forecasting model based on estimating a missing value followed by variable selection to forecast the reservoir’s water level.
In addition, this experiment shows that the proposed variable selection can help determine five forecast methods used here to improve the forecasting capability.
Previous studies of reservoir water levels have identified three important problems:
(1) There are few studies of reservoir water levels: related studies [1–4] in the hydrological field use machine learning methods to forecast water levels. Most of the water level forecasting of these flood stages collected the data about typhoons, specific climate, seasonal rainfall, or water levels.
(2) Only a few variables have been used in reservoir water level forecasting. It is difficult to determine the key variable set in the reservoir water level.
(3) No imputation method is used in datasets of reservoir water level: previous studies of water level forecasting in hydrological fields have shown that the collected data are noninterruptible and long-term, but most of them did not explain how to deal with the missing values from human error or mechanical failure.
The input layer is the set of source nodes, the second layer is a hidden layer of high dimension, and the output layer gives the response of the network to the activation patterns applied to the input layer.
The use of radial basis functions results from a number of different concepts including function approximation, noisy interpolation, density estimation, and optimal classification theory.
The kNN algorithm is based on the notion that similar instances have similar behavior and thus the new input instances are predicted according to the stored most similar neighboring instances.
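For a numeric target such as a water level, this notion can be sketched as averaging the targets of the k closest stored instances. The code below is a generic illustration with Euclidean distance and made-up data; it is not the IBK implementation used in the paper.

```python
import math

def knn_predict(train_X, train_y, query, k=3):
    """Predict by averaging the targets of the k nearest stored instances."""
    # Pair each stored instance's distance to the query with its target.
    distances = sorted((math.dist(x, query), y) for x, y in zip(train_X, train_y))
    nearest = distances[:k]
    return sum(y for _, y in nearest) / len(nearest)

X = [[0.0], [1.0], [2.0], [10.0]]
y = [0.0, 1.0, 2.0, 10.0]
print(knn_predict(X, y, [1.0], k=3))
print(knn_predict(X, y, [9.0], k=1))
```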
Random Tree is an ensemble learning algorithm that generates many individual learners and employs a bagging idea to produce a random set of data in the construction of a decision tree.
Filter models utilize statistical techniques such as principal component analysis (PCA), factor analysis (FA), independent component analysis, and discriminant analysis in the investigation of other indirect performance measures.
First, the proposed model used five imputation methods (i.e., median of nearby points, series mean, mean of nearby points, linear, and regression imputation).
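Four of these methods can be sketched in a few lines (regression imputation is left out because it needs a fitted model). The definition of "nearby points" as the `span` valid values on each side of the gap is an assumption, as are the function names; the paper's exact settings are not given in this excerpt.

```python
from statistics import mean, median

def _neighbors(series, i, span):
    """Up to `span` observed values on each side of position i."""
    before = [v for v in series[:i] if v is not None][-span:]
    after = [v for v in series[i + 1:] if v is not None][:span]
    return before + after

def impute(series, method="series_mean", span=2):
    """Fill None entries; `linear` assumes gaps are not at the series edges."""
    observed = [v for v in series if v is not None]
    filled = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        if method == "series_mean":
            filled[i] = mean(observed)
        elif method == "mean_nearby":
            filled[i] = mean(_neighbors(series, i, span))
        elif method == "median_nearby":
            filled[i] = median(_neighbors(series, i, span))
        elif method == "linear":
            prev_i = max(j for j in range(i) if series[j] is not None)
            next_i = min(j for j in range(i + 1, len(series)) if series[j] is not None)
            frac = (i - prev_i) / (next_i - prev_i)
            filled[i] = series[prev_i] + frac * (series[next_i] - series[prev_i])
        else:
            raise ValueError(f"unknown method: {method}")
    return filled

levels = [1.0, 2.0, None, 4.0, 10.0]  # hypothetical readings with one gap
for name in ("series_mean", "mean_nearby", "median_nearby", "linear"):
    print(name, impute(levels, name))
```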
Second, by identifying the key variable that influences the daily water levels, the proposed method ranked the importance of the atmospheric variables via factor analysis.
To identify the imputation method that fits best, this paper utilized five imputation methods to estimate the missing values and then compared them with the no-imputation approach of directly deleting the missing values.
To determine this, the following steps were followed:
(i) In order to rescale all numeric values to a common range, this step normalized each variable value by dividing by the maximal value, for the five imputed datasets and for the dataset with missing values deleted.
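Dividing by the maximum can be sketched as below. Note the assumption that the values are non-negative, in which case the result falls between 0 and 1; the function name is hypothetical.

```python
def normalize_by_max(values):
    """Scale values by their maximum; non-negative inputs land in [0, 1]."""
    peak = max(values)
    if peak == 0:
        raise ValueError("cannot normalise when the maximum is zero")
    return [v / peak for v in values]

print(normalize_by_max([2.0, 5.0, 10.0]))
```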
We also employed a 10-fold cross-validation approach to identify the imputation dataset that has the best prediction performance.
(iii) We utilized five forecast methods, including Random Forest, RBF Network, Kstar, IBK (KNN), and Random Tree, via five evaluation indices.
These include correlation coefficient (CC), root mean squared error (RMSE), mean absolute error (MAE), relative absolute error (RAE), and root relative squared error (RRSE).
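The paper's own equations are not reproduced in this excerpt, so the sketch below uses the standard textbook definitions of the five indices, with RAE and RRSE returned as fractions rather than percentages. The function name and sample values are assumptions.

```python
import math
from statistics import mean

def evaluation_indices(actual, predicted):
    """Return CC, RMSE, MAE, RAE and RRSE for paired observations."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    a_bar, p_bar = mean(actual), mean(predicted)
    abs_err = sum(abs(e) for e in errors)
    sq_err = sum(e * e for e in errors)
    # Spread of the actual values around their mean (the naive predictor).
    abs_dev = sum(abs(a - a_bar) for a in actual)
    sq_dev = sum((a - a_bar) ** 2 for a in actual)
    cov = sum((a - a_bar) * (p - p_bar) for a, p in zip(actual, predicted))
    p_sq_dev = sum((p - p_bar) ** 2 for p in predicted)
    return {
        "CC": cov / math.sqrt(sq_dev * p_sq_dev),
        "RMSE": math.sqrt(sq_err / n),
        "MAE": abs_err / n,
        "RAE": abs_err / abs_dev,
        "RRSE": math.sqrt(sq_err / sq_dev),
    }

scores = evaluation_indices([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0])
print(scores)
```

The example shows why several indices are useful together: a forecast that is consistently off by one unit is perfectly correlated with the actual values, yet its error measures are far from zero.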
This step could be introduced step-by-step as follows:
(i) The imputed integrated datasets are partitioned into 66% training datasets and 34% testing datasets.
(ii) Factor analysis ranked the importance of the variables.
(iii) The variable ranking of factor analysis was used to iteratively delete the least important variable. The remaining variables were studied with Random Forest, RBF Network, Kstar, KNN, and Random Tree until the RMSE can no longer improve.
(iv) Based on the previous step (iii), the key variables are found when the lowest RMSE is achieved.
(v) Concurrently, we used five evaluation indices (CC, RMSE, RRSE, MAE, and RAE) to determine which forecast method is a good forecasting model.
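The elimination loop in these steps can be sketched generically. Here `rmse_of` is a hypothetical callable that trains a model on the given variables and returns its RMSE; the toy lookup table stands in for real model training, and the variable names are invented.

```python
def backward_elimination(ranked_variables, rmse_of):
    """Drop the least important variable until RMSE stops improving.

    ranked_variables: ordered from most to least important.
    rmse_of: callable taking a list of variables and returning an RMSE.
    """
    selected = list(ranked_variables)
    best = rmse_of(selected)
    while len(selected) > 1:
        candidate = selected[:-1]   # remove the lowest-ranked variable
        score = rmse_of(candidate)
        if score >= best:           # no further improvement: stop
            break
        selected, best = candidate, score
    return selected, best

# Toy stand-in for model training: RMSE depends only on how many variables remain.
toy_rmse = {4: 1.0, 3: 0.8, 2: 0.9, 1: 0.5}
chosen, rmse = backward_elimination(["a", "b", "c", "d"],
                                    lambda vs: toy_rmse[len(vs)])
print(chosen, rmse)
```

The loop stops at the first step that fails to improve, so it can miss a better subset further down the ranking, as the toy table shows.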
To verify the performance of the reservoir water level forecasting model, this step uses the superior imputed datasets with different variables selected to evaluate the proposed method.
The five criteria indices are listed as equations (1)–(5), where y_i is the actual observation value of the data, ŷ_i is the forecast value of the model, and n is the sample number. For the correlation coefficient (CC), y_i and ŷ_i are the observed and predicted values, respectively.
A detailed description is introduced in the following section.
(1) To achieve better processing of the dataset with missing values, this study applies the series mean, regression, mean of nearby points, linear, or median of nearby points imputation methods to estimate missing values.
After normalizing the six processed missing values datasets, we used two approaches to evaluate the datasets: percentage split (dataset partition into 66% training data and 34% testing data) and 10-fold cross-validation.
This study utilizes the ordering of important variables and iteratively deletes the least important variable to iteratively implement the proposed forecasting model when the minimal RMSE is reached.
Finally, this study employs the Random Forest forecast method, based on variable selection and on full variables respectively, to build a forecast model for water level forecasting in Shimen Reservoir.
(3) Model comparison: this study compares the proposed forecast model (using Random Forest) with the Random Forest, RBF Network, Kstar, IBK (KNN), and Random Tree forecast models (Tables 6 and 7).
After variable selection and model building, some key findings can be highlighted:
(1) Imputation: after the two collected datasets were concatenated into an integrated dataset, there are missing values in the integrated dataset due to human error or mechanical failure. Tables 3 and 4 show the integrated dataset that uses the median of nearby points, series mean, mean of nearby points, linear, regression imputation, and the delete strategy to evaluate their accuracy via five machine learning forecast models. The results show that the integrated dataset that uses the mean of nearby points imputation method has better forecasting performance.
(2) Variable selection: this study uses factor analysis to rank the ordering of variables and then sequentially deletes the least important variables until the forecasting performance no longer improves. After iterative experiments and variable selection, the key remaining variables are Reservoir_IN, Temperature, Reservoir_OUT, Pressure, Rainfall, Rainfall_Dasi, and Relative Humidity.
(3) Forecasting model: this study proposed a time-series forecasting model based on estimating missing values and variable selection to forecast the water level in the reservoir.
These experimental results indicate that the Random Forest forecasting model, when applied with variable selection and with full variables, has better forecasting performance than the listed models in the five evaluation indices.
The proposed time-series forecasting model with/without variable selection has better forecasting performance than the listed models using the five evaluation indices.
The key variables identified here could improve forecasting in these fields.
(2) We might apply the proposed time-series forecasting model based on imputation and variable selection to forecast the water level of lakes, salt water bodies, reservoirs, and so on.
- On Monday, June 24, 2019
Time Series In R | Time Series Forecasting | Time Series Analysis | Data Science Training | Edureka
Data Science Training: In this Edureka YouTube live session, we will show you how to use the Time Series Analysis in R ..
Time Series Analysis in Python | Time Series Forecasting | Data Science with Python | Edureka
Python Data Science Training: This Edureka video on Time Series Analysis in Python will give you all the information you ..
Data Science - Part X - Time Series Forecasting
For downloadable versions of these lectures, please go to the following link: This lecture provides an ..
How to Predict Stock Prices Easily - Intro to Deep Learning #7
We're going to predict the closing price of the S&P 500 using a ..
8. Time Series Analysis I
MIT 18.S096 Topics in Mathematics with Applications in Finance, Fall 2013 View the complete course: Instructor: Peter ..
How to Use Tensorflow for Time Series (Live)
We're going to use Tensorflow to predict the next event in a ..
Nathaniel Cook - Forecasting Time Series Data at scale with the TICK stack
Description Forecasting time series data across a variety of different time series comes with many challenges. Using the TICK stack we demonstrate a workflow ...
Jeffrey Yau | Applied Time Series Econometrics in Python and R
PyData SF 2016 Time series data is ubiquitous, and time series statistical models should be included in any data scientist's toolkit. This tutorial covers the ...
Create Advanced Forecasting Models using Excel & Machine Learning
Learn how to develop your own customized forecasting models using advanced techniques in Excel based on real scenarios. You will learn about the new ...
SAP Predictive Analytics: Time Series Forecasting
In this video we show how SAP PA can be leveraged for making predictions of time series such as cash flows, sales volumes, etc. Armed with SAP PA, business ...