AI News, The Unofficial Google Data Science Blog


This could happen the way you suggest (seasonal), or it could be that time series with certain characteristics are better forecast by certain types of models, or it could be that different models do a better job at different points in the forecast horizon, or something else.

However, for our forecasting challenge, given the number and diversity of time series we faced, our risk-reward preferences, and our beliefs about the way our data arose, we preferred the approach described in the blog post.

How To Identify Patterns in Time Series Data: Time Series Analysis

In the following topics, we will first review techniques used to identify patterns in time series data (such as smoothing and curve fitting techniques and autocorrelations), then we will introduce a general class of models that can be used to represent time series data and generate predictions (autoregressive and moving average models).

Unlike the analyses of random samples of observations that are discussed in the context of most other statistics, the analysis of time series is based on the assumption that successive values in the data file represent consecutive measurements taken at equally spaced time intervals.

Detailed discussions of the methods described in this section can be found in Anderson (1976), Box and Jenkins (1976), Kendall (1984), Kendall and Ord (1990), Montgomery, Johnson, and Gardiner (1990), Pankratz (1983), Shumway (1988), Vandaele (1983), Walker (1991), and Wei (1989).

There are two main goals of time series analysis: (a) identifying the nature of the phenomenon represented by the sequence of observations, and (b) forecasting (predicting future values of the time series variable).

As in most other analyses, in time series analysis it is assumed that the data consist of a systematic pattern (usually a set of identifiable components) and random noise (error), which usually makes the pattern difficult to identify.

Most time series patterns can be described in terms of two basic classes of components: trend and seasonality. The former represents a general systematic linear or (most often) nonlinear component that changes over time and does not repeat, or at least does not repeat within the time range captured by our data (e.g., a plateau followed by a period of exponential growth).

For example, the sales of a company can grow rapidly over the years yet still follow consistent seasonal patterns (e.g., as much as 25% of yearly sales may be made in December, whereas only 4% in August).

A classic example is Series G (Box and Jenkins, 1976, p. 531), representing monthly international airline passenger totals (measured in thousands) in twelve consecutive years from 1949 to 1960 (see the example data file G.sta).

If you plot the successive observations (months) of airline passenger totals, a clear, almost linear trend emerges, indicating that the airline industry enjoyed a steady growth over the years (approximately 4 times more passengers traveled in 1960 than in 1949).

This example data file also illustrates a very common general type of pattern in time series data, where the amplitude of the seasonal changes increases with the overall trend (i.e., the variance is correlated with the mean over the segments of the series).

The most common technique is moving average smoothing, which replaces each element of the series by either the simple or weighted average of n surrounding elements, where n is the width of the smoothing 'window' (see Box and Jenkins, 1976).

Medians can be used instead of means in the smoothing window; if there are outliers in the data (e.g., due to measurement errors), median smoothing typically produces smoother or at least more 'reliable' curves than a moving average based on the same window width.
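To make this concrete, here is a minimal Python sketch (not part of the original text) of moving average and median smoothing with pandas; the series y below is a synthetic stand-in for a monthly series such as the airline passenger data.

```python
# A hedged sketch: moving average vs. median smoothing with pandas.
# `y` is a synthetic monthly-like series (trend + seasonality + noise),
# standing in for real data such as the airline passenger totals.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
t = np.arange(144)
y = pd.Series(100 + 0.8 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 5, t.size))

n = 12  # width of the smoothing 'window'

ma_smooth = y.rolling(window=n, center=True).mean()        # simple (equal-weight) moving average
median_smooth = y.rolling(window=n, center=True).median()  # median smoothing, more robust to outliers

print(ma_smooth.dropna().head())
print(median_smooth.dropna().head())
```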

In the relatively less common cases (in time series data), when the measurement error is very large, the distance weighted least squares smoothing or negative exponentially weighted smoothing techniques can be used.

Serial dependency is formally defined as a correlational dependency of order k between each i'th element of the series and the (i-k)'th element (Kendall, 1976), and is measured by autocorrelation (i.e., a correlation between the two terms).

The correlogram (autocorrelogram) displays graphically and numerically the autocorrelation function (ACF), that is, serial correlation coefficients (and their standard errors) for consecutive lags in a specified range of lags (e.g., 1 through 30).

Ranges of two standard errors for each lag are usually marked in correlograms, but typically the size of the autocorrelation is of more interest than its reliability (see Elementary Concepts) because we are usually interested only in very strong (and thus highly significant) autocorrelations.

This implies that the pattern of serial dependencies can change considerably after removing the first-order autocorrelation (i.e., after differencing the series with a lag of 1).

Another useful method to examine serial dependencies is to examine the partial autocorrelation function (PACF) - an extension of autocorrelation, where the dependence on the intermediate elements (those within the lag) is removed.
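As an illustration (again a sketch, assuming the statsmodels package and the synthetic series y from the smoothing example above), the ACF and PACF over a range of lags can be computed as follows.

```python
# A minimal sketch of the autocorrelation (ACF) and partial autocorrelation (PACF)
# functions with statsmodels; `y` is the synthetic monthly series from the
# smoothing sketch above (any 1-D series works).
from statsmodels.tsa.stattools import acf, pacf

acf_vals = acf(y, nlags=30)    # serial correlations for lags 0..30
pacf_vals = pacf(y, nlags=30)  # partial autocorrelations (intermediate lags removed)

for k in (1, 2, 12, 24):
    print(f"lag {k:2d}: ACF = {acf_vals[k]: .2f}, PACF = {pacf_vals[k]: .2f}")
```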

However, in real-life research and practice, patterns of the data are unclear, individual observations involve considerable error, and we still need not only to uncover the hidden patterns in the data but also generate forecasts.

The ARIMA methodology described below is not easy to use: it requires a great deal of experience, and although it often produces satisfactory results, those results depend on the researcher's level of expertise (Bails and Peppers, 1982).

Most time series consist of elements that are serially dependent in the sense that you can estimate a coefficient or a set of coefficients that describe consecutive elements of the series from specific, time-lagged (previous) elements.

Independent from the autoregressive process, each element in the series can also be affected by the past error (or random shock) that cannot be accounted for by the autoregressive component, that is: xt = µ + εt - θ1*ε(t-1) - θ2*ε(t-2) - θ3*ε(t-3) - ...

Specifically, the three types of parameters in the model are: the autoregressive parameters (p), the number of differencing passes (d), and moving average parameters (q).

So, for example, a model described as (0, 1, 2) means that it contains 0 (zero) autoregressive (p) parameters and 2 moving average (q) parameters, which were computed for the series after it was differenced once.

At this stage (which is usually called Identification phase, see below) we also need to decide how many autoregressive (p) and moving average (q) parameters are necessary to yield an effective but still parsimonious model of the process (parsimonious means that it has the fewest parameters and greatest number of degrees of freedom among all models that fit the data).

The estimates of the parameters are used in the last stage (Forecasting) to calculate new values of the series (beyond those included in the input data set) and confidence intervals for those predicted values.

For example, if the series is differenced once, and there are no autoregressive parameters in the model, then the constant represents the mean of the differenced series, and therefore the linear trend slope of the un-differenced series.

However, a majority of empirical time series patterns can be sufficiently approximated using one of the 5 basic models that can be identified based on the shape of the autocorrelogram (ACF) and partial autocorrelogram (PACF).

For example, the model (0,1,2)(0,1,1) describes a model that includes no autoregressive parameters, 2 regular moving average parameters and 1 seasonal moving average parameter, and these parameters were computed for the series after it was differenced once with lag 1, and once seasonally differenced.
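A hedged sketch of what fitting such a (0,1,2)(0,1,1) seasonal model might look like with statsmodels; SARIMAX is used here as a stand-in (the text does not prescribe any particular software), and the monthly series y with seasonal lag 12 is the synthetic series from the earlier sketches.

```python
# Fit the seasonal model (0,1,2)(0,1,1) with seasonal lag 12 and forecast ahead.
# `y` is the synthetic monthly series defined in the smoothing sketch above.
from statsmodels.tsa.statespace.sarimax import SARIMAX

model = SARIMAX(y, order=(0, 1, 2), seasonal_order=(0, 1, 1, 12))
result = model.fit(disp=False)

forecast = result.get_forecast(steps=12)   # values beyond the input data set
print(forecast.predicted_mean)             # point forecasts
print(forecast.conf_int())                 # confidence intervals for the predictions
```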

The main difference is that in seasonal series, ACF and PACF will show sizable coefficients at multiples of the seasonal lag (in addition to their overall patterns reflecting the nonseasonal components of the series).

Different methods have been proposed to compute the SS for the residuals: (1) the approximate maximum likelihood method according to McLeod and Sales (1983), (2) the approximate maximum likelihood method with backcasting, and (3) the exact maximum likelihood method according to Melard (1984).

However, method 1 above, (approximate maximum likelihood, no backcasts) is the fastest, and should be used in particular for very long time series (e.g., with more than 30,000 observations).

Melard's exact maximum likelihood method (number 3 above) may also become inefficient when used to estimate parameters for seasonal models with long seasonal lags (e.g., with yearly lags of 365 days).

Another straightforward and common measure of the reliability of the model is the accuracy of its forecasts generated based on partial data so that the forecasts can be compared with known (original) observations.

However, a good model should not only provide sufficiently accurate forecasts, it should also be parsimonious and produce statistically independent residuals that contain only noise and no systematic components (e.g., the correlogram of residuals should not reveal any serial dependencies).

The major concern here is that the residuals are systematically distributed across the series (e.g., they could be negative in the first part of the series and approach zero in the second part) or that they contain some serial dependency which may suggest that the ARIMA model is inadequate.

The ARIMA method is appropriate only for a time series that is stationary (i.e., its mean, variance, and autocorrelation should be approximately constant through time) and it is recommended that there are at least 50 observations in the input data.
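One common (though not the only) way to check the stationarity requirement in practice is an augmented Dickey-Fuller test before and after differencing; a sketch, assuming statsmodels and the series y from the earlier examples:

```python
# A hedged sketch: augmented Dickey-Fuller test on the raw and differenced series.
# Small p-values suggest the (differenced) series can be treated as stationary.
import numpy as np
from statsmodels.tsa.stattools import adfuller

def adf_pvalue(series):
    return adfuller(np.asarray(series, dtype=float))[1]   # second element is the p-value

print("raw series p-value:       ", round(adf_pvalue(y), 4))
print("first-difference p-value: ", round(adf_pvalue(np.diff(np.asarray(y, dtype=float))), 4))
```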

The interrupted time series (intervention analysis) literature distinguishes between three major types of impacts that are possible: (1) permanent abrupt, (2) permanent gradual, and (3) abrupt temporary.

A simple and pragmatic model for a time series would be to consider each observation as consisting of a constant (b) and an error component ε (epsilon), that is: Xt = b + εt.

If appropriate, then one way to isolate the true value of b, and thus the systematic or predictable part of the series, is to compute a kind of moving average, where the current and immediately preceding ('younger') observations are assigned greater weight than the respective older observations.

The specific formula for simple exponential smoothing is: St = α*Xt + (1-α)*St-1. When applied recursively to each successive observation in the series, each new smoothed value (forecast) is computed as the weighted average of the current observation and the previous smoothed observation;

if α is equal to 0 (zero), then the current observation is ignored entirely, and the smoothed value consists entirely of the previous smoothed value (which in turn is computed from the smoothed observation before it, and so on).
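The recursion itself is short enough to write out directly; the following sketch is a plain transcription of St = α*Xt + (1-α)*St-1, with the (common but arbitrary) choice of the first observation as the initial smoothed value.

```python
# Simple exponential smoothing: S_t = alpha*X_t + (1 - alpha)*S_{t-1}.
def simple_exponential_smoothing(x, alpha, s0=None):
    s = x[0] if s0 is None else s0      # initial smoothed value (a common, arbitrary choice)
    smoothed = []
    for value in x:
        s = alpha * value + (1 - alpha) * s
        smoothed.append(s)
    return smoothed

print(simple_exponential_smoothing([10.0, 12.0, 11.0, 15.0, 14.0], alpha=0.3))
```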

Empirical research by Makridakis and associates (Makridakis et al., 1982; Makridakis, 1983) has shown simple exponential smoothing to be the best choice for one-period-ahead forecasting, from among 24 other time series methods and using a variety of accuracy measures (see also Gross and Craig, 1974, for additional empirical evidence).

Obviously, looking at the formula presented above, α should fall into the interval between 0 (zero) and 1 (although, see Brenner et al., 1968, for an ARIMA perspective, implying 0 < α < 2).

After reviewing the literature on this topic, Gardner (1985) concludes that it is best to estimate an optimum α from the data (see below), rather than to 'guess' and set an artificially low value.

Then α is chosen so as to produce the smallest sums of squares (or mean squares) for the residuals (i.e., observed values minus one-step-ahead forecasts); this mean squared error is also referred to as the ex post MSE.

The most straightforward way of evaluating the accuracy of the forecasts based on a particular α value is to simply plot the observed values and the one-step-ahead forecasts.

In addition, besides the ex post MSE criterion (see previous paragraph), there are other statistical measures of error that can be used to determine the optimum α parameter (see Makridakis, Wheelwright, and McGee, 1983): Mean error: The mean error (ME) value is simply computed as the average error value (average of observed minus one-step-ahead forecast).

As compared to the mean squared error value, this measure of fit will 'de-emphasize' outliers, that is, unique or rare large error values will affect the MAE less than the MSE value.

It may seem reasonable to rather express the lack of fit in terms of the relative deviation of the one-step-ahead forecasts from the observed values, that is, relative to the magnitude of the observed values.

For example, when trying to predict monthly sales that may fluctuate widely (e.g., seasonally) from month to month, we may be satisfied if our prediction 'hits the target' with about ±10% accuracy.

As is the case with the mean error value (ME, see above), a mean percentage error near 0 (zero) can be produced by large positive and negative percentage errors that cancel each other out.

A quasi-Newton function minimization procedure (the same as in ARIMA) is used to minimize either the mean squared error, mean absolute error, or mean absolute percentage error.
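A sketch of how these criteria can be computed and used to pick α; a plain grid search stands in here for the quasi-Newton minimization, and the data values are purely illustrative (not from the text).

```python
# One-step-ahead residuals for a given alpha, plus ME / MAE / MSE / MAPE,
# and a simple grid search for the alpha that minimizes the ex post MSE.
import numpy as np

def one_step_errors(x, alpha):
    x = np.asarray(x, dtype=float)
    s = x[0]                               # initial smoothed value: the first observation
    errors = []
    for value in x[1:]:
        errors.append(value - s)           # observed minus one-step-ahead forecast
        s = alpha * value + (1 - alpha) * s
    return np.array(errors), x[1:]

def error_measures(x, alpha):
    e, obs = one_step_errors(x, alpha)
    return {"ME": e.mean(),
            "MAE": np.abs(e).mean(),
            "MSE": (e ** 2).mean(),
            "MAPE": 100 * np.abs(e / obs).mean()}

data = [112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118]  # illustrative values
best_alpha = min(np.arange(0.05, 1.0, 0.05),
                 key=lambda a: error_measures(data, a)["MSE"])
print(round(float(best_alpha), 2), error_measures(data, best_alpha))
```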

On the other hand, in practice, when there are many leading observations prior to a crucial actual forecast, the initial value will not affect that forecast by much, since its effect will have long 'faded' from the smoothed series (due to the exponentially decreasing weights, the older an observation the less it will influence the forecast).

The discussion above in the context of simple exponential smoothing introduced the basic procedure for identifying a smoothing parameter, and for evaluating the goodness-of-fit of a model.

The general idea here is that forecasts are not only computed from consecutive previous observations (as in simple exponential smoothing), but an independent (smoothed) trend and seasonal component can be added.

For example, if during December the sales for a particular toy increase by 1 million dollars every year, the seasonal component is additive in nature; if instead sales increase by a certain factor (e.g., by 40%), the seasonal component is multiplicative (i.e., the multiplicative seasonal component in this case would be 1.4).

In plots of the series, the distinguishing characteristic between these two types of seasonal components is that in the additive case, the series shows steady seasonal fluctuations, regardless of the overall level of the series; in the multiplicative case, the size of the seasonal fluctuations varies with the overall level of the series.

In general, the forecasts are computed as follows (for the no-trend case; for other models, see below): Additive model: Forecastt = St + It-p; Multiplicative model: Forecastt = St*It-p. In this formula, St stands for the (simple) exponentially smoothed value of the series at time t, and It-p stands for the smoothed seasonal factor at time t minus p (the length of the season).

This seasonal component is derived analogously to the St value from simple exponential smoothing as: Additive model: It = It-p + δ*(1-α)*et; Multiplicative model: It = It-p + δ*(1-α)*et/St. Put into words, the predicted seasonal component at time t is computed as the respective seasonal component in the last seasonal cycle plus a portion of the error (et, the observed minus the forecast value at time t), where δ is the seasonal smoothing parameter.

If δ is zero, then the seasonal component for a particular point in time is predicted to be identical to the predicted seasonal component for the respective time during the previous seasonal cycle, which in turn is predicted to be identical to that from the previous cycle, and so on.

If δ is equal to 1, then the seasonal component is modified 'maximally' at every step by the respective forecast error (times (1-α), which we will ignore for the purpose of this brief introduction).

To remain with the toy example above, the sales for a toy can show a linear upward trend (e.g., each year, sales increase by 1 million dollars), exponential growth (e.g., each year, sales increase by a factor of 1.3), or a damped trend (during the first year sales increase by 1 million dollars; during the second year the increase is only 80% of that, i.e., $800,000; and so on).

In general, the trend factor may change slowly over time, and, again, it may make sense to smooth the trend component with a separate parameter (denoted [gamma] for linear and exponential trend models, and [phi] for damped trend models).

Analogous to the seasonal component, when a trend component is included in the exponential smoothing process, an independent trend component is computed for each time, and modified as a function of the forecast error and the respective parameter.

This trend modification parameter ([phi] in the damped-trend case) affects how strongly changes in the trend will affect estimates of the trend for subsequent forecasts, that is, how quickly the trend will be 'damped' or increased.
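For illustration, a hedged sketch of combining a smoothed level, a damped trend, and a multiplicative seasonal component with statsmodels' Holt-Winters implementation; its parameter names only loosely correspond to the α, δ, γ, φ notation used above, and the monthly series y is the synthetic one from the earlier sketches.

```python
# Holt-Winters exponential smoothing with additive damped trend and
# multiplicative seasonality (seasonal cycle of 12).
from statsmodels.tsa.holtwinters import ExponentialSmoothing

hw = ExponentialSmoothing(y, trend="add", damped_trend=True,
                          seasonal="mul", seasonal_periods=12)
fit = hw.fit()   # smoothing and damping parameters are estimated from the data

print(fit.params["smoothing_level"],     # roughly the alpha of the text
      fit.params["smoothing_seasonal"],  # roughly the delta of the text
      fit.params["damping_trend"])       # roughly the phi of the text
print(fit.forecast(12))                  # forecasts for the next seasonal cycle
```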

If you plot those data, it is apparent that (1) there appears to be a linear upwards trend in the passenger loads over the years, and (2) there is a recurring pattern or seasonality within each year (i.e., most travel occurs during the summer months, and a minor peak occurs during the December holidays).

In general, a time series like the one described above can be thought of as consisting of four different components: (1) a seasonal component (denoted as St, where t stands for the particular point in time), (2) a trend component (Tt), (3) a cyclical component (Ct), and (4) a random, error, or irregular component (It).

The difference between a cyclical and a seasonal component is that the latter occurs at regular (seasonal) intervals, while cyclical factors have usually a longer duration that varies from cycle to cycle.

However, two straightforward possibilities are that they combine in an additive or a multiplicative fashion: Additive model: Xt = TCt + St + It; Multiplicative model: Xt = Tt*Ct*St*It. Here Xt stands for the observed value of the time series at time t, and TCt for the combined trend-cycle component.

Given some a priori knowledge about the cyclical factors affecting the series (e.g., business cycles), the estimates for the different components can be used to compute forecasts for future observations.

Let's consider the difference between an additive and multiplicative seasonal component in an example: The annual sales of toys will probably peak in the months of November and December, and perhaps during the summer (with a much smaller peak) when children are on their summer break.

If each December the sales for a particular toy increase by a fixed amount (say, 1 million dollars) over the annual average, the seasonal component is additive in nature; if instead sales increase by a certain factor (say, by 40%), the seasonal component is multiplicative (i.e., the multiplicative seasonal component in this case would be 1.4).

In plots of series, the distinguishing characteristic between these two types of seasonal components is that in the additive case, the series shows steady seasonal fluctuations, regardless of the overall level of the series; in the multiplicative case, the size of the seasonal fluctuations varies with the overall level of the series.

In terms of our toy example, a 'fashion' trend may produce a steady increase in sales (e.g., a trend towards more educational toys in general);

as with the seasonal component, this trend may be additive (sales increase by 3 million dollars per year) or multiplicative (sales increase by 30%, or by a factor of 1.3, annually) in nature.

For example, a particular toy may be particularly 'hot' during a summer season (e.g., a particular doll which is tied to the release of a major children's movie, and is promoted with extensive advertising).

The decomposition computations begin with a moving average of the series, with the window width equal to the length of one season. If the length of the season is even, the user can choose either equal weights for the moving average or unequal weights, where the first and last observations in the moving average window are averaged (i.e., each receives half weight).

Because the moving average smooths out both the seasonal and the irregular variability, the differences (in additive models) or ratios (in multiplicative models) of the observed and smoothed series will isolate the seasonal component (plus irregular component).

Specifically, the moving average is subtracted from the observed series (for additive models) or the observed series is divided by the moving average values (for multiplicative models).

The combined trend and cyclical component can be approximated by applying to the seasonally adjusted series a 5-point (centered) weighted moving average smoothing transformation with the weights 1, 2, 3, 2, 1.

Finally, the random or irregular (error) component can be isolated by subtracting from the seasonally adjusted series (additive models) or dividing the adjusted series by (multiplicative models) the trend-cycle component.
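As a rough computational counterpart (a sketch, not the exact Census I procedure), statsmodels' classical seasonal_decompose performs the same kind of moving-average based split into trend-cycle, seasonal, and irregular parts; y is again the synthetic monthly series from the earlier sketches.

```python
# Classical moving-average decomposition of the monthly series `y`
# into trend-cycle, seasonal, and irregular components.
from statsmodels.tsa.seasonal import seasonal_decompose

decomp = seasonal_decompose(y, model="additive", period=12)  # or model="multiplicative"
print(decomp.seasonal.head(12))       # one full cycle of the seasonal component
print(decomp.trend.dropna().head())   # smoothed trend-cycle component
print(decomp.resid.dropna().head())   # irregular (error) component
```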


In fact, unlike many other time-series modeling techniques (e.g., ARIMA) which are grounded in some theoretical model of an underlying process, the X-11 variant of the Census II method simply contains many ad hoc features and refinements that over the years have proven to provide excellent estimates for many real-world applications (see Burman, 1979; Kendall and Ord, 1990).

When analyzing, for example, monthly revenue figures for an amusement park, the fluctuation in the different numbers of Saturdays and Sundays (peak days) in the different months will surely contribute significantly to the variability in monthly revenues.

The X-11 procedure includes provisions to deal with extreme values through the use of 'statistical control principles,' that is, values that are above or below a certain range (expressed in terms of multiples of sigma, the standard deviation) can be modified or dropped before final estimates for the seasonality are computed.

The X-11 method applies a series of successive refinements of the estimates to arrive at the final trend-cycle, seasonal, and irregular components, and the seasonally adjusted series.

The width of the average span at which the changes in the random component are about equal to the changes in the trend-cycle component is called the month (quarter) for cyclical dominance, or MCD (QCD) for short.

(The initial trend-cycle estimate is computed via a centered 4-term moving average; the final trend-cycle estimate in each part is computed by a 5-term Henderson average.) Following the convention of the Bureau of the Census version of the X-11 method, three levels of printout detail are offered: Standard (17 to 27 tables), Long (27 to 39 tables), and Full (44 to 59 tables).

In the description of each table below, the letters S, L, and F are used next to each title to indicate, which tables will be displayed and/or printed at the respective setting of the output option.

Higher income will change people's choice of rental apartments, however, this relationship will be lagged because it will take some time for people to terminate their current leases, find new apartments, and move.

The simplest way to describe the relationship between the two would be in a simple linear relationship: Yt = Σi βi*Xt-i, where the sum runs over the lags i = 0, 1, 2, ... In this equation, the value of the dependent variable at time t is expressed as a linear function of x measured at times t, t-1, t-2, etc.

A common problem that often arises when computing the weights for the multiple linear regression model shown above is that adjacent (in time) values of the x variable are highly correlated.

In extreme cases, their independent contributions to the prediction of y may become so redundant that the correlation matrix of measures can no longer be inverted, and thus, the beta weights cannot be computed.

Almon proposed to approximate the lag weights with a polynomial in the lag index, that is, βi = α0 + α1*i + α2*i² + ... + αq*i^q. Almon could show that in many cases it is easier (i.e., it avoids the multicollinearity problem) to estimate the alpha values than the beta weights directly.
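A small numeric sketch of that idea under simple assumptions (synthetic data and a quadratic polynomial in the lag index; variable names are illustrative, not Almon's notation):

```python
# Polynomial (Almon-style) distributed lag: constrain beta_i = a0 + a1*i + a2*i^2,
# estimate the alphas by ordinary least squares, then recover the implied betas.
import numpy as np

rng = np.random.default_rng(1)
n, q, degree = 200, 8, 2                     # series length, max lag, polynomial degree
x = rng.normal(size=n + q)
true_beta = np.array([0.9 * (1 - i / q) for i in range(q + 1)])   # declining lag weights

lagged = np.column_stack([x[q - i: n + q - i] for i in range(q + 1)])  # x_t, x_{t-1}, ..., x_{t-q}
y_dl = lagged @ true_beta + rng.normal(scale=0.1, size=n)

basis = np.vander(np.arange(q + 1), degree + 1, increasing=True)  # columns: 1, i, i^2
z = lagged @ basis                                                # transformed regressors

alpha_hat, *_ = np.linalg.lstsq(z, y_dl, rcond=None)              # estimate the alphas
beta_hat = basis @ alpha_hat                                      # implied betas: beta_i = sum_j alpha_j*i^j
print(np.round(beta_hat, 2))                                      # should be close to true_beta
```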

The term 'spectrum' provides an appropriate metaphor for the nature of this analysis: Suppose you study a beam of white sun light, which at first looks like a random (white noise) accumulation of light of different wavelengths.

In essence, performing spectrum analysis on a time series is like putting the series through a prism in order to identify the wave lengths and importance of underlying cyclical components.

To contrast this technique with ARIMA or Exponential Smoothing, the purpose of spectrum analysis is to identify the seasonal fluctuations of different lengths, while in the former types of analysis, the length of the seasonal component is usually known (or guessed) a priori and then included in some theoretical model of moving averages or autocorrelations.

If so, then if we were to record those phenomena (e.g., yearly average temperature) and submit the resulting series to a cross-spectrum analysis together with the sun spot data, we may find that the weather indeed correlates with the sunspot activity at the 11 year cycle.

(The reasons for smoothing, and the different common weight functions for smoothing, are discussed in Single Spectrum (Fourier) Analysis.) The square root of the sum of the squared cross-density and quad-density values is called the cross-amplitude.

The result is called the squared coherency, which can be interpreted similar to the squared correlation coefficient (see Correlations - Overview), that is, the coherency value is the squared correlation between the cyclical components in the two series at the respective frequency.

However, coherency values should not be interpreted in isolation; for example, when the spectral density estimates in both series are very small, large coherency values may result (the divisor in the computation of the coherency values will be very small), even though there are no strong cyclical components in either series at the respective frequencies.

The large spectral density estimates for both series, and the cross-amplitude values at frequencies ν = 0.0625 and ν = .1875, suggest two strong synchronized periodicities in both series at those frequencies.

In fact, the two series were created as: v1 = cos(2*π*.0625*(v0-1)) + .75*sin(2*π*.2*(v0-1)) and v2 = cos(2*π*.0625*(v0+2)) + .75*sin(2*π*.2*(v0+2)) (where v0 is the case number).
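These two series are easy to reproduce and inspect; the following sketch recreates them (16 cases, as in the example) and computes a raw periodogram via the FFT, which should show its largest values near the frequencies .0625 and .1875 mentioned above.

```python
# Recreate the two synthetic series and compute a raw periodogram via the FFT.
# With N = 16 the discrete frequencies are multiples of 1/16, so the .2 component
# appears mostly at the neighboring frequency .1875.
import numpy as np

v0 = np.arange(1, 17)
v1 = np.cos(2 * np.pi * .0625 * (v0 - 1)) + .75 * np.sin(2 * np.pi * .2 * (v0 - 1))
v2 = np.cos(2 * np.pi * .0625 * (v0 + 2)) + .75 * np.sin(2 * np.pi * .2 * (v0 + 2))

def periodogram(x):
    n = len(x)
    coeffs = np.fft.rfft(x - x.mean())
    freqs = np.arange(len(coeffs)) / n
    power = 2.0 * np.abs(coeffs) ** 2 / n     # equivalent to (a_k^2 + b_k^2)*N/2
    return freqs, power

freqs, p1 = periodogram(v1)
for f, p in zip(freqs, p1):
    print(f"{f:.4f}  {p:8.2f}")
```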

The 'wave length' of a sine or cosine function is typically expressed in terms of the number of cycles per unit time (frequency), often denoted by the Greek letter nu (ν).

For example, the number of letters handled in a post office may show 12 cycles per year: On the first of every month a large amount of mail is sent (many bills come due on the first of the month), then the amount of mail decreases in the middle of the month, then it increases again towards the end of the month.

As mentioned before, the purpose of spectrum analysis is to decompose the original series into underlying sine and cosine functions of different frequencies, in order to determine those that appear particularly strong or important.

One way to do so would be to cast the issue as a linear Multiple Regression problem, where the dependent variable is the observed time series, and the independent variables are the sine and cosine functions of all possible (discrete) frequencies.

Such a linear multiple regression model can be written as: xt = a0 + Σk [ak*cos(λk*t) + bk*sin(λk*t)]    (for k = 1 to q). Following the common notation from classical harmonic analysis, in this equation λ (lambda) is the frequency expressed in radians per unit time, that is: λ = 2*π*νk, where π is the constant pi = 3.14... and νk is the k'th frequency expressed in cycles per unit time.

What is important here is to recognize that the computational problem of fitting sine and cosine functions of different lengths to the data can be considered in terms of multiple linear regression.

(Intuitively, in order for a sinusoidal function to be identified, you need at least two points within each cycle: the high peak and the low peak.) To summarize, spectrum analysis will identify the correlation of sine and cosine functions of different frequency with the observed data.

In many textbooks on spectrum analysis, the structural model shown above is presented in terms of complex numbers, that is, the parameter estimation process is described in terms of the Fourier transform of a series into real and imaginary parts.

It is useful to think of real and imaginary numbers as forming a two dimensional plane, where the horizontal or X-axis represents all real numbers, and the vertical or Y-axis represents all imaginary numbers.

For example, the complex number 3+i*2 can be represented by a point with coordinates {3,2} in this plane. You can also think of complex numbers as angles, for example, you can connect the point representing a complex number in the plane with the origin (complex number 0+i*0), and measure the angle of that vector to the horizontal line.

Specifically, the periodogram values above are computed as: Pk = (sine coefficientk² + cosine coefficientk²) * N/2, where Pk is the periodogram value at frequency νk and N is the overall length of the series.

For example, you may find large periodogram values for two adjacent frequencies, when, in fact, there is only one strong underlying sine or cosine function at a frequency that falls in-between those implied by the length of the series.

Because the discrete frequency values are determined by the length of the series (they are spaced 1/N apart), we can simply pad the series with a constant (e.g., zeros) and thereby introduce smaller increments in the frequency values.

In fact, if we padded the example data file described in the example above with ten zeros, the results would not change, that is, the largest periodogram peaks would still occur at the frequency values closest to .0625 and .2.

In essence, a proportion (p) of the data at the beginning and at the end of the series is transformed via multiplication by the weights: wt = 0.5*{1-cos[π*(t - 0.5)/m]}    (so-called split-cosine-bell tapering).

In that case, we want to find the frequencies with the greatest spectral densities, that is, the frequency regions, consisting of many adjacent frequencies, that contribute most to the overall periodic behavior of the series.

The Daniell window (Daniell 1946) amounts to a simple (equal weight) moving average transformation of the periodogram values, that is, each spectral density estimate is computed as the mean of the m/2 preceding and subsequent periodogram values.

In the Tukey (Blackman and Tukey, 1958) or Tukey-Hanning window (named after Julius von Hann), for each frequency, the weights for the weighted moving average of the periodogram values are computed as: wj = 0.5 + 0.5*cos(π*j/p)    (for j = 0 to p); w-j = wj    (for j ≠ 0).

Hamming window. In the Hamming or Tukey-Hamming window (Blackman and Tukey, 1958), for each frequency, the weights for the weighted moving average of the periodogram values are computed as: wj = 0.54 + 0.46*cos(π*j/p)    (for j = 0 to p); w-j = wj    (for j ≠ 0).

Parzen window. In the Parzen window (Parzen, 1961), for each frequency, the weights for the weighted moving average of the periodogram values are computed as: wj = 1 - 6*(j/p)² + 6*(j/p)³    (for j = 0 to p/2); wj = 2*(1-j/p)³    (for j = p/2 + 1 to p); w-j = wj    (for j ≠ 0).

Bartlett window. In the Bartlett window (Bartlett, 1950) the weights are computed as: wj = 1 - (j/p)    (for j = 0 to p); w-j = wj    (for j ≠ 0). With the exception of the Daniell window, all weight functions will assign the greatest weight to the observation being smoothed in the center of the window, and increasingly smaller weights to values that are further away from the center.
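As a minimal example of the simplest of these windows, the following sketch applies a Daniell (equal-weight) moving average to the periodogram values p1 computed in the earlier spectrum sketch to obtain smoothed spectral density estimates.

```python
# Daniell-window smoothing: an equal-weight moving average of the periodogram
# values (window of 2*m + 1 ordinates) yields the spectral density estimates.
import numpy as np

def daniell_smooth(power, m=2):
    kernel = np.ones(2 * m + 1) / (2 * m + 1)
    return np.convolve(power, kernel, mode="same")

density = daniell_smooth(p1, m=2)   # `p1` from the periodogram sketch above
print(np.round(density, 2))
```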

If the distribution of the observations follows the normal distribution, such a time series is also referred to as a white noise series (like the white noise you hear on the radio when tuned in-between stations).

Again, if the input is a white noise series with respect to those frequencies (i.e., there are no significant periodic cycles of those frequencies), then the distribution of the periodogram values should also follow an exponential distribution.

Thus, even with today's high-speed computers, it would be very time consuming to analyze even small time series (e.g., 8,000 observations would result in at least 64 million multiplications).

A typical implementation will use the simple explicit computational formulas as long as the input series is relatively small and the number of computations can be performed in a relatively short amount of time.

For time series of lengths not equal to a power of 2, we would like to make the following recommendations: If the input series is small to moderately sized (e.g., only a few thousand cases), then do not worry.

In order to analyze moderately large and large series (e.g., over 100,000 cases), pad the series to a power of 2 and then taper the series during the exploratory part of your data analysis.

How to Choose the Right Forecasting Technique

To handle the increasing variety and complexity of managerial forecasting problems, many forecasting techniques have been developed in recent years.

The selection of a method depends on many factors—the context of the forecast, the relevance and availability of historical data, the degree of accuracy desirable, the time period to be forecast, the cost/benefit (or value) of the forecast to the company, and the time available for making the analysis.

If the forecaster can readily apply one technique of acceptable accuracy, he or she should not try to “gold plate” by using a more advanced technique that offers potentially greater accuracy but that requires nonexistent information or information that is costly to obtain.

The availability of data and the possibility of establishing relationships between the factors depend directly on the maturity of a product, and hence the life-cycle stage is a prime determinant of the forecasting method to be used.

Our purpose here is to present an overview of this field by discussing the way a company ought to approach a forecasting problem, describing the methods available, and explaining how to match method to problem.

Again, if the forecast is to set a “standard” against which to evaluate performance, the forecasting method should not take into account special actions, such as promotions and other marketing devices, since these are meant to change historical patterns and relationships and hence form part of the “performance” to be evaluated.

On the other hand, if management wants a forecast of the effect that a certain marketing strategy under debate will have on sales growth, then the technique must be sophisticated enough to take explicit account of the special actions and events the strategy entails.

Generally, the manager and the forecaster must review a flow chart that shows the relative positions of the different elements of the distribution system, sales system, production system, or whatever is being studied.

Note the points where inventories are required or maintained in this manufacturing and distribution system—these are the pipeline elements, which exert important effects throughout the flow system and hence are of critical interest to the forecaster.

All the elements in dark gray directly affect forecasting procedure to some extent, and the color key suggests the nature of CGW’s data at each point, again a prime determinant of technique selection since different techniques require different kinds of inputs.

In the part of the system where the company has total control, management tends to be tuned in to the various cause-and-effect relationships, and hence can frequently use forecasting techniques that take causal factors explicitly into account.

The flow chart has special value for the forecaster where causal prediction methods are called for because it enables him or her to conjecture about the possible variations in sales levels caused by inventories and the like, and to determine which factors must be considered by the technique to provide the executive with a forecast of acceptable accuracy.

These differences imply (quite correctly) that the same type of forecasting technique is not appropriate to forecast sales, say, at all stages of the life cycle of a product—for example, a technique that relies on historical data would not be useful in forecasting the future of a totally new product that has no history.

Such techniques are frequently used in new-technology areas, where development of a product idea may require several “inventions,” so that R&D demands are difficult to estimate, and where market acceptance and penetration rates are highly uncertain.

The multi-page chart “Basic Forecasting Techniques” presents several examples of this type (see the first section), including market research and the now-familiar Delphi technique. In this chart we have tried to provide a body of basic information about the main kinds of forecasting techniques.

One of the basic principles of statistical forecasting—indeed, of all forecasting when historical data are available—is that the forecaster should use the data on past performance to get a “speedometer reading” of the current rate (of sales, say) and of how fast this rate is increasing or decreasing.

(We shall return to this point when we discuss time series analysis in the final stages of product maturity.) Once the analysis is complete, the work of projecting future sales (or whatever) can begin.

This assumption is more likely to be correct over the short term than it is over the long term, and for this reason these techniques provide us with reasonably accurate forecasts for the immediate future but do quite poorly further into the future (unless the data patterns are extraordinarily stable).

For this same reason, these techniques ordinarily cannot predict when the rate of growth in a trend will change significantly—for example, when a period of slow growth in sales will suddenly change to a period of rapid decay.

When historical data are available and enough analysis has been performed to spell out explicitly the relationships between the factor to be forecast and other factors (such as related businesses, economic forces, and socioeconomic factors), the forecaster often constructs a causal model.

We shall trace the forecasting methods used at each of the four different stages of maturity of these products to give some firsthand insight into the choice and application of some of the major techniques available today.

The technique selected by the forecaster for projecting sales therefore should permit incorporation of such “special information.” One may have to start with simple techniques and work up to more sophisticated ones that embrace such possibilities, but the final goal is there.

For example, priority pattern analysis can describe consumers’ preferences and the likelihood they will buy a product, and thus is of great value in forecasting (and updating) penetration levels and rates.

The prices of black-and-white TV and other major household appliances in 1949, consumer disposable income in 1949, the prices of color TV and other appliances in 1965, and consumer disposable income for 1965 were all profitably considered in developing our long-range forecast for color-TV penetration on a national basis.

This reinforces our belief that sales forecasts for a new product that will compete in an existing market are bound to be incomplete and uncertain unless one culls the best judgments of fully experienced personnel.

The basic tools here are the input-output tables of U.S. industry for 1947, 1958, and 1963, and various updatings of the 1963 tables prepared by a number of groups who wished to extrapolate the 1963 figures or to make forecasts for later years.

(Other techniques, such as panel consensus and visionary forecasting, seem less effective to us, and we cannot evaluate them from our own experience.) Before a product can enter its (hopefully) rapid penetration stage, the market potential must be tested out and the product must be introduced—and then more market testing may be advisable.

At this stage, management needs answers to a number of key questions. Significant profits depend on finding the right answers, and it is therefore economically feasible to expend relatively large amounts of effort and money on obtaining good short-, medium-, and long-range forecasts.

A sales forecast at this stage should provide three points of information: the date when rapid sales will begin, the rate of market penetration during the rapid-sales stage, and the ultimate level of penetration, or sales rate, during the steady-state stage.

A company’s only recourse is to use statistical tracking methods to check on how successfully the product is being introduced, along with routine market studies to determine when there has been a significant increase in the sales rate.

For example, it is important to distinguish between sales to innovators, who will try anything new, and sales to imitators, who will buy a product only after it has been accepted by innovators, for it is the latter group that provides demand stability.

In general, we find, scientifically designed consumer surveys conducted on a regular basis provide the earliest means of detecting turning points in the demand for a product.

But, more commonly, the forecaster tries to identify a similar, older product whose penetration pattern should be similar to that of the new product, since overall markets can and do exhibit consistent patterns.

When black-and-white TV was introduced as a new product in 1948–1951, the ratio of expenditures on radio and TV sets to total expenditures for consumer goods (see column 7) increased about 33% (from 1.23% to 1.63%), as against a modest increase of only 13% (from 1.63% to 1.88%) in the ratio for the next decade.

(A similar increase of 33% occurred in 1962–1966 as color TV made its major penetration.) Probably the acceptance of black-and-white TV as a major appliance in 1950 caused the ratio of all major household appliances to total consumer goods (see column 5) to rise to 4.98%;

Whereas it took black-and-white TV 10 years to reach steady state, qualitative expert-opinion studies indicated that it would take color twice that long—hence the more gradual slope of the color-TV curve.

At the same time, studies conducted in 1964 and 1965 showed significantly different penetration rates for color TV in various income groups, rates that were helpful to us in projecting the color-TV curve and tracking the accuracy of our projection.

The forecasts were accurate through 1966 but too high in the following three years, primarily because of declining general economic conditions and changing pricing policies. We should note that when we developed these forecasts and techniques, we recognized that additional techniques would be necessary at later times to maintain the accuracy that would be needed in subsequent periods.

As we have seen, this date is a function of many factors: the existence of a distribution system, customer acceptance of or familiarity with the product concept, the need met by the product, significant events (such as color network programming), and so on.

As well as by reviewing the behavior of similar products, the date may be estimated through Delphi exercises or through rating and ranking schemes, whereby the factors important to customer acceptance are estimated, each competitor product is rated on each factor, and an overall score is tallied for the competitor against a score for the new product.

Medium- and long-range forecasting of the market growth rate and of the attainment of steady-state sales requires the same measures as does the product introduction stage—detailed marketing studies (especially intention-to-buy surveys) and product comparisons.

When a product has entered rapid growth, on the other hand, there are generally sufficient data available to construct statistical and possibly even causal growth models (although the latter will necessarily contain assumptions that must be verified later).

Here we have used components for color TV sets for our illustration because we know from our own experience the importance of the long flow time for color TVs that results from the many sequential steps in manufacturing and distribution (recall Exhibit II).

The inventories all along the pipeline also follow an S-curve (as shown in Exhibit VI), a fact that creates and compounds two characteristic conditions in the pipeline as a whole: initial overfilling and subsequent shifts between too much and too little inventory at various points—a sequence of feast-and-famine conditions.

One main activity during the rapid-growth stage, then, is to check earlier estimates and, if they appear incorrect, to compute as accurately as possible the error in the forecast and obtain a revised estimate.

In the case of color TV, we found we were able to estimate the overall pipeline requirements for glass bulbs, the CGW market-share factors, and glass losses, and to postulate a probability distribution around the most likely estimates.

It is possible that swings in demand and profit will occur because of changing economic conditions, new and competitive products, pipeline dynamics, and so on, and the manager will have to maintain the tracking activities and even introduce new ones.

In planning production and establishing marketing strategy for the short and medium term, the manager’s first considerations are usually an accurate estimate of the present sales level and an accurate estimate of the rate at which this level is changing.

On the other hand, a component supplier may be able to forecast total sales with sufficient accuracy for broad-load production planning, but the pipeline environment may be so complex that the best recourse for short-term projections is to rely primarily on salespersons’ estimates.

In general, however, at this point in the life cycle, sufficient time series data are available and enough causal relationships are known from direct experience and market studies so that the forecaster can indeed apply these two powerful sets of tools.

People frequently object to using more than a few of the most recent data points (such as sales figures in the immediate past) for building projections, since, they say, the current situation is always so dynamic and conditions are changing so radically and quickly that historical data from further back in time have little or no value.

In practice, we find, overall patterns tend to continue for a minimum of one or two quarters into the future, even when special conditions cause sales to fluctuate for one or two (monthly) periods in the immediate future.

Consider what would happen, for example, if a forecaster were merely to take an average of the most recent data points along a curve, combine this with other, similar average points stretching backward into the immediate past, and use these as the basis for a projection.

To avoid precisely this sort of error, the moving average technique, which is similar to the hypothetical one just described, uses data points in such a way that the effects of seasonals (and irregularities) are eliminated.

Furthermore, the executive needs accurate estimates of trends and accurate estimates of seasonality to plan broad-load production, to determine marketing efforts and allocations, and to maintain proper inventories—that is, inventories that are adequate to customer demand but are not excessively costly.

(We might further note that the differences between this trend-cycle line and the deseasonalized data curve represent the irregular or nonsystematic component that the forecaster must always tolerate and attempt to explain by other methods.) In sum, then, the objective of the forecasting technique used here is to do the best possible job of sorting out trends and seasonalities.

Unfortunately, most forecasting methods project by a smoothing process analogous to that of the moving average technique, or like that of the hypothetical technique we described at the beginning of this section, and separating trends and seasonals more precisely will require extra effort and cost.

We have found that an analysis of the patterns of change in the growth rate gives us more accuracy in predicting turning points (and therefore changes from positive to negative growth, and vice versa) than when we use only the trend cycle.

One of the best techniques we know for analyzing historical data in depth to determine seasonals, present sales rate, and growth is the X-11 Census Bureau Technique, which simultaneously removes seasonals from raw information and fits a trend-cycle line to the data.

In particular, when recent data seem to reflect sharp growth or decline in sales or any other market anomaly, the forecaster should determine whether any special events occurred during the period under consideration—promotion, strikes, changes in the economy, and so on.

Generally, even when growth patterns can be associated with specific events, the X-11 technique and other statistical methods do not give good results when forecasting beyond six months, because of the uncertainty or unpredictable nature of the events.

Because economic forecasts are becoming more accurate and also because there are certain general “leading” economic forces that change before there are subsequent changes in specific industries, it is possible to improve the forecasts of businesses by including economic factors in the forecasting model.

(A later investigation did establish definite losses in color TV sales in 1967 due to economic conditions.) In 1969 Corning decided that a better method than the X-11 was definitely needed to predict turning points in retail sales for color TV six months to two years into the future.

Using data extending through 1968, the model did reasonably well in predicting the downturn in the fourth quarter of 1969 and, when 1969 data were also incorporated into the model, accurately estimated the magnitude of the drop in the first two quarters of 1970.

While some companies have already developed their own input-output models in tandem with the government input-output data and statistical projections, it will be another five to ten years before input-output models are effectively used by most major corporations.

Doubtless, new analytical techniques will be developed for new-product forecasting, but there will be a continuing problem, for at least 10 to 20 years and probably much longer, in accurately forecasting various new-product factors, such as sales, profitability, and length of life cycle.

Mod-02 Lec-02 Forecasting -- Time series models -- Simple Exponential smoothing

Operations and Supply Chain Management by Prof. G. Srinivasan , Department of Management Studies, IIT Madras. For more details on NPTEL visit

12. Time Series Analysis III

MIT 18.S096 Topics in Mathematics with Applications in Finance, Fall 2013 View the complete course: Instructor: Peter Kempthorne This is the last of three lectures..

6. Monte Carlo Simulation

MIT 6.0002 Introduction to Computational Thinking and Data Science, Fall 2016 View the complete course: Instructor: John Guttag Prof. Guttag discusses the Monte..

Mod-06 Lec-24 Sequencing and scheduling -- Assumptions, objectives and shop settings

Operations and Supply Chain Management by Prof. G. Srinivasan , Department of Management Studies, IIT Madras. For more details on NPTEL visit

Choice Modeling, How To Use and Analyze The Results

Choice modeling can be used to measure the importance of factors that go into consumer decision-making. Choice modeling simulates their shopping process, with all of the important variables...

Lec-17 State Estimation

Lecture Series on Estimation of Signals and Systems by Prof.S. Mukhopadhyay, Department of Electrical Engineering, IIT Kharagpur. For more details on NPTEL visit

Statistics 101: Simple Linear Regression, The Very Basics

This is the first video in what will be, or is (depending on when you are watching this) a multipart video series about Simple Linear Regression. In the next few minutes we will cover the basics...

Lecture 13 - Validation

Validation - Taking a peek out of sample. Model selection and data contamination. Cross validation. Lecture 13 of 18 of Caltech's Machine Learning Course - CS 156 by Professor Yaser Abu-Mostafa....

Around the Bureaus: Forecasting the Ferocious

Forecasting the Ferocious: The Predictive Science Behind NWS Forecasts for Tornadoes and Floods Greg Carbin, Chief of Forecast Operations for the NWS Weather Prediction Center, NOAA Despite...