# CUHK-Shenzhen STOR 556 - Forecasting

**More on Forecasting**

Vladas Pipiras, STOR @ UNC-CH

March, 2022

Outline:

- What is this about?
- Some simple forecasting methods
- Evaluating forecast accuracy
- More on cross-validation
- Some other forecasting methods: Exponential smoothing
- Forecast combinations
- Reading
- References

## What is this about?

We have looked at a few forecasting methods, e.g. those based on ARIMA models. Here:

- A few other forecasting methods, including more naïve ones.
- How does one think about forecast accuracy?
- Which method to use? And which perhaps not to use?

## Some simple forecasting methods

### Average method

$$\hat{y}_{T+h|T} = \bar{y} = (y_1 + \cdots + y_T)/T.$$

R: `meanf(y, h)`

### Naïve method

$$\hat{y}_{T+h|T} = y_T.$$

R: `naive(y, h)` or `rwf(y, h)`

### Drift method

$$\hat{y}_{T+h|T} = y_T + \frac{h}{T-1}\sum_{t=2}^{T}(y_t - y_{t-1}) = y_T + h\,\frac{y_T - y_1}{T-1}.$$

R: `rwf(y, h, drift=TRUE)`

### Example

```r
library(fpp2)

# Training window: 1992 Q1 to 2007 Q4
beer2 <- window(ausbeer,start=1992,end=c(2007,4))

# Mean, naïve and seasonal naïve forecasts, 10 quarters ahead
beerfit1 <- meanf(beer2,h=10)
beerfit2 <- rwf(beer2,h=10)
beerfit3 <- snaive(beer2,h=10)

# Overlay the three point forecasts on the observed series
autoplot(window(ausbeer, start=1992)) +
  autolayer(beerfit1, series="Mean", PI=FALSE) +
  autolayer(beerfit2, series="Naïve", PI=FALSE) +
  autolayer(beerfit3, series="Seasonal naïve", PI=FALSE) +
  xlab("Year") + ylab("Megalitres") +
  ggtitle("Forecasts for quarterly beer production") +
  guides(colour=guide_legend(title="Forecast"))
```

[Figure: Forecasts for quarterly beer production. x-axis: Year; y-axis: Megalitres; series: Mean, Naïve, Seasonal naïve.]

## Evaluating forecast accuracy

Forecast accuracy should be judged not by model residuals but by how well a model performs on new data that were not used when fitting it.

### Training and test sets

- A model which fits the training data well will not necessarily forecast well.
- A perfect fit can always be obtained by using a model with enough parameters.
- Over-fitting a model to data is just as bad as failing to identify a systematic pattern in the data.

Measures of forecast accuracy are based on the following quantities.

### Forecast errors

$$e_{T+h} = y_{T+h} - \hat{y}_{T+h|T}$$

### Scale-dependent errors

Mean absolute error: $\text{MAE} = \text{mean}(|e_t|)$,

Root mean squared error: $\text{RMSE} = \sqrt{\text{mean}(e_t^2)}$.

### Percentage errors

The percentage error is given by $p_t = 100\, e_t / y_t$.

Mean absolute percentage error: $\text{MAPE} = \text{mean}(|p_t|)$.

### Scaled errors

$$\text{MASE} = \text{mean}(|q_j|),$$

where for a non-seasonal time series

$$q_j = \frac{e_j}{\dfrac{1}{T-1}\displaystyle\sum_{t=2}^{T}|y_t - y_{t-1}|}$$

and for a seasonal time series with seasonal period $m$

$$q_j = \frac{e_j}{\dfrac{1}{T-m}\displaystyle\sum_{t=m+1}^{T}|y_t - y_{t-m}|}.$$

### Example

```r
# Test window: 2008 Q1 onwards
beer3 <- window(ausbeer, start=2008)

accuracy(beerfit1, beer3)
##                   ME     RMSE      MAE        MPE     MAPE     MASE        ACF1
## Training set   0.000 43.62858 35.23438 -0.9365102 7.886776 2.463942 -0.10915105
## Test set     -13.775 38.44724 34.82500 -3.9698659 8.283390 2.435315 -0.06905715
##              Theil's U
## Training set        NA
## Test set      0.801254

accuracy(beerfit2, beer3)
##                       ME     RMSE      MAE         MPE     MAPE     MASE
## Training set   0.4761905 65.31511 54.73016  -0.9162496 12.16415 3.827284
## Test set     -51.4000000 62.69290 57.40000 -12.9549160 14.18442 4.013986
##                     ACF1 Theil's U
## Training set -0.24098292        NA
## Test set     -0.06905715  1.254009

accuracy(beerfit3, beer3)
##                     ME     RMSE  MAE        MPE     MAPE      MASE       ACF1
## Training set -2.133333 16.78193 14.3 -0.5537713 3.313685 1.0000000 -0.2876333
## Test set      5.200000 14.31084 13.4  1.1475536 3.168503 0.9370629  0.1318407
##              Theil's U
## Training set        NA
## Test set      0.298728
```

### Another example

```r
# Mean, naïve and drift forecasts, 40 days ahead
googfc1 <- meanf(goog200, h=40)
googfc2 <- rwf(goog200, h=40)
googfc3 <- rwf(goog200, drift=TRUE, h=40)

autoplot(subset(goog, end = 240)) +
  autolayer(googfc1, PI=FALSE, series="Mean") +
  autolayer(googfc2, PI=FALSE, series="Naïve") +
  autolayer(googfc3, PI=FALSE, series="Drift") +
  xlab("Day") + ylab("Closing Price (US$)") +
  ggtitle("Google stock price (daily ending 6 Dec 13)") +
  guides(colour=guide_legend(title="Forecast"))
```

[Figure: Google stock price (daily ending 6 Dec 13). x-axis: Day; y-axis: Closing Price (US$); series: Drift, Mean, Naïve.]

```r
# Test window: days 201 to 240
googtest <- window(goog, start=201, end=240)

accuracy(googfc1, googtest)
##                         ME      RMSE       MAE        MPE     MAPE      MASE
## Training set -4.296286e-15  36.91961  26.86941 -0.6596884  5.95376  7.182995
## Test set      1.132697e+02 114.21375 113.26971 20.3222979 20.32230 30.280376
##                   ACF1 Theil's U
## Training set 0.9668981        NA
## Test set     0.8104340  13.92142

accuracy(googfc2, googtest)
##                      ME      RMSE       MAE       MPE      MAPE     MASE
## Training set  0.6967249  6.208148  3.740697 0.1426616 0.8437137 1.000000
## Test set     24.3677328 28.434837 24.593517 4.3171356 4.3599811 6.574582
##                     ACF1 Theil's U
## Training set -0.06038617        NA
## Test set      0.81043397  3.451903

accuracy(googfc3, googtest)
##                         ME      RMSE       MAE         MPE      MAPE     MASE
## Training set -5.998536e-15  6.168928  3.824406 -0.01570676 0.8630093 1.022378
## Test set      1.008487e+01 14.077291 11.667241  1.77566103 2.0700918 3.119002
##                     ACF1 Theil's U
## Training set -0.06038617        NA
## Test set      0.64732736  1.709275
```

### Time series cross-validation

A more sophisticated version of training/test sets is time series cross-validation. Each test set consists of a single observation, and the corresponding training set consists only of observations that occurred before it. The forecast accuracy is computed by averaging over the test sets.

```r
# One-step-ahead cross-validated errors for the drift method
e <- tsCV(goog200, rwf, drift=TRUE, h=1)
sqrt(mean(e^2, na.rm=TRUE))
## [1] 6.233245

# In-sample residual RMSE for comparison
sqrt(mean(residuals(rwf(goog200, drift=TRUE))^2, na.rm=TRUE))
## [1] 6.168928
```

A good way to choose the best forecasting model is to find the model with the smallest RMSE computed using time series cross-validation.

## More on cross-validation

[Figure: Variants of prequential approaches]

[Figure: Variants of cross-validation]

Some observations from Cerqueira et al. (2020):

- Empirical experiments suggest that blocked cross-validation can be applied to stationary time series.
- When the time series are non-stationary, the most accurate estimates are produced by out-of-sample methods, particularly the holdout approach repeated in multiple testing periods.

## Some other forecasting methods: Exponential smoothing

### Simple exponential smoothing (SES)

For forecasting data with no clear trend or seasonal pattern:

$$\hat{y}_{T+1|T} = \alpha y_T + \alpha(1-\alpha) y_{T-1} + \alpha(1-\alpha)^2 y_{T-2} + \cdots$$

where $\alpha \in [0,1]$ is the smoothing parameter.

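To get a feel for the weighted-average form of SES, the following base-R sketch computes the first few weights $\alpha(1-\alpha)^j$. The value $\alpha = 0.8$ is an assumption chosen only for illustration; it is not taken from the notes.

```r
# Illustrative smoothing parameter (an assumption for this sketch)
alpha <- 0.8

# Weights attached to y_T, y_{T-1}, ..., y_{T-4}
j <- 0:4
w <- alpha * (1 - alpha)^j
print(round(w, 5))

# Share of the total weight carried by the five most recent observations
print(sum(w))
```

The weights decay geometrically, so with this choice of $\alpha$ the five most recent observations carry $1 - 0.2^5 \approx 0.9997$ of the total weight. A smaller $\alpha$ spreads the weight further into the past.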
The weighted-average form of SES can be rewritten recursively as

$$\hat{y}_{T+1|T} = \alpha y_T + (1-\alpha)\hat{y}_{T|T-1}$$

and more generally as

$$\hat{y}_{t+1|t} = \alpha y_t + (1-\alpha)\hat{y}_{t|t-1}.$$

This is also expressed in a component form as

Forecast equation: $\hat{y}_{t+h|t} = \ell_t$

Smoothing equation: $\ell_t = \alpha y_t + (1-\alpha)\ell_{t-1}$.

The smoothing parameter $\alpha$ and the starting value $\ell_0$ are chosen to minimize

$$\text{SSE} = \sum_{t=1}^{T}(y_t - \hat{y}_{t|t-1})^2 = \sum_{t=1}^{T} e_t^2.$$

### Example

```r
oildata <- window(oil, start=1996)
# Estimate parameters
fc <- ses(oildata, h=5)
fc$model
## Simple exponential smoothing
##
## Call:
##  ses(y = oildata, h = 5)
##
##   Smoothing parameters:
##     alpha = 0.8339
##
##   Initial states:
##     l = 446.5868
##
##   sigma:  29.8282
##
##      AIC     AICc      BIC
## 178.1430 179.8573 180.8141
fc
##      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2014       542.6806 504.4541 580.9070 484.2183 601.1429
## 2015
```
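The SES smoothing recursion and the SSE criterion can also be sketched directly in base R. This is a minimal illustration under stated assumptions, and not necessarily how `ses()` estimates its parameters. The helper names `ses_level` and `ses_sse` are hypothetical, the series `y` is toy data made up for the example, and `optim()` with box constraints is just one generic way to carry out the minimization.

```r
# Hypothetical helper: run l_t = alpha * y_t + (1 - alpha) * l_{t-1}
# and return the one-step forecasts yhat_{t|t-1} = l_{t-1} and the final level l_T.
ses_level <- function(y, alpha, l0) {
  n <- length(y)
  level <- numeric(n + 1)
  level[1] <- l0                        # level[1] stores l_0
  for (t in seq_len(n)) {
    level[t + 1] <- alpha * y[t] + (1 - alpha) * level[t]
  }
  list(fitted = level[1:n],             # yhat_{t|t-1} = l_{t-1}
       last_level = level[n + 1])       # l_T, the forecast for every horizon h
}

# Hypothetical helper: sum of squared one-step errors for par = c(alpha, l0)
ses_sse <- function(par, y) {
  fit <- ses_level(y, alpha = par[1], l0 = par[2])
  sum((y - fit$fitted)^2)
}

# Toy series, made up for illustration only
y <- c(10, 12, 11, 13, 12, 14, 13, 15)

# Minimize SSE over alpha in [0, 1] and l0 within the range of the data
opt <- optim(par = c(0.5, y[1]), fn = ses_sse, y = y,
             method = "L-BFGS-B",
             lower = c(0, min(y)), upper = c(1, max(y)))

alpha_hat <- opt$par[1]
l0_hat <- opt$par[2]
forecast_flat <- ses_level(y, alpha_hat, l0_hat)$last_level
print(c(alpha = alpha_hat, l0 = l0_hat, forecast = forecast_flat))
```

Because the forecast equation is $\hat{y}_{t+h|t} = \ell_t$, the point forecast returned here is the same for every horizon $h$. Restricting $\ell_0$ to the range of the data is a convenience of this sketch, not a requirement of the method.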

