This paper sets out to forecast the US monthly inflation rate using various statistical models and the FRED-MD database, which contains monthly observations on 127 macroeconomic variables from 1959 to 2022. Forecasting (expected) inflation rates is essential for the plans of private individuals and firms as well as governments’ and central banks’ policy decisions.
In particular, this paper tests the accuracy of various univariate time series models, time series models with exogenous regressors, and Machine Learning models in a pseudo-out-of-sample experiment. An AR(1) model will serve as the benchmark to which the performance of the other models is compared. The exogenous regressors are selected by theory and by correlation with the target variable. Finally, the paper considers how a combination of models can increase the accuracy of their forecasts.
CONTENTS
List of Abbreviations
List of Figures
List of Tables
1 Introduction
2 Data
3 Methodology
3.1 Time Series Models
3.2 Machine Learning Models
3.3 Forecast Combination
4 Results
Bibliography
Appendix
A Structural Breaks in the Target Variable
B Overview of the Exogenous Predictors
C Choosing the Optimal Window Length
D Forecast Evaluation
List of Abbreviations
[List of abbreviations not included in this excerpt.]
LIST OF FIGURES
Figure 1 US Consumer Price Inflation
Figure 2 Structural Break Tests
Figure 3 Ideal Window Length
Figure 4 Forecast Errors
Figure 5 Forecast Error Distribution
LIST OF TABLES
Table 1 Results of the Forecasting Experiment
1 INTRODUCTION
This paper sets out to forecast the US monthly inflation rate using various statistical models and the FRED-MD database, which contains monthly observations on 127 macroeconomic variables from 1959 to 2022. Such inflation rate forecasts are essential to governments’ and central banks’ policy decisions. In particular, I test the accuracy of various univariate time series models, time series models with exogenous regressors, and Machine Learning models in a pseudo-out-of-sample experiment. An AR(1) model will serve as the benchmark to which the performance of the other models is compared. The exogenous regressors are selected by theory as well as by correlation with the target variable. Finally, I consider how a combination of models can increase the accuracy of their forecasts.
The remainder of this paper proceeds as follows. Section 2 gives a description of the data and particularly the target variable. After that, section 3 will provide an overview of the methodology, while section 4 will present the results.
2 DATA
The analysis is based on the FRED-MD database, a large-scale collection of 127 monthly macroeconomic variables from January 1959 to October 2022, provided by the Federal Reserve Bank of St. Louis.1 Besides the target variable - the US consumer price inflation (CPI, all items) - it also provides data on other economic indicators, like GDP, employment, and trade. After transforming the data into a stationary and balanced panel, 761 observations on 114 exogenous predictors and the target variable remain. For more information on the variables and their transformation, please refer to McCracken and Ng (2016).
For the analysis, I use the CPI inflation rate, defined as Δ²log(x_t), where x_t is the price index in month t, as provided in the untransformed data. Figure 1 illustrates the price index (Panel A) and the inflation rate (Panel B). Note the broadly monotonic uptrend of the index, with few interruptions. The inflation rate itself fluctuates around 0 per cent, with some notable exceptions in the mid-1970s, the Great Financial Crisis, and the recent Covid-19 pandemic. Using a variety of structural change tests, I find no evidence of any structural break in the target variable (see Appendix A).
[Figure not included in this excerpt.]
Figure 1: US Consumer Price Inflation. Panel A shows the US CPI index for all items. The index is normalised to 100 in 1985. Panel B shows the corresponding inflation rate obtained through the transformation of the index. The shaded grey areas highlight recessions. Author’s visualisation based on data from FRED-MD.
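The Δ²log transformation of the price index can be sketched in a few lines; the function name and the illustrative index values below are my own, not taken from the paper.

```python
import numpy as np

def d2log(x):
    """Second log-difference: Delta^2 log(x_t) = log(x_t) - 2 log(x_{t-1}) + log(x_{t-2})."""
    lx = np.log(np.asarray(x, dtype=float))
    return lx[2:] - 2.0 * lx[1:-1] + lx[:-2]

# A price index growing at a constant 1% per month has a Delta^2 log of (numerically) zero:
index = [100.0, 101.0, 102.01, 103.0301]
rates = d2log(index)
```

Because the transformation differences the log twice, it measures the change in the inflation rate, which is why the resulting series fluctuates around zero.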
Two batches of exogenous regressors are selected from the FRED-MD dataset. The first batch is based on economic theory and includes nine predictors. The second batch comprises all those predictors with an absolute correlation coefficient with the target variable exceeding 0.15 (a total of ten predictors). Appendix B gives an overview of these predictors.
3 METHODOLOGY
The forecasting horizon of this paper is h = 2 months. All models are estimated and evaluated in a pseudo-out-of-sample experiment based on a rolling window scheme of fixed length. The optimal window size is determined endogenously for the benchmark AR(1) model by additional out-of-sample experiments with window sizes varying between 100 and 500 (see Appendix C). A window length of R = 198 is found to minimise both the MAE and the RMSE and is subsequently used for all models.2
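A rolling-window pseudo-out-of-sample scheme of this kind can be sketched as follows. The function names and the use of an iterated (rather than direct) two-step AR(1) forecast are assumptions of this sketch, not details taken from the paper.

```python
import numpy as np

def ar1_forecast(window, h=2):
    """Fit y_t = c + phi * y_{t-1} by OLS on the window, then iterate h steps ahead."""
    X = np.column_stack([np.ones(len(window) - 1), window[:-1]])
    c, phi = np.linalg.lstsq(X, window[1:], rcond=None)[0]
    f = window[-1]
    for _ in range(h):
        f = c + phi * f
    return f

def rolling_oos_errors(y, R, h=2, forecaster=ar1_forecast):
    """Pseudo-out-of-sample forecast errors from a rolling window of fixed length R."""
    y = np.asarray(y, dtype=float)
    errors = []
    for s in range(len(y) - R - h + 1):
        window = y[s:s + R]
        target = y[s + R + h - 1]   # realisation h steps after the window's end
        errors.append(target - forecaster(window, h))
    return np.asarray(errors)
```

Any of the competing models can be slotted in via the `forecaster` argument, so all models are evaluated on exactly the same sequence of windows.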
The estimation of the models requires up to p lags of the target variable and one lag of the exogenous predictors. The optimal lag length (as well as the optimal set of hyperparameters, see below) is specifically selected for each individual evaluation period.
The forecast performance of the different models is evaluated by comparing the accuracy of their forecasts relative to the benchmark AR(1) model, using the MAE and RMSE as loss functions. Both are reported relative to the AR(1) benchmark, so a value below 1 indicates that the respective model outperforms the AR(1) model, while a value larger than 1 indicates worse performance. To test whether the differences in predictive accuracy between models and benchmark are statistically significant, this paper employs the modified Diebold-Mariano test (see Diebold and Mariano (1995) and Harvey et al. (1997)).
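The modified Diebold-Mariano statistic of Harvey et al. (1997) can be sketched as below. The implementation details (squared-error loss, long-run variance truncated at lag h-1) follow the standard textbook construction and are not taken from the paper's own code.

```python
import numpy as np

def modified_dm(e_model, e_bench, h=2, power=2):
    """Diebold-Mariano statistic with the Harvey-Leybourne-Newbold small-sample
    correction; compare against a t distribution with n-1 degrees of freedom.
    A negative value means the model's losses are smaller than the benchmark's."""
    d = np.abs(e_model) ** power - np.abs(e_bench) ** power   # loss differential
    n = len(d)
    dbar = d.mean()
    dc = d - dbar
    # long-run variance of dbar: autocovariances up to lag h-1
    gammas = [dc @ dc / n] + [dc[k:] @ dc[:-k] / n for k in range(1, h)]
    var_dbar = (gammas[0] + 2.0 * sum(gammas[1:])) / n
    stat = dbar / np.sqrt(var_dbar)
    hln = np.sqrt((n + 1 - 2 * h + h * (h - 1) / n) / n)      # HLN correction factor
    return hln * stat
```

The HLN factor shrinks the statistic towards zero in small samples, which is why it is preferred over the original test for short evaluation windows.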
3.1 Time Series Models
The most basic time series models considered are the historical average (H.MEAN) and an exponential smoothing (ETS) model. The former produces a forecast equal to the average of the past realisations of the target variable, ŷ_{T+h} = (1/T) Σ_{t=1}^{T} y_t. The ETS model generates forecasts using past observations of the target variable, weighted with exponentially decreasing weights. The AIC selects the precise configuration of the ETS model for each window.
3.1.1 ARMA Models
The bulk of the univariate time series models this paper considers are variations of the ARMA(p,q) model, which reads:
y_t = φ_0 + Σ_{i=1}^{p} φ_i y_{t-i} + Σ_{j=1}^{q} θ_j ε_{t-j} + ε_t,
where p is the order of the autoregressive terms, q is the order of the moving-average terms, and ε_t ~ (0, σ²). For the benchmark AR(1) model, p = 1 and q = 0. As extensions to the benchmark, I will consider an AR(2), AR(p), MA(1), MA(2), MA(q), as well as an ARMA(1,1) and ARMA(p,q) model, where the parameters p and q of the AR(p), MA(q), and ARMA(p,q) are chosen by the AIC for each window. The coefficients φ_0, φ_i, and θ_j are all estimated via OLS.
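AIC-based lag selection for an OLS-estimated AR(p) might look like the sketch below; the maximum lag and the use of a common estimation sample across candidate orders (so the AIC values are comparable) are my own choices.

```python
import numpy as np

def select_ar_order(y, max_p=6):
    """Pick the AR lag order by minimising the AIC of an OLS fit,
    using a common estimation sample for all candidate orders."""
    y = np.asarray(y, dtype=float)
    best_aic, best_p = np.inf, 1
    for p in range(1, max_p + 1):
        Y = y[max_p:]
        X = np.column_stack([np.ones(len(Y))] +
                            [y[max_p - i:len(y) - i] for i in range(1, p + 1)])
        beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
        resid = Y - X @ beta
        n, k = len(Y), p + 1
        aic = n * np.log(resid @ resid / n) + 2 * k   # Gaussian OLS AIC (up to a constant)
        if aic < best_aic:
            best_aic, best_p = aic, p
    return best_p
```

The same loop extends to the MA and ARMA cases once the residual series is replaced by the likelihood of the respective model.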
3.1.2 ARX Models
I will also consider one AR(1)-X(1) and one AR(p)-X(1) model, where X is a matrix of the exogenous regressors:
y_t = φ_0 + Σ_{i=1}^{p} φ_i y_{t-i} + β′X_{t-1} + ε_t.
In the case of the AR(p)-X(1) model, p is again chosen for each window by minimising the AIC. Note that any model containing exogenous regressors is estimated twice: once for the predictors that were selected based on theory and once for the predictors based on correlation. The coefficients are again estimated via OLS.
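A direct two-step ARX forecast could be sketched as follows. Whether the paper uses a direct or an iterated scheme for the exogenous models is not stated, so the timing convention here is an assumption of this sketch.

```python
import numpy as np

def arx_direct_forecast(y, X, h=2):
    """Direct h-step AR(1)-X(1) forecast: regress y_{t+h} on (1, y_t, X_t) by OLS
    and evaluate the fitted projection at the most recent observation."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    Y = y[h:]
    Z = np.column_stack([np.ones(len(Y)), y[:-h], X[:-h]])
    beta, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    return float(np.concatenate([[1.0], [y[-1]], X[-1]]) @ beta)
```

The direct form avoids having to forecast the exogenous regressors themselves, which an iterated ARX scheme would require.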
3.1.3 VAR Model
The last traditional time series model I will consider is the VAR(1) model, which reads:
y_t = c + A y_{t-1} + ε_t,
where y_t is a vector containing the target variable and one set of the (unlagged) predictors, and A is a coefficient matrix estimated by OLS.
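A minimal numpy sketch of VAR(1) estimation and iterated forecasting, assuming a constant term alongside the coefficient matrix A (the paper only mentions A explicitly):

```python
import numpy as np

def var1_fit(Y):
    """OLS estimates of y_t = c + A y_{t-1} + e_t; Y is a T x k data matrix."""
    Y = np.asarray(Y, dtype=float)
    Z = np.column_stack([np.ones(len(Y) - 1), Y[:-1]])
    B, *_ = np.linalg.lstsq(Z, Y[1:], rcond=None)
    c, A = B[0], B[1:].T
    return c, A

def var1_forecast(Y, h=2):
    """Iterate the fitted VAR(1) h steps ahead; the forecast for the target
    variable is the corresponding element of the returned vector."""
    c, A = var1_fit(Y)
    f = np.asarray(Y, dtype=float)[-1]
    for _ in range(h):
        f = c + A @ f
    return f
```

Because the VAR is linear, multi-equation OLS reduces to a single least-squares problem with one column of the coefficient matrix per equation.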
3.2 Machine Learning Models
Besides the traditional time series models presented above, I will also use two classes of Machine Learning models. These are particularly useful in a high-dimensional setting because they exploit the bias-variance tradeoff, accepting a small bias in exchange for a lower forecast variance. Even though I will only consider a maximum of ten exogenous predictors in the correlation batch and nine in the theory batch, they tend to be moderately correlated, raising the question of predictor redundancy. Note that all regressors have undergone a z-transformation before being used as an input into the models.3
3.2.1 Ridge, LASSO, and Elastic Net Models
These shrinkage models append a penalty term to the OLS objective function, which can improve forecast accuracy. A general equation for these models reads:
β̂ = argmin_β { Σ_t (y_t − x_t′β)² + λ [α Σ_j |β_j| + (1 − α) Σ_j β_j²] },
where α = 0 in the case of a Ridge regression (see Hoerl and Kennard 1970), α = 1 in the LASSO case (Tibshirani 1996), and α ∈ [0, 1] for the Elastic Net (E.NET) model (Zou and Hastie 2005). The λ parameter, as well as α in the Elastic Net model, is determined by k-fold cross-validation for each window.
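For the Ridge case (α = 0) the estimator has the closed form b = (X′X + λI)⁻¹X′y, which makes the k-fold cross-validation over λ easy to sketch; LASSO and Elastic Net require an iterative solver, which this sketch omits.

```python
import numpy as np

def ridge_cv(X, y, lambdas, k=5):
    """Choose the ridge penalty lambda by k-fold cross-validation,
    then refit on the full sample with the winning penalty."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    folds = np.array_split(np.arange(n), k)

    def fit(Xt, yt, lam):
        # closed-form ridge estimator (alpha = 0 case of the elastic net)
        return np.linalg.solve(Xt.T @ Xt + lam * np.eye(p), Xt.T @ yt)

    cv_sse = []
    for lam in lambdas:
        sse = 0.0
        for test_idx in folds:
            train_idx = np.setdiff1d(np.arange(n), test_idx)
            b = fit(X[train_idx], y[train_idx], lam)
            sse += np.sum((y[test_idx] - X[test_idx] @ b) ** 2)
        cv_sse.append(sse)
    best = lambdas[int(np.argmin(cv_sse))]
    return best, fit(X, y, best)
```

Note that the regressors should already be z-transformed, as in the paper, so that a single λ penalises all coefficients on a comparable scale.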
3.2.2 Neural Network
As a final model, I consider a simple neural network with a single hidden layer, which takes p lagged values of the target variable (determined by the AIC) as well as the exogenous regressors as input. The number of nodes in the hidden layer is set to half the number of input nodes plus one.4
3.3 Forecast Combination
As a final forecasting method, I consider a combination of the forecasts of the previous models, which may outperform individual forecasts (see, for example, Stock and Watson 2004, who applied this to output growth forecasting). In particular, this paper considers a simple average (C.MEAN) combination of all the previous forecasts, excluding the benchmark.
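The equal-weight combination itself is trivial to compute. The small simulation below only illustrates the textbook variance-reduction argument for combining unbiased forecasts with independent errors; it does not reproduce the paper's results.

```python
import numpy as np

def combine_mean(forecast_matrix):
    """Equal-weight (C.MEAN) combination: forecast_matrix has one row per model."""
    return np.asarray(forecast_matrix, dtype=float).mean(axis=0)

# Averaging two unbiased forecasts with independent errors halves the error variance:
rng = np.random.default_rng(0)
truth = np.zeros(10_000)
f1 = truth + rng.standard_normal(10_000)
f2 = truth + rng.standard_normal(10_000)
combo = combine_mean([f1, f2])
```

In practice the gain is smaller because the models' forecast errors are positively correlated, but it rarely disappears entirely, which is consistent with the strong C.MEAN performance reported in Section 4.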
4 RESULTS
The results of the out-of-sample forecast experiment, as well as the forecasts produced by these models, are given in Table 1.
All the univariate time series models (Panel A) outperform the benchmark AR(1) model, with the MA(2) showing the best out-of-sample performance in terms of both MAE and RMSE. Including external predictors can further improve forecast accuracy relative to the benchmark, though not for every model. The Ridge, LASSO, and Elastic Net models in particular show a solid out-of-sample performance. Interestingly, the models with exogenous variables tend to fare better when the predictors are chosen by correlation (Panel C) rather than theory (Panel B). Overall, the Ridge regression with predictors selected by correlation shows the best performance of any single model. The C.MEAN (Panel D), the average forecast over all the models presented in Table 1, performs even better relative to the benchmark, beating all other models.
Also shown are the results of the Diebold-Mariano test, which checks whether differences in predictive accuracy between each model and the benchmark are statistically significant. The results indicate that several models produce significant differences in predictive accuracy, particularly the univariate time series models. Only two of the models in Panel B can pass the DM test, whereas five models from Panel C can do so, thus further highlighting the difference that the choice of predictors can make. Appendix D gives a visual overview of the forecast performance of each model.
The last column of Table 1 shows the forecasted CPI rate for December 2022 for each individual model. All models offer a negative forecast, ranging from -0.01% to -0.44%.
Table 1: Results of the Forecasting Experiment
[Table not included in this excerpt.]
The first column shows the forecasting models used in this paper (see section 3). The following two columns give the performance of the models relative to the AR(1) benchmark in terms of the MAE and RMSE. The fourth and fifth columns show the DM test statistic and respective p-value. The sixth column shows the forecast for December 2022 (in %).
BIBLIOGRAPHY
Brown, R. L., J. Durbin, and J. M. Evans (1975). “Techniques for Testing the Constancy of Regression Relationships over Time”. In: Journal of the Royal Statistical Society 37.2, pp. 149-192.
Chu, Chia-Shang J., Kurt Hornik, and Chung-Ming Kuan (1995). “MOSUM Tests for Parameter Constancy”. In: Biometrika 82.3, pp. 603-617.
Diebold, Francis X. and Roberto S. Mariano (1995). “Comparing Predictive Accuracy”. In: Journal of Business & Economic Statistics 13.3, pp. 253-263.
Harvey, David, Stephen Leybourne, and Paul Newbold (1997). “Testing the Equality of Prediction Mean Squared Errors”. In: International Journal of Forecasting 13.2, pp. 281-291.
Hoerl, Arthur E. and Robert W. Kennard (1970). “Ridge Regression: Biased Estimation for Nonorthogonal Problems”. In: Technometrics 12.1, pp. 55-67.
McCracken, Michael W. and Serena Ng (2016). “FRED-MD: A Monthly Database For Macroeconomic Research”. In: Journal of Business & Economic Statistics 34.4, pp. 574-589.
Ploberger, Werner and Walter Krämer (1992). “The CUSUM Test with OLS Residuals”. In: Econometrica 60.2, pp. 271-285.
Stock, James H. and Mark W. Watson (1999). “Forecasting Inflation”. In: Journal of Monetary Economics 44.1, pp. 293-335.
- (2004). “Combination Forecasts of Output Growth in a Seven-Country Data Set”. In: Journal of Forecasting 23.1, pp. 405-430.
Tibshirani, Robert (1996). “Regression Shrinkage and Selection via the Lasso”. In: Journal of the Royal Statistical Society 58.1, pp. 267-288.
Zeileis, Achim et al. (2002). “strucchange: An R Package for Testing for Structural Change in Linear Regression Models”. In: Journal of Statistical Software 7.2, pp. 1-38.
Zou, Hui and Trevor Hastie (2005). “Regularization and Variable Selection via the Elastic Net”. In: Journal of the Royal Statistical Society 67.2, pp. 301-320.
APPENDIX
A STRUCTURAL BREAKS IN THE TARGET VARIABLE
Structural breaks, that is, changes in the mean or variance of a time series, can significantly impact forecast performance. Figure 2 illustrates the results of various structural break tests; none of them finds evidence of a break, suggesting that the forecasting models described in the Methodology section are suitable.
[Figure not included in this excerpt.]
Figure 2: Structural Break Tests. This figure visualises the results of four structural break tests. The OLS-based CUSUM (Brown et al. 1975) and Recursive CUSUM (Ploberger and Krämer 1992) tests are designed to capture one or multiple structural change points in the mean of a time series. The OLS-based and Recursive MOSUM tests (see Chu et al. 1995) are designed to capture structural changes in the variance of a time series. As can be seen, the null hypothesis of no structural break cannot be rejected in any case, suggesting that the target variable does not exhibit a structural break in its mean or variance. Tests computed with the strucchange package (see Zeileis et al. 2002). Author’s own visualisation.
B OVERVIEW OF THE EXOGENOUS PREDICTORS
As highlighted in the Data section, two sets of exogenous regressors are used in the out-of-sample forecast experiment.
The first batch of regressors is based on economic theory: output, employment, interest rates, and commodity prices are generally thought to influence the inflation rate (see, for example, Stock and Watson (1999)). The specific regressors considered in this paper are (1) the real personal income, (2) the capacity utilisation in the manufacturing sector, (3) the civilian unemployment rate, (4) the total new privately owned housing starts, (5) the real personal consumption expenditures, (6) the real M2 money stock, (7) the 3-month Treasury bill rate, (8) the 10-year Treasury rate, and (9) the crude oil price.
The second set of regressors is chosen based on the size of their absolute correlation with the target variable. I first lagged the predictors, such that the correlation is between the target variable in t and the predictors in t-1. In particular, I chose all those variables in the FRED-MD database whose absolute (lagged) correlation with the target variable exceeds 0.15 over the entire sample period. This led to the selection of ten predictors: (1) the industrial production of residential units, (2) the real M2 money stock, (3) the total reserves of depository institutions, (4) the crude oil price, (5) the consumer price index for commodities, (6) the consumer price index for all items less food, (7) the consumer price index for all items less shelter, (8) the consumer price index less medical care, (9) the personal consumption expenditure chain index, and (10) the personal consumption index of non-durable goods. These predictors exhibit a relatively strong correlation not only with the target variable but also, in some cases, with one another, making them a natural fit for models capable of handling predictor redundancy.
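The correlation-based screening rule described above might be implemented as follows; the variable names are illustrative, not taken from FRED-MD.

```python
import numpy as np

def select_by_lagged_corr(y, X, names, threshold=0.15):
    """Keep predictors whose absolute correlation between x_{t-1} and y_t
    exceeds the threshold."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    chosen = []
    for j, name in enumerate(names):
        r = np.corrcoef(y[1:], X[:-1, j])[0, 1]   # corr(y_t, x_{t-1})
        if abs(r) > threshold:
            chosen.append(name)
    return chosen
```

Lagging the predictors before computing the correlation matters: it screens for variables with predictive content rather than contemporaneous co-movement.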
C CHOOSING THE OPTIMAL WINDOW LENGTH
The selection of an optimal window size is crucial in obtaining good forecasts. Instead of choosing an arbitrary value for the window size, I test for the ideal window size for the benchmark AR(1) model and the forecast horizon h = 2. The window size that produces the most accurate forecasts (i.e. the model with the lowest loss in terms of MAE and RMSE) is then selected. Figure 3 illustrates the MAE and RMSE for the AR(1) model for window sizes between 100 and 500. As can be seen, both loss functions show a minimal loss for a window size of 198.
[Figure not included in this excerpt.]
Figure 3: Ideal Window Length. This figure illustrates the MAE and RMSE loss functions for the benchmark AR(1) model and window sizes between 100 and 500. In the case of h = 2, both loss functions are minimal for a window size of R = 198, which is used in the out-of-sample forecasting experiment of this paper. Note that different horizons h produce different ideal window sizes. Author’s own visualisation.
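The window-length search of this appendix can be sketched as a simple grid search; the AR(1) fitting details repeat the benchmark's assumptions and the function name is my own.

```python
import numpy as np

def best_window_length(y, sizes, h=2):
    """Return the rolling-window length that minimises the out-of-sample MAE
    of an iterated AR(1) forecast."""
    y = np.asarray(y, dtype=float)

    def ar1_fc(w):
        # OLS fit of y_t = c + phi * y_{t-1}, iterated h steps ahead
        X = np.column_stack([np.ones(len(w) - 1), w[:-1]])
        c, phi = np.linalg.lstsq(X, w[1:], rcond=None)[0]
        f = w[-1]
        for _ in range(h):
            f = c + phi * f
        return f

    maes = []
    for R in sizes:
        errs = [y[s + R + h - 1] - ar1_fc(y[s:s + R])
                for s in range(len(y) - R - h + 1)]
        maes.append(np.mean(np.abs(errs)))
    return sizes[int(np.argmin(maes))]
```

Note that larger windows also leave fewer evaluation periods, so the MAE comparison implicitly trades estimation precision against the length of the evaluation sample.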
D FORECAST EVALUATION
The following two plots visualise the forecast performance of each individual model.
Figure 4 shows the forecast error over time. Overall, the models tend to perform well in the 1980s and 1990s and then see an increase in the forecast error during the 2000s and into the early 2010s. This is followed by another period of lower forecast errors, which ends with the beginning of the Covid-19 pandemic.
Figure 5 shows the distribution of the forecast errors for each model. As can be seen, the forecast error distribution tends to be non-normal with fat tails. Also, some univariate time series models show evidence of a bimodal error distribution.
[...]
1 See https://research.stlouisfed.org/econ/mccracken/fred-databases
2 Ideally, one would like to compute an ideal window length for each model and each forecasting horizon. However, this is very computationally expensive and makes the different models less comparable. Still, this may be an interesting expansion for future research.
3 It is important to state that the predictors used for the parameter estimation and those inputted for forecasting were scaled to the same mean and variance.
Quote paper: Niklas Humann (2023). Forecasting the Monthly US Inflation Rate. Munich: GRIN Verlag. https://www.hausarbeiten.de/document/1331607