Article

Identifying the Determinants of Crude Oil Market Volatility by the Multivariate GARCH-MIDAS Model

Economics and Management School of Wuhan University, Wuhan 430072, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Submission received: 7 March 2022 / Revised: 4 April 2022 / Accepted: 14 April 2022 / Published: 17 April 2022
(This article belongs to the Topic Frontier Research in Energy Forecasting)

Abstract

Many macro-level variables have been used in forecasting crude oil price volatility. This article aims to identify which variables have the greatest impact and to give more accurate predictions. The GARCH-MIDAS model with variable selection enables us to incorporate many variables in a single model. By combining the log-likelihood function with an adaptive lasso penalty, the three most informative determinants are identified: macroeconomic uncertainty, financial uncertainty, and default yield spread. Out-of-sample results show that using these three variables significantly improves prediction accuracy compared to baseline models. However, variables widely studied by other scholars, such as the supply and demand of crude oil and the industrial production index, were not selected, indicating that the impact of these variables may be overestimated. When studying crude oil price volatility, macroeconomic and financial market uncertainties can be used as effective predictors by investors and market analysts. Crude oil market participants should focus on macroeconomic and financial market uncertainties to make risk management more efficient.

1. Introduction

Crude oil is one of the most important energy sources and is often called “the blood of the industrial economy”. Fluctuations in the crude oil market not only directly affect the real economy and price levels but also spill over to financial markets such as stocks and futures [1,2]. Does macro-level fundamental information affect crude oil volatility? According to the efficient market theory put forward by Fama [3], if the market is semi-strong-form efficient, then using fundamentals to analyze the market will be ineffective. In fact, many studies have shown that the crude oil futures market is not strongly efficient. Chen et al. [4] found that commodity markets are less efficient than financial markets, and that exchange rates can effectively predict commodity prices. Tabak and Cajueiro [5] show that the crude oil market is not weakly efficient, and Wang and Liu [6] further confirm this finding by using detrended fluctuation analysis.
What drives changes in crude oil volatility has drawn much attention among academics and practitioners (see Literature Review). This paper aims to find the variables that have the greatest impact and to improve forecasting accuracy. Disentangling the drivers of volatility in crude oil markets is meaningful for several reasons:
  • For policymakers, crude oil price volatility involves energy security issues.
  • For institutional investors, volatility presents an opportunity to capitalize on risk. Commodity derivatives evaluation and hedging are also closely associated with the volatility of the crude oil futures market.
In this paper, a multivariate GARCH-MIDAS model is proposed. We incorporate 20 variables, covering almost all variables used in crude oil volatility forecasting. However, models with numerous variables have to deal with a large number of parameters, which reduces the efficiency of estimation and introduces the problem of overfitting. Thus, the adaptive lasso method proposed by Zou [7] is combined with the log-likelihood function to reduce the complexity of the model. Inspired by Ghysels and Qian [8], a subset of the “Beta weighting” parameters is fixed during the estimation process to avoid identification problems. Finally, the generalized information criterion (GIC) proposed by Fan and Tang [9] is used to determine the optimal tuning parameter of the adaptive lasso.
This approach is attractive for two main reasons. First, the multivariate GARCH-MIDAS model has stronger predictive power than a classic GARCH-MIDAS model. Second, variable selection is introduced in the GARCH-MIDAS model, which enables us to identify the most important variables affecting the volatility of crude oil prices.
The empirical results of this paper can be summarized as follows:
(1) Macroeconomic uncertainty, financial market uncertainty, and default yield spread are selected from those twenty variables. Crude oil supply and demand, which were widely discussed in previous studies, are surprisingly not selected.
(2) Macroeconomic uncertainty and financial market uncertainty have positive impacts, and default yield spread has a negative impact on crude oil market volatility.
(3) The recursive out-of-sample forecast results show that our model significantly outperforms the other GARCH-MIDAS models, illustrating the predictive power of the selected variables.
Wei et al. [10] suggest that uncertainty is the most informative determinant in forecasting crude oil market volatility, and our empirical results yield similar conclusions. Crude oil market volatility has important implications for policymakers’ decisions and investors’ financial strategies, and we recommend focusing on the impact of uncertainty when forecasting crude oil price volatility.
The rest of the paper is organized as follows. Section 2 presents a brief literature review. Section 3 introduces the GARCH-MIDAS model and the multivariate GARCH-MIDAS model with variable selection. Section 4 introduces the data sources of this paper and the descriptive statistics of the variables. Section 5 lists the empirical results, including variable selection, post-selection estimation, and out-of-sample forecasting. Section 6 concludes.

2. Literature Review

In finance, volatility is defined as the variance of a return series, reflecting the uncertainty of an asset return. The main methods for measuring volatility based on historical data are the stochastic volatility (SV) model [11], the autoregressive conditional heteroskedasticity (ARCH) model [12], and the generalized autoregressive conditional heteroskedasticity (GARCH) model [13]. Among them, GARCH family models are widely popular and have been researched for more than 20 years [14,15,16,17].
However, most of these studies are restricted to input data sampled at the same frequency. Crude oil price volatility is based on daily crude oil prices, while macroeconomic variables tend to be measured monthly or at even lower frequencies.
To address the problem of mixed-frequency data, Engle and Rangel [18] first introduced the Spline-GARCH model and decomposed volatility into two components: a time series trend and a macroeconomic impact. The Spline-GARCH model is useful for understanding long-term or low-frequency volatility in the macroeconomic environment, but it does not directly relate macroeconomic variables to the long-term volatility component. This link was not established until Engle et al. [19] proposed the GARCH-MIDAS model. The GARCH-MIDAS model has received a lot of attention since it was proposed and has been widely used to study the relationship between volatility and macro-level variables [20,21,22,23,24]. There are also many studies using macro-level variables for crude oil volatility prediction [10,25].
Numerous determinants were examined for their capacity to predict crude oil price volatility. The classic explanation is that crude oil price relies on oil demand and supply. Dees et al. [26] suggest that crude oil prices are mainly influenced by oil supply, especially by OPEC production decisions. Kilian [27] splits oil price shocks into supply shocks, demand shocks, and precautionary demand shocks, and points out that the oil price rise until mid-2008 was mainly driven by growth in aggregate demand. Hamilton [28] also argued that changes in crude oil prices are demand driven.
The second category is macroeconomic variables. The discussion of the relationship between macroeconomic variables and volatility can be traced back as far as Schwert [29], and many scholars have further investigated the role of macroeconomic variables in volatility forecasts and the mechanisms of their influence [30,31]. Mo et al. demonstrate that macroeconomic variables are an important driver of crude oil price changes.
Recently, academics have become increasingly concerned about economic policy uncertainty and investor sentiment. Extensive literature suggests that including indicators describing uncertainty and investor sentiment in the model can help improve the accuracy of forecasts [10,32]. Pastor and Veronesi [33] show that uncertainty increases market volatility. Gong and Lin [34] found that, by adding an index that can reflect investor sentiment to the HAR model, the prediction results are significantly improved.
In addition, Chiang et al. [35] detect a stronger correlation and substantial volatility spillovers between equity and crude oil markets, giving grounds for analyzing the predictive power of financial market variables on crude oil volatility. Kang et al. [36] also argue that crude oil markets have become increasingly integrated with financial markets in recent years.
In summary, the variables that have been extensively studied can be broadly classified into four categories: crude oil fundamentals, macroeconomic data, uncertainty indices, and financial market data. However, most of these studies consider only one or two variables at a time. Such an approach can answer the question “does a variable have an impact”, but it cannot answer the question “which variables have the greatest impact”. Few studies have put numerous variables that may affect crude oil price volatility into the same framework for comparison. We fill this gap by using both the multivariate GARCH-MIDAS model and the adaptive lasso method.

3. Model

3.1. The GARCH-MIDAS Model

The GARCH-MIDAS model, proposed by Engle et al. [19], decomposes volatility into a long-term and a short-term component. The short-term component follows a mean-reverting daily GARCH process, while the long-term component is driven by a low-frequency explanatory variable and its lags.
The crude oil daily log return $r_{i,t}$ on day $i = 1, \dots, N_t$ of month $t = 1, \dots, T$ is expressed as follows:
$$r_{i,t} - E_{i-1,t}(r_{i,t}) = \sqrt{g_{i,t}\,\tau_t}\;\varepsilon_{i,t}, \qquad \varepsilon_{i,t} \mid \Gamma_{i-1} \sim N(0,1) \tag{1}$$
where $E_{i-1,t}(r_{i,t})$ is the conditional expectation given the information set $\Gamma_{i-1}$ at day $i-1$. The mean value of crude oil daily returns is very small, and its dynamic characteristics are mainly controlled by its variance. Following the practice of Sadorsky [13], the conditional mean of crude oil returns is replaced with a fixed constant: $E_{i-1,t}(r_{i,t}) = \mu$.
Formula (1) shows that the conditional variance of crude oil daily returns can be divided into two parts: the short-term volatility component $g_{i,t}$ and the long-term volatility component $\tau_t$. Suppose the short-term volatility component follows a mean-reverting asymmetric GARCH(1,1) process:
$$g_{i,t} = (1 - \alpha - \beta - \gamma/2) + \left(\alpha + \gamma \cdot \mathbf{1}_{\{r_{i-1,t} - \mu < 0\}}\right) \frac{(r_{i-1,t} - \mu)^2}{\tau_t} + \beta\, g_{i-1,t} \tag{2}$$
where $\alpha > 0$, $\beta > 0$ and $\alpha + \beta + \gamma/2 < 1$. These constraints guarantee that $E[g_{i,t}] = 1$. The parameter $\gamma$ captures the asymmetric response to negative returns, and when $\gamma = 0$, the model reduces to a simple GARCH(1,1) model.
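The short-term recursion can be illustrated with a minimal Python sketch. It covers a single month, so the long-term component is a fixed number `tau`, and it starts the recursion at the unconditional mean of 1. The function name, the starting value, and the single-month simplification are assumptions of this sketch, not details taken from the paper.

```python
def short_term_component(returns, tau, mu, alpha, beta, gamma):
    """Compute the short-term component g for one month of daily returns.

    Recursion: g_i = (1 - alpha - beta - gamma/2)
                     + (alpha + gamma * 1{r_{i-1} - mu < 0}) * (r_{i-1} - mu)^2 / tau
                     + beta * g_{i-1}
    The first value is set to 1.0, the unconditional mean (assumption of this sketch).
    """
    if not (alpha > 0 and beta > 0 and alpha + beta + gamma / 2 < 1):
        raise ValueError("need alpha > 0, beta > 0 and alpha + beta + gamma/2 < 1")
    g = [1.0]
    for r_prev in returns[:-1]:
        shock = r_prev - mu
        # The asymmetry term gamma is switched on only after a negative shock.
        leverage = gamma if shock < 0 else 0.0
        g_next = ((1 - alpha - beta - gamma / 2)
                  + (alpha + leverage) * shock ** 2 / tau
                  + beta * g[-1])
        g.append(g_next)
    return g
```

With a positive `gamma`, a negative return of a given size raises next-day short-term volatility more than a positive return of the same size, which is the asymmetry the recursion is meant to capture.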
The long-term component can be expressed as follows:
$$\log \tau_t = m + \theta \sum_{k=1}^{K} \varphi_k(\omega_1, \omega_2)\, X_{t-k} \tag{3}$$
where $m$ is the intercept term and $\theta$ measures the influence of the low-frequency variable on long-term volatility. Because $X$ can be negative, we model the logarithm of $\tau_t$ to keep long-term volatility positive. For the weights $\varphi_k$ on the lags of the low-frequency variable, we adopt the Beta weighting scheme proposed by Ghysels et al. [37]:
$$\varphi_k(\omega_1, \omega_2) = \frac{(k/K)^{\omega_1 - 1} \cdot (1 - k/K)^{\omega_2 - 1}}{\sum_{l=1}^{K} (l/K)^{\omega_1 - 1} \cdot (1 - l/K)^{\omega_2 - 1}} \tag{4}$$
The weight $\varphi_k$ is completely determined by the parameters $\omega_1$ and $\omega_2$. The Beta weighting scheme has the following properties:
  • $\varphi_k \geq 0$ for $k = 1, \dots, K$, and $\sum_{k=1}^{K} \varphi_k = 1$;
  • when $\omega_1 = 1$ and $\omega_2 > 1$, the weights decrease gradually as the lag increases.
The Beta weighting schemes can generate decaying, hump-shaped, or U-shaped weights.
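To make the weighting scheme concrete, the following Python sketch evaluates the Beta weights for a given number of lags. The function name is hypothetical, and the sketch restricts itself to the second shape parameter being at least 1 so that the last lag (where 1 - k/K is zero) never has to be raised to a negative power; that restriction is an assumption of the sketch, not of the paper.

```python
def beta_weights(K, omega1, omega2):
    """Normalized Beta lag weights phi_1, ..., phi_K.

    Sketch restriction (assumption): K >= 2 and omega2 >= 1, so that the
    term (1 - k/K) ** (omega2 - 1) is well defined at k = K.
    """
    if K < 2 or omega2 < 1:
        raise ValueError("this sketch needs K >= 2 and omega2 >= 1")
    raw = [(k / K) ** (omega1 - 1) * (1 - k / K) ** (omega2 - 1)
           for k in range(1, K + 1)]
    total = sum(raw)
    return [x / total for x in raw]
```

For instance, `beta_weights(12, 1.0, 5.0)` gives weights that decline with the lag, while `beta_weights(12, 1.0, 1.0)` gives equal weights on all twelve lags.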

3.2. Multivariate GARCH-MIDAS Model

The GARCH-MIDAS model is widely used to study the relationship between volatility and low-frequency economic variables. However, these articles tend to focus on only one variable at a time; in fact, many economic and financial variables can affect crude oil futures price volatility.
Therefore, we want a model that can incorporate as many variables as possible to maximize the role of economic fundamentals in predicting crude oil price volatility. On the other hand, putting many macroeconomic variables in the same frame allows for better comparisons. If we can identify the variables with the strongest explanatory and predictive power, we can better understand the transmission mechanism of crude oil price fluctuations.
Rewriting the Formula (3), the multivariate GARCH-MIDAS model is as follows:
$$\log \tau_t = m + \sum_{j=1}^{J} \theta_j \sum_{k=1}^{K} \varphi_k(\omega_{j,1}, \omega_{j,2})\, X_{j,t-k} \tag{5}$$
where $J$ is the number of explanatory variables, $\theta_j$ measures the impact of the $j$th variable on long-term volatility, and $X_{j,t-k}$ is a stationary time series variable that has been appropriately transformed (for example, by taking logarithms and first differences).
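As an illustration of the multivariate long-term component, the Python sketch below computes the long-term volatility for one month from the lagged values of several variables. All names are hypothetical, the Beta weights follow the formula given earlier in this section, and the restriction to a second shape parameter of at least 1 is an assumption of the sketch.

```python
import math


def beta_weights(K, omega1, omega2):
    """Normalized Beta lag weights (sketch assumes K >= 2 and omega2 >= 1)."""
    if K < 2 or omega2 < 1:
        raise ValueError("this sketch needs K >= 2 and omega2 >= 1")
    raw = [(k / K) ** (omega1 - 1) * (1 - k / K) ** (omega2 - 1)
           for k in range(1, K + 1)]
    total = sum(raw)
    return [x / total for x in raw]


def long_term_component(m, thetas, omegas, x_lags):
    """Return tau_t = exp(m + sum_j theta_j * sum_k phi_k * X_{j,t-k}).

    thetas : list of J coefficients theta_j
    omegas : list of J pairs (omega_j1, omega_j2)
    x_lags : list of J lists; x_lags[j][k-1] holds X_{j,t-k} for k = 1..K
    """
    log_tau = m
    for theta, (w1, w2), lags in zip(thetas, omegas, x_lags):
        weights = beta_weights(len(lags), w1, w2)
        log_tau += theta * sum(w * x for w, x in zip(weights, lags))
    # Exponentiating keeps tau positive even when the regressors are negative.
    return math.exp(log_tau)
```

A variable whose coefficient is zero drops out of the sum entirely, which is what makes shrinking coefficients to zero equivalent to excluding variables.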
The log-likelihood function is given by Equation (6), where Φ represents all parameters to be estimated:
$$LLF(\Phi) = -\frac{1}{2} \sum_{t=1}^{T} \sum_{i=1}^{N_t} \left[ \log(2\pi) + \log\big(g_{i,t}(\Phi)\,\tau_t(\Phi)\big) + \frac{(r_{i,t} - \mu)^2}{g_{i,t}(\Phi)\,\tau_t(\Phi)} \right] \tag{6}$$
The number of parameters in Formula (6) is $3J + 5$, which becomes large when $J$ is large. With so many parameters to estimate, it can be difficult to identify the variables that exhibit the strongest effects. This is a frequently discussed problem in MIDAS models, and one remedy is variable selection. For example, Marsilli [38] suggested combining the MIDAS model with the LASSO regression proposed by Tibshirani [39]; Siliverstovs [40] combined the MIDAS model with the “elastic net” regression as applied by Bai and Ng [41], and proposed the MIDASSO model.
Referring to the practice of the above articles, this paper uses the adaptive lasso regression proposed by Zou [7], and applies the log-likelihood function with a penalty term for variable selection:
$$PLLF_\lambda(\Phi) = -\frac{1}{2} \sum_{t=1}^{T} \sum_{i=1}^{N_t} \left[ \log(2\pi) + \log\big(g_{i,t}(\Phi)\,\tau_t(\Phi)\big) + \frac{(r_{i,t} - \mu)^2}{g_{i,t}(\Phi)\,\tau_t(\Phi)} \right] - \lambda \sum_{j=1}^{J} \hat{w}_j\, |\theta_j| \tag{7}$$
Here $PLLF_\lambda(\Phi)$ is the log-likelihood function with a penalty term for a given hyperparameter $\lambda$, and $\hat{w}_j$ is the adaptive weight of $\theta_j$. For a given penalty parameter $\lambda$, we maximize $PLLF_\lambda(\Phi)$ under the linear constraints $\alpha > 0$, $\beta > 0$, and $\alpha + \beta + \gamma/2 < 1$. $\hat{\Phi}_\lambda$ denotes the optimal parameter value obtained for the given $\lambda$.
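The two objective functions can be sketched in Python as follows. The sketch assumes the daily variance components have already been computed and are passed in as flat lists aligned by day; the function names and this data layout are illustrative assumptions, and no optimizer is included.

```python
import math


def log_likelihood(returns, g, tau, mu):
    """Gaussian log-likelihood with daily conditional variance g_i * tau_i.

    returns, g and tau are flat lists with one entry per trading day
    (tau repeats its monthly value on every day of the month).
    """
    total = 0.0
    for r, g_i, tau_i in zip(returns, g, tau):
        var = g_i * tau_i
        total += math.log(2 * math.pi) + math.log(var) + (r - mu) ** 2 / var
    return -0.5 * total


def penalized_log_likelihood(returns, g, tau, mu, thetas, adaptive_weights, lam):
    """Log-likelihood minus the adaptive lasso penalty lam * sum_j w_j * |theta_j|."""
    penalty = lam * sum(w * abs(th) for w, th in zip(adaptive_weights, thetas))
    return log_likelihood(returns, g, tau, mu) - penalty
```

Because the penalty grows with the absolute size of each coefficient, a larger `lam` makes it cheaper for the maximizer to set weakly informative coefficients exactly to zero.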

3.3. Parameter Estimation

The optimal hyperparameter $\lambda$ is determined according to the generalized information criterion (GIC) proposed by Fan and Tang (2013) [9]. The GIC consists of two parts: the first describes how well the model fits, and the second penalizes model complexity, so the GIC trades off model fit against model complexity:
$$GIC_\lambda = \frac{1}{N_0} \left\{ 2\left[ LLF(\hat{\Phi}) - PLLF_\lambda(\hat{\Phi}_\lambda) \right] + a_{N_0,p}\, \|\hat{\theta}_\lambda\|_0 \right\} \tag{8}$$
$$a_{N_0,p} = \log(\log N_0) \cdot \log(p)$$
where $a_{N_0,p}$ depends on the total number of observations $N_0$ and the number of parameters $p = 3J + 1$ in the long-term component. $\|\hat{\theta}_\lambda\|_0$ denotes the number of non-zero elements in $\hat{\theta}_\lambda$, where $\hat{\theta}_\lambda$ is estimated from Equation (7) given the tuning parameter $\lambda$.
To obtain the smallest GIC, we need to try different values of $\lambda$. We select the $\lambda$ that minimizes $GIC_\lambda$ over the range $[0, \lambda_{max}]$. As the tuning parameter $\lambda$ increases, some $\theta_j$ are shrunk to zero, and the corresponding variables are excluded from the model. In this way, we achieve variable selection, and the variables that ultimately remain in the model are the variables that best explain changes in the volatility of crude oil prices.
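The selection of the tuning parameter can be sketched in Python as below. The sketch reads the criterion as the fit gap and the complexity penalty both scaled by the number of observations, and it takes the fitted likelihood values as given inputs; the function names and the tuple layout of the candidates are assumptions made for illustration.

```python
import math


def gic(llf_full, pllf_lam, n_nonzero, n_obs, p):
    """GIC = (2 * (LLF - PLLF_lambda) + a * #nonzero) / N0, a = log(log N0) * log p."""
    a = math.log(math.log(n_obs)) * math.log(p)
    return (2 * (llf_full - pllf_lam) + a * n_nonzero) / n_obs


def select_lambda(candidates, llf_full, n_obs, p):
    """Pick the lambda with the smallest GIC.

    candidates : list of (lam, pllf_lam, n_nonzero) tuples, one per grid point,
                 where pllf_lam is the maximized penalized log-likelihood and
                 n_nonzero is the number of non-zero theta estimates.
    """
    best = min(candidates, key=lambda c: gic(llf_full, c[1], c[2], n_obs, p))
    return best[0]
```

As a purely hypothetical usage example with made-up numbers, `select_lambda([(0.0, -2000.0, 20), (5.0, -2010.0, 3), (10.0, -2050.0, 2)], -2000.0, 1000, 61)` picks the middle candidate: it gives up little fit while dropping most variables.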
However, this approach may encounter identification problems. When the parameter $\theta_j$ is zero, the corresponding $\omega_{j,1}$ and $\omega_{j,2}$ do not enter Equation (7): whatever values they take, the value of $PLLF_\lambda(\Phi)$ is unaffected. Thus, $\omega_{j,1}$ and $\omega_{j,2}$ are not identified.
To avoid this problem, we divide the parameters to be estimated into two groups. Let $\Phi_2 = (\omega_{1,1}, \omega_{1,2}, \omega_{2,1}, \omega_{2,2}, \dots, \omega_{20,1}, \omega_{20,2})$ be the Beta weighting parameters, and $\Phi_1 = (\mu, \alpha, \beta, \gamma, m, \theta_1, \theta_2, \dots, \theta_{20})$ be the remaining parameters. We focus mainly on $\Phi_1$, while the value of $\Phi_2$ is relatively unimportant and has little impact on the overall result. According to Ghysels and Qian [8], if $\Phi_2$ is held fixed, i.e., $\Phi_2 = \bar{\Phi}_2$, the identification problems can be avoided. Fixing $\Phi_2$ also makes estimation faster.
Specifically, we first maximize Equation (6) and obtain $\hat{\theta}_j$ and $\hat{\Phi}_2$. Then, when estimating Formula (7), the adaptive weight is calculated as $\hat{w}_j = 1 / |\hat{\theta}_j|^{\eta}$, and we set $\Phi_2 = \hat{\Phi}_2$. We take $\eta = 2$, as Zou [7] suggested.
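The adaptive weights are simple to compute once the first-stage estimates are available. The short Python sketch below shows the calculation; the function name and the small floor that guards against division by zero are assumptions of the sketch rather than part of the described method.

```python
def adaptive_weights(theta_hat, eta=2.0, floor=1e-8):
    """Adaptive lasso weights w_j = 1 / |theta_hat_j| ** eta.

    floor is a hypothetical safeguard: a first-stage estimate that is exactly
    zero would otherwise cause a division by zero.
    """
    return [1.0 / max(abs(t), floor) ** eta for t in theta_hat]
```

Variables with small first-stage coefficients receive large weights and are therefore penalized more heavily in the second stage, which is what lets the adaptive lasso remove them first.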

4. Data

The data used in this paper consists of two parts: one is daily crude oil prices and the other is the monthly macro-level data. The monthly dataset consists of crude oil fundamental data, macroeconomic data, economic uncertainty index, and financial market data. The macro-level data used in this paper covers almost all variables that are widely used in volatility research.
The spot price for West Texas Intermediate (WTI) crude oil is used in this paper. WTI crude oil has good liquidity and high transparency, and is one of the three benchmark prices in the world crude oil market. The Chicago Mercantile Exchange (CME) has called WTI crude oil prices “the preferred measure of world oil prices”. Most of the existing literature on crude oil prices and volatility uses WTI crude oil prices. The sample used in this article runs from 3 January 1986 to 30 December 2019. The daily crude oil return $r_{i,t}$ is calculated as the log difference of the crude oil price, multiplied by 100.
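The return calculation described above can be written as a small Python sketch. The function name is hypothetical, and the sketch assumes strictly positive prices, which holds for the sample period described.

```python
import math


def log_returns_pct(prices):
    """Daily log returns in percent: 100 * (log P_t - log P_{t-1}).

    Assumes all prices are strictly positive.
    """
    return [100.0 * (math.log(p1) - math.log(p0))
            for p0, p1 in zip(prices, prices[1:])]
```

A price series of length n yields n - 1 returns, since the first observation has no predecessor.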
The monthly macro-level data consist of 20 variables in four categories. The data interval is also from January 1986 to December 2019. These variables are described in detail below.

4.1. Crude Oil Fundamental Data

Two crude oil fundamental variables are employed in this paper: global crude oil production and the global economic activity (GEA) index. Global crude oil production is used to describe the supply of crude oil, and the GEA index is used to describe the demand for crude oil.
Kilian [27] and Mu and Ye [42] argue that the main driver of crude oil price fluctuations is the change in crude oil supply and demand. We use global crude oil production data as the supply of crude oil, which can be obtained from the Monthly Energy Review on the US Energy Information Administration (EIA) website.
It is relatively difficult to find a variable that directly measures global crude oil demand. Following Wang et al. [43], this paper uses the GEA index proposed by Kilian [27] as a proxy variable for global crude oil demand. The GEA index is based on dry cargo single voyage ocean freight rates, and reflects the demand for industrial commodities in the global market.

4.2. Macroeconomic Data

Eight macroeconomic variables are employed in this paper: unemployment rate, industrial production index, consumer price index (CPI), producer price index (PPI), real personal consumption expenditure, housing starts, M1 base currency, and the Chicago Fed National Activity Index (CFNAI).
CPI and PPI measure the degree of inflation. Housing starts are an indicator of changes in the economic cycle. The CFNAI is a weighted average of 85 national economic activity indicators, which measure overall economic activity and related inflationary pressures.
The earliest discussion of macroeconomic variables and volatility can be traced back to Schwert [29]. Many scholars have further studied the role and impact mechanism of macroeconomic variables on volatility forecasting [30,31]. Pan et al. [25] proved that macroeconomic fundamentals can provide useful information beyond the historical volatility of crude oil prices. Therefore, this paper selects a lot of macroeconomic variables, hoping to further explore the relationship between macroeconomics and crude oil price volatility.

4.3. Economic Uncertainty Index

Four economic uncertainty variables are employed in this paper: macroeconomic uncertainty index, financial uncertainty index, economic policy uncertainty (EPU), and consumer sentiment index (CSI).
Uncertainty is defined as the conditional volatility of a disturbance that is unforecastable from the perspective of economic agents. The macroeconomic uncertainty index and the financial uncertainty index were both proposed by Jurado et al. [44]. Other measures of uncertainty rely primarily on proxies or indicators, such as implied or realized volatility. The study by Jurado et al. provides, for the first time, an objective measure of uncertainty and links it to macroeconomic activity.
The economic policy uncertainty (EPU) was proposed by Baker et al. [45]. The EPU reflects the relative frequency of economic (E), policy (P), and uncertainty (U)-related terms in national newspaper articles. Thus, a larger EPU indicates a larger shift in the country’s policy, and may cause serious divergence in the expectations of oil consumers, producers, and speculators.
Qiu and Welch [46] proved that Michigan Consumer Sentiment Index (CSI) can represent investor sentiment well. With this indicator, we can explore whether investor sentiment helps predict crude oil price volatility.

4.4. Financial Market Data

Six financial market variables are employed in this paper: American Stock Exchange (AMEX) oil index, stock market returns of the oil industry, crude oil realized volatility (RV), stock variance (svar), term spread (tms), and default yield spread (dfy).
Morana et al. [47] proved that financial speculation can have an impact on the real price of oil. Zhang [48] also proved that financial market data can help predict crude oil prices. In addition, Wang [1] proved that the RV of crude oil has a significant positive impact on stock market volatility. Therefore, we have reason to speculate that the stock market and the crude oil market are inextricably linked.
The AMEX oil index is a classic oil stock index. Ratti et al. [49] and Chen [50] found that the use of oil-sensitive stocks, especially the oil stock index, can help predict crude oil prices.
Oil industry stock market return data are from Fama/French’s personal homepage. Crude oil realized volatility (RV) is the sum of squared daily crude oil returns within each month. Stock variance (svar) is calculated as the sum of squared daily returns on the S&P 500. The term spread (tms) is the difference between the long-term government bond yield and the Treasury bill rate. The default yield spread (dfy) is the difference between BAA- and AAA-rated corporate bond yields. The latter three variables are taken from Welch and Goyal [51].
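A monthly realized-volatility series of this kind can be built from daily returns as in the Python sketch below. It reads the measure as the sum of squared daily returns within each calendar month, which is the usual construction and an assumption here; the function name and the input layout are likewise illustrative.

```python
from collections import defaultdict


def monthly_realized_variance(dated_returns):
    """Sum squared daily returns within each month.

    dated_returns : iterable of ((year, month), daily_return) pairs.
    Returns a dict mapping (year, month) to the monthly sum of squares.
    """
    rv = defaultdict(float)
    for month, r in dated_returns:
        rv[month] += r ** 2
    return dict(rv)
```

The same helper applies to the stock variance series if the daily returns passed in are those of the stock index instead of crude oil.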

4.5. Descriptive Statistics

Unstandardized variables vary widely in value, and many time series variables have a unit root. Thus, we take the first difference of level data for GEA, unemployment rate, macroeconomic uncertainty index, financial uncertainty index, CSI, term spread, and default yield spread. We consider CFNAI in levels, and take annualized returns on crude oil production, industrial production index, consumer price index, real personal consumption expenditure, housing starts, M1 base currency, and the Amex Oil Index following Engle et al. [19] and Conrad and Loch [22].
Descriptive statistics of processed variables are reported in Table 1.
In addition, the correlation matrix between the variables is given. We use Spearman correlations and report them in Table 2. As can be seen in the table, the correlation between most of the variables is very low, except for AMEX oil index and oil industry returns, term spread and default yield spread.

5. Empirical Analysis

5.1. In-Sample Analysis

The adaptive lasso regression is used to perform variable selection on the above 20 variables. Lasso regression is a biased estimation method, which can refine the model and avoid overfitting by compressing some regression coefficients to zero. As the penalty term λ increases, the bias of the regression will increase, and the number of variables will decrease.
An important purpose of using adaptive lasso regression is to screen out several variables that have the greatest impact on the volatility of crude oil, so as to explain changes in the volatility of crude oil and guide economic decisions. Specifically, in the “In-Sample Analysis” part, we will do the following four steps:
  • Estimate the GARCH-MIDAS model with all 20 variables in Equation (6) and obtain the parameter estimates $\hat{\theta}_j$ and $\hat{\Phi}_2$. Calculate the adaptive weights $\hat{w}_j = 1 / |\hat{\theta}_j|^{\eta}$.
  • Estimate the GARCH-MIDAS model with variable selection in Equation (7) conditional on $\Phi_2 = \hat{\Phi}_2$, with the tuning parameter $\lambda$ on a grid over $[0, \lambda_{max}]$. Obtain the parameter estimates $\hat{\Phi}_1$, and calculate the GIC for each value of $\lambda$.
  • Determine the optimal $\lambda$ by GIC, and obtain the selected variables.
  • Perform post-selection estimation with the selected variables, and test their significance.
To find the smallest GIC, the tuning parameter $\lambda$ is evaluated on a 101-point grid over [0, 10] with an increment of 0.1. From Equation (8), it can be seen that, as $\lambda$ increases, the gap between PLLF and LLF widens, so the first part of the formula increases; at the same time, the number of selected variables decreases, so the second part of the formula decreases.
Figure 1 plots GIC and the number of variables for all lambdas.
When $\lambda = 0$, all coefficients have large initial values. As $\lambda$ increases, some of them rapidly decay to 0, while others oscillate as they decay. The left figure shows that GIC generally decreases first and then increases, and that GIC may jump when a variable is excluded from the model. The right figure shows that, as $\lambda$ increases, the number of variables decreases in the form of a step function. When $\lambda = 7.6$, there are three non-zero variables: macroeconomic uncertainty, financial uncertainty, and default yield spread.
It can be seen from Figure 2 that the oil industry stock market return is excluded from the model at $\lambda = 0.2$, indicating that it has a small effect on crude oil price volatility. When $\lambda = 7.6$, the 17th variable (unemployment rate) is excluded from the model (marked by vertical dashed lines in the figure), at which point the last three variables remain in the model and the GIC reaches its minimum; when $\lambda = 10$, only macroeconomic uncertainty and financial market uncertainty remain in the model. Because the GIC is monotonically increasing by this point, this paper does not report results for $\lambda$ greater than 10.
The macroeconomic uncertainty index and financial uncertainty index were constructed by Jurado et al. [44] in an article titled “Measuring Uncertainty”. That article aims to provide an econometric estimate of uncertainty that is as reasonable as possible, one that is neither affected by the structure of a particular theoretical model nor dependent on any single observable economic indicator (or small number of them).
Multivariate GARCH-MIDAS models provide us with a new perspective on variable selection. The empirical results of this paper show that uncertainty is the most important determinant in explaining crude oil market volatility.
After variable selection, we do post-selection estimation with the selected variables. The long-term volatility component can be expressed as:
$$\log \tau_t = m + \theta_{UM} \sum_{k=1}^{36} \varphi_k(\omega_{11}, \omega_{12})\, UM_{t-k} + \theta_{UF} \sum_{k=1}^{36} \varphi_k(\omega_{21}, \omega_{22})\, UF_{t-k} + \theta_{DF} \sum_{k=1}^{36} \varphi_k(\omega_{31}, \omega_{32})\, DF_{t-k}$$
where $UM$, $UF$, and $DF$ represent the macroeconomic uncertainty index, the financial uncertainty index, and the default yield spread, respectively. Thirty-six is the number of lags, i.e., the last 36 months of lagged data are used.
All parameter estimates are reported in Table 3:
The estimate of θ is the most noteworthy, as it measures the impact of selected variables on long-term crude oil price volatility.
The θ of macroeconomic uncertainty and financial uncertainty are 0.736 and 1.121, respectively, and they are both significant at the 1% level, indicating that the increase of macroeconomic and financial uncertainty will have a positive impact on the long-term component of crude oil price volatility. Research by Pastor and Veronesi [33] shows that uncertainty increases market volatility. The empirical results of this paper continue to provide evidence for this view.
The θ of the default yield spread is −1.24, which is also significant at the 1% level, indicating that the default yield spread is negatively correlated with crude oil volatility. Default yield spread will shrink with the expansion of the economic cycle, and, when the economic cycle expands, the volatility of crude oil prices tends to increase. At the same time, default yield spread tends to have a strong lag effect, which is also one of the reasons why it is negatively correlated with crude oil price volatility.

5.2. Out-of-Sample Forecast Evaluations

To evaluate the out-of-sample forecast performance, the predicted volatility needs to be compared with the true volatility. However, the true volatility is unobservable, and the 5-min realized volatility is generally used as a proxy for it [52].
The full sample is split into an estimation sample from January 1986 to December 2015 (360 months) and a forecast period from January 2016 to December 2019 (48 months). The mean squared forecast error is given by:
$$MSFE = \frac{1}{N_0 - \sum_{t=1}^{360} N_t} \sum_{t=361}^{T} \sum_{i=1}^{N_t} \left( RV_{i,t}^{5\,\mathrm{min}} - \hat{\tau}_t\, \hat{g}_{i,t} \right)^2$$
where $N_0$ is the total number of daily observations, $N_t$ is the number of days in month $t$, and $\hat{\tau}_t \hat{g}_{i,t}$ is our daily volatility forecast.
The performance of our model is compared with the following models:
  • Basic GARCH model;
  • Univariate GARCH-MIDAS models (20 in total);
  • GARCH-MIDAS model with all 20 variables.
Specifically, the model in this paper is used as the benchmark, and we report the ratio $MSFE / MSFE_{benchmark}$ for each competing model. A ratio above 1 indicates that our benchmark model performs better.
In addition, the significance of the difference between them is assessed with the GW test proposed by Giacomini and White [53]. As can be seen in Table 4, the model selected in this paper significantly outperforms all competing models.
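The forecast comparison can be sketched in Python as follows. The inputs are the out-of-sample realized-volatility proxy and the daily variance forecasts of each model, aligned by day; the function names are hypothetical, and the numbers in the usage note are made up for illustration only.

```python
def msfe(rv_proxy, forecasts):
    """Mean squared forecast error over the out-of-sample days."""
    if len(rv_proxy) != len(forecasts) or not rv_proxy:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((rv - f) ** 2 for rv, f in zip(rv_proxy, forecasts)) / len(rv_proxy)


def msfe_ratio(rv_proxy, competitor_forecasts, benchmark_forecasts):
    """MSFE of a competing model divided by the MSFE of the benchmark model.

    A value above 1 means the benchmark forecasts are more accurate.
    """
    return msfe(rv_proxy, competitor_forecasts) / msfe(rv_proxy, benchmark_forecasts)
```

For example, with a proxy of `[1.0, 2.0, 3.0]`, benchmark forecasts `[1.0, 2.0, 2.0]` and competitor forecasts `[0.0, 2.0, 1.0]`, the ratio is 5, so the benchmark would be preferred on these illustrative numbers.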

6. Conclusions

Many macro-level variables are believed to affect crude oil price volatility. This article explores which variables have the greatest impact.
We combine the log-likelihood function with adaptive lasso penalty to estimate a GARCH-MIDAS model with variable selection. By maximizing the penalized log-likelihood function under linear constraints, three most informative determinants are identified, namely, macroeconomic uncertainty, financial uncertainty, and default yield spread. The post-selection estimation results show a positive impact of macroeconomic uncertainty and financial uncertainty, which is consistent with Pastor and Veronesi [33]. While uncertainty is often used in volatility studies, this paper is, to our best knowledge, the first to demonstrate that macroeconomic uncertainty and financial market uncertainty have the most significant impact on crude oil price volatility. In addition, variables widely used by other scholars, such as crude oil supply and demand, industrial production index, etc., are surprisingly not selected.
The out-of-sample results show that, compared with the other GARCH-MIDAS models, using the three variables selected in this paper significantly improves prediction accuracy, further demonstrating their explanatory and predictive power.
The academic view of the drivers of crude oil prices has gradually changed. In the mid to late 20th century, the supply of crude oil was largely monopolized by OPEC. Crude oil prices were most influenced by OPEC’s supply, and the prevailing academic view at the time was that crude oil prices were “supply driven” [26]. At the beginning of the 21st century, with the rapid development of some emerging countries and their high demand for crude oil, the dominant theory became “demand driven”, represented by Kilian [27]. In recent years, as crude oil has become increasingly financialized and its price has fluctuated dramatically in response to expectations, economic uncertainty, such as that caused by the COVID-19 pandemic and the recent war in Ukraine, has become a non-negligible component. It may still take a long time to fully understand the mechanics of crude oil price volatility. However, this paper does provide some evidence for the idea that uncertainty has the greatest impact on crude oil price volatility.
Finally, some advice is offered based on our empirical results. When forecasting crude oil prices, crude oil market participants should focus on macroeconomic and financial market uncertainties to make risk management more efficient. When studying crude oil price volatility, investors and market analysts can use macroeconomic and financial market uncertainties as effective predictors.

Author Contributions

Conceptualization, O.-C.C. and C.Y.; methodology, O.-C.C.; software, C.Y.; writing—original draft preparation, O.-C.C. and C.Y.; writing—review and editing, O.-C.C. and C.Y.; funding acquisition, O.-C.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by MOE (Ministry of Education in China) Project of Humanities and Social Sciences Grant No. 18YJC790245.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Publicly available datasets were analyzed in this study. The daily WTI price data and global crude oil production data can be downloaded from the website of the U.S. Energy Information Administration. The GEA index is available from the website of the Federal Reserve Bank of Dallas. CPI, PPI, and CFNAI are from the Federal Reserve Bank of Chicago (FRBC). The unemployment rate, industrial production index, real personal consumption expenditure, housing starts, and M1 base currency are from the Philadelphia Fed’s Real-Time Data Research Center (RDRC). The macroeconomic uncertainty index and financial uncertainty index are available from the author’s personal homepage (https://www.sydneyludvigson.com/data-and-appendixes/, accessed on 13 April 2022). EPU is available at http://www.policyuncertainty.com/ (accessed on 13 April 2022). CSI is available at https://data.sca.isr.umich.edu/data-archive/mine.php (accessed on 13 April 2022). The Amex Oil index data are from Yahoo Finance. The 5-min continuous crude oil contract price data were obtained from kibot.com (accessed on 13 April 2022).

Acknowledgments

The numerical calculations in this paper were performed on the supercomputing system in the Supercomputing Center of Wuhan University.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Wang, J.; Huang, Y.; Ma, F.; Chevallier, J. Does high-frequency crude oil futures data contain useful information for predicting volatility in the US stock market? New evidence. Energy Econ. 2020, 91, 104897.
  2. Wen, F.; Xiao, J.; Huang, C.; Xia, X. Interaction between oil and US dollar exchange rate: Nonlinear causality, time-varying influence and structural breaks in volatility. Appl. Econ. 2018, 50, 319–334.
  3. Fama, E.F. Efficient capital markets: A review of theory and empirical work. J. Financ. 1970, 25, 383–417.
  4. Chen, Y.C.; Rogoff, K.; Rossi, B. Can Exchange Rates Forecast Commodity Prices? Q. J. Econ. 2010, 125, 1145–1194.
  5. Tabak, B.M.; Cajueiro, D.O. Are the crude oil markets becoming weakly efficient over time? A test for time-varying long-range dependence in prices and volatility. Energy Econ. 2007, 29, 28–36.
  6. Wang, Y.; Li, L. Is WTI crude oil market becoming weakly efficient over time?: New evidence from multiscale analysis based on detrended fluctuation analysis. Energy Econ. 2010, 32, 987–992.
  7. Zou, H. The Adaptive Lasso and Its Oracle Properties. J. Am. Stat. Assoc. 2006, 101, 1418–1429.
  8. Ghysels, E.; Qian, H. Estimating MIDAS regressions via OLS with polynomial parameter profiling. Econom. Stat. 2018.
  9. Fan, Y.; Tang, C.Y. Tuning parameter selection in high dimensional penalized likelihood. J. R. Stat. Soc. 2013, 75, 531–552.
  10. Yu, W.; Liu, J.; Lai, X.; Hu, Y. Which determinant is the most informative in forecasting crude oil market volatility: Fundamental, speculation, or uncertainty? Energy Econ. 2017, 68, 141–150.
  11. Vo, M.T. Regime-switching stochastic volatility: Evidence from the crude oil market. Energy Econ. 2009, 31, 779–788.
  12. Cheong, C.W. Modeling and forecasting crude oil markets using ARCH-type models. Energy Policy 2009, 37, 2346–2355.
  13. Sadorsky, P. Modeling and forecasting petroleum futures volatility. Energy Econ. 2006, 28, 467–488.
  14. Mohammadi, H.; Su, L. International evidence on crude oil price dynamics: Applications of ARIMA-GARCH models. Energy Econ. 2010, 32, 1001–1008.
  15. Huang, Z.; Liu, H.; Wang, T. Modeling long memory volatility using realized measures of volatility: A realized HAR GARCH model. Econ. Model. 2016, 52, 812–821.
  16. Salisu, A.; Fasanya, I.O. Modelling oil price volatility with structural breaks. Energy Policy 2013, 52, 554–562.
  17. Klein, T.; Walther, T. Oil Price Volatility Forecast with Mixture Memory GARCH. Energy Econ. 2016, 58, 46–58.
  18. Engle, R.F.; Rangel, J.G. The Spline-GARCH Model for Low-Frequency Volatility and Its Global Macroeconomic Causes. Rev. Financ. Stud. 2008, 21, 1187–1222.
  19. Engle, R.F.; Ghysels, E.; Sohn, B. Stock market volatility and macroeconomic fundamentals. Rev. Econ. Stat. 2013, 95, 776–797.
  20. Conrad, C.; Loch, K.; Rittler, D. On the macroeconomic determinants of long-term volatilities and correlations in U.S. stock and crude oil markets. J. Empir. Financ. 2012, 29, 26–40.
  21. Asgharian, H.; Hou, A.J.; Javed, F. The Importance of the Macroeconomic Variables in Forecasting Stock Return Variance: A GARCH-MIDAS Approach. J. Forecast. 2013, 32, 600–612.
  22. Conrad, C.; Loch, K. Anticipating Long-Term Stock Market Volatility. J. Appl. Econom. 2014, 30, 394.
  23. Su, Z.; Fang, T.; Yin, L. The role of news-based implied volatility among US financial markets. Econ. Lett. 2017, 157, 24–27.
  24. Conrad, C.; Kleen, O. Two are better than one: Volatility forecasting using multiplicative component GARCH-MIDAS models. J. Appl. Econom. 2020, 35, 19–45.
  25. Pan, Z.; Wang, Y.; Wu, C.; Yin, L. Oil price volatility and macroeconomic fundamentals: A regime switching GARCH-MIDAS model. J. Empir. Financ. 2017, 43, 130–142.
  26. Dees, S.; Karadeloglou, P.; Kaufmann, R.K.; Sanchez, M. Modelling the world oil market: Assessment of a quarterly econometric model. Energy Policy 2007, 35, 178–191.
  27. Kilian, L. Not All Oil Price Shocks Are Alike: Disentangling Demand and Supply Shocks in the Crude Oil Market. Am. Econ. Rev. 2009, 99, 1053–1069.
  28. Hamilton, J.D. Understanding Crude Oil Prices. Energy J. 2009, 30, 179–206.
  29. Schwert, G.W. Why Does Stock Market Volatility Change Over Time? J. Financ. 1989, 44, 1115–1153.
  30. Paye, B.S. ‘Déjà vol’: Predictive regressions for aggregate stock market volatility using macroeconomic variables. J. Financ. Econ. 2012, 106, 527–546.
  31. Christiansen, C.; Schmeling, M.; Schrimpf, A. A comprehensive look at financial volatility prediction by economic variables. J. Appl. Econom. 2012, 27, 956–977.
  32. Gkillas, K.; Gupta, R.; Pierdzioch, C. Forecasting Realized Oil-Price Volatility: The Role of Financial Stress and Asymmetric Loss. J. Int. Money Financ. 2019, 104, 102137.
  33. Pastor, L.; Veronesi, P. Uncertainty about Government Policy and Stock Prices. J. Financ. 2012, 67, 1219–1264.
  34. Gong, X.; Lin, B. The incremental information content of investor fear gauge for volatility forecasting in the crude oil futures market. Energy Econ. 2018, 74, 370–386.
  35. Chiang, I.H.E.; Hughen, W.K.; Sagi, J.S. Estimating Oil Risk Factors Using Information from Equity and Derivatives Markets. J. Financ. 2015, 70, 769–804.
  36. Kang, B.; Nikitopoulos, C.S.; Prokopczuk, M. Economic determinants of oil futures volatility: A term structure perspective. Energy Econ. 2020, 88, 104743.
  37. Ghysels, E.; Sinko, A.; Valkanov, R. MIDAS Regressions: Further Results and New Directions. Econom. Rev. 2007, 26, 53–90.
  38. Marsilli, C. Variable Selection in Predictive MIDAS Models. SSRN J. 2014.
  39. Tibshirani, R. Regression Shrinkage and Selection via the Lasso. J. R. Stat. Soc. Ser. B 1996, 58, 267–288.
  40. Siliverstovs, B. Short-term forecasting with mixed-frequency data: A MIDASSO approach. Appl. Econ. 2017, 49, 1326–1343.
  41. Bai, J.; Ng, S. Forecasting economic time series using targeted predictors. J. Econom. 2009, 146, 304–317.
  42. Mu, X.; Ye, H. Understanding the Crude Oil Price: How Important Is the China Factor? Energy J. 2011, 32.
  43. Wang, Y.; Liu, L. Crude oil and world stock markets: Volatility spillovers, dynamic correlations, and hedging. Empir. Econ. 2016, 50, 1481–1509.
  44. Jurado, K.; Ludvigson, S.C.; Ng, S. Measuring Uncertainty. Am. Econ. Rev. 2015, 105, 1177–1216.
  45. Baker, S.R.; Bloom, N.; Davis, S.J. Measuring Economic Policy Uncertainty. Q. J. Econ. 2016, 131, 1593–1636.
  46. Qiu, L.X.; Welch, I. Investor Sentiment Measures. NBER Work. Pap. 2004, 117, 367–377.
  47. Morana, C. Oil price dynamics, macro-finance interactions and the role of financial speculation. J. Bank. Financ. 2013, 37, 206–226.
  48. Zhang, Y.J.; Wang, J.L. Do high-frequency stock market data help forecast crude oil prices? Evidence from the MIDAS models. Energy Econ. 2019, 78, 192–201.
  49. Kang, W.; Ratti, R.A. Oil shocks, policy uncertainty and stock market return. J. Int. Financ. Mark. Inst. Money 2013, 26, 305–318.
  50. Chen, S.S. Forecasting Crude Oil Price Movements with Oil-Sensitive Stocks. Econ. Inq. 2014, 52, 830–844.
  51. Welch, I.; Goyal, A. A Comprehensive Look at the Empirical Performance of Equity Premium Prediction. Rev. Financ. Stud. 2009, 21, 1455–1508.
  52. Ghysels, E.; Sinko, A. Volatility forecasting and microstructure noise. J. Econom. 2011, 160, 257–271.
  53. Giacomini, R.; White, H. Tests of Conditional Predictive Ability. Econometrica 2006, 74, 1545–1578.
Figure 1. Trend of the GIC and the number of selected variables as λ varies.
Figure 2. Value of λ at which the coefficient θ of each variable drops to zero.
Table 1. Descriptive statistics of processed variables.

| Variable | Mean | Std. | Min. | Max. | Skew | Kurt. | Unit Root Test |
|---|---|---|---|---|---|---|---|
| WTI log return | 0.01 | 2.5 | −40.64 | 19.15 | −0.63 | 13.53 | −19.38 *** |
| Oil Production | 1.97 | 1.69 | −57.25 | 72.15 | 0.47 | 6.28 | −9.69 *** |
| GEA | 0.07 | 0.6 | −100.19 | 69.46 | −0.71 | 5.9 | −9.32 *** |
| Unemployment rate | −0.01 | 0 | −0.5 | 0.5 | 0.39 | 1.07 | −3.77 ** |
| Industrial production | 2.2 | 2.4 | −40.77 | 27.93 | −0.76 | 3.85 | −5.32 *** |
| CPI | 2.6 | 2.61 | −19.29 | 17.83 | −0.86 | 8.61 | −7.40 *** |
| PPI | 2.27 | 2.44 | −30.86 | 25.41 | −0.35 | 2.48 | −6.67 *** |
| Personal consumption | 2.96 | 2.9 | −26.27 | 33.17 | 0.68 | 8.07 | −5.24 *** |
| Housing starts | 49.1 | −4.19 | −91.21 | 1219.03 | 3.57 | 17.35 | −7.51 *** |
| Monetary base | 6.19 | 4.57 | −33.29 | 95.61 | 2.86 | 17.65 | −4.97 *** |
| CFNAI | −0.04 | 0.02 | −2.6 | 0.6 | −2.57 | 10.02 | −3.78 ** |
| Macroeconomic uncertainty | 0 | 0.15 | −0.54 | 0.81 | 0.92 | 5.09 | −6.24 *** |
| Financial uncertainty | 0.01 | 0.15 | −1.95 | 2.14 | 0.4 | 6.46 | −7.13 *** |
| EPU | 114.08 | 103.7 | 44.78 | 284.14 | 1.27 | 1.78 | −4.15 *** |
| CSI | 0.01 | −0.2 | −12.7 | 17.3 | 0.01 | 1.41 | −8.40 *** |
| AMEX oil index | 34.09 | 11.06 | −92.86 | 79.5 | 2.79 | 10.93 | −7.54 *** |
| Oil industry returns | 0.92 | 0.98 | −18.29 | 19.13 | −0.1 | 0.89 | −7.59 *** |
| RV | 0.01 | 0.01 | 0 | 0.26 | 6.6 | 67.21 | −6.77 *** |
| svar | 0.27 | 0.14 | 0.02 | 7.09 | 8.41 | 88.13 | −5.52 *** |
| Term spread | −0.05 | −0.2 | −11.6 | 9.6 | 0.44 | 0.95 | −5.55 *** |
| Default yield spread | −0.01 | −0.01 | −7.23 | 6.9 | −0.66 | 7.66 | −8.23 *** |

*** and ** indicate rejection of the presence of a unit root at the 1% and 5% significance levels, respectively.
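Table 2. Spearman correlation between variables.

| | prod | GEA | unem | ipt | cpi | ppi | pc | hs | mb | cfnai | uim | uif | epu | csi | oi | mkt | rv | svar | ts | dfy |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| prod | 1.00 | 0.01 | −0.07 | 0.16 | −0.01 | −0.05 | −0.06 | 0.04 | −0.01 | 0.08 | 0.05 | 0.07 | −0.04 | −0.03 | −0.02 | 0.00 | −0.06 | 0.00 | −0.07 | −0.12 |
| GEA | 0.01 | 1.00 | 0.00 | 0.07 | 0.08 | 0.10 | 0.05 | −0.04 | −0.04 | 0.01 | −0.09 | −0.05 | −0.01 | −0.04 | 0.02 | 0.03 | 0.00 | −0.01 | −0.04 | −0.02 |
| unem | −0.07 | 0.00 | 1.00 | −0.23 | 0.03 | 0.00 | −0.05 | −0.09 | 0.01 | −0.31 | −0.01 | 0.02 | 0.14 | −0.02 | −0.01 | −0.03 | 0.21 | 0.20 | 0.01 | 0.04 |
| ipt | 0.16 | 0.07 | −0.23 | 1.00 | 0.00 | −0.03 | 0.21 | 0.05 | −0.13 | 0.54 | −0.10 | 0.03 | −0.18 | −0.07 | −0.05 | −0.04 | −0.23 | −0.11 | −0.05 | −0.09 |
| cpi | −0.01 | 0.08 | 0.03 | 0.00 | 1.00 | 0.64 | −0.11 | −0.02 | −0.06 | 0.02 | 0.09 | 0.02 | −0.14 | −0.18 | 0.02 | 0.04 | −0.13 | −0.06 | 0.11 | 0.08 |
| ppi | −0.05 | 0.10 | 0.00 | −0.03 | 0.64 | 1.00 | −0.06 | 0.06 | −0.04 | −0.03 | 0.08 | 0.00 | −0.06 | −0.09 | 0.07 | 0.09 | −0.13 | −0.06 | 0.14 | 0.08 |
| pc | −0.06 | 0.05 | −0.05 | 0.21 | −0.11 | −0.06 | 1.00 | 0.07 | −0.12 | 0.27 | −0.01 | −0.04 | −0.15 | 0.02 | 0.10 | 0.11 | 0.00 | −0.08 | 0.01 | 0.02 |
| hs | 0.04 | −0.04 | −0.09 | 0.05 | −0.02 | 0.06 | 0.07 | 1.00 | 0.01 | 0.03 | −0.03 | −0.06 | 0.00 | −0.05 | −0.05 | −0.06 | −0.10 | −0.03 | 0.00 | 0.00 |
| mb | −0.01 | −0.04 | 0.01 | −0.13 | −0.06 | −0.04 | −0.12 | 0.01 | 1.00 | −0.22 | −0.02 | −0.11 | 0.34 | 0.06 | −0.05 | −0.06 | −0.05 | −0.02 | 0.10 | 0.11 |
| cfnai | 0.08 | 0.01 | −0.31 | 0.54 | 0.02 | −0.03 | 0.27 | 0.03 | −0.22 | 1.00 | −0.02 | 0.07 | −0.40 | 0.05 | 0.06 | 0.05 | −0.21 | −0.23 | −0.20 | −0.20 |
| uim | 0.05 | −0.09 | −0.01 | −0.10 | 0.09 | 0.08 | −0.01 | −0.03 | −0.02 | −0.02 | 1.00 | 0.29 | 0.02 | −0.08 | −0.11 | −0.11 | 0.06 | 0.02 | −0.01 | −0.12 |
| uif | 0.07 | −0.05 | 0.02 | 0.03 | 0.02 | 0.00 | −0.04 | −0.06 | −0.11 | 0.07 | 0.29 | 1.00 | −0.14 | −0.08 | −0.17 | −0.19 | 0.01 | 0.02 | 0.02 | −0.12 |
| epu | −0.04 | −0.01 | 0.14 | −0.18 | −0.14 | −0.06 | −0.15 | 0.00 | 0.34 | −0.40 | 0.02 | −0.14 | 1.00 | −0.07 | −0.13 | −0.13 | 0.07 | 0.27 | 0.07 | 0.10 |
| csi | −0.03 | −0.04 | −0.02 | −0.07 | −0.18 | −0.09 | 0.02 | −0.05 | 0.06 | 0.05 | −0.08 | −0.08 | −0.07 | 1.00 | 0.01 | 0.01 | −0.05 | −0.14 | −0.05 | 0.03 |
| oi | −0.02 | 0.02 | −0.01 | −0.05 | 0.02 | 0.07 | 0.10 | −0.05 | −0.05 | 0.06 | −0.11 | −0.17 | −0.13 | 0.01 | 1.00 | 0.95 | −0.11 | −0.20 | 0.10 | 0.19 |
| mkt | 0.00 | 0.03 | −0.03 | −0.04 | 0.04 | 0.09 | 0.11 | −0.06 | −0.06 | 0.05 | −0.11 | −0.19 | −0.13 | 0.01 | 0.95 | 1.00 | −0.12 | −0.19 | 0.11 | 0.18 |
| rv | −0.06 | 0.00 | 0.21 | −0.23 | −0.13 | −0.13 | 0.00 | −0.10 | −0.05 | −0.21 | 0.06 | 0.01 | 0.07 | −0.05 | −0.11 | −0.12 | 1.00 | 0.38 | −0.03 | −0.04 |
| svar | 0.00 | −0.01 | 0.20 | −0.11 | −0.06 | −0.06 | −0.08 | −0.03 | −0.02 | −0.23 | 0.02 | 0.02 | 0.27 | −0.14 | −0.20 | −0.19 | 0.38 | 1.00 | 0.04 | −0.04 |
| ts | −0.07 | −0.04 | 0.01 | −0.05 | 0.11 | 0.14 | 0.01 | 0.00 | 0.10 | −0.20 | −0.01 | 0.02 | 0.07 | −0.05 | 0.10 | 0.11 | −0.03 | 0.04 | 1.00 | 0.80 |
| dfy | −0.12 | −0.02 | 0.04 | −0.09 | 0.08 | 0.08 | 0.02 | 0.00 | 0.11 | −0.20 | −0.12 | −0.12 | 0.10 | 0.03 | 0.19 | 0.18 | −0.04 | −0.04 | 0.80 | 1.00 |
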
Table 3. Parameter estimates after variable selection.

| μ | α | β | γ | m | θ_UM | θ_UF | θ_DF |
|---|---|---|---|---|---|---|---|
| 0.028 | 0.051 *** | 0.944 *** | 0.009 | −0.312 | 0.736 *** | 1.121 *** | −1.240 *** |
| (0.033) | (0.016) | (0.014) | (0.010) | (0.865) | (0.242) | (0.256) | (0.472) |

| ω_11 | ω_12 | ω_21 | ω_22 | ω_31 | ω_32 |
|---|---|---|---|---|---|
| 1.265 * | 6.012 | 3.453 ** | 4.386 ** | 1.000 | 4.426 *** |
| (0.784) | (8.698) | (1.478) | (1.802) | (1.141) | (2.130) |

The first line under each parameter is the estimated value, and the standard deviation is in parentheses. ***, ** and * denote significance at the 1%, 5%, and 10% levels, respectively.
Table 4. Out-of-sample prediction results.

| Model | MSFE ratio |
|---|---|
| GARCH model | 1.2101 (−) *** |
| Univariate GARCH-MIDAS model: | |
| Oil Production | 1.0131 (−) *** |
| GEA | 1.0140 (−) *** |
| Unemployment rate | 1.0082 (−) ** |
| Industrial production | 1.0021 (−) ** |
| CPI | 1.0091 (−) *** |
| PPI | 1.0127 (−) *** |
| Personal consumption | 1.0294 (−) *** |
| Housing starts | 1.0101 (−) *** |
| Monetary base | 1.0085 (−) ** |
| CFNAI | 1.0137 (−) *** |
| Macroeconomic uncertainty | 1.0091 (−) *** |
| Financial uncertainty | 1.0294 (−) *** |
| EPU | 1.0100 (−) *** |
| CSI | 1.0133 (−) *** |
| AMEX oil index | 1.0156 (−) *** |
| Oil industry returns | 1.0129 (−) *** |
| RV | 1.0107 (−) *** |
| svar | 1.0091 (−) *** |
| Term spread | 1.0123 (−) *** |
| Default yield spread | 1.0117 (−) *** |
| GARCH-MIDAS model with all 20 variables | 1.0317 (−) *** |

A negative sign indicates that the model performs worse than the model selected in this article. *** and ** denote significance at the 1% and 5% levels, respectively.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Chuang, O.-C.; Yang, C. Identifying the Determinants of Crude Oil Market Volatility by the Multivariate GARCH-MIDAS Model. Energies 2022, 15, 2945. https://0-doi-org.brum.beds.ac.uk/10.3390/en15082945
