Dr. Sébastien Laurent, the copyright owner of G@RCH, holds a Ph.D. (2002) in Financial
Econometrics from Maastricht University (The Netherlands). In 2002–2003 he worked as a
postdoc researcher at CREST (Laboratory of Finance-Insurance) in Paris (France). He
spent six years as associate professor in Econometrics at the Faculté Notre-Dame de la Paix
in Namur (Belgium) and is currently associate professor in financial econometrics at
Maastricht University, School of Business and Economics. He is also a fellow of CORE in
Louvain-la-Neuve (Belgium). In 2008 and 2009 he was a visiting professor at the London
School of Economics (Finance Department). Sébastien has published in the Journal of Business
and Economic Statistics, Journal of Applied Econometrics, Journal of Financial
Econometrics, European Economic Review and Journal of Empirical Finance, among
others.
Web Site: http://www.core.ucl.ac.be/~laurent/
Email: s.laurent@maastrichtuniversity.nl
Dr. Kris Boudt holds a Ph.D. (2008) in Applied Economics from K.U.Leuven, Belgium. He is
an affiliated researcher at K.U.Leuven and, since 2008, assistant professor of Finance at
Lessius University College. His main research interests are empirical finance, risk
management and robust statistics. His research has been published in Computational
Statistics and Data Analysis, Metrika, Journal of Risk and RISK magazine, among
others.
Web Site: http://www.econ.kuleuven.be/public/N06054/
Email: Kris.Boudt@econ.kuleuven.be
Jerome Lahaye graduated from the HEC Management School of the University of Liège
(Belgium), where he subsequently worked as a teaching and research assistant (2000–2003). He
holds a master of advanced studies in economics from the University of Leuven (2004) and a
Ph.D. (2009) in Economics from the Faculté Notre-Dame de la Paix in Namur (Belgium). He is
currently a postdoc at HEC Lausanne. His research interests are at the crossroads of applied
econometrics and international finance. In particular, his research has focused on the link
between economic information disclosure and discontinuities in financial time
series.
Email: jerome.lahaye@fundp.ac.be
Jean-Philippe Peters was involved in the G@RCH project until version 4.2. He
graduated from the HEC Management School of the University of Liège (Belgium) in 2000
with an M.Sc. in Management Sciences. His final dissertation, entitled “Development of a
Package in the Ox Environment for GARCH Models and Two Empirical Applications”, was
the founding ground of the current package. From 2000 to 2002, he worked as a researcher
in the Operations Research and Production Management unit of the University of Liège. In
October 2002 he joined the Advisory and Consulting department of Deloitte Luxembourg,
where he is involved in consulting activities on risk management for the financial industry.
Jean-Philippe also holds a Ph.D. (2010) in Economics from the HEC Management School of
the University of Liège (Belgium).
Dr. Jeroen VK Rombouts holds a Ph.D. (2004) in Econometrics from the Catholic
University of Louvain and has been, since 2004, assistant professor at HEC Montréal. He is an
elected member of the International Statistical Institute, researcher at CIRANO (Canada) and research
associate at CORE (Belgium). Jeroen has published in Econometric Theory, Journal of
Applied Econometrics, Econometric Reviews and The Econometrics Journal, among
others.
Web Site: http://neumann.hec.ca/pages/jeroen.rombouts/
Email: Jeroen.Rombouts@hec.ca
Francesco Violante graduated from Bocconi University, Milan (Italy), where he worked
as a teaching and research assistant in 2004 at the Centre of Research on Innovation and
Internationalization (CESPRI). He holds a master of arts in economics from the Université
catholique de Louvain (Belgium) and has been, since 2006, a Ph.D. candidate of the Louvain
Academy at the University of Namur. His research interests are focused on
applied econometrics and finance, in particular realized volatility and forecasting,
multivariate GARCH models, and volatility model comparison and selection. In 2007, he was a
visiting scholar at the HEC Management School of the University of Montreal
(Canada).
Email: francesco.violante@fundp.ac.be
This book documents G@RCH 6.1, an OxMetrics module (see Doornik, 2009) dedicated to the estimation and forecasting of univariate and multivariate ARCH-type models (Engle, 1982), including some of the most recent contributions in this field. Univariate GARCH (Bollerslev, 1986), EGARCH (Nelson, 1991), GJR (Glosten, Jagannathan, and Runkle, 1993), APARCH (Ding, Granger, and Engle, 1993), IGARCH (Engle and Bollerslev, 1986), RiskMetrics (J.P. Morgan, 1996), FIGARCH (Baillie, Bollerslev, and Mikkelsen, 1996a and Chung, 1999), FIEGARCH (Bollerslev and Mikkelsen, 1996), FIAPARCH (Tse, 1998) and HYGARCH (Davidson, 2001) specifications are available for the conditional variance. Moreover, an AR(FI)MA process can be specified in the conditional mean (see Baillie, Chung, and Tieslau, 1996b, Tschernig, 1995, Teyssière, 1997, or Lecourt, 2000, for further details about ARFIMA models). All these models can be extended with an “ARCH-in-mean” term in the conditional mean, as suggested by Engle, Lilien, and Robbins (1987).
G@RCH 6.1 allows the estimation of some of the most widespread multivariate GARCH models, including the scalar BEKK, diagonal BEKK (Engle and Kroner, 1995), RiskMetrics (J.P. Morgan, 1996), CCC (Bollerslev, 1990), DCC (Engle, 2002 and Tse and Tsui, 2002), DECO (Engle and Kelly, 2007), OGARCH (Kariya, 1988 and Alexander and Chibumba, 1997) and GOGARCH models (van der Weide, 2002 and Boswijk and van der Weide, 2006).
G@RCH 6.1 also allows the estimation of univariate and multivariate nonparametric estimators of the quadratic variation and the integrated volatility. These estimators include the realized volatility, bipower variation and realized outlyingness weighted variance. Several daily and intradaily tests for jumps, as well as intraday periodicity filters, are also provided.
This software has been developed with Ox 6.0, a matrix programming language described in Doornik (2007b).^{1} G@RCH 6.1 should be compatible with many platforms, including Windows, Mac OS X, Linux, Unix and Solaris, when it is used in combination with the Ox Console. Furthermore, G@RCH 6.1 provides a menu-driven graphical interface for Microsoft Windows, Mac OS X and Linux users, as well as comprehensive HTML documentation. For most of the specifications, it is generally very fast, and one of its main characteristics is its ease of use.
This book is structured as follows:
G@RCH 6.1 is a software package dedicated to the estimation and forecasting of univariate and multivariate (G)ARCH models and many of their extensions. It can be used within OxMetrics or in the classical programming way (using OxEdit for instance) for those who have access to the Ox programming language.
The available univariate models are all ARCH-type models. These include ARCH, GARCH, EGARCH, GJR, APARCH, IGARCH, RiskMetrics, FIGARCH, FIEGARCH, FIAPARCH and HYGARCH. They can be estimated by approximate (Quasi-) Maximum Likelihood under one of the four proposed distributions for the errors (Normal, Student-t, GED or skewed-Student). Moreover, ARCH-in-mean models are also available, and explanatory variables can enter the conditional mean and/or the conditional variance equations.
G@RCH 6.1 also proposes several multivariate GARCH specifications, including the scalar BEKK, diagonal BEKK, full BEKK, RiskMetrics, CCC, DCC, DECO, OGARCH and GOGARCH models.
Finally, h-step-ahead forecasts of both equations are available, as well as many univariate and multivariate misspecification tests (Nyblom, Sign Bias tests, Pearson goodness-of-fit, Box-Pierce, Residual-Based Diagnostic for conditional heteroscedasticity, Hosking’s portmanteau test, Li and McLeod test, constant correlation test, …).
This documentation refers to version 6.1 of G@RCH, which can be used in three different ways. The first one is through the menu-driven environment of OxMetrics, similarly to PcGive, PcGets or STAMP.
Second, one can estimate the models via the “Batch Editor” of OxMetrics.
Finally, Ox users can write Ox code based on the Garch, MGarch and Realized classes. This solution is particularly useful when non-standard operations are considered (like a whole catalogue of estimations). A set of examples is provided with the G@RCH package. Please note, however, that this solution requires a registered version of Ox Professional or its console version. Check the Ox web site (http://www.doornik.com) for details.
The first method is illustrated in Chapters 3 and 4 while the next two are presented in Chapter 5.
This manual has been revised to account for the following new features:
Suggestions, mistakes, typos and possible improvements can be reported to the first author via email: Sebastien.Laurent@fundp.ac.be
The most appropriate way to discuss problems and issues of the G@RCH package is the ox-users discussion list. The ox-users discussion group is an email-based forum to discuss any problems related to Ox programming, and to share code and programming solutions. Consult the online help of OxMetrics for information on joining the list.
The full independent version of G@RCH is exclusively distributed by Timberlake
Consultants.
Contact information for the UK:
Phone: +44 (0)20 8697 3377 or
Internet: http://www.timberlake.co.uk
For the US:
Phone: +1 908 686 1251 or
Internet: http://www.timberlakeconsultancy.com
For other countries, please go to the Timberlake web sites for contact information.
There are different versions of G@RCH. A console version of G@RCH is available on the
G@RCH web site http://www.garch.org.
This (and only this) version of G@RCH may be used freely for academic research and teaching purposes only. The Professional version (i.e. using OxRun, OxDebug, OxMetrics or oxli) is not free.
Commercial users and others who do not qualify for the free version must purchase the Professional version of G@RCH with documentation, regardless of which version they use (i.e., even with Linux or Unix). Redistribution of G@RCH in any form is not permitted.
Moreover, for easier validation and replication of empirical findings, please cite this documentation (or Laurent and Peters, 2002) in all reports and publications involving the use of G@RCH. Failure to cite the use of G@RCH in published work may result in loss of the right to use the free version, and an invoice at the full commercial price.
All company and product names referred to in this documentation are either trademarks or registered trademarks of their associated companies.
Check www.garch.org for information on releases, online help and other relevant information on G@RCH.
To install the Professional version of G@RCH 6.1, run the installation file and follow the instructions.
To install the Console version of G@RCH, unzip the file Garch6.zip (available from the web site) in the ox/packages directory. Note that the free console version of G@RCH 6.1 only includes the Garch class. The MGarch and Realized classes are only available in the Professional version of G@RCH.
For OxMetrics enterprise users, note that the Console version of G@RCH 6.1 comes along with OxMetrics enterprise and is directly installed in the ox/packages directory.
THIS PACKAGE IS FUNCTIONAL BUT NO WARRANTY IS GIVEN WHATSOEVER. YOU USE IT AT YOUR OWN RISK!
We now discuss some of the basic skills required to get started with G@RCH. We will give a brief introduction to the functionalities provided by OxMetrics. The aim of this section is to explain how to load data and make graphs: for instructions on how to transform data using the calculator or algebra, consult the OxMetrics manuals.
We assume that you have the basic skills to operate programs under the Windows operating system (the OxMetrics books provide some hints). G@RCH runs on several versions of Windows, i.e. Windows Vista/XP/NT/2000, but also on Mac OS X and Linux.^{1} The screen appearance of the program and dialogs will reflect the operating system you are using. The screen captures of this manual reflect the Windows XP operating system but should not be very different on the other systems.
When launching OxMetrics, all the available modules (PcGive, G@RCH and Stamp) are automatically loaded and appear in the Modules group in the workspace window on the left-hand side of OxMetrics. This process is displayed in the following capture.
G@RCH cannot operate without data. Once OxMetrics is activated, the very first step is to load data. Let us illustrate this with an example, loading nasdaq.in7. These are daily returns of the Nasdaq stock index. The covered period is 10/11/1984 – 12/21/2000, which represents 4093 daily observations. Daily returns (in %) are defined as
 r_{t} = 100 × [ln(p_{t}) − ln(p_{t−1})],  (2.1) 
where p_{t} is the price series at time t.
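In practice this transformation is done with the OxMetrics Calculator, but it can be replicated anywhere; the sketch below (in Python rather than Ox, with function and variable names of our choosing) applies Equation (2.1) to a toy price series.

```python
import math

def log_returns(prices):
    """Daily returns in %, r_t = 100 * [ln(p_t) - ln(p_{t-1})], as in Equation (2.1)."""
    return [100.0 * (math.log(p1) - math.log(p0))
            for p0, p1 in zip(prices, prices[1:])]

prices = [100.0, 101.0, 99.5]   # illustrative price series
returns = log_returns(prices)   # one return fewer than there are prices
```

Note that log-returns, unlike simple returns, aggregate over time by summation, which is one reason they are the conventional input to GARCH models.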
The IN7 extension indicates an OxMetrics file. The IN7 file is a human-readable file, describing the data. There is a companion BN7 file, which holds the actual numbers (in binary format, so this file cannot be edited). OxMetrics can handle a wide range of data files, among them Excel and Lotus files and, of course, plain human-readable (ASCII) files. You can also cut and paste data between Excel and OxMetrics. More details are available in the OxMetrics handbooks.
To load this dataset, click File then Open in the main menu of OxMetrics, then select the data file. The default datasets provided by G@RCH are stored in the subdirectory \G@RCH\data of OxMetrics’s main directory. Locate that directory and select nasdaq.in7.
Double clicking on the variable name shows the documentation of the variable.
The data can be manipulated, much like in a spreadsheet program. Here we shall not need these facilities, so minimize the database window again: click on the first button on the right-hand side of the window. Do not click on the cross: this closes the database, thus removing it from OxMetrics and hence from G@RCH.
The graphics facilities of OxMetrics are powerful yet easy to use. This section will show you how to make timeplots of variables in the database. OxMetrics offers automatic selections of scaling etc., but you will be able to edit these graphs, and change the default layout such as line colors and line types. Graphs can also be saved in a variety of formats for later use in a word processor or for reloading to OxMetrics.
Graphics is the first entry on the Model menu. Activate the command to see the following dialog box (or click on the cross-plot graphics icon on the toolbar).
In this example we select Nasdaq and then press the << button to move the database variables to the selection listbox, as shown above. Then press the button labelled Actual series. Figure 2.1 should appear on the screen. OxMetrics can also draw multiple graphs simultaneously on-screen. For other types of graphs, select All plot types to open the related menu, as seen below (additional details on graphics are available in Doornik, 2009).
To print a graph directly to the printer, click on the printer icon in OxMetrics. You can preview the result first using the Print Preview command from the File menu.
Graphs can be saved in various formats. See Doornik (2009) for more details.
Saving graphs in the .EPS format is useful to include them in a LaTeX document. The example given below is the LaTeX code used to include Figure 2.1 in the current document. Note that the command \includegraphics requires the use of the package graphicx. This can be done using the \usepackage{graphicx} command in the preamble of the LaTeX document.
Panel 1 of Figure 3.1 indicates that the NASDAQ exhibits volatility clustering, as periods of low volatility mingle with periods of high volatility. This is a clear sign of the presence of ARCH effects in the series.
Panel 2 also shows that the unconditional distribution of the NASDAQ returns (full line) is not normal. Indeed, it is more peaked than the normal density (dashed line) and it has fatter tails, as indicated by panels 3 and 4 (this visual indication is validated by the computation of the kurtosis, which is equal to 9.34). While not obvious on a visual basis, the skewness parameter (−0.627) indicates a negatively skewed behavior.
Recall that the skewness coefficient (SK) equals 0 for a symmetric distribution, while the kurtosis coefficient (KU) equals 3 for the normal distribution. The excess kurtosis equals KU − 3. Formally, these coefficients are expressed as
 SK = E[(y_{t} − μ)^{3}]∕σ^{3} and KU = E[(y_{t} − μ)^{4}]∕σ^{4}.  (3.1) 
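As a quick check of these moment-based definitions, the coefficients can be computed directly from a sample; the following Python sketch (ours, not the OxMetrics implementation) estimates SK and KU with sample central moments.

```python
def skewness(y):
    """SK = E[(y - mu)^3] / sigma^3; zero for a symmetric sample."""
    T = len(y)
    mu = sum(y) / T
    m2 = sum((v - mu) ** 2 for v in y) / T
    m3 = sum((v - mu) ** 3 for v in y) / T
    return m3 / m2 ** 1.5

def kurtosis(y):
    """KU = E[(y - mu)^4] / sigma^4; equals 3 for the normal distribution."""
    T = len(y)
    mu = sum(y) / T
    m2 = sum((v - mu) ** 2 for v in y) / T
    m4 = sum((v - mu) ** 4 for v in y) / T
    return m4 / m2 ** 2
```

On the NASDAQ returns these estimators produce the values quoted above (kurtosis well above 3, negative skewness), signalling fat tails and asymmetry.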
When the error term is not independent of previous errors, it is said to be autocorrelated. The autocorrelation of order h is computed as follows:
 ρ_{h} = Σ_{t=h+1}^{T}(e_{t} − ē)(e_{t−h} − ē) ∕ Σ_{t=1}^{T}(e_{t} − ē)^{2},  (3.2) 
where ē denotes the sample mean of e_{t}.
Plotting the autocorrelation function against the lag gives a first visual impression of the magnitude of the autocorrelation problem. This plot is called “autocorrelogram” and it provides information about simple correlation coefficients. When the errors are strongly time dependent, the autocorrelations tend to be fairly large even for large values of h. The next screen shot shows how to obtain an autocorrelogram in OxMetrics.
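Each point of the autocorrelogram is simply the order-h sample autocorrelation; the following Python sketch (ours, not the OxMetrics implementation) makes the computation explicit.

```python
def autocorr(e, h):
    """Sample autocorrelation of order h: the lag-h cross-products of
    demeaned observations divided by their sum of squares."""
    T = len(e)
    ebar = sum(e) / T
    num = sum((e[t] - ebar) * (e[t - h] - ebar) for t in range(h, T))
    den = sum((v - ebar) ** 2 for v in e)
    return num / den

series = [1.0, 2.0, 3.0, 4.0]              # illustrative data
acf = [autocorr(series, h) for h in range(3)]  # lags 0, 1, 2
```

Plotting `acf` against the lag h reproduces, for this toy series, exactly what the OxMetrics autocorrelogram displays for the NASDAQ returns.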
The obtained graphic is displayed in Figure 3.2. It suggests that the daily return series of the NASDAQ is a short memory process (in the level) and an AR(1) term might be needed in the conditional mean equation.
We have previously seen from the visual inspection that the NASDAQ exhibits volatility clustering as periods of low volatility mingle with periods of high volatility. In addition to this inspection, one can plot the autocorrelogram of the squared (or absolute) returns to highlight the presence of ARCH effects in the data. This can be done using the calculator of OxMetrics. We can compute squared returns of the NASDAQ, as shown in the screen shot below.
Figure 3.3 shows the autocorrelogram (with 100 lags) of the squared returns. It suggests that squared returns are strongly autocorrelated, exhibiting volatility clustering.
G@RCH 6.1 provides several tests on the raw data (in addition to the possibilities offered by OxMetrics: ACF, PACF, QQ-plots, …):
1. The ADF (Augmented Dickey-Fuller) unit root test, based on the regression
 Δy_{t} = μ + βt + γy_{t−1} + Σ_{i=1}^{q}δ_{i}Δy_{t−i} + e_{t},  (3.3) 
where q is the number of lagged first differences included in the ADF regression. The constant and the trend are optional and can be excluded if needed. Note that e_{t} is assumed to be a white noise. Critical values are taken from MacKinnon (1991).
2. The KPSS test of Kwiatkowski, Phillips, Schmidt, and Shin (1992) tests H_{0}: y_{t} is I(0) against H_{1}: y_{t} is I(1). The regression model with a time trend has the form
 y_{t} = β_{0} + β_{1}t + γΣ_{i=1}^{t}x_{i} + e_{t},  (3.4) 
where e_{t} is assumed to be stationary with mean 0 and variance σ_{e}^{2} and x_{i} is an i.i.d. process with E(x_{i}) = 0 and V(x_{i}) = 1. When γ ≠ 0, y_{t} is integrated, while when γ = 0 it is trend-stationary (or stationary if β_{1} = 0). As a consequence, H_{0} implies γ = 0 while H_{1} implies γ ≠ 0. After the estimation of the above model under H_{0} by OLS (including or not a time trend), the KPSS test rejects H_{0} in favor of H_{1} for large values of the statistic
 KPSS = (1∕T^{2})Σ_{t=1}^{T}S_{t}^{2} ∕ ŝ_{T}^{2}, with S_{t} = Σ_{i=1}^{t}ê_{i},  (3.5) 
where ŝ_{T}^{2} is an estimator of the spectral density at a frequency of zero (a nonparametric estimator of the so-called long-run variance). This estimator is based on a weighted sum of the autocovariances, where the weights are defined by a kernel function. We thus need to specify a truncation lag in the covariances, or bandwidth, denoted l. G@RCH 6.1 provides a Bartlett kernel. The bandwidth can be specified by the user. In their original work, Kwiatkowski, Phillips, Schmidt, and Shin (1992) considered three values of l as a function of T: l_{0} = 0, l_{4} = integer[4(T∕100)^{1∕4}], and l_{12} = integer[12(T∕100)^{1∕4}].
3. Schmidt and Phillips (1992) have proposed a test for the null hypothesis of a unit root when a deterministic linear trend is present. These authors end up with two statistics, denoted Z(rho) and Z(tau), based on a nonparametric estimator of the so-called long-run variance (see above) that involves a truncation lag in the covariances, or bandwidth (l). Critical values for these two statistics are reported in Schmidt and Phillips (1992).
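To make the mechanics of these unit root tests concrete, here is a Python sketch (ours, not the G@RCH implementation) of the two simplest cases: the Dickey-Fuller regression with a constant but no trend and no lagged differences (q = 0), solved by closed-form OLS, and the level-stationary KPSS statistic with a Bartlett kernel.

```python
def dickey_fuller(y):
    """OLS of Delta y_t = mu + gamma * y_{t-1} + e_t (the ADF regression
    with beta = 0 and q = 0). Returns (mu, gamma); the unit root null
    corresponds to gamma = 0."""
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    x = y[:-1]                     # lagged level y_{t-1}
    n = len(dy)
    xbar = sum(x) / n
    dbar = sum(dy) / n
    sxx = sum((v - xbar) ** 2 for v in x)
    sxd = sum((x[i] - xbar) * (dy[i] - dbar) for i in range(n))
    gamma = sxd / sxx
    mu = dbar - gamma * xbar
    return mu, gamma

def kpss_stat(y, l=0):
    """Level-stationary KPSS statistic: partial sums of demeaned data
    scaled by a Bartlett-kernel long-run variance with bandwidth l."""
    T = len(y)
    ybar = sum(y) / T
    e = [v - ybar for v in y]
    s, partial = 0.0, []
    for v in e:                    # partial sums S_t of the residuals
        s += v
        partial.append(s)
    lrv = sum(v * v for v in e) / T
    for j in range(1, l + 1):      # Bartlett weights 1 - j/(l+1)
        w = 1.0 - j / (l + 1.0)
        gamma_j = sum(e[t] * e[t - j] for t in range(j, T)) / T
        lrv += 2.0 * w * gamma_j
    return sum(v * v for v in partial) / (T ** 2 * lrv)
```

The G@RCH versions additionally handle the optional trend, the lagged differences of the ADF regression and user-chosen bandwidths; this sketch only illustrates the structure of the statistics.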
To launch the descriptive statistics, click on G@RCH in the Modules group in the workspace window on the left-hand side of OxMetrics. Then change the Category to Other models and Model Class to Descriptive Statistics using G@RCH and click on Formulate.
A new dialog box Formulate is launched and the user is asked to select the series to be tested. Selected series are labelled ‘T’.
Next, the user selects the tests to be run. Lag values of the ARCH LM and Box-Pierce tests can be changed by the user. The subsequent window displays options related to the sample period. As can be seen, if the date variable is well formatted in the database, the sample selection is made easier by allowing the user to select a starting date and an ending date. See Chapter 9 of Doornik (2007a) for details on OxMetrics file formats.
Once the OK button of the Estimate - Descriptive Statistics dialog box is pressed, the tests are launched and the results are printed out in OxMetrics. Note that if a unit root test has been selected, an additional dialog box is launched and the user is asked to select some options related to the test (whether to include a constant or a trend, for instance, and the lag length for differences or the bandwidth). The results should be similar to the output of “Box 1 - Descriptive Statistics”. They confirm that NASDAQ returns exhibit ARCH-type effects (see the ARCH LM tests and the Q-statistics on squared data). Looking at the Q-statistics on raw data, one concludes that an ARMA-type model seems justified. Finally, the Jarque-Bera statistic clearly rejects the normality assumption for the unconditional distribution of the NASDAQ, while the KPSS statistics indicate that the series is likely to be I(0).
Let us consider a univariate time series y_{t}. If Ω_{t−1} is the information set at time t − 1, we can define its functional form as:
 y_{t} = E(y_{t}|Ω_{t−1}) + ε_{t},  (3.6) 
where E(.|.) denotes the conditional expectation operator and ε_{t} is the disturbance term (or unpredictable part), with E(ε_{t}) = 0 and E(ε_{t}ε_{s}) = 0, ∀ t ≠ s.
Equation (3.6) is the conditional mean equation, which has been studied and modelled in many ways. Two of the most famous specifications are the Autoregressive (AR) and Moving Average (MA) models. Combining these two processes and introducing n_{1} explanatory variables in the equation, we obtain the ARMAX(n,s) process
 Ψ(L)(y_{t} − μ_{t}) = Θ(L)ε_{t}, with μ_{t} = μ + Σ_{i=1}^{n_{1}}δ_{i}x_{i,t},  (3.7) 
where L is the lag operator^{1}, Ψ(L) = 1 − Σ_{i=1}^{n}ψ_{i}L^{i} and Θ(L) = 1 + Σ_{j=1}^{s}θ_{j}L^{j}. The ARMA residuals are obtained using the arma0 function of Ox on the demeaned series (y_{t} − μ_{t}). See the Ox manual for more details about arma0.
Several studies have shown that the dependent variables (interest rate returns, exchange rate returns, etc.) may exhibit significant autocorrelation between observations widely separated in time. In such a case, y_{t} is said to display long memory (or long-term dependence) and is best modelled by a fractionally integrated ARMA process. This ARFIMA process was initially developed in Granger (1980) and Granger and Joyeux (1980), among others.^{2} The ARFIMA(n,ζ,s) is given by:
 Ψ(L)(1 − L)^{ζ}(y_{t} − μ_{t}) = Θ(L)ε_{t},  (3.8) 
where the operator (1 − L)^{ζ} accounts for the long memory of the process and is defined as:
 (1 − L)^{ζ} = 1 − Σ_{k=1}^{∞}c_{k}(ζ)L^{k},  (3.9) 
with 0 < ζ < 1, c_{1}(ζ) = ζ, c_{2}(ζ) = (1∕2)ζ(1 − ζ), … and Γ(.) denoting the Gamma function (see Baillie, 1996, for a survey on this topic). The truncation order of the infinite summation is set to t − 1. Note that the ARFIMA residuals are obtained using the diffpow function on the ARMA residuals. See the Ox manual for more details about diffpow. It is worth noting that Doornik and Ooms (1999) provided an Ox package for estimating, forecasting and simulating ARFIMA models. However, contrary to G@RCH, the conditional variance is assumed to be constant over time.^{3}
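The fractional difference filter is easy to build with the standard binomial recursion; the following Python sketch (our illustrative stand-in for Ox's diffpow, applied with the truncation at t − 1 lags described above) computes the coefficients and filters a series.

```python
def fracdiff(y, zeta):
    """Apply (1 - L)^zeta to y, truncating the expansion at t - 1 lags.
    Coefficients follow the binomial recursion c_0 = 1,
    c_k = c_{k-1} * (k - 1 - zeta) / k, so that c_1 = -zeta,
    c_2 = -zeta * (1 - zeta) / 2, etc."""
    T = len(y)
    c = [1.0]
    for k in range(1, T):
        c.append(c[-1] * (k - 1 - zeta) / k)
    return [sum(c[k] * y[t - k] for k in range(t + 1)) for t in range(T)]
```

Two sanity checks: zeta = 0 leaves the series unchanged, and zeta = 1 reduces the filter to a plain first difference (with the first observation kept as-is, since no earlier value is available).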
A feature of G@RCH 6.1 is the availability of ARCH “in-mean” models. If we introduce the conditional variance (or the conditional standard deviation) as an additional explanatory variable in the conditional mean equation, we get the ARCH-in-Mean model of Engle, Lilien, and Robbins (1987), i.e.
 y_{t} = μ_{t} + ϑσ_{t}^{k} + ε_{t},  (3.10) 
with k = 1 to include the conditional standard deviation and k = 2 for the conditional variance.
The ARCH-M model is often used in financial applications where the expected return on an asset is related to the expected asset risk. The estimated coefficient of the expected risk is a measure of the risk-return trade-off.
The ε_{t} term in Equations (3.6)–(3.8) is the innovation of the process. More than two decades ago, Engle (1982) defined as an Autoregressive Conditional Heteroscedastic (ARCH) process all ε_{t} of the form:
 ε_{t} = z_{t}σ_{t},  (3.11) 
where z_{t} is an independently and identically distributed (i.i.d.) process with E(z_{t}) = 0 and Var(z_{t}) = 1. By assumption, ε_{t} is serially uncorrelated with a mean equal to zero, but its conditional variance equals σ_{t}^{2}. Therefore, it may change over time, contrary to what is assumed in the standard regression model.
The models provided by our program are all ARCH-type.^{4} They differ in the functional form of σ_{t}^{2}, but the basic principles are the same (see the next chapter for details). Besides the traditional ARCH and GARCH models, we focus mainly on two kinds of models: the asymmetric models and the fractionally integrated models. The former are defined to account for the so-called “leverage effect” observed in many stock returns, while the latter allow for long memory in the variance. Early evidence of the “leverage effect” can be found in Black (1976), while persistence in volatility is a common finding of many empirical studies; see for instance the excellent surveys on ARCH models proposed in Bollerslev, Chou, and Kroner (1992), Bera and Higgins (1993) or Palm (1996).
The ARCH(q) model can be expressed as:
 ε_{t} = z_{t}σ_{t}, z_{t} ~ D(0,1), σ_{t}^{2} = ω + Σ_{i=1}^{q}α_{i}ε_{t−i}^{2},  (3.12) 
where D(.) is a probability density function with mean 0 and unit variance (see Section 3.6.2). The ARCH model can describe volatility clustering. The conditional variance of ε_{t} is indeed an increasing function of the square of the shock that occurred in t − 1. Consequently, if ε_{t−1} was large in absolute value, σ_{t}^{2} and thus ε_{t} is expected to be large (in absolute value) as well. Notice that even if the conditional variance of an ARCH model is time-varying, i.e. σ_{t}^{2} = E(ε_{t}^{2}|ψ_{t−1}), the unconditional variance of ε_{t} is constant and, provided that ω > 0 and Σ_{i=1}^{q}α_{i} < 1, we have:
 E(ε_{t}^{2}) = σ^{2} = ω ∕ (1 − Σ_{i=1}^{q}α_{i}).  (3.13) 
If z_{t} is normally distributed, E(z_{t}^{3}) = 0 and E(z_{t}^{4}) = 3. Consequently, E(ε_{t}^{3}) = 0 and the skewness of y is zero. The kurtosis coefficient of the ARCH(1) is 3(1 − α_{1}^{2})∕(1 − 3α_{1}^{2}), which is finite provided that α_{1} < 1∕√3 ≈ 0.577. In this case, the unconditional distribution of the returns features fat tails whenever α_{1} > 0.
In most applications, the excess kurtosis implied by the ARCH model (coupled with a normal density) is not enough to mimic what we observe on real data. Other distributions are possible. For example, we could assume that z_{t} follows a Student distribution with unit variance and υ degrees of freedom, i.e. z_{t} is ST(0,1,υ). In that case, the unconditional kurtosis of the ARCH(1) is λ(1 − α_{1}^{2})∕(1 − λα_{1}^{2}), with λ = 3(υ − 2)∕(υ − 4). Because of the additional coefficient υ, the ARCH(1) model based on the Student distribution does feature fatter tails than the corresponding model based on the Normal distribution. See Section 3.6.2 for more details.
The computation of σ_{t}^{2} in Equation (3.12) depends on past squared residuals (ε_{t}^{2}), which are not observed for t = 0, −1, …, 1 − q. To initialize the process, the unobserved squared residuals are set to their sample mean.
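The defining recursion of the ARCH process is short enough to simulate directly; the Python sketch below (ours, for illustration only) generates an ARCH(1) path with Gaussian innovations and lets one check the unconditional variance against its theoretical value ω∕(1 − α_{1}).

```python
import random

def simulate_arch1(omega, alpha, T, seed=42):
    """Simulate an ARCH(1) process: sigma_t^2 = omega + alpha * eps_{t-1}^2,
    eps_t = z_t * sigma_t with z_t i.i.d. N(0,1)."""
    rng = random.Random(seed)
    eps_prev_sq = omega / (1.0 - alpha)  # start at the unconditional variance
    eps = []
    for _ in range(T):
        sigma2 = omega + alpha * eps_prev_sq
        e = rng.gauss(0.0, 1.0) * sigma2 ** 0.5
        eps.append(e)
        eps_prev_sq = e * e
    return eps

eps = simulate_arch1(omega=0.2, alpha=0.5, T=200_000)
sample_var = sum(e * e for e in eps) / len(eps)
# theory: E(eps_t^2) = omega / (1 - alpha) = 0.4 for these parameters
```

Plotting such a simulated path shows the volatility clustering described above: calm stretches alternating with bursts of large shocks, even though the z_{t} themselves are i.i.d.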
Explanatory variables can be introduced in the conditional variance as follows:
 σ_{t}^{2} = ω + Σ_{i=1}^{n_{2}}ω_{i}x_{i,t} + Σ_{i=1}^{q}α_{i}ε_{t−i}^{2} ≡ ω_{t} + Σ_{i=1}^{q}α_{i}ε_{t−i}^{2}.  (3.14) 
For the ease of presentation, we will use the notation ω for both ω and ω_{t} in all subsequent equations.^{5}
σ_{t}^{2} obviously has to be positive for all t. Sufficient conditions to ensure that the conditional variance in Equation (3.12) is positive are ω > 0 and α_{i} ≥ 0.^{6} Furthermore, when explanatory variables enter the ARCH equation, these positivity constraints are no longer valid (but the conditional variance is still required to be nonnegative).
A simple trick to reduce the number of parameters to be estimated is referred to as variance targeting by Engle and Mezrich (1996).
The conditional variance of the ARCH model (and most of its generalizations) may be expressed in terms of the unconditional variance and other parameters. Doing so, one can reparametrize the model using the unconditional variance and replace it by a consistent estimator (before maximizing the likelihood).
Applying variance targeting to the ARCH model implies replacing ω by σ^{2}(1 − Σ_{i=1}^{q}α_{i}), where σ^{2} is the unconditional variance of ε_{t}, which can be consistently estimated by its sample counterpart.
If explanatory variables appear in the ARCH equation, ω is then replaced by σ^{2}(1 − Σ_{i=1}^{q}α_{i}) − Σ_{i=1}^{n_{2}}ω_{i}x̄_{i}, where x̄_{i} is the sample average of variable x_{i,t} (assuming the stationarity of the n_{2} explanatory variables). In other words, the explanatory variables are centered.
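The basic variance-targeting recipe (without explanatory variables) amounts to one line of arithmetic; the following Python sketch (ours, not the G@RCH implementation) shows how ω is fixed from the sample variance once the α_{i} are given, removing it from the set of parameters to be estimated.

```python
def variance_targeting_omega(eps, alphas):
    """Under variance targeting, omega is not estimated freely but set to
    sigma^2 * (1 - sum(alphas)), with sigma^2 replaced by the sample
    variance of eps (its consistent estimator)."""
    T = len(eps)
    mean = sum(eps) / T
    sigma2_hat = sum((e - mean) ** 2 for e in eps) / T
    return sigma2_hat * (1.0 - sum(alphas))
```

During optimization this function would be re-evaluated at each trial value of the α_{i}, so the implied unconditional variance of the model always matches the sample variance.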
Estimating (3.12) with G@RCH is intended to be very simple.
Note that in the example we will use a Monday dummy and a Friday dummy. These dummies are easily created with the Calculator (recall that the dataset is dated). First, use the function dayofweek() to create a variable DAY, with DAY = 2 on Mondays, 3 on Tuesdays, etc. Then, create the Monday and Friday dummies as follows: Monday = DAY == 2 and Friday = DAY == 6.
To specify the model, first click on G@RCH in the Modules group in the workspace window on the left-hand side of OxMetrics. Next change the Category to Models for financial data and the Model class to Univariate GARCH models using G@RCH, then click on Formulate.
A list with all the variables of the database appears in the Database frame. To select variables that will enter your model, click on the variable name and then click on the << button. There are three possible statuses for each variable (see the list of statuses under the Selection frame): dependent variable (Y variable), regressor in the conditional mean (Mean), or regressor in the conditional variance (Variance). In the univariate module, only one Y variable per model is accepted. However one can include several regressors in the conditional mean and the conditional variance equations and the same variable can be a regressor in both equations. In the example a Monday dummy is included in the conditional mean and a Friday dummy in the conditional variance equation.
Once the OK button is pressed, the Model Settings box automatically appears. This box allows you to select the specification of the model: AR(FI)MA orders for the mean equation, GARCH orders, the type of GARCH model for the variance equation, and the distribution. The default specification is an ARMA(0,0)-GARCH(1,1) with normal errors. In our application, we select an ARMA(1,0)-GARCH(0,1) specification, or equivalently an ARMA(1,0)-ARCH(1).^{7}
Estimation of ARCH-type models is commonly done by maximum likelihood, so that one has to make an additional assumption about the innovation process z_{t}, i.e. choosing a density function D(0,1) with mean 0 and a unit variance.
Weiss (1986) and Bollerslev and Wooldridge (1992) show that under the normality assumption, the quasimaximum likelihood (QML) estimator is consistent if the conditional mean and the conditional variance are correctly specified. This estimator is, however, inefficient with the degree of inefficiency increasing with the degree of departure from normality (Engle and GonzálezRivera, 1991).
As reported by Palm (1996), Pagan (1996) and Bollerslev, Chou, and Kroner (1992), the use of fat-tailed distributions is widespread in the literature. In particular, Bollerslev (1987), Hsieh (1989), Baillie and Bollerslev (1989) and Palm and Vlaar (1997), among others, show that these distributions better capture the high observed kurtosis.
As shown below, four distributions are available in G@RCH 6.1: the usual Gaussian (normal) distribution, the Student-t distribution, the Generalized Error Distribution (GED) and the skewed-Student distribution.
The logic of ML is to interpret the density as a function of the parameter set, conditional on a set of sample outcomes. This function is called the likelihood function. It is quite evident from Equation (3.12) (and all the following equations of Section 3) that the recursive evaluation of this function is conditional on unobserved values. The ML estimation is therefore not perfectly exact. To solve the problem of unobserved values, we have set these quantities to their unconditional expected values. For this reason we talk about approximate (or conditional) ML and not exact ML.
If we express the mean equation as in Equation (3.6) and ε_{t} = z_{t}σ_{t}, the loglikelihood function of the standard normal distribution is given by:
L_{norm} = −(1/2)∑_{t=1}^{T}[ln(2π) + ln σ_{t}^{2} + z_{t}^{2}],  (3.15)
where T is the number of observations.
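To make the recursion behind Equation (3.15) concrete, here is a minimal Python sketch (not G@RCH's Ox code) of the Gaussian quasi-log-likelihood of a constant-mean GARCH(1,1); the function name, the parameter ordering and the choice of the sample variance as start-up value for the recursion are illustrative assumptions.

```python
import numpy as np

def garch11_negloglik(params, y):
    """Gaussian (quasi-)log-likelihood of a constant-mean GARCH(1,1).

    params = (mu, omega, alpha, beta); returns the NEGATIVE average
    log-likelihood, the form a numerical minimiser usually expects.
    """
    mu, omega, alpha, beta = params
    eps = y - mu
    T = eps.shape[0]
    sigma2 = np.empty(T)
    sigma2[0] = eps.var()          # start the recursion at the sample variance (an assumption)
    for t in range(1, T):
        sigma2[t] = omega + alpha * eps[t - 1] ** 2 + beta * sigma2[t - 1]
    # Equation (3.15), term by term: ln(2*pi) + ln(sigma_t^2) + z_t^2
    ll = -0.5 * (np.log(2 * np.pi) + np.log(sigma2) + eps ** 2 / sigma2)
    return -ll.mean()
```

Passing this function to any quasi-Newton optimiser (in the spirit of MaxBFGS) would deliver the approximate ML estimates discussed above.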
For a Student-t distribution, the log-likelihood is:

L_{Stud} = T[ln Γ((υ+1)/2) − ln Γ(υ/2) − (1/2) ln(π(υ−2))] − (1/2)∑_{t=1}^{T}[ln σ_{t}^{2} + (1+υ) ln(1 + z_{t}^{2}/(υ−2))],  (3.16)

where υ is the degrees of freedom, 2 < υ ≤ ∞ and Γ(.) is the gamma function. The GED log-likelihood function of a normalized random variable is given by:
L_{GED} = ∑_{t=1}^{T}[ln(υ/λ_{υ}) − (1/2)|z_{t}/λ_{υ}|^{υ} − (1 + 1/υ) ln 2 − ln Γ(1/υ) − (1/2) ln σ_{t}^{2}],  (3.17)
where 0 < υ < ∞ and λ_{υ} ≡ [Γ(1/υ)2^{−2/υ}/Γ(3/υ)]^{1/2}.
The main drawback of the last two densities is that even though they may account for fat tails, they are symmetric. Skewness and kurtosis are important in financial applications in many respects (in asset pricing models, portfolio selection, option pricing theory or Value-at-Risk, among others). Quite recently, Lambert and Laurent (2000, 2001) applied and extended the skewed-Student density proposed by Fernández and Steel (1998) to the GARCH framework.
The log-likelihood of a standardized (zero mean and unit variance) skewed-Student is:

L_{SkSt} = T{ln Γ((υ+1)/2) − ln Γ(υ/2) − (1/2) ln[π(υ−2)] + ln[2/(ξ + 1/ξ)] + ln(s)} − (1/2)∑_{t=1}^{T}[ln σ_{t}^{2} + (1+υ) ln(1 + (sz_{t} + m)^{2}ξ^{−2I_{t}}/(υ−2))],  (3.18)

where I_{t} = 1 if z_{t} ≥ −m/s and I_{t} = −1 if z_{t} < −m/s, ξ is the asymmetry parameter, υ is the degree of freedom of the distribution,

m = Γ((υ−1)/2)√(υ−2)/[√π Γ(υ/2)] (ξ − 1/ξ)

and

s = √(ξ^{2} + 1/ξ^{2} − 1 − m^{2}).
Note that G@RCH does not estimate ξ but log(ξ) to facilitate inference about the null hypothesis of symmetry (since the skewed-Student equals the symmetric Student distribution when ξ = 1 or log(ξ) = 0). The estimated value of log(ξ) is reported in the output under the label “Asymmetry”. See Lambert and Laurent (2001) and Bauwens and Laurent (2005) for more details.
The gradient vector and the Hessian matrix can be obtained numerically or by evaluating their analytical expressions. Due to the large number of possible models and distributions available, G@RCH uses numerical techniques to approximate the derivatives of the log-likelihood function with respect to the parameter vector.
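The kind of numerical approximation referred to here can be illustrated with central differences; this is a generic sketch of the technique, not G@RCH's internal code, and the step sizes are illustrative choices.

```python
import numpy as np

def num_gradient(f, x, h=1e-5):
    """Central-difference approximation of the gradient of f at x."""
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

def num_hessian(f, x, h=1e-4):
    """Numerical Hessian built by differencing the numerical gradient."""
    x = np.asarray(x, dtype=float)
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        e = np.zeros(n)
        e[i] = h
        H[:, i] = (num_gradient(f, x + e) - num_gradient(f, x - e)) / (2 * h)
    return 0.5 * (H + H.T)   # symmetrise to remove rounding asymmetry
```

Applied to the log-likelihood, the inverse of such a Hessian gives the "Second Derivatives" covariance matrix mentioned below, and the gradient feeds the outer-product and sandwich estimators.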
The next window is the Starting Values window. The user is asked to make a choice regarding the starting values of the parameters to be estimated: (1) let the program choose the starting values, (2) enter them manually, element by element (the one selected here), or (3) enter the starting values in vector form (one value per row) and potentially fix some parameters at their starting value.
The first method is obviously the easiest to use and is recommended for inexperienced users since it prevents them from entering aberrant values. The last two methods are useful if the user wants specific starting values for the estimation. The third method is particularly useful when one wants to fix some parameters.
Next, the Estimate window proposes options on two important characteristics of the model: the sample size and the estimation method.
When the variable corresponding to the date is correctly formatted, the sample can conveniently be fixed based on the starting and ending dates (see Chapter 9 of Doornik, 2007, for details). The number of forecasts can also be subtracted when out-of-sample forecasting is to be performed.
The models are estimated using a maximum likelihood (ML) approach. Basically, three methods are proposed to estimate these models (see the option Optimization Methods in the Model Settings window).
G@RCH provides three methods to compute the covariance matrix of the estimates: Second Derivatives (based on the inverse of the Hessian), Outer-Product of Gradients and Robust standard errors, also known as Quasi-Maximum Likelihood (QML) or ‘Sandwich formula’. This choice is accessible by clicking on the button ‘Options...’. By default, Robust standard errors are reported.
Note that a new feature of G@RCH 5.0 is that the default settings are updated after each estimation (the default model is thus the previously estimated one). To disable this option, disable ‘Update Default Settings’. Option ‘Reset default Settings’ is selfexplanatory.
Once all the options have been selected, the estimation procedure is launched if the default starting values are used. Otherwise, a new dialog box appears. Let us assume that the second method has been selected for the starting values, i.e. ‘Select (Individual Form)’. This new window contains an exhaustive list of parameters used in the different models. Depending on the specification, some parameters have a value, others do not.
The user should replace only the former since they correspond to the parameters to be estimated for the specified model. Note that it is crucial to respect the format of these initial values. More specifically, when two values are associated with a single parameter (say, both ARCH coefficients in an ARCH(2) model), the mandatory format is “value1 ; value2”. Here is an example:
If the third method is selected for the starting values, i.e. ‘Select (Matrix Form)’, the corresponding window contains a column with the default starting values of the parameter vector (Column ‘Value’) and a Column ‘FIX’ that allows the user to fix some parameters at their starting value.
Note that estimating the same model with different starting values is useful to check the robustness of the maximum likelihood procedure.
Once this step is completed, the program starts the iteration process. Depending on the options selected^{8}, G@RCH prints intermediate iteration results or not. The final output is divided in two main parts: first, the model specification reminder; second, the estimated values and other useful statistics of the parameters.^{9}
The outputs “Box 2 - Output of the ARMA(1,0)-GARCH(1,1)” and “Box 3 - Output of the ARMA(1,0)-GARCH(1,1) by QMLE” correspond respectively to the estimation results of the ARMA(1,0)-GARCH(1,1) model by ML and QMLE.
Parameters labelled ‘(M)’ relate to the conditional mean while those labelled ‘(V)’ relate to the conditional variance equation. AR(1) and ARCH(Alpha1) correspond to ψ_{1} and α_{1}, respectively.
Surprisingly, the AR(1) coefficient is not significantly different from 0 (we will come back to this issue later) while it was expected to be significantly negative. Interestingly, the returns and volatility are, on average, found to be lower on Monday and on Friday, respectively. Furthermore, the ARCH coefficient α_{1} is highly significant (rejecting the null of no ARCH effects) but incompatible with the existence of a finite fourth moment (kurtosis), since it is above 0.577 (= √(1/3)). The log-likelihood value is −6106.357.
Ex-post, it is desirable to test the adequacy of the ARCH model. New options are thus available after the estimation of the model when clicking on the Test... button of the main G@RCH box: Tests, Graphic Analysis, Forecasts, Exclusion Restrictions, Linear Restrictions and Store.
The Graphic Analysis... option allows the user to plot various graphs.
Figure 3.4 plots the conditional variance (σ̂_{t}^{2}) as well as the histogram of the standardized residuals (ẑ_{t} = ε̂_{t}/σ̂_{t}) obtained with the AR(1)-ARCH(1) model, together with a kernel estimation of its unconditional distribution (solid line) and the N(0,1) (dotted line).

Like any other graph in the OxMetrics environment, all graphs plotted from G@RCH can easily be edited (colour, size, …) and exported in many formats (.eps, .ps, .wmf, .emf and .gwg).
The Tests... option allows the user to run different tests but also to print the variance-covariance matrix of the estimated parameters.
Once again, in addition to the possibilities offered by OxMetrics (ACF, PACF, QQ-plots, …), several misspecification tests are provided in G@RCH:
Instead of running three different regressions, G@RCH follows Engle and Ng (1993) in estimating the three effects jointly, i.e.
ẑ_{t}^{2} = d_{0} + d_{1}S_{t−1}^{−} + d_{2}S_{t−1}^{−}ε_{t−1} + d_{3}S_{t−1}^{+}ε_{t−1} + u_{t},  (3.19)

where S_{t−1}^{−} = 1 if ε_{t−1} < 0 (and 0 otherwise) and S_{t−1}^{+} = 1 − S_{t−1}^{−}.
T-stats corresponding to H_{0} : d_{i} = 0 (H_{1} : d_{i} ≠ 0), ∀i = 1, 2 and 3 are reported, as well as their p-values. Finally, a joint test for H_{0} : d_{1} = d_{2} = d_{3} = 0 is also provided.
P(g) = ∑_{i=1}^{g}(n_{i} − En_{i})^{2}∕En_{i},  (3.20)
where n_{i} is the observed number of observations in the sample that fall into the i^{th} category and En_{i} is the number of observations expected to be in this i^{th} category when H_{0} is true. The Pearson statistic is therefore “small” when all of the observed counts (proportions) are close to the expected counts (proportions) and it is “large” when one or more observed counts (proportions) differ noticeably from what is expected when H_{0} is true.^{12} For i.i.d. observations, Palm and Vlaar (1997) show that under the null of a correct distribution the asymptotic distribution of P(g) is bounded between a χ^{2}(r − 1) and a χ^{2}(r − k − 1), where k is the number of estimated parameters. As explained by Palm and Vlaar (1997), the choice of r is far from obvious.^{13} According to König and Gaab (1982), the number of cells must increase at a rate equal to T^{0.4}.
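The Pearson statistic with equiprobable cells can be sketched in a few lines of Python; this is an illustration of the formula in Equation (3.20) under a hypothesised N(0,1) distribution, not G@RCH's exact implementation, and the function name is an assumption.

```python
import numpy as np
from math import erf, sqrt

def pearson_gof(z, r):
    """Pearson goodness-of-fit statistic with r equiprobable cells
    under the hypothesised N(0,1) distribution of z."""
    n = len(z)
    # Phi(z) maps each observation into (0,1); equiprobable N(0,1) cells
    # become equal-width bins of the uniform.
    u = np.array([0.5 * (1.0 + erf(zi / sqrt(2.0))) for zi in z])
    counts, _ = np.histogram(u, bins=r, range=(0.0, 1.0))
    expected = n / r          # equiprobable cells => equal expected counts
    return float(((counts - expected) ** 2 / expected).sum())
```

In practice one would compare the statistic with the χ²(r − 1) and χ²(r − k − 1) bounds discussed above.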
The idea of the test is the following: after estimating the model, the standardized residuals ẑ_{t} = ε̂_{t}∕σ̂_{t} can be computed. It is obvious that ẑ_{t} depends on the set of estimated parameters. Since E(ẑ_{t}^{2}) = 1 by construction, we can run a regression of ẑ_{t}^{2} − 1 on some information variables and examine the statistical significance of the regression parameters. Tse (2002) proposes the following OLS regression to test for remaining heteroscedasticity in the standardized residuals: ẑ_{t}^{2} − 1 = d_{1}ẑ_{t−1}^{2} + … + d_{M}ẑ_{t−M}^{2} + u_{t}. Since the regressors are not observed (but estimated), standard OLS inference is invalid. Tse (2002) derives the asymptotic distribution of the estimated parameters and shows that a joint test of significance of d_{1},…,d_{M} is χ^{2}(M) distributed. Notice that in G@RCH the maximum lag values M of the test are set by default to 2, 5 and 10 but can be changed by the user.
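The naive 'OLS' version of the regression (the one that ignores the estimation uncertainty, and which the Monte Carlo below shows to be badly sized) can be sketched as follows in Python; the function name is illustrative and this is not G@RCH's Ox code.

```python
import numpy as np

def tse_ols_stat(zhat, M):
    """Naive 'OLS' version of Tse's (2002) remaining-ARCH test.

    Regresses zhat_t^2 - 1 on M lags of zhat^2 (no intercept, as in the
    regression quoted in the text) and returns the joint Wald statistic
    for d_1 = ... = d_M = 0, to be compared with a chi^2(M).
    """
    z2 = zhat ** 2
    y = z2[M:] - 1.0
    # column i holds the lag-i squared standardized residuals
    X = np.column_stack([z2[M - i:-i] for i in range(1, M + 1)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (len(y) - M)
    cov = s2 * np.linalg.inv(X.T @ X)
    return float(beta @ np.linalg.inv(cov) @ beta)
```

The RBD-corrected version adjusts this covariance matrix for the fact that ẑ_{t} is estimated, which is what restores the correct size in the simulation below.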
Monte Carlo experiment
To study the performance of this test, let us consider a first simulation study (see Simulation_RBD_TSE.ox). We simulate 1000 series of T = 2000 observations following a NAR(0)-GARCH(1,1) model with μ = 0.1, i.e. (y_{t} − μ) = ε_{t}, where ε_{t} ~ N(0,σ_{t}^{2}) and σ_{t}^{2} = 0.2 + 0.1ε_{t−1}^{2} + 0.8σ_{t−1}^{2}. We then estimate a NAR(0)-GARCH(1,1) on the simulated data and apply Tse (2002)’s test on ẑ_{t}^{2} ≡ ε̂_{t}^{2}∕σ̂_{t}^{2} with M = 1, 2, 3 and 4. We report the test both with and without the RBD correction.
The number of times that the null is rejected at the α% nominal size (called the empirical size of the test) is reported below.
These results suggest that the version of the test that implements the RBD correction has a reasonable size for α = 10 and 5% and suffers from a positive size distortion for α = 1%. However, the ‘OLS’ version of the test (which does not take into account the uncertainty of the estimation of the GARCH model) is clearly inadequate because it has a tendency to under-reject the null whatever the value of M and α. For instance, for M = 1 and α = 10%, the empirical size is 0.9%, which suggests a strong size distortion.
In the example, 6 tests have been selected. The output is reported in
“Box 4  Misspecification Tests”.
Without going too deeply into the analysis of these results, we can briefly argue that the model does not capture the dynamics of the NASDAQ.
The Q-statistics on standardized and squared standardized residuals, the RBD test with various lag values, as well as the adjusted Pearson Chi-square goodness-of-fit test (with different cell numbers) reject the null hypothesis of a correct specification. This result is not very surprising. Early empirical evidence has indeed shown that a high ARCH order has to be selected to capture the dynamics of the conditional variance (thus involving the estimation of numerous parameters).
When numerical optimization is used to maximize the log-likelihood function with respect to the vector of parameters Ψ, the inspected range of the parameter space is, by default, unrestricted. The problem is that some parameters might have to be constrained to a smaller interval. For instance, it is convenient to constrain the α_{i} parameters of an ARCH(q) to be positive. Constraining parameters to lie between given lower and upper bounds is easily done by selecting the ‘BFGS-BOUNDS’ option in the Estimate - GARCH Models dialog box in G@RCH.
If the user fixes bounds on parameters, the program uses MaxSQPF instead of MaxBFGS (the Broyden-Fletcher-Goldfarb-Shanno quasi-Newton method) to optimize the likelihood function. MaxSQPF enforces all iterates to be feasible, using the algorithm by Lawrence and Tits (2001). If a starting point is infeasible, MaxSQPF will try to minimize the squared constraint violations to find a feasible point. See Doornik (2007b) for more details on these two optimization functions.
To change the values of the bounds, select option ‘Select (Matrix Form)’ in the ‘Starting Values’ dialog box together with option ‘BFGS-BOUNDS’ in the Estimate - GARCH Models dialog box. In addition to columns ‘FIX’ and ‘Value’, two new columns appear with the default lower and upper bounds.
Furthermore, nonlinear constraints (like the stationarity constraint of the GARCH (1,1) model, α_{1} + β_{1} < 1) can be imposed during the estimation with the Console version. For more details, see 5.3.2.
The main purpose of building and estimating a model with financial data is probably to produce a forecast. With the Forecast... option, G@RCH 6.1 also provides forecasting tools: forecasts of both the conditional mean and the conditional variance are available as well as several forecast error measures.
The first parameter to specify is the horizon h of the hstepahead forecasts. The default value is 10. Three options are available to:
Finally, graphical options are available for the standard error bands (error bands, bars or fans).
Our first goal is to give the optimal hstepahead predictor of y_{t+h} given the information we have up to time t.
For instance, in an AR(1) process such as
y_{t} = μ + ψ_{1}(y_{t1}  μ) + ε_{t},
the optimal^{14} h-step-ahead predictor of y_{t+h} (i.e. ŷ_{t+h|t}) is its conditional expectation at time t (given the estimated parameters μ̂ and ψ̂_{1}):
ŷ_{t+h|t} = μ̂ + ψ̂_{1}(ŷ_{t+h−1|t} − μ̂),  (3.21)

where ŷ_{t+i|t} = y_{t+i} for i ≤ 0.
For the AR(1), the optimal 1-step-ahead forecast equals μ̂ + ψ̂_{1}(y_{t} − μ̂). For h > 1, the optimal forecast can be obtained recursively or directly as ŷ_{t+h|t} = μ̂ + ψ̂_{1}^{h}(y_{t} − μ̂).
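The equivalence between the recursive and the direct forms is easy to verify numerically; the following Python sketch (function names are illustrative) mirrors Equation (3.21).

```python
def ar1_forecast_direct(y_t, mu, psi, h):
    """Direct form of the optimal AR(1) forecast: mu + psi^h (y_t - mu)."""
    return mu + psi ** h * (y_t - mu)

def ar1_forecast_recursive(y_t, mu, psi, h):
    """Recursive form: each step feeds the previous forecast back in."""
    f = y_t
    for _ in range(h):
        f = mu + psi * (f - mu)   # one application of Equation (3.21)
    return f
```

Both functions return identical values, which is the point made in the text: the forecast can be computed either way.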
More generally, for the ARFIMA(n,ζ,s) model described in Equation (3.8), the optimal h-step-ahead predictor of y_{t+h} is:
Recall that when exogenous variables enter the conditional mean equation, μ becomes μ_{t} = μ + ∑_{i=1}^{n1}δ_{i}x_{i,t} and consequently, provided that the information x_{i,t+h} is available at time t (which is the case for instance if x_{i,t} is a “day-of-the-week” dummy variable), μ̂_{t+h|t} is also available at time t. When there is no exogenous variable in the ARFIMA model and n = 1, s = 0 and ζ = 0 (c_{k} = 0), the forecast of the AR(1) process given in Equation (3.21) is recovered. Note that for ARCH-in-mean models, μ̂_{t+h|t} = μ + ∑_{i=1}^{n1}δ_{i}x_{i,t+h|t} + ϑσ̂_{t+h|t}^{k} (with k = 1 or 2).
Independently of the conditional mean, one can forecast the conditional variance. In the simple ARCH(q) case, the optimal h-step-ahead forecast of the conditional variance, i.e. σ̂_{t+h|t}^{2}, is given by:
σ̂_{t+h|t}^{2} = ω̂ + ∑_{i=1}^{q}α̂_{i}ε_{t+h−i|t}^{2},  (3.23)

where ε_{t+i|t}^{2} = σ̂_{t+i|t}^{2} for i > 0 while ε_{t+i|t}^{2} = ε̂_{t+i}^{2} for i ≤ 0. Equation (3.23) is usually computed recursively, even if a closed-form solution of σ̂_{t+h|t}^{2} can be obtained by recursive substitution in Equation (3.23).
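The recursive computation of Equation (3.23) can be sketched as follows in Python (an illustration, not G@RCH's Ox code; the function name is an assumption): future squared residuals are simply replaced by their own variance forecasts.

```python
import numpy as np

def arch_var_forecast(eps, omega, alphas, h):
    """Recursive h-step-ahead conditional-variance forecasts for an ARCH(q).

    eps: in-sample residuals; alphas: [alpha_1, ..., alpha_q].
    """
    q = len(alphas)
    e2 = list(eps[-q:] ** 2)          # last q observed squared residuals
    out = []
    for _ in range(h):
        # alpha_1 multiplies the most recent squared residual, alpha_q the oldest
        s2 = omega + sum(a * e for a, e in zip(alphas, reversed(e2[-q:])))
        out.append(s2)
        e2.append(s2)                 # eps^2_{t+i|t} = sigma^2_{t+i|t} for i > 0
    return np.array(out)
```

For an ARCH(1) with ω = 0.2, α_1 = 0.5 and a last squared residual of 1, the forecasts shrink step by step towards the unconditional variance ω/(1 − α_1) = 0.4.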
Leaving out the last 10 observations for the forecasting experiment (as shown in the Estimate Model window here below) and performing 10-step-ahead forecasts of the NASDAQ based on the AR(1)-ARCH(1) produces Figure 3.5.
The forecast bands are ±2σ̂_{t+h|t}, which gives a 95% confidence interval (note that the critical value 2 can be changed).
Furthermore, if we leave enough observations for the out-of-sample period, G@RCH can report some standard measures of forecasting performance derived from the forecast errors (see Box 5). Note that certain criteria are not computed on both the mean and the variance (because they are not relevant, e.g. Percentage Correct Sign (PCS) for the variance). In such a case, G@RCH reports a .NaN.
Finally, three options are also available in the Test menu: an Exclusion Restrictions dialog box, a Linear Restrictions dialog box and a Store in database dialog.
The Exclusion Restrictions dialog box option allows you to select explanatory variables and test whether they are jointly significant. A more general form is the test for linear restrictions.
Mark all the variables you wish to include in the test in this MultipleSelection List box. G@RCH tests whether the selected variables can be deleted from the model.
Tests for linear restrictions are specified in the form of a matrix R, and a vector r. These are entered as one matrix [R : r] in the dialog. (This is more general than testing for exclusion restrictions).
The first four columns are the columns of R, specifying two restrictions. The last column is r, which specifies what the restrictions should add up to.
The dimensions of the matrix must be specified in the rows and columns fields. It is your responsibility to specify the right values, G@RCH will not try to work it out (because elements of a row may be spread over several lines).
Finally, the residuals, the squared residuals, the standardized residuals, the conditional variance and the h-step-ahead forecasts (of the conditional mean and the conditional variance) can be stored in the database as new variables. When selecting this option, a first window appears and the user selects the series to be stored. A default name is then proposed for each series.
This section has been largely inspired by Taylor (1995).
Are prices (or returns) of financial assets forecastable? According to the random walk hypothesis (RWH), no. The steps of a random walk are unpredictable. The RWH associates steps with returns, so that returns cannot be predicted from past values. In G@RCH, the RWH is defined by two statements about the stochastic process that provides observed returns: the mean is stationary, E(r_{t}) = μ, and returns are uncorrelated, corr(r_{t},r_{t−τ}) = 0, τ ≠ 0.
When RWH is true, the latest return and all past returns are irrelevant if we attempt to predict future returns using linear predictors. In this case, the variables can be assumed to be a martingale difference and thus expected returns do not depend on the history of time series information.^{15}
The Variance-ratio test of Lo and MacKinlay (1988) is designed to detect either mean reversion or price trend behaviour. Let x_{t} = log(p_{t}) be the price logarithm (with any dividends reinvested). Then one-period returns are r_{t} = x_{t} − x_{t−1} and N-period returns are r_{t} + ... + r_{t+N−1} = x_{t+N−1} − x_{t−1}. Also, assuming the returns process is stationary, let V(N) = var(r_{t} + ... + r_{t+N−1}). When RWH is true, V(N) = NV(1). The test attempts to decide if this property is true. N is any integer ≥ 2.
For any stationary process, the variance-ratio is

V(N)∕[NV(1)] = 1 + 2∑_{τ=1}^{N−1}(1 − τ∕N)ρ_{τ},

by using the formula for the variance of a linear combination. Consequently, the variance-ratio test is appropriate when any dependence occurs for at least N − 1 lags and when the autocorrelations all have the same sign. It is a good test when the alternative to RWH is an ARMA(1,1) process that has a positive autoregressive parameter for returns.
Implementation of the test requires the following steps. Assume there are n returns.
Step 1: Calculate the average return and unbiased variance estimates, thus:

r̄ = (1∕n)∑_{t=1}^{n}r_{t} and s^{2} = [1∕(n − 1)]∑_{t=1}^{n}(r_{t} − r̄)^{2}.
Step 2: Calculate the sample variance-ratio as

V̂(N) = 1 + 2∑_{τ=1}^{N−1}(1 − τ∕N)ρ̂_{τ}.

Reject RWH if V̂(N) is far from 1.
Recall that

ρ̂_{τ} = ∑_{t=1}^{n−τ}(r_{t} − r̄)(r_{t+τ} − r̄)∕∑_{t=1}^{n}(r_{t} − r̄)^{2}

and that, for i.i.d. returns, nVar(ρ̂_{τ}) ≈ 1. Under H_{0}, E(V̂(N)) = 1 and thus Var(V̂(N)) = ∑_{τ=1}^{N−1}[2(1 − τ∕N)]^{2}Var(ρ̂_{τ}). An appropriate estimate of nVar(ρ̂_{τ}) is provided by b_{τ}.
Step 3: Estimate variances for the sample autocorrelations, ρ̂_{τ}, of the n returns, for 1 ≤ τ ≤ N − 1, under the assumption that RWH is true:

b_{τ} = n∑_{t=1}^{n−τ}s_{t}s_{t+τ}∕(∑_{t=1}^{n}s_{t})^{2},

where s_{t} = (r_{t} − r̄)^{2}. A very good approximation is given by b_{τ} ≈ 1 + (ku − 1)ρ̂_{τ,s}, where ku is the sample kurtosis of the returns r_{t} and ρ̂_{τ,s} is the sample autocorrelation at lag τ of the time series s_{t}.
Step 4: Estimate a variance for the variance-ratio V̂(N), under the assumption that RWH is true. This is denoted by V̂ar(V̂(N)). Then

V̂ar(V̂(N)) = (1∕n)∑_{τ=1}^{N−1}[2(1 − τ∕N)]^{2}b_{τ}.

Equivalently,

V̂ar(V̂(N)) = (4∕n)∑_{τ=1}^{N−1}(1 − τ∕N)^{2}b_{τ}.
Step 5: Calculate the test statistic

z = [V̂(N) − 1]∕√(V̂ar(V̂(N))).
The asymptotic distribution of this statistic is N(0,1) when RWH is true.
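Steps 1 to 5 above can be sketched in Python as follows; this is an illustration of the procedure, not G@RCH's Ox code, and the heteroscedasticity-robust b_{τ} estimator (rather than the kurtosis approximation) is used.

```python
import numpy as np

def variance_ratio_test(r, N):
    """Lo-MacKinlay variance-ratio z-statistic, following Steps 1-5."""
    r = np.asarray(r, dtype=float)
    n = len(r)
    d = r - r.mean()                       # Step 1: demeaned returns
    s2 = (d ** 2).sum() / n
    V = 1.0
    var_V = 0.0
    for tau in range(1, N):
        rho = (d[tau:] * d[:-tau]).sum() / (n * s2)   # sample autocorrelation
        # Step 3: robust estimate b_tau of n*Var(rho_tau) under the RWH
        b = n * ((d[tau:] ** 2) * (d[:-tau] ** 2)).sum() / ((d ** 2).sum() ** 2)
        w = 2.0 * (1.0 - tau / N)
        V += w * rho                       # Step 2: sample variance-ratio
        var_V += w ** 2 * b                # Step 4: n*Var(V(N)) estimate
    # Step 5: z is approximately N(0,1) under the RWH
    return (V - 1.0) * np.sqrt(n / var_V)
```

On an i.i.d. simulated series the statistic stays within the usual normal range, in line with Figures 3.6 and 3.7.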
Monte Carlo experiment
To study the performance of this test, let us consider a first simulation study (see Simulation_VR.ox). We simulate 1000 series of T = 2000 observations following a NAR(0) model with μ = 0.1, i.e. (y_{t} − μ) = ε_{t}, where ε_{t} ~ N(0,σ^{2}) and σ = 0.2. We then apply the Variance-ratio test with N = 2, 3 and 10. The 1000 realisations of the VR statistic are plotted in Figure 3.6 while the empirical distribution of the VR statistic is plotted in Figure 3.7 (together with a N(0,1)).
We see from Figure 3.6 that the VR statistic varies between −3 and 3. Furthermore, Figure 3.7 suggests that the VR statistic is close to standard normally distributed.
The number of times that the null is rejected at the α% nominal size (called the empirical size of the test) is reported below.
These results suggest that the test suffers only from a minor size distortion for the three considered values of N.
To study the power of the test, we now simulate T = 2000 observations of a NAR(1) model with μ = 0.1 and ρ = 0.1, i.e. (1 − ρL)(y_{t} − μ) = ε_{t}, where ε_{t} ~ N(0,σ^{2}) and σ = 0.2. To obtain this configuration of the model, choose the following options:
The empirical distribution of the VR statistics (under the alternative) is plotted in Figure 3.8 (together with a N(0,1)).
Both Figure 3.8 and the output here below suggest that the VR test has good power to detect the presence of serial correlation in the series.
A potential weakness of the VR test is that it assumes a constant variance while it is clear that most financial time series do not have constant conditional variance. To study the impact of the presence of GARCH effects on the VR test, let us now simulate T = 2000 observations of a NAR(0)GARCH(1,1) model with μ = 0.1, i.e. (y_{t}  μ) = ε_{t}, where ε_{t} ~ N(0,σ_{t}^{2}), σ_{t}^{2} = ω + α_{1}ε_{t1}^{2} + β_{1}σ_{t1}^{2} with ω = 0.2,α_{1} = 0.1 and β_{1} = 0.8.
Again, we simulate 1000 series following this DGP and apply the VR test with the same values for N.
Now choose the following options in Simulation_VR.ox:
The empirical distribution of the VR statistics is plotted in Figure 3.9 and the empirical size in the output just below. The main conclusion is that the presence of GARCH effects has a negligible impact on the size of the test.
However, the power of the test is slightly affected by the presence of GARCH effects. Indeed, adding an AR(1) component to this GARCH model (with ρ = 0.05) substantially reduces the power of the test as suggested below.
To obtain these results, choose the following options in Simulation_VR.ox:
To overcome this problem some authors have proposed to apply the VR test on the rescaled or standardized residuals of a GARCHtype model.
This is illustrated in Simulation_VR_GARCH.ox.
As suggested by the next output, applying the VR test to the rescaled returns restores the original power of the test (i.e. the AR(1) case without GARCH effects).
See Taylor (1995) for more details about the VR tests as well as some applications on real data.
Another popular test of the RWH is the Runs test, first used by Fama (1965). The runs test (also called the Wald-Wolfowitz test) is a nonparametric test that checks the randomness hypothesis of a two- or three-valued data sequence. It is similar to a first-order autocorrelation test but, being nonparametric, does not require any assumption of normality or stationarity of the distribution. A ‘run’ is a sequence of adjacent equal elements. For example, the sequence “1111000111001111110000” is divided into six runs, three of which consist of 1’s and the others of 0’s. If the 1’s and 0’s alternate randomly, the number of runs is a random variable whose asymptotic distribution is normal.
Let q_{t} be the sign of the return r_{t}: q_{t} is 1, 0, −1, respectively, for positive, zero and negative values of r_{t}. Let c_{t} be 1 if q_{t} ≠ q_{t+1} and 0 otherwise. The total number of runs of all types is

C = 1 + ∑_{t=1}^{n−1}c_{t}.
Suppose there are n_{1} positive returns, n_{2} zero returns, and n_{3} negative returns in a series of n returns (with 0 mean: apply the test on r_{t} − r̄). Then the mean and variance of the random variable generating C, conditional upon n_{1}, n_{2}, and n_{3}, are

E(C) = [n(n + 1) − ∑_{i=1}^{3}n_{i}^{2}]∕n

and

Var(C) = {∑n_{i}^{2}[∑n_{i}^{2} + n(n + 1)] − 2n∑n_{i}^{3} − n^{3}}∕[n^{2}(n − 1)]
when the signs q_{t} are generated by i.i.d. variables (Mood, 1940).
The statistic C has an approximate normal distribution for large n. Tests can then be decided by evaluating

K = |C − E(C)|∕√(Var(C)),
with RWH rejected at the 5% level if K > 1.96. The Runs test avoids all the problems created by conditional heteroskedasticity. Note that trends in prices (vs. no trend) would give fewer runs than expected.
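The whole procedure fits in a short Python sketch (an illustration using Mood's three-category mean and variance formulas, not G@RCH's Ox code; the function name is an assumption).

```python
import numpy as np

def runs_test(r):
    """Runs (Wald-Wolfowitz) test statistic K on the signs of demeaned returns.

    K is approximately N(0,1) in absolute value under the RWH.
    """
    q = np.sign(r - np.mean(r))               # signs of demeaned returns
    n = len(q)
    C = 1 + int(np.sum(q[1:] != q[:-1]))      # total number of runs
    counts = np.array([np.sum(q == v) for v in (-1.0, 0.0, 1.0)])
    s2 = np.sum(counts ** 2)
    s3 = np.sum(counts ** 3)
    mean_C = (n * (n + 1) - s2) / n           # Mood's (1940) conditional mean
    var_C = (s2 * (s2 + n * (n + 1)) - 2 * n * s3 - n ** 3) / (n ** 2 * (n - 1))
    return abs(C - mean_C) / np.sqrt(var_C)
```

Since only the signs of the returns are used, the statistic is immune to conditional heteroskedasticity, which is the robustness property emphasised above.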
To study the performance of the Runs test, let us consider a first simulation study (see Simulation_VR.ox). We simulate 1000 series of T = 2000 observations following a NAR(0) model with μ = 0.1, i.e. (y_{t} − μ) = ε_{t}, where ε_{t} ~ N(0,σ^{2}) and σ = 0.2. We then apply the Runs test. The empirical distribution of the K statistic is plotted in Figure 3.10 (together with a N(0,1)).
The output reported here below also suggests that the test has a correct size at the three chosen significance levels. Results not reported here suggest (as expected) that the test is not affected by the presence of GARCH effects.
Let us now study the power of the Runs test by considering a NAR(1) model. The results suggest that this test suffers from a reduction in power (compared to the VR test on the same DGP) due to the loss of information in the transformation from returns to their signs.
Range statistics that have power to detect longterm dependence were first developed for hydrological data by Hurst (1951) and later applied to financial returns.
Lo (1991) provides many references and a rigorous description of appropriate tests when the preferred alternative to randomness is long-term dependence. The range defined by a set of returns is:

R = max_{1≤k≤n}∑_{t=1}^{k}(r_{t} − r̄) − min_{1≤k≤n}∑_{t=1}^{k}(r_{t} − r̄).
R/S-test statistics are ranges divided by scaled standard deviations. Two special cases define Mandelbrot (1972)’s and Lo (1991)’s test statistics:

(R∕S)_{Man} = R∕√(ns^{2})

and

(R∕S)_{Lo} = R∕√(nσ̂^{2}(q)), with σ̂^{2}(q) = s^{2} + 2∑_{j=1}^{q}[1 − j∕(q + 1)]γ̂_{j},

with s^{2} the sample variance of returns and γ̂_{j} the sample autocovariance of returns at lag j.
Under certain assumptions, the distributions of these statistics converge, as n and q increase, to that of the range of a Brownian bridge on the unit interval. The rule is the following: do not reject the null when
[0.861 ; 1.747]  at the 10% level  
[0.809 ; 1.862]  at the 5% level  
[0.721 ; 2.098]  at the 1% level 
The null hypothesis of an uncorrelated process can be tested using (R∕S)_{Man}. Lo focuses on the null hypothesis of a short-memory process, in which case the appropriate test statistic is (R∕S)_{Lo}.
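Both statistics can be computed with a few lines of Python; this is a sketch of the textbook formulas (Bartlett weights for Lo's variance estimator), not G@RCH's Ox code, and the function name is an assumption.

```python
import numpy as np

def rs_statistics(r, q):
    """Rescaled-range statistics: Mandelbrot's R/s and Lo's R/sigma_hat(q),
    both scaled by sqrt(n) for comparison with the Brownian-bridge fractiles."""
    r = np.asarray(r, dtype=float)
    n = len(r)
    d = r - r.mean()
    cs = np.cumsum(d)
    R = cs.max() - cs.min()                   # range of partial sums of deviations
    s2 = (d ** 2).sum() / n                   # sample variance
    # Lo's estimator adds Bartlett-weighted autocovariances up to lag q
    sig2 = s2
    for j in range(1, q + 1):
        gamma_j = (d[j:] * d[:-j]).sum() / n
        sig2 += 2.0 * (1.0 - j / (q + 1)) * gamma_j
    return R / np.sqrt(n * s2), R / np.sqrt(n * sig2)
```

For an uncorrelated series both statistics should fall inside the fractile intervals tabulated above (e.g. [0.809 ; 1.862] at the 5% level).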
To study the performance of the R/S tests, let us consider the following simulation study (see Simulation_RS.ox). We simulate 1000 series of T = 2000 observations following a NAR(0) model with μ = 0.1, i.e. (y_{t} − μ) = ε_{t}, where ε_{t} ~ N(0,σ^{2}) and σ = 0.2. We then apply the (R∕S)_{Man} test and the (R∕S)_{Lo} test with q = 10 and 20.
The empirical size of the tests is reported below. The results suggest that the tests are correctly sized.
The next output corresponds to the AR(1) case with ρ = 0.05, obtained with the following options
It is clear that the tests have no power against short memory processes.
With a much higher value of ρ, i.e. ρ = 0.5, we see that the statistic of Lo (1991) remains close to its nominal size while that of Mandelbrot (1972) has some power to detect short memory.
Results not reported here suggest that these two statistics are not affected by the presence of GARCH effects.
To study the ability of the (R∕S)_{Lo} statistic to detect long-range dependence, we now use the ARFIMA package of Doornik and Ooms (1999) to simulate T = 2000 observations of a NARFIMA(0,0.2,0) model with μ = 0.1, i.e. (1 − L)^{0.2}(y_{t} − μ) = ε_{t}, where ε_{t} ~ N(0,σ^{2}) and σ = 0.2.
The results of this simulation have been obtained using example file Simulation_RS.ox with the following options:
The next output suggests that the (R∕S)_{Lo} statistic indeed has power to detect long memory.
While Engle (1982) is certainly the most important contribution in financial econometrics, the ARCH model is rarely used in practice because its simple structure typically requires a large number of lags to fit the data.
A useful generalization of this model is the GARCH model introduced by Bollerslev (1986). This model is also a weighted average of past squared residuals, but it has declining weights that never go completely to zero. This model is more parsimonious than the ARCH and, even in its simplest form, has proven surprisingly successful in predicting conditional variances.
The GARCH model is not the only extension. Indeed, G@RCH 6.1 proposes no less than 12 specifications. As shown below, choosing an alternative model is extremely easy in G@RCH.
The Generalized ARCH (GARCH) model of Bollerslev (1986) is based on an infinite ARCH specification and allows one to reduce the number of estimated parameters by imposing nonlinear restrictions on them. The GARCH(p,q) model can be expressed as:
σ_{t}^{2} = ω + ∑_{i=1}^{q}α_{i}ε_{t−i}^{2} + ∑_{j=1}^{p}β_{j}σ_{t−j}^{2}.  (4.1)
Using the lag (or backshift) operator L, the GARCH (p,q) model becomes:
σ_{t}^{2} = ω + α(L)ε_{t}^{2} + β(L)σ_{t}^{2},
with α(L) = α_{1}L + α_{2}L^{2} + … + α_{q}L^{q} and β(L) = β_{1}L + β_{2}L^{2} + … + β_{p}L^{p}.
If all the roots of the polynomial 1  β(L) = 0 lie outside the unit circle, we have:
σ_{t}^{2} = ω[1 − β(L)]^{−1} + α(L)[1 − β(L)]^{−1}ε_{t}^{2},  (4.2)
which may be seen as an ARCH(∞) process since the conditional variance linearly depends on all previous squared residuals. In this case, the conditional variance of y_{t} can become larger than the unconditional variance given by

σ^{2} = ω∕[1 − ∑_{i=1}^{q}α_{i} − ∑_{j=1}^{p}β_{j}].
Applying variance targeting to the GARCH model implies replacing ω by σ^{2}[1 − α(1) − β(1)], where σ^{2} is the unconditional variance of ε_{t}, which can be consistently estimated by its sample counterpart.^{1}
Bollerslev (1986) has shown that for a GARCH(1,1) with normal innovations, the kurtosis of y is 3[1 − (α_{1} + β_{1})^{2}]∕[1 − (α_{1} + β_{1})^{2} − 2α_{1}^{2}] > 3, provided the fourth moment exists. The autocorrelations of ε_{t}^{2} have been derived by Bollerslev (1986). For a stationary GARCH(1,1), ρ_{1} = α_{1} + [α_{1}^{2}β_{1}∕(1 − 2α_{1}β_{1} − β_{1}^{2})], and ρ_{k} = (α_{1} + β_{1})^{k−1}ρ_{1}, ∀k = 2,3,… In other words, the autocorrelations decline exponentially with a decay factor of α_{1} + β_{1}.
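These closed-form moments are easy to evaluate numerically; the short Python sketch below (function name illustrative) implements the kurtosis and autocorrelation formulas quoted in the text for a Gaussian GARCH(1,1).

```python
def garch11_moments(alpha, beta, n_lags=5):
    """Implied kurtosis of y and autocorrelations of eps^2 for a
    Gaussian GARCH(1,1), using Bollerslev's (1986) formulas."""
    p = alpha + beta
    denom = 1.0 - p ** 2 - 2.0 * alpha ** 2
    if denom <= 0.0:
        raise ValueError("fourth moment does not exist for these parameters")
    kurtosis = 3.0 * (1.0 - p ** 2) / denom
    # rho_1, then exponential decay with factor alpha + beta
    rho1 = alpha + alpha ** 2 * beta / (1.0 - 2.0 * alpha * beta - beta ** 2)
    acf = [rho1 * p ** (k - 1) for k in range(1, n_lags + 1)]
    return kurtosis, acf
```

For α_1 = 0.1 and β_1 = 0.8, for instance, the implied kurtosis is about 3.35 and ρ_1 = 0.14, with subsequent autocorrelations decaying by the factor 0.9.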
As in the ARCH case, some restrictions are needed to ensure σ_{t}^{2} is positive for all t. Bollerslev (1986) shows that imposing ω > 0, α_{i} ≥ 0 (for i = 1,…,q) and β_{j} ≥ 0 (for j = 1,…,p) is sufficient for the conditional variance to be positive. In practice, the GARCH parameters are often estimated without the positivity constraints. Nelson and Cao (1992) argue that imposing all coefficients to be nonnegative is too restrictive and that some of these coefficients are found to be negative in practice while the conditional variance remains positive (by checking on a case-by-case basis). Consequently, they relaxed this constraint and gave sufficient conditions for the GARCH(1,q) and GARCH(2,q) cases based on the infinite representation given in Equation (4.2). Indeed, the conditional variance is strictly positive provided ω[1 − β(1)]^{−1} is positive and all the coefficients of the infinite polynomial α(L)[1 − β(L)]^{−1} in Equation (4.2) are nonnegative. The positivity constraints proposed by Bollerslev (1986) can be imposed during the estimation (see Section 3.9). If not, these constraints, as well as the ones implied by the ARCH(∞) representation, will be tested a posteriori and reported in the output (if there is no explanatory variable in the conditional variance equation).
Estimation results of an ARMA(1,0)GARCH(1,1) model by QML are reported in the next Output (labelled Box 6).
Interestingly, the log-likelihood now equals −5370.858 against −6106.357 for the corresponding ARCH(1) model. Any likelihood ratio test (LRT), asymptotically ~ χ^{2}(1), would reject the ARCH(1) model in favour of the GARCH(1,1).
Furthermore, we report below the same 6 misspecification tests as for the ARCH(1) model. The output is reported in Box 7 - Misspecification Tests.
Unlike the ARCH(1) model, the Q-statistics on standardized and squared standardized residuals, as well as the RBD test with various lag values, suggest that the GARCH(1,1) does a good job in modelling the dynamics of the first two conditional moments of the NASDAQ.
However, the adjusted Pearson Chi-square goodness-of-fit test (with different cell numbers) still points out some misspecification of the overall conditional distribution.
Several authors have proposed to use a Student-t or GED distribution in combination with a GARCH model to capture the fat tails of high-frequency financial time-series. Furthermore, since the NASDAQ returns seem to be skewed, a skewed-Student distribution might be justified (see Section 6 for a discussion of non-normal distributions).
The Exponential GARCH (EGARCH) model, originally introduced by Nelson (1991), is re-expressed in Bollerslev and Mikkelsen (1996) as follows:
log(σ_{t}^{2}) = ω + [1 − β(L)]^{−1}[1 + α(L)]g(z_{t−1})   (4.3)
The value of g(z_{t}) depends on several elements. Nelson (1991) notes that, “to accommodate the asymmetric relation between stock returns and volatility changes (...) the value of g(z_{t}) must be a function of both the magnitude and the sign of z_{t}”.^{2} That is why he suggests expressing the function g(.) as
g(z_{t}) = θ_{1}z_{t} + θ_{2}[|z_{t}| − E|z_{t}|]   (4.4)
E|z_{t}| depends on the assumption made on the unconditional density of z_{t}. Indeed, for the normal distribution,
E|z_{t}| = √(2∕π)   (4.5)
For the skewed-Student distribution,
 (4.6) 
where ξ = 1 for the symmetric Student.
For the GED, we have
E|z_{t}| = λ_{υ}2^{1∕υ}Γ(2∕υ)∕Γ(1∕υ)   (4.7)
ξ, υ and λ_{υ} concern the shape of the non-normal densities and were defined in Section 3.6.2.
Note that the use of a log transformation of the conditional variance ensures that σ_{t}^{2} is always positive.
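A minimal simulation makes the positivity argument concrete. The sketch below uses the simplest EGARCH(1,1) recursion with Gaussian innovations and the g(.) function of Equation (4.4); it is an illustration with arbitrary parameter values, not the exact parameterization estimated by G@RCH:

```python
import math
import random

def simulate_egarch11(n, omega, beta1, theta1, theta2, seed=0):
    # Simple EGARCH(1,1) recursion with Gaussian innovations:
    #   log(sigma_t^2) = omega + beta1*log(sigma_{t-1}^2) + g(z_{t-1}),
    #   g(z) = theta1*z + theta2*(|z| - E|z|),  E|z| = sqrt(2/pi).
    # Modelling log(sigma_t^2) keeps sigma_t^2 positive for any parameters.
    rng = random.Random(seed)
    e_abs_z = math.sqrt(2.0 / math.pi)
    log_s2 = omega / (1.0 - beta1)  # unconditional mean of log(sigma^2)
    out = []
    for _ in range(n):
        out.append(math.exp(log_s2))  # sigma_t^2 = exp(log_s2) > 0 always
        z = rng.gauss(0.0, 1.0)
        log_s2 = omega + beta1 * log_s2 + theta1 * z + theta2 * (abs(z) - e_abs_z)
    return out

sigmas = simulate_egarch11(1000, omega=-0.1, beta1=0.95, theta1=-0.08, theta2=0.15)
print(min(sigmas) > 0.0)  # True: the log transform guarantees positivity
```

A negative θ₁ makes negative shocks raise volatility by more than positive ones, the asymmetry Nelson had in mind.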
Applying variance targeting to the EGARCH model implies replacing ω by log(σ^{2}), where σ^{2} is the unconditional variance of ε_{t}, which can be consistently estimated by its sample counterpart.^{3}
The output reported below corresponds to the ARMA(1,0)-EGARCH(1,1) with a GED distribution. Interestingly, both θ_{1} and θ_{2} are significant. Note that the degree of freedom of the GED distribution is significantly lower than 2, confirming that the standardized residuals are fat-tailed.
This popular model was proposed by Glosten, Jagannathan, and Runkle (1993). Its generalized version is given by:
σ_{t}^{2} = ω + ∑_{i=1}^{q}(α_{i}ε_{t−i}^{2} + γ_{i}S_{t−i}^{−}ε_{t−i}^{2}) + ∑_{j=1}^{p}β_{j}σ_{t−j}^{2}   (4.8)
where S_{t−i}^{−} is a dummy variable that takes the value 1 when ε_{t−i} is negative and 0 when it is positive.
In this model, it is assumed that the impact of ε_{t}^{2} on the conditional variance σ_{t}^{2} is different when ε_{t} is positive or negative. The TGARCH model of Zakoian (1994) is very similar to the GJR but models the conditional standard deviation instead of the conditional variance. Ling and McAleer (2002) provide, among other stationarity conditions for various GARCH models, the conditions of existence of the second and fourth moment of the GJR.
Applying variance targeting to the GJR model implies replacing ω by σ^{2}, where σ^{2} is the unconditional variance of ε_{t}, which can be consistently estimated by its sample counterpart, and E(S_{t}^{−}) is 0.5 for symmetric distributions (i.e. Normal, Student and GED) and 1∕(1 + ξ^{2}) for the SKST.^{4}
A nice feature of the GJR model is that the null hypothesis of no leverage effect is easy to test. Indeed, γ_{1} = … = γ_{q} = 0 implies that the news impact curve is symmetric, i.e. past positive shocks have the same impact on today’s volatility as past negative shocks.
The output reported below suggests the presence of such an effect on the NASDAQ since γ̂_{1} = 0.107988 with a robust t-value of 2.826.
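The asymmetry is easy to see on one step of the GJR(1,1) recursion. In the sketch below, γ₁ is set close to the estimate reported for the NASDAQ while the other parameter values are purely illustrative:

```python
def gjr_variance_step(omega, alpha1, gamma1, beta1, eps_prev, sigma2_prev):
    # One-step GJR(1,1) conditional variance:
    #   sigma_t^2 = omega + (alpha1 + gamma1*S)*eps_{t-1}^2 + beta1*sigma_{t-1}^2,
    # where S = 1 if eps_{t-1} < 0 and 0 otherwise.
    s_neg = 1.0 if eps_prev < 0.0 else 0.0
    return omega + (alpha1 + gamma1 * s_neg) * eps_prev ** 2 + beta1 * sigma2_prev

# With gamma1 > 0, a negative shock raises next-period variance more than a
# positive shock of the same size (asymmetric news impact curve).
up = gjr_variance_step(0.02, 0.05, 0.108, 0.85, eps_prev=+1.0, sigma2_prev=1.0)
down = gjr_variance_step(0.02, 0.05, 0.108, 0.85, eps_prev=-1.0, sigma2_prev=1.0)
print(down - up)  # the gap equals gamma1 * eps_prev^2
```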
This model was introduced by Ding, Granger, and Engle (1993). The APARCH(p,q) model can be expressed as:
σ_{t}^{δ} = ω + ∑_{i=1}^{q}α_{i}(|ε_{t−i}| − γ_{i}ε_{t−i})^{δ} + ∑_{j=1}^{p}β_{j}σ_{t−j}^{δ}   (4.9)
where δ > 0 and −1 < γ_{i} < 1 (i = 1,…,q).
The parameter δ plays the role of a Box-Cox transformation of σ_{t}, while γ_{i} reflects the so-called leverage effect. Properties of the APARCH model are studied in He and Teräsvirta (1999a, 1999b).
The APARCH includes seven other ARCH extensions as special cases:^{5}
Following Ding, Granger, and Engle (1993), if ω > 0 and ∑_{i=1}^{q}α_{i}E(|z| − γ_{i}z)^{δ} + ∑_{j=1}^{p}β_{j} < 1, a stationary solution for Equation (4.9) exists and is expressed as:
E(σ_{t}^{δ}) = ω∕[1 − ∑_{i=1}^{q}α_{i}E(|z| − γ_{i}z)^{δ} − ∑_{j=1}^{p}β_{j}]   (4.10)
Notice that if we set γ_{i} = 0 and δ = 2, and if z_{t} has zero mean and unit variance, we have the usual stationarity condition of the GARCH(1,1) model (α_{1} + β_{1} < 1). However, if γ_{i} ≠ 0 and/or δ ≠ 2, this condition depends on the assumption made on the innovation process.
Ding, Granger, and Engle (1993) derive a closed-form solution to κ_{i} = E(|z| − γ_{i}z)^{δ} in the Gaussian case. Lambert and Laurent (2001) show that for the standardized skewed-Student:^{6}
For the GED, we can show that:
Note that ξ, υ and λ_{υ} concern the shape of the non-normal densities and were defined in Section 3.6.2.
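As a sanity check on these expressions, the Gaussian κ_{i} can be derived by splitting the expectation over positive and negative z (a standard calculation, not copied from the text); the Python sketch below compares that closed form with a Monte Carlo estimate, using illustrative values γ = 0.3 and δ = 1.15 (close to the δ̂ reported for the NASDAQ):

```python
import math
import random

def kappa_gaussian(gamma, delta):
    # kappa = E(|z| - gamma*z)^delta for z ~ N(0,1). Splitting the expectation
    # over z > 0 and z < 0 gives (our own algebra, using
    # E|z|^delta = 2^(delta/2) * Gamma((delta+1)/2) / sqrt(pi)):
    e_abs = 2.0 ** (delta / 2.0) * math.gamma((delta + 1.0) / 2.0) / math.sqrt(math.pi)
    return 0.5 * ((1.0 - gamma) ** delta + (1.0 + gamma) ** delta) * e_abs

def kappa_monte_carlo(gamma, delta, n=200_000, seed=1):
    # Brute-force check of the closed form by simulation.
    rng = random.Random(seed)
    return sum((abs(z) - gamma * z) ** delta
               for z in (rng.gauss(0.0, 1.0) for _ in range(n))) / n

# gamma = 0 and delta = 2 reduce to E(z^2) = 1, the usual GARCH case.
print(kappa_gaussian(0.0, 2.0))
print(kappa_gaussian(0.3, 1.15), kappa_monte_carlo(0.3, 1.15))
```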
Applying variance targeting to the APARCH model implies replacing ω by σ^{δ}, where σ is the unconditional standard deviation of ε_{t}, which can be consistently estimated by its sample counterpart.^{7}
Once again, the APARCH model suggests the presence of a leverage effect on the NASDAQ (see Box 10). Importantly, δ is significantly different from 2 (δ̂ = 1.146366) but not significantly different from 1. This suggests that, instead of modelling the conditional variance (GARCH), it is more relevant in this case to model the conditional standard deviation. This result is in line with those of Taylor (1986), Schwert (1990) and Ding, Granger, and Engle (1993), who indicate that there is substantially more correlation between absolute returns than between squared returns, a stylized fact of high-frequency financial returns (often called ‘long memory’).
In many high-frequency time-series applications, the conditional variance estimated using a GARCH(p,q) process exhibits a strong persistence, that is:

∑_{j=1}^{p}β_{j} + ∑_{i=1}^{q}α_{i} ≈ 1.
If ∑ _{j=1}^{p}β_{j} + ∑ _{i=1}^{q}α_{i} < 1, the process (ε_{t}) is second order stationary, and a shock to the conditional variance σ_{t}^{2} has a decaying impact on σ_{t+h}^{2}, when h increases, and is asymptotically negligible. Indeed, let us rewrite the ARCH(∞) representation of the GARCH(p,q), given in Equation (4.2), as follows:
σ_{t}^{2} = ω^{*} + λ(L)ε_{t}^{2}   (4.11)
where ω^{*} = ω[1 − β(1)]^{−1}, λ(L) = α(L)[1 − β(L)]^{−1} = ∑_{i=1}^{∞}λ_{i}L^{i} and the λ_{i} are lag coefficients depending nonlinearly on the α_{i} and β_{i}. For a GARCH(1,1), λ_{i} = α_{1}β_{1}^{i−1}. Recall that this model is said to be second order stationary provided that α_{1} + β_{1} < 1, since this implies that the unconditional variance exists and equals ω∕(1 − α_{1} − β_{1}). As shown by Davidson (2004), the amplitude of the GARCH(1,1) is measured by S = ∑_{i=1}^{∞}λ_{i} = α_{1}∕(1 − β_{1}), which determines “how large the variations in the conditional variance can be” (and hence the order of the existing moments). This concept is often confused with the memory of the model, which determines “how long shocks to the volatility take to dissipate”. In this respect, the GARCH(1,1) model has a geometric memory ρ = 1∕β_{1}, where λ_{i} = O(ρ^{−i}). In practice, the estimate of α_{1} + β_{1} is often found to be very close to 1. In this case, we are confronted with an Integrated GARCH (IGARCH) model.
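The amplitude/memory distinction can be checked numerically: the truncated sum of the λ_{i} converges to α_{1}∕(1 − β_{1}), while the individual weights decay at the geometric rate β_{1}. A short sketch with illustrative parameter values:

```python
# ARCH(infinity) weights of a GARCH(1,1): lambda_i = alpha1 * beta1**(i-1).
# Their sum S = alpha1/(1-beta1) measures the "amplitude" of the model
# (Davidson, 2004), while the geometric rate beta1 governs its memory.
alpha1, beta1 = 0.1, 0.85

lam = [alpha1 * beta1 ** (i - 1) for i in range(1, 2001)]
S_truncated = sum(lam)              # truncated at 2000 lags
S_closed = alpha1 / (1.0 - beta1)   # closed-form amplitude
print(S_truncated, S_closed)        # virtually identical
```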
The GARCH(p,q) model can be expressed as an ARMA process. Using the lag operator L, we can rearrange Equation (4.1) as:

[1 − α(L) − β(L)]ε_{t}^{2} = ω + [1 − β(L)]ν_{t},

where ν_{t} = ε_{t}^{2} − σ_{t}^{2} is the innovation of the conditional variance process.
When the polynomial 1 − α(L) − β(L) contains a unit root, i.e. the sum of all the α_{i} and the β_{j} is one, we have the IGARCH(p,q) model of Engle and Bollerslev (1986). It can then be written as:
ϕ(L)(1 − L)ε_{t}^{2} = ω + [1 − β(L)]ν_{t}   (4.12)
where ϕ(L) ≡ [1 − α(L) − β(L)](1 − L)^{−1} is of order max{p,q} − 1.
We can rearrange Equation (4.12) to express the conditional variance as a function of the squared residuals. After some manipulations, we have its ARCH(∞) representation:
σ_{t}^{2} = ω[1 − β(L)]^{−1} + {1 − [1 − β(L)]^{−1}ϕ(L)(1 − L)}ε_{t}^{2}   (4.13)
For this model, S = 1 and thus the second moment does not exist. However, this process is still short-memory. To show this, Davidson (2004) considers an IGARCH(0,1) model defined as ε_{t} = σ_{t}z_{t} and σ_{t}^{2} = ε_{t−1}^{2}. This process is often wrongly compared to a random walk, since the long-range forecast is σ_{t+h|t}^{2} = ε_{t}^{2} for any h. However, ε_{t} = z_{t}ε_{t−1}, which means that the memory of a large deviation persists for only one period.
In October 1994, the risk management group at J.P. Morgan released a technical document describing its internal market risk management methodology (J.P. Morgan, 1996). This methodology, called RiskMetrics^{TM}, soon became a standard in market risk measurement due to its simplicity.
Basically, the RiskMetrics^{TM} model is an IGARCH(1,1) model where the ARCH and GARCH coefficients are fixed.
The model is given by:
σ_{t}^{2} = ω + (1 − λ)ε_{t−1}^{2} + λσ_{t−1}^{2}   (4.14)
where ω = 0 and λ is generally set to 0.94 with daily data and to 0.97 with weekly data.^{8} Note also that Equation (4.14) is the very basic conditional variance model of the RiskMetrics^{TM} methodology, but there exist many extensions of the original model. See the RiskMetrics Group website for details.
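The RiskMetrics recursion is a one-line exponentially weighted moving average; a minimal sketch (the initial variance is an assumption left to the user, a common choice being the first squared return):

```python
def riskmetrics_ewma(returns, lam=0.94, sigma2_0=None):
    # RiskMetrics conditional variance (IGARCH(1,1) with omega = 0):
    #   sigma_t^2 = (1 - lam) * eps_{t-1}^2 + lam * sigma_{t-1}^2,
    # with lam = 0.94 the value recommended for daily data.
    sigma2 = sigma2_0 if sigma2_0 is not None else returns[0] ** 2
    path = []
    for r in returns:
        path.append(sigma2)
        sigma2 = (1.0 - lam) * r ** 2 + lam * sigma2
    return path

path = riskmetrics_ewma([1.0, -2.0, 0.5, 0.0], lam=0.94, sigma2_0=1.0)
print(path)  # variance jumps after the large (-2.0) return, then decays
```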
To illustrate the need for flexible ARCH-type models, here is the output of the Box-Pierce test on squared standardized residuals and the RBD test applied after the estimation of the RiskMetrics model (including an AR(1) term and the two dummy variables). There is no doubt that the RiskMetrics specification is not appropriate.
Volatility tends to change quite slowly over time, and, as shown in Ding, Granger, and Engle (1993) among others, the effects of a shock can take a considerable time to decay.^{9} Therefore the distinction between stationary and unit root processes seems to be far too restrictive. Indeed, the propagation of shocks in a stationary process occurs at an exponential rate of decay (so that it only captures the short memory), while for a unit root process the persistence of shocks is infinite. In the conditional mean, the ARFIMA specification has been proposed to fill the gap between short and complete persistence, so that the short-run behavior of the time-series is captured by the ARMA parameters, while the fractional differencing parameter allows for modelling the long-run dependence.^{10}
To mimic the behavior of the correlogram of the observed volatility, Baillie, Bollerslev, and Mikkelsen (1996) (hereafter denoted BBM) introduce the Fractionally Integrated GARCH (FIGARCH) model by replacing the first difference operator (1 − L) of Equation (4.13) by the fractional difference operator (1 − L)^{d}.
The conditional variance of the FIGARCH (p,d,q) is given by:
σ_{t}^{2} = ω[1 − β(L)]^{−1} + {1 − [1 − β(L)]^{−1}ϕ(L)(1 − L)^{d}}ε_{t}^{2}   (4.15)
or σ_{t}^{2} = ω^{*} + ∑_{i=1}^{∞}λ_{i}L^{i}ε_{t}^{2} = ω^{*} + λ(L)ε_{t}^{2}, with 0 ≤ d ≤ 1. It is fairly easy to show that ω > 0, β_{1} − d ≤ ϕ_{1} ≤ (2 − d)∕3 and d[ϕ_{1} − (1 − d)∕2] ≤ β_{1}(ϕ_{1} − β_{1} + d) are sufficient to ensure that the conditional variance of the FIGARCH(1,d,1) is positive almost surely for all t. Setting ϕ_{1} = 0 gives the condition for the FIGARCH(1,d,0). See Section 3.9 on how to impose these constraints.
When estimating a FIGARCH(1,d,1) model by QML on the NASDAQ dataset (see the next output, labelled Box 11), one sees that d is significantly different from 0 and 1 while ϕ_{1} is barely significant. Importantly, comparing its log-likelihood with that of the simple GARCH(1,1), i.e. −5359.179 vs. −5370.858, justifies the use of a long-memory process in the conditional variance.
Davidson (2004) notes the interesting and counterintuitive fact that the memory parameter of this process is 1 − d, and is increasing as d approaches zero, while in the ARFIMA model the memory increases when ζ increases. According to Davidson (2004), the unexpected behavior of the FIGARCH model may be due less to any inherent paradoxes than to the fact that, embodying restrictions appropriate to a model in levels, it has been transplanted into a model of volatility. The main characteristic of this model is that it is not stationary when d > 0. Indeed,
1 − (1 − L)^{d} = ∑_{k=1}^{∞}c_{k}(d)L^{k},

where c_{1}(d) = d, c_{2}(d) = d(1 − d)∕2, etc. By construction, ∑_{k=1}^{∞}c_{k}(d) = 1 for any value of d, and consequently, the FIGARCH belongs to the same “knife-edge-nonstationary” class represented by the IGARCH model. To test whether this nonstationarity feature holds, Davidson (2004) proposes a generalized version of the FIGARCH and calls it the HYperbolic GARCH. The HYGARCH is given by Equation (4.15), when λ(L) is replaced by 1 − [1 − β(L)]^{−1}ϕ(L){1 + α[(1 − L)^{d} − 1]}. Note that G@RCH reports log(α) and not α. The c_{k}(d) coefficients are thus weighted by α. Interestingly, the HYGARCH nests the FIGARCH when α = 1 (or equivalently when log(α) = 0) and the process is stationary when α < 1 (or equivalently when log(α) < 0), in which case the GARCH component observes the usual covariance stationarity restrictions (see Davidson, 2004 for more details).

Note that when estimating the HYGARCH model on the NASDAQ dataset (see the next output, labelled Box 12), one cannot reject the FIGARCH specification in favour of the HYGARCH, since log(α̂) = 0.006962 with a robust standard error of 0.029515.
Chung (1999) underscores some additional drawbacks in the BBM model: there is a structural problem in the BBM specification since the parallel with the ARFIMA framework of the conditional mean equation is not perfect, leading to difficult interpretations of the estimated parameters. Indeed the fractional differencing operator applies to the constant term in the mean equation (ARFIMA) while it does not in the variance equation (FIGARCH). Chung (1999) proposes a slightly different process:
ϕ(L)(1 − L)^{d}(ε_{t}^{2} − σ^{2}) = [1 − β(L)](ε_{t}^{2} − σ_{t}^{2})   (4.17)
where σ^{2} is the unconditional variance of ε_{t} .
Applying variance targeting to this model implies replacing σ^{2} by its sample counterpart.
If we keep the same definition of λ(L) as in Equation (4.15), we can formulate the conditional variance as:

σ_{t}^{2} = σ^{2} + {1 − [1 − β(L)]^{−1}ϕ(L)(1 − L)^{d}}(ε_{t}^{2} − σ^{2})
or
σ_{t}^{2} = σ^{2} + λ(L)(ε_{t}^{2} − σ^{2})   (4.18)
Chung (1999) shows that σ^{2} > 0 and 0 ≤ ϕ_{1} ≤ β_{1} ≤ d ≤ 1 is sufficient to ensure the positivity of Equation (4.18) when p = q = 1.^{11}
λ(L) is an infinite summation which, in practice, has to be truncated. BBM propose to truncate λ(L) at 1000 lags (this truncation order has been implemented as the default value in our package, but it may be changed by the user) and to replace the unobserved ε_{t}^{2}’s by the empirical counterpart of E(ε_{t}^{2}), i.e. 1∕T ∑_{t=1}^{T}ε̂_{t}^{2}. Contrary to BBM, Chung (1999) proposes to truncate λ(L) at the size of the information set (T − 1) and to initialize the unobserved ε_{t}^{2} − σ^{2} at 0 (this quantity is small in absolute value and has a zero mean).^{12}
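The truncation BBM describe can be sketched for the FIGARCH(1,d,0) case (ϕ_{1} = 0, so the algebra stays short). The weights are obtained by convolving the expansions of (1 − β_{1}L)^{−1} and (1 − L)^{d}; the parameter values below are illustrative and chosen to satisfy the positivity conditions:

```python
def figarch_lambda_weights(d, beta1, trunc=1000):
    # ARCH(infinity) weights lambda_i of a FIGARCH(1,d,0), i.e. of
    #   lambda(L) = 1 - (1 - beta1*L)^(-1) * (1 - L)^d,
    # truncated at `trunc` lags (the default truncation used by BBM).
    # Expansion of (1 - L)^d: pi_0 = 1, pi_k = pi_{k-1} * (k - 1 - d) / k.
    pi = [1.0]
    for k in range(1, trunc + 1):
        pi.append(pi[-1] * (k - 1.0 - d) / k)
    bpow = [1.0]                       # powers of beta1, i.e. (1-beta1*L)^(-1)
    for _ in range(trunc):
        bpow.append(bpow[-1] * beta1)
    lam = []
    for k in range(1, trunc + 1):
        # minus the coefficient of L^k in (1 - beta1*L)^(-1) * (1 - L)^d
        lam.append(-sum(bpow[j] * pi[k - j] for j in range(k + 1)))
    return lam

lam = figarch_lambda_weights(d=0.45, beta1=0.3)
print(all(w >= 0.0 for w in lam), sum(lam))  # nonnegative weights, sum below 1
```

The full (untruncated) sum of the weights equals 1, since (1 − L)^{d} vanishes at L = 1; the truncated sum falls slightly short because the tail decays only hyperbolically, the long-memory signature of the model.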
Recently, Conrad and Haag (2006) and Conrad (2007) have derived nonnegativity conditions for FIGARCH and HYGARCH models (respectively). Thanks to Christian Conrad, these conditions are implemented in the following examples: Stat_Constr_FIGARCH_Conrad_Haag_Check.ox and Stat_Constr_HYGARCH_Conrad_Check.ox for a FIGARCH and HYGARCH(1,d,1) respectively. While the conditions are tested on the estimated parameters in these two example files, they are imposed during the estimation in the next two, i.e. Stat_Constr_FIGARCH_Conrad_Haag_Impose.ox and Stat_Constr_HYGARCH_Conrad_Impose.ox.
The idea of fractional integration has been extended to other GARCH types of models, including the Fractionally Integrated EGARCH (FIEGARCH) of Bollerslev and Mikkelsen (1996) and the Fractionally Integrated APARCH (FIAPARCH) of Tse (1998).^{13}
Similarly to the GARCH(p,q) process, the EGARCH(p,q) of Equation (4.3) can be extended to account for long memory by factorizing the autoregressive polynomial [1 − β(L)] = ϕ(L)(1 − L)^{d}, where all the roots of ϕ(z) = 0 lie outside the unit circle. The FIEGARCH(p,d,q) is specified as follows:
log(σ_{t}^{2}) = ω + ϕ(L)^{−1}(1 − L)^{−d}[1 + α(L)]g(z_{t−1})   (4.19)
Finally, the FIAPARCH (p,d,q) model can be written as:^{14}
σ_{t}^{δ} = ω[1 − β(L)]^{−1} + {1 − [1 − β(L)]^{−1}ϕ(L)(1 − L)^{d}}(|ε_{t}| − γε_{t})^{δ}   (4.20)
To illustrate, the next outputs, labelled Box 13 and 14, report the estimation results of a FIAPARCH(1,d,1) and a FIEGARCH(1,d,1), respectively. Once again, long memory is detected in the conditional variance equation.
As for the simple ARCH(q) model, it is rather easy to obtain h-step-ahead forecasts of these more complicated models. In the simple GARCH(p,q) case, the optimal h-step-ahead forecast of the conditional variance, i.e. σ̂_{t+h|t}^{2}, is given by:
σ_{t+h|t}^{2} = ω + ∑_{i=1}^{q}α_{i}ε_{t+h−i|t}^{2} + ∑_{j=1}^{p}β_{j}σ_{t+h−j|t}^{2}   (4.21)
where ε_{t+i|t}^{2} = σ_{t+i|t}^{2} for i > 0, while ε_{t+i|t}^{2} = ε_{t+i}^{2} and σ_{t+i|t}^{2} = σ_{t+i}^{2} for i ≤ 0. Equation (4.21) is usually computed recursively, even if a closed-form solution of σ_{t+h|t}^{2} can be obtained by recursive substitution in Equation (4.21).
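For p = q = 1 the recursion has a simple closed form, since the forecasts revert geometrically to the unconditional variance; a sketch comparing the two (parameter values illustrative):

```python
def garch11_forecast(omega, alpha1, beta1, eps2_t, sigma2_t, h):
    # Recursive h-step-ahead forecast from Equation (4.21) with p = q = 1:
    # one step ahead uses the observed eps_t^2; further steps replace
    # eps_{t+i}^2 by its own forecast sigma_{t+i|t}^2.
    f = omega + alpha1 * eps2_t + beta1 * sigma2_t  # sigma_{t+1|t}^2
    for _ in range(h - 1):
        f = omega + (alpha1 + beta1) * f
    return f

omega, a1, b1 = 0.05, 0.08, 0.90
uncond = omega / (1.0 - a1 - b1)    # unconditional variance
f1 = garch11_forecast(omega, a1, b1, eps2_t=2.0, sigma2_t=1.5, h=1)
f10 = garch11_forecast(omega, a1, b1, eps2_t=2.0, sigma2_t=1.5, h=10)
# Closed form: geometric reversion at rate (a1 + b1) towards uncond.
closed10 = uncond + (a1 + b1) ** 9 * (f1 - uncond)
print(f10, closed10)  # identical up to rounding
```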
Similarly, one can easily obtain the h-step-ahead forecast of the conditional variance of an ARCH, IGARCH and FIGARCH model. By contrast, for threshold models, the computation of out-of-sample forecasts is more complicated. Indeed, for EGARCH, GJR and APARCH models (as well as for their long-memory counterparts), the assumption made on the innovation process may have an effect on the forecast (especially for h > 1).
For instance, for the GJR (p,q) model, we have
σ_{t+h|t}^{2} = ω + ∑_{i=1}^{q}(α_{i} + γ_{i}S_{t+h−i|t}^{−})ε_{t+h−i|t}^{2} + ∑_{j=1}^{p}β_{j}σ_{t+h−j|t}^{2}   (4.22)
When γ_{i} = 0 for all i, we obtain the forecast of the GARCH model. Otherwise, S_{t+h−i|t}^{−} has to be computed. Note first that S_{t+i|t}^{−} = S_{t+i}^{−} for i ≤ 0. However, when i > 0, S_{t+i|t}^{−} depends on the choice of the distribution of z_{t}. When the distribution of z_{t} is symmetric around 0 (for the Gaussian, Student and GED densities), the probability that ε_{t+i} is negative is S_{t+i|t}^{−} = 0.5. If z_{t} is (standardized) skewed-Student distributed with asymmetry parameter ξ and degree of freedom υ, then S_{t+i|t}^{−} = 1∕(1 + ξ^{2}), since ξ^{2} is the ratio of probability masses above and below the mode.
For the APARCH(p,q) model,

σ_{t+h|t}^{δ} = ω + ∑_{i=1}^{q}α_{i}E[(|ε_{t+h−i}| − γ_{i}ε_{t+h−i})^{δ}|Ω_{t}] + ∑_{j=1}^{p}β_{j}σ_{t+h−j|t}^{δ},

where E[(|ε_{t+k}| − γ_{i}ε_{t+k})^{δ}|Ω_{t}] = κ_{i}σ_{t+k|t}^{δ} for k > 1 and κ_{i} = E(|z| − γ_{i}z)^{δ} (see Section 3.6.2).

For the EGARCH(p,q) model, the same recursion is applied to Equation (4.3), where ĝ(z_{t+k|t}) = ĝ(z_{t+k}) for k ≤ 0 and 0 for k > 0. Finally, the h-step-ahead forecasts of the FIAPARCH and FIEGARCH models are obtained in a similar way.
As explained in Section 3.9, it is possible to constrain the parameters to range between a lower and an upper bound by selecting the option MaxBFGS - Bounded Parameters in the Estimate window.
If the user selects the option Select (Individual or Matrix Form) in the Starting Values window, the program automatically opens a new box dialog before launching the estimation. Analogous to the procedure for the starting values, the user can change the bounds manually.
In the following example, an APARCH specification has been used. By default, we propose to impose the constraints −1 < γ_{i} < 1, ∀i = 1,…,q and 0 < δ < 3.
Finally, if the option MaxSA - Unbounded Parameters is selected in the Model Settings window, a new box dialog appears before the estimation.^{15} In this box dialog, the user may change the default values of the parameters used in the simulated annealing algorithm.
Simulated annealing is a global optimization method that potentially distinguishes between different local optima. Starting from an initial point, the algorithm takes a step and the function is evaluated. When minimizing a function, any downhill step is accepted and the process repeats from this new point. An uphill step may also be accepted; thus, the algorithm can escape from local optima. This uphill decision is made by the Metropolis criterion. As the optimization process proceeds, the length of the steps declines and the algorithm closes in on the global optimum. Since the algorithm makes very few assumptions about the function to be optimized, it is quite robust with respect to non-quadratic surfaces, and the degree of robustness can be adjusted by the user. In fact, simulated annealing can be used as a local optimizer for difficult functions. The implementation of simulated annealing used here is the one of Goffe, Ferrier, and Rogers (1994). Briefly, it is found to be competitive, if not superior, to multiple restarts of conventional optimization routines for difficult optimization problems. The price to pay is that it is slower than the previous two techniques. Comprehensive help is available on the MaxSA web page: www.tinbergen.nl/~cbos/software/maxsa.html.
McCullough and Vinod (1999) and Brooks, Burke, and Persand (2001) use the daily German mark/British pound exchange rate data of Bollerslev and Ghysels (1996) to compare the accuracy of GARCH model estimation among several econometric software packages. They choose the GARCH(1,1) model described in Fiorentini, Calzolari, and Panattoni (1996) (hereafter denoted FCP) as the benchmark. In this section, we use the same methodology with the same dataset to check the accuracy of our procedures. Coefficient and standard-error estimates of G@RCH 4.2 are reported in Table 4.1, together with the results of McCullough and Vinod (1999) (FCP in the table).

G@RCH gives very satisfactory results since the first four digits (at least) are the same as those of the benchmark for all but two estimations.
Moreover, to investigate the accuracy of our forecasting procedures, we have run 8-step-ahead forecasts of the model, similar to Brooks, Burke, and Persand (2001). Table 4 in Brooks, Burke, and Persand (2001) reports the conditional variance forecasts given by six well-known software packages and the correct values. G@RCH hits the benchmarks for all steps to the third decimal.
Finally, Lombardi and Gallo (2001) extend the work of Fiorentini, Calzolari, and Panattoni (1996) to the FIGARCH model of Baillie, Bollerslev, and Mikkelsen (1996) and develop the analytic Hessian matrices of this long-memory process. For the same DEM/UKP database as in the previous example, Table 4.2 reports the coefficient estimates and their standard errors for our package (using numerical gradients and the BFGS optimization method) and for Lombardi and Gallo (2001) (using analytical gradients and the Newton-Raphson algorithm).

Results show that G@RCH provides excellent numerical estimates that are quite close to the analytical ones, even for an advanced model such as the FIGARCH.^{16}
G@RCH 6.1 allows the simulation of four models, namely the GARCH, GJR, APARCH and EGARCH models. An ARMA specification is allowed in the conditional mean and four different distributions are available for the error term, i.e. a normal, Student, GED or skewed-Student.
Simulation can be done either in Ox or through the rolling menus. Simul_Garch.ox is an example of Ox code using the Garch class. Alternatively, to simulate a GARCH model using the rolling menus, click on G@RCH in the Modules group in the workspace window on the left-hand side of OxMetrics. Then change the Category to Monte Carlo and the Model Class to Simulation of GARCH Models and click on Formulate.
Then, select the model of interest and change the default values of the corresponding parameters (next window). The simulated series can be stored in a separate dataset and are plotted if requested.

G@RCH is primarily a menu-driven module for OxMetrics. However, there are two additional ways of estimating GARCH-type models with G@RCH. The first makes use of the batch facilities of OxMetrics. In the second, an Ox program is written that imports the Garch class code into the program.
This chapter is devoted to the illustration of these two methods. No detailed analysis of the results will be undertaken as we concentrate on the mechanics of the procedures.
A useful feature in G@RCH 6.1 is the possibility to use the algebra and batch languages in OxMetrics (see the OxMetrics handbook for more details). The batch language gives some control over OxMetrics through a command language. This allows for automating repetitive tasks, or as a quick way to get to a certain point in your analysis. The syntax of these commands is described below.
There are five ways of running Batch commands.
The most intuitive way is probably to estimate a model first and then click on Tools/Batch Editor... in OxMetrics (or Alt+b). Then, a new box dialog appears with the batch code corresponding to this model. To illustrate, the next picture shows part of the batch code corresponding to a simple (without explanatory variable) ARMA(1,0)-GARCH(1,1) model.
To estimate a slightly different model, one has to change the batch code accordingly. Here is an example batch code used to estimate an ARMA(1,1)-GJR(1,1) model using the same database (note that only the commands ARMA_ORDERS() and MODEL() have been modified).
Here is a list of G@RCH-specific commands (see Section 10.3, G@RCH Members Functions, for more details on these options).
Here is another example batch code.
Feel free to contact us if you need more flexibility for the batch mode.
This section explains how to use the Garch class in Ox. We assume that you have installed Ox (version 5 or later). For more details about Ox, see Jurgen Doornik’s web site (http://www.doornik.com).
G@RCH is built upon the concept of object-oriented programming. For non-specialists, object-oriented programming might sound rather daunting at first, but it is in fact quite easy to get the whole picture.
A major component of object-oriented programming is the “class”. Several useful classes are supplied with Ox, such as the Database, Modelbase and Simulation classes. Garch is an additional class that helps in the estimation of GARCH-type models. Section 10.1 provides additional details on this concept.
GarchEstim.ox is an example of Ox code using the Garch class. As file editor, we strongly recommend using OxMetrics or OxEdit.^{1}
The GarchEstim.ox file is displayed in the following Box.
Let us study this file in more detail. The #import statement indicates that this file is linked to the Garch.oxo and Garch.h files. In the body of the file (after the main() instruction), a new Garch object is first created and a database is loaded. The user has to enter the correct path of the database, but also has to pay attention to the structure of the database to be used. For instance, to use a Microsoft Excel file, the format of the spreadsheet is of crucial importance. The following convention has to be adopted when loading an Excel spreadsheet: variables are in columns, columns with variables are labelled, there is an unlabelled column containing the dates (with the form Year-Period) and the data form a contiguous sample. Here is an example:^{2}

OxMetrics also supports databases having a date variable which provides the dates for daily data (or possibly times for higher frequencies). Then selections are made by date, and the dates shown in the graphics. Dates can be read from and saved to Excel files.
We note then that the dependent variable (Y), the regressor(s) in the mean equation (X) and the regressor(s) in the variance equation (Z) are specified with the Select function. Ox being case-sensitive, the exact name of the variable has to be entered. The second and third arguments denote the starting and ending observations to be considered. By default, “0” and “0” mean that all the observations are selected. From this selection, a sample can be extracted with the SetSelSample function. The arguments are ordered as (StartYear, StartPeriod, EndYear, EndPeriod) and the default (−1, 1, −1, 1) means all the selected observations.^{3}
The GarchEstim.ox file is made up of six parts:^{4}
If you want to use the “Console Version” of G@RCH 6.1, all you need is an Ox executable. To illustrate the various ways to run Ox code, we keep our GarchEstim.ox example.
First, you can type
oxl GarchEstim
at the command prompt. This method is mainly for users of Ox Console.
Updating the environment
Skip this section if you managed to run the Ox programs in this booklet. Otherwise, under Windows and Unix you may wish to set the PATH environment variable.
The executable (oxl.exe etc.) is in the ox\bin folder, for example by default it is in:
So, update your PATH variable if necessary.^{5}
Without these, you can still run GarchEstim.ox, but more typing is needed:
The double quotes are required because of the space in the file name.
Alternatively, you may use OxEdit.^{6} Ox tools should be installed automatically. If not, run ox\bin\oxedit\oxcons.tool. To run the program, use the Modules/Ox (for Ox Console) or Modules/OxRun (OxMetrics), as shown below.
When using OxEdit, OxMetrics is needed to display the graphics but is not mandatory for running the estimation. Alternatively, graphs can be displayed on the screen using GnuDraw and GnuPlot. Indeed, G@RCH 6.1 can be combined with GnuDraw, an Ox package for creating GnuPlot^{7} graphs, developed by Charles Bos.^{8} The interface is completely based on the OxDraw package (even the documentation, gnudraw.html, uses the same structure as Jurgen Doornik’s oxdraw.html).^{9} Usage of GnuDraw is intended to be simple, as the syntax is similar to the original Ox drawing routines. You just have to add the lines
#include <oxstd.h>
#include <packages/gnudraw/gnudraw.h>
before
#import <packages/Garch6/Garch>.
As GnuPlot can be called automatically from within Ox, ShowDrawWindow can be used, displaying graphs on screen. See Cribari-Neto and Zarkos (2003) for a comprehensive overview of the GnuDraw package.
A third possibility is to launch Ox programs directly from OxMetrics, as shown below.^{10} There are different ways for running an Ox code under OxMetrics. The first way is to use OxRun.
A new feature of G@RCH 6.1 (and OxMetrics 5 in general) is that Ox code can be generated.
The Model/Ox Batch Code command (or Alt+O) activates a new dialog box called ‘Generate Ox Code’ that allows the user to select an item for which to generate Ox code.
The code is then opened in a new window and, provided Ox Professional is available, it can be run either from OxMetrics, from OxEdit or from the command line. This option is also available for the ‘Descriptive Statistics’ and ‘Simulation’ modules.
OxBatch_1.ox is an example of Ox Batch code generated by G@RCH 6.1 after the estimation of an APARCH model while OxBatch_2.ox and OxBatch_3.ox are two examples of Ox Batch code generated after the use of the ‘Descriptive Statistics’ and ‘Simulation’ modules.
To illustrate the potential of writing Ox code based on G@RCH and its Garch class, we also provide Forecast.ox, an example that estimates an ARMA(1,0)-APARCH(1,1) model on the CAC40. In this program, we analyze the French CAC40 stock index for the years 1995-1999 (1249 daily observations). Daily returns in percentage are defined as 100 times the first difference of the log of the closing prices.
After the estimation of the ARMA(1,0)-APARCH(1,1) model with a skewed-Student likelihood on the first 800 observations, 448 one-step-ahead forecasts of the conditional mean and conditional variance are computed.
Forecasting Performance
One of the most popular measures to check the forecasting performance of ARCH-type models is the Mincer-Zarnowitz regression, i.e. the ex-post volatility regression:
σ̃_{t}^{2} = a_{0} + a_{1}σ̂_{t}^{2} + u_{t}   (5.1)
where σ̃_{t}^{2} is the ex-post volatility, σ̂_{t}^{2} is the forecasted volatility and a_{0} and a_{1} are parameters to be estimated. If the model for the conditional variance is correctly specified (and the parameters are known) and if E(σ̃_{t}^{2}) = σ_{t}^{2}, we have a_{0} = 0 and a_{1} = 1. The R^{2} of this regression is often used as a simple measure of the degree of predictability of the ARCH-type model.
But σ_{t}^{2} is never observed. It is thus common to use σ̃_{t}^{2} = (y_{t} − ȳ)^{2}, where ȳ is the sample mean of y_{t}. The R^{2} of this regression is often lower than 5% and this could lead to the conclusion that GARCH models produce poor forecasts of the volatility (see, among others, Schwert, 1990, or Jorion, 1996). But, as described in Andersen and Bollerslev (1998a), the reason for these poor results is the choice of what is considered the “true” volatility. Instead, they propose to compute the daily realized volatility as the sum of squared intraday returns and use it as the “true” volatility. Actually, Andersen and Bollerslev (1998a) show that this measure is a more proper one than squared daily returns. Therefore, using h-minute returns (5-minute, for instance), the realized volatility can be expressed as:
σ̃_{t}^{2} = ∑_{k=1}^{K}y_{k,t}^{2}   (5.2)
where y_{k,t} is the return of the k^{th} h-minute interval of the t^{th} day and K is the number of h-minute intervals per day. See Section 7.3 for more details about realized volatility.
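Both ingredients, the realized-volatility proxy of Equation (5.2) and the Mincer-Zarnowitz OLS regression, are easy to sketch on simulated data (plain Python, purely illustrative data-generating process):

```python
import random

def realized_volatility(intraday_returns):
    # Daily realized volatility: sum of squared intraday returns, Eq. (5.2).
    return sum(r * r for r in intraday_returns)

def mincer_zarnowitz(actual, forecast):
    # OLS of actual on a constant and the forecast: actual = a0 + a1*forecast.
    n = len(actual)
    mx, my = sum(forecast) / n, sum(actual) / n
    sxx = sum((x - mx) ** 2 for x in forecast)
    sxy = sum((x - mx) * (y - my) for x, y in zip(forecast, actual))
    a1 = sxy / sxx
    return my - a1 * mx, a1

# If the forecast equals the true conditional variance and the proxy is
# unbiased (here: a squared-return proxy), a0 should be near 0, a1 near 1.
rng = random.Random(42)
true_var = [0.5 + 2.0 * rng.random() for _ in range(4000)]
proxy = [v * rng.gauss(0.0, 1.0) ** 2 for v in true_var]
a0, a1 = mincer_zarnowitz(proxy, true_var)
print(a0, a1)  # noisy, but close to (0, 1)
```

The noisiness of the squared-return proxy is exactly the point of Andersen and Bollerslev (1998a): with a better proxy (realized volatility), the regression R^{2} improves markedly.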
Coming back to our illustration, the CAC40 index is computed by the exchange as a weighted measure of the prices of its components and was originally available in our database on an intraday basis with the price index being computed every 15 minutes. For the time period under review, the opening hours of the French stock market were 10.00 am to 5.00 pm, thus 7 hours of trading per day. This translates into 28 intraday returns used to compute the daily realized volatility.^{11} Because the exchange is closed from 5.00 pm to 10.00 am the next day, the first intraday return is the first difference between the log price at 10.15 am and the log price at 5.00 pm the day before. Then, the intraday data are used to compute the daily realized volatility as the sum of the 28 squared intraday returns as shown in Equation (5.2).
Finally, to compare the adequacy of the different distributions in the selected forecasting model, a well-known tool is the density forecast test developed in Diebold, Gunther, and Tay (1998). The idea of density forecasts is quite simple.^{12} Let f_{i}(y_{i}|Ω_{i}), i = 1,…,m, be a sequence of m one-step-ahead density forecasts produced by a given model, where Ω_{i} is the conditioning information set, and let p_{i}(y_{i}|Ω_{i}), i = 1,…,m, be the sequence of densities defining the Data Generating Process of y_{i} (which is never observed). Testing whether the forecast density is a good approximation of the true density is equivalent to testing:
 H_{0} : f_{i}(y_{i}|Ω_{i}) = p_{i}(y_{i}|Ω_{i}), i = 1,…,m  (5.3) 
Diebold, Gunther, and Tay (1998) use the fact that, under Equation (5.3), the probability integral transform ẑ_{i} = ∫_{−∞}^{y_{i}}f_{i}(t)dt is i.i.d. U(0,1), i.e. independent and identically distributed uniform. To check H_{0}, they propose to use both a goodness-of-fit test and an independence test for i.i.d. U(0,1). The i.i.d. property of ẑ_{i} can be evaluated by plotting the correlograms of (ẑ − z̄)^{j}, for j = 1,2,3,4,…, to detect potential dependence in the conditional mean, variance, skewness, kurtosis, etc. Departure from uniformity can also be evaluated by plotting a histogram of ẑ_{i}. According to Bauwens, Giot, Grammig, and Veredas (2000), a humped shape of the histogram would indicate that the issued forecasts are too narrow and that the tails of the true density are not accounted for. On the other hand, a U-shape of the histogram would suggest that the model issues forecasts that either under- or overestimate too frequently. Moreover, Lambert and Laurent (2001) show that an inverted-S shape of the histogram would indicate that the errors are skewed, i.e. the true density is probably not symmetric.^{13}
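To make the idea concrete, here is a small Python sketch (ours, not part of G@RCH) that computes the probability integral transform under a normal one-step-ahead density forecast and checks that its histogram is roughly flat when the forecast density is correct.

```python
import math
import numpy as np

def pit_normal(y, mu, sigma):
    """Probability integral transform z_i = F(y_i) under a N(mu, sigma^2)
    one-step-ahead density forecast; i.i.d. U(0,1) under H0."""
    z = (np.asarray(y) - mu) / sigma
    return np.array([0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in z])

# When the forecast density is correct, the histogram of the PIT is flat:
# every cell frequency should be close to 1 / n_cells.
rng = np.random.default_rng(2)
y = rng.normal(0.0, 2.0, 20_000)
z = pit_normal(y, 0.0, 2.0)
freq, _ = np.histogram(z, bins=20, range=(0.0, 1.0))
freq = freq / z.size
```

A misspecified forecast (e.g. too small a sigma) would instead produce the U-shaped histogram described above.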
An illustration is provided in Section 5 with some formal tests and graphical tools.
Results
The Mincer-Zarnowitz regression and some out-of-sample density forecast tests (as suggested by Diebold, Gunther, and Tay, 1998) are also performed.
The program Forecast.ox is printed in the next box. This code has been used to produce Figure 5.1 and the output labelled “Density Forecast Test on Standardized Forecast Errors”. In the first four panels of Figure 5.1, we show the correlograms of (ẑ − z̄)^{j}, for j = 1,2,3,4, where ẑ is the probability integral transform for the out-of-sample period (see the paragraph below Equation (5.3) for more details). This graphical tool has been proposed by Diebold, Gunther, and Tay (1998) to detect potential remaining dependence in the conditional mean, variance, skewness and kurtosis. In our example, it seems that the probability integral transform is independently distributed.
Panel 5 of Figure 5.1 also shows the histogram (with 30 cells) of ẑ together with the 95% confidence bands. From this figure, it is clear that the ARMA(1,0)-APARCH(1,1) model coupled with a skewed-Student distribution for the innovations performs very well on the dataset we have investigated. This conclusion is reinforced by the results reported in the Density Forecast Test on Standardized Forecast Errors box. The adjusted Pearson Chi-square goodness-of-fit test provides a statistical version of the graphical test presented in Figure 5.1. In addition, the program also performs the Mincer-Zarnowitz regression (see Equation (5.1)) that regresses the observed volatility (in our case the realized volatility) on a constant and a vector of 448 one-step-ahead forecasts of the conditional variance (produced by the APARCH model).^{14} These results suggest that the APARCH model gives good forecasts of the conditional variance. Based on the significance of the slope parameter of the regression, one can indeed hardly conclude that the APARCH model provides biased forecasts. Moreover, the R^{2} of this regression is about 33%, versus 0.14 when we use the daily squared returns (in deviation) as a proxy of the observed volatility.^{15}
As mentioned in Section 3.9, numerical optimization is used to maximize the log-likelihood function with respect to the vector of parameters Ψ over the inspected range of the parameter space. Sometimes these parameters might have to be constrained to a smaller interval.
We have shown how to impose these lower and upper bounds and how to change these bounds in the OxPack version of G@RCH (see Section 4.9).
The same feature is available for the Console version. Furthermore, the Console version allows the user to impose any nonlinear constraint on the parameters (like the stationarity constraints).
The example StartValues_Bounds.ox shows how to print the code needed to change the starting values and the bounds for an ARMA(0,0)-GARCH(1,1) model with a Student likelihood.
Let us focus now on the following lines:
The function BOUNDS(1) selects the constrained optimizer. PrintBounds(2) prints the code needed to change the bounds while PrintStartValues(3) prints the code needed to change the starting values. The function exit(0) stops the program before starting the estimation (since the starting values and the bounds have to be modified).
The output labelled First Step: Output indicates that the default starting value of the degrees of freedom of the Student density (m_cV) is 6 and that m_calpha0 (i.e. ω, the constant of the GARCH model) is bounded between 0 and 100.
To change the starting value of m_cV to, say, 5 and to fix the bounds of m_calpha0 to [−100;100] (instead of [0;100]), just copy and paste the lines
at the top of the function StartValues(const object) and change the values as shown below:^{16}
Furthermore, nonlinear constraints (like the stationarity constraint of the GARCH(1,1) model, i.e. α_{1} + β_{1} < 1) can be imposed during the estimation in the Console version. The example Stat_Constr_GARCH shows how to impose this restriction.
In Stat_Constr_GARCH, a new object GARCH_C is created and inherits all the members
and functions of the GARCH class. The option Bounds(1) is selected while the virtual
function cfunc_gt0 is replaced by the function that imposes the corresponding
restrictions:
In this case the nonlinear constraint α_{1} + β_{1} < 1 is evaluated by the cfunc_gt0 function.^{17}
Indeed, the cfunc_gt0 argument can be zero, or a function evaluating the nonlinear constraints (which will be constrained to be positive) with the following format:
In Stat_Constr_FIGARCH.ox, the stationarity constraints of the FIGARCH(1,d,0) and FIGARCH(1,d,1) models are imposed during the estimation. Interestingly, imposing the constraint was found to speed up the estimation by about 25% (while the estimates are exactly the same without imposing it).
OxGauss is a program available with recent versions of Ox. It provides a way to run Gauss^{18} programs in the Ox environment or to call an existing Gauss procedure from Ox.
Depending on the goal of the analysis and the user’s experience, both features are noteworthy and useful. From an Ox/G@RCH user’s point of view, the main objective of OxGauss is to allow existing Gauss programs to be called from Ox with only a minimum number of changes to these programs. This is beneficial to both Ox and Gauss users. It provides more visibility to both and hence increases the potential use of the underlying statistical technique. Furthermore, it can help with the migration from Gauss to Ox.
Running pure Gauss code with OxGauss is attractive for non-Gauss and potentially even for non-Ox users because it allows the replication of published work using the console version of Ox. This is an interesting feature since the replicability of simulation and empirical results in econometrics is recognized as being an important aspect of research. An increasing number of researchers in econometrics are making their programs and routines freely available to the econometrics community. As such, OxGauss adds much value in that it provides the researcher with a free and rather simple solution to run Gauss programs. See Laurent and Urbain (2003) for a review of OxGauss and the M@ximize web site at http://www.core.ucl.ac.be/~laurent/.^{19}
The main goal of this section is to illustrate the usefulness of OxGauss for G@RCH users.
The first use of OxGauss is to allow Gauss procedures to be called from Ox. This helps in the transition to Ox, and it increases the amount of code available to Ox users.
To illustrate how Gauss programs can be called from Ox, we consider a small project that
mixes both Gauss and Ox programs. The first file, Gaussprocs.src, consists of a code file
that features the procedure gengarch(omega,alpha,beta,nu,T_0,T,n), which simulates
a GARCH model. This procedure has been written by Dick van Dijk (see Franses and van
Dijk, 2000) and is downloadable from his web site
http://www.few.eur.nl/few/people/djvandijk/nltsmef/nltsmef.htm.
To call this procedure from Ox code, one first has to create a header file. This header file allows the declaration of the functions, constants and external variables so that these are known when required. This is also mandatory to avoid compilation errors in Ox, since functions and global variables have to be explicitly declared before their use. The header file corresponding to our example is Gaussprocs.h.
Additional procedures can be added in Gaussprocs.src, but the header file has to be modified accordingly.^{20} It is recommended to use the .src extension for the Gauss programs and .h for the header files.
In the example /packages/OxGauss/GarchEstim.ox, we use a Gauss procedure to generate 20,000 observations from a GARCH(1,1) process with Student-t errors. Then, we rely on G@RCH to estimate a GARCH(1,1) model by Gaussian quasi-maximum likelihood. To do this, the Gauss code must be imported into the Ox program, along with the G@RCH package. The #import command has been extended so that OxGauss imports are defined by prefixing the file name with gauss::.
Note that when OxGauss functions or variables are accessed, they must also be prefixed with the identifier gauss::.
When an OxGauss program is run, it automatically includes the ox\include\oxgauss.ox file. This by itself imports the required files:^{21}
These import statements ensure that ox\include\g2ox.h and oxgauss.h are being included. Most of the OxGauss runtime system is in ox\include\g2ox.ox while the keywords are largely in oxgauss.src.
Most of the programs that link Gauss functions to Ox are gathered in the file \include\g2ox.ox. For instance, the output of the Gauss function cumprodc(x) is an N ×K matrix with the cumulative products of the columns of the N ×K matrix x. The Ox code given below (copied from the file g2ox.ox) shows how OxGauss interprets this function.
As indicated in this example, OxGauss does not translate the Gauss code into Ox. Instead, it makes a link between the Gauss function (here cumprodc) and its Ox counterpart (cumprod). When the corresponding Ox function does not exist, Ox code is written between the brackets to compute what the original Gauss function computed. It is important to note that not all Gauss functions are supported by OxGauss. For instance, there is no equivalent of the Gauss function intgrat2 (for the computation of double integrals) in Ox 5. For this reason, the corresponding procedure in g2ox.ox just reports the error message intgrat2() unsupported (see below). However, if such a function becomes available in a future version of Ox, mapping intgrat2 to the corresponding function in Ox will be very easy.
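For readers more familiar with other matrix languages, the column-wise cumulative product computed by cumprodc() can be sketched in Python with NumPy; the mapping to np.cumprod is our analogy, not part of OxGauss.

```python
import numpy as np

# Gauss cumprodc(x) returns the column-wise cumulative products of an
# N x K matrix; the NumPy analogue is cumprod along axis 0, just as
# OxGauss maps the call to the Ox function cumprod().
x = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
cp = np.cumprod(x, axis=0)
# cp is [[1, 2], [3, 8], [15, 48]]
```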
An important aspect of OxGauss is that it supports most of the graphical features of the Gauss library pgraph. As for the standard functions (see Section 5.4.2), the file \oxgauss\src\pgraph.ox now makes the link between pgraph and the Ox graphical package (oxdraw). For instance, the Gauss function xy() is linked to its Ox counterpart DrawXMatrix().
Here is an example of a Gauss program (grGauss.prg) that draws a simple graph.
However, only Ox Professional for Windows supports on-screen graphics (through OxMetrics). By default, non-Windows versions of Ox and Ox Console have no support for graphs. Nevertheless, the user can rely on the Ox package GnuDraw, developed by Charles Bos, which allows the creation of GnuPlot (see http://www.gnuplot.info) graphics from Ox (see Section 5.2.2). Interestingly, GnuDraw allows the use of the console version for a quick check of the graphical output of Gauss code. Therefore, academic institutions do not have to license the full professional version of Ox if they only need to replicate Gauss programs.
It turns out that the VaR has a simple statistical definition: the VaR at level α for a sample of returns is defined as the corresponding empirical quantile at α%. Because of the definition of the quantile, we have that, with probability 1 − α, the returns will be larger than the VaR. In other words, with probability 1 − α, the losses will be smaller than the dollar amount given by the VaR.^{1} From an empirical point of view, the computation of the VaR for a collection of returns thus requires the computation of the empirical quantile at level α of the distribution of the returns of the portfolio.
In this section, we show how to use G@RCH to predict the VaR and test the adequacy of the selected model in forecasting the VaR of the investigated series. The material used in the current section is based on Giot and Laurent (2003) and shows that models relying on a symmetric density for the error term can underperform with respect to skewed-density models when the left and right tails of the distribution of returns must be modelled. Thus, the VaR for traders having both long and short positions is not adequately modelled using the usual normal or Student distributions. We suggest using an APARCH model based on the skewed-Student distribution to fully take into account the fat left and right tails of the returns distribution. This allows for an adequate modelling of large returns defined on long and short trading positions. The performance of the model is assessed on daily data of the NASDAQ (11/10/1984 to 21/12/2000).
To characterize the models, we consider a collection of daily returns (in %), y_{t} = 100 [log(p_{t}) − log(p_{t−1})], where t = 1,…,T and p_{t} is the price at time t. Because daily returns are known to exhibit some serial autocorrelation, we fit an AR(2) structure on the y_{t} series for all specifications. Accordingly, the conditional mean of y_{t}, i.e. μ_{t}, is equal to μ + ∑_{j=1}^{2}ψ_{j}(y_{t−j} − μ). We now consider several specifications for the conditional variance of ε_{t}.
In its most simple form, it can be shown that the basic RiskMetrics^{TM} model is equivalent to a normal Integrated GARCH (IGARCH) model where the autoregressive parameter is set at a prespecified value λ and the coefficient of ε_{t−1}^{2} is equal to 1 − λ. In the RiskMetrics^{TM} specification for daily data, λ is fixed at 0.94 and we then have:
 y_{t} = μ_{t} + ε_{t}, with ε_{t} = z_{t}σ_{t}  (6.1) 
where z_{t} is IID N(0,1) and σ_{t}^{2} is defined as in Equation (4.14).
The long side of the daily VaR is defined as the VaR level for traders having long positions in the relevant equity index: this is the “usual” VaR where traders incur losses when negative returns are observed. Correspondingly, the short side of the daily VaR is the VaR level for traders having short positions, i.e. traders who incur losses when stock prices increase.^{2}
How good a model is at predicting long VaR is thus related to its ability to model large negative returns, while its performance regarding the short side of the VaR is based on its ability to take into account large positive returns.
For the RiskMetrics^{TM} model, the one-step-ahead VaR computed in t − 1 for long trading positions is given by μ_{t} + z_{α}σ_{t}; for short trading positions it is equal to μ_{t} + z_{1−α}σ_{t}, with z_{α} being the left quantile at α% of the normal distribution and z_{1−α} the right quantile at α%.^{3}
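A rough Python sketch of the RiskMetrics^{TM} recursion and the resulting one-step-ahead long VaR is given below; for simplicity we assume a zero conditional mean, and the function name and simulated returns are hypothetical.

```python
import math
import numpy as np

def riskmetrics_var(returns, z_alpha, lam=0.94):
    """One-step-ahead RiskMetrics variance,
    sigma_t^2 = lam * sigma_{t-1}^2 + (1 - lam) * eps_{t-1}^2,
    and the long-position VaR z_alpha * sigma (zero conditional mean
    assumed here for simplicity; the short VaR would use z_{1-alpha})."""
    r = np.asarray(returns)
    s2 = r.var()                 # initialise with the sample variance
    for eps in r:
        s2 = lam * s2 + (1.0 - lam) * eps ** 2
    return z_alpha * math.sqrt(s2)

rng = np.random.default_rng(3)
ret = rng.normal(0.0, 1.0, 1000)
var_long = riskmetrics_var(ret, -1.645)   # z_0.05 of the standard normal
```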
The normal APARCH (Ding, Granger, and Engle, 1993) is an extension of the GARCH model of Bollerslev (1986). It is a very flexible ARCH-type model as it nests at least seven GARCH specifications (see Section 4.4). The APARCH(1,1) is:
 σ_{t}^{δ} = ω + α_{1}(|ε_{t−1}| − γ_{1}ε_{t−1})^{δ} + β_{1}σ_{t−1}^{δ}  (6.2) 
where ω, α_{1}, γ_{1}, β_{1} and δ are parameters to be estimated. For the normal APARCH model, the one-step-ahead VaR is computed as for the RiskMetrics^{TM} model, except for the conditional standard deviation σ_{t}, which is now given by Equation (6.2) (evaluated at its MLE).
Previous empirical studies on VaR have shown that models based on the normal distribution usually cannot fully take into account the ‘fat tails’ of the distribution of the returns. To alleviate this problem, the Student APARCH (or ST APARCH) model is introduced:
 y_{t} = μ_{t} + ε_{t}, with ε_{t} = z_{t}σ_{t}  (6.3) 
where z_{t} is IID t(0,1,υ) and σ_{t} is defined as in Equation (6.2).
For the Student APARCH model, the VaR for long and short positions is given by μ_{t} + st_{α,υ}σ_{t} and μ_{t} + st_{1−α,υ}σ_{t}, with st_{α,υ} being the left quantile at α% of the (standardized) Student distribution with (estimated) number of degrees of freedom υ, and st_{1−α,υ} the right quantile at α% of this same distribution. Note that because z_{α} = −z_{1−α} for the normal distribution and st_{α,υ} = −st_{1−α,υ} for the Student distribution, the forecasted long and short VaR will be equal in both cases.
To account for the excess skewness and kurtosis, Fernández and Steel (1998) propose to extend the Student distribution by adding a skewness parameter.^{4} Their procedure allows the introduction of skewness in any continuous unimodal and symmetric (about zero) distribution g(.) by changing the scale on each side of the mode. The main drawback of this density is that it is expressed in terms of the mode and the dispersion. In order to keep with the ARCH ‘tradition’, Lambert and Laurent (2001) re-expressed the skewed-Student density in terms of the mean and the variance, i.e. they re-parameterized this density in such a way that the innovation process has zero mean and unit variance. Otherwise, it would be difficult to separate the fluctuations in the mean and variance from the fluctuations in the shape of the conditional density (Hansen, 1994).
The innovation process z is said to be (standardized) skewed-Student distributed if:
 f(z_{t}|ξ,υ) = (2∕(ξ + 1∕ξ)) s g[ξ(s z_{t} + m)|υ]   if z_{t} < −m∕s
 f(z_{t}|ξ,υ) = (2∕(ξ + 1∕ξ)) s g[(s z_{t} + m)∕ξ|υ]  if z_{t} ≥ −m∕s  (6.4) 
where g(.|υ) is the symmetric (unit variance) Student density and ξ is the asymmetry coefficient.^{5} m and s^{2} are, respectively, the mean and the variance of the non-standardized skewed-Student density and are defined in Section 3.6.2.
Notice also that the density f(z_{t}|1∕ξ,υ) is the “mirror” of f(z_{t}|ξ,υ) with respect to the (zero) mean, i.e. f(z_{t}|1∕ξ,υ) = f(−z_{t}|ξ,υ). Therefore, the sign of log(ξ) indicates the direction of the skewness: the third moment is positive (negative), and the density is skewed to the right (left), if log(ξ) > 0 (< 0).
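The mirror property can be checked numerically. The sketch below implements the standardized skewed-Student density under the Lambert-Laurent parameterization; the expressions used for m and s are the standard ones (see Section 3.6.2 for the manual's own definitions), so this is an illustration rather than the G@RCH implementation.

```python
import math

def g_student(x, v):
    """Unit-variance Student density with v > 2 degrees of freedom."""
    c = math.gamma((v + 1) / 2) / (math.gamma(v / 2) * math.sqrt(math.pi * (v - 2)))
    return c * (1.0 + x * x / (v - 2)) ** (-(v + 1) / 2)

def skst_pdf(z, xi, v):
    """Standardized skewed-Student density; m and s are the mean and the
    standard deviation of the non-standardized skewed-Student."""
    m = (math.gamma((v - 1) / 2) * math.sqrt(v - 2)
         / (math.sqrt(math.pi) * math.gamma(v / 2)) * (xi - 1.0 / xi))
    s = math.sqrt(xi ** 2 + 1.0 / xi ** 2 - 1.0 - m * m)
    u = s * z + m
    scale = xi if u < 0 else 1.0 / xi   # xi^{-sign(u)} applied to u
    return 2.0 / (xi + 1.0 / xi) * s * g_student(u * scale, v)

# Mirror property: f(z | 1/xi, v) equals f(-z | xi, v).
lhs = skst_pdf(0.7, 1.0 / 1.5, 8)
rhs = skst_pdf(-0.7, 1.5, 8)
```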
Lambert and Laurent (2000) show that the quantile function skst_{α,υ,ξ}^{*} of a non-standardized skewed-Student density is:
 skst_{α,υ,ξ}^{*} = (1∕ξ) st_{α(1 + ξ^{2})∕2, υ}      if α < 1∕(1 + ξ^{2})
 skst_{α,υ,ξ}^{*} = −ξ st_{(1 − α)(1 + ξ^{−2})∕2, υ}  if α ≥ 1∕(1 + ξ^{2})  (6.5) 
where st_{α,υ} is the quantile function of the (unit variance) Student-t density. It is straightforward to obtain the quantile function of the standardized skewed-Student: skst_{α,υ,ξ} = (skst_{α,υ,ξ}^{*} − m)∕s.^{6}
For the skewed-Student APARCH model, the VaR for long and short positions is given by μ_{t} + skst_{α,υ,ξ}σ_{t} and μ_{t} + skst_{1−α,υ,ξ}σ_{t}, with skst_{α,υ,ξ} being the left quantile at α% of the skewed-Student distribution with υ degrees of freedom and asymmetry coefficient ξ; skst_{1−α,υ,ξ} is the corresponding right quantile. If log(ξ) is smaller than zero (or ξ < 1), |skst_{α,υ,ξ}| > |skst_{1−α,υ,ξ}| and the VaR for long trading positions will be larger (for the same conditional variance) than the VaR for short trading positions. When log(ξ) is positive, we have the opposite result.
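The quantile formula (6.5) and the standardization step can be sketched in Python. Since we avoid external libraries, the Student quantile is obtained here by crude numerical inversion of the unit-variance Student CDF; this is an illustration only, not the G@RCH implementation, and the expressions for m and s are the standard ones from the literature.

```python
import math

def g_student(x, v):
    """Unit-variance Student density with v > 2 degrees of freedom."""
    c = math.gamma((v + 1) / 2) / (math.gamma(v / 2) * math.sqrt(math.pi * (v - 2)))
    return c * (1.0 + x * x / (v - 2)) ** (-(v + 1) / 2)

def st_quantile(p, v, lo=-40.0, hi=40.0, n=60_000):
    """Quantile of the unit-variance Student density, found by crude
    trapezoid integration of the density (illustration only)."""
    h = (hi - lo) / n
    cdf, prev = 0.0, g_student(lo, v)
    for i in range(1, n + 1):
        x = lo + i * h
        cur = g_student(x, v)
        cdf += 0.5 * (prev + cur) * h
        if cdf >= p:
            return x
        prev = cur
    return hi

def skst_quantile(p, v, xi):
    """Standardized skewed-Student quantile: Eq. (6.5), then (skst* - m)/s."""
    m = (math.gamma((v - 1) / 2) * math.sqrt(v - 2)
         / (math.sqrt(math.pi) * math.gamma(v / 2)) * (xi - 1.0 / xi))
    s = math.sqrt(xi ** 2 + 1.0 / xi ** 2 - 1.0 - m * m)
    if p < 1.0 / (1.0 + xi ** 2):
        q = (1.0 / xi) * st_quantile(p * (1.0 + xi ** 2) / 2.0, v)
    else:
        q = -xi * st_quantile((1.0 - p) * (1.0 + xi ** -2) / 2.0, v)
    return (q - m) / s

# With xi = 1 the distribution is symmetric: long and short quantiles mirror.
q_long = skst_quantile(0.05, 8, 1.0)
q_short = skst_quantile(0.95, 8, 1.0)
```

With ξ < 1 the left quantile becomes larger in absolute value, as stated in the text.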
The first step of the application consists in estimating the various models. Then, the adequacy in forecasting the in-sample VaR will be tested. The code used to perform this illustration is given in the file VaR_insample.ox. The first model to be estimated is an ARMA(2,0)-RiskMetrics^{TM} model (with a normal density). The second, third and fourth are ARMA(2,0)-APARCH(1,1) models with, respectively, normal, Student and skewed-Student densities. The results are given in the four boxes below. Several comments are in order:
To summarize, these results indicate the need for a model featuring a negative leverage effect (conditional asymmetry) in the conditional variance combined with an asymmetric distribution for the underlying error term (unconditional asymmetry). The skewed-Student APARCH model delivers such a specification.
Does it mean that this model improves on symmetric GARCH models when the VaR for long and short returns is needed?
Figure 6.1 plots the in-sample VaR for the 5% and 95% quantiles for the AR(2)-APARCH-SKST model. Using the rolling menus, this graph is easily obtained with the Test.../Graphic Analysis/In-Sample VaR Forecasts command, choosing the option ‘Theoretical Quantiles’ with the following quantiles: 0.05; 0.95.
All models are tested with a VaR level α which ranges from 5% to 0.25% (quan = < 0.95, 0.975, 0.99, 0.995, 0.9975 >) and their performance is then assessed by computing the failure rate for the returns y_{t}. By definition, the failure rate is the proportion of observations for which returns exceed (in absolute value) the forecasted VaR. If the VaR model is correctly specified, the failure rate should be equal to the prespecified VaR level. In our empirical application, we define a failure rate f_{l} for the long trading positions, which is equal to the percentage of negative returns smaller than the one-step-ahead VaR for long positions. Correspondingly, we define f_{s}, the failure rate for short trading positions, as the percentage of positive returns larger than the one-step-ahead VaR for short positions.
Because the computation of the empirical failure rate defines a sequence of yes/no observations, it is possible to test H_{0} : f = α against H_{1} : f ≠ α, where f is the failure rate (estimated by f̂, the empirical failure rate). At the 5% level and if T yes/no observations are available, a confidence interval for f̂ is given by [f̂ − 1.96 √(f̂(1 − f̂)∕T), f̂ + 1.96 √(f̂(1 − f̂)∕T)]. In the literature on VaR models, this test is also called the Kupiec (1995) LR test when the hypothesis is tested using a likelihood ratio test. The LR statistic is LR = −2 log[(1 − α)^{T−N}α^{N} ∕ ((1 − N∕T)^{T−N}(N∕T)^{N})], where N is the number of VaR violations, T is the total number of observations and α is the theoretical failure rate. Under the null hypothesis that f is the true failure rate, the LR test statistic is asymptotically distributed as a χ^{2}(1).
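The Kupiec LR statistic is easy to compute. A minimal Python sketch (assuming 0 < N < T so the logarithms are defined; the function name is ours):

```python
import math

def kupiec_lr(n_violations, t_obs, alpha):
    """Kupiec (1995) unconditional coverage LR statistic, asymptotically
    chi-square(1) under H0: f = alpha (assumes 0 < n_violations < t_obs)."""
    n, t = n_violations, t_obs
    f_hat = n / t
    log_h0 = (t - n) * math.log(1.0 - alpha) + n * math.log(alpha)
    log_h1 = (t - n) * math.log(1.0 - f_hat) + n * math.log(f_hat)
    return -2.0 * (log_h0 - log_h1)

# A failure rate exactly equal to the VaR level gives LR = 0; a clearly
# excessive number of violations exceeds the 5% critical value (3.84).
lr_exact = kupiec_lr(50, 1000, 0.05)
lr_off = kupiec_lr(80, 1000, 0.05)
```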
In this application, these tests are successively applied to the failure rate f_{l} for long trading positions and then to f_{s}, the failure rate for short trading positions using the procedure VaR_Test.
While an extremely easy-to-understand concept, the Value-at-Risk has some drawbacks. One of these is that it is not a coherent measure of risk in the sense of Artzner, Delbaen, Eber, and Heath (1999). A somewhat related measure of risk is the so-called expected shortfall (see Scaillet, 2000). Expected shortfall is a coherent measure of risk and is defined as the expected value of the losses conditional on the loss being larger than the VaR. The average multiple of tail event to risk measure “measures the degree to which events in the tail of the distribution typically exceed the VaR measure by calculating the average multiple of these outcomes to their corresponding VaR measures” (Hendricks, 1996).^{7} These measures are reported when calling the procedure VaR_Test and are labelled ESF1 and ESF2, respectively.
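A minimal sketch of the expected shortfall for long positions (the average return conditional on a VaR violation), with hypothetical numbers; this illustrates the definition only, not the ESF1/ESF2 conventions of VaR_Test.

```python
import numpy as np

def expected_shortfall_long(returns, var_long):
    """Average return conditional on falling below the (long) VaR."""
    r = np.asarray(returns)
    tail = r[r < var_long]
    return tail.mean() if tail.size else float("nan")

# Hypothetical returns; two observations violate a VaR of -2.0,
# so the expected shortfall is the mean of -3.0 and -2.5.
r = np.array([-3.0, -2.5, -1.0, 0.2, 0.5, 1.0, -0.1, 2.0])
esf = expected_shortfall_long(r, -2.0)
```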
Complete VaR results for the four models are reported in the following outputs.
These results indicate that:
Besides the failure rate, a relevant VaR model should feature a sequence of indicator functions (VaR violations) that is not serially correlated. With the new variables Hit_{t}(α) = I(y_{t} < VaR_{t}(α)) − α and Hit_{t}(1 − α) = I(y_{t} > VaR_{t}(1 − α)) − α, Engle and Manganelli (1999) suggest to test jointly that (A1) E[Hit_{t}] = 0 and (A2) Hit_{t} is uncorrelated with the variables in the information set (e.g. its own lags or the VaR itself).
According to Engle and Manganelli (1999), testing A1 and A2 can be done using the artificial regression Hit_{t} = Xλ + ϵ_{t}, where X is a T × k matrix whose first column is a column of ones, the next p columns are Hit_{t−1},…,Hit_{t−p} and the k − p − 1 remaining columns are additional independent variables (including the VaR itself). Engle and Manganelli (1999) also show that, under the null A1 and A2, the Dynamic Quantile test statistic DQ = λ̂′X′Xλ̂ ∕ (α(1 − α)) is asymptotically distributed as χ^{2}(k), where λ̂ is the OLS estimate of λ. A small sample version of this test (F-test) is readily obtained, but the difference is negligible since the sample size is larger than 1,000 observations. Note that while Engle and Manganelli (1999) only consider long trading positions, we also use this test when computing the VaR of short trading positions.
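The Dynamic Quantile statistic can be sketched as follows; the design matrix below (a constant plus one lagged Hit) and the simulated violations are illustrative choices of ours, not the VaR_DynQuant defaults.

```python
import numpy as np

def dynamic_quantile(hit, X, alpha):
    """DQ statistic lam' X'X lam / (alpha (1 - alpha)), chi-square(k)
    under A1-A2, with lam the OLS estimate from Hit = X lam + eps."""
    lam, *_ = np.linalg.lstsq(X, hit, rcond=None)
    fitted = X @ lam
    return fitted @ fitted / (alpha * (1.0 - alpha))

# Illustrative design matrix: a constant and one lagged Hit, with
# violations simulated i.i.d. at the correct rate (so A1-A2 hold).
rng = np.random.default_rng(4)
alpha = 0.05
hit = (rng.uniform(size=2001) < alpha).astype(float) - alpha
X = np.column_stack([np.ones(2000), hit[:-1]])
dq = dynamic_quantile(hit[1:], X, alpha)
```

Under the null the statistic is a small chi-square(2) draw here; autocorrelated violations would inflate it.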
The G@RCH package also provides a function called VaR_DynQuant implementing this test.
Complete VaR results corresponding to the Dynamic Quantile test are reported in the following outputs.
The same conclusion holds for the SKST-APARCH model, i.e. it performs very well for the NASDAQ, unlike its symmetric competitors.
The testing methodology in the previous subsection is equivalent to backtesting the model on the estimation sample. Therefore it can be argued that this should be favorable to the tested model. In a ‘real life situation’, VaR models are used to deliver outofsample forecasts, where the model is estimated on the known returns (up to time t for example) and the VaR forecast is made for period [t + 1,t + h], where h is the time horizon of the forecasts. In this subsection we implement this testing procedure for the long and short VaR with h = 1 day. Note that the code used for this illustration is VaR_outofsample.ox.
We use an iterative procedure where the skewed-Student APARCH model is estimated to predict the one-day-ahead VaR. The first estimation sample is the complete sample for which the data are available, less the last five years. The predicted one-day-ahead VaR (both for long and short positions) is then compared with the observed return and both results are recorded for later assessment using the statistical tests. At the i-th iteration, where i goes from 2 to 5 × 252 (five years of data), the estimation sample is augmented to include one more day and the VaRs are forecasted and recorded. Whenever i is a multiple of 50, the model is re-estimated to update the skewed-Student APARCH parameters. In other words, we update the model parameters every 50 trading days and we thus assume a ‘stability window’ of 50 days for our parameters. We iterate the procedure until all days (less the last one) have been included in the estimation sample. The corresponding failure rates are then computed by comparing the long and short forecasted VaR_{t+1} with the observed return y_{t+1} for all days in the five-year period. We use the same statistical tests as in the previous subsection.
Empirical results for the SKST-APARCH model are given below. Broadly speaking, the results are quite similar to (although not as good as) those obtained for the in-sample testing procedure.
(written with the collaboration of Kris Boudt and Jerome Lahaye)
The recent widespread availability of databases providing the intradaily prices of financial assets (stocks, stock indices, bonds, currencies, …) has led to new developments in applied econometrics and quantitative finance to model daily and intradaily volatility.
The aim of this chapter is to introduce and illustrate several concepts. We first sketch the basic notions needed to understand diffusion models before moving on to an explanation of relevant volatility measures. We introduce the notion of integrated volatility and one popular estimator: realized volatility. We then consider the issue of the presence of jumps in the price process. We then present an important stylized fact of intraday returns, namely intraday periodicity in volatility, and show how to consistently estimate it in the presence and absence of jumps. We also review two estimators of the integrated volatility that are robust to the presence of jumps, i.e. the bipower variation and the realized outlyingness weighted variation.
We also review two different approaches to jump detection. The first one tests whether the contribution of jumps to the daily variance is zero. The second one tests, at the intraday level, whether a given return is affected by price jumps.
While G@RCH 6.1 is primarily a menu-driven module for OxMetrics, it also provides a set of Ox classes that allow the user to write Ox programs that import these classes. G@RCH contains a set of procedures, gathered in the Realized class, devoted to the computation (or estimation) of the measures introduced above and to the simulation of continuous-time stochastic volatility processes.
Most concepts are directly implemented in the menudriven application. As usual, several ox programs are also provided (they are available in /packages/Realized/samples/).
This chapter has been inspired by several excellent sources of information, namely Chapter 6 (ContinuousTime Models and Their Applications) of Tsay (2002), the recent survey of Andersen, Bollerslev, Christoffersen, and Diebold (2006) and more generally by the seminal papers of Andersen, Bollerslev, Diebold and coauthors, BarndorffNielsen and Shephard, as well as Lee and Mykland (2008). This chapter is also based on Boudt, Croux, and Laurent (2008a,2008b).
It is useful to think of returns as arising from an underlying continuoustime process. In particular, suppose that this underlying model involves a continuous sample path for the (logarithmic) price process. The return process may then, under general assumptions, be written in standard stochastic differential equation (sde) form as,
 dp(t) = μ(t)dt + σ(t)dW(t)  (7.1) 
where dp(t) denotes the logarithmic price increment, μ(t) the drift, σ(t) refers to the pointintime or spot volatility, and W(t) is a standard Brownian motion.
In a discretetime econometric model, we assume that shocks are driven by a white noise process, which is not predictable. What is the counterpart of shocks in a continuoustime model? The answer is increments of a Wiener process W(t) (also known as a standard Brownian motion).
A continuous-time stochastic process W(t) is a Wiener process if it satisfies: (i) W(0) = 0; (ii) its increments over non-overlapping intervals are independent; and (iii) W(t + Δ) − W(t) ~ N(0,Δ), where Δ is a small increment in time.
To simulate a continuous process, it is of course necessary to discretize time. We simulate the motion at times 0, Δ, 2Δ, 3Δ,…. Moreover, the definition of a Brownian motion implies that {W(kΔ) − W((k − 1)Δ) : k ∈ ℕ} is a collection of i.i.d. N(0,Δ) random variables. So to simulate a trajectory of the Brownian motion up to time mΔ, one needs to generate m independent N(0,1) random variables {Z_{k} : k ∈ {1,2,…,m}}.
Since W(0) ≡ 0 and W(kΔ) ≡ W((k − 1)Δ) + Δ^{1∕2}Z_{k}, for k ∈ {1,2,…,m}, the trajectory is simulated recursively. This is called the Euler method to simulate a Brownian motion.
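The Euler scheme described above can be sketched in a few lines of Python; this mimics, but is not, the Ox procedure Simul_Wiener_process.

```python
import numpy as np

def simulate_wiener(m, delta, rng=None):
    """Euler scheme: W(k*delta) = W((k-1)*delta) + sqrt(delta) * Z_k,
    with W(0) = 0 and Z_k i.i.d. N(0,1)."""
    rng = rng if rng is not None else np.random.default_rng()
    z = rng.standard_normal(m)
    return np.concatenate([[0.0], np.cumsum(np.sqrt(delta) * z)])

# 5 "days" of 10,000 steps each, as in the Simul_wiener.ox example.
w = simulate_wiener(5 * 10_000, 1.0 / 10_000, np.random.default_rng(5))
```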
The example file Simul_wiener.ox uses the procedure
Simul_Wiener_process(const m, const deltat) to simulate four
Wiener processes consisting of 5 days with 10,000 observations per day
(m = 5 × 10,000 = 50,000 and Δ = 1∕10,000). This procedure is available in
/packages/Realized/samples/procs_simul.ox.
Figure 7.1 represents the four simulated Wiener processes starting with W(0) = 0.
The Wiener process is a special stochastic process with zero drift and variance proportional to the length of time interval. This means that the rate of change in expectation is 0 and the rate of change in variance is 1. Even though this process is of crucial importance in finance, it is desirable to generalize it and allow the mean and the variance to evolve over time in a more realistic manner.
Let us now consider the generalized Wiener process p^{*}(t): its expectation has a drift rate μ while the rate of variance change is σ^{2}.
The generalized Wiener process can be written as follows:
 dp^{*}(t) = μ dt + σ dW(t)  (7.2) 
where W(t) is a Wiener process and p^{*}(t) is the price (and not its log). Then, E[p^{*}(t) − p^{*}(0)] = μt and Var[p^{*}(t) − p^{*}(0)] = σ^{2}t. μ and σ^{2} are often referred to as the drift and volatility parameters of the generalized Wiener process p^{*}(t).
The main limitation of the generalized Wiener process is that both the drift and volatility parameters (μ and σ^{2}) are constant over time while it is well known that daily returns for instance (i.e. t = 1) are heteroscedastic.
The Ito process overcomes this drawback, allowing both the drift and volatility parameters to be time-varying. It is usually written as follows:
dp^{*}(t) = μ(p^{*}(t),t)dt + σ(p^{*}(t),t)dW(t)   (7.3)
where W(t) is a Wiener process. This process can be rewritten p^{*}(t) = p^{*}(0) + ∫ _{0}^{t}μ(p^{*}(s),s)ds + ∫ _{0}^{t}σ(p^{*}(s),s)dW(s).
A particular version of this process is commonly used in the literature to describe the behavior of prices and is known as the geometric Brownian motion:
dp^{*}(t) = μp^{*}(t)dt + σp^{*}(t)dW(t)   (7.4)
In other words, a geometric Brownian motion is a continuous-time stochastic process in which the logarithm of the randomly varying quantity follows a Brownian motion. It is easy to show that the random variable log(p^{*}(t)∕p^{*}(0)) is normally distributed with mean (μ − σ^{2}∕2)t and variance σ^{2}t, which reflects the fact that increments of a geometric Brownian motion are normal relative to the current price, which is why it is called geometric. Setting t = 1, e.g. daily data, we have that the one-period-ahead returns are normally distributed, with mean (μ − σ^{2}∕2) and variance σ^{2}. See Chapter 6 of Tsay (2002) for more details.
A drawback of the geometric Brownian motion is that it is not able to generate the well-known ARCH effects (volatility clustering). It is thus desirable to make both μ and σ^{2} time-varying.
Let us now consider the following diffusion process for the logarithmic price p(t) = log(p^{*}(t)):
dp(t) = μ(t)dt + σ(t)dW(t)   (7.5)
where μ(t) denotes the drift, σ(t) refers to the point-in-time or spot volatility, and W(t) denotes a standard Brownian motion. Intuitively, over (infinitesimal) small time intervals,
p(t) − p(t − Δ) ≈ μ(t)Δ + σ(t)ΔW(t)   (7.6)
where ΔW(t) = W(t)  W(t  Δ) ~ N(0,Δ).
For the oneperiod daily return,
r_{t} ≡ p(t) − p(t − 1) = ∫_{t−1}^{t}μ(s)ds + ∫_{t−1}^{t}σ(s)dW(s)   (7.7)
From Equation (7.7), it is clear that the volatility for the continuous-time process over [t − 1,t] is linked to the evolution of the spot volatility σ(t). Furthermore, conditional on the sample path of the drift and the spot volatility processes,
r_{t} | {μ(s),σ(s)}_{s∈[t−1,t]} ~ N(∫_{t−1}^{t}μ(s)ds, IV_{t})   (7.8)
where IV_{t} denotes the so-called integrated variance (volatility), defined as follows:
IV_{t} ≡ ∫_{t−1}^{t}σ^{2}(s)ds.
To visualize the difference between the spot volatility, the integrated volatility, the conditional volatility of a GARCH(1,1) model and daily squared returns, let us consider the following simulation study.
The simulated model is designed to induce temporal dependencies consistent with the GARCH(1,1) model. The simulated continuous-time GARCH diffusion is formally defined by
dp(t) = σ(t)dW_{p}(t)   (7.9)
dσ^{2}(t) = θ(ω − σ^{2}(t))dt + (2λθ)^{1∕2}σ^{2}(t)dW_{d}(t)   (7.10)
where W_{p}(t) and W_{d}(t) denote two independent Brownian motions.^{1} The Realized class contains a function Simul_Continuous_GARCH devoted to the simulation of the above continuous-time GARCH diffusion process. This procedure uses a standard Euler discretization scheme to generate the continuous-time GARCH diffusion process, i.e. p(t + Δ) = p(t) + σ(t)Δ^{1∕2}Z_{p}(t) and σ^{2}(t + Δ) = θωΔ + σ^{2}(t)[1 − θΔ + (2λθΔ)^{1∕2}Z_{d}(t)], where Z_{p}(t) and Z_{d}(t) denote two independent standard normal random variables.
In the example file Simul_Continuous_GARCH.ox, a continuous-time GARCH diffusion process is generated for which the constants are set to θ = 0.035, ω = 0.636 and λ = 0.296. The same model has previously been analyzed in Andersen and Bollerslev (1998a). To simulate exchange rates, we choose Δ = 1∕2880, corresponding to 10 observations per 5-minute interval. The number of simulated days is 510, but the first ten days are discarded, giving a total of 500 simulated days. Furthermore, we use the following initial values for the log-price and spot volatility: p(0) = 1 and σ^{2}(0) = 0.1.
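A hypothetical Python sketch of this Euler scheme (illustrative only; the actual simulation in G@RCH is done by the Ox function Simul_Continuous_GARCH, and the function name and defaults below are assumptions):

```python
import numpy as np

def simul_garch_diffusion(n_days, delta, theta=0.035, omega=0.636, lam=0.296,
                          p0=1.0, sig2_0=0.1, seed=0):
    """Euler discretization of the continuous-time GARCH diffusion
    (assumed form):
        dp(t)       = sigma(t) dW_p(t)
        dsigma^2(t) = theta * (omega - sigma^2(t)) dt
                      + sqrt(2 * lam * theta) * sigma^2(t) dW_d(t)"""
    rng = np.random.default_rng(seed)
    m = int(round(n_days / delta))
    p = np.empty(m + 1)
    s2 = np.empty(m + 1)
    p[0], s2[0] = p0, sig2_0
    for k in range(m):
        zp, zd = rng.standard_normal(2)   # independent standard normals
        p[k + 1] = p[k] + np.sqrt(s2[k] * delta) * zp
        s2[k + 1] = s2[k] + theta * (omega - s2[k]) * delta \
                    + np.sqrt(2 * lam * theta * delta) * s2[k] * zd
        s2[k + 1] = max(s2[k + 1], 1e-12)  # guard against negative variance
    return p, s2

# Two simulated days with Delta = 1/2880 (10 observations per 5 minutes)
p, s2 = simul_garch_diffusion(n_days=2, delta=1 / 2880)
```

The positivity guard on the variance is a pragmatic device of the sketch; the exact discretization used internally by G@RCH may differ in such details.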
Figure 7.2 graphs the simulated instantaneous log-prices, 5-min prices and daily prices.

Figure 7.3 plots the corresponding returns, i.e. instantaneous returns, 5min returns and daily returns.

Finally, Figure 7.4 plots five volatility measures:

Two comments are in order.
We have seen in Section 5.3.1 that one of the most popular measures to check the forecasting performance of ARCH-type models is the Mincer-Zarnowitz regression, i.e. the ex-post volatility regression:
σ̃_{t}^{2} = a_{0} + a_{1}σ̂_{t}^{2} + u_{t}   (7.11)
where σ̃_{t}^{2} is the ex-post volatility, σ̂_{t}^{2} is the forecasted volatility and a_{0} and a_{1} are parameters to be estimated. Recall that if the model for the conditional variance is correctly specified (and the parameters are known) and if E(σ̃_{t}^{2}) = σ_{t}^{2}, we have a_{0} = 0 and a_{1} = 1. Furthermore, the R^{2} of this regression is often used as a simple measure of the degree of predictability of the ARCH-type model.
To judge the quality of the GARCH forecasts it is common to use σ̃_{t}^{2} = (r_{t} − r̄)^{2}, where r̄ is the sample mean of the daily returns r_{t}. The R^{2} of this regression is often lower than 5%, which could lead to the conclusion that GARCH models produce poor forecasts of the volatility (see, among others, Schwert, 1990, or Jorion, 1996). In their seminal paper, Andersen and Bollerslev (1998a) have shown that if r_{t} follows a GARCH(1,1), the R^{2} of this regression is nothing but R^{2} = α_{1}^{2}∕(1 − β_{1}^{2} − 2α_{1}β_{1}). If κ is the kurtosis of z_{t}, we must have κα_{1}^{2} + β_{1}^{2} + 2α_{1}β_{1} < 1 to ensure the existence of the unconditional kurtosis of r_{t}. It follows then that κα_{1}^{2} < 1 − β_{1}^{2} − 2α_{1}β_{1} and thus R^{2} < 1∕κ, i.e. at most 1∕3 under conditional normality of z_{t}.
In the example file Simul_Continuous_GARCH_R2.ox, we simulate the same diffusion process as before and run the above regression to illustrate the findings of Andersen and Bollerslev (1998a). When using the daily squared returns to measure the observed daily volatility, the R^{2} of the regression is found to be extremely low, i.e. 0.0374596, even though a_{0} and a_{1} are not significantly different from 0 and 1 respectively. However, if we consider the integrated volatility instead of the squared daily returns as an ex-post volatility measure, the R^{2} now equals 0.421871, suggesting that the GARCH model explains more than 40% of the variability of the true volatility.
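The mechanics of this comparison can be illustrated with a small Python sketch (hypothetical helper, not the G@RCH code; it assumes a perfect variance forecast and contrasts squared returns with the integrated variance as ex-post proxies):

```python
import numpy as np

def mincer_zarnowitz(ex_post, forecast):
    """OLS of an ex-post volatility proxy on the forecast:
    ex_post = a0 + a1 * forecast + u.  Returns (a0, a1, R^2)."""
    x = np.column_stack([np.ones_like(forecast), forecast])
    coef, *_ = np.linalg.lstsq(x, ex_post, rcond=None)
    resid = ex_post - x @ coef
    r2 = 1.0 - resid.var() / ex_post.var()
    return coef[0], coef[1], r2

rng = np.random.default_rng(1)
h = 0.1 + 0.9 * rng.gamma(2.0, 0.5, size=500)   # "true" daily variances
sq_ret = h * rng.standard_normal(500) ** 2      # squared daily returns

# Perfect forecast, noisy proxy (squared returns): low R^2.
a0, a1, r2_sq = mincer_zarnowitz(sq_ret, h)
# Perfect forecast, ideal proxy (the integrated variance itself): R^2 = 1.
_, _, r2_iv = mincer_zarnowitz(h, h)
```

Even with a perfect forecast, the noise of the squared-return proxy caps the attainable R², which is exactly the Andersen-Bollerslev point.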
It is clear from Section 7.2 that the integrated volatility is an ideal ex-post measure of volatility and is thus useful for assessing the quality of conditional volatility forecasts, such as those produced by GARCH models. However, the integrated volatility is never observed in practice.
Inspired by the recent widespread availability of intraday prices of various financial assets, Andersen and Bollerslev (1998a) have proposed to proxy the integrated volatility by the so-called realized volatility.
The most fundamental feature of realized volatility is that it provides a consistent non parametric estimate of the price variability that has transpired over a given discrete interval.
The intuition behind the use of realized volatility is most readily conveyed within the popular continuous-time diffusion presented in Equation (7.5), for which the key quantity of daily volatility is the integral of the squared spot volatility. This integral is called the integrated variance.
We assume we dispose of T days of M ≡ ⌊1∕Δ⌋ equally-spaced intraday returns. Now denote the ith return of day t by r_{t,i} ≡ p(t + iΔ) − p(t + (i − 1)Δ), where i = 1,…,M. The daily Realized Variance (RVar) of day t, denoted RV_{t}(Δ), is then defined as:
RV_{t}(Δ) ≡ ∑_{i=1}^{M}r_{t,i}^{2}   (7.12)
By the theory of quadratic variation, when Δ → 0 and when the true diffusion process belongs to the family of processes presented in Equation (7.5) (i.e. absence of jumps):
RV_{t}(Δ) →^{p} ∫_{t−1}^{t}σ^{2}(s)ds = IV_{t}   (7.13)
In other words, under suitable conditions (like the absence of serial correlation in the intraday returns)^{2}, the realized volatility is consistent for the integrated volatility, in the sense that when Δ → 0, RV_{t}(Δ) measures the latent integrated volatility IV_{t} perfectly.
G@RCH contains a set of procedures devoted to the computation of the realized volatility (and other related concepts that we will review in the next sections) gathered in the Realized class. For instance, the example file Simul_Continuous_GARCH_RV.ox simulates the previous GARCH diffusion process and uses the procedure Compute_RV to compute the realized volatility from the simulated 5-minute returns, i.e. 288 returns per day.
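A minimal Python sketch of the realized-variance computation of Equation (7.12) (illustrative only; in G@RCH this is done by Compute_RV):

```python
import numpy as np

def compute_rv(returns):
    """Daily realized variance: the sum of squared intraday returns.
    `returns` has shape (T, M): T days of M equally spaced returns."""
    returns = np.asarray(returns)
    return (returns ** 2).sum(axis=1)

# Two days of M = 288 five-minute returns (synthetic data for illustration)
rng = np.random.default_rng(0)
r = 0.01 * rng.standard_normal((2, 288))
rv = compute_rv(r)
```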

Figure 7.5 plots four volatility measures obtained when running
Simul_Continuous_GARCH_RV.ox.
The quality of the approximation of IV _{t} by RV _{t}(Δ) is evident from this graph.
Note that the example file Simul_Continuous_GARCH_R2.ox also reported the R^{2} of the MincerZarnowitz regression using the realized volatility as endogenous variable. Not surprisingly, the R^{2} is very close to the previous value obtained for the IV _{t}, i.e. 0.4219 vs. 0.4121.
Empirical studies have shown that a continuous diffusion model as in Equation (7.5) fails to explain some characteristics of asset returns. Furthermore, standard GARCH models are not able to fully explain the excess kurtosis found in most financial time series. In a continuous-time framework, the inadequacy of the standard stochastic diffusion model has led to the development of alternative models. Jump diffusion and stochastic volatility models have been proposed in the literature to overcome this inadequacy.
As is common in the literature on non parametric estimation of the continuous and jump components of the quadratic variation (QVar), we suppose that the log-price process belongs either to the Brownian Semi-Martingale (BSM) or the BSM with Jumps (BSMJ) family of models. Under the BSMJ model, the diffusion component captures the smooth variation of the price process, while the jump component accounts for the rare, large discontinuities in the observed prices. Andersen, Bollerslev, and Dobrev (2007) cite the work of several authors who found that this is a realistic model for the price series of many financial assets.
A BSMJ logprice diffusion admits the representation
dp(t) = μ(t)dt + σ(t)dW(t) + κ(t)dq(t)   (7.14)
where μ(t) is a continuous locally bounded variation process, σ(t) is a strictly positive and càdlàg (right-continuous with left limits) stochastic volatility process, W(t) is a standard Brownian motion, and dq(t) is a counting process with dq(t) = 1 corresponding to a jump at time t and dq(t) = 0 otherwise. The (possibly time-varying) jump intensity is l(t) and κ(t) is the size of the corresponding jump.
Jumps in stock prices are often assumed to follow a probability law. For instance, the jumps may follow a Poisson process, which is a continuous-time discrete process. For a given time t, let X_{t} be the number of times a special event occurs during the time period [0,t]. Then X_{t} is a Poisson process with intensity l if:
(i) X_{0} = 0;
(ii) the increments of X_{t} over non-overlapping intervals are independent;
(iii) for all t,s ≥ 0, X_{t+s} − X_{s} follows a Poisson distribution with parameter lt, i.e. P(X_{t+s} − X_{s} = k) = e^{−lt}(lt)^{k}∕k!, k = 0,1,2,….
The l parameter governs the occurrence of the special event and is referred to as the rate or intensity of the process. Recall also that E(X_{t}) = lt.
G@RCH also provides a procedure Simul_Continuous_GARCH_JUMPS to simulate a continuous-time GARCH diffusion process with jumps, formally defined by
dp(t) = σ(t)dW_{p}(t) + κ(t)dq(t)   (7.15)
dσ^{2}(t) = θ(ω − σ^{2}(t))dt + (2λθ)^{1∕2}σ^{2}(t)dW_{d}(t)   (7.16)
q(t) ~ Poisson process with jump intensity l   (7.17)
κ(t) = mσ(t)U(t), with U(t) uniformly distributed on [−2,−1] ∪ [1,2],   (7.18)
where W_{p}(t) and W_{d}(t) denote two independent Brownian motions.
The jump size κ(t) is modeled as the product of σ(t) and a uniformly distributed random variable on ([−2,−1] ∪ [1,2]). The parameter m determines the magnitude of the jumps.
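The jump component can be sketched in Python as follows (a hypothetical discretization that approximates the Poisson occurrences by Bernoulli draws on the simulation grid; the function name and defaults are illustrative, and a constant σ is used instead of the time-varying spot volatility for simplicity):

```python
import numpy as np

def simul_jumps(n_days, delta, jumps_per_day, m_size, sigma, seed=0):
    """Bernoulli approximation of a Poisson jump process on a grid with
    step `delta` (in days): P(jump in a step) = jumps_per_day * delta.
    The jump size is m_size * sigma * U, with U uniform on [-2,-1] U [1,2].
    `sigma` is held constant here; in the text it is the spot volatility."""
    rng = np.random.default_rng(seed)
    n = int(round(n_days / delta))
    dq = rng.random(n) < jumps_per_day * delta          # jump indicators
    u = rng.uniform(1.0, 2.0, n) * rng.choice([-1.0, 1.0], size=n)
    kappa = m_size * sigma * u                          # candidate jump sizes
    return dq, np.where(dq, kappa, 0.0)

# On average one jump every 5 days, 288 intervals per day, 500 days
dq, jumps = simul_jumps(n_days=500, delta=1 / 288, jumps_per_day=0.2,
                        m_size=0.5, sigma=0.3)
```

The two-sided uniform support guarantees that every jump has an absolute size between m·σ and 2m·σ, so jumps are never negligibly small.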
Figure 7.6 plots IV_{t} and RV_{t} for 500 days of simulated intraday returns. The DGP is given by Equations (7.15)-(7.18) with the same GARCH parameters as before, but m is set to 0.5 and l is chosen such that there is on average one jump every 5 days. The Ox code used to get this figure is Simul_Continuous_GARCH_JUMPS_IV_RV.ox.
Importantly, it is clearly visible that the realized volatility does not match the spot and integrated volatility.

This result is not surprising since we know by the theory of quadratic variation that for Δ → 0, we have the following convergence in probability:
plim_{Δ→0} RV_{t}(Δ) = ∫_{t−1}^{t}σ^{2}(s)ds + ∑_{t−1<s≤t}κ^{2}(s)   (7.19)
In other words, in the absence of jumps the realized volatility is a consistent estimator of the integrated volatility, but it is not in the presence of jumps. Alternatives to the realized volatility that are robust to jumps are discussed in Section 7.6.
Temporal dependence in volatility is one of the most striking features of financial series recorded at various frequencies. Interestingly, new phenomena become visible as one proceeds from daily returns to intraday returns.
The aim of this section is to show how G@RCH can be used to estimate the intraday (or intraweek) periodicity in volatility.
The codes used to prepare this section are also available in the folder /packages/Realized/samples/.
Note that the dataset has been simulated in such a way that the outputs and graphs obtained when running this example should be very similar to the ones reported in this section.
Like Andersen and Bollerslev (1997), Andersen and Bollerslev (1998a), Martens, Chang, and Taylor (2002) and Taylor and Xu (1997), we study the periodicity in intraday volatility of 5-minute EUR/USD returns.^{3}
Our data set ranges from January 2000 to October 2004 and time is expressed in Eastern Time (EST, EST = GMT − 5), taking into account the daylight saving time shifts in Europe and the USA. All the returns from Friday 16.05 EST through Sunday 16.00 EST are excluded. We also exclude from the sample some regular (Christmas, New Year...) and irregular (Easter...) holidays. We finally delete days where the data quality is dubious. We are left with T = 1175 days of M = 288 equally-spaced and continuously compounded 5-minute return observations r_{t,i}, as currencies are traded around the clock. The data consist of the last midpoint of the interval (last average of log(bid) and log(ask)). The first interval of day t is 16.05 EST, the last is 16.00 EST.
Figure 7.7 (obtained by running the example file Periodicity.ox) plots the lag 1 through 1440 (i.e. 288 × 5 = 5 days) sample autocorrelations for the five-minute absolute returns r_{t,i}.
This graph displays a distinct U-shaped pattern in the ACF of absolute returns. Standard ARCH models imply a geometric decay in the absolute return autocorrelation structure and simply cannot accommodate strong regular cyclical patterns of the sort displayed in Figure 7.7.
The volatility dynamics of intraday financial time series are complex. There are intraday volatility patterns, reflecting the daily activity cycle of the regional centers, as well as weekend effects, macroeconomic announcement effects (immediately following the release) and standard volatility clustering at the daily level.
The behaviour of a time series is called periodic if it shows a periodic structure in addition to less regular movements. The foreign exchange (FX) market shows strong periodic effects caused by the presence of traders in the three major markets according to the hour of the day, the day of the week and Daylight Saving Time. The global FX market consists of three major markets, i.e., Asia, Europe and North America, and the major movements of intradaily return volatility can be attributed to the passage of market activity around the globe.
Figure 7.8 depicts the average absolute returns over the (288) five-minute intervals. This intraday pattern is quite similar across all days of the week, with discrete changes in quoting activity marking the opening and closing of business hours in the three major regional centres, all of which have their own activity pattern.
The following hours can be used as indicative: the Far East is open from 16:00 EST (21:00 GMT) to 1:00 EST (6:00 GMT), Europe trades between 2:00 EST (7:00 GMT) and 11:00 EST (16:00 GMT) and trading in North America occurs from 7:00 EST (12:00 GMT) to 16:00 EST (21:00 GMT). Using the discussion of market openings and closures presented above, we explain the intraday periodic volatility as follows. At 19:00 EST, the Far Eastern market has already been trading for around three hours and market activity is high. From 19:00 EST until about 22:00 EST, activity levels and volatility remain high. The lunchtime in Tokyo (22:00 EST - 23:45 EST) is the point of the day corresponding to the most prominent feature of the series. Volatility drops sharply and regains its former value at about 0:00 EST. Generally, Europe begins to contribute to activity at around 2:00 EST as the Far Eastern market begins to wane: there is a small peak in volatility. During European lunch hours (starting around 6:30 EST), both activity and volatility show a slight lull. The most active period of the day is clearly when both the European and North American markets are open (between 7:00 EST and 11:00 EST). Volatility starts to decline as first the European and then the US markets wind down. At around 16:00 EST, the Pacific market begins to trade again and the daily cycle is repeated after midnight. This intraday pattern is consistent with previous evidence reported in the literature, see Andersen and Bollerslev (1998b) among others.
Failure to take account of this intradaily periodicity is likely to result in misleading statistical analyses as shown by Andersen and Bollerslev (1997) (in the sense that intraday returns do not conform at all to the theoretical aggregation results for the GARCH models).
The periodic phenomena in the volatility of FX markets can be modelled in a variety of ways. A GARCH specification with dummy variables can be used for modelling the conditional volatility. For our dataset, this would require estimating 288 time-of-day parameters if one dummy variable were created for each five-minute interval. The number of variables required is very large, and such an approach is unlikely to be effective in capturing the complexity of the periodic patterns.
Recall that we dispose of T days of M equally-spaced and continuously compounded intraday returns and that r_{t,i} is the ith return of day t. We assume the log-price follows a Brownian Semi-Martingale with Finite Activity Jumps (BSMFAJ) diffusion.
We assume also that, for small values of Δ, the returns r_{t,i} in an interval without jumps in the underlying price diffusion, are conditionally normally distributed with mean zero and variance σ_{t,i}^{2} = ∫ _{t+(i1)Δ}^{t+iΔ}σ^{2}(s)ds.
Due to the weekly cycle of opening, lunch and closing times of the financial centers around the world, the highfrequency return variance σ_{t,i}^{2} has a periodic component f_{t,i}^{2}.
Depending on the nature of the analysis, there exists a natural window length for which almost all variation in σ_{t,i}^{2} during the window can be attributed to f_{t,i}^{2}, such that s_{t,i}^{2} ≡ σ_{t,i}^{2}∕f_{t,i}^{2} is approximately constant over the local window. Henceforth, G@RCH follows the suggestion of Andersen and Bollerslev (1997, 1998b), Andersen, Bollerslev, and Dobrev (2007) and Lee and Mykland (2008) to use local windows of one day.^{4}
Andersen and Bollerslev (1997, 1998b) suggest to estimate s_{t,i} by ŝ_{t,i} = (h_{t}∕M)^{1∕2} ∀i = 1,…,M, where h_{t} is the conditional variance of day t obtained by estimating a GARCH model on daily returns. Under the BSM model, a more efficient estimator for s_{t,i} is ŝ_{t,i} = (RV_{t}∕M)^{1∕2}.
As we will see in Section 7.6.1, under the BSMFAJ model, s_{t,i} is better approximated using a normalized version of Barndorff-Nielsen and Shephard (2004b)'s realized bipower variation, i.e. ŝ_{t,i} = (BV_{t}∕M)^{1∕2}, where BV_{t} is the bipower variation computed on all the intraday returns of day t (see Equation (7.38)).
To ensure identifiability of the periodicity factor f_{t,i} with respect to the average variance of day t, we impose that the squared periodicity factor has mean one over the local window.
Under this model we have that when Δ → 0 and if r_{t,i} is not affected by jumps, the standardized high-frequency return r̄_{t,i} = r_{t,i}∕ŝ_{t,i} is conditionally normally distributed with mean zero and variance equal to the squared periodicity factor. This result suggests estimating the periodicity factor using either a non parametric or a parametric estimator of the scale of the standardized returns. The estimator has to be robust to price jumps.
The non parametric periodicity estimator is based on a scale estimate of the standardized returns r̄_{t,i} = r_{t,i}∕ŝ_{t,i} that share the same periodicity factor.
Let r̄_{1;t,i},…,r̄_{n_{t,i};t,i} be the set of n_{t,i} returns having the same periodicity factor as r̄_{t,i}. If we condition the estimation of the periodicity factor only on the calendar effects, r̄_{1;t,i},…,r̄_{n_{t,i};t,i} are the returns observed at the same time of the day and day of the week as r̄_{t,i}.
The classical periodicity estimator is based on the standard deviation
f̂_{t,i}^{SD} = SD_{t,i} ∕ ((1∕M)∑_{j=1}^{M}SD_{t,j}^{2})^{1∕2}   (7.20)
where SD_{t,i} = ((1∕n_{t,i})∑_{j=1}^{n_{t,i}}r̄_{j;t,i}^{2})^{1∕2}. This estimator is similar to Taylor and Xu (1997)'s periodicity estimate based on averages of squared returns.
In the absence of jumps, the standard deviation is efficient if the standardized returns are normally distributed. In the presence of jumps, this estimator is useless, since it suffices that one observation in the sample is affected by a jump to make the periodicity estimate arbitrarily large.
Because of the non-robustness of the previous estimator in the presence of jumps, Boudt, Croux, and Laurent (2008b) propose to replace the standard deviation in (7.20) by a robust non parametric estimator.
A first candidate is the socalled median absolute deviation (MAD). The MAD of a sequence of observations y_{1},…,y_{n} is defined as
MAD(y_{1},…,y_{n}) = 1.486 · med_{i}|y_{i} − med_{j}(y_{j})|   (7.21)
where 1.486 is a correction factor to guarantee that the MAD is a consistent scale estimator at the normal distribution. The MAD estimator for the periodicity factor of r̄_{t,i} equals
f̂_{t,i}^{MAD} = MAD_{t,i} ∕ ((1∕M)∑_{j=1}^{M}MAD_{t,j}^{2})^{1∕2}   (7.22)
where MAD_{t,i} is the MAD of r̄_{1;t,i},…,r̄_{n_{t,i};t,i}.
Amongst the large number of robust scale estimators available in the literature (see Maronna, Martin, and Yohai 2006, for an overview), Boudt, Croux, and Laurent (2008b) also recommend the use of the Shortest Half scale estimator proposed by Rousseeuw and Leroy (1988), because it remains consistent in the presence of infinitesimal contamination by jumps in the data. Importantly, it has the property of being, among a wide class of scale estimators, the estimator for which jumps can cause the smallest maximum bias possible. Under normality, it has the same efficiency as the MAD and the interquartile range. A final advantage is that it is computationally convenient and does not need any location estimation. For the definition of the Shortest Half scale estimator, we need the corresponding order statistics r̄_{(1);t,i},…,r̄_{(n_{t,i});t,i} such that r̄_{(1);t,i} ≤ r̄_{(2);t,i} ≤ … ≤ r̄_{(n_{t,i});t,i}. The shortest half scale is the smallest length of all “halves” consisting of h_{t,i} = ⌊n_{t,i}∕2⌋ + 1 contiguous order statistics. These halves equal {r̄_{(1);t,i},…,r̄_{(h_{t,i});t,i}}, …, {r̄_{(n_{t,i}−h_{t,i}+1);t,i},…,r̄_{(n_{t,i});t,i}}, and their lengths are r̄_{(h_{t,i});t,i} − r̄_{(1);t,i}, …, r̄_{(n_{t,i});t,i} − r̄_{(n_{t,i}−h_{t,i}+1);t,i}, respectively. The corresponding scale estimator (corrected for consistency under normality) equals the minimum of these lengths:
ShortH_{t,i} = 0.741 · min{r̄_{(h_{t,i});t,i} − r̄_{(1);t,i},…, r̄_{(n_{t,i});t,i} − r̄_{(n_{t,i}−h_{t,i}+1);t,i}}   (7.23)
The Shortest Half estimator for the periodicity factor of r̄_{t,i} equals
f̂_{t,i}^{ShortH} = ShortH_{t,i} ∕ ((1∕M)∑_{j=1}^{M}ShortH_{t,j}^{2})^{1∕2}   (7.24)
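Both robust scale estimators are simple to implement. The following Python sketch (illustrative, not the G@RCH code) computes the MAD and the Shortest Half scale and contrasts them with the standard deviation on data contaminated by one large jump:

```python
import numpy as np

def mad(y):
    """Median absolute deviation, scaled (x 1.486) for consistency
    with the standard deviation at the normal distribution."""
    y = np.asarray(y)
    return 1.486 * np.median(np.abs(y - np.median(y)))

def shorth_scale(y):
    """Shortest-half scale: the smallest length among all windows of
    floor(n/2) + 1 consecutive order statistics, scaled by 0.741."""
    y = np.sort(np.asarray(y))
    n = y.size
    h = n // 2 + 1
    # lengths of all contiguous "halves" [y_(i), y_(i+h-1)]
    return 0.741 * np.min(y[h - 1:] - y[:n - h + 1])

rng = np.random.default_rng(0)
z = rng.standard_normal(10_000)
z_jump = np.concatenate([z, [50.0]])   # contaminate with one large "jump"
```

On the clean sample both estimators are close to 1 (the true scale); adding the jump inflates the standard deviation markedly while leaving the MAD and shortest-half estimates essentially unchanged.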
The shortest half dispersion is highly robust to jumps, but it has only a 37% efficiency under normality of the r̄_{t,i}'s. Boudt, Croux, and Laurent (2008b) show that a better trade-off between the efficiency of the standard deviation under normality and the high robustness to jumps of the shortest half dispersion is offered by the standard deviation applied to the returns weighted as a function of their outlyingness under the ShortH estimate, i.e.
f̂_{t,i}^{WSD} = WSD_{t,i} ∕ ((1∕M)∑_{j=1}^{M}WSD_{t,j}^{2})^{1∕2}   (7.25)
where
WSD_{t,i} = (1.081 · ∑_{j=1}^{n_{t,i}}w_{j;t,i}r̄_{j;t,i}^{2} ∕ ∑_{j=1}^{n_{t,i}}w_{j;t,i})^{1∕2}.
Because the weighting is applied to the squared standardized returns, which are extremely large in the presence of jumps, Boudt, Croux, and Laurent (2008b) recommend the use of hard rejection with threshold equal to the 99% quantile of the χ^{2} distribution with one degree of freedom, i.e.
w_{j;t,i} = 1 if (r̄_{j;t,i}∕f̂_{t,i}^{ShortH})^{2} ≤ 6.635 and 0 otherwise.   (7.26)
The factor 1.081 is needed to ensure consistency of the estimator under normality. The Weighted Standard Deviation (WSD) in (7.25) has a 69% efficiency under normality of the r̄_{t,i}'s.
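A Python sketch of the WSD computation (illustrative only; in practice the preliminary scale would be the ShortH-based periodicity estimate, here replaced by an assumed known scale for simplicity):

```python
import numpy as np

def wsd(y, scale0):
    """Weighted standard deviation with hard rejection: observations whose
    squared value, standardized by the preliminary robust scale `scale0`,
    exceeds 6.635 (the 99% quantile of chi-square(1)) get weight zero.
    The factor 1.081 makes the estimator consistent under normality."""
    y = np.asarray(y)
    w = ((y / scale0) ** 2 <= 6.635).astype(float)
    return np.sqrt(1.081 * np.sum(w * y ** 2) / np.sum(w))

rng = np.random.default_rng(0)
z = rng.standard_normal(10_000)
z_jump = np.concatenate([z, [50.0]])   # one huge "jump"
s0 = 1.0   # preliminary scale; in the text this comes from the ShortH filter
```

On the clean sample the estimate is close to 1, and the contaminating jump is simply given weight zero, leaving the estimate essentially unchanged.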
Note that when the dataset is 'dated', the length of the periodicity filter equals 5 days; otherwise it is equal to one day.
The non parametric estimators for the periodic component of intraday volatility use only the subset of the data for which the returns have the same periodicity factor. Andersen and Bollerslev (1997) show that more efficient estimates can be obtained if the whole time series dimension of the data is used for the estimation of the periodicity process. Andersen and Bollerslev (1997) use the result that, under the assumption that returns are not affected by jumps (i.e. the BSM model), r_{t,i} ≈ f_{t,i}s_{t,i}u_{t,i}, where u_{t,i} ~ N(0,1), and so log|r_{t,i}∕s_{t,i}| ≈ log f_{t,i} + log|u_{t,i}|, which allows one to isolate f_{t,i} as follows,
log|r̄_{t,i}| − c = log f_{t,i} + ε_{t,i}   (7.27)
where the error term ε_{t,i} is i.i.d. distributed with mean zero and has the density function of the centered absolute value of the log of a standard normal random variable, i.e.
g(z) = (2∕π)^{1∕2} exp(z + c − 0.5 exp(2(z + c)))   (7.28)
The parameter c = −0.63518 equals the mean of the log of the absolute value of a standard normal random variable. Andersen and Bollerslev (1997) then propose to model log f_{t,i} as a function h of a vector of variables x (such as sinusoid and polynomial transformations of the time of the day) that is linear in the parameter vector θ
log f_{t,i} = h(x_{t,i};θ) = x_{t,i}′θ   (7.29)
Combining (7.27) with (7.29), we obtain the following regression equation
log|r̄_{t,i}| − c = x_{t,i}′θ + ε_{t,i}   (7.30)
It is common to estimate the parameter θ in (7.30) by the OLS estimator. This approach is neither efficient nor robust, since the error terms are not normally distributed. Denote the loss functions of the OLS and Maximum Likelihood (ML) estimators by ρ^{OLS}(z) = z^{2} and by
ρ^{ML}(z) = −log g(z) = 0.5 log(π∕2) − (z + c) + 0.5 exp(2(z + c))   (7.31)
where ε_{t,i} is a function of θ (see Equation (7.30)).
Boudt, Croux, and Laurent (2008b) find that the ML estimator has a large bias in the presence of jumps. The nonrobustness of the OLS and ML estimators to jumps is due to the unbounded effect an observation can have on their loss function. These loss functions are plotted in Figure 7.9. As mentioned by Martens, Chang, and Taylor (2002), the effect of jumps on the OLS and ML estimators is attenuated because the regression is based on the log of the standardized returns.
As an alternative to the OLS and ML estimators, Boudt, Croux, and Laurent (2008b) propose to use the Truncated Maximum Likelihood (TML) estimator introduced by Marazzi and Yohai (2004). This estimator gives a zero weight to observations that are outliers according to the value of the ML loss function evaluated at the corresponding residual computed under the robust non parametric estimator f̂^{WSD} in (7.25). Let
e_{t,i}^{WSD} = log|r̄_{t,i}| − c − log f̂_{t,i}^{WSD}.   (7.32)
Observations for which ρ^{ML}(e_{t,i}^{WSD}) is large have a low likelihood and are therefore likely to be outliers (Marazzi and Yohai, 2004). Denote by q an extreme upper quantile of the distribution of ρ^{ML}(ε_{t,i}). The TML estimator is defined as
θ̂^{TML} = argmin_{θ}∑_{t=1}^{T}∑_{i=1}^{M}w_{t,i}ρ^{ML}(ε_{t,i})   (7.33)
with w_{t,i} = 1 if ρ^{ML}(e_{t,i}^{WSD}) ≤ q and 0 otherwise.
Like for the non parametric periodicity estimators, we impose that the squared periodicity factor has mean one in the local window. The parametric estimate for the periodicity factor thus equals
f̂_{t,i}^{TML} = exp(x_{t,i}′θ̂^{TML}) ∕ ((1∕M)∑_{j=1}^{M}exp(2x_{t,j}′θ̂^{TML}))^{1∕2}   (7.34)
and similarly for f̂_{t,i}^{OLS} and f̂_{t,i}^{ML}.
In this section we show how to use G@RCH 6.1 to estimate the periodicity filters presented above on simulated data.
The non parametric periodicity filters presented in Section 7.5.3 are implemented in the functions TaylorXu(...) and Robust_periodicity(...). Their use is illustrated in example file NP_periodicity_filters.ox. This example uses the simulated data obtained by running the files Store_Cont_GARCH_...ox available in the directory /packages/Realized/samples/. There are four different files simulating 5min (FX) returns from a continuoustime GARCH(1,1) diffusion with or without intraday periodicity in volatility and with or without jumps. The simulated data are then stored in the directory /packages/Realized/samples/data/.
Let us concentrate first on Store_Cont_GARCH_FFF.ox. The aim of this program is to generate 5-minute returns following a continuous-time stochastic volatility model without drift and with spot volatility σ(s) specified as a multiplicative process of the periodicity function f, which depends only on the time of day s − ⌊s⌋, and a GARCH diffusion process (see Equation 7.9), i.e.
σ(s) = f(s − ⌊s⌋)σ_{garch}(s)   (7.35)
where s − ⌊s⌋ is the intraday time and log f(s − ⌊s⌋) = θ′x(s − ⌊s⌋).
σ_{garch}(s) is generated using the values θ = 0.035, ω = 0.636 and λ = 0.296. The function f(s − ⌊s⌋) used in the simulation is based on log f(s − ⌊s⌋) = θ′x(s) with x(s) a vector holding sinusoid transformations of s, i.e.
log f(s − ⌊s⌋) = ∑_{p=1}^{P}[γ_{p} cos(2πp(s − ⌊s⌋)) + δ_{p} sin(2πp(s − ⌊s⌋))]   (7.36)
where M = 1∕Δ, P is set to 4 and θ = (γ_{1},δ_{1},…,γ_{4},δ_{4}) = (0.24422;0.49756;0.054171;0.073907;0.26098;0.32408;0.11591;0.21442). In this case, the length of the periodicity cycle is one day and the periodicity is thus the same for all the days of the week. To simulate exchange rates, we choose Δ = 1∕2880, corresponding to 10 observations per 5-minute interval. The number of simulated days is 3010, but the first ten days are discarded, leaving a total of 3000 simulated days.
After running this file, the simulated data are automatically saved to ./data/Simulated_cont_GARCH_no_jumps_FFF.in7.
Figure 7.10 plots the true periodicity (dotted line) and the estimated periodicity (solid line) obtained by applying the four non parametric filters on 3000 days of 5min simulated returns (288 returns per day). This model being compatible with the BSM model, the four methods should provide good estimates of the periodicity. As expected, we see that the four methods provide excellent estimates of the periodicity. In a more comprehensive Monte Carlo study, Boudt, Croux, and Laurent (2008b) have shown that in the class of non parametric estimators, the SD is the most efficient estimator.
Figure 7.11 plots the true periodicity (dotted line) and the estimated periodicity (solid line) obtained by applying the three parametric filters on the same data (using the file P_periodicity_filters.ox). Table 7.1 reports the estimated values for the three methods.
As expected, we see that the three methods provide excellent estimates of the periodicity.

Boudt, Croux, and Laurent (2008b) also show that all parametric estimators are more precise than the non parametric estimators. For the parametric estimators, they find that the ML estimator is the most efficient. The TML estimator is found to be only slightly less efficient than the ML estimator and, as expected, the OLS estimator (Gaussian QML) is the least precise of all parametric estimators.
Let us now consider the case of a BSMJ model (see Equations (7.15)(7.18)) with the same GARCH parameters as before) with μ(s) = 0 and σ(s) specified as the same multiplicative process as in (7.35) (with the same parameter values as in the previous simulation).
The jump size κ(s) is modeled as the product of σ(s) and a uniformly distributed random variable on ([−2,−1] ∪ [1,2]). The parameter m determines the magnitude of the jumps. We set m equal to 0.3 (rather small jumps). Finally, the jump occurrences q(s) are specified as a Poisson process with on average K = 2 jumps per day. Importantly, occurrences of jumps are concentrated in the parts of the day when volatility is periodically very low (f < 0.45). Figure 7.12 plots the number of simulated jumps per intraday period of time. The simulated data, i.e. Simulated_cont_GARCH_jumps_FFF_K_2_M_0.3.in7, are obtained by running Store_Cont_GARCH_Jumps_FFF.ox with the options K = 2, M = 0.3 and threshold = 0.45.^{5}
Figure 7.13 plots the estimated periodicity obtained by applying the four non parametric filters on the simulated 5min returns (using NP_periodicity_filters.ox).
In this case the SD is expected to be inappropriate because of its non robustness to jumps. This is illustrated in Figure 7.13, where we see that Taylor and Xu (1997)’s periodicity filter (i.e. based on the SD) is very inaccurate while the three robust alternatives are as good as in the previous case of no jump. Again, in a more comprehensive Monte Carlo study, Boudt, Croux, and Laurent (2008b) show that the robust non parametric estimators are little affected by the inclusion of jumps in the price process.
Figure 7.14 plots the true periodicity (dotted line) and the estimated periodicity (solid line) obtained by applying the three parametric filters on the same data (using the file P_periodicity_filters.ox). Table 7.2 reports the estimated values for the three methods. The ML estimator is found to be extremely sensitive to jumps. The optimality of the ML estimator is thus restricted to the model without jumps. Since jumps do occur in practice, we do not recommend its use.

The robustness of the OLS estimator is surprising at first sight, but it corroborates Martens, Chang, and Taylor (2002)’s intuition that the log-transformation shrinks the outliers and makes estimators based on a regression of the log absolute returns more robust to jumps. However, while much better than the ML estimator, the OLS (or Gaussian QML) estimator is still quite affected by jumps, which is apparently not the case for the TML estimator (for which the fit is again almost perfect).
The main message of this simulation study is that the non parametric WSD and parametric TML estimators have a relatively high efficiency in the absence of jumps. If jumps are present in the process, they are the most accurate of all non parametric and parametric estimators considered, respectively. In all cases considered here, the TML estimator based on the correctly specified periodicity function is more efficient than the WSD estimator. While the advantage of the TML over the WSD is clear when the functional form of the periodicity is known, its major drawbacks are that this functional form is not known a priori in practical applications and its estimation is much more time consuming.
In this section, we use the dataset described in Section 7.5.1, i.e. 5-min returns of the Euro – US dollar (EUR-USD) exchange rate from January 2000 to October 2004.
To model the periodic component, we use a slightly more general specification than the one used in the previous simulation studies, i.e.
 (7.37) 
where M = 1∕Δ, P is set to 4, and N_{1} = (M + 1)∕2 and N_{2} = (2M^{2} + 3M + 1)∕6 are normalizing constants. Note that Andersen and Bollerslev (1997) originally used the normalizing constant N_{2} = (M + 1)(M + 2)∕6. However, the analytic solution for the sum of squares of the first M integers is ∑ _{k=1}^{M}k^{2} = (2M^{3} + 3M^{2} + M)∕6, which leads to the value reported above when computing the mean instead of the sum.
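The relation between the two normalizing constants is easy to verify numerically. The following snippet is an illustrative check in Python (not part of G@RCH, which is written in Ox):

```python
# Illustrative check (not part of G@RCH): N1 and N2 are the means of k and
# k^2 over k = 1, ..., M, here with M = 288 (5-min sampling).
M = 288
N1 = (M + 1) / 2
N2 = (2 * M**2 + 3 * M + 1) / 6

assert N1 == sum(range(1, M + 1)) / M
assert N2 == sum(k**2 for k in range(1, M + 1)) / M

# Andersen and Bollerslev's original constant (M + 1)(M + 2)/6 differs:
print(N2, (M + 1) * (M + 2) / 6)
```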
The smooth periodicity generated by the Fourier terms is unlikely to cope well with sharp drops in volatility (for instance around lunch in the Far East) and with day-of-the-week dependencies. To fill this gap, the term ∑ _{k=1}^{D}λ_{k}I(k)_{t,i} is included, where I(k)_{t,i} is an indicator variable for event k during interval i of day t. The events may be calendar effects (day-of-the-week dummies, etc.) as well as announcement effects. For instance, to account for announcement effects we have included two dummies that equal one on Fridays at 8:35 and 8:40 EST, respectively.^{6}
Figure 7.15 plots f̂ _{t,i}^{SD}, f̂ _{t,i}^{ShortH} and f̂ _{t,i}^{WSD} from Monday to Friday (i.e. 288 × 5 values) while Figure 7.16 plots f̂ _{t,i}^{OLS}, f̂ _{t,i}^{ML} and f̂ _{t,i}^{TML}. These graphs have been obtained by running the example file Periodicity.ox. Note that for ease of comparison between non parametric and parametric methods, Figure 7.16 also plots f̂ _{t,i}^{WSD} on each graph.
We see that while not being radically different, the three non parametric estimates are not equivalent. For instance, the periodicity factor is found to be much higher on Friday at 8:35 (peak around observation 1350) for the non-robust f̂ _{t,i}^{SD} estimator than for the robust ones.
The same comment applies to the parametric estimates: f̂ _{t,i}^{ML} and f̂ _{t,i}^{OLS} seem to overestimate the periodicity on that particular interval. Looking at Table 7.3 we see that the estimated coefficient for the ‘Friday 8:35’ dummy is about 1, 1.3 and 0.55 for the OLS, ML and TML methods, respectively.
Note also that while the parametric and non parametric estimates are very close to each other, the specification chosen for the parametric methods does not fully capture the intraday periodicity and would need further refinement.

We have seen in Section 7.4 that in the absence of jumps, realized volatility is a consistent estimator of integrated volatility. However, in the presence of jumps it is no longer consistent. Furthermore, it has been shown in the literature (see Andersen, Bollerslev, and Diebold, 2007) that jumps are much less persistent (and predictable) than the continuous sample path, or integrated volatility, component of the quadratic variation process. It is thus useful to separate the two components of the quadratic variation process. This was first done by Barndorff-Nielsen and Shephard (2004b) through the use of the so-called BiPower Variation (BPVar). In this section we also review an alternative to the BPVar, namely the Realized Outlyingness Weighted Variance (ROWVar), recently proposed by Boudt, Croux, and Laurent (2008a).
Barndorff-Nielsen and Shephard (2004b) show that, for a subclass of BSMJ price diffusions (i.e. BSM with Finite Activity Jumps), the normalized sum of products of the absolute values of contiguous returns (i.e. the bipower variation) is a consistent estimator of the IVar.
The BPVar (implemented in the procedure Compute_BV()) is defined as:
BV _{t}(Δ) ≡ μ_{1}^{−2}∑ _{i=2}^{M}|r_{t,i}||r_{t,i−1}|  (7.38)
where μ_{1} ≡ (2∕π)^{1∕2} ≃ 0.79788 and M = 1∕Δ.
Unlike the RVar, the BPVar is designed to be robust to jumps because its building block is the product between two consecutive returns instead of the squared return. If one of the returns corresponds to a jump and the next one follows the BSM diffusion process, then their product has a small impact on the BPVar, which is the sum of many such building blocks. If the jump process has finite activity^{7}, then a.s. jumps cannot affect two contiguous returns as Δ → 0 (or equivalently M → ∞) and the jump process has a negligible impact on the probability limit of the BPVar, which coincides with the IVar. Under the BSM with Finite Activity Jumps (BSMFAJ) one has
plim _{Δ→0}BV _{t}(Δ) = ∫ _{t−1}^{t}σ^{2}(s)ds = IV _{t}  (7.39)
To illustrate the applicability of the above results, let us consider Figure 7.17, obtained by running the example file Simul_Continuous_GARCH_Jumps_1.ox. The DGP is a continuous-time GARCH(1,1) with on average one jump every 5 days (with m = 0.5).
Looking at Figure 7.17, we see that, unlike RV _{t}(Δ) (see Figure 7.6), BV _{t}(Δ) remains a good estimate of the integrated volatility (IV _{t}) in the presence of jumps.
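The robustness of the BPVar to a single large jump is easy to reproduce outside G@RCH. The following Python sketch (illustrative only; G@RCH itself provides the Ox procedure Compute_BV()) simulates one day of 5-min returns with constant volatility, contaminates one return with a jump, and compares the two estimators; σ = 0.001 and the jump size 0.05 are arbitrary choices:

```python
import math
import random

def realized_var(r):
    # RVar: sum of squared intraday returns
    return sum(x * x for x in r)

def bipower_var(r):
    # BPVar: mu1^-2 times the sum of products of contiguous absolute returns
    mu1 = math.sqrt(2 / math.pi)       # E|Z| for Z ~ N(0,1), about 0.79788
    return mu1 ** -2 * sum(abs(a) * abs(b) for a, b in zip(r[1:], r[:-1]))

random.seed(42)
M = 288                                # 5-min returns per day
sigma = 0.001                          # constant spot volatility per interval
r = [random.gauss(0, sigma) for _ in range(M)]
iv = M * sigma ** 2                    # integrated variance of this simulated day

r[100] += 0.05                         # contaminate one return with a jump
rv, bv = realized_var(r), bipower_var(r)
# rv absorbs the squared jump, while bv stays close to iv
```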
For the impact of jumps on the BPVar to be negligible, the high-frequency returns need to be observed over extremely short time intervals (Δ → 0), so that jumps do not affect two contiguous returns and their effect on the price process is not smeared out over longer intervals. When returns are computed over longer time intervals such as 5 or 15 minutes, these conditions may be violated. Furthermore, the BPVar is sensitive to the occurrence of “zero” returns in the sample.^{8} The multivariate extension of the BPVar has the additional disadvantage that it is not affine equivariant and not necessarily positive semidefinite, which are undesirable properties for covolatility estimators. For these reasons, Boudt, Croux, and Laurent (2008a) have proposed an alternative to the BPVar that is robust to jumps affecting two or more contiguous returns. This estimator, called the Realized Outlyingness Weighted Variance (ROWVar), is also more efficient than the BPVar under the BSM model and its multivariate extension has attractive properties.
The computation of the ROWVar proceeds as follows.
Step 1: Estimation of local outlyingness
Boudt, Croux, and Laurent (2008a) measure the outlyingness d_{t,i} of return r_{t,i} as the square of the robustly studentized return:^{9}
d_{t,i} = (r_{t,i}∕σ̂_{t,i})^{2}  (7.40)
where σ̂_{t,i} is a robust estimate of the instantaneous volatility computed from all the returns belonging to the same local window as r_{t,i}. In Boudt, Croux, and Laurent (2008a), σ̂_{t,i} is the Median Absolute Deviation about the median (MAD)^{10} of all the returns belonging to the same local window, which is here equal to one day.^{11}
This local-window approach presupposes that volatility is approximately constant within the window. Because of the presence of a strong intraday periodicity in volatility (see Section 7.5.3), this assumption does not hold when the length of the window is greater than a few minutes. Recall that, following Andersen and Bollerslev (1997, 1998b), Andersen, Bollerslev, and Dobrev (2007) and Lee and Mykland (2008), G@RCH uses local windows of length M, i.e. one day.
For this reason, in their empirical application, Boudt, Croux, and Laurent (2008b) propose to compute d_{t,i} on filtered returns r_{t,i}∕f̂_{t,i} instead of the raw returns r_{t,i}, where f̂_{t,i} is any of the robust periodicity filters presented in Section 7.5.3.
For practical reasons, G@RCH only considers non parametric estimates for f̂_{t,i}, i.e. f̂_{t,i}^{MAD}, f̂_{t,i}^{ShortH} and f̂_{t,i}^{WSD}. Note that when the dataset is ’dated’, the length of the periodicity filter is set to 5 days, otherwise it is equal to one day.
Step 2: Computation of ROWVar
Once one has such an outlyingness measure d_{t,i} for all the high-frequency returns in the interval [t − 1,t], a weight function has to be chosen. This weight function equals one for returns whose outlyingness d_{t,i} does not raise any suspicion that the corresponding return r_{t,i} might be affected by jumps, and goes to 0 the more extreme d_{t,i} is. Popular weight functions are the Hard Rejection (HR) weight function
w_{HR}(z) = I(z ≤ k)  (7.41)
and the Soft Rejection (SR) Huber weight function
w_{SR}(z) = min(1, k∕z)  (7.42)
with k a tuning parameter to be selected. Under the BSM model and some weak assumptions stated in Boudt, Croux, and Laurent (2008a), the outlyingness measure is asymptotically chi-squared distributed with 1 degree of freedom (χ_{1}^{2}). The outlyingness of a return affected by a jump will then be in the extreme tails of the χ_{1}^{2} distribution. Consequently, for the ROWVar the rejection threshold k can be set to χ_{1}^{2}(β), the β quantile of the χ_{1}^{2} distribution function. Boudt, Croux, and Laurent (2008a) strongly recommend using the SR weight function with β = 95% as a good compromise between robustness and efficiency.
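The threshold and the two weight functions can be sketched in a few lines of Python (illustrative; the function names are ours, not G@RCH’s). Since a χ_{1}^{2} variable is a squared standard normal, the χ_{1}^{2}(β) quantile follows from the normal quantile:

```python
from statistics import NormalDist

def chi2_1_quantile(beta):
    # chi-squared(1) quantile: chi2_1 = Z^2 for standard normal Z
    z = NormalDist().inv_cdf((1 + beta) / 2)
    return z * z

def w_hr(d, k):
    # hard rejection: discard the return entirely once d exceeds k
    return 1.0 if d <= k else 0.0

def w_sr(d, k):
    # soft rejection (Huber): weights decay smoothly as k/d beyond k
    return 1.0 if d <= k else k / d

k95 = chi2_1_quantile(0.95)    # about 3.84
```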
Given a series of high-frequency returns, their estimated outlyingness and a weight function, one can then compute the ROWVar as follows:
ROWVar_{t}(Δ) = c_{w} (∑ _{i=1}^{M}w(d_{t,i})r_{t,i}^{2})∕((1∕M)∑ _{i=1}^{M}w(d_{t,i}))  (7.43)
The correction factor c_{w} in (7.43) ensures that the ROWVar is consistent for the IVar under the BSM model. Table 7.4 (see below) reports the correction factors for the hard and soft rejection weight functions for several values of the critical level β.

Let us now compute the ROWVar on the same simulated data as in the previous section. Figure 7.18, obtained by running the example file Simul_Continuous_GARCH_Jumps_2.ox, suggests that ROWVar_{t}(Δ) also does a good job in estimating the integrated volatility (IV _{t}) in the presence of jumps.
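The two steps can be sketched as follows in Python (illustrative only: G@RCH works on filtered returns and applies the consistency factor c_{w} of Table 7.4, which is omitted here, i.e. set to one; the 1∕0.6745 rescaling makes the MAD consistent for the standard deviation at the normal distribution):

```python
import random
from statistics import NormalDist, median

def rowvar(returns, beta=0.95):
    # Step 1: outlyingness d_i = (r_i / sigma_hat)^2, with sigma_hat the
    # rescaled MAD of the returns in the local window (here: the whole day)
    m = median(returns)
    sigma_hat = median(abs(x - m) for x in returns) / 0.6745
    d = [(x / sigma_hat) ** 2 for x in returns]
    # Step 2: soft-rejection (Huber) weights with threshold chi2_1(beta)
    k = NormalDist().inv_cdf((1 + beta) / 2) ** 2
    w = [1.0 if di <= k else k / di for di in d]
    M = len(returns)
    return sum(wi * x * x for wi, x in zip(w, returns)) / (sum(w) / M)

random.seed(1)
sigma = 0.001
r = [random.gauss(0, sigma) for _ in range(288)]
r[50] += 0.05                      # one jump
iv = 288 * sigma ** 2              # integrated variance of the simulated day
est = rowvar(r)                    # close to iv despite the jump
```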
A more formal comparison between bipower variation and ROWVar will be presented in Section 7.7.
Two attractive robust estimators of the integrated volatility have recently been proposed by Andersen, Dobrev, and Schaumburg (2008). The MinRV and MedRV (implemented in the procedures Compute_MinRV() and Compute_MedRV()) are defined as follows:
MinRV _{t}(Δ) ≡ μ_{2} (M∕(M − 1))∑ _{i=2}^{M}min(|r_{t,i}|,|r_{t,i−1}|)^{2}  (7.44)
MedRV _{t}(Δ) ≡ μ_{3} (M∕(M − 2))∑ _{i=2}^{M−1}med(|r_{t,i−1}|,|r_{t,i}|,|r_{t,i+1}|)^{2}  (7.45)
where μ_{2} ≡ π∕(π − 2), μ_{3} ≡ π∕(6 − 4√3 + π), min stands for minimum and med for median. The MedRV is meant to be less exposed to zero returns than the BPVar.
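A Python sketch of the two estimators (illustrative; the finite-sample factors M∕(M − 1) and M∕(M − 2) follow Andersen, Dobrev, and Schaumburg (2008), and the function names are ours):

```python
import math
import random

def min_rv(r):
    # MinRV: scaled sum of squared pairwise minima of adjacent absolute returns
    M = len(r)
    c = math.pi / (math.pi - 2)
    return c * M / (M - 1) * sum(min(abs(r[i - 1]), abs(r[i])) ** 2
                                 for i in range(1, M))

def med_rv(r):
    # MedRV: scaled sum of squared medians of three adjacent absolute returns
    M = len(r)
    c = math.pi / (6 - 4 * math.sqrt(3) + math.pi)
    s = sum(sorted((abs(r[i - 1]), abs(r[i]), abs(r[i + 1])))[1] ** 2
            for i in range(1, M - 1))
    return c * M / (M - 2) * s

random.seed(7)
sigma = 0.001
r = [random.gauss(0, sigma) for _ in range(288)]
iv = 288 * sigma ** 2
r[100] += 0.05     # a jump affects one return only
# both estimators discard the jump through the min/med operation
```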
Evidently, the difference between RV _{t}(Δ) and any robust-to-jumps estimator of IV _{t}, denoted ÎV _{t}(Δ), is an estimate of the jump contribution, or realized jumps.
More formally,
Ĵ_{t}(Δ) ≡ RV _{t}(Δ) − ÎV _{t}(Δ)  (7.46)
where ÎV _{t}(Δ) is for instance BV _{t}(Δ) or ROWVar_{t}(Δ).
Based on the theoretical results of Barndorff-Nielsen and Shephard (2006) that
Δ^{−1∕2}(ÎV _{t}(Δ) − IV _{t}) → N(0, θ IQ_{t})  (7.47)
Andersen, Bollerslev, and Diebold (2007) have developed a formal test for (daily) jumps, i.e.
Z_{t}(Δ) = Δ^{−1∕2} ((RV _{t}(Δ) − ÎV _{t}(Δ))∕RV _{t}(Δ)) ∕ ((θ − 2) max(1, ÎQ_{t}(Δ)∕ÎV _{t}(Δ)^{2}))^{1∕2}  (7.48)
where ÎQ_{t}(Δ) is a robust-to-jumps estimate of the integrated quarticity IQ_{t} ≡ ∫ _{t−1}^{t}σ^{4}(s)ds.
Andersen, Bollerslev, and Diebold (2007) consider the case ÎV _{t}(Δ) = BV _{t}(Δ) and use the so-called Tripower Quarticity TQ_{t}(Δ) to estimate IQ_{t}, where
TQ_{t}(Δ) = M μ_{4∕3}^{−3}∑ _{i=3}^{M}|r_{t,i}|^{4∕3}|r_{t,i−1}|^{4∕3}|r_{t,i−2}|^{4∕3}  (7.49)
with μ_{4∕3} ≡ 2^{2∕3}Γ(7∕6)Γ(1∕2)^{−1}.
Another popular estimator for IQ_{t}, in the spirit of the bipower (or multipower) variation, is the Quadpower quarticity QQ_{t}(Δ), i.e.
QQ_{t}(Δ) = M μ_{1}^{−4}∑ _{i=4}^{M}|r_{t,i}||r_{t,i−1}||r_{t,i−2}||r_{t,i−3}|  (7.50)
When ÎV _{t}(Δ) = BV _{t}(Δ), θ = μ_{1}^{−4} + 2μ_{1}^{−2} − 3 ≈ 2.609. Note that TQ_{t}(Δ) and QQ_{t}(Δ) are implemented in the procedures Compute_TQ() and Compute_QQ(), respectively.
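The following Python sketch assembles the daily jump test (illustrative and based on our reading of the ratio form of the statistic in Andersen, Bollerslev, and Diebold (2007), with variance constant θ − 2 ≈ 0.609 for the BPVar and Δ = 1∕M; G@RCH provides the corresponding Ox procedures):

```python
import math
import random

def bipower_var(r):
    mu1 = math.sqrt(2 / math.pi)
    return mu1 ** -2 * sum(abs(a) * abs(b) for a, b in zip(r[1:], r[:-1]))

def tripower_quarticity(r):
    # TQ: scaled sum of products of three adjacent absolute returns^(4/3)
    M = len(r)
    mu43 = 2 ** (2 / 3) * math.gamma(7 / 6) / math.gamma(1 / 2)
    s = sum((abs(r[i]) * abs(r[i - 1]) * abs(r[i - 2])) ** (4 / 3)
            for i in range(2, M))
    return M * mu43 ** -3 * s

def z_stat(r):
    # large positive values signal a jump (asymptotically N(0,1) under the null)
    M = len(r)
    mu1 = math.sqrt(2 / math.pi)
    theta = mu1 ** -4 + 2 * mu1 ** -2 - 3      # about 2.609 for the BPVar
    rv = sum(x * x for x in r)
    bv = bipower_var(r)
    tq = tripower_quarticity(r)
    return ((rv - bv) / rv) / math.sqrt((theta - 2) / M * max(1, tq / bv ** 2))

random.seed(3)
calm = [random.gauss(0, 0.001) for _ in range(288)]
jumpy = list(calm)
jumpy[150] += 0.05
```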
The main drawback of TQ_{t}(Δ) and QQ_{t}(Δ) is that, like BV _{t}(Δ), they are downward biased in the presence of zero returns. Furthermore, the bias is expected to be even larger in finite samples than for BV _{t}(Δ) because they are made up of products of three and four consecutive absolute returns, respectively.
To overcome this problem, Boudt, Croux, and Laurent (2008a) have proposed to replace ÎV _{t}(Δ) in Equation (7.48) by ROWVar_{t}(Δ) and ÎQ_{t}(Δ) by the Realized Outlyingness Weighted Quarticity
 (7.51) 
where w(.) is the hard rejection weight function. The correction factor d_{w} and the asymptotic variance θ of the ROWVar are reported in Table 7.4 for several choices of the critical level β (used to obtain the outlyingness threshold k).
In the spirit of their MinRV and MedRV estimators, Andersen, Dobrev, and Schaumburg (2008) propose two alternative robust estimators of the integrated quarticity, namely the MinRQ and MedRQ. The formulas are given below (and implemented in the procedures Compute_MinRQ() and Compute_MedRQ()):
MinRQ_{t}(Δ) ≡ μ_{4} (M^{2}∕(M − 1))∑ _{i=2}^{M}min(|r_{t,i}|,|r_{t,i−1}|)^{4}  (7.52)
MedRQ_{t}(Δ) ≡ μ_{5} (M^{2}∕(M − 2))∑ _{i=2}^{M−1}med(|r_{t,i−1}|,|r_{t,i}|,|r_{t,i+1}|)^{4}  (7.53)
where μ_{4} ≡ π∕(3π − 8) and μ_{5} ≡ 3π∕(9π + 72 − 52√3). Note also that Andersen, Dobrev, and Schaumburg (2008) show that both the MinRV and MedRV satisfy (7.47), where θ equals 3.81 for the former and 2.96 for the latter (the MedRV being asymptotically more efficient than the MinRV in the absence of jumps).
Barndorff-Nielsen and Shephard (2006) advocated the use of a log version of the Z_{t} statistic. According to them, the following statistic
logZ_{t} = Δ^{−1∕2} (log RV _{t}(Δ) − log ÎV _{t}(Δ)) ∕ ((θ − 2) ÎQ_{t}(Δ)∕ÎV _{t}(Δ)^{2})^{1∕2}  (7.54)
has better finite sample properties.
They also proposed a max version of logZ_{t}, denoted maxlogZ_{t}, where
maxlogZ_{t} = Δ^{−1∕2} (log RV _{t}(Δ) − log ÎV _{t}(Δ)) ∕ ((θ − 2) max(1, ÎQ_{t}(Δ)∕ÎV _{t}(Δ)^{2}))^{1∕2}  (7.55)
Under the null of no jump on day t, Z_{t}, logZ_{t} and maxlogZ_{t} are asymptotically (as Δ → 0) standard normal. The sequences {Z_{t}(Δ)}_{t=1}^{T}, {logZ_{t}}_{t=1}^{T} and {maxlogZ_{t}}_{t=1}^{T} provide evidence on the daily occurrence of jumps in the price process.
The outlyingness measure d_{t,i} can also be used to build a statistic for daily jump detection. Indeed, under the null of no jump during day t, d_{t,i} ~ χ^{2}(1), ∀i = 1,…,M (see Equation 7.40). In the spirit of the test proposed by Lee and Mykland (2008) (see next section), we can show that (max _{i=1,…,M}d_{t,i}^{1∕2} − C_{n})∕S_{n} follows a Gumbel distribution under the null. More specifically, we reject the null of no jump during day t at the α% critical level if
max _{i=1,…,M}d_{t,i}^{1∕2} > G^{−1}(1 − α)S_{n} + C_{n},
where G^{−1}(1 − α) is the 1 − α quantile function of the standard Gumbel distribution, C_{n} = (2 log n)^{0.5} − (log π + log (log n))∕(2(2 log n)^{0.5}) and S_{n} = 1∕(2 log n)^{0.5}. When n = M or n = MT, the expected number of spurious (daily) detected jumps respectively equals αT and α.
If I_{t,α}(Δ) is a dummy variable that equals 1 if a jump has been detected on day t at the α (e.g. 0.999) significance level (using any of the tests presented above) and 0 otherwise, a better estimate of the realized jumps is given by:
J_{t,α}(Δ) ≡ I_{t,α}(Δ) [RV _{t}(Δ) − ÎV _{t}(Δ)]  (7.56)
To make sure that the jump component added to the continuous one equals the realized volatility, we define the following estimator of the integrated variance:
C_{t,α}(Δ) ≡ I_{t,α}(Δ) ÎV _{t}(Δ) + (1 − I_{t,α}(Δ)) RV _{t}(Δ)  (7.57)
Note that when α = 0, C_{t,α}(Δ) = ÎV _{t}(Δ), ∀t.
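The resulting decomposition always adds back to the realized variance by construction. A minimal Python sketch (assuming the shrinkage form described above, with jump_flag playing the role of I_{t,α}(Δ)):

```python
def split_rv(rv, iv_hat, jump_flag):
    # jump component only on days flagged by the test; continuous part is the rest
    j = jump_flag * (rv - iv_hat)
    c = rv - j
    return c, j

# a flagged day: the robust estimator iv_hat is kept as the continuous part
assert split_rv(3.0, 2.0, 1) == (2.0, 1.0)
# an unflagged day: all of rv is attributed to the continuous component
assert split_rv(3.0, 2.0, 0) == (3.0, 0.0)
```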
So far, we have considered the estimation of functions of jumps over given time intervals. The approach taken in this section is different. We explain how we can test whether any given intraday return r_{t,i} is from a purely continuous diffusion or is rather due to a jump in the price process. What follows is mainly based on the work of Lee and Mykland (2008) and Boudt, Croux, and Laurent (2008b).
The idea of Lee and Mykland (2008) for jump estimation is intuitive. If a return contains a jump component, it should be abnormally big. However, what constitutes an abnormally big return depends on the volatility condition prevailing at the time of the tested return. Indeed, in times of high volatility, an abnormal return is bigger than an abnormal return in times of low volatility. Hence, Lee and Mykland (2008) study the properties of the ratio of the tested return over a measure of local volatility. They derive asymptotic theory for the statistic and a rejection region under the null of no jump at the tested time. In short, they propose a powerful, parsimonious methodology that allows one to test whether any return contains a jump component, and to determine the sign of the jumps and their exact timing.
The statistic J_{t,i} tests whether a jump occurred between intradaily time periods i − 1 and i of day t. It is defined as the absolute return divided by an estimate of the local standard deviation σ̂_{t,i}, i.e.
J_{t,i} ≡ |r_{t,i}|∕σ̂_{t,i}  (7.58)
Under the null of no jump at the testing time, provided that the process belongs to the family of BSMJ models described in Equation (7.14) and for a suitable choice of the window size for local volatility, J_{t,i} asymptotically follows the distribution of the absolute value of a standard normal variable.
Note that Lee and Mykland (2008) recommend replacing σ̂_{t,i} by ŝ_{t,i} = (BV _{t}∕M)^{1∕2}, where BV _{t} is the bipower variation computed on all the intraday returns of day t. Boudt, Croux, and Laurent (2008b) propose to account for the strong periodicity in volatility and show that replacing σ̂_{t,i} by f̂_{t,i}ŝ_{t,i} is more appropriate. They show that ignoring periodic volatility patterns leads to spurious jump identification. Indeed, the original Lee/Mykland statistic (which neglects the periodicity) tends to overdetect (underdetect) jumps in periods of high (low) intraday periodic volatility. G@RCH 6.1 gives the choice between the three robust non parametric estimation methods described in Section 7.5.3 to estimate f̂_{t,i}, i.e. f̂_{t,i}^{MAD}, f̂_{t,i}^{ShortH} and f̂_{t,i}^{WSD}. Recall that when the dataset is ‘dated’, the length of the periodicity filter is set to 5 days, otherwise it is equal to one day.
Under the null of no jump and with a consistent estimate σ̂_{t,i}, J_{t,i} follows the same distribution as the absolute value of a standard normal variable. Brownlees and Gallo (2006) propose comparing J_{t,i} with the 1 − α∕2 quantile of the standard normal distribution. This rule might spuriously detect many jumps, however. Andersen, Bollerslev, and Dobrev (2007) use a Bonferroni correction to minimize spurious jump detection. To minimize the risk of falsely finding jumps, Lee and Mykland (2008) propose inferring jumps from a conservative critical value, which they obtain from the distribution of the statistic’s maximum over the sample size. If the statistic exceeds a plausible maximum, one rejects the null of no jump. Under the stated assumptions and no jump in the interval (i − 1,i) of day t, when Δ → 0 the sample maximum of the jump statistic J_{t,i} (suitably standardized) follows a Gumbel distribution. We reject the null of no jump if
(J_{t,i} − C_{n})∕S_{n} > G^{−1}(1 − α)  (7.59)
where G^{−1}(1 − α) is the 1 − α quantile function of the standard Gumbel distribution, C_{n} = (2 log n)^{0.5} − (log π + log (log n))∕(2(2 log n)^{0.5}) and S_{n} = 1∕(2 log n)^{0.5}. When n = 1, the test is similar to the one of Brownlees and Gallo (2006) in the sense that the expected number of spuriously detected jumps (under the null) can be extremely large, i.e. αMT. When n = M (i.e. the number of observations per day) or n = MT (i.e. the total number of observations), this number equals αT and α (i.e. ≈ 0), respectively. So if we choose a significance level of α = 0.0001, we reject the null of no jump at testing time if J_{t,i} > S_{n}β^{*} + C_{n}, with β^{*} such that P(ψ ≤ β^{*}) = exp(−e^{−β^{*}}) = 0.9999, i.e. β^{*} = −log(−log(0.9999)) = 9.21.
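The conservative critical value is cheap to compute. A Python sketch (illustrative; the function name is ours):

```python
import math

def lm_critical_value(n, alpha=0.0001):
    # C_n, S_n from the Gumbel limit of the maximum of n |N(0,1)| statistics
    c_n = math.sqrt(2 * math.log(n)) \
        - (math.log(math.pi) + math.log(math.log(n))) / (2 * math.sqrt(2 * math.log(n)))
    s_n = 1 / math.sqrt(2 * math.log(n))
    beta_star = -math.log(-math.log(1 - alpha))   # 1 - alpha Gumbel quantile
    return s_n * beta_star + c_n

beta_star = -math.log(-math.log(0.9999))          # about 9.21
threshold = lm_critical_value(288 * 1000)         # e.g. n = MT with M = 288, T = 1000
```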
The L&M test is implemented in the function Compute_LeeMykJump of the Realized class. An example is provided in the file Jumps_JumpDetection_LeeMykland.ox. In this example, we simulate a GARCH diffusion with jumps and intraday periodicity (see Section 7.4) and apply the L&M test.
The first graph in Figure 7.19 shows the simulated 5-min log-prices. The other graphs plot the simulated (plus) and detected (squares) jumps using the L&M test. Note that the presence of intraday periodicity is ignored in the second graph. The third graph corresponds to the filtered L&M test that implements the correction proposed by Boudt, Croux, and Laurent (2008b), i.e. replacing σ̂_{t,i} by f̂_{t,i}ŝ_{t,i}, where in this example we chose the WSD periodicity filter f̂_{t,i}^{WSD}.
We see that neglecting the presence of intraday periodicity has a strong impact on the power of the test. Indeed, while all the simulated jumps are correctly detected by the filtered L&M test, most jumps (i.e. the small ones) are not detected when the intraday periodicity is ignored. See Boudt, Croux, and Laurent (2008b) for a more comprehensive simulation study.
In the case where r_{t,i} is an N-dimensional return vector generated by the multivariate counterpart of the BSMJ price diffusion model in (7.14), the processes p(s), μ(s) and q(s) are all N-dimensional vector processes and w(s) is a vector of N independent Brownian motions. Denote by Ω(s) the N × N càdlàg process such that Σ(s) = Ω(s)Ω′(s) is the spot covariance matrix process of the continuous component of the price diffusion. Let K(s) be the N × N process controlling the magnitude and transmission of jumps, such that K(s)dq(s) is the contribution of the jump process to the price diffusion. We then have that an N-dimensional log-price diffusion can be decomposed as follows:
The integrated covariance matrix (ICov) over [t − 1,t] is the matrix
ICov_{t} ≡ ∫ _{t−1}^{t}Σ(s)ds  (7.61)
Denote by κ_{j} the contribution of the jth jump in [t − 1,t] to the price diffusion.
Andersen, Bollerslev, Diebold, and Labys (2003) have shown that the Realized quadratic covariation (RCov)
RCov_{t}(Δ) ≡ ∑ _{i=1}^{M}r_{t,i}r_{t,i}′  (7.62)
is a consistent estimator for the sum of the ICov and the realized jump variability
plim _{Δ→0}RCov_{t}(Δ) = ICov_{t} + ∑ _{j=1}^{j_{t}}κ_{j}κ_{j}′  (7.63)
where j_{t} = ∫ _{t−1}^{t}dq^{*}(s), with q^{*}(s) the univariate counting process derived from q(s), such that q^{*}(s) increases by 1 whenever q(s) changes.
For disentangling the continuous and jump components in the RCov, we need an additional estimator for the ICov that is robust to jumps. To this purpose, Barndorff-Nielsen and Shephard (2004a) introduce the Realized BiPower Covariation process (RBPCov) as the process whose value at time t is the N-dimensional square matrix with k,lth element equal to
(π∕8)∑ _{i=2}^{M}(|r_{(k)t,i} + r_{(l)t,i}||r_{(k)t,i−1} + r_{(l)t,i−1}| − |r_{(k)t,i} − r_{(l)t,i}||r_{(k)t,i−1} − r_{(l)t,i−1}|)  (7.64)
where r_{(k)t,i} is the kth component of the return vector r_{t,i}. The factor π∕8 ensures that the RBPCov converges to the ICov under the BSMFAJ model:
plim _{Δ→0}RBPCov_{t}(Δ) = ICov_{t}  (7.65)
Like the BPVar, the RBPCov is highly affected by jumps when they affect two contiguous returns. Moreover, being a multivariate scale estimator, the RBPCov has the additional disadvantage that it is not affine equivariant and not necessarily positive semidefinite, which are undesirable properties for covolatility estimators.
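The k,lth element can be read as a polarization of the univariate BPVar: one quarter of the BPVar of the sum of the two return series minus the BPVar of their difference. A Python sketch with a diagonal sanity check (illustrative; function names are ours):

```python
import math

def bipower_var(r):
    mu1 = math.sqrt(2 / math.pi)
    return mu1 ** -2 * sum(abs(a) * abs(b) for a, b in zip(r[1:], r[:-1]))

def rbpcov_element(rk, rl):
    # pi/8 * sum(|sum||lagged sum| - |difference||lagged difference|)
    s = 0.0
    for i in range(1, len(rk)):
        s += (abs(rk[i] + rl[i]) * abs(rk[i - 1] + rl[i - 1])
              - abs(rk[i] - rl[i]) * abs(rk[i - 1] - rl[i - 1]))
    return math.pi / 8 * s

# diagonal check: the k,k element collapses to the univariate BPVar
r = [0.001, -0.002, 0.0005, 0.003, -0.001]
assert abs(rbpcov_element(r, r) - bipower_var(r)) < 1e-15
```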
In the next subsection, we show that the inclusion in the RCov of an appropriate weight function that downweights extreme returns leads to an estimator with the following properties:
The computation of the ROWQCov is analogous to the computation of the ROWVar, but here r_{t,i} is a return vector of dimension N.
Step 1: Estimation of local multivariate outlyingness
We estimate the multivariate outlyingness of the return vector r_{t,i} by the Mahalanobis distance between r_{t,i} and 0 using a highly robust estimator of the multivariate scale S(r_{t,i}) of the returns belonging to the same local window as r_{t,i}.^{12} Formally, the outlyingness of r_{t,i} equals
d_{t,i} = r_{t,i}′[S(r_{t,i})]^{−1}r_{t,i}  (7.66)
Under the BSM model and some weak assumptions stated in Boudt, Croux, and Laurent (2008a), the Mahalanobis outlyingness measure is asymptotically chi-squared distributed with N degrees of freedom (χ_{N}^{2}), provided that the spot covariance matrix is constant over the local window.
For the same reasons as those evoked in Sections 7.6.2 and 7.8, the assumption of a constant covariance is not realistic for local windows of length one day. For that reason, the user is given the choice to compute d_{t,i} on raw returns (i.e. Equation (7.66)) or on filtered returns. The three non parametric techniques discussed in Section 7.5.3 are offered, i.e. f̂_{t,i}^{MAD}, f̂_{t,i}^{ShortH} or f̂_{t,i}^{WSD}. Note that the N periodicity filters are estimated separately.
A common choice for a highly robust multivariate scale estimator is the Minimum Covariance Determinant (MCD) covariance estimate proposed by Rousseeuw and Driessen (1999). It is defined as follows. We call a half-sample of a local window a subsample of h = ⌊0.5(M + N + 1)⌋ returns belonging to that local window. Denote by S the covariance matrix of the half-sample with the smallest covariance determinant.^{13} Then compute, for the ith return belonging to the same local window as r_{t,i}, the weight w_{i} = w_{HR}[r_{t,i}′(c_{0.5}S)^{−1}r_{t,i}]. The MCD multivariate scale estimator is then given by the weighted covariance matrix of all returns belonging to the local window
The scalars c_{0.5} and c_{0.95} are the correction factors needed for consistency at the multivariate normal distribution. One has that c_{β} = β∕F_{χ_{N+2}^{2}}(χ_{N}^{2}(β)), where F_{χ_{N+2}^{2}}(⋅) is the χ_{N+2}^{2} distribution function (Croux and Haesbroeck, 1999). The MCD covariance estimator is our preferred choice to construct the outlyingness.
However, one can also use other robust multivariate scale estimators, such as the RBPCov (see above). Note that the positive semidefiniteness of the ROWQCov is preserved when d_{t,i} is built upon the RBPCov.
Step 2: Computation of ROWQCov
Because the Mahalanobis outlyingness measures are scalar-valued and asymptotically χ_{N}^{2} distributed, one can use the hard and soft rejection weight functions in (7.41)–(7.42), with threshold equal to an extreme quantile of the χ_{N}^{2} distribution, to downweight the high-frequency returns that are outlying with respect to the majority of the returns in their local window. The ROWQCov is then defined as follows.
Definition 2 Let r_{t,i}, for i = 1,…,M, be a sample of M highfrequency returns and d_{t,i} their outlyingness value based on local windows of length one day. For a given weight function w(⋅), the Realized Outlyingness Weighted Quadratic Covariation is defined as
ROWQCov_{t}(Δ) = c_{w} (∑ _{i=1}^{M}w(d_{t,i})r_{t,i}r_{t,i}′)∕((1∕M)∑ _{i=1}^{M}w(d_{t,i}))  (7.67)
The correction factor c_{w} in (7.67) ensures that the ROWQCov is consistent for the ICov under the BSM and BSMFAJ models. It depends on the dimension N of the process and on the weight function used. Denote by E[g(u)] the expectation of g(u), where u is standard normally distributed. The correction factor c_{w} is then given by
For the HR weight function defined in (7.41) with threshold k = χ_{N}^{2}(1 − α), c_{w} = (1 − α)∕F_{χ_{N+2}^{2}}(χ_{N}^{2}(1 − α)), where F_{χ_{N+2}^{2}}(⋅) is the χ_{N+2}^{2} distribution function. Table 7.5 reports the correction factors for the hard and soft rejection weight functions. For N = 1, the correction factor c_{w} is identical to the one used in (7.43).