Introduction to uGMAR

Savi Virolainen

2018-09-22

The package uGMAR contains tools to estimate and work with univariate Gaussian Mixture Autoregressive (GMAR), Student’s t Mixture Autoregressive (StMAR) and Gaussian and Student’s t Mixture Autoregressive (G-StMAR) models. This vignette does not explain the models in detail; it is assumed that the reader is familiar with the cited articles. There are currently no references for the G-StMAR model, so it is discussed briefly in the section “G-StMAR model”.

The models in uGMAR are defined as class gsmar objects, which are mainly created with the estimation function fitGSMAR or the constructor function GSMAR. A gsmar object can then be passed as the main argument to many other functions, which makes it possible, for example, to perform model diagnostics, forecast and simulate the process. Therefore, once the model is defined, further analyses are easy to carry out. Some tasks, however, such as setting up an initial population for the genetic algorithm, applying linear constraints or defining arbitrary gsmar models, require an accurate understanding of how the parameter vectors are constructed.

The rest of this vignette is organized as follows. The first section explains what the G-StMAR model is and when and why one should use it. The second section describes the notation for the parameter vector in detail and shows how to apply constraints to the autoregressive parameters of the models. The third and last section describes some useful functions found in uGMAR.

G-StMAR model

The G-StMAR model can be described as a mixture of the GMAR and StMAR models: some of its mixture components are similar to the ones the GMAR model considers (that is, regular Gaussian AR processes) and the others are similar to the ones the StMAR model considers (that is, conditionally heteroskedastic autoregressive processes based on Student’s t distribution). In uGMAR the first M1 components are GMAR-type and the remaining M2 components are StMAR-type. The theoretical properties of the G-StMAR model are similar to those of the GMAR and StMAR models, but instead of mixtures of only Gaussian or only Student’s t distributions, mixtures of both Gaussian and Student’s t distributions are considered.

In the StMAR model, the degrees of freedom parameters sometimes get very large estimates for some mixture components. This may be because a large value compensates for a weak conditional variance relative to a strong conditional mean, or because the regime is “Gaussian” in general. When the degrees of freedom parameter is large, the corresponding mixture component is very similar to the one the GMAR model uses. It is therefore natural to let the components with large degrees of freedom parameters be GMAR-type, and this is when the G-StMAR model comes into question.

Besides describing the phenomenon in a more natural way, using the G-StMAR model instead of an StMAR model with some large degrees of freedom parameters also has practical benefits. One problem with very large degrees of freedom parameters is that their profile log-likelihoods are very flat near the estimates. This may distort the numerical gradient and/or Hessian so that the “first order condition” and the “second order condition”, used to check whether the estimates correspond to a maximum point, a saddle point or something else, do not give reliable results. Also, because of the numerical approximations used, computing quantile residual tests is usually not possible when the profile log-likelihood is “too flat” for some parameter. A correctly specified G-StMAR model does not have these problems.
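The claim that a component with a large degrees of freedom parameter behaves much like a Gaussian one can be checked numerically. The Python sketch below (uGMAR itself is an R package; no uGMAR functions are used) compares the standard Student’s t density with the standard normal density on a grid. The helper names are hypothetical, and the plain t density is used instead of the StMAR component densities, so this only illustrates the general convergence.

```python
import math


def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def student_t_pdf(x, nu):
    """Density of the standard Student's t distribution with nu degrees of freedom."""
    log_const = (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
                 - 0.5 * math.log(nu * math.pi))
    return math.exp(log_const - (nu + 1.0) / 2.0 * math.log1p(x * x / nu))


def max_gap(nu, lo=-5.0, hi=5.0, steps=1000):
    """Largest absolute density difference on an evenly spaced grid over [lo, hi]."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(abs(student_t_pdf(x, nu) - normal_pdf(x)) for x in grid)


for nu in (3, 10, 100, 1000):
    print(nu, max_gap(nu))
```

The largest gap shrinks roughly in proportion to 1/ν, which is consistent with the likelihood becoming insensitive to ν once ν is large.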

Parameter vector and constraints

Defining a GMAR, StMAR or G-StMAR model requires the user to specify the autoregressive order p and the number of mixture components M. For the G-StMAR model one has to define the number of GMAR-type components M1 and the number of StMAR-type components M2, yielding a total of M1+M2=M components. Another important argument that often needs to be specified is the parameter vector of the model. The form of the parameter vector depends on the specifics of the model: whether a GMAR, StMAR or G-StMAR model is considered, whether the AR coefficients are restricted to be the same for all regimes, and whether general linear constraints are applied to the AR parameters. It is vital to use the correct type of parameter vector.

Regular GMAR, StMAR and G-StMAR models

In the following, the intercept parametrization with intercept parameters \(\phi_{m,0}\) is considered. If the mean parametrization is used, one simply needs to replace each intercept parameter with the corresponding mean parameter \(\mu_m=\phi_{m,0}/(1-\sum_{i=1}^p\phi_{m,i}),\enspace m=1,...,M.\)
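Switching between the two parametrizations is a per-regime computation. The following Python sketch implements the formula above and its inverse; the function names are hypothetical helpers for illustration, not uGMAR functions.

```python
def intercept_to_mean(phi0, ar_coefs):
    """Return mu_m = phi_{m,0} / (1 - sum_i phi_{m,i}) for one regime."""
    denom = 1.0 - sum(ar_coefs)
    if denom == 0.0:
        raise ValueError("the AR coefficients must not sum to one")
    return phi0 / denom


def mean_to_intercept(mu, ar_coefs):
    """Inverse mapping: phi_{m,0} = mu_m * (1 - sum_i phi_{m,i})."""
    return mu * (1.0 - sum(ar_coefs))


# Regime with intercept 1.0 and AR coefficients (0.5, 0.3): mu = 1.0 / 0.2
mu = intercept_to_mean(1.0, [0.5, 0.3])
print(mu, mean_to_intercept(mu, [0.5, 0.3]))
```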

GMAR model

The parameter vector for the regular GMAR model is a size (M(p+3)-1)x1 vector of the form \[\boldsymbol{\theta}=(\boldsymbol{\upsilon_{1}},...,\boldsymbol{\upsilon_{M}}, \alpha_{1},...,\alpha_{M-1}),\quad where\] \[\boldsymbol{\upsilon_{m}}=(\phi_{m,0},\boldsymbol{\phi_{m}}, \sigma_{m}^2) \enspace and \enspace \boldsymbol{\phi_{m}}=(\phi_{m,1},...,\phi_{m,p}) ,\quad m=1,...,M.\] The symbol \(\phi\) denotes an AR coefficient, \(\sigma^2\) a component variance and \(\alpha\) a mixing weight parameter. Only \(M-1\) mixing weight parameters are included because the last one is determined by \(\alpha_{M}=1-\sum_{m=1}^{M-1}\alpha_{m}\).
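To make the layout concrete, the Python sketch below splits a regular GMAR parameter vector into its regime blocks \(\boldsymbol{\upsilon_{m}}\) and mixing weights, recovering \(\alpha_{M}\) from the others. The function name `unpack_gmar` and the dictionary keys are hypothetical and only mirror the notation above.

```python
def unpack_gmar(theta, p, M):
    """Split a regular GMAR parameter vector of length M(p+3)-1."""
    expected = M * (p + 3) - 1
    if len(theta) != expected:
        raise ValueError(f"expected {expected} parameters, got {len(theta)}")
    comps = []
    for m in range(M):
        block = theta[m * (p + 2):(m + 1) * (p + 2)]  # upsilon_m has p + 2 entries
        comps.append({"phi0": block[0],
                      "phi": list(block[1:p + 1]),
                      "sigma2": block[p + 1]})
    alphas = list(theta[M * (p + 2):])   # alpha_1, ..., alpha_{M-1}
    alphas.append(1.0 - sum(alphas))     # alpha_M is implied
    return comps, alphas


# GMAR(1, 2): (phi_{1,0}, phi_{1,1}, sigma_1^2, phi_{2,0}, phi_{2,1}, sigma_2^2, alpha_1)
comps, alphas = unpack_gmar([0.5, 0.7, 1.0, -0.2, 0.3, 2.0, 0.6], p=1, M=2)
print(comps, alphas)
```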

StMAR model

In order to work with the StMAR model, the parameter vector has to be expanded with degrees of freedom parameters. Consequently, the parameter vector for the regular StMAR model is a size (M(p+4)-1)x1 vector of the form \[(\boldsymbol{\theta}, \boldsymbol{\nu}),\quad where \quad \boldsymbol{\nu}=(\nu_{1},...,\nu_{M})\] denotes the degrees of freedom parameters and the parameter \(\boldsymbol{\theta}\) is as in the case of the GMAR model. To ensure the existence of finite second moments, the degrees of freedom parameters \(\nu_{m}\) are assumed to be larger than \(2\).

G-StMAR model

In the G-StMAR model the first M1 components are GMAR-type and the rest M2 components are StMAR-type. The parameter vector of G-StMAR model is similar to the one of StMAR model, with M2 degrees of freedom parameters for the StMAR-components. That is, a size (M(p+3)+M2-1)x1 vector of form \[(\boldsymbol{\theta}, \boldsymbol{\nu}),\quad where \quad \boldsymbol{\nu}=(\nu_{M1+1},...,\nu_{M})\] denotes the degrees of freedom parameters and parameter \(\boldsymbol{\theta}\) is as in the case of GMAR model. As in the StMAR case, the degrees of freedom parameters are assumed to be larger than two.
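The three regular vector lengths follow from one count: \(p+2\) parameters per regime, \(M-1\) mixing weights and \(M_2\) degrees of freedom parameters. The Python sketch below (hypothetical helper names) encodes that count, treating GMAR as \(M_2=0\) and StMAR as \(M_1=0\), and splits \((\boldsymbol{\theta},\boldsymbol{\nu})\) while checking \(\nu>2\).

```python
def n_params_regular(p, M1, M2):
    """Length of the regular parameter vector with M1 GMAR-type and M2 StMAR-type
    components: M(p+2) regime parameters, M-1 mixing weights, M2 degrees of freedom."""
    M = M1 + M2
    return M * (p + 3) - 1 + M2


def split_theta_nu(params, p, M1, M2):
    """Split (theta, nu) and check that every degrees of freedom parameter exceeds 2."""
    n = n_params_regular(p, M1, M2)
    if len(params) != n:
        raise ValueError(f"expected {n} parameters, got {len(params)}")
    theta, nu = list(params[:n - M2]), list(params[n - M2:])
    if any(v <= 2 for v in nu):
        raise ValueError("degrees of freedom parameters must be larger than 2")
    return theta, nu


# GMAR(2,3), StMAR(2,3) and G-StMAR(2,1,2): lengths 14, 17 and 16
print(n_params_regular(2, 3, 0), n_params_regular(2, 0, 3), n_params_regular(2, 1, 2))
```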

Restricted GMAR, StMAR and G-StMAR models

Besides the regular GMAR, StMAR and G-StMAR models, uGMAR gives an option to work with restricted models. This means that the AR coefficients \(\phi_{m,1},...,\phi_{m,p}\) are restricted to be the same for all regimes \(m=1,...,M.\) The structure of the parameter vector is different for restricted and non-restricted models.

GMAR model

The parameter vector for the restricted GMAR model is a size (3M+p-1)x1 vector of the form \[\boldsymbol{\theta}=(\phi_{1,0},...,\phi_{M,0},\boldsymbol{\phi},\sigma_{1}^2,...,\sigma_{M}^2,\alpha_{1},...,\alpha_{M-1}), \quad where \quad \boldsymbol{\phi}=(\phi_{1},...,\phi_{p}).\]
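Because the AR coefficients are shared, the restricted vector groups parameters by type rather than by regime. A minimal Python sketch of that layout follows; `unpack_restricted_gmar` is a hypothetical name used only for illustration.

```python
def unpack_restricted_gmar(theta, p, M):
    """Split a restricted GMAR parameter vector of length 3M+p-1."""
    expected = 3 * M + p - 1
    if len(theta) != expected:
        raise ValueError(f"expected {expected} parameters, got {len(theta)}")
    intercepts = list(theta[:M])           # phi_{1,0}, ..., phi_{M,0}
    phi = list(theta[M:M + p])             # shared phi_1, ..., phi_p
    sigma2 = list(theta[M + p:2 * M + p])  # sigma_1^2, ..., sigma_M^2
    alphas = list(theta[2 * M + p:])       # alpha_1, ..., alpha_{M-1}
    alphas.append(1.0 - sum(alphas))       # alpha_M is implied
    return intercepts, phi, sigma2, alphas


print(unpack_restricted_gmar([0.1, -0.3, 0.5, 0.2, 1.0, 2.0, 0.6], p=2, M=2))
```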

StMAR model

The parameter vector for the restricted StMAR model is then defined by adding the degrees of freedom parameters, yielding a size (4M+p-1)x1 vector of the form \[(\boldsymbol{\theta}, \boldsymbol{\nu}),\quad where \quad \boldsymbol{\nu}=(\nu_{1},...,\nu_{M})\] again denotes the degrees of freedom parameters and the parameter \(\boldsymbol{\theta}\) is as in the case of the restricted GMAR model.

G-StMAR model

The parameter vector for the restricted G-StMAR model is similar to that of the restricted StMAR model, but with only M2 degrees of freedom parameters, one for each StMAR component; its size is thus (3M+M2+p-1)x1.

One therefore has to work with different kinds of parameter vectors depending on whether the model is restricted or not. In order to restrict the AR parameters, or to indicate that the parameter vector is restricted, one needs to set restricted=TRUE in the function’s arguments.
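The restricted lengths come from counting M intercepts, p shared AR coefficients, M variances, M-1 mixing weights and M2 degrees of freedom parameters. The Python sketch below (hypothetical helper name) encodes this count; compared with the non-restricted case it saves \((M-1)p\) parameters.

```python
def n_params_restricted(p, M1, M2):
    """Length of the restricted parameter vector: M intercepts, p shared AR
    coefficients, M variances, M-1 mixing weights and M2 degrees of freedom."""
    M = M1 + M2
    return M + p + M + (M - 1) + M2


# Restricted GMAR(2,3), StMAR(2,3) and G-StMAR(2,1,2)
print(n_params_restricted(2, 3, 0), n_params_restricted(2, 0, 3), n_params_restricted(2, 1, 2))
```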

Applying general linear constraints and how it affects the parameter vector

uGMAR makes it easy to apply linear constraints to the autoregressive parameters of GMAR, StMAR and G-StMAR models. Except for restricted models, each mixture component has its own constraint matrix. We consider constraints of the form \[\boldsymbol{\phi_{m}}=\boldsymbol{C_{m}\psi_{m}}, \enspace m=1,...,M,\] where \(\boldsymbol{C_{m}}\) is a known size \((p\times q_{m})\) constraint matrix of full column rank and \(\boldsymbol{\psi_{m}}\) is a size \((q_{m}\times 1)\) parameter vector.

A special case of this is to constrain some of the AR coefficients to be zero. Another special case is the mixture version of the Heterogeneous Autoregressive (HAR) model, which can be obtained by applying the constraints to a GMAR(22,M) model with \[\boldsymbol{C_{m}}=\left[{\begin{array}{ccc} \boldsymbol{\iota}_{5} & \frac{1}{5}\boldsymbol{1}_{5} & \frac{1}{22}\boldsymbol{1}_{5} \\ \boldsymbol{0}_{17} & \boldsymbol{0}_{17} & \frac{1}{22}\boldsymbol{1}_{17} \\ \end{array}}\right]\] for all regimes \(m=1,...,M\), where \(\boldsymbol{\iota}_{5}=[1,0,0,0,0]'\) and \(\boldsymbol{0}_{k}\) and \(\boldsymbol{1}_{k}\) denote \(k\)-dimensional vectors of zeros and ones.
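The HAR constraint matrix can be built row by row, one row per lag. The Python sketch below constructs the \(22\times 3\) matrix and maps a hypothetical \(\boldsymbol{\psi_{m}}=(\psi_{1},\psi_{2},\psi_{3})\) (daily, weekly and monthly coefficients) to the 22 implied AR coefficients; the helper names and values are illustrative assumptions. Since each column sums to one, the implied AR coefficients sum to \(\psi_{1}+\psi_{2}+\psi_{3}\).

```python
def har_constraint_matrix():
    """22 x 3 matrix with columns (daily, weekly, monthly), one row per lag."""
    C = []
    for lag in range(1, 23):
        daily = 1.0 if lag == 1 else 0.0          # iota_5 padded with zeros
        weekly = 1.0 / 5.0 if lag <= 5 else 0.0   # (1/5) 1_5, then zeros
        monthly = 1.0 / 22.0                      # (1/22) over all 22 lags
        C.append([daily, weekly, monthly])
    return C


def mat_vec(C, psi):
    """phi = C psi for C given as a list of rows."""
    return [sum(c * s for c, s in zip(row, psi)) for row in C]


C = har_constraint_matrix()
psi = [0.4, 0.3, 0.2]    # hypothetical (daily, weekly, monthly) coefficients
phi = mat_vec(C, psi)    # the 22 implied AR coefficients of one regime
print(len(phi), phi[0], phi[4], phi[21])
```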

In order to apply linear constraints in uGMAR, one simply parametrizes the model with the vectors \(\boldsymbol{\psi_{m}}\) instead of \(\boldsymbol{\phi_{m}}\) and provides the constraint matrices \(\boldsymbol{C_{m}}\) in the argument constraints (when estimating a model, only the constraint matrices need to be provided). Note that regardless of the lengths of \(\boldsymbol{\psi_{m}}\), the nominal order of the AR coefficients is always \(p\) for all regimes.
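A zero restriction is the simplest constraint: with \(p=3\), setting \(\phi_{m,2}=0\) corresponds to a \(3\times 2\) matrix \(\boldsymbol{C_{m}}\). The Python sketch below computes \(\boldsymbol{\phi_{m}}=\boldsymbol{C_{m}\psi_{m}}\) and checks the full column rank requirement by Gaussian elimination. It is an illustration with hypothetical helper names and values; it does not show how uGMAR validates constraints.

```python
def mat_vec(C, psi):
    """phi = C psi for a p x q matrix C (list of rows) and a length-q vector psi."""
    if any(len(row) != len(psi) for row in C):
        raise ValueError("C must have len(psi) columns")
    return [sum(c * s for c, s in zip(row, psi)) for row in C]


def matrix_rank(C, tol=1e-12):
    """Rank of C via Gaussian elimination with partial pivoting."""
    A = [[float(x) for x in row] for row in C]
    n_rows, n_cols = len(A), len(A[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        pivot = max(range(rank, n_rows), key=lambda i: abs(A[i][col]))
        if abs(A[pivot][col]) < tol:
            continue                     # no usable pivot in this column
        A[rank], A[pivot] = A[pivot], A[rank]
        for i in range(rank + 1, n_rows):
            factor = A[i][col] / A[rank][col]
            A[i] = [a - factor * b for a, b in zip(A[i], A[rank])]
        rank += 1
    return rank


# p = 3 with phi_{m,2} = 0: phi_m = (psi_1, 0, psi_2)
C = [[1, 0], [0, 0], [0, 1]]
print(mat_vec(C, [0.4, 0.2]), matrix_rank(C) == len(C[0]))
```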

Non-restricted GMAR, StMAR and G-StMAR models

As in the case of the regular GMAR model, the parameter vector for the constrained GMAR model is of the form \[\boldsymbol{\theta}=(\boldsymbol{\upsilon_{1}},...,\boldsymbol{\upsilon_{M}}, \alpha_{1},...,\alpha_{M-1}),\] but now the vectors \(\boldsymbol{\upsilon_{m}}\) are defined using the vectors \(\boldsymbol{\psi_{m}}\), that is \[\boldsymbol{\upsilon_{m}}=(\phi_{m,0},\boldsymbol{\psi_{m}}, \sigma_{m}^2) \enspace and \enspace \boldsymbol{\psi_{m}}=(\psi_{m,1},...,\psi_{m,q_{m}}), \enspace m=1,...,M.\] The user also has to provide a list of constraint matrices \(\boldsymbol{C_{m}}\) that satisfy \(\boldsymbol{\phi_{m}}=\boldsymbol{C_{m}\psi_{m}}\) for all \(m=1,...,M.\)

The parameter vector for constrained StMAR model is again defined by simply adding the degrees of freedom parameters, that is \[(\boldsymbol{\theta}, \boldsymbol{\nu}),\quad where \quad \boldsymbol{\nu}=(\nu_{1},...,\nu_{M}),\] and \(\boldsymbol{\theta}\) is as in the case of constrained GMAR model.

The parameter vector for constrained G-StMAR model is similar to the one of constrained StMAR model, but with degrees of freedom parameters for the StMAR components only.
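Since each \(\boldsymbol{\upsilon_{m}}\) now has \(q_{m}+2\) entries, a constrained GMAR vector can be mapped back to the regular layout by replacing every \(\boldsymbol{\psi_{m}}\) with \(\boldsymbol{C_{m}\psi_{m}}\). The Python sketch below does this for the GMAR case only; `expand_constrained_gmar` and the example values are hypothetical.

```python
def expand_constrained_gmar(theta_c, constraints, M):
    """Map a constrained GMAR vector to the regular layout via phi_m = C_m psi_m.

    constraints[m] is the p x q_m matrix C_m, given as a list of rows.
    """
    out, pos = [], 0
    for m in range(M):
        C = constraints[m]
        q = len(C[0])
        phi0 = theta_c[pos]
        psi = theta_c[pos + 1:pos + 1 + q]
        sigma2 = theta_c[pos + 1 + q]
        phi = [sum(c * s for c, s in zip(row, psi)) for row in C]
        out.extend([phi0, *phi, sigma2])
        pos += q + 2                       # upsilon_m had q_m + 2 entries
    alphas = theta_c[pos:]
    if len(alphas) != M - 1:
        raise ValueError("wrong number of mixing weight parameters")
    out.extend(alphas)
    return out


# p = 2, M = 2: regime 1 constrained to an AR(1) (phi_{1,2} = 0), regime 2 unconstrained
C1 = [[1], [0]]
C2 = [[1, 0], [0, 1]]
theta_c = [0.1, 0.5, 1.0, -0.1, 0.3, 0.2, 2.0, 0.7]
print(expand_constrained_gmar(theta_c, [C1, C2], M=2))
```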

Restricted GMAR, StMAR and G-StMAR models

Just as for non-restricted models, the parameter vectors for the constrained versions of the restricted GMAR, StMAR and G-StMAR models are defined by simply replacing the vector \(\boldsymbol{\phi}\) with the vector \(\boldsymbol{\psi}\). Hence the parameter vector for the restricted and constrained GMAR model is of the form \[\boldsymbol{\theta}=(\phi_{1,0},...,\phi_{M,0},\boldsymbol{\psi},\sigma_{1}^2,...,\sigma_{M}^2,\alpha_{1},...,\alpha_{M-1}), \quad where \quad \boldsymbol{\psi}=(\psi_{1},...,\psi_{q}).\] The size \((p\times q)\) constraint matrix \(\boldsymbol{C}\) needs to be provided, and it is assumed to satisfy \(\boldsymbol{\phi}=\boldsymbol{C\psi}.\)

The parameter vector for restricted and constrained StMAR model is then again defined by adding the degrees of freedom parameters, that is \((\boldsymbol{\theta}, \boldsymbol{\nu})\) where \(\boldsymbol{\nu}=(\nu_{1},...,\nu_{M}).\) For restricted and constrained G-StMAR model the parameter vector is similar to the one of restricted and constrained StMAR model, but with degrees of freedom parameters for the StMAR components only.
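In the restricted case a single matrix \(\boldsymbol{C}\) acts on the shared \(\boldsymbol{\psi}\), so the expansion only touches one block of the vector. A minimal Python sketch for the restricted and constrained GMAR vector (length \(3M+q-1\)) follows; the helper name and values are hypothetical.

```python
def expand_restricted_constrained(theta_c, C, M):
    """Map a restricted, constrained GMAR vector (length 3M+q-1) to the restricted
    layout (length 3M+p-1) by replacing psi with phi = C psi."""
    q = len(C[0])
    expected = 3 * M + q - 1
    if len(theta_c) != expected:
        raise ValueError(f"expected {expected} parameters, got {len(theta_c)}")
    intercepts = list(theta_c[:M])
    psi = theta_c[M:M + q]
    rest = list(theta_c[M + q:])           # sigma_1^2..sigma_M^2, alpha_1..alpha_{M-1}
    phi = [sum(c * s for c, s in zip(row, psi)) for row in C]
    return intercepts + phi + rest


# p = 3, M = 2, shared constraint phi_2 = 0
C = [[1, 0], [0, 0], [0, 1]]
print(expand_restricted_constrained([0.1, -0.2, 0.5, 0.2, 1.0, 2.0, 0.6], C, M=2))
```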

Some functions in uGMAR

Estimating a GMAR, StMAR or G-StMAR model

The function used to estimate models in uGMAR is fitGSMAR. It employs a maximum likelihood estimation process that is performed in two phases. In the first phase, fitGSMAR uses a genetic algorithm to find starting values for a gradient-based variable metric algorithm, which it then uses in the second phase to finalize the estimation. It is important to keep in mind that the numerical estimation algorithms are not guaranteed to end up at the global maximum point rather than a local one. Because of the multimodality and challenging surface of the log-likelihood function, it is actually expected that most of the estimation rounds won’t find the global maximum point. For this reason one should always perform multiple estimation rounds, and more estimation rounds yield a more reliable result. Multiple rounds are performed by default, and their number can be controlled with the argument nCalls. The rounds make use of parallel computing, and the number of cores used can be set with the argument nCores.
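Why several estimation rounds help can be seen on a toy problem. The Python sketch below minimizes a one-dimensional function with two local minima (minimizing it stands in for maximizing a log-likelihood) from random starting points using plain gradient descent, and keeps the best end point. This is a deliberately simplified multi-start local search, not uGMAR’s genetic algorithm or variable metric algorithm; the function and all names are hypothetical.

```python
import random


def f(x):
    """Toy objective with two local minima; the one near x = -1 is the global one."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def grad(x):
    return 4.0 * x * (x * x - 1.0) + 0.3


def local_minimize(x0, lr=0.01, iters=5000):
    """Plain gradient descent: ends at the minimum whose basin contains x0."""
    x = x0
    for _ in range(iters):
        x -= lr * grad(x)
    return x


def multi_start(n_rounds=20, seed=1):
    """Run several independent local searches and keep the best end point."""
    rng = random.Random(seed)
    ends = [local_minimize(rng.uniform(-2.0, 2.0)) for _ in range(n_rounds)]
    return min(ends, key=f), ends


best, ends = multi_start()
print(best, sorted(round(e, 3) for e in ends))
```

Some rounds typically stop at the inferior local minimum near x = 0.96; only comparing the end points of several rounds reveals the better one.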

There is also an option to perform some quantile residual tests on the estimated model to get a quick sense of how well the model fits the data.

If the estimation performs poorly, it is often because the number of mixture components is chosen too large. One may also adjust the settings of the genetic algorithm, or set up an initial population with guesses for the estimates. This can be done by passing arguments in fitGSMAR to the (non-exported) function GAfit, which implements the genetic algorithm. To check the available settings, read the documentation ?GAfit. If the iteration limit is reached when estimating the model, the function iterate_more can be used to finish the estimation.

The parameters of the estimated model are printed in an illustrative and easy to read form. In order to easily compare approximate standard errors to certain estimates, it’s advisable to use the summary method, which prints them inside brackets next to the estimates. Numerical approximation of the gradient and Hessian matrix of the log-likelihood at the estimates can be obtained conveniently with the functions get_gradient and get_hessian. The estimated objects also have their own plot method.

Model diagnostics

The package uGMAR considers model diagnostics based on quantile residuals (see Kalliovirta 2012), which are asymptotically standard normally distributed if the model is correctly specified. Quantile residuals can hence be used for graphical diagnostics and testing.
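A quantile residual is obtained by passing an observation through its conditional cumulative distribution function and then through the inverse standard normal CDF. For a GMAR model that conditional distribution is a Gaussian mixture with time-varying weights \(\alpha_{m,t}\), means \(\phi_{m,0}+\sum_{i=1}^{p}\phi_{m,i}y_{t-i}\) and variances \(\sigma_{m}^2\). The Python sketch below takes those weights, means and standard deviations as given numbers instead of computing them from a fitted model; the helper names are hypothetical.

```python
from statistics import NormalDist


def gaussian_mixture_cdf(y, weights, means, sds):
    """CDF of sum_m weights[m] * N(means[m], sds[m]**2) evaluated at y."""
    return sum(w * NormalDist(mu, sd).cdf(y)
               for w, mu, sd in zip(weights, means, sds))


def quantile_residual(y, weights, means, sds, eps=1e-12):
    """Phi^{-1}(F(y)) where F is the conditional mixture CDF."""
    u = gaussian_mixture_cdf(y, weights, means, sds)
    u = min(max(u, eps), 1.0 - eps)   # inv_cdf requires 0 < u < 1
    return NormalDist().inv_cdf(u)


# One observation with hypothetical conditional weights, means and standard deviations
print(quantile_residual(1.2, weights=[0.7, 0.3], means=[0.5, -1.0], sds=[1.0, 2.0]))
```

With a single component the quantile residual reduces to the ordinary standardized residual.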

The function quantileResidualTests performs quantile residual tests introduced by Kalliovirta (2012), testing normality, autocorrelation and conditional heteroscedasticity. For graphical diagnostics, one may use the function diagnosticPlot.

Consider installing the suggested package gsl for much faster evaluation of quantile residuals for StMAR and G-StMAR models. Without gsl, numerical integration is used, so performing quantile residual tests for StMAR and G-StMAR models may take a significantly long time if the model and data are both large. The package gsl is not imported because it may be tricky to install on some platforms.

Constructing class ‘gsmar’ model without estimation

One may wish to construct an arbitrary model without any estimation process, for example in order to simulate from a particular process of interest. An arbitrary model can be created with the function GSMAR(). If one wants to add data to the model or update its data afterwards, it is advisable to use the function add_data.

Simulating from class ‘gsmar’ process

The function simulateGSMAR() is the one for the job. As the main argument it uses a gsmar-object (usually) generated by fitGSMAR() or GSMAR().

Forecasting class ‘gsmar’ process

The package uGMAR contains the predict method predict.gsmar() for forecasting GMAR, StMAR and G-StMAR processes. For one-step predictions, the exact formula for the conditional mean is supported, but forecasts further ahead are based on independent simulations. The point predictions are either sample means or medians, and the confidence intervals are based on sample quantiles. The objects generated by predict.gsmar() have their own plot method.
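The simulation-based approach can be sketched for a GMAR(1,2) process: simulate many independent future paths, then summarize each horizon by its sample mean, median and 2.5% and 97.5% sample quantiles. The Python sketch below assumes the GMAR mixing weights, where \(\alpha_{m,t}\) is proportional to \(\alpha_{m}\) times the stationary normal density of the m-th AR(1) component evaluated at \(y_{t-1}\). The parameter values and helper names are hypothetical, and the sketch does not reproduce predict.gsmar() itself.

```python
import math
import random
import statistics


def gmar1_weights(y_prev, phi0, phi1, sigma2, alphas):
    """Mixing weights alpha_{m,t} of a GMAR(1, M) model given y_{t-1}."""
    logs = []
    for a, c, f, s2 in zip(alphas, phi0, phi1, sigma2):
        mu = c / (1.0 - f)            # stationary mean of component m
        gamma = s2 / (1.0 - f * f)    # stationary variance of component m
        logs.append(math.log(a) - 0.5 * math.log(2.0 * math.pi * gamma)
                    - 0.5 * (y_prev - mu) ** 2 / gamma)
    top = max(logs)                   # log-sum-exp guard against underflow
    w = [math.exp(v - top) for v in logs]
    total = sum(w)
    return [x / total for x in w]


def simulate_path(y0, steps, rng, phi0, phi1, sigma2, alphas):
    """Simulate one future path of length `steps` starting from y0."""
    y, path = y0, []
    for _ in range(steps):
        w = gmar1_weights(y, phi0, phi1, sigma2, alphas)
        m = rng.choices(range(len(w)), weights=w)[0]   # draw the regime
        y = phi0[m] + phi1[m] * y + rng.gauss(0.0, math.sqrt(sigma2[m]))
        path.append(y)
    return path


def forecast(y0, steps, params, n_sim=5000, seed=0):
    """Monte Carlo forecasts: sample mean, median and 95% interval per horizon."""
    rng = random.Random(seed)
    paths = [simulate_path(y0, steps, rng, **params) for _ in range(n_sim)]
    result = []
    for h in range(steps):
        draws = [path[h] for path in paths]
        cuts = statistics.quantiles(draws, n=40)       # 2.5%, 5%, ..., 97.5%
        result.append({"mean": statistics.fmean(draws),
                       "median": statistics.median(draws),
                       "lower": cuts[0], "upper": cuts[-1]})
    return result


params = {"phi0": [1.0, -1.0], "phi1": [0.5, 0.5],
          "sigma2": [1.0, 1.0], "alphas": [0.6, 0.4]}
for h, row in enumerate(forecast(1.0, 3, params), start=1):
    print(h, row)
```

For the first horizon the simulated mean should agree with the exact conditional mean \(\sum_{m}\alpha_{m,t}(\phi_{m,0}+\phi_{m,1}y_{t-1})\) up to Monte Carlo error, which is why the exact formula can be used for one-step predictions.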

Multivariate analysis

For multivariate analysis, one is welcome to try the package gmvarkit. It considers the GMVAR model, which is the multivariate extension of the GMAR model.

References