asreml – Fit the linear mixed model

Description

asreml estimates variance components under a general linear mixed model by residual maximum likelihood (REML).

Usage

asreml(fixed = y ~ 1, random = ~NULL, sparse = ~NULL, residual = ~NULL,
G.param = list(), R.param = list(), data = sys.parent(),
na.action = na.method(), subset, weights, predict = predict.asreml(),
vcm = vcm.lm(), vcc = matrix(NA), family = asr_gaussian(),
asmv = NULL, mbf = list(), group = list(),
equate.levels = character(0), start.values = FALSE,
knot.points = list(), pwr.points = list(), wald = list(),
prune = list(), combine = list(), uid = list(), mef = list(),
last = list(), model.frame = TRUE, ...)

Arguments

`fixed`	A formula object specifying the fixed terms in the model, with the response on the left of a ∼ operator, and the terms, separated by + operators, on the right. If `data` is given, all names used in all formulae should appear in the data frame. A model with the intercept as the only fixed effect can be specified as ∼1; there must be at least one fixed effect specified. If the response (y) evaluates to a matrix then a factor `trait` with levels `dimnames(y)[[2]]` is added to the model frame, and must be explicitly included in the model formulae.
`random`	A formula object specifying the random effects in the model. This argument has the same general characteristics as `fixed`, but there can be no left side to the ∼ operator. Variance structures imposed on random terms are specified using special model functions.
`sparse`	A formula object, specifying the fixed effects for which the full variance-covariance matrix is not required. This argument has the same general characteristics as `fixed`, but there can be no left side to the ∼ expression. Wald statistics are not available for sparse fixed terms in order to reduce the computing load.
`residual`	A formula object specifying the residual model; any term specified on the left of the ∼ expression is ignored. The default is ∼`units`, where the reserved word `units` is defined as `seq(1,nrow(data))` and is automatically included in the model frame. Variance models for the residual component of the model can be specified using special model functions. For single-section univariate models, the residual variance model determines the computational mode: If the residual variance model specifies a correlation structure (includes `id`()), then the model is fitted on the gamma scale, otherwise the model is fitted on the sigma scale. The default is `id(units)` if not explicitly specified.
`G.param`	Either, a list object derived from the `random` formula, holding initial parameter estimates and boundary constraints for each term, or a character string naming a comma delimited file with a header line and three columns for the variance component name, initial value and constraint code, respectively. This file can be created using the `start.values argument`; the internal list object is then generated from the contents of this file. On termination, G.param is updated with the final `random` component estimates.
`R.param`	Either, a list object derived from the `random` formula, holding initial parameter estimates and boundary constraints, or a character string naming a comma delimited file with a header line and three columns for the variance component name, initial value and constraint code, respectively. This file can be created using the `start.values` argument; the internal list object is then generated from the contents of this file. On termination, `R.param` is updated with the final `residual` component estimates.
`data`	A data frame in which to interpret the variables named in `fixed`, `random`, `sparse` and `residual`. If the `data` argument is missing, the default is `sys.parent()`. The data frame is converted internally to a `data.table` object and returned as such by `model.frame`.
`na.action`	A call to `na.method()` specifying the action to be taken when missing values are encountered in the response (y) or explanatory variables (x). The function definition for `na.method` is: `function(y=c("include","omit","fail"), x=c("fail","include","omit")).` The default action is to include (and estimate) missing values in the response, and raise an error if there are missing values in the explanatory variables.
`subset`	A logical vector identifying which subset of the rows of `data` should be used in the fit. All observations are included by default.
`weights`	A character string or name identifying the column of `data` to use as weights in the fit.
`predict`	A list object specifying the classifying factors and related options when forming predictions from the model. This list would normally be the value returned by a call to the method `predict` for `asreml` objects.
`vcm`	A matrix defining relationships among variance parameters. The matrix has a row for each original variance parameter and a column for each new parameter. The default is the identity matrix, that is, no action. See `vcm.lm` for further information and an example.
`vcc`	Equality constraints between variance parameters; a two-column numeric matrix with a `dimnames` attribute. The first column defines the grouping structure of equated components, that is, components within an equality group are given the same numeric index, and the second column contains the scaling coefficients. The `dimnames()[[1]]` attribute must match the component names in the `asreml` parameter vector; see `start.values`. The parameters are scaled relative to the first parameter in its group, so the scaling of the first parameter in each group is one. For example, the following `vcc` matrix 1 1 2 1 2 2 3 1 is equivalent to the `vcm` matrix 1 0 0 0 1 0 0 2 0 0 0 1
`family`	A list of functions and expressions for defining the link and variance functions. Optionally a list of such structures for a multivariate analysis involving non-normal variates. Currently this is restricted to a bivariate model where the first variate (excluding the multinomial distribution) is non-normal. Supported families are gaussian, inverse Gaussian, binomial, negative binomial, poisson, Gamma and multinomial. Family objects are generated from the `asreml` family functions which prefix the usual function names with “`asr_`“;for example `asr_gaussian(), asr_binomial()` etc. In addition to the link argument, these functions take an additional `dispersion` argument and a `total` argument where relevent; for example: `asr_binomial(dispersion=1.0, total=counts)`. The default for `asr_gaussian()` is `dispersion`=NA, which implies that `asreml` will estimate the dispersion parameter, otherwise the scale is fixed at the nominated value.
`asmv`	A character string or name specifying the column in the data that identifies the traits in a multivariate analysis. If not `NULL`, `asmv` implies that the data for a multivariate analysis is set up as though it were for a univariate analysis with the response in a single variate.
`mbf`	A named list specifying sets of covariates to be included with one or more `mbf()` model functions. Each component of the list must in turn contain components named `key` and `cov`, where `cov` is a character string naming the data frame holding the covariates, and `key` is a character vector of length 2 naming the columns in `data` and `cov`, respectively, used to match corresponding records in the two data frames. The default is an empty list.
`group`	A named list where each component is a numeric vector specifying contiguous fields in `data` that are to be considered as a single term. The component names can then appear in asreml model formulae using the `grp()` special function. The default is an empty list.
`equate.levels`	A character vector of factor names whose levels are to be equated. If factor `A` has levels a,b,c,d and factor `B` has levels a,b,c,e, the effect of `equate.levels(A, B)` is that both `A` and `B` have 5 levels, with `as.numeric(A)` = 1,2,3,4 and `as.numeric(B`) = 1,2,3,5. This may be necessary if using the `and` model function to overlay columns of the model’s design matrix in forming a compound term. The default is a zero length character vector.
`start.values`	If `TRUE`, `asreml` exits prior to the fitting process and returns a list of length 3: the `G.param` and `R.param` lists, and a data frame containing variance parameter names, initial values and boundary constraints. Initial values or constraints can then be set in the list or data frame objects. If a character string, then a file of that name is created and the data frame object containing initial parameter values is written out in comma separated form. This file can be edited externally and subsequently specified in the `G.param` or `R.param` arguments.
`knot.points`	A named list where each component is a vector of user supplied knot points for a particular spline term; the component name is the object of the `spl() model` function.
`pwr.points`	A named list with each component containing a vector of distances to be used in a one-dimensional power model. The component names must correspond to the `object` arguments of the power function model terms.
`wald`	A named list with four components: `denDF, ssType, Ftest` and `kenadj`. denDF a character string from the set [`"none", "numeric", "algebraic",` `"default"`] specifying the calculation of approximate denominator degrees of freedom. The default “none” is to suppress the computations. Algebraic computations are not feasible in large analyses, use “`default`” to automatically choose numeric or algebraic computations depending on problem size. The denominator degrees of freedom are calculated according to Kenward and Roger, 1997 for terms in the `fixed` model formula. ssType can be “`incremental`” for incremental sum of squares (the default) or “`conditional`” for F-tests that respect both structural and intrinsic marginality. Ftest a formula object of the form `∼test_term \| background_terms` specifying a conditional Wald test of the contribution of `test_term` conditional on those fixed terms listed in `background_terms`, and the those in the `random` and `sparse` model formulae. kenadj a character string from the set [`"none", "expected", "observed`]”, specifying any adjustment to the variance matrix for fixed effects. The `wald` argument would typically be set by a call to the `wald` method, which returns more formal output.
`prune`	A named list with each component generated from a call to `Subset(). prune`, in conjunction with `Subset` and the model function `sbs()`, forms a new factor from an existing one by selecting a subset of its levels. The function `Subset` is defined as: `function(f, x)` where `f` is the name of an existing factor and `x` is a character or numeric vector of levels to select. The name of the list component is the new factor that may appear in the model formulae as the argument to the `sbs()` model function. For example, `prune=list(A=Subset(Site, c(2,3)))` creates a new factor A by selecting the second and third levels of `Site`, and would be included in the model as `sbs(A`). While the actions of prune can be duplicated outside `asreml`, `sbs()` is necessary if the `asreml` method `predict()` is to be used.
`combine`	A named list with each component generated from a call to `Levels(). combine`, in conjunction with `Levels` and the model function `gpf()` forms a new factor from an existing one by merging a subset of its levels. The function `Levels` is defined as: `function(f, x)` where `f` is the name of an existing factor and x is a vector of length `length(levels(f))` defining the levels of f to merge. The name of the list component is the new factor that may appear in the model formulae as the argument to the `gpf() model` function. For example, if `Site` has levels “`1`“, “`2`” and “`3`“, `combine=list(A=Levels(Site, c("1","2","1")))` creates a new factor A with levels “`1`” and “`2`” by merging levels “`1`” and “`3`” of `Site`, and would be included in the model as `gpf(A)`. While the actions of `combine` can be duplicated outside `asreml`, `gpf()` is necessary if the `asreml` method `predict()` is to be used.
`uid`	A named list with each component generated from a call to `Units(). uid`, in conjunction with `Units` and the model function `uni()` forms a new factor by selecting a subset of records for an existing one. The function `Units` is defined as: `function(f, n=0)` where `f` is the name of an existing factor and `n` is a character or numeric scalar that determines which records are selected. The default, `n`=0, forms a factor with a level for each record where `f` is non-zero (strictly, `f` != 0). Otherwise, a factor with a level for each record in `data` where `f` has the value `n` is formed. For example, `uid = list(A = Units(group, 1))` creates a new factor A with levels from `row.names(data)` where `group` = 1, and would be included in the model as `uni(A)`. While the actions of `uid` can be duplicated outside `asreml`, `uni()` is necessary if the `asreml` method `predict()` is to be used.
`mef`	A named list linking a relationship matrix (or its inverse) as specified in the `vm()` special function with the original matrix of subject x regressor (typically molecular marker) scores. If not an empty list (the default), `mef` flags the computation of the regressor (marker) effects from the subject effects. For example, `mef=list(MM = snp.mat)` links the relationship matrix `MM` to the original marker scores in `snp.mat`. The `mef` list would typically be set from a call to the `asreml meff()` method.
`last`	A named list restricting the order equations are solved in the sparse partition for the nominated model terms. Each component of the list is named by a model term and contains a scalar n specifying that the first n levels of the term be solved after all others in the sparse set. It is intended for use when there are multiple fixed terms in the sparse equations so that `asreml` will be consistent in which effects are identified as singular. A maximum of three factor/level pairs can be specified.
model.frame	If `TRUE` (the default) the model frame (a `data.table object` with additional attributes derived from the model specification) is included in the returned object. The model frame is required by the `asreml summary, plot, resid` and `fitted` methods. In large analyses, the model frame is likely to be a large object. If `model.frame` is a character string, the model frame is saved in a file as an `RDS` object by a call to `saveRDS()`, and named by the supplied string with the extension .`RDS`. If the model frame is not included in the returned asreml object, this `RDS` file is searched for by the methods noted above.
…	Additional arguments to `asreml` from `asreml.options(): maxit, workspace, pworkspace, fixgammas, trace, aom`.

Details

Models for asreml are specified symbolically in the formula objects fixed, random, sparseand residual. A typical model has the form response ∼ terms, fixed only, or ∼ terms for random, sparse and residual, where response is the (usually numeric) response vector and terms is a linear predictor for response. An exception is raised if the response is a factor and family is not multinomial.
The formulae objects are parsed in the context of the data frame, all internal data structures are constructed in R or compiled code, and the model is fitted by calls to the underlying Fortran REML routines (Gilmour et al., 1995). Variance models for random model terms are specified using special functions in the random and residual formulae. If not specified, the variance models default to (scaled) identity structures. A table of special model functions is included below; see the reference guide or appropriate vignette for further details and examples of their use. Some of these model functions require the formula arguments to be partially evaluated before the final model frame is computed; it is recommended that all names used in the formulae be resolvable in a data frame named by the data argument.
If the response is a matrix, a multivariate linear model is fitted to the columns unless
family = asr_multinomial() is declared.
The terms in the fixed formula are re-ordered by default so that main effects precede interactions in increasing order. The option keep.order (see asreml.options) can be used to modify this behaviour.
A formula has an implied intercept term. To remove the intercept use y ∼ -1 + …. This is only effective in the fixed formula; in all other formula arguments any reference to the intercept is ignored. Note that currently there must be at least one fixed effect in the model.
In addition to the formal arguments, various options can be set with asreml.options; these are stored in an environment for the duration of the R session.
asreml uses either a “gamma” (ratio) or “sigma” (component) scale parameterization for estimation depending on the residual model specification. The current default for single section analyses is the gamma parameterization if the error model specifies a correlation structure. In this case, all scale parameters are estimated as a ratio with respect to the residual variance, with correlation parameters unchanged. If the residual model specifies a variance structure then variance parameters are estimated on the sigma scale. For models with more than one residual section, asreml always estimates variance parameters on the sigma scale.

Value

An object of class asreml containing the results of the fitted model. Instances of generic methods such as plot(), predict() and summary() return various derived results of the fit; resid(), coef() and fitted() extract some of its components. See asreml.object for the components of the returned list.

Special functions

Special model functions are used in asreml formulae objects to create new or modify existing model terms, or more often to specify the variance model associated with one or more terms. These functions can be broadly categorised as constructor-type functions, or (default) identity, time-series, general-structure, 1d metric, 2d metric, known relationship, general variance structures that span more than one term (str), or user-defined structures.
The special model functions that are available in asreml model formulae are introduced below; see the user guide, relevent vignette or the man pages for selected functions for more details or illustration.
The symbols used in the following tables are defined as:

obj a factor in data.
n length(levels(obj)).
p number of parameters estimated by the base function.
v number of parameters estimated by the homogeneous function form.
h number of parameters estimated by the heterogeneous function form.

Constructor type functions

Call	Description
`con(obj)`	Apply sum-to-zero constraints to factor `obj`.
`C(obj, contr)`	Define contrasts among the levels of `obj` from the coefficients in `contr`.
`lin(obj)`	Fits factor `obj` as a variate.
`pow(obj, p, offset)`	Create the term `(offset+obj)^p`.
`pol(x, t)`	Orthogonal polynomials to degree t; –t omits the intercept polynomial.
`leg(x, t)`	Legendre polynomials to degree t; –t omits the intercept polynomial.
`spl(x, k)`	The random component of a cubic spline; optionally k knot points.
`dev(x)`	Fit variate `x` as a factor; typically used for spline deviations.
`ma(obj)`	Forms a moving average (1) design matrix from factor `obj`.
`at(obj, vec)`	Form conditioning covariables for the levels in `obj` given in `vec`.
`dsum(~term \| obj, ~model)`	Direct sum of `term` for the levels of `obj` with variance structure `model`. Used in residual to define multiple sections.
`and(obj, k)`	Add k times the design matrix for `obj` to the previous columns.
`grp(name)`	Include the term defined by `name` in the `group` argument in the model.
`mbf(name)`	Include the covariates defined by `name` in the `mbf` argument as a factor.
`sbs(name)`	Include the term defined in the `prune` argument in the model.
`gpf(name)`	Include the term defined in the `combine` argument in the model.
`uni(name)`	Include the term defined in the `uid` argument in the model.

Default identity

Call	Description	p	v	h
`id(obj)`	identity	0	1	n

Time series type models

Call	Description	p	v	h
`ar1(obj)`	autoregressive order 1	1	2	1+n
`ar2(obj)`	autoregressive order 2	2	3	2+n
`ar3(obj)`	autoregressive order 3	3	4	3+n
`sar(obj)`	symmetric autoregressive	1	2	1+n
`sar2(obj)`	symmetric autoregressive order 2	2	3	2+n
`ma1(obj)`	moving average order 1	1	2	1+n
`ma2(obj)`	moving average order 2	2	3	2+n
`arma(obj)`	autoregressive-moving average	2	3	2+n

Default identity

Call	Description	p	v	h
`cor(obj)`	simple correlation	1	2	1+n
`corb(obj, b)`	banded correlation; b bands	b	b+1	b+n
`corg(obj)`	general correlation	n(n-1)/2	1+n(n-1)/2	n(n+1)/2
`diag(obj)`	heterogeneous variance	n
`us(obj)`	unstructured variance	n(n+1)/2
`sfa(obj, k)`	factor analytic; k factors	kn+n
`fa(obj, k)`	sparse factor analytic	kn+n
`facv(obj, k)`	factor analytic, covariance form	kn+n
`rr(obj, k)`	reduced rank variant of fa	kn+n
`chol(obj, k)`	Cholesky order k	(k+1)(n-k/2)
`cholc(obj, k)`	Cholesky	(k+1)(n-k/2)
`ante(obj, k)`	antedependence order k	(k+1)(n-k/2)
`mthr(obj)`	multinomial models only

Metric based models in 1D or 2D

Call	Description	p	v	h
`exp(x)`	exponential 1D	1	2	1+n
`iexp(x, y)`	isotropic exponential 2D	1	2	1+n
`aexp(x, y)`	anisotropic exponential 2D	2	3	2+n
`gau(x)`	gaussian 1D	1		1+n
`igau(x, y)`	isotropic gaussian 2D	1		1+n
`agau(x, y)`	anisotropic gaussian 2D	2		2+n
`ieuc(x, y)`	isotropic euclidean 2D	1		1+n
`isp(x, y)`	isotropic spherical 2D	1		1+n
`cir(x, y)`	isotropic circular 2D	1		1+n
`mtrn(x, y, ...)`	Matern class 2D	*	*	*

∗ See the user guide for an extended discussion of the Matern class.

Known relationship structures

Call	Description
`vm(obj, source, singG)`	Create a term based on `obj` with known variance structure in `source`.
`ide(obj)`	Identity term based on `obj` with levels as for `vm(obj)`.

General variance structures

Call	Description
`str(~terms, ~model)`	Apply the direct product variance structure in `~model` to the set of terms in `~terms`.

User defined structures

Call	Description
`own(obj, fun, init, type)`	Call `fun` with the parameter estimates in `init` to compute the variance matrix and its derivatives.

References

Gilmour AR, Thompson R and Cullis BR (1995). “AI, An Efﬁcient Algorithm for REML Estimation in Linear Mixed Models.” Biometrics, 51, pp. 1440-1450.
Kenward MG and Roger JH (1997). “Small Sample Inference for Fixed Effects from Restricted Maximum Likelihood.” Biometrics, 53, pp. 983-997.

Examples

## Not run:
data(oats)
oats.asr <- asreml(yield ~ Variety*Nitrogen, random = ~ Blocks/Wplots, data=oats)
## End(Not run)

Updated on July 15, 2021

Was this article helpful?

Yes No

Description

Usage

Arguments

Details

Value

Special functions

Constructor type functions

Default identity

Time series type models

Default identity

Metric based models in 1D or 2D

Known relationship structures

General variance structures

User defined structures

References

See Also

Examples

Was this article helpful?

Related Articles