Description
asreml estimates variance components under a general linear mixed model by residual maximum likelihood (REML).
Usage
asreml(fixed = y ~ 1, random = ~NULL, sparse = ~NULL, residual = ~NULL, G.param = list(), R.param = list(), data = sys.parent(), na.action = na.method(), subset, weights, predict = predict.asreml(), vcm = vcm.lm(), vcc = matrix(NA), family = asr_gaussian(), asmv = NULL, mbf = list(), group = list(), equate.levels = character(0), start.values = FALSE, knot.points = list(), pwr.points = list(), wald = list(), prune = list(), combine = list(), uid = list(), mef = list(), last = list(), model.frame = TRUE, ...)
Arguments
fixed |
A formula object specifying the fixed terms in the model, with the response on the left of a ∼ operator, and the terms, separated by + operators, on the right. If data is given, all names used in all formulae should appear in the data frame. A model with the intercept as the only fixed effect can be specified as ∼1; there must be at least one fixed effect specified. If the response (y) evaluates to a matrix then a factor trait with levels dimnames(y)[[2]] is added to the model frame, and must be explicitly included in the model formulae. |
random |
A formula object specifying the random effects in the model. This argument has the same general characteristics as fixed , but there can be no left side to the ∼ operator. Variance structures imposed on random terms are specified using special model functions. |
sparse |
A formula object, specifying the fixed effects for which the full variance-covariance matrix is not required. This argument has the same general characteristics as fixed , but there can be no left side to the ∼ expression. Wald statistics are not available for sparse fixed terms in order to reduce the computing load. |
residual |
A formula object specifying the residual model; any term specified on the left of the ∼ expression is ignored. The default is ∼units , where the reserved word units is defined as seq(1,nrow(data)) and is automatically included in the model frame. Variance models for the residual component of the model can be specified using special model functions.For single-section univariate models, the residual variance model determines the computational mode: If the residual variance model specifies a correlation structure (includes id ()), then the model is fitted on the gamma scale, otherwise the model is fitted on the sigma scale. The default is id(units) if not explicitly specified. |
G.param |
Either,
On termination, G.param is updated with the final |
R.param |
Either,
On termination, |
data |
A data frame in which to interpret the variables named in fixed , random , sparse and residual . If the data argument is missing, the default is sys.parent() . The data frame is converted internally to a data.table object and returned as such by model.frame . |
na.action |
A call to na.method() specifying the action to be taken when missing values are encountered in the response (y) or explanatory variables (x). The function definition for na.method is:function(y=c("include","omit","fail"), x=c("fail","include","omit")). The default action is to include (and estimate) missing values in the response, and raise an error if there are missing values in the explanatory variables. |
subset |
A logical vector identifying which subset of the rows of data should be used in the fit. All observations are included by default. |
weights |
A character string or name identifying the column of data to use as weights in the fit. |
predict |
A list object specifying the classifying factors and related options when forming predictions from the model. This list would normally be the value returned by a call to the method predict for asreml objects. |
vcm |
A matrix defining relationships among variance parameters. The matrix has a row for each original variance parameter and a column for each new parameter. The default is the identity matrix, that is, no action. See vcm.lm for further information and an example. |
vcc |
Equality constraints between variance parameters; a two-column numeric matrix with a dimnames attribute. The first column defines the grouping structure of equated components, that is, components within an equality group are given the same numeric index, and the second column contains the scaling coefficients.The dimnames()[[1]] attribute must match the component names in the asreml parameter vector; see start.values .The parameters are scaled relative to the first parameter in its group, so the scaling of the first parameter in each group is one. For example, the following vcc matrix
1 1 is equivalent to the 1 0 0 |
family |
A list of functions and expressions for defining the link and variance functions. Optionally a list of such structures for a multivariate analysis involving non-normal variates. Currently this is restricted to a bivariate model where the first variate (excluding the multinomial distribution) is non-normal. Supported families are gaussian, inverse Gaussian, binomial, negative binomial, poisson, Gamma and multinomial. Family objects are generated from the asreml family functions which prefix the usual function names with “asr_ “;for exampleasr_gaussian(), asr_binomial() etc. In addition to thelink argument, these functions take an additional dispersion argument and a total argument where relevent; for example:asr_binomial(dispersion=1.0, total=counts) .The default for asr_gaussian() is dispersion =NA, which implies that asreml will estimate the dispersion parameter, otherwise the scale is fixed at the nominated value. |
asmv |
A character string or name specifying the column in the data that identifies the traits in a multivariate analysis. If not NULL , asmv implies that the data for a multivariate analysis is set up as though it were for a univariate analysis with the response in a single variate. |
mbf |
A named list specifying sets of covariates to be included with one or more mbf() model functions. Each component of the list must in turn contain components named key and cov , where cov is a character string naming the data frame holding the covariates, and key is a character vector of length 2 naming the columns in data and cov , respectively, used to match corresponding records in the two data frames. The default is an empty list. |
group |
A named list where each component is a numeric vector specifying contiguous fields in data that are to be considered as a single term. The component names can then appear in asreml model formulae using the grp() special function. The default is an empty list. |
equate.levels |
A character vector of factor names whose levels are to be equated. If factor A has levels a,b,c,d and factor B has levels a,b,c,e, the effect of equate.levels(A, B) is that both A and B have 5 levels, with as.numeric(A) = 1,2,3,4 and as.numeric(B ) = 1,2,3,5. This may be necessary if using the and model function to overlay columns of the model’s design matrix in forming a compound term. The default is a zero length character vector. |
start.values |
If TRUE , asreml exits prior to the fitting process and returns a list of length 3: the G.param and R.param lists, and a data frame containing variance parameter names, initial values and boundary constraints. Initial values or constraints can then be set in the list or data frame objects.If a character string, then a file of that name is created and the data frame object containing initial parameter values is written out in comma separated form. This file can be edited externally and subsequently specified in the G.param or R.param arguments. |
knot.points |
A named list where each component is a vector of user supplied knot points for a particular spline term; the component name is the object of the spl() model function. |
pwr.points |
A named list with each component containing a vector of distances to be used in a one-dimensional power model. The component names must correspond to the object arguments of the power function model terms. |
wald |
A named list with four components: denDF, ssType, Ftest and kenadj .denDF a character string from the set [ "none", "numeric", "algebraic", "default" ] specifying the calculation of approximate denominator degrees of freedom. The default “none” is to suppress the computations.Algebraic computations are not feasible in large analyses, use “ default ” to automatically choose numeric or algebraic computations depending on problem size. The denominator degrees of freedom are calculated according to Kenward and Roger, 1997 for terms in the fixed model formula.ssType can be “ incremental ” for incremental sum of squares (the default) or “conditional ” for F-tests that respect both structural and intrinsic marginality.Ftest a formula object of the form ∼test_term | background_terms specifying a conditional Wald test of the contribution of test_term conditional on those fixed terms listed in background_terms , and the those in the random and sparse model formulae.kenadj a character string from the set [ "none", "expected", "observed ]”,specifying any adjustment to the variance matrix for fixed effects. The wald argument would typically be set by a call to the wald method, which returns more formal output. |
prune |
A named list with each component generated from a call to Subset(). prune , in conjunction with Subset and the model function sbs() , forms a new factor from an existing one by selecting a subset of its levels. The function Subset is defined as:function(f, x) where f is the name of an existing factor and x is a character or numeric vector of levels to select. The name of the list component is the new factor that may appear in the model formulae as the argument to the sbs() model function.For example, prune=list(A=Subset(Site, c(2,3))) creates a new factor A by selecting the second and third levels of Site , and would be included in the model as sbs(A ). While the actions of prune can be duplicated outside asreml , sbs() is necessary if the asreml method predict() is to be used. |
combine |
A named list with each component generated from a call to Levels(). combine , in conjunction with Levels and the model function gpf() forms a new factor from an existing one by merging a subset of its levels. The function Levels is defined as:function(f, x) where f is the name of an existing factor and x is a vector of lengthlength(levels(f)) defining the levels of f to merge. The name of thelist component is the new factor that may appear in the model formulae as the argument to the gpf() model function. For example, if Site has levels “1 “, “2 ” and “3 “,combine=list(A=Levels(Site, c("1","2","1"))) creates a new factor A with levels “ 1 ” and “2 ” by merging levels “1 ” and “3 ” of Site , and would be included in the model as gpf(A) . While the actions of combine can be duplicated outside asreml , gpf() is necessary if the asreml method predict() is to be used. |
uid |
A named list with each component generated from a call to Units(). uid , in conjunction with Units and the model function uni() forms a new factor by selecting a subset of records for an existing one. The function Units is defined as:function(f, n=0) where f is the name of an existing factor and n is a character or numeric scalar that determines which records are selected. The default, n =0, forms a factor with a level for each record where f is non-zero (strictly, f != 0). Otherwise, a factor with a level for each record in data where f has the value n is formed.For example, uid = list(A = Units(group, 1)) creates a new factor A with levels from row.names(data) where group = 1, and would be included in the model as uni(A) . While the actions of uid can be duplicated outside asreml , uni() is necessary if the asreml method predict() is to be used. |
mef |
A named list linking a relationship matrix (or its inverse) as specified in the vm() special function with the original matrix of subject x regressor (typically molecular marker) scores. If not an empty list (the default), mef flags the computation of the regressor (marker) effects from the subject effects. For example,mef=list(MM = snp.mat) links the relationship matrix MM to the original marker scores in snp.mat . The mef list would typically be set from a call to the asreml meff() method. |
last |
A named list restricting the order equations are solved in the sparse partition for the nominated model terms. Each component of the list is named by a model term and contains a scalar n specifying that the first n levels of the term be solved after all others in the sparse set. It is intended for use when there are multiple fixed terms in the sparse equations so that asreml will be consistent in which effects are identified as singular. A maximum of three factor/level pairs can be specified. |
model.frame | If TRUE (the default) the model frame (a data.table object with additional attributes derived from the model specification) is included in the returned object.The model frame is required by the asreml summary, plot, resid and fitted methods.In large analyses, the model frame is likely to be a large object. If model.frame is a character string, the model frame is saved in a file as an RDS object by a call to saveRDS() , and named by the supplied string with the extension .RDS . If the model frame is not included in the returned asreml object, this RDS file is searched for by the methods noted above. |
… | Additional arguments to asreml from asreml.options(): maxit, workspace, pworkspace, fixgammas, trace, aom . |
Details
Models for asreml
are specified symbolically in the formula objects fixed
, random
, sparse
and residual
. A typical model has the form response ∼ terms, fixed
only, or ∼ terms
for random
, sparse
and residual
, where response
is the (usually numeric) response vector and terms
is a linear predictor for response
. An exception is raised if the response is a factor and family
is not multinomial.
The formulae objects are parsed in the context of the data frame, all internal data structures are constructed in R or compiled code, and the model is fitted by calls to the underlying Fortran REML
routines (Gilmour et al., 1995). Variance models for random model terms are specified using special functions in the random
and residual
formulae. If not specified, the variance models default to (scaled) identity structures. A table of special model functions is included below; see the reference guide or appropriate vignette for further details and examples of their use. Some of these model functions require the formula arguments to be partially evaluated before the final model frame is computed; it is recommended that all names used in the formulae be resolvable in a data frame named by the data argument.
If the response is a matrix, a multivariate linear model is fitted to the columns unless
family = asr_multinomial()
is declared.
The terms in the fixed
formula are re-ordered by default so that main effects precede interactions in increasing order. The option keep.order
(see asreml.options
) can be used to modify this behaviour.
A formula has an implied intercept term. To remove the intercept use y ∼ -1 +
…. This is only effective in the fixed
formula; in all other formula arguments any reference to the intercept is ignored. Note that currently there must be at least one fixed effect in the model.
In addition to the formal arguments, various options can be set with asreml.options; these are stored in an environment for the duration of the R
session.
asreml
uses either a “gamma
” (ratio) or “sigma
” (component) scale parameterization for estimation depending on the residual model specification. The current default for single section analyses is the gamma parameterization if the error model specifies a correlation structure. In this case, all scale parameters are estimated as a ratio with respect to the residual variance, with correlation parameters unchanged. If the residual model specifies a variance structure then variance parameters are estimated on the sigma scale. For models with more than one residual section, asreml
always estimates variance parameters on the sigma scale.
Value
An object of class asreml
containing the results of the fitted model. Instances of generic methods such as plot()
, predict()
and summary()
return various derived results of the fit; resid(), coef()
and fitted()
extract some of its components. See asreml.object for the components of the returned list.
Special functions
Special model functions are used in asreml
formulae objects to create new or modify existing model terms, or more often to specify the variance model associated with one or more terms. These functions can be broadly categorised as constructor-type functions, or (default) identity, time-series, general-structure, 1d metric, 2d metric, known relationship, general variance structures that span more than one term (str), or user-defined structures.
The special model functions that are available in asreml
model formulae are introduced below; see the user guide, relevent vignette or the man pages for selected functions for more details or illustration.
The symbols used in the following tables are defined as:
obj a factor in data.
n length(levels(obj)).
p number of parameters estimated by the base function.
v number of parameters estimated by the homogeneous function form.
h number of parameters estimated by the heterogeneous function form.
Constructor type functions
Call | Description |
con(obj) |
Apply sum-to-zero constraints to factor obj . |
C(obj, contr) |
Define contrasts among the levels of obj from the coefficients in contr . |
lin(obj) |
Fits factor obj as a variate. |
pow(obj, p, offset) |
Create the term (offset+obj)p . |
pol(x, t) |
Orthogonal polynomials to degree t; –t omits the intercept polynomial. |
leg(x, t) |
Legendre polynomials to degree t; –t omits the intercept polynomial. |
spl(x, k) |
The random component of a cubic spline; optionally k knot points. |
dev(x) |
Fit variate x as a factor; typically used for spline deviations. |
ma(obj) |
Forms a moving average (1) design matrix from factor obj . |
at(obj, vec) |
Form conditioning covariables for the levels in obj given in vec . |
dsum(~term | obj, ~model) |
Direct sum of term for the levels of obj with variance structure model . Used in residual to define multiple sections. |
and(obj, k) |
Add k times the design matrix for obj to the previous columns. |
grp(name) |
Include the term defined by name in the group argument in the model. |
mbf(name) |
Include the covariates defined by name in the mbf argument as a factor. |
sbs(name) |
Include the term defined in the prune argument in the model. |
gpf(name) |
Include the term defined in the combine argument in the model. |
uni(name) |
Include the term defined in the uid argument in the model. |
Default identity
Call | Description | p | v | h |
id(obj) |
identity | 0 | 1 | n |
Time series type models
Call | Description | p | v | h |
ar1(obj) |
autoregressive order 1 | 1 | 2 | 1+n |
ar2(obj) |
autoregressive order 2 | 2 | 3 | 2+n |
ar3(obj) |
autoregressive order 3 | 3 | 4 | 3+n |
sar(obj) |
symmetric autoregressive | 1 | 2 | 1+n |
sar2(obj) |
symmetric autoregressive order 2 | 2 | 3 | 2+n |
ma1(obj) |
moving average order 1 | 1 | 2 | 1+n |
ma2(obj) |
moving average order 2 | 2 | 3 | 2+n |
arma(obj) |
autoregressive-moving average | 2 | 3 | 2+n |
Default identity
Call | Description | p | v | h |
cor(obj) |
simple correlation | 1 | 2 | 1+n |
corb(obj, b) |
banded correlation; b bands | b | b+1 | b+n |
corg(obj) |
general correlation | n(n-1)/2 | 1+n(n-1)/2 | n(n+1)/2 |
diag(obj) |
heterogeneous variance | n | ||
us(obj) |
unstructured variance | n(n+1)/2 | ||
sfa(obj, k) |
factor analytic; k factors | kn+n | ||
fa(obj, k) |
sparse factor analytic | kn+n | ||
facv(obj, k) |
factor analytic, covariance form | kn+n | ||
rr(obj, k) |
reduced rank variant of fa | kn+n | ||
chol(obj, k) |
Cholesky order k | (k+1)(n-k/2) | ||
cholc(obj, k) |
Cholesky | (k+1)(n-k/2) | ||
ante(obj, k) |
antedependence order k | (k+1)(n-k/2) | ||
mthr(obj) |
multinomial models only |
Metric based models in 1D or 2D
Call | Description | p | v | h |
exp(x) |
exponential 1D | 1 | 2 | 1+n |
iexp(x, y) |
isotropic exponential 2D | 1 | 2 | 1+n |
aexp(x, y) |
anisotropic exponential 2D | 2 | 3 | 2+n |
gau(x) |
gaussian 1D | 1 | 1+n | |
igau(x, y) |
isotropic gaussian 2D | 1 | 1+n | |
agau(x, y) |
anisotropic gaussian 2D | 2 | 2+n | |
ieuc(x, y) |
isotropic euclidean 2D | 1 | 1+n | |
isp(x, y) |
isotropic spherical 2D | 1 | 1+n | |
cir(x, y) |
isotropic circular 2D | 1 | 1+n | |
mtrn(x, y, ...) |
Matern class 2D | * | * | * |
∗ See the user guide for an extended discussion of the Matern class.
Known relationship structures
Call | Description |
vm(obj, source, singG) |
Create a term based on obj with known variance structure in source . |
ide(obj) |
Identity term based on obj with levels as for vm(obj) . |
General variance structures
Call | Description |
str(~terms, ~model) |
Apply the direct product variance structure in ~model to the set of terms in ~terms . |
User defined structures
Call | Description |
own(obj, fun, init, type) |
Call fun with the parameter estimates in init to compute the variance matrix and its derivatives. |
References
Gilmour AR, Thompson R and Cullis BR (1995). “AI, An Efficient Algorithm for REML Estimation in Linear Mixed Models.” Biometrics, 51, pp. 1440-1450.
Kenward MG and Roger JH (1997). “Small Sample Inference for Fixed Effects from Restricted Maximum Likelihood.” Biometrics, 53, pp. 983-997.
See Also
asreml.options
asreml.object
asr_families
Examples
## Not run: data(oats) oats.asr <- asreml(yield ~ Variety*Nitrogen, random = ~ Blocks/Wplots, data=oats) ## End(Not run)