Covariance structures

When fitting simple models (as in many examples of univariate analysis), one needs to specify only the model equation (the bit that looks like y ~ mu...) but nothing about the covariances that complete the model specification. This is because ASReml assumes that, in the absence of any additional information, each covariance matrix is the product of a scalar (a variance component) and a known matrix. For example, the residual covariance matrix in simple examples is R = I σ_e², and the additive genetic covariance matrix is G = A σ_a² (where A is the numerator relationship matrix).
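
As a minimal sketch of these default structures, the NumPy snippet below builds R = I σ_e² and G = A σ_a² for three hypothetical individuals. The variance values and the tiny pedigree behind A (two unrelated, non-inbred parents and their offspring) are made up for illustration; in a real analysis the variance components are estimated and A comes from the pedigree.

```python
import numpy as np

# Hypothetical variance components (illustrative values, not estimates).
sigma2_e = 2.0   # residual variance
sigma2_a = 1.5   # additive genetic variance

# Residual covariance for n = 3 records: R = I * sigma2_e
n = 3
R = np.eye(n) * sigma2_e

# Numerator relationship matrix for a hypothetical pedigree:
# two unrelated, non-inbred parents (rows 0 and 1) and their offspring (row 2).
A = np.array([
    [1.0, 0.0, 0.5],
    [0.0, 1.0, 0.5],
    [0.5, 0.5, 1.0],
])

# Additive genetic covariance: G = A * sigma2_a
G = A * sigma2_a

print(R)
print(G)
```

In both cases a single variance component scales a fixed matrix, which is why nothing beyond the model equation has to be specified.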

However, there are several situations where the analysis requires a more complex covariance structure, usually a direct sum or direct product of two or more matrices. For example, an analysis of data from several sites might consider a different error variance for each site, that is R = Σd R_i, where Σd represents a direct sum and R_i is the residual matrix for site i. The direct sum places the R_i along the diagonal of a larger block-diagonal matrix, with zeros elsewhere, so residuals from different sites are uncorrelated.
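
The direct sum can be sketched in a few lines of NumPy. The helper `direct_sum` is hypothetical (written for this example), and the two sites, their numbers of records and their error variances are made up.

```python
import numpy as np

def direct_sum(*blocks):
    """Direct sum: place each square block on the diagonal of a larger zero matrix."""
    size = sum(b.shape[0] for b in blocks)
    out = np.zeros((size, size))
    start = 0
    for b in blocks:
        k = b.shape[0]
        out[start:start + k, start:start + k] = b
        start += k
    return out

# Hypothetical example: site 1 has 2 records, site 2 has 3 records,
# each site with its own (made-up) error variance.
R1 = np.eye(2) * 1.0
R2 = np.eye(3) * 4.0
R = direct_sum(R1, R2)
print(R)
```

If SciPy is available, `scipy.linalg.block_diag(R1, R2)` builds the same matrix.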

Another example of a more complex covariance structure is a multivariate analysis at a single site, where both the residual and additive genetic covariance matrices are constructed as the direct product of two matrices. For example, R = I ⊗ R₀, where I is an identity matrix whose order is the number of individuals (records), ⊗ is the direct (Kronecker) product (not to be confused with ordinary matrix multiplication) and R₀ is the error covariance matrix for the traits involved in the analysis. Similarly, G = A ⊗ G₀, where A is the numerator relationship matrix and G₀ is the additive genetic covariance matrix for the traits.
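
NumPy's `np.kron` computes the direct (Kronecker) product, so the two-trait structures R = I ⊗ R₀ and G = A ⊗ G₀ can be sketched as follows. The trait covariance matrices and the three-individual relationship matrix are hypothetical, and the data are assumed to be ordered with traits within individuals.

```python
import numpy as np

# Hypothetical 2-trait covariance matrices (illustrative values only).
R0 = np.array([[1.0, 0.3],
               [0.3, 2.0]])   # residual covariances between traits
G0 = np.array([[0.5, 0.2],
               [0.2, 0.8]])   # additive genetic covariances between traits

n = 3
I = np.eye(n)
A = np.array([[1.0, 0.0, 0.5],
              [0.0, 1.0, 0.5],
              [0.5, 0.5, 1.0]])  # hypothetical relationship matrix

# Direct (Kronecker) products, with traits ordered within individuals.
R = np.kron(I, R0)   # 6 x 6, block diagonal with R0 in each diagonal block
G = np.kron(A, G0)   # 6 x 6, block (i, j) equals A[i, j] * G0

# This is not an ordinary matrix product: I @ R0 would raise an error
# here because a 3 x 3 and a 2 x 2 matrix do not conform.
```

Swapping the order, `np.kron(R0, I)`, gives the same covariances with individuals ordered within traits; which form applies depends on how the data are sorted.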

You will see that the ASReml (and ASReml-R) notation for this type of analysis closely resembles matrix notation. ASReml supports a large number of covariance structures (I will present only a few of them), many of which are particularly useful for longitudinal and spatial analysis. The structures are easier to understand (at least for me) if we express a covariance matrix (M) as a correlation matrix (C) pre- and postmultiplied by a diagonal matrix (D) containing the standard deviation of each trait. That is:

     | v11 c12 c13 c14 |         |  1  r12 r13 r14 |
     | c21 v22 c23 c24 |         | r21  1  r23 r24 |
 M = | c31 c32 v33 c34 |     C = | r31 r32  1  r34 |
     | c41 c42 c43 v44 |         | r41 r42 r43  1  |

     | s11  0   0   0  |
     |  0  s22  0   0  |
 D = |  0   0  s33  0  |
     |  0   0   0  s44 |

 M = D C D

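where the v are variances, the c covariances, the r correlations and the s standard deviations. Multiplying out D C D gives v11 = s11² on the diagonal and c12 = s11 r12 s22 off the diagonal, and so on for the other elements.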
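
A quick numerical check of M = D C D, and of the reverse step that recovers the standard deviations and correlations from a covariance matrix. The three standard deviations and the correlation matrix are illustrative values only.

```python
import numpy as np

# Hypothetical standard deviations and correlations for 3 traits.
s = np.array([1.0, 2.0, 3.0])
C = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.4],
              [0.2, 0.4, 1.0]])
D = np.diag(s)

# Covariance matrix: M[i, j] = s[i] * C[i, j] * s[j]
M = D @ C @ D

# Reverse step: recover the standard deviations and correlations from M.
s_back = np.sqrt(np.diag(M))
C_back = M / np.outer(s_back, s_back)
```

Here M[0, 1] = 1.0 × 0.5 × 2.0 = 1.0 and M[1, 1] = 2.0² = 4.0, matching the element-wise relations above.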

If we do not impose any restriction on M, apart from being positive definite (p.d.), we are talking about an unstructured matrix (US in ASReml parlance). Thus, M (or C) can take any values, as long as it remains p.d., as is usual when analyzing multiple-trait problems. For t traits this means estimating t(t + 1)/2 parameters: t variances and t(t - 1)/2 covariances.
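
Two practical aspects of the US structure can be sketched in NumPy: checking that a candidate matrix is positive definite (here via a Cholesky factorization, one common test among several) and counting the parameters it implies. The helper names are hypothetical and the matrices are made up.

```python
import numpy as np

def is_positive_definite(M):
    """Return True if the symmetric matrix M is p.d. (its Cholesky factorization succeeds)."""
    try:
        np.linalg.cholesky(M)
        return True
    except np.linalg.LinAlgError:
        return False

def n_params_us(t):
    """Parameters in an unstructured t x t matrix: t variances + t(t - 1)/2 covariances."""
    return t * (t + 1) // 2

M_ok = np.array([[1.0, 0.5],
                 [0.5, 1.0]])
M_bad = np.array([[1.0, 1.5],
                  [1.5, 1.0]])  # implies a correlation of 1.5, which is impossible

print(is_positive_definite(M_ok), is_positive_definite(M_bad))  # True False
print(n_params_us(4))  # 10
```

Note that `np.linalg.cholesky` only reads one triangle of the matrix, so the check assumes M is already symmetric.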

There are cases when the order of assessment or the spatial location of the experimental units creates patterns of variation, which are reflected in the covariance matrix. For example, the breeding value of individual i observed at time j (a_ij) is a function of the genes involved in expression at an earlier time j - k (a_i,j-k), plus the effect of genes that start acting at the new measurement (α_ij), which are considered independent of the past measurement: a_ij = ρ_jk a_i,j-k + α_ij, where ρ_jk is the additive genetic correlation between measurements j and j - k (strictly, ρ_jk is a regression coefficient, which equals the correlation when the breeding values have the same variance at both times).
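
To see how this kind of mechanism leads to the autoregressive pattern discussed next, here is a short derivation under simplifying assumptions added for illustration (they are not part of the model above): a lag of one assessment (k = 1), the same coefficient ρ at every step, the same variance σ² for breeding values at every time, and Var(α_ij) = (1 - ρ²)σ², which keeps the variance constant.

```latex
% Assumptions (illustrative): k = 1, constant \rho, Var(a_{i,j-m}) = \sigma^2 for all m,
% Var(\alpha_{ij}) = (1 - \rho^2)\sigma^2, \alpha_{ij} independent of all earlier a's.
\begin{aligned}
a_{ij} &= \rho\, a_{i,j-1} + \alpha_{ij} \\
\operatorname{Var}(a_{ij}) &= \rho^2\sigma^2 + (1-\rho^2)\sigma^2 = \sigma^2 \\
\operatorname{Cov}(a_{ij}, a_{i,j-1}) &= \rho\,\sigma^2
  \quad\Rightarrow\quad \operatorname{Corr}(a_{ij}, a_{i,j-1}) = \rho \\
\operatorname{Cov}(a_{ij}, a_{i,j-k}) &= \rho\,\operatorname{Cov}(a_{i,j-1}, a_{i,j-k})
  = \cdots = \rho^{k}\sigma^2
  \quad\Rightarrow\quad \operatorname{Corr}(a_{ij}, a_{i,j-k}) = \rho^{k}
\end{aligned}
```

Repeating the one-step mechanism k times makes the correlation decay geometrically with the lag, which is exactly the r^|j-k| pattern of an autoregressive structure.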

Rather than using a different correlation for each pair of ages, it is possible to postulate mechanisms that model the correlations. For example, in a first-order autoregressive model (AR1 in ASReml lingo), the correlation between measurements j and k is r^|j-k|. In this model M = D C_AR D, where C_AR (for equally spaced assessments) is:

        |  1  r^1 r^2 r^3 |
        | r^1  1  r^1 r^2 |
 C_AR = | r^2 r^1  1  r^1 |
        | r^3 r^2 r^1  1  |
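
The autoregressive correlation matrix and the resulting M = D C_AR D can be sketched as below, together with the parameter count behind the parsimony argument. The helper `ar1_corr` is hypothetical, and r and the standard deviations are illustrative values for four equally spaced assessments.

```python
import numpy as np

def ar1_corr(t, r):
    """Autoregressive correlation matrix: C[j, k] = r ** |j - k|."""
    idx = np.arange(t)
    return r ** np.abs(idx[:, None] - idx[None, :])

# Hypothetical values for 4 equally spaced assessments.
r = 0.8
s = np.array([1.0, 1.2, 1.5, 1.9])   # standard deviation at each assessment
C_ar = ar1_corr(len(s), r)
M = np.diag(s) @ C_ar @ np.diag(s)

t = len(s)
params_ar = t + 1               # t standard deviations + one correlation r
params_us = t * (t + 1) // 2    # unstructured: all variances and covariances
print(params_ar, params_us)     # 5 10
```

With four assessments the structured model needs half the parameters of the unstructured one, and the gap widens as the number of assessments grows.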

A model including this structure will be more parsimonious (economical in terms of the number of parameters) than one using an unstructured approach whenever there are more than two assessments: for t assessments it needs t standard deviations plus a single correlation r, instead of t(t + 1)/2 variances and covariances. Looking at this pattern, it is easy to see why they are called ‘structures’. A similar situation arises in spatial analysis, where the ‘independent errors’ assumption of typical analyses is relaxed. A common spatial model considers autocorrelated residuals in both directions (rows and columns), for example as the direct product of an autoregressive structure for rows and another for columns. Here the level of autocorrelation depends on the distance between trees rather than on time.
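
A separable spatial residual structure, one autoregressive structure for rows and one for columns combined through a direct product, can be sketched with `np.kron`. This is one common choice, not the only spatial model; the trial layout (3 rows by 4 columns), the autocorrelations and the row-major ordering of plots are assumptions for illustration, and `ar1_corr` and `plot_index` are hypothetical helpers.

```python
import numpy as np

def ar1_corr(n, r):
    """Autoregressive correlation matrix: C[j, k] = r ** |j - k|."""
    idx = np.arange(n)
    return r ** np.abs(idx[:, None] - idx[None, :])

# Hypothetical field trial: 3 rows x 4 columns of trees, with plots
# ordered as columns within rows (row-major order).
n_rows, n_cols = 3, 4
rho_row, rho_col = 0.6, 0.4   # made-up autocorrelations
sigma2_e = 1.0

# Separable residual structure: R = sigma2_e * (AR(rows) direct product AR(columns))
R = sigma2_e * np.kron(ar1_corr(n_rows, rho_row), ar1_corr(n_cols, rho_col))

def plot_index(row, col):
    """Position of the plot at (row, col) in the row-major ordering."""
    return row * n_cols + col

# Correlation decays with distance in both directions:
# plots (0, 0) and (2, 3) are 2 rows and 3 columns apart -> rho_row**2 * rho_col**3
print(R[plot_index(0, 0), plot_index(2, 3)])
```

Neighbouring trees in the same row are correlated by rho_col and neighbours in the same column by rho_row, so only two correlation parameters describe the whole 12 x 12 matrix.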

Another structure, based on random regressions, is explained in the longitudinal analysis section of the cookbook. ASReml can fit many more structures; see the variance model specification section of the ASReml manual for details.

Updated on September 22, 2021
