The standard GWAS model assumes \(\boldsymbol{Y} = \boldsymbol{X}\beta + \boldsymbol{\epsilon}\)
Example with n=4 \[\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}\cdot \beta + \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \epsilon_3 \\ \epsilon_4 \end{bmatrix}\]
\[\epsilon_i \sim N(0, \sigma^2_\epsilon)\]
\[\begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \epsilon_3 \\ \epsilon_4 \end{bmatrix} \sim N \left(\begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \sigma_\epsilon^2\cdot \begin{bmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&1&0 \\ 0&0&0&1 \end{bmatrix}\right) \]
We estimate \(\beta\), typically with linear regression. Under the normality assumption, the least-squares estimate \(\hat{\beta}\) is also the maximum likelihood estimate (MLE).
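As a minimal sketch of this estimation step, the snippet below simulates genotypes and a trait under the model above and recovers \(\hat{\beta}\) by least squares (which coincides with the MLE under normal errors). The sample size, allele frequency, and effect size are made-up values for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (made-up) settings: n individuals, one SNP, no covariates.
n = 1000
beta_true = 0.5      # hypothetical SNP effect
sigma_eps = 1.0      # residual standard deviation

x = rng.binomial(2, 0.3, size=n).astype(float)  # genotypes coded 0/1/2
eps = rng.normal(0.0, sigma_eps, size=n)        # i.i.d. N(0, sigma_eps^2) errors
y = beta_true * x + eps                         # Y = X*beta + eps

# Least-squares estimate of beta; with normal errors this is also the MLE.
beta_hat = np.sum(x * y) / np.sum(x * x)
print(round(beta_hat, 3))
```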
Using a random effect, we can account for population structure and relatedness.
\[\boldsymbol{Y} = \boldsymbol{X}\cdot\beta + u + \boldsymbol{\epsilon}\]
\[\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}\cdot \beta + \begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \end{bmatrix} + \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \epsilon_3 \\ \epsilon_4 \end{bmatrix}\]
In contrast to \(\beta\), which is a fixed effect (not random), \(u\) is a random effect.
We describe a random effect by its distribution, i.e. by the parameters of the distribution of the random variable. It is common to use a normal distribution: \[u_i \sim N(0, \sigma^2_g)\]
The full vector of random effects (one per individual) in the \(n=4\) example is:
\[\begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \end{bmatrix} \sim N \left( \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \sigma^2_g\cdot \begin{bmatrix} k_{11}&k_{12}&k_{13}&k_{14}\\ k_{21}&k_{22}&k_{23}&k_{24}\\ k_{31}&k_{32}&k_{33}&k_{34} \\ k_{41}&k_{42}&k_{43}&k_{44} \\ \end{bmatrix} \right) \]
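A small sketch of what this distribution means in practice: given any symmetric, positive semi-definite matrix \(K\) and a value of \(\sigma^2_g\) (both made up below), we can draw the random-effect vector \(u\) in one call.

```python
import numpy as np

rng = np.random.default_rng(1)

sigma_g2 = 0.8  # hypothetical genetic variance
# Hypothetical 4x4 similarity matrix (symmetric, positive semi-definite).
K = np.array([[1.0, 0.9, 0.1, 0.1],
              [0.9, 1.0, 0.1, 0.1],
              [0.1, 0.1, 1.0, 0.9],
              [0.1, 0.1, 0.9, 1.0]])

# Draw u ~ N(0, sigma_g^2 * K): correlated random effects, one per individual.
u = rng.multivariate_normal(mean=np.zeros(4), cov=sigma_g2 * K)
print(np.round(u, 3))
```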
The authors of EMMAX proposed that if we use the genetic relatedness matrix, this model can account for population structure and relatedness. Let's look at a very simple example where the population structure is given by a random effect that depends only on population membership: \[u = \begin{bmatrix} u_\text{AFR} \\ u_\text{AFR} \\ u_\text{EUR} \\ u_\text{EUR} \end{bmatrix} \] with \(u_\text{AFR} \sim N(0, \sigma^2_g)\), \(u_\text{EUR} \sim N(0, \sigma^2_g)\), and \(u_\text{AFR} \perp u_\text{EUR}\).
We assume that the first two individuals have AFR ancestry and that the last two have EUR ancestry; we use \(u_\text{AFR}\) to represent the AFR-specific random effect and \(u_\text{EUR}\) for the EUR-specific one. Let's assume that both have the same variance, \(\sigma^2_g\), and that they are independent of each other. Therefore:
\[E~ u_\text{AFR} = E ~ u_\text{EUR} = 0\] \[E~ u_\text{AFR}^2 = E ~ u_\text{EUR}^2 = \sigma^2_g\] \[E~ u_\text{AFR} \cdot u_\text{EUR} = 0\]
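As a quick sanity check, the sketch below verifies these three expectations by Monte Carlo, using an arbitrary \(\sigma^2_g\) and draw count chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

sigma_g2 = 0.8        # hypothetical genetic variance
n_draws = 200_000     # Monte Carlo sample size

# Independent population-level effects, one per ancestry group.
u_afr = rng.normal(0.0, np.sqrt(sigma_g2), size=n_draws)
u_eur = rng.normal(0.0, np.sqrt(sigma_g2), size=n_draws)

print(u_afr.mean(), u_eur.mean())             # both ~ 0
print((u_afr**2).mean(), (u_eur**2).mean())   # both ~ sigma_g2
print((u_afr * u_eur).mean())                 # ~ 0 by independence
```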
\[\boldsymbol{Y} = \boldsymbol{X}\cdot\beta + u + \boldsymbol{\epsilon}\]
\[ \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}\cdot \beta + \begin{bmatrix} u_\text{AFR} \\ u_\text{AFR} \\ u_\text{EUR} \\ u_\text{EUR} \end{bmatrix} + \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \epsilon_3 \\ \epsilon_4 \end{bmatrix}\] where \(\beta\) is a fixed effect, and
\[ u \sim N \left (\begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \sigma^2_g\cdot K \right) \] \[ \epsilon \sim N \left (\begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \sigma^2_\epsilon\cdot\begin{bmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&1&0 \\ 0&0&0&1 \end{bmatrix} \right) \] Let's calculate \(K\) now. \(K\) is a similarity matrix and is sometimes called a kernel.
We calculate \(K\) from \(\sigma^2_g \cdot K = \text{Var}(u)\). Since \(u\) has mean zero, \(Eu = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}\), we can use \(\text{Var}(u) = E(u-Eu)(u-Eu)' = Euu'\): \[ \text{Var} \begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \end{bmatrix} = Euu' = E\left( \begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \end{bmatrix} \cdot \begin{bmatrix} u_1 & u_2 & u_3 & u_4 \end{bmatrix} \right) \]
\[ \sigma^2_g \cdot \mathbf{K}= E \begin{bmatrix} u_1 u_1 & u_1 u_2 & u_1 u_3 & u_1 u_4 \\ u_2 u_1 & u_2 u_2 & u_2 u_3 & u_2 u_4 \\ u_3 u_1 & u_3 u_2 & u_3 u_3 & u_3 u_4\\ u_4 u_1 & u_4 u_2 & u_4 u_3 & u_4 u_4 \end{bmatrix} = \begin{bmatrix} E~u_1 u_1 & E~u_1 u_2 & E~u_1 u_3 & E~u_1 u_4 \\ E~u_2 u_1 & E~u_2 u_2 & E~u_2 u_3 & E~u_2 u_4 \\ E~u_3 u_1 & E~u_3 u_2 & E~u_3 u_3 & E~u_3 u_4\\ E~u_4 u_1 & E~u_4 u_2 & E~u_4 u_3 & E~u_4 u_4 \end{bmatrix} \] Plugging in \(u = (u_\text{AFR}, u_\text{AFR}, u_\text{EUR}, u_\text{EUR})'\) and the expectations above gives \[ \sigma^2_{g}\cdot \mathbf{K} = \sigma^2_g\cdot\begin{bmatrix} 1&1&0&0 \\ 1&1&0&0 \\ 0&0&1&1 \\ 0&0&1&1 \end{bmatrix} \]
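We can also check this block structure empirically. The sketch below (again with a made-up \(\sigma^2_g\)) draws many copies of the vector \((u_\text{AFR}, u_\text{AFR}, u_\text{EUR}, u_\text{EUR})'\) and estimates \(Euu' / \sigma^2_g\), which should approach the block matrix above.

```python
import numpy as np

rng = np.random.default_rng(3)

sigma_g2 = 0.8        # hypothetical genetic variance
n_draws = 200_000     # Monte Carlo sample size

# Each draw is the 4-vector (u_AFR, u_AFR, u_EUR, u_EUR).
u_afr = rng.normal(0.0, np.sqrt(sigma_g2), size=n_draws)
u_eur = rng.normal(0.0, np.sqrt(sigma_g2), size=n_draws)
U = np.stack([u_afr, u_afr, u_eur, u_eur], axis=1)   # shape (n_draws, 4)

# Monte Carlo estimate of E[u u'], then divide by sigma_g^2 to get K.
Euu = U.T @ U / n_draws
print(np.round(Euu / sigma_g2, 2))   # ~ [[1,1,0,0],[1,1,0,0],[0,0,1,1],[0,0,1,1]]
```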