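The GWAS model assumes \(\boldsymbol{Y} = \boldsymbol{X}\beta + \boldsymbol{\epsilon}\), where \(\boldsymbol{Y}\) is the phenotype, \(\boldsymbol{X}\) is the genotype at the tested variant, \(\beta\) is the effect size, and \(\boldsymbol{\epsilon}\) is the error term.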

Example with n=4 \[\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}\cdot \beta + \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \epsilon_3 \\ \epsilon_4 \end{bmatrix}\]

Each error term is independent and normally distributed: \[\epsilon_i \sim N(0, \sigma_\epsilon^2)\]

In vector form: \[\begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \epsilon_3 \\ \epsilon_4 \end{bmatrix} \sim N \left(\begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \sigma_\epsilon^2\cdot \begin{bmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&1&0 \\ 0&0&0&1 \end{bmatrix}\right) \]

We typically estimate \(\beta\) by linear regression. Under the normal error assumption, the least-squares estimate \(\hat{\beta}\) is also the maximum likelihood estimate (MLE).
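As a rough illustration of the estimation step, the sketch below simulates data from the model \(\boldsymbol{Y} = \boldsymbol{X}\beta + \boldsymbol{\epsilon}\) and recovers \(\beta\) by least squares. The sample size, allele frequency, effect size, and error variance are all made-up values chosen for the example, and the model is fit without an intercept to match the equation as written.

```python
import numpy as np

rng = np.random.default_rng(0)

# All values below are hypothetical, chosen only for illustration.
n = 10_000          # number of individuals
true_beta = 0.5     # effect size used to simulate the phenotype
sigma_eps = 1.0     # standard deviation of the error term

# Genotype dosages (0, 1, or 2 copies) with allele frequency 0.3
x = rng.binomial(2, 0.3, size=n).astype(float)
eps = rng.normal(0.0, sigma_eps, size=n)
y = x * true_beta + eps

# Least-squares estimate for the no-intercept model y = x * beta + eps
beta_hat = (x @ y) / (x @ x)

# Same estimate from NumPy's general least-squares solver
beta_lstsq = np.linalg.lstsq(x[:, None], y, rcond=None)[0][0]

print(beta_hat, beta_lstsq)
```

Both numbers should agree with each other and land close to the simulated effect size; a real analysis would normally also include an intercept and covariates.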

Using a random effect, we can account for population structure and relatedness.

\[\boldsymbol{Y} = \boldsymbol{X}\cdot\beta + u + \boldsymbol{\epsilon}\]

\[\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}\cdot \beta + \begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \end{bmatrix} + \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \epsilon_3 \\ \epsilon_4 \end{bmatrix}\]

In contrast to \(\beta\), which is a fixed effect (not random), \(u\) is a random effect.

We describe a random effect by its distribution, i.e. by the parameters of the distribution of the random variable. A normal distribution is a common choice: \[u_i \sim N(0, \sigma^2_g)\]

The full vector of random effects (one per individual) in the n=4 example is:

\[\begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \end{bmatrix} \sim N \left( \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \sigma^2_g\cdot \begin{bmatrix} k_{11}&k_{12}&k_{13}&k_{14}\\ k_{21}&k_{22}&k_{23}&k_{24}\\ k_{31}&k_{32}&k_{33}&k_{34} \\ k_{41}&k_{42}&k_{43}&k_{44} \end{bmatrix} \right) \]
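To make the role of the matrix of \(k_{ij}\) values concrete, here is a minimal sketch that draws many random-effect vectors with zero mean and covariance \(\sigma^2_g\) times a relatedness matrix, then checks the empirical covariance. The entries of `K` and the value of `sigma2_g` are invented for the example and do not come from real data.

```python
import numpy as np

rng = np.random.default_rng(1)

sigma2_g = 2.0  # hypothetical genetic variance

# Hypothetical relatedness matrix: individuals 1-2 and 3-4 are related pairs
K = np.array([[1.0, 0.5, 0.0, 0.0],
              [0.5, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.5],
              [0.0, 0.0, 0.5, 1.0]])

n_draws = 100_000

# Each row of U is one draw of (u_1, u_2, u_3, u_4) from N(0, sigma2_g * K)
U = rng.multivariate_normal(mean=np.zeros(4), cov=sigma2_g * K, size=n_draws)

emp_mean = U.mean(axis=0)          # should be close to 0
emp_cov = np.cov(U, rowvar=False)  # should be close to sigma2_g * K

print(np.round(emp_mean, 2))
print(np.round(emp_cov, 2))
```

Related individuals (nonzero \(k_{ij}\)) end up with correlated random effects, which is how the model encodes relatedness.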

The authors of EMMAX proposed that if we use the genetic relatedness matrix, then this model can account for population structure and relatedness. Let’s look at a very simple example where the population structure is given by a random effect that depends only on population membership: \[u = \begin{bmatrix} u_\text{AFR} \\ u_\text{AFR} \\ u_\text{EUR} \\ u_\text{EUR} \end{bmatrix} \] with \(u_\text{AFR} \sim N(0, \sigma^2_g)\), \(u_\text{EUR} \sim N(0, \sigma^2_g)\), and \(u_\text{AFR} \perp u_\text{EUR}\).

We assume that the first two individuals have AFR ancestry and that the last two individuals have EUR ancestry; we use \(u_\text{AFR}\) to represent the AFR-specific random effect and \(u_\text{EUR}\) for the EUR-specific random effect. Let’s assume that both have the same variance, \(\sigma^2_g\), and that they are independent of each other. Therefore:

\[E~ u_\text{AFR} = E ~ u_\text{EUR} = 0\] \[E~ u_\text{AFR}^2 = E ~ u_\text{EUR}^2 = \sigma^2_g\] \[E~ u_\text{AFR} \cdot u_\text{EUR} = 0\]

\[\boldsymbol{Y} = \boldsymbol{X}\cdot\beta + u + \boldsymbol{\epsilon}\]

\[ \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}\cdot \beta + \begin{bmatrix} u_\text{AFR} \\ u_\text{AFR} \\ u_\text{EUR} \\ u_\text{EUR} \end{bmatrix} + \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \epsilon_3 \\ \epsilon_4 \end{bmatrix}\] where \(\beta\) is a fixed effect, and

\[ u \sim N \left (\begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \sigma^2_g\cdot K \right) \] \[ \epsilon \sim N \left (\begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \sigma^2_\epsilon\cdot\begin{bmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&1&0 \\ 0&0&0&1 \end{bmatrix} \right) \] Let’s calculate \(K\) now. \(K\) is a similarity matrix and is sometimes called a kernel.

Since \(\text{Var}(u) = \sigma^2_g \cdot K\), we can find \(K\) by computing \(\text{Var}(u)\). Because \(u\) has mean 0, \[Eu = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}\] the variance reduces to \(\text{Var}(u) = E(u-Eu)(u-Eu)' = Euu'\): \[ \text{Var} \begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \end{bmatrix} = Euu' = E \left( \begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \end{bmatrix} \cdot \begin{bmatrix} u_1 & u_2 & u_3 & u_4 \end{bmatrix} \right) \]

\[ \sigma^2_g \cdot \mathbf{K}= E \begin{bmatrix} u_1 u_1 & u_1 u_2 & u_1 u_3 & u_1 u_4 \\ u_2 u_1 & u_2 u_2 & u_2 u_3 & u_2 u_4 \\ u_3 u_1 & u_3 u_2 & u_3 u_3 & u_3 u_4\\ u_4 u_1 & u_4 u_2 & u_4 u_3 & u_4 u_4 \end{bmatrix} = \begin{bmatrix} E~u_1 u_1 & E~u_1 u_2 & E~u_1 u_3 & E~u_1 u_4 \\ E~u_2 u_1 & E~u_2 u_2 & E~u_2 u_3 & E~u_2 u_4 \\ E~u_3 u_1 & E~u_3 u_2 & E~u_3 u_3 & E~u_3 u_4\\ E~u_4 u_1 & E~u_4 u_2 & E~u_4 u_3 & E~u_4 u_4 \end{bmatrix} \] Since \(u_1 = u_2 = u_\text{AFR}\) and \(u_3 = u_4 = u_\text{EUR}\), every entry within a population equals \(\sigma^2_g\) and every entry across populations equals 0: \[ \sigma^2_{g}\cdot \mathbf{K} = \sigma^2_g\cdot\begin{bmatrix} 1&1&0&0 \\ 1&1&0&0 \\ 0&0&1&1 \\ 0&0&1&1 \end{bmatrix} \]
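The block structure of \(\mathbf{K}\) can also be checked numerically. The sketch below is one way to do it, under assumptions that are not in the notes: a population-membership indicator matrix (called `Z` here) and an arbitrary value for \(\sigma^2_g\). It builds \(\mathbf{K}\) as `Z @ Z.T`, simulates the two independent population effects, and compares the empirical \(Euu'\) with \(\sigma^2_g \cdot \mathbf{K}\).

```python
import numpy as np

rng = np.random.default_rng(2)

# Z[i, p] = 1 if individual i belongs to population p (columns: AFR, EUR).
# The name Z is introduced here for illustration only.
Z = np.array([[1, 0],
              [1, 0],
              [0, 1],
              [0, 1]], dtype=float)

# Individuals in the same population share a random effect, so K = Z Z'
K = Z @ Z.T

sigma2_g = 2.0     # hypothetical variance of each population effect
n_draws = 100_000

# Independent draws of (u_AFR, u_EUR), each with mean 0 and variance sigma2_g
pop_effects = rng.normal(0.0, np.sqrt(sigma2_g), size=(n_draws, 2))

# Expand to one effect per individual: rows are (u_1, u_2, u_3, u_4)
U = pop_effects @ Z.T

# Empirical E[u u'], valid because u has mean 0
emp_cov = (U.T @ U) / n_draws

print(K)
print(np.round(emp_cov, 2))
```

The empirical matrix should show the same two blocks as \(\sigma^2_g \cdot \mathbf{K}\), with values near \(\sigma^2_g\) within a population and near 0 across populations.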

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The source code is licensed under MIT.

Citation

For attribution, please cite this work as

Haky Im (2021). Mixed Effects Model to Handle Population Structure. HGEN 471 Class Notes. /post/2021/02/11/mixed-effects-model-to-handle-population-structure/

BibTeX citation

@misc{
  title = "Mixed Effects Model to Handle Population Structure",
  author = "Haky Im",
  year = "2021",
  journal = "HGEN 471 Class Notes",
  note = "/post/2021/02/11/mixed-effects-model-to-handle-population-structure/"
}