Learning Objectives
- Recall that LDSC can be used to estimate heritability and genetic correlation
- Revisit mixed effects modeling using random SNP effects
- Describe prediction methods
  - PRS
  - elastic net
  - ridge regression
  - lasso
- Discuss the clinical utility and challenges of PRS
Material for the lecture
Here are the slides for today: download link
Take home messages
LD Score regression (LDSC)
LDSC can be used to estimate chip heritability and the genetic correlation between traits, and to attribute the inflation of GWAS summary statistics to either polygenicity (true causal effects) or confounding, for example from population structure.
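How LDSC separates polygenicity from confounding can be illustrated with a toy simulation. The sketch below assumes the standard LDSC model, in which the expected chi-square statistic of SNP \(j\) is \(N h^2 \ell_j / M + N a + 1\), where \(\ell_j\) is the LD score, \(N\) the sample size, \(M\) the number of SNPs, and \(a\) a confounding term. All numeric values and the uniform LD scores are made up for illustration, and plain least squares stands in for the weighted regression and block jackknife used by the real software.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy values chosen only for illustration (assumptions, not from the lecture).
M = 200_000      # number of SNPs
N = 10_000       # GWAS sample size
h2_true = 0.4    # chip heritability used to simulate
a_true = 1e-5    # confounding term, so the true intercept is 1 + N * a = 1.1

# Hypothetical LD scores; real ones are computed from a reference panel.
ld_scores = rng.uniform(1.0, 200.0, size=M)

# Expected chi-square under the LDSC model, then noisy observed statistics
# drawn as a scaled 1-df chi-square so that the mean matches the model.
expected_chi2 = N * h2_true / M * ld_scores + N * a_true + 1.0
chi2 = expected_chi2 * rng.chisquare(df=1, size=M)

# Regress chi-square on LD score: slope -> heritability, intercept -> confounding.
slope, intercept = np.polyfit(ld_scores, chi2, deg=1)
h2_est = slope * M / N

print(f"estimated h2: {h2_est:.3f}")
print(f"estimated intercept: {intercept:.3f}")
```

In this toy setting, the slope carries the polygenic signal (inflation that grows with LD score), while an intercept above 1 reflects inflation shared by all SNPs regardless of LD, which is the signature of confounding.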
Mixed effects models
Unlike the simple GWAS, where only the effect of one SNP is considered at a time, mixed effects modeling allows for joint modeling of all SNPs.
\[Y = \text{fixed effects} + \sum_k X_k \cdot \beta_k + \epsilon\]
It is not possible to estimate the individual effect sizes of millions of SNPs from a much smaller number of observations. Recall that you need at least as many equations as unknowns in a traditional setting. If we assume that the \(\beta_k\)’s are normally distributed with mean 0, then we only need to estimate one parameter: the variance of the \(\beta_k\), \(\sigma_\beta^2\). That’s a huge advantage of using mixed effects modeling.
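The point that a single variance parameter can be estimated even when SNPs outnumber individuals can be checked on simulated data. The sketch below is a rough illustration under stated assumptions: genotypes are simulated and standardized, there are no fixed effects, and the estimator is a simple method-of-moments (Haseman-Elston style) regression rather than the REML fit used by mixed model software. The sample sizes and the value of `h2_true` are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy dimensions (assumptions): more SNPs than individuals.
N, M = 1500, 3000
h2_true = 0.5                    # total genetic variance, M * sigma_beta^2
sigma_beta2 = h2_true / M        # variance of each SNP effect

# Simulate genotypes (0/1/2 allele counts) and standardize each SNP.
freqs = rng.uniform(0.1, 0.9, size=M)
G = rng.binomial(2, freqs, size=(N, M)).astype(float)
X = (G - G.mean(axis=0)) / G.std(axis=0)

# Simulate the phenotype: Y = sum_k X_k * beta_k + epsilon.
beta = rng.normal(0.0, np.sqrt(sigma_beta2), size=M)
y = X @ beta + rng.normal(0.0, np.sqrt(1.0 - h2_true), size=N)
y = y - y.mean()

# Genetic relatedness matrix and method-of-moments estimate:
# for i != j, E[y_i * y_j] = M * sigma_beta^2 * K_ij.
K = X @ X.T / M
upper = np.triu_indices(N, k=1)
k_off = K[upper]
yy_off = np.outer(y, y)[upper]
sigma_g2_est = (k_off @ yy_off) / (k_off @ k_off)   # regression through origin
sigma_beta2_est = sigma_g2_est / M

print(f"estimated M * sigma_beta^2: {sigma_g2_est:.3f} (simulated with {h2_true})")
```

Ordinary least squares could not recover the 3000 individual effects from 1500 observations, but the one shared variance is estimable because every pair of individuals contributes information about it.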
Connection between EMMAX random effect and the aggregate effect of all SNPs on the trait
\[u = \sum_k \beta_k X_k\]
We demonstrated that the random effect proposed by EMMAX and \(\sum_k \beta_k X_k\) are the same. This fact can be shown by confirming that the variance of \(u\) is the same as the variance of \(\sum_k \beta_k X_k\) (\(\sigma^2 \cdot \mathbf{K}\), where \(\mathbf{K}\) is the genetic relatedness matrix \(\approx\) correlation matrix of the genotype data, \(\mathbf{X}\cdot\mathbf{X}'/M\)).
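The variance calculation can be sketched in one line, under assumptions not spelled out above: the \(\beta_k\) are independent with variance \(\sigma_\beta^2\), \(\mathbf{X}\) is the \(N \times M\) matrix of standardized genotypes with columns \(X_k\), and the genotypes are treated as fixed.

\[\operatorname{Var}\Big(\sum_k \beta_k X_k\Big) = \sum_k X_k \operatorname{Var}(\beta_k) X_k' = \sigma_\beta^2 \, \mathbf{X}\mathbf{X}' = M\sigma_\beta^2 \cdot \frac{\mathbf{X}\mathbf{X}'}{M} = M\sigma_\beta^2 \cdot \mathbf{K}\]

Under these assumptions, the \(\sigma^2\) in \(\sigma^2 \cdot \mathbf{K}\) corresponds to \(M\sigma_\beta^2\), the total variance contributed by all SNPs.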
LOCO (Leave One Chromosome Out)
LOCO improves the power of EMMAX by excluding the chromosome that contains the test SNP from the calculation of the genetic relatedness matrix, so that the test SNP's own signal is not also absorbed by the random effect.
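A minimal sketch of the LOCO idea is shown below, assuming a standardized genotype matrix and a vector of chromosome labels per SNP. The function name `loco_grm` and the toy data are hypothetical, not taken from any particular software.

```python
import numpy as np


def loco_grm(X, chrom, test_chrom):
    """Genetic relatedness matrix computed without the SNPs on `test_chrom`.

    X          : (N, M) standardized genotype matrix
    chrom      : (M,) chromosome label of each SNP
    test_chrom : chromosome of the SNP currently being tested
    """
    keep = chrom != test_chrom          # drop every SNP on the test chromosome
    X_keep = X[:, keep]
    return X_keep @ X_keep.T / X_keep.shape[1]


# Toy data (assumption): 50 individuals, 3 chromosomes with 100 SNPs each.
rng = np.random.default_rng(2)
X = rng.standard_normal((50, 300))
chrom = np.repeat([1, 2, 3], 100)

# When testing a SNP on chromosome 2, relatedness is built from chromosomes 1 and 3.
K_loco = loco_grm(X, chrom, test_chrom=2)
print(K_loco.shape)
```

In practice one such matrix is needed per chromosome, and each SNP is tested against the matrix that leaves out its own chromosome.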
Biobank scale mixed effects methods
With ever-increasing sample sizes, methods that can handle a million individuals are needed. BOLT-LMM and fastGWA do that by implementing computationally more efficient versions of EMMAX. In the case of the UK Biobank, these methods allowed the inclusion of related individuals in the analysis, expanding the sample size by 50%.
Genetic prediction
Good predictors of phenotypes based on genotype are needed to make genetic data more useful in the clinic. Polygenic Risk Scores are built from GWAS summary results: for each individual, the allele counts are weighted by the estimated effect sizes and summed across a large number of SNPs, most of them not genome-wide significant.
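The scoring step itself is a weighted sum, which can be sketched as follows. The function name `polygenic_score`, the dosage matrix, and the effect sizes are made-up illustrative values; real pipelines also handle allele matching, LD (clumping or shrinkage), and SNP selection, which are left out here.

```python
import numpy as np


def polygenic_score(dosages, beta_hat):
    """Weighted sum of effect-allele counts, one score per individual.

    dosages  : (N, M) counts of the effect allele (0, 1, or 2)
    beta_hat : (M,) estimated effect sizes from GWAS summary results
    """
    return dosages @ beta_hat


# Toy example (assumption): 3 individuals, 4 SNPs.
dosages = np.array([
    [0, 1, 2, 0],
    [1, 1, 0, 2],
    [2, 0, 1, 1],
])
beta_hat = np.array([0.1, -0.2, 0.05, 0.3])

scores = polygenic_score(dosages, beta_hat)
print(scores)  # approximately [-0.1, 0.5, 0.55]
```

Individuals can then be ranked by score, for example to flag the upper tail of the distribution for a different prevention strategy.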
Currently, the GWAS of many common diseases are yielding PRS that have enough predictive power to be potentially useful for stratifying patients into different prevention strategy groups.
PRS work reasonably well in the population where the GWAS was performed (overwhelmingly European), but they fail to extrapolate to other ancestries. Transferring PRS across populations is an active area of research.
References
EMMAX
Kang et al (2010). Variance component model to account for sample structure in genome-wide association studies. Nature Genetics.
BOLT-LMM
Loh et al (2018). Mixed-model association for biobank-scale datasets. Nature Genetics.
fastGWA
Jiang et al (2019). A resource-efficient tool for mixed model association analysis of large-scale data. Nature Genetics.
BSLMM
Zhou X, Carbonetto P, Stephens M (2013) Polygenic Modeling with Bayesian Sparse Linear Mixed Models. PLOS Genetics 9(2): e1003264. https://doi.org/10.1371/journal.pgen.1003264
PRS methods
- Zhu, X., & Stephens, M. (2017). Bayesian large-scale multiple regression with summary statistics from genome-wide association studies. Annals of Applied Statistics.
- Vilhjálmsson et al. (2015). Modeling Linkage Disequilibrium Increases Accuracy of Polygenic Risk Scores. AJHG.
- Mak et al. (2017). Polygenic scores via penalized regression on summary statistics. Genetic Epidemiology, 41(6), 469–480.
- Lloyd-Jones et al. (2019). Improved polygenic prediction by Bayesian multiple regression on summary statistics. bioRxiv.
- Ge, T., Chen, CY., Ni, Y. et al. Polygenic prediction via Bayesian regression and continuous shrinkage priors. Nat Commun 10, 1776 (2019). https://doi.org/10.1038/s41467-019-09718-5
Clinical Utility of PRS
Khera et al (2018) Nature Genetics
Issues with transfer across ancestries
Martin, A.R., Kanai, M., Kamatani, Y. et al. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat Genet 51, 584–591 (2019). https://doi.org/10.1038/s41588-019-0379-x