Q1 Visualize Family Structure in Genetic Relatedness Matrix

The genetic relatedness matrix (which can be used to calculate heritability and also to adjust for relatedness and population structure) encodes the structure of the data. To get an intuitive sense of how that works, do the following.

  1. (10 points) Calculate the GRM of the hapmap_chr22 data.

  2. (10 points) Select a couple of CEU families and a couple of YRI families, and show their GRM entries as a tile plot.

  3. (10 points) What structure do you see? Do you see family structure?

Feel free to reuse the code at https://hakyimlab.github.io/hgen471/L8-GRM.html
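To see why family structure shows up in the GRM before touching the HapMap files, here is a minimal sketch on simulated genotypes. It is not the course code: the allele frequencies, sample sizes, and function names are assumptions made for illustration, linkage between SNPs is ignored, and only the Python standard library is used so that it runs anywhere. It builds two parent-parent-child trios plus some unrelated individuals, then computes the GRM as the average over SNPs of products of standardized genotypes.

```python
import random

rng = random.Random(471)        # fixed seed so the sketch is reproducible
N_SNPS = 2000                   # assumed number of SNPs (simulation only)
N_UNRELATED = 20                # assumed number of extra unrelated people

# Hypothetical allele frequencies, one per SNP.
freqs = [rng.uniform(0.1, 0.9) for _ in range(N_SNPS)]

def founder():
    """Two independent haplotypes (lists of 0/1 alleles) for an unrelated person."""
    return [[1 if rng.random() < p else 0 for p in freqs] for _ in range(2)]

def child(father, mother):
    """One randomly chosen allele per SNP from each parent (linkage ignored)."""
    return [[rng.choice((parent[0][j], parent[1][j])) for j in range(N_SNPS)]
            for parent in (father, mother)]

def dosage(haplotypes):
    """Count of the coded allele (0, 1 or 2) at each SNP."""
    return [a + b for a, b in zip(*haplotypes)]

# Individuals 0-2 and 3-5 are two trios (father, mother, child); the rest are unrelated.
people = []
for _ in range(2):
    dad, mom = founder(), founder()
    people += [dad, mom, child(dad, mom)]
people += [founder() for _ in range(N_UNRELATED)]
genotypes = [dosage(h) for h in people]     # individuals x SNPs

def grm(genotypes):
    """GRM A = Z Z^T / M, where Z is the column-standardized genotype matrix."""
    n = len(genotypes)
    A = [[0.0] * n for _ in range(n)]
    used = 0
    for j in range(len(genotypes[0])):
        col = [row[j] for row in genotypes]
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / n
        if var == 0:
            continue                        # skip monomorphic SNPs
        z = [(x - mean) / var ** 0.5 for x in col]
        for i in range(n):
            for k in range(i, n):
                A[i][k] += z[i] * z[k]
        used += 1
    for i in range(n):
        for k in range(i, n):
            A[i][k] /= used
            A[k][i] = A[i][k]
    return A

A = grm(genotypes)

# Text stand-in for a tile plot: the 6 x 6 block covering the two trios.
for row in A[:6]:
    print(" ".join(f"{value:6.2f}" for value in row[:6]))
```

In this simulation the parent-child entries should come out near 0.5 and the entries between unrelated people near 0, which is the block pattern to look for in the tile plot of the real CEU and YRI families.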

Q2 Plot the \(\chi^2\) statistic vs. LD score

To get an intuitive sense of how LD score regression works, download the pre-calculated LD scores for chromosome 22 and the summary statistics for a height GWAS (see the files listed under Data below) and do the following.

  1. (10 points) Plot a histogram of the LD scores.

  2. (10 points) Plot the \(\chi^2\) statistic vs. the LD score.

  3. (10 points) Regress the \(\chi^2\) statistic on the LD score.

Feel free to reuse the code at https://hakyimlab.github.io/hgen471/L8-LD-score.html to complete the steps above.
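The regression step can be tried on simulated data first. The sketch below is illustrative only: the LD scores are drawn from an arbitrary right-skewed distribution, and the "true" intercept and slope are assumed values chosen for the simulation, not estimates from the height GWAS. It uses plain unweighted least squares, whereas the LDSC software uses a weighted regression with block-jackknife standard errors, so treat it as a way to build intuition rather than a replacement for the tool.

```python
import random

rng = random.Random(8)          # fixed seed so the sketch is reproducible
N_SNPS = 20000                  # assumed number of SNPs (simulation only)
TRUE_INTERCEPT = 1.0            # assumed value, for the simulation only
TRUE_SLOPE = 0.01               # assumed value, for the simulation only

# Hypothetical right-skewed LD scores.
ld_scores = [1.0 + rng.gammavariate(2.0, 25.0) for _ in range(N_SNPS)]

# A z-score with variance v has a squared value whose mean is v, so each
# simulated chi2 has mean TRUE_INTERCEPT + TRUE_SLOPE * (its LD score).
chi2 = [rng.gauss(0.0, (TRUE_INTERCEPT + TRUE_SLOPE * l) ** 0.5) ** 2
        for l in ld_scores]

def ols(x, y):
    """Ordinary least squares fit of y on x; returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

intercept, slope = ols(ld_scores, chi2)
print(f"intercept = {intercept:.3f}, slope = {slope:.4f}")

# Single-SNP chi2 values are very noisy, so a scatter plot is easier to read
# after averaging within LD-score bins (here: 10 equal-sized bins).
pairs = sorted(zip(ld_scores, chi2))
bin_size = N_SNPS // 10
binned = []
for b in range(10):
    chunk = pairs[b * bin_size:(b + 1) * bin_size]
    mean_ld = sum(l for l, _ in chunk) / len(chunk)
    mean_chi2 = sum(c for _, c in chunk) / len(chunk)
    binned.append((mean_ld, mean_chi2))
    print(f"mean LD score = {mean_ld:7.2f}, mean chi2 = {mean_chi2:5.2f}")
```

With the real data, the same two columns (LD score and \(\chi^2\)) would come from merging the LD score file with the GWAS summary statistics by SNP identifier.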

Q3 Calculate heritability and genetic correlation with LD score regression

This question is optional; we will return to it after lab 8 on stratified LDSC.

In this exercise, you will use the LD score regression method to estimate the chip heritability of two GWAS phenotypes from the UK Biobank data and look for evidence of population stratification.

(20 points) Install the LDSC regression software

To install the LDSC regression software, go to the GitHub repository (https://github.com/bulik/ldsc) and follow the installation instructions at (https://github.com/bulik/ldsc#getting-started). The installation requires conda to be installed first. conda is a package manager for command-line tools, Python, and R. We highly encourage you to try conda if you haven’t done so. If you have trouble installing the software, please consult your TA and/or instructors.

Go through the LDSC documentation

To get started on using the software for this problem, follow the tutorial here to get the reference files needed (https://github.com/bulik/ldsc/wiki/Heritability-and-Genetic-Correlation).

a. (10 points) LD score formula.

Write down the relationship between the expected value of the \(\chi^2\) statistic and the heritability, the number of markers, the sample size, and the population stratification effect.

b. Pick and download two phenotypes

Choose them from (https://nealelab.github.io/UKBB_ldsc/downloads.html). You will probably want to download the ones that are already in LDSC format.

c. (10 points) Calculate heritability using LDSC

What is the intercept for each trait? How do you interpret the values?

d. (10 points) Calculate genetic correlation between the two UKB traits.

Did you expect the results you got? Comment.

Hint: take a look at this old solution, written for a different platform, here (https://bios25328.hakyimlab.org/post/2021/04/15/homework-ldsc-regression/). If you use the pre-formatted summary statistics, you may not need to run the munging script.
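For parts c and d, the command lines follow the pattern shown in the LDSC tutorial linked above. The sketch below only assembles and prints those commands; it does not run LDSC. The file names, the LD score directory, and the helper function names are hypothetical placeholders, and the location of `ldsc.py` depends on where you cloned the repository.

```python
import shlex

def ldsc_h2_command(sumstats, ld_dir, out_prefix, ldsc="ldsc.py"):
    """Argument list for a single-trait heritability run."""
    return [ldsc, "--h2", sumstats,
            "--ref-ld-chr", ld_dir, "--w-ld-chr", ld_dir,
            "--out", out_prefix]

def ldsc_rg_command(sumstats_a, sumstats_b, ld_dir, out_prefix, ldsc="ldsc.py"):
    """Argument list for a genetic correlation run between two traits."""
    return [ldsc, "--rg", f"{sumstats_a},{sumstats_b}",
            "--ref-ld-chr", ld_dir, "--w-ld-chr", ld_dir,
            "--out", out_prefix]

# Hypothetical placeholders: substitute the files you actually downloaded.
LD_DIR = "eur_w_ld_chr/"
h2_cmd = ldsc_h2_command("trait1.sumstats.gz", LD_DIR, "trait1_h2")
rg_cmd = ldsc_rg_command("trait1.sumstats.gz", "trait2.sumstats.gz",
                         LD_DIR, "trait1_trait2_rg")

for cmd in (h2_cmd, rg_cmd):
    print(" ".join(shlex.quote(arg) for arg in cmd))

# To execute a command from Python once LDSC is installed:
#   import subprocess
#   subprocess.run(h2_cmd, check=True)
```

The intercept and heritability estimates for part c, and the genetic correlation for part d, are written to the log file named after the `--out` prefix.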

Data: You can download the following files

  • chr22.l2.ldscore.gz,
  • gwas_giant_chr22.txt,
  • hapmap_chr22.bed,
  • hapmap_chr22.bim,
  • hapmap_chr22.fam

from https://uchicago.box.com/s/iqxg6yo7pi50hyudnfcv2xnhtfp6euv8

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The source code is licensed under MIT.

Citation

For attribution, please cite this work as

Haky Im (2022). Homework 5. HGEN 471 Class Notes. /post/2022/02/09/homework-5/

BibTeX citation

@misc{
  title = "Homework 5",
  author = "Haky Im",
  year = "2022",
  journal = "HGEN 471 Class Notes",
  note = "/post/2022/02/09/homework-5/"
}