CoMM

CoMM: a collaborative mixed model to dissecting genetic contributions to complex traits by leveraging regulatory information

Introduction

This vignette provides an introduction to the CoMM package. R package CoMM implements CoMM, a collaborative mixed model to dissecting genetic contributions to complex traits by leveraging regulatory information. The package can be installed with the command:

library(devtools)

install_github("gordonliu810822/CoMM")

The package can be loaded with the command:

library("CoMM")

Fit CoMM using simulated data

We first generate genotype data using function genRawGeno:

library(mvtnorm) L = 1; M = 100; rho =0.5 n1 = 350; n2 = 5000; maf = runif(M,0.05,0.5) X = genRawGeno(maf, L, M, rho, n1 + n2);

Then, effect sizes are generated from standard Gaussian distribution with sparse structure:

beta_prop = 0.2; b = numeric(M); m = M * beta_prop; b[sample(M,m)] = rnorm(m);

Subsequently, the gene expression y is generated by controlling cellular heritability at prespecified level (h2y):

h2y = 0.05; b0 = 6; y0 <- X%*%b + b0; y <- y0 + (as.vector(var(y0)*(1-h2y)/h2y))^0.5*rnorm(n1+n2);

Finally, the phenotype data is generated as the generative model of CoMM with a prespecified trait heritability (h2) as:

h2 = 0.001; y1 <- y[1:n1] X1 <- X[1:n1,] y2 <- y0[(n1+1):(n1+n2)] X2 <- X[(n1+1):(n1+n2),] alpha0 <- 3 alpha <- 0.3 sz2 <- var(y2*alpha) * ((1-h2)/h2) z <- alpha0 + y2*alpha + rnorm(n2,0,sqrt(sz2))

The genotype data X1 and X2 are normalized as

y = y1; mean.x1 = apply(X1,2,mean); x1m = sweep(X1,2,mean.x1); std.x1 = apply(x1m,2,sd) x1p = sweep(x1m,2,std.x1,"/"); x1p = x1p/sqrt(dim(x1p)[2]) mean.x2 = apply(X2,2,mean); x2m = sweep(X2,2,mean.x2); std.x2 = apply(x2m,2,sd) x2p = sweep(x2m,2,std.x2,"/"); x2p = x2p/sqrt(dim(x2p)[2]) w2 = matrix(rep(1,n2),ncol=1); w1 = matrix(rep(1,n1),ncol=1);

Initilize the parameters by using linear mixed model (function lmm_pxem, LMM implemented using PX-EM algorithm):

fm0 = lmm_pxem(y, w1,x1p, 100) sigma2beta =fm0$sigma2beta; sigma2y =fm0$sigma2y; beta0 = fm0$beta0;

Fit CoMM w/ and w/o constraint that alpha = 0 as

fmHa = CoMM_covar_pxem(y, z, x1p, x2p, w1, w2,constr = 0); fmH0 = CoMM_covar_pxem(y, z, x1p, x2p, w1, w2,constr = 1); loglikHa = max(fmHa$loglik,na.rm=T) loglikH0 = max(fmH0$loglik,na.rm=T) tstat = 2 * (loglikHa - loglikH0); pval = pchisq(tstat,1,lower.tail=F) alpha_hat = fmHa$alpha

Fit CoMM using GWAS and eQTL data

The example of running CoMM using GWAS and eQTL data in plink binary format

file1 = "1000G.EUR.QC.1"; file2 = "NFBC_filter_mph10"; file3 = "Geuvadis_gene_expression_qn.txt"; file4 = ""; file5 = "pc5_NFBC_filter_mph10.txt"; whichPheno = 1; bw = 500000;

Here, file1 is the prefix for eQTL genotype data in plink binary format, file2 is the GWAS data in plink binary format, file3 is the gene expression file with extended name, file4 and file5 are covariates file for eQTL and GWAS data, respectively. Then run fm = CoMM_testing_run(file1,file2,file3, file4,file5, whichPheno, bw);. For gene expresion file, it must have the following format (rows for genes and columns for individuais):

lower	up	genetype1	genetype2	TargetID	Chr	HG00105	HG00115
59783540	59843484	lincRNA	PART1	ENSG00000152931.6	5	0.5126086	0.7089508
48128225	48148330	protein_coding	UPP1	ENSG00000183696.9	7	1.4118007	-0.0135644
57846106	57853063	protein_coding	INHBE	ENSG00000139269.2	12	0.5755268	-1.0162217
116054583	116164515	protein_coding	AFAP1L2	ENSG00000169129.8	10	1.1117776	0.0407033
22157909	22396763	protein_coding	RAPGEF5	ENSG00000136237.12	7	0.2831573	-0.1772559
11700964	11743303	lincRNA	RP11-434C1.1	ENSG00000247157.2	12	0.2550282	-0.2831573

To make ‘CoMM’ further speeding, we implement multiple thread version of ‘CoMM’ by just run fm = CoMM_testing_run_mt(file1,file2,file3, file4,file5, whichPheno, bw, coreNum); where coreNum = 24 is the number of cores in your CPU.

Figures

The following data and codes are used to produce one of the figures in the Yang et al. (2018).

Liu Lab

CoMM

CoMM: a collaborative mixed model to dissecting genetic contributions to complex traits by leveraging regulatory information

Jin Liu

2018-05-10

Introduction

Fit CoMM using simulated data

Fit CoMM using GWAS and eQTL data

Figures

Sidebar