CoMM
CoMM: a collaborative mixed model to dissecting genetic contributions to complex traits by leveraging regulatory information
Jin Liu
2018-05-10
Introduction
This vignette provides an introduction to the CoMM
package. R package CoMM
implements CoMM, a collaborative mixed model to dissecting genetic contributions to complex traits by leveraging regulatory information. The package can be installed with the command:
library(devtools)
install_github("gordonliu810822/CoMM")
The package can be loaded with the command:
library("CoMM")
Fit CoMM using simulated data
We first generate genotype data using function genRawGeno:
library(mvtnorm) L = 1; M = 100; rho =0.5 n1 = 350; n2 = 5000; maf = runif(M,0.05,0.5) X = genRawGeno(maf, L, M, rho, n1 + n2);
Then, effect sizes are generated from standard Gaussian distribution with sparse structure:
beta_prop = 0.2; b = numeric(M); m = M * beta_prop; b[sample(M,m)] = rnorm(m);
Subsequently, the gene expression y
is generated by controlling cellular heritability at prespecified level (h2y
):
h2y = 0.05; b0 = 6; y0 <- X%*%b + b0; y <- y0 + (as.vector(var(y0)*(1-h2y)/h2y))^0.5*rnorm(n1+n2);
Finally, the phenotype data is generated as the generative model of CoMM with a prespecified trait heritability (h2
) as:
h2 = 0.001; y1 <- y[1:n1] X1 <- X[1:n1,] y2 <- y0[(n1+1):(n1+n2)] X2 <- X[(n1+1):(n1+n2),] alpha0 <- 3 alpha <- 0.3 sz2 <- var(y2*alpha) * ((1-h2)/h2) z <- alpha0 + y2*alpha + rnorm(n2,0,sqrt(sz2))
The genotype data X1
and X2
are normalized as
y = y1; mean.x1 = apply(X1,2,mean); x1m = sweep(X1,2,mean.x1); std.x1 = apply(x1m,2,sd) x1p = sweep(x1m,2,std.x1,"/"); x1p = x1p/sqrt(dim(x1p)[2]) mean.x2 = apply(X2,2,mean); x2m = sweep(X2,2,mean.x2); std.x2 = apply(x2m,2,sd) x2p = sweep(x2m,2,std.x2,"/"); x2p = x2p/sqrt(dim(x2p)[2]) w2 = matrix(rep(1,n2),ncol=1); w1 = matrix(rep(1,n1),ncol=1);
Initilize the parameters by using linear mixed model (function lmm_pxem, LMM implemented using PX-EM algorithm):
fm0 = lmm_pxem(y, w1,x1p, 100) sigma2beta =fm0$sigma2beta; sigma2y =fm0$sigma2y; beta0 = fm0$beta0;
Fit CoMM w/ and w/o constraint that alpha = 0 as
fmHa = CoMM_covar_pxem(y, z, x1p, x2p, w1, w2,constr = 0); fmH0 = CoMM_covar_pxem(y, z, x1p, x2p, w1, w2,constr = 1); loglikHa = max(fmHa$loglik,na.rm=T) loglikH0 = max(fmH0$loglik,na.rm=T) tstat = 2 * (loglikHa - loglikH0); pval = pchisq(tstat,1,lower.tail=F) alpha_hat = fmHa$alpha
Fit CoMM using GWAS and eQTL data
The example of running CoMM using GWAS and eQTL data in plink binary format
file1 = "1000G.EUR.QC.1"; file2 = "NFBC_filter_mph10"; file3 = "Geuvadis_gene_expression_qn.txt"; file4 = ""; file5 = "pc5_NFBC_filter_mph10.txt"; whichPheno = 1; bw = 500000;
Here, file1 is the prefix for eQTL genotype data in plink binary format, file2 is the GWAS data in plink binary format, file3 is the gene expression file with extended name, file4 and file5 are covariates file for eQTL and GWAS data, respectively. Then run fm = CoMM_testing_run(file1,file2,file3, file4,file5, whichPheno, bw);
. For gene expresion file, it must have the following format (rows for genes and columns for individuais):
lower | up | genetype1 | genetype2 | TargetID | Chr | HG00105 | HG00115 |
---|---|---|---|---|---|---|---|
59783540 | 59843484 | lincRNA | PART1 | ENSG00000152931.6 | 5 | 0.5126086 | 0.7089508 |
48128225 | 48148330 | protein_coding | UPP1 | ENSG00000183696.9 | 7 | 1.4118007 | -0.0135644 |
57846106 | 57853063 | protein_coding | INHBE | ENSG00000139269.2 | 12 | 0.5755268 | -1.0162217 |
116054583 | 116164515 | protein_coding | AFAP1L2 | ENSG00000169129.8 | 10 | 1.1117776 | 0.0407033 |
22157909 | 22396763 | protein_coding | RAPGEF5 | ENSG00000136237.12 | 7 | 0.2831573 | -0.1772559 |
11700964 | 11743303 | lincRNA | RP11-434C1.1 | ENSG00000247157.2 | 12 | 0.2550282 | -0.2831573 |
To make ‘CoMM’ further speeding, we implement multiple thread version of ‘CoMM’ by just run fm = CoMM_testing_run_mt(file1,file2,file3, file4,file5, whichPheno, bw, coreNum);
where coreNum = 24
is the number of cores in your CPU.
Figures
The following data and codes are used to produce one of the figures in the Yang et al. (2018).
// add bootstrap table styles to pandoc tables function bootstrapStylePandocTables() { $('tr.header').parent('thead').parent('table').addClass('table table-condensed'); } $(document).ready(function () { bootstrapStylePandocTables(); });