APcluster meta-analysis method: a trans-ethnic meta-analysis of rare and low frequency variants based on a partitioning approach

The method was developed to address allelic and effect-size heterogeneity across populations of different ancestries in trans-ethnic meta-analysis of rare and low-frequency variants. It first clusters the ethnic groups using a population genetics argument. It then quantifies the evidence of association within each cluster, and finally combines the evidence across clusters.
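
The partition-then-combine structure can be illustrated with a minimal sketch. It assumes that per-group p-values (for example, group-level SKAT p-values) and a cluster assignment of the ethnic groups are already available. The clustering step itself is not reproduced. Fisher's method is used both within and between clusters purely as a stand-in for illustration; it is not necessarily the combination rule used by APcluster. Fisher's method assumes independent p-values, which is reasonable for cohorts made up of different individuals, and it ignores the direction of effect. The function names and cluster labels are hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """P(X > x) for a chi-square variable with an even number of degrees of freedom."""
    # Closed form for df = 2k: exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def fisher_combine(pvalues):
    """Fisher's method: -2 * sum(log p) follows chi-square(2n) under the joint null."""
    # Assumes independent p-values in (0, 1]; a p-value of exactly 0 would fail in log().
    stat = -2.0 * sum(math.log(p) for p in pvalues)
    return chi2_sf_even_df(stat, 2 * len(pvalues))


def partition_then_combine(group_pvalues, clusters):
    """Combine p-values within each cluster, then combine the cluster-level results."""
    # `clusters` maps a (hypothetical) cluster label to the ethnic groups it contains.
    cluster_p = {
        label: fisher_combine([group_pvalues[g] for g in members])
        for label, members in clusters.items()
    }
    return cluster_p, fisher_combine(list(cluster_p.values()))


if __name__ == "__main__":
    # Illustrative p-values only; in practice these would come from per-group tests.
    pvals = {"E1": 0.04, "E2": 0.10, "A1": 0.30, "A2": 0.25, "Af1": 0.60, "Af2": 0.45}
    clusters = {"EUR": ["E1", "E2"], "EAS": ["A1", "A2"], "AFR": ["Af1", "Af2"]}
    per_cluster, overall = partition_then_combine(pvals, clusters)
    print(per_cluster, overall)
```

Keeping the cluster assignment as an explicit input separates the two stages. Any clustering of the ethnic groups can be plugged in without changing how the evidence is combined.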


Program Download

The R functions implementing the APcluster meta-analysis method can be downloaded here. The download also includes code that analyzes the example dataset by applying the APcluster meta-analysis method to group-level p-values from the Sequence Kernel Association Test (SKAT) of Wu et al.


Example Dataset

The archive contains genotype and phenotype data for each of 12 ethnic groups: Europeans E1, E2, E3, E4; East Asians A1, A2, A3, A4; and Africans Af1, Af2, Af3, Af4. Each file contains data for 1000 individuals and 50 variants. The first column holds a quantitative phenotype, and the subsequent columns hold the variants coded as minor allele counts. Variants are matched by column across ethnic groups, so a given column number refers to the same variant in every population. A column of zeros within an ethnic group means that the variant is not observed in that group. The example dataset can be downloaded here.
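
Reading one of these files and finding the variants that are not observed in a group can be sketched as follows. This assumes the files are whitespace-delimited text with no header row and one individual per line. The file layout beyond the column order is not stated above, so treat that assumption and the helper names as hypothetical.

```python
import io


def read_group_file(handle):
    """Parse one ethnic-group file into phenotypes and a genotype matrix."""
    # Assumed layout: whitespace-delimited, no header row, one individual per line;
    # column 1 = quantitative phenotype, remaining columns = minor allele counts.
    phenotypes, genotypes = [], []
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        phenotypes.append(float(fields[0]))
        genotypes.append([int(float(v)) for v in fields[1:]])
    return phenotypes, genotypes


def unobserved_variants(genotypes):
    """0-based indices of all-zero variant columns, i.e. variants not observed in this group."""
    n_variants = len(genotypes[0]) if genotypes else 0
    return [j for j in range(n_variants) if all(row[j] == 0 for row in genotypes)]


if __name__ == "__main__":
    # Three individuals and three variants, made up for illustration.
    toy = io.StringIO("0.5 0 1 0\n-1.2 0 2 0\n0.3 0 0 1\n")
    y, g = read_group_file(toy)
    print(y, unobserved_variants(g))  # [0.5, -1.2, 0.3] [0]
```

A file handle returned by `open(...)` can be passed in place of the toy `io.StringIO` object. Because columns are matched across groups, an index returned by `unobserved_variants` refers to the same variant in every population.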


Population genetics simulations dataset

The archive contains the population genetics simulation datasets used to evaluate the performance of the APcluster meta-analysis method. Each folder contains the files for one simulation scenario (N12, N8, N8C, N4, N4C, A7, A4, A2). Files within a folder are named according to the convention "Population_replicate.txt". Here "Population" is the population name, and "replicate" is the number of the meta-analysis data replicate, with 1000 replicates for each scenario. Population names are the following:

  • For non-admixed scenarios (N12, N8, N8C, N4, N4C): four European ethnic groups E1, E2, E3, E4; four East Asian groups A1, A2, A3, A4; and four African groups Af1, Af2, Af3, Af4.
  • For admixed scenarios (A7, A4, A2): one European population (E), one East Asian (A), one African (Af), admixed European and East Asian (EA), admixed European and African (EAf), admixed East Asian and African (AAf), and admixed European, East Asian and African (EAAf).
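
Each replicate number identifies one meta-analysis dataset, so a script usually needs to group a scenario folder's files by replicate. The following is a minimal sketch. It assumes the literal pattern `Population_replicate.txt` (for example `Af3_12.txt` or `EAAf_1.txt`) with an integer replicate. It collects whichever populations are present rather than assuming that every scenario folder contains every population listed above. The helper names and the local folder path are hypothetical.

```python
import os
import re
from collections import defaultdict

# Assumed file-name pattern "Population_replicate.txt", e.g. "Af3_12.txt" or "EAAf_1.txt".
FILENAME_RE = re.compile(r"^(?P<population>[A-Za-z]+\d*)_(?P<replicate>\d+)\.txt$")


def parse_filename(name):
    """Split a 'Population_replicate.txt' file name into (population, replicate)."""
    match = FILENAME_RE.match(os.path.basename(name))
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    return match.group("population"), int(match.group("replicate"))


def group_by_replicate(names):
    """Map each replicate number to the sorted list of populations present for it."""
    groups = defaultdict(list)
    for name in names:
        if not name.endswith(".txt"):
            continue  # skip files that do not follow the naming convention
        population, replicate = parse_filename(name)
        groups[replicate].append(population)
    return {rep: sorted(pops) for rep, pops in sorted(groups.items())}


if __name__ == "__main__":
    # "N12" is one of the scenario folders; its location on disk is an assumption.
    folder = "N12"
    if os.path.isdir(folder):
        replicates = group_by_replicate(os.listdir(folder))
        print(len(replicates), replicates.get(1))
```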

The file format is the same as described above for the example dataset, with one difference: variants are coded as allele counts that are not, in general, minor allele counts. Each data file contains genotype and phenotype data for 1000 individuals, with a quantitative phenotype in the first column followed by the variants. Both rare and common variants are included. Variants for populations within the same scenario and with the same replicate number are matched by column. A column of zeros within an ethnic group means that the variant is not observed in that group. The dataset can be downloaded here.
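
Because the counted allele is not necessarily the minor one, allele frequencies in these files may exceed 0.5. The sketch below computes per-population frequencies for one replicate, checks that the matched populations share the same number of variant columns, and folds each frequency at 0.5 to obtain a minor allele frequency. It assumes diploid genotypes coded 0, 1 or 2. The function names are hypothetical.

```python
def allele_frequencies(genotypes, ploidy=2):
    """Frequency of the counted allele for each variant column in one population."""
    # Assumes diploid counts (0, 1, 2) unless another ploidy is given.
    n = len(genotypes)
    return [sum(row[j] for row in genotypes) / (ploidy * n) for j in range(len(genotypes[0]))]


def frequency_table(populations, ploidy=2):
    """Per-population frequencies for one replicate, relying on columns being matched."""
    # `populations` maps a population name to its genotype matrix (rows = individuals).
    widths = {len(g[0]) for g in populations.values()}
    if len(widths) != 1:
        raise ValueError("populations in one replicate must share the same variant columns")
    return {pop: allele_frequencies(g, ploidy) for pop, g in populations.items()}


def minor_allele_frequency(freq):
    """The counted allele is not necessarily the minor one, so fold the frequency at 0.5."""
    return min(freq, 1.0 - freq)


if __name__ == "__main__":
    # Two tiny made-up populations with two matched variant columns.
    table = frequency_table({"E": [[0, 2], [1, 2]], "A": [[0, 1], [0, 1]]})
    print(table)  # {'E': [0.25, 1.0], 'A': [0.0, 0.5]}
    print([minor_allele_frequency(f) for f in table["E"]])  # [0.25, 0.0]
```

A frequency of exactly 0 in a population corresponds to a column of zeros, that is, a variant not observed in that group.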


Contact

If you have any questions regarding the use of the program or datasets, please send an e-mail to both of the following people:

  • Sergii Zakharov ( a0076597@nus.edu.sg )
  • A/Prof Yik Ying Teo ( statyy@nus.edu.sg )