ABOUT OUR PROJECT
This study aims to characterize the extent of common variation in the human genome across at least 1 million single nucleotide polymorphisms (SNPs) for DNA samples from each of the three ethnic groups in Singapore: Chinese, Malays and Indians. The data generated will supplement the public database of genetic variation provided by the International HapMap Project, which surveyed individuals from four populations across Africa, Europe and East Asia. The data will be used to assess differences in the extent of linkage disequilibrium between the ethnic groups, evaluate genetic heterogeneity in sample collections, document broad-scale recombination hotspots and map the extent of copy number variation in each ethnic group. The results of the analysis will be used to guide and optimize the design of large-scale genetic association studies, as well as to investigate gene-environment interactions. Knowledge of the degree of genetic commonality across ethnic groups will also provide a preliminary indication of whether genes involved in drug and enzyme metabolism are common across the ethnic groups.
Genotyping platforms used are:
- The Affymetrix Genome-Wide Human SNP Array 6.0, which assays approximately 900,000 SNPs and more than 946,000 genetic markers probing for copy number variations.
- The Illumina Human1M single BeadChip, which assays about 1 million SNPs and copy number polymorphisms.
Data from both platforms have been merged for this release.
A. SGVP samples
The SGVP collection comprises 292 samples: 99 Chinese, 98 Malays and 95 Indians. The inclusion criterion specifies that both parents and both sets of grandparents have to belong to the same ethnic group.
Population label | Population | Number of samples |
---|---|---|
CHS | Chinese | 99 |
MAS | Malays | 98 |
INS | Indians | 95 |
B. SNP Genotype data
- Genotype calling
Illumina genotypes were assigned by the proprietary calling algorithm GenCall (GC) in BeadStudio 3.0 using the cluster files provided by Illumina. A threshold of 0.15 was applied to the GC score to decide on the confidence of the assigned genotypes, i.e. any genotype with a GC score ≥ 0.15 was accepted, and the genotype was set to NULL otherwise.
Affymetrix genotypes were called by the Birdseed calling algorithm from the Broad Institute, available in the Affymetrix Power Tools apt-1.8.6 (release March 4, 2008). Model files were based on the na24 release.
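The GC score rule above can be illustrated with a minimal sketch. This is not GenCall or BeadStudio code; the function name `apply_gc_threshold` and the toy calls are hypothetical, and `None` is used here to stand in for NULL.

```python
from typing import Optional

GC_THRESHOLD = 0.15  # threshold stated in the text above


def apply_gc_threshold(
    genotype: str, gc_score: float, threshold: float = GC_THRESHOLD
) -> Optional[str]:
    """Keep the genotype if its GC score is >= threshold, else return None (NULL)."""
    return genotype if gc_score >= threshold else None


# Hypothetical (genotype, GC score) pairs for illustration only.
toy_calls = [("AA", 0.82), ("AG", 0.15), ("GG", 0.07)]
filtered = [apply_gc_threshold(g, s) for g, s in toy_calls]
print(filtered)  # ['AA', 'AG', None]
```

Note that a score exactly equal to 0.15 is accepted, matching the "≥ 0.15" wording.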
C. Quality control
Quality control was performed separately on the two platforms. Samples were identified for removal on the basis of:
- High rates of missingness (> 2%)
- Excessive heterozygosity
- Cryptic relatedness, detected by excessive identity-by-state
- Admixture or discordant ethnic membership, detected through principal components analysis
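As an illustration of the first criterion only, the sketch below flags samples whose missingness exceeds 2%. It assumes missing genotypes are represented as `None` and that the rate is computed over all SNPs for a sample; the function names and toy data are hypothetical and this is not the project's actual pipeline. The heterozygosity, relatedness and admixture checks are not shown.

```python
from typing import Dict, List, Optional

MAX_MISSING_RATE = 0.02  # samples with missingness > 2% are flagged


def missing_rate(genotypes: List[Optional[str]]) -> float:
    """Fraction of genotypes that are missing (None) for one sample."""
    if not genotypes:
        raise ValueError("sample has no genotypes")
    return sum(g is None for g in genotypes) / len(genotypes)


def samples_to_remove(
    calls_by_sample: Dict[str, List[Optional[str]]],
    max_rate: float = MAX_MISSING_RATE,
) -> List[str]:
    """Return the sorted IDs of samples whose missingness exceeds max_rate."""
    return sorted(
        sample_id
        for sample_id, genotypes in calls_by_sample.items()
        if missing_rate(genotypes) > max_rate
    )


# Hypothetical toy data: 100 SNPs per sample.
toy = {
    "sample_1": ["AA"] * 99 + [None],      # 1% missing
    "sample_2": ["AG"] * 97 + [None] * 3,  # 3% missing
    "sample_3": ["GG"] * 98 + [None] * 2,  # exactly 2% missing, not flagged
}
flagged = samples_to_remove(toy)
print(flagged)  # ['sample_2']
```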
In all genotype files, alleles are expressed on the forward strand of NCBI build 36. QC+ datasets contain SNPs that passed the above quality criteria and are polymorphic in at least one ethnic group, while QC+mono datasets also include SNPs that are monomorphic across all three ethnic groups.
D. Merged data
Data from the two platforms were merged by rsID, and further checks were done on chromosomal positions. Only individuals with genotype data on both platforms were kept. SNPs common to both platforms were removed if they showed < 95% concordance between the two platforms. For the remaining common SNPs, the genotype calls from the platform with the higher call rate were kept. For SNPs with the same extent of missingness on both platforms, the Illumina genotypes were retained. Genomic positions were further checked to confirm the uniqueness of the SNPs in the dataset.
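The per-SNP merging rule above can be sketched as follows. This is a hedged illustration, not the project's code. It assumes both platforms list the same individuals in the same order, that genotypes are already encoded identically on the same strand, that missing calls are `None`, and that concordance is computed only over individuals called on both platforms. Dropping a SNP with no jointly called individuals is also an assumption, as are all function names.

```python
from typing import List, Optional

MIN_CONCORDANCE = 0.95  # SNPs with concordance below this are removed

Genotypes = List[Optional[str]]


def concordance(illumina: Genotypes, affymetrix: Genotypes) -> Optional[float]:
    """Fraction of identical calls among individuals called on both platforms."""
    pairs = [
        (i, a) for i, a in zip(illumina, affymetrix)
        if i is not None and a is not None
    ]
    if not pairs:
        return None  # nothing to compare
    return sum(i == a for i, a in pairs) / len(pairs)


def call_rate(genotypes: Genotypes) -> float:
    """Fraction of non-missing calls for one SNP."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def merge_snp(illumina: Genotypes, affymetrix: Genotypes) -> Optional[Genotypes]:
    """Return the retained genotype vector, or None if the SNP is removed."""
    c = concordance(illumina, affymetrix)
    if c is None or c < MIN_CONCORDANCE:  # the None case is an assumption
        return None
    # Higher call rate wins; ties go to Illumina, as stated in the text.
    if call_rate(affymetrix) > call_rate(illumina):
        return list(affymetrix)
    return list(illumina)


# Hypothetical toy SNPs typed in four individuals.
kept_affy = merge_snp(["AA", "AG", "GG", None], ["AA", "AG", "GG", "GG"])
kept_illumina = merge_snp(["AA", "AG", "GG", "GG"], ["AA", "AG", "GG", "GG"])
removed = merge_snp(["AA", "AG", "GG", "AA"], ["AA", "AG", "GG", "GG"])
```

In this toy example, the first SNP keeps the Affymetrix calls (higher call rate), the second keeps the Illumina calls (tie), and the third is removed because only 3 of 4 calls agree.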
Population | Final Number of samples | QC+ | QC+Mono |
---|---|---|---|
CHS | 96 | 1,405,417 | 1,584,040 |
MAS | 89 | 1,402,256 | 1,580,905 |
INS | 83 | 1,404,699 | 1,583,454 |
E. Data Release Policy
Please cite the following publication if you use the data in any published work.
Teo YY, Sim X, Ong RTH, Tan AKS, Chen JM, Tantoso E, Small KS, Ku CS, Lee EJD, Seielstad M and Chia KS. Singapore Genome Variation Project: A Haplotype map of three South-East Asian populations. Genome Research (In press).
F. Funding agencies/Acknowledgements
- Yong Loo Lin School of Medicine, National University of Singapore (NUS)
- NUS Life Science Institute
- Department of Community, Occupational and Family Health (COFM), NUS
- Genome Institute of Singapore (GIS)