Whole-genome sequencing across multiple samples in a population provides an unprecedented opportunity to comprehensively characterize the polymorphic variants in that population. While the 1000 Genomes Project (1KGP) has offered early insights into the value of population-level sequencing, its low coverage compromised the ability to confidently detect rare and low-frequency variants. In addition, the set of populations in the 1KGP remains incomplete, despite the extension of the study design to more than 2,500 samples from more than 20 population groups. The Malays are one of the Austronesian groups predominantly present in Southeast Asia and Oceania, and the Singapore Sequencing Malay Project (SSMP) aims to perform deep whole-genome sequencing of 100 healthy Malays. Sequencing at an average of 30-fold coverage, we illustrate the higher sensitivity in detecting low-frequency and rare variants, and the ability to investigate the presence of hotspots of functional mutations. The deeper coverage allows more functional variants to be identified in each person than the low-pass sequencing in the 1KGP does. This set of whole-genome sequence data is expected to serve as a benchmark for evaluating the value of deep population-level sequencing versus low-pass sequencing, especially in populations that are poorly represented in population genetic studies. We also expect that the high coverage will enable methodological and technological assessments of current strategies in sequence data analysis.

Sequencing Platform

  • Illumina HiSeq 2000
  • Target coverage of 30-fold
  • Paired-end sequencing
  • Read length of 100 base pairs (bp)
  • Target insert size of 300 to 400 bp
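
To give a sense of scale for the specifications above, the following is a minimal sketch of the standard coverage arithmetic (coverage = total sequenced bases / genome size). The genome size of roughly 3.1 billion bp and the function names are assumptions made for illustration and are not taken from the SSMP protocol. The estimate also ignores duplicate reads, unmapped reads and uneven coverage, so the number of read pairs needed in practice would likely be higher.

```python
# Back-of-the-envelope coverage arithmetic for paired-end sequencing.
# ASSUMPTION: a haploid human genome size of ~3.1 Gbp, used only for
# illustration; it is not a figure stated by the SSMP.
GENOME_SIZE_BP = 3_100_000_000


def read_pairs_for_coverage(target_coverage, genome_size_bp, read_length_bp):
    """Return the minimum number of read pairs whose combined bases
    reach the target average coverage (each pair contributes two reads)."""
    bases_needed = target_coverage * genome_size_bp
    bases_per_pair = 2 * read_length_bp
    # Integer ceiling division, so a partial pair rounds up.
    return -(-bases_needed // bases_per_pair)


def expected_coverage(read_pairs, genome_size_bp, read_length_bp):
    """Return the average coverage obtained from a number of read pairs."""
    return read_pairs * 2 * read_length_bp / genome_size_bp


if __name__ == "__main__":
    pairs = read_pairs_for_coverage(30, GENOME_SIZE_BP, 100)
    print(f"Read pairs for 30-fold coverage: {pairs:,}")
    print(f"Coverage check: {expected_coverage(pairs, GENOME_SIZE_BP, 100):.1f}x")
```

Under these assumptions, 30-fold coverage with 100 bp paired-end reads corresponds to several hundred million read pairs per sample.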

SSMP Samples

  • 100 subjects (50 males and 50 females) from the Singapore Population Health Study

Data Release Policy

Please cite the following publication(s) if you are using the data in any publication:

  • LP Wong et al. Deep whole genome sequencing of 100 Southeast Asian Malays. (Submitted)