Program Name
=============================
HaploPS version 1.0
Last updated: June 2012

Description
=============================
HaploPS takes phased data files and locates potential regions carrying a positive selection signal. It works by searching for long haplotypes at a range of haplotype frequencies. At the same haplotype frequency, regions carrying positive selection signals should be much longer than neutral regions.

Program Execution
==============================
Command line to run HaploPS at a haplotype frequency of 0.5:

./HaploPS -geno test_genotype.txt -legend test_legend.txt -freq 0.5 -out test_0.5_chr22.txt

To identify signals of positive selection, you will need to run HaploPS at a range of frequencies, say from 0.05 to 0.95 with a step size of 0.05. The commands therefore look like:

./HaploPS -geno test_genotype.txt -legend test_legend.txt -freq 0.05 -out output_0.05_chr22.txt
./HaploPS -geno test_genotype.txt -legend test_legend.txt -freq 0.1 -out output_0.1_chr22.txt
./HaploPS -geno test_genotype.txt -legend test_legend.txt -freq 0.15 -out output_0.15_chr22.txt
...
...
./HaploPS -geno test_genotype.txt -legend test_legend.txt -freq 0.95 -out output_0.95_chr22.txt

The above can be done automatically with the following shell script:

#!/bin/bash
#############################################
##
## shell script for running HaploPS
##
#############################################

# Highest 1-minute load average at which a new job may still be started.
load_thres=5
#chip='illu1M'

# Frequencies from 0.95 down to 0.05 in steps of 0.05.
for (( i=95; i>=5; i=i-5 ))
do
    for (( chr=1; chr<=22; chr=chr+1 ))
    do
        # Wait while the system load is above the threshold.
        sysload=$(cut -d" " -f1 /proc/loadavg)
        while [ "$(echo "$load_thres < $sysload" | bc -l)" -eq 1 ]
        do
            #echo "System load > threshold"
            sleep 10
            sysload=$(cut -d" " -f1 /proc/loadavg)
        done

        # bc prints e.g. ".05", so a leading 0 is prepended below.
        freq=$(echo "scale=2;$i/100" | bc)
        ~/haploPS/HaploPS -geno your_haplotype_$chr -legend your_legend_$chr -freq 0$freq -out outputname_0${freq}_chr${chr}.txt &
        sleep 15
    done
done

The script automatically runs HaploPS at frequencies from 5 to 95 percent (0.05 to 0.95) for chromosomes 1 to 22. It starts each job in the background and waits whenever the system load is above load_thres.
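Before launching a full sweep, it can help to list the commands for one chromosome and check the file names. The following is a minimal sketch, not part of HaploPS: the function names haplops_freqs and haplops_commands are hypothetical, and only the -geno, -legend, -freq and -out options shown above are used. Frequencies are written with two decimals (0.10 rather than 0.1), which matches the output names produced by the shell script.

```shell
#!/bin/bash
# Hypothetical helpers, not part of HaploPS: print the commands of a
# frequency sweep for one chromosome without running them.

# haplops_freqs
# Prints 0.05, 0.10, ..., 0.95, one frequency per line.
haplops_freqs() {
    local i
    for i in $(seq 5 5 95); do
        printf '0.%02d\n' "$i"
    done
}

# haplops_commands GENO LEGEND OUT_PREFIX CHR
# Prints one HaploPS command line per frequency.
haplops_commands() {
    local geno=$1 legend=$2 prefix=$3 chr=$4 freq
    for freq in $(haplops_freqs); do
        echo "./HaploPS -geno $geno -legend $legend -freq $freq -out ${prefix}_${freq}_chr${chr}.txt"
    done
}

# Example: list the 19 commands for the chromosome 22 test files.
haplops_commands test_genotype.txt test_legend.txt output 22
```

Once the list looks right, the printed lines could be piped to sh to run them one after another; unlike the script above, this sketch does no load checking and starts nothing in the background.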

Input File Format
================================
Input files: one legend file and one haplotype file whose SNPs are in one-to-one correspondence, e.g. test_legend.txt and test_genotype.txt.

Legend files: chromosome, SNP ID, genetic distance and position.
Haplotype files: alleles coded as 1/0. Number of rows = 2 x number of samples. Number of columns = number of SNPs included in the legend file.

Output analysis
=============================
Put combine_files.perl, remove_centromere.perl, centromere_positions.txt and rank_signal.r in the same directory as the output files.

Step 1, remove the centromere by running:
perl combine_remove_haploPS.perl outputname    ## if the output files are named outputname_0.05_chr1.txt etc.

Step 2, get the HaploPS scores by running:
R CMD BATCH ./rank_signal.r    ## modify the input and output names in the R script

Step 3, extract the significant regions by running:
R CMD BATCH ./HaploPS_significant.r    ## modify the input and output names in the R script
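
The constraints listed under Input File Format (alleles coded 1/0, an even number of haplotype rows, one haplotype column per legend SNP) can be checked before a run. The following is a minimal sketch, not part of HaploPS: check_haplops_input is a hypothetical name, and it assumes that fields are whitespace-separated, that the legend file has one line per SNP, and that the legend file has no header line unless a header line count is passed as the third argument. None of these assumptions is confirmed by this README.

```shell
#!/bin/bash
# Hypothetical helper, not part of HaploPS: sanity-check a haplotype/legend pair.
# Assumptions: whitespace-separated fields, one legend line per SNP,
# and no legend header unless LEGEND_HEADER_LINES is given.

# check_haplops_input HAPLOTYPE_FILE LEGEND_FILE [LEGEND_HEADER_LINES]
# Prints "OK: ..." and returns 0 if the files look consistent; otherwise
# prints the first problem found and returns 1.
check_haplops_input() {
    local hap=$1 legend=$2 header=${3:-0}
    local nsnps
    # Number of SNPs = non-empty legend lines minus header lines.
    nsnps=$(awk -v h="$header" 'NF > 0 { n++ } END { print n - h }' "$legend")

    awk -v nsnps="$nsnps" '
        NF == 0 { next }    # skip blank lines
        { rows++ }
        NF != nsnps {
            printf "row %d has %d columns, expected %d\n", NR, NF, nsnps
            bad = 1; exit
        }
        {
            for (i = 1; i <= NF; i++)
                if ($i != "0" && $i != "1") {
                    printf "row %d, column %d: allele \"%s\" is not 0 or 1\n", NR, i, $i
                    bad = 1; exit
                }
        }
        END {
            if (bad) exit 1
            if (rows == 0 || rows % 2 != 0) {
                printf "%d rows: expected a positive even number (2 per sample)\n", rows
                exit 1
            }
            printf "OK: %d samples, %d SNPs\n", rows / 2, nsnps
        }
    ' "$hap"
}
```

A typical call would be check_haplops_input test_genotype.txt test_legend.txt, or with a trailing 1 if the legend file turns out to start with a header line.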