Efficiency and consistency of haplotype tagging of dense SNP maps in multiple samples.
Ke X., Durrant C., Morris AP., Hunt S., Bentley DR., Deloukas P., Cardon LR.
Haplotype tagging is a means of retaining most of the information in high density marker maps, while reducing genotyping requirements. Estimates of the numbers of tagging SNPs required to cover the human genome have varied widely, ranging from 100,000 to 1,000,000. Tagging has been applied to a number of gene-based datasets but has not been evaluated in contexts reflecting those of genome-wide association studies--large chromosome regions and multiple samples drawn from the same population. We analysed 5000 common markers across a 10 Mb segment of human chromosome 20 in three samples (UK Caucasian, CEPH Caucasian, African American) to evaluate tagging efficiency and consistency. Overall, the results indicate a high degree of efficiency, yielding 3-5-fold savings in Caucasians and 2-3-fold savings in African Americans. These levels varied according to linkage disequilibrium (LD) levels, tagging thresholds and allele frequencies, but in high LD regions they did not vary markedly due to marker density. However, a strong positive relationship between marker density and tagging was observed, relating to the fact that increasing marker density yields greater sequence coverage in high LD, thus requiring more tag SNPs to cover a greater fraction of the genome. Encouragingly, whatever the density employed, a high level of robustness was observed between UK and CEPH samples, as most of the htSNPs selected in one sample were also appropriate as tags in the other.