Genotype prediction using a dense map of SNPs.
Evans DM, Cardon LR, Morris AP.
The International Haplotype Mapping Project (HapMap) aims to characterize the distribution and extent of linkage disequilibrium (LD) throughout the human genome, thereby facilitating genome-wide association analysis and the search for the genetic determinants of complex diseases. Implicit in the project's rationale is the expectation that hidden (unobserved) disease-causing variants will be in significant LD with surrounding typed markers and will therefore be detectable by association-based mapping approaches. To test this assumption, we examined more than 5,000 SNPs across a 10-Mb region of chromosome 20 in samples of 96 unrelated African-American and 96 unrelated Caucasian individuals. We treated observed loci as surrogates for hidden SNPs by masking individuals' genotypes at those loci. We then attempted to predict the masked genotypes at each surrogate hidden SNP using information about LD in the region and genotypes at surrounding observed loci. Our method finds the most likely genotype for each individual, given all possible haplotype pairs consistent with that individual's observed genotypes at surrounding loci and given the frequencies of those haplotypes in an independent sample. The method predicts genotypes extremely well in regions of high LD. Furthermore, in regions of low LD, it yields substantial gains in predictive accuracy compared with pair-wise strategies. These results suggest that pair-wise tests of disease-marker association may be inferior to multipoint methods, which exploit the information contained in multi-locus haplotypes.
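The prediction step described above can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes biallelic SNPs coded 0/1, genotypes given as allele counts (0, 1, 2), a table of reference haplotype frequencies estimated beforehand in an independent sample, and Hardy-Weinberg equilibrium, so that an ordered haplotype pair has probability equal to the product of the two frequencies. The names `predict_hidden_genotype` and `hap_freqs` are hypothetical. Haplotypes absent from the reference table are implicitly given zero frequency.

```python
from collections import defaultdict
from itertools import product


def predict_hidden_genotype(genotypes, hidden, hap_freqs):
    """Return (most likely genotype, posterior) at the masked locus `hidden`.

    genotypes: allele counts (0, 1, 2) per locus; the entry at `hidden`
        is ignored, i.e. treated as unobserved (hypothetical input format).
    hap_freqs: dict mapping haplotype tuples of 0/1 alleles to frequencies
        from an independent reference sample (assumed to be precomputed).
    """
    weights = defaultdict(float)
    haps = list(hap_freqs.items())
    # Enumerate ordered pairs, so a heterozygous pair contributes 2*f1*f2
    # and a homozygous pair contributes f1**2, as under HWE.
    for (h1, f1), (h2, f2) in product(haps, repeat=2):
        consistent = all(
            h1[i] + h2[i] == g
            for i, g in enumerate(genotypes)
            if i != hidden
        )
        if consistent:
            # Accumulate pair probability under the genotype it implies
            # at the hidden locus.
            weights[h1[hidden] + h2[hidden]] += f1 * f2
    total = sum(weights.values())
    if total == 0:
        return None, {}  # no reference haplotype pair explains the data
    posterior = {g: w / total for g, w in weights.items()}
    best = max(posterior, key=posterior.get)
    return best, posterior


# Toy reference panel over three loci; locus 1 is the "hidden" SNP.
hap_freqs = {(0, 0, 0): 0.5, (1, 1, 1): 0.3, (1, 0, 1): 0.2}
best, posterior = predict_hidden_genotype([2, None, 2], 1, hap_freqs)
print(best)  # 1
```

In this toy panel, only haplotypes carrying allele 1 at both flanking loci are consistent with the individual. The heterozygous pair (1,1,1)/(1,0,1) has the largest total weight, so genotype 1 is predicted. Exhaustive enumeration over all haplotype pairs scales quadratically in the number of reference haplotypes. It is shown here for clarity rather than efficiency.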