Genome-wide association (GWA) studies generate large data files. This chapter concentrates on how to handle these files using widely available, and generally free, software. The first stage in dealing with GWA data is to determine the format. Many genotype formats exist: data that have already been processed may be formatted for programs such as PLINK or SNPtest, whereas other data may still be in the output format of the genotype-calling algorithm, which may not be directly usable in the program of choice. The easiest way to determine the format is usually to look at the data, but the files will likely be too large to open in a text editor. In that case, a good way to inspect them is with the Unix commands head, tail, less and more. There are two main types of genotype file, both tabular. The first has one row (line) per SNP and one or more columns per individual, as exemplified by the files used by SNPtest. The second has one row per individual and one or more columns per SNP, as exemplified by PLINK pedigree (ped) files. Less commonly, files have one row per genotype; these tend to have few columns but many rows and give the largest file size. © 2011 Elsevier Inc. All rights reserved.
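As a rough illustration of inspecting a large genotype file without opening it in an editor, the sketch below mimics the Unix head and tail commands in Python and counts the whitespace-separated fields on each line. It is not taken from the chapter: the function names (`head`, `tail`, `field_counts`) and the assumption that the file is whitespace-delimited text are illustrative choices only.

```python
from collections import deque
from itertools import islice


def head(path, n=5):
    """Return the first n lines of a text file without reading the rest."""
    with open(path, "r") as handle:
        return [line.rstrip("\n") for line in islice(handle, n)]


def tail(path, n=5):
    """Return the last n lines.

    The whole file is streamed once, but only n lines are held in memory.
    """
    with open(path, "r") as handle:
        return [line.rstrip("\n") for line in deque(handle, maxlen=n)]


def field_counts(lines):
    """Count whitespace-separated fields per line (assumes such delimiters)."""
    return [len(line.split()) for line in lines]
```

Comparing the field counts of the first few lines with the known number of individuals or SNPs in the study may hint at whether the file has one row per SNP or one row per individual, although the file itself remains the final authority on its layout.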

Original publication

Book title

Analysis of Complex Disease Association Studies

Pages

87 - 94