An important step toward improving the annotation of the human genome is to identify cis-acting regulatory elements from primary DNA sequence. One approach is to compare sequences from multiple, divergent species. This approach distinguishes multispecies conserved sequences (MCSs) in noncoding regions from more rapidly evolving neutral DNA. Here, we have analyzed a region of approximately 238 kb containing the human alpha globin cluster that was sequenced and/or annotated across the syntenic region in 22 species spanning 500 million years of evolution. Using a variety of bioinformatic approaches and correlating the results with many aspects of chromosome structure and function in this region, we were able to identify and evaluate the importance of 24 individual MCSs. This approach sensitively and accurately identified previously characterized regulatory elements and also discovered previously unidentified promoters, exons, splicing, and transcriptional regulatory elements. Together, these studies demonstrate an integrated approach by which to identify, subclassify, and predict the potential importance of MCSs.

Original publication

Journal article
Proc Natl Acad Sci U S A
Pages: 9830-9835

Keywords: Animals; Base Sequence; Computational Biology; Conserved Sequence; Gene Components; Genome, Human; Genomics; Globins; Humans; Molecular Sequence Data; Regulatory Sequences, Nucleic Acid; Sequence Alignment; Sequence Analysis, DNA; Species Specificity