“Sequencing an individual’s DNA is useless in medicine unless there is a frame of reference to compare it to,” said Yale University’s Mark Gerstein, the Albert L. Williams Professor of Biomedical Informatics and one of more than 1,000 scientists who participated in the international effort.
An individual human genome contains, on average, 3 million variations. Without a reference library of variations, trying to home in on the most informative variant is akin to looking for a needle in a haystack, he said. This compendium allows researchers to distinguish between frequently occurring, usually harmless variants and rare, potentially disease-causing genetic changes.
The work shows that humans differ from one another across a whole range of genetic changes, from single-letter substitutions, called single nucleotide polymorphisms or SNPs, to large structural variations – big stretches of DNA that can be millions of nucleotides in length. On average, there is a single-letter change, or SNP, in about one out of every 800 letters of DNA in humans. A small percentage of variants appear to be specific to populations and may account for some of the physical differences among humans in different parts of the world.
The catalog notes that each person carries hundreds of rare variations outside of their genes, in regions that do not code for proteins but appear to be evolutionarily conserved. This knowledge lays a foundation for understanding how personal variants predispose people to particular diseases without affecting genes.
Gerstein’s lab is a leader in assessing which of these variants are functionally important, and it played a key role in the ENCODE project, published in September, which surprisingly identified large regions of the human genome that play a role in regulating biological processes. With the publication of the 1000 Genomes project, scientists are now armed with a reference library of genetic variants that will allow them to discover rare variants associated with disease.