An international research consortium has announced the 1000 Genomes Project, an effort that will involve sequencing the genomes of at least a thousand people to create the most detailed and medically useful catalogue to date of human genetic variation.
Drawing on the expertise of multi-disciplinary research teams, the map developed by the 1000 Genomes Project will provide a view of biomedically relevant DNA variations at a resolution unmatched by current resources. Data from the Project will be made swiftly available to the worldwide scientific community through freely available public databases.
“The 1000 Genomes Project would have been unthinkable only two years ago,” said Richard Durbin, PhD, of the Wellcome Trust Sanger Institute, who is co-chair of the consortium.
“Today, thanks to amazing strides in sequencing technology, bioinformatics and population genomics, it is within our grasp. So, we are moving forward to examine the human genome at a level of detail that no one has done before, expanding and accelerating efforts to find more of the genetic factors involved in human health and disease,” Dr. Durbin added.
Any two humans are more than 99 percent the same at the genetic level: the small fraction of genetic material that varies among people can help to explain individual differences in susceptibility to disease, response to drugs or reaction to environmental factors.
Using recently developed catalogues of human genetic variation, such as the HapMap and Wellcome Trust Case Control Consortium (WTCCC), researchers already have discovered more than 100 regions of the genome that contain genetic variants associated with susceptibility to common human diseases such as diabetes, coronary artery disease, prostate and breast cancer, rheumatoid arthritis, inflammatory bowel disease and age-related macular degeneration.
However, researchers often must follow those studies with costly and time-consuming DNA sequencing to help pinpoint the precise causative variants. The new map will enable researchers to zero in more quickly on disease-related genetic variants, speeding efforts to use genetic information to develop new strategies for diagnosing, treating and preventing common diseases.
“This new project will increase the sensitivity of disease discovery efforts across the genome fivefold and within gene regions at least tenfold,” said NHGRI Director Francis S. Collins, MD, PhD.
“Our existing databases do a reasonably good job of cataloging variations found in at least 10 percent of a population. By harnessing the power of new sequencing technologies and novel computational methods, we hope to give biomedical researchers a genome-wide map of variation down to the 1 percent level. This will change the way we carry out studies of genetic disease.”
Current methods can detect rare variants that have a significant consequence, such as those that cause cystic fibrosis and are studied in affected families, or relatively common variants, such as those described in 2007 by the WTCCC, many of which have weak effects on common disease.
“Between these two types of genetic variants – very rare and fairly common – we have a significant gap in our knowledge,” said David Altshuler, MD, PhD, of the Broad Institute of Massachusetts Institute of Technology (MIT) and Harvard University in Cambridge, MA, who is the consortium’s co-chair and was a leader of the HapMap Consortium. “The 1000 Genomes Project is designed to fill that gap, which we anticipate will contain many important variants that are relevant to human health and disease.”
Importantly, the 1000 Genomes Project will map not only the single-letter differences in DNA, called single nucleotide polymorphisms (SNPs), but also structural variants – rearrangements, deletions or duplications of segments of the human genome.
The importance of these variants has become increasingly clear in the past 18 months from the Wellcome Trust Sanger Institute’s Copy Number Variation Project and similar research, which show that structural variants may play a role in susceptibility to certain conditions, such as mental retardation and autism.
The project depends on large-scale implementation of several new sequencing platforms. Using standard DNA sequencing technologies, the effort would cost more than $500 million.
However, leaders of the 1000 Genomes Project expect the costs to be far lower – in the range of $30 million to $50 million. The Project consists of pilot and production phases.
In year one, three pilot projects will determine how to produce the project’s detailed map of human genetic variation most efficiently and cost-effectively.
During its two-year production phase, the 1000 Genomes Project will produce data at an average rate equivalent to more than two human genomes every 24 hours. That volume of data poses a major challenge for leading experts in the fields of bioinformatics and statistical genetics.
“The scale of this project is immense. At 6 trillion DNA bases, the 1000 Genomes Project will generate 60-fold more sequence data over its three-year course than have been deposited into public DNA databases over the past 25 years,” said Gil McVean, PhD, of the University of Oxford in England, one of the co-chairs of the consortium’s analysis group.
“In fact, when up and running at full speed, this project will generate more sequence in two days than was added to public databases for all of the past year.”
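The figures quoted above can be cross-checked with simple arithmetic. The sketch below is illustrative only and rests on two assumptions that are not stated in the announcement: that a human genome is roughly 3 billion bases long, and that essentially all 6 trillion bases are generated during the two-year production phase. The function names are hypothetical.

```python
# Back-of-the-envelope check of the throughput figures quoted in the text.
TOTAL_BASES = 6e12           # "6 trillion DNA bases" (from the text)
FOLD_INCREASE = 60           # "60-fold more sequence data" (from the text)
PRODUCTION_DAYS = 2 * 365    # two-year production phase (from the text)
GENOME_SIZE_BASES = 3e9      # ASSUMPTION: ~3 billion bases per human genome


def bases_per_day(total_bases: float, days: int) -> float:
    """Average daily output if the total is spread evenly over `days`."""
    return total_bases / days


def genomes_per_day(total_bases: float, days: int, genome_size: float) -> float:
    """Average daily output expressed in human-genome equivalents."""
    return bases_per_day(total_bases, days) / genome_size


def implied_prior_deposits(total_bases: float, fold: float) -> float:
    """Bases already in public databases, as implied by the fold increase."""
    return total_bases / fold


if __name__ == "__main__":
    daily = bases_per_day(TOTAL_BASES, PRODUCTION_DAYS)
    genomes = genomes_per_day(TOTAL_BASES, PRODUCTION_DAYS, GENOME_SIZE_BASES)
    prior = implied_prior_deposits(TOTAL_BASES, FOLD_INCREASE)
    print(f"Average output: {daily / 1e9:.1f} billion bases per day")
    print(f"Genome equivalents per day: {genomes:.1f}")
    print(f"Implied prior public deposits: {prior / 1e9:.0f} billion bases")
```

Under these assumptions the average works out to roughly 8.2 billion bases, or about 2.7 genome equivalents, per day, which is consistent with "more than two human genomes every 24 hours". The 60-fold comparison implies that public databases held on the order of 100 billion bases at the time.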
The data will be held by and distributed from the European Bioinformatics Institute (EBI) near Cambridge, UK and the National Center for Biotechnology Information (NCBI) in the USA.
The 1000 Genomes Project will use samples from volunteer donors who gave informed consent for their DNA to be analyzed and placed in public databases. NHGRI established extensive and careful ethical procedures for previous projects, such as the HapMap.
The first thousand samples for the 1000 Genomes Project will come from those used for the HapMap and from additional samples in the extended HapMap set, which used the same collection processes. Populations from Africa, Asia, America and Europe are included.
“The scale of the 1000 Genomes Project is ambitious, but it is essential for building on the important work carried out since the Human Genome Project,” said Dr Alan Schafer, Head of Molecular and Physiological Sciences at the Wellcome Trust.
“It is clear that as humans, we differ from each other genetically by only a small fraction, yet this is enough to cause variation in human health and disease. By studying this many people, we aim to generate a comprehensive catalogue of variation that will facilitate identification of the disease-related variation.”
The detailed map of human genetic variation will be used by many researchers seeking to relate genetic variation to particular diseases.
In turn, such research will lay the groundwork for the personal genomics era of medicine, in which people routinely will have their genomes sequenced to predict their individual risks of disease and response to drugs.