Oxford Team Creates a “First Draft” of Humanity’s Family Tree
Researchers from the University of Oxford’s Big Data Institute (BDI) have created a “first draft” of the family tree of humanity by combining modern and ancient genome sequences from eight large data banks. The data is published in Science.
The “genomic era” saw advances in next-generation sequencing (NGS) technology that bolstered the field of genetics. Today, DNA sequencing is faster, cheaper and more accurate than ever before, enabling its application across a wide range of scientific disciplines. DNA sequencing is commonly used in modern medicine, from diagnostics to drug development, as well as in agriculture and, in the case of ancient DNA analysis, to answer questions about Earth’s history.
A growing number of genomic data repositories exist that gather and store sequencing data from populations across the globe. Data sets are often utilized in research studies that may, for example, explore the genetic constitution of a disease by studying individuals diagnosed with that specific condition within the data bank. However, these repositories of data are not exclusively for human DNA samples; there are genomic data banks for a myriad of different organisms found on Earth.
As the amount of publicly available genomic data continues to increase, the opportunities to study the genetic ancestry of the human population also increase. Consequently, researchers are able to piece together a picture of how our genetic diversity arose, and how different people in the world are related to one another.
“The theoretical background of genome-wide genealogies, which describe how we have inherited our genes from our ancestors, has been developed for the past three decades,” Dr. Anthony Wilder Wohns, postdoctoral researcher at the Broad Institute of MIT and Harvard and former PhD student at the University of Oxford’s Big Data Institute (BDI), told Technology Networks. “However, actually estimating this structure is a tremendously difficult statistical problem.”
The difficulty of combining very large data sets from many different databases has been a major barrier to this research effort – until now. Wilder Wohns is the lead author of a new study, conducted during his time at the BDI, which reports on a novel method to easily combine millions of genome sequences from both modern and ancient populations.
Alongside his colleagues, Wilder Wohns utilized this method to create a “first draft” of humanity’s family tree. “We devised a novel algorithm that deduces genetic relationships without needing to compare every DNA sequence with every other and coupled it with another algorithm that places dates on common ancestors by treating the whole ancestry as a single network,” Wilder Wohns said. “Furthermore, by estimating the entire genealogy of humanity, we were able to create algorithms that allowed us to use the entire genome to estimate when and, for the first time, where our ancestors lived.”
A network of 27 million ancestors
The data was combined from a total of eight datasets, seven of which were already publicly available and one novel set – known as Afanasievo – which contains ancient DNA.
The tree contains whole genomes obtained from 3,609 individuals across 215 different populations. Because each individual carries two copies of the genome, one inherited from each parent, the genealogy maps over 7,000 genome sequences. The ancestry of each genomic region can be likened to a tree. Looking at the “trees” collectively across the large data sets, the research team were able to link specific regions back through time by utilizing the ancient DNA data, creating an “ancestral recombination graph” that demonstrates where the genetic variation first appeared. This graph contained ~27 million ancestors.
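The idea that each genomic region has its own tree, and that these trees share ancestors, can be illustrated with a toy example. The Python sketch below is not the team’s software: the `LocalTree` class, the node numbers and the region boundaries are all made up for illustration, and the genome count simply assumes two genome copies per individual.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalTree:
    """Ancestry of one genomic region (hypothetical toy structure)."""
    left: int     # start of the region, inclusive
    right: int    # end of the region, exclusive
    parent: dict  # maps a node ID to its parent's ID; the root has no entry


def haploid_genomes(individuals: int) -> int:
    """Number of genome copies, assuming two copies per individual."""
    return 2 * individuals


def tree_at(trees, position):
    """Return the local tree that covers a genomic position."""
    for tree in trees:
        if tree.left <= position < tree.right:
            return tree
    raise ValueError(f"no tree covers position {position}")


def lineage(tree, node):
    """Return the node followed by its ancestors, ending at the root."""
    path = [node]
    while path[-1] in tree.parent:
        path.append(tree.parent[path[-1]])
    return path


def common_ancestor(tree, a, b):
    """Return the most recent ancestor shared by nodes a and b in one tree."""
    seen = set(lineage(tree, a))
    for node in lineage(tree, b):
        if node in seen:
            return node
    return None


# Toy data: nodes 0-2 are sampled genomes, nodes 3-5 are inferred ancestors.
# Ancestor 4 appears in both regions, which links the trees into one graph.
TREES = [
    LocalTree(0, 500, {0: 3, 1: 3, 3: 4, 2: 4}),
    LocalTree(500, 1000, {1: 5, 2: 5, 5: 4, 0: 4}),
]

if __name__ == "__main__":
    print(haploid_genomes(3609))
    print(common_ancestor(tree_at(TREES, 100), 0, 1))
    print(common_ancestor(tree_at(TREES, 700), 0, 1))
```

In this toy data, genomes 0 and 1 share a recent ancestor in the first region but only meet at the older ancestor 4 in the second, which is the kind of region-by-region change in relationships that a single tree cannot capture.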
Using the algorithms to determine where ancestors lived, the researchers were able to recapture events – from a genetic perspective – that are known to have occurred in human history, such as migration out of Africa.
Understanding DNA, genes and alleles
A gene is a section of DNA that encodes a specific trait. The composition of a gene can differ between copies of the same gene, meaning a gene can exist in different forms – or “variations” – across individuals. These variations of a gene are known as alleles, and the gene’s position within the genome is known as its locus.
Alleles are easier to understand when we think about eye color. In a simplified picture, a gene for eye color has different alleles – brown, blue or green. Our eye color is determined by which alleles we inherit from our mother and our father. We may inherit two copies of the same allele (where our genotype would be classed as homozygous for that locus) or one copy each of two different alleles (known as a heterozygous genotype).
Alleles of the same gene are either autosomal dominant or recessive. This means that, if we inherit two different alleles, whichever allele is autosomal dominant will be preferentially expressed over the recessive allele.
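The terms above can be made concrete with a short Python sketch. It follows the simplified single-gene picture used in this explainer; the function names, the allele labels and the choice of brown as the dominant allele are assumptions made purely for illustration.

```python
def zygosity(maternal: str, paternal: str) -> str:
    """Classify a genotype at one locus from the two inherited alleles."""
    return "homozygous" if maternal == paternal else "heterozygous"


def expressed_allele(maternal: str, paternal: str, dominant: str) -> str:
    """Return the allele expressed under a simple dominant/recessive model.

    `dominant` names the allele assumed to mask the other one.
    """
    if maternal == paternal:
        # Two copies of the same allele: that allele is expressed.
        return maternal
    if dominant in (maternal, paternal):
        # Two different alleles: the dominant one is expressed.
        return dominant
    raise ValueError("neither inherited allele is the stated dominant allele")


if __name__ == "__main__":
    print(zygosity("brown", "blue"))
    print(expressed_allele("brown", "blue", dominant="brown"))
    print(expressed_allele("blue", "blue", dominant="brown"))
```

The recessive allele is only expressed in the last call, where both inherited copies are the same.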
Wilder Wohns expands on why this tree is a “first draft”: “We have created the largest human family tree ever, which describes the origin and spread of human genetic variation. While the tree is comprehensive, it cannot be truly ‘complete’ unless we had the genome of everyone alive today and all of their ancestors, as well as knowledge of where and when they lived. We thus think of what we have created as a ‘first draft’ of the family tree of all of humanity.”
He adds that no inference method is perfect, and the genomic data sets have been constructed from a wide variety of different sources that initially utilized varying methods, which may result in some degree of error: “Nonetheless, we have shown that genealogy does capture many key, well-understood events in human history, giving us confidence in its accuracy.”
Laying the groundwork for the next generation of DNA sequencing
While creating the first draft of humanity’s family tree is a major milestone for genetics research, the team aren’t stopping there. This study “lays the groundwork”, Wilder Wohns explains, for the next generation of DNA sequencing. As the number of high-quality genome samples grows, the tree will continue to expand. The data could also be used to study the origins of specific genetic variants that are associated with human disease, the research team emphasize.
Considering the amount of genetic information that continues to be gathered from individuals – either in a medical context or beyond – there are issues to deliberate and overcome with regard to data privacy and ethics. Wilder Wohns acknowledges that this will be a factor in future studies. “However, the genealogy we created contains only publicly available genomes so we are confident that [there are] no data privacy concerns in this particular work,” he concludes.
Reference: Wohns AW, Wong Y, Jeffery B, et al. A unified genealogy of modern and ancient genomes. Science. 2022;375(6583):eabi8264. doi: 10.1126/science.abi8264.
Dr. Anthony Wilder Wohns was speaking to Molly Campbell, Senior Science Writer for Technology Networks.