At the turn of the millennium, the cost to sequence a single human genome exceeded $50 million, and the process took a decade to complete. Microbes have genomes, too, and the first reference genome for a malaria parasite was completed in 2002 at a cost of roughly $15 million. But today researchers can sequence a genome in a single afternoon for just a few thousand dollars. Related technologies make it possible to capture information about all genes in the genome, in all tissues, from multiple individuals.
Such advances have ushered in the era of “Big Data,” where biologists collect immense datasets, seeking patterns that may explain important diseases or identify drug and vaccine targets. But to be useful this deluge of data must be organized, maintained and made accessible to researchers.
Since 2000, a team led by University of Pennsylvania and University of Georgia scientists has been responsible for developing genome database resources for microbial pathogens, including the parasites responsible for malaria, sleeping sickness, toxoplasmosis and many other important diseases. To ensure this important work will continue, the National Institute of Allergy and Infectious Diseases has awarded the institutions a new contract for 2014-15 worth $4.3 million. Assuming annual renewal, this five-year award is expected to total $23.4 million.
The contract supports the Eukaryotic Pathogen Genomics Database, or EuPathDB. By providing the global scientific community with free access to a wealth of genomic data related to pathogens important to human health and biosecurity, EuPathDB expedites biomedical research in the lab, field and clinic, enabling the development of innovative diagnostics, therapies and vaccines.
The latest contract marks the third time that the National Institutes of Health has awarded support to EuPathDB, building on previous contracts issued in 2004 and 2009, as well as prior grant funding from the NIH and the Burroughs Wellcome Fund. Affiliated projects have also been supported by the Wellcome Trust, U.K., the Bill & Melinda Gates Foundation, the Sloan Foundation, the World Health Organization, the U.S. Department of Agriculture, the Brazilian government and other organizations.
EuPathDB is jointly directed by principal investigators David S. Roos, E. Otis Kendall Professor of Biology in Penn’s School of Arts & Sciences, and Jessica C. Kissinger, director of the Institute of Bioinformatics at the University of Georgia. Christian Stoeckert of Penn’s Perelman School of Medicine is a co-investigator.
One of four Pathogen Bioinformatics Resource Centers, or BRCs, supported by the NIH, EuPathDB encompasses disease-causing eukaryotes, which are organisms that possess a membrane-bound nucleus. Other BRCs support data on viruses, bacteria and insect vectors of disease.
Plasmodium species are responsible for malaria, causing an estimated 200 million illnesses and 600,000 deaths each year. These parasites were among the first to be integrated into EuPathDB, but the database has since expanded greatly, leveraging core infrastructure supported by the NIH contract to incorporate more than 3,000 genomes from more than 300 species. Other pathogens in the database include important threats to public water supplies, such as Cryptosporidium, Entamoeba and Giardia; Toxoplasma gondii, a parasite responsible for neurological disease in infants and immunocompromised adults; Trichomonas, a cause of vaginitis; and numerous clinically, economically and scientifically important fungal and agricultural pathogens.
Since its prototype was launched in 1999, EuPathDB has become increasingly complex and increasingly valuable as a resource for researchers around the world. In total, the database comprises about nine terabytes of data and has been cited more than 8,000 times in the scientific literature. Each month, EuPathDB receives over 6.5 million hits from 13,000 unique visitors in more than 100 countries, including areas where tropical diseases such as malaria are endemic. India is now the second-largest user of the Plasmodium genome database, and more than 5 percent of users hail from Africa. The overall project employs 28 people on four continents.
“It is truly inspiring to see how access to these on-line resources has helped to invigorate and engage scientific colleagues around the world,” Roos said. “EuPathDB occupies a large global footprint.”
While NIH funding supports core infrastructure, additional partners have helped to expand the project’s reach. For example, the Bill & Melinda Gates Foundation and the Wellcome Trust helped extend the EuPathDB project to cover parasites responsible for kala azar (Leishmania), African sleeping sickness (Trypanosoma brucei) and Chagas disease (Trypanosoma cruzi).
“Recent years have witnessed a dramatic increase in research and drug discovery for these organisms, and we are glad that EuPathDB has helped to move this work forward,” Roos said.
Using EuPathDB and other resources, researchers around the world can now conduct cutting-edge research “in silico,” on the computer, maximizing the chance of success when that work is translated to the lab or clinic.
“This database has expedited research in many ways,” UGA’s Kissinger said. “Vaccine scientists frequently want to examine how proteins have changed over time, to identify those with signatures indicating that they provoke the human immune system. Those studying a specific antigen may wish to examine its structure and diversity, in order to prioritize those regions that might be most promising and relatively unlikely to develop resistance.”
The implications for the practice of medicine are broad, especially as medical professionals move toward collecting genomic data from individual patients and using electronic medical records to capture its complexity.
“The sophistication of the questions people can ask continues to increase,” Roos said. “As we move to the next phase of this project, our job is to ensure that this resource remains dynamic, taking into account how people interact with the data in ways that can have a real impact on global health.”