A Treasure Trove of Genetic Variation
Blog Jan 17, 2017
A team of researchers from the UK and USA recently announced the completion of a huge project to support the creation of improved wheat strains. Together they have generated a resource enabling open access to ten million mutations in bread and pasta wheat varieties.
This study has already supported several success stories: using the mutation data, scientists and breeders have generated improved wheat varieties with larger grains and greater nutritional value. However, the study wasn’t without its challenges, requiring the development of remarkable new technology and methods to finally crack open the wheat genome.
To learn more about the study itself, I spoke to Dr Ksenia Krasileva, Group Leader at the Earlham Institute and The Sainsbury Laboratory.
JR: What has prevented researchers from compiling such a comprehensive database on wheat mutations until now? Why is it such a complex plant to unravel at a genetic level?
KK: The wheat genome is 17 Gb, five times larger than the human genome. Adding to the challenge of this sheer amount of genetic material, wheat has 100,000 genes, which occupy only 1-2% of its genome. The major breakthrough was our development of wheat exome capture – the ability to decode only the gene space of each plant. This is cost effective and therefore allowed us to sequence the 2,700 lines that comprise the resource.
JR: Why is it important that this new data is made available in a public database? Where can it be found and how can it be utilised?
KK: Public availability of data and seeds ensures that everyone has access to them: researchers in every country, as well as breeders and industry throughout the world, including in developing countries. This widens the use of the resource and will accelerate both wheat research and breeding. The resource is online at wheat-tilling.com and dubcovskylab.ucdavis.edu/wheat-tilling
JR: This project involved sequencing 400 billion bases of DNA from 2,735 mutant wheat lines. How did you handle and process all that data? It must have presented huge challenges.
KK: The data was processed in a high-performance computing (HPC) environment here at the Earlham Institute and at UC Davis. The HPC systems were used to map sequencing reads to the wheat genome reference for 2,700 individual wheat lines. This step took 24 to 48 hours to run per sample using eight CPUs, and required up to 60 GB of RAM. In the second step, mutations were called using the HPC cluster. This step could be parallelized and distributed across the cluster in 2,000 individual tasks, each of which took a few hours to run using four CPUs and 7 GB of RAM. Overall, it took us several years to generate and process the data. If we were to re-do all of the analyses today using the most modern computing, it would still take us several months.
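The second step described above – splitting mutation calling into roughly 2,000 independent tasks and running them across a cluster – can be sketched in miniature with Python's standard library. This is purely illustrative: the function and parameter names below are hypothetical, and the real pipeline would dispatch a variant caller on slices of mapped reads via an HPC scheduler rather than local worker processes.

```python
from concurrent.futures import ProcessPoolExecutor

def call_mutations(task_id: int) -> int:
    """Placeholder for calling mutations on one chunk of mapped reads.

    In the real pipeline each task would invoke a variant caller on its
    slice of the data; here we simply return the task id to show the
    fan-out/collect pattern.
    """
    return task_id

def run_all(n_tasks: int = 2000, workers: int = 4) -> list[int]:
    """Distribute n_tasks independent mutation-calling jobs across workers."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sorted(pool.map(call_mutations, range(n_tasks)))

if __name__ == "__main__":
    # Small demo run: 20 tasks instead of 2,000.
    results = run_all(n_tasks=20, workers=4)
    print(len(results))
```

Because each task touches a disjoint portion of the genome, no coordination is needed between workers – which is exactly what makes this step embarrassingly parallel on an HPC cluster.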
JR: What expertise did each of the collaborators offer to make this project a success?
KK: This is a truly collaborative project across four institutes: UC Davis, the Earlham Institute, the John Innes Centre (JIC) and Rothamsted. The original wheat populations were developed at UC Davis and Rothamsted, and are also maintained by JIC. Earlham generated half of the sequencing data and UC Davis the other half. All institutes were involved in the data analyses. Personally, I have been involved in this project since 2011, first as a USDA postdoctoral fellow at UC Davis and, for the past two years, as a group leader here at Earlham. What greatly contributed to the project’s success was openness about data and analyses, the sharing of all tools, and the involvement of all institutes from the very beginning of the project.
Dr Ksenia Krasileva was speaking to Jack Rudd, Senior Editor for Technology Networks.