TGAC And Scientific Partners Awarded £6m To Tackle Big Data Challenges In Bioscience
News Feb 13, 2015
As part of the UK’s Biotechnology and Biological Sciences Research Council (BBSRC) big data infrastructure announcement at the AAAS 2015 Annual Meeting, TGAC and its partner Institutes have been awarded £6m for three joint projects:
1) Big Data Infrastructure for Crop Genomics
2) iPlant UK – creating a cyber-infrastructure for plant sciences
3) Establishing the infrastructure for functional annotation of farmed animal genomes
BBSRC has invested £7.5M in new infrastructure to tackle bioscience big data challenges. The new funding will improve the storage and curation of enormous datasets that will unlock untold discoveries in important areas like health, agriculture and sustainable fuels.
Biological discovery is increasingly being driven by ground-breaking technologies, such as high-throughput genomic analysis and next generation biological imaging, which generate massive and complex datasets. In order to investigate complex biological phenomena, researchers need access to comprehensive, integrated data resources that are accessible for the whole community.
Access to primary research data is vital for the advancement of science: it allows existing observations to be validated and provides the raw material for new discoveries. Sharing data in a standardised way can enable exciting breakthroughs as researchers interrogate big data sets to spot undiscovered patterns of biological importance.
However, many biologists, and in some areas the community as a whole, struggle to take full advantage of the data generated because of a lack of computing resource, appropriate support and technical skill. To meet these challenges, BBSRC is strengthening investment in bioinformatics and biological resources, focusing on the needs of the research community, and facilitating the development of sustainable models of operation.
Big data infrastructure for crop genomics
Stimulating new opportunities in crop development to help improve some of the world’s most important crops
The project, led by Crop Genomics and Diversity Group Leader at TGAC Dr Sarah Ayling with EMBL-EBI, has been awarded £2m to develop an open-source platform to enable genomics research in crop science.
Recent advances in sequencing technologies and computational tools have made it possible to sequence the genetic information of some of the world’s most important crop species, such as rice, barley, rapeseed, maize, soya and wheat. These crops constitute a substantial part of the daily food intake for most of the population of the world and any improvements in the breeding for more efficient and nutritious varieties will have a direct impact on ensuring global food security.
While the genome sequences for these crops are a hugely useful resource for understanding the differences between species, it is understanding the genetic differences between individuals of the same or closely related species that allows us to identify useful genetic variants that can be selected for during plant breeding. These approaches require a combination of genetic data and data about the plant characteristics. This funding will help to develop a crop bioinformatics platform that enables users to access this genetic and characteristic variation data and to perform analyses.
The platform will be developed using open source principles and publicly available data. This novel platform for crop bioinformatics will promote opportunities for collaborative work with R&D groups in industry, research and academia. The availability of data generated by publicly funded resources, and the concomitant development of new, production-quality tools will lower the barriers to information-enabled crop science, stimulating new opportunities for research and application. The platform will also open up new opportunities for the UK bioinformatics community, traditionally focused on biomedical applications, by developing alternative career paths around biotechnology and agri-food.
Dr Ayling said: “Producing enough food to feed the world's growing population under changing climatic conditions is an enormous challenge. The development of this crop bioinformatics platform will support the use of genomics technologies to explore genetic diversity for crop species and help to speed up the breeding process, producing more sustainable crops sooner.”
Establishing the infrastructure for functional annotation of farmed animal genomes
Aiding our future food supplies by providing an important framework for the discovery of genetic variation in domesticated animals and how that influences their characteristics
The project, co-led by Director of Science/Head of Vertebrates and Health Genomics at TGAC Dr Federica Di Palma, with The Roslin Institute at the University of Edinburgh and EMBL-EBI, has been awarded £1.9m to develop an infrastructure to deliver reference genomes to enable research into economically important animals.
Research on domestic animals has important benefits, including improvements in agriculture, animal health and welfare, and medical research. High-quality annotated genome sequences (the sequences of DNA that make up the genetic material of an organism) provide an important framework for the discovery of genetic variation and how that influences the characteristics of an animal, such as genes responsible for disease resistance in chickens or greater milk production in cows.
Today, technology to sequence DNA is both rapid and relatively cheap, resulting in vast quantities of available data to make new discoveries. Genome sequences are available for many domesticated animals, including poultry (chicken, turkey and duck), livestock (cattle, pig, goat and sheep), fish (cod, tilapia and salmon) and companion animals (dog and horse).
Our knowledge of the functional elements and in particular of the regulatory sequences within these animal genomes is limited. Identifying the functional elements within the genome and the consequence of variation in these functional sequences is essential.
This funding will establish hardware and compute capacity at TGAC, The Roslin Institute, and EMBL-EBI, together with software, to enable the functional annotation of animal genomes.
The BBSRC grant will provide key infrastructure for the three partner Institutes in the recently launched international collaborative Functional Annotation of Animal Genomes (FAANG) initiative. The FAANG initiative addresses the need for high-quality annotated genomes, which are key sources of information and critical for contemporary research in the biological sciences. They are valuable not only to academic researchers, but also to scientists working in the animal breeding, animal health and pharmaceutical industries. This project is concerned with the infrastructure for delivering high-quality annotated reference genomes to enable research on economically important animals.
Dr Di Palma said: “High-quality, annotated genomes are essential for the research communities to develop the sophisticated molecular biology tools necessary to facilitate research studies in these economically important models. Farm animal genomic resources will not only facilitate research in basic animal biology but will also aid developments in the animal health industries including animal breeding, food, and sustainable agriculture.”
Collaborative bioinformatics UK infrastructure for data-intensive plant science
iPlant UK node to help spread expertise and best practice between the UK and US
The project, co-led by Data Infrastructure and Algorithms Group Leader Dr Robert Davey and Head of Scientific Computing at TGAC Dr Tim Stitt, with University of Warwick, University of Liverpool, University of Nottingham, University of Arizona and the Texas Advanced Computing Centre, has been awarded £1.78m to establish a UK iPlant node that will connect the UK with the US’s cyberinfrastructure for the plant sciences. TGAC’s National Capability in Genomics will support the computational infrastructure.
Plant science research generates huge volumes of data containing untold discoveries, which could help tackle global challenges in medicine, biofuels, biodiversity and agriculture. A current bottleneck to these discoveries is a lack of capacity to share enormous data files and analyse them in an efficient, user-friendly way.
The iPlant Collaborative is a virtual organisation funded by the US National Science Foundation (NSF) to create cyberinfrastructure for the plant sciences. Harnessing the power of some of the world’s fastest supercomputers, iPlant provides huge cloud-based storage space and a virtual lab bench, which put global plant science data and online tools in one place. Users can share datasets and tools to analyse data with as many or as few people as they wish. Tools to analyse data developed by iPlant staff, or built by others, can be shared with the wider community in a similar manner to smartphone ‘apps’.
The iPlant Collaborative is currently distributed across three US locations and in less than 10 years has amassed over 18,500 users. The BBSRC funding will extend this into an international collaboration by building a UK iPlant node at TGAC in Norwich, which provides National Capability for computational infrastructure. Software tools developed for specific plant science sequencing, systems biology and image analysis projects at the Universities of Warwick, Liverpool and Nottingham will be adapted by a dedicated team of programmers so that they can be integrated into iPlant UK. These will then be made freely and openly available for the wider plant science community to use.
A UK iPlant node will help to spread expertise and best practice between the UK and US, allow the UK to contribute to the future direction of this valuable resource, and provide an exemplar project to others wishing to establish international iPlant nodes.
By establishing iPlant UK and promoting access to a resource that allows users to readily store, analyse and collaborate on their data, this project will help support a wide range of research, including genome-wide association projects exploiting natural variation in crops, predicting biological networks and pathways, and the high-throughput imaging and image analysis services that take researchers one step closer to fully understanding which genes are linked to specific traits in plants.
Dr Davey said: “The deployment of the iPlant platform at TGAC, in conjunction with the expansion of National Capability hardware and collaboration with the iPlant US team, will provide a long-term data management, analysis and sharing hub for the UK plant science community. Infrastructure and training that empower researchers through robust, efficient and intuitive tools are vital for the UK to continue its advances into understanding and addressing key scientific challenges.”