New Software Helps Researchers Solve Genomic 'Jigsaw Puzzle'
News Feb 10, 2015
The high-performance computing workflow system, ‘RAMPART’, enables researchers to design and execute their own assembly workflows using a set of third-party open-source bioinformatics tools to provide improved results for their particular genome assembly projects.
De novo genome assembly is the process of reconstructing full-length chromosomes from the shorter genomic fragments produced by sequencing devices. The process has been described as like putting together a multi-million piece jigsaw puzzle. The assembly algorithms use overlapping information in the sequenced reads (jigsaw pieces) to reconstruct longer genomic sequences (larger chunks of the final picture). This is a common task in the study of many non-model organisms, as having longer sequences allows scientists to do many other downstream analyses, such as identifying genes or making comparisons to other individuals or organisms.
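As a rough illustration of the overlap idea described above, the following toy sketch repeatedly merges the two reads that share the longest suffix-to-prefix overlap. It is a simplified greedy approach written for this example only. The function names, the `min_len` threshold and the sample reads are all hypothetical, and it is not how RAMPART or the assemblers it wraps actually work; real tools must also cope with sequencing errors, repeats and reverse complements.

```python
def overlap(a, b, min_len=3):
    """Return the length of the longest suffix of `a` that is also a
    prefix of `b`, or 0 if that overlap is shorter than `min_len`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Toy greedy assembler: merge the best-overlapping pair of reads
    until no pair overlaps by at least `min_len` bases."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # no remaining overlaps: what is left are separate contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Hypothetical reads (the "jigsaw pieces"), invented for illustration.
example_reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
contigs = greedy_assemble(example_reads)
print(contigs)
```

With these three made-up reads, each neighbouring pair shares four bases, so the sketch joins them into a single longer sequence.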
The de novo genome assembly process is a complex task and typically involves testing multiple tools, parameters and approaches to produce the best possible assembly of the available data. This is because it is not always known beforehand which tools and settings will work best on the available sequence data given the organism’s specific genomic properties, such as genome size, ploidy (number of sets of chromosomes in the nucleus of a cell) and the composition of repetitive genomic content. Despite advances in computing hardware, algorithms and sequencing technologies, de novo assembly, particularly for more complex eukaryotic genomes, remains a challenging task.
Recently, several tools, such as ‘iMetAMOS’ and ‘A5’, have approached this problem by exhaustively testing many tools in parallel and then identifying and selecting the best assembly. However, these pipelines focus on bacterial genomes, where the computational demands are more manageable and the genomes are smaller and generally easier to assemble. Larger, more complex genomes require more computational power, which makes exhaustive testing of all tools and parameters impractical with current computing hardware. For these projects, the user must use the literature and their own experience to decide which possibilities are worth considering.
The new workflow system, RAMPART, whose development was led by Daniel Mapleson at TGAC, allows the user to design and execute their own assembly workflows using a set of third-party open-source bioinformatics tools. This reduces human error and relieves the burden of organising data files and executing tools manually, frequently helping to produce better assemblies more efficiently.
RAMPART gives the user the freedom to compare tools and parameters to identify the effect these have on the given data sets. The flexibility to build a custom workflow enables the user to tackle even complex assembly projects, tailoring the amount of work to be done to the availability of computing resources, the quantity of sequence data and the complexity of the genome. In addition, RAMPART produces logs, metrics and reports throughout the workflow, which allow users to identify, and subsequently rectify, problems that may occur.
Daniel Mapleson, Analysis Pipelines Project Leader in the Regulatory & Environmental Genomics group at TGAC, said: “RAMPART helps us to speed up de novo genome assembly projects by helping to coordinate available sequence data and pass it through a multi-stage pipeline comprising existing tried and trusted assembly tools. It provides a mechanism to systematically analyse the quality of multiple assemblies produced using various tools and settings, without requiring a reference genome sequence, making it particularly suitable for genome assembly projects of non-model organisms. This software should help to reduce the costs of producing high-quality draft genome assemblies in the future.”
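To give a sense of how several assemblies can be compared without a reference genome, the sketch below computes N50, a widely used contiguity statistic, for a few candidate assemblies and picks the highest. It is only an illustration under stated assumptions: the article does not say which metrics RAMPART uses or how it weighs them, and the assembly labels and contig lengths here are invented. In practice N50 alone can be misleading, so it is usually read alongside other measures.

```python
def n50(contig_lengths):
    """Return the N50: the length of the contig at which the running total,
    taken from longest to shortest, first reaches half the assembly size."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0  # empty assembly


def best_by_n50(assemblies):
    """Return the label of the assembly with the highest N50.
    `assemblies` maps a label to a list of contig lengths."""
    return max(assemblies, key=lambda label: n50(assemblies[label]))


# Hypothetical contig lengths from three runs of an assembler with
# different k-mer sizes; labels and numbers are made up for illustration.
candidate_assemblies = {
    "k=31": [80, 70, 50, 40, 30, 20],
    "k=51": [150, 90, 50],
    "k=71": [40, 40, 40, 40, 40, 40, 40],
}

for label, lengths in candidate_assemblies.items():
    print(label, "N50 =", n50(lengths))
print("Highest N50:", best_by_n50(candidate_assemblies))
```

A reference-free statistic such as this can be computed directly from the assembler's output, which is why it suits non-model organisms where no reference sequence exists.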
The scientific paper, titled “RAMPART: a workflow management system for de novo genome assembly”, is published in Oxford Journals Bioinformatics.