New Software Helps Researchers Solve Genomic 'Jigsaw Puzzle'
News Feb 10, 2015
The high-performance computing workflow system, ‘RAMPART’, enables researchers to design and execute their own assembly workflows using a set of third-party open-source bioinformatics tools to provide improved results for their particular genome assembly projects.
De novo genome assembly is the process of reconstructing full-length chromosomes from the shorter genomic fragments produced by sequencing devices. The process has been described as like putting together a multi-million piece jigsaw puzzle. The assembly algorithms use overlapping information in the sequenced reads (jigsaw pieces) to reconstruct longer genomic sequences (larger chunks of the final picture). This is a common task in the study of many non-model organisms, as having longer sequences allows scientists to do many other downstream analyses, such as identifying genes or making comparisons to other individuals or organisms.
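The jigsaw analogy can be made concrete with a toy example. The sketch below is a minimal illustration, not RAMPART or any of the tools it wraps: it greedily merges the two reads with the longest suffix-to-prefix overlap until no overlaps remain. The function names, the `min_len` threshold and the example reads are all hypothetical, and production assemblers rely on far more scalable graph-based methods and must cope with sequencing errors and repeats.

```python
def overlap(a, b, min_len=3):
    """Return the length of the longest suffix of `a` that is a prefix of `b`.

    Overlaps shorter than `min_len` are ignored and reported as 0.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Toy assembler: repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return contigs  # nothing left to join
        # Join the two pieces, writing the shared overlap only once.
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]


# Hypothetical reads, each overlapping the next by four bases.
reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
print(greedy_assemble(reads))  # ['ATGGCGTACCTGA']
```

Greedy merging is easy to follow but can be misled by repeated sequence, which is one reason repetitive genomic content makes real assemblies hard.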
The de novo genome assembly process is a complex task and typically involves testing multiple tools, parameters and approaches to produce the best possible assembly of the available data. This is because it is not always known beforehand which tools and settings will work best on the available sequence data given the organism’s specific genomic properties, such as genome size, ploidy (number of sets of chromosomes in the nucleus of a cell) and the composition of repetitive genomic content. Despite advances in computing hardware, algorithms and sequencing technologies, de novo assembly, particularly for more complex eukaryotic genomes, remains a challenging task.
Recently, several tools, such as ‘iMetAMOS’ and ‘A5’, have approached this problem by exhaustively testing many tools in parallel and then identifying and selecting the best assembly. However, these pipelines focus on bacterial genomes, which are smaller, generally easier to assemble and less computationally demanding. Larger, more complex genomes require more computational power, which makes exhaustive testing of all tools and parameters impractical on current computing hardware. For these projects, the user must draw on the literature and their own experience to decide which possibilities are worth considering.
The new workflow system, RAMPART, developed under the leadership of Daniel Mapleson at TGAC, allows the user to design and execute their own assembly workflows using a set of third-party open-source bioinformatics tools. This reduces human error and relieves the burden of organising data files and executing tools manually, frequently helping to produce better assemblies more efficiently.
RAMPART gives the user the freedom to compare tools and parameters to identify the effect these have on the given data sets. The flexibility to roll-your-own workflow enables the user to tackle even complex assembly projects, tailoring the amount of work to be done based on the availability of computing resources, quantity of sequence data and complexity of the genome. In addition, RAMPART produces logs, metrics and reports throughout the workflow, which allows users to identify, and subsequently rectify, problems that may occur.
Daniel Mapleson, Analysis Pipelines Project Leader in the Regulatory & Environmental Genomics group at TGAC, said: “RAMPART helps us to speed up de novo genome assembly projects by helping to coordinate available sequence data and pass it through a multi-stage pipeline, comprising existing tried and trusted assembly tools. It provides a mechanism to systematically analyse the quality of multiple assemblies produced using various tools and settings, without requiring a reference genome sequence, making it particularly suitable for genome assembly projects of non-model organisms. This software should help to reduce the costs of producing high-quality draft genome assemblies in the future.”
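To illustrate how assemblies can be compared without a reference genome, the sketch below computes N50, a widely used contiguity statistic: the contig length at which the contigs of that length or longer together cover at least half of the assembly. This is an assumption for illustration only; the article does not say which metrics RAMPART reports, and the assembly labels and contig lengths here are invented.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given the lengths of its contigs."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0  # empty assembly


# Hypothetical contig lengths from two tool/parameter combinations.
assemblies = {
    "tool_a_k31": [100, 80, 50, 20],
    "tool_b_k51": [150, 60, 30, 10],
}

scores = {name: n50(lengths) for name, lengths in assemblies.items()}
best = max(scores, key=scores.get)
print(scores)  # {'tool_a_k31': 80, 'tool_b_k51': 150}
print(best)    # tool_b_k51
```

A higher N50 indicates a more contiguous assembly but says nothing about correctness, so in practice it would be weighed alongside other quality measures.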
The scientific paper, titled “RAMPART: a workflow management system for de novo genome assembly”, is published in Oxford Journals Bioinformatics.