August 30, 2011
By Andrea Anderson - Via Genome Web
Research teams affiliated with the international 1001 Arabidopsis Genomes project are relying on whole-genome re-sequencing of carefully selected strains of Arabidopsis thaliana — combined with RNA sequencing, partial de novo assembly, and re-annotation, in some cases — to get a handle on the model plant's genetic and geographic diversity.
"It's very important to move beyond just a catalog of variants and trying to interpret what those things do relative to the reference," senior author Richard Mott, a bioinformatics and statistics researcher at the Wellcome Trust Centre for Human Genetics at the University of Oxford, told In Sequence.
Two new studies published this week, including one by Mott and his colleagues, are also ratcheting up the Arabidopsis genome count, moving researchers a bit closer to the goal of sequencing 1001 Arabidopsis genomes (IS 10/7/08).
For their study, appearing online in Nature this week, Mott and colleagues from the UK, Germany, and US sequenced, assembled, and annotated the genomes of 18 A. thaliana accessions known to have a worldwide distribution and a range of phenotypic features. RNA-sequencing data generated for seedlings from these accessions not only provided information on gene expression in the plants, but also helped in verifying coding SNPs, annotating the genomes, and identifying loci that influence gene expression in Arabidopsis.
"Using the genomes, seedling transcriptomes, and computational gene predictions we have characterized the ancestry, polymorphism, gene content, and expression profile of the accessions," the researchers wrote. "We show that the functional consequences of polymorphisms are often difficult to interpret in the absence of gene re-annotation and full sequence data."
Mott and his co-authors generated roughly 27- to 63-fold coverage of each genome, using the Illumina Genome Analyzer platform to sequence both 200-base pair and 400-base pair libraries for most of the strains.
In general, each of the new genomes was about one to two percent smaller than the 119-million base Col-0 reference genome, which represents an accession known as Columbia that was sequenced in 2000.
The Columbia accession and the 18 accessions sequenced in the Nature study have been crossed to make more than 700 Arabidopsis lines in the Multiparent Advanced Generation Inter-Cross, or MAGIC, collection, Mott explained, and analyses of the parental strains are expected to help interpret data for MAGIC descendants in the future.
The team's analyses uncovered 1.2 million insertions and deletions, along with millions of SNPs in the new genomes. Compared to the Col-0 reference, each accession tested contained between 497,668 and 789,187 single-base variants. Of these, about 100,000 SNPs per strain turned up in coding sequences that were also interrogated by RNA-sequencing of seedling tissue.
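For a rough sense of scale, the per-accession variant counts can be set against the 119-million base reference size quoted above. The short Python sketch below is a back-of-the-envelope calculation, not part of the study: the function names are hypothetical, the coding figure is the article's approximate "about 100,000," and dividing by the reference length is a simplification, since each new genome was reported to be slightly smaller than Col-0.

```python
# Figures quoted in the article; everything else here is illustrative.
REFERENCE_BASES = 119_000_000   # Col-0 reference genome size
SNPS_MIN = 497_668              # fewest single-base variants in an accession
SNPS_MAX = 789_187              # most single-base variants in an accession
CODING_SNPS_APPROX = 100_000    # "about 100,000" coding SNPs per strain


def bases_per_variant(variant_count, genome_bases=REFERENCE_BASES):
    """Average spacing between variants, assuming they are spread evenly."""
    return genome_bases / variant_count


def coding_fraction(coding_count, total_count):
    """Share of an accession's single-base variants that fall in coding sequence."""
    return coding_count / total_count


if __name__ == "__main__":
    for label, count in (("fewest variants", SNPS_MIN), ("most variants", SNPS_MAX)):
        spacing = bases_per_variant(count)
        share = coding_fraction(CODING_SNPS_APPROX, count)
        print(f"{label}: about one per {spacing:.0f} bases; coding share about {share:.0%}")
```

Under these assumptions, the accessions differ from the reference at roughly one position in every 150 to 240 bases, with something like an eighth to a fifth of those differences falling in coding sequence.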