A team of scientists has set a new record by sequencing the largest animal genome to date – that of the Australian lungfish, one of the few living relatives of the first land vertebrates. The genome contains 43 billion base pairs, which is 14 times larger than the human genome. The research is published in the journal Nature.1
The Australian lungfish (Neoceratodus forsteri) is found in the slow-flowing rivers and still waters of Southeast Queensland, Australia. It is unique in both its appearance and its biology compared to other aquatic creatures. A single dorsal lung acts as an additional respiratory organ when the fish's activity is high. It also possesses fins that resemble limbs and has a strong sense of smell.
Nineteenth-century scientists mistakenly classified the lungfish as an amphibian, but it is now known to belong to the Sarcopterygii (lobe-finned fish) clade, which also includes coelacanths and tetrapods. Lobe-finned fish of the Devonian period gave rise to all land vertebrates. The lungfish's position in the tree of life therefore makes it an interesting creature to study for understanding how aquatic vertebrates evolved and adapted for life on land, particularly at the genomic level.
Scientists from the Research Institute of Molecular Pathology (IMP) and the Universities of Vienna, Würzburg, Konstanz and Hamburg have successfully sequenced the genome of the Australian lungfish de novo, setting a record for the largest animal genome to have ever been sequenced.
A homogenous puzzle
To conduct the study, the team obtained biopsy material from a juvenile Australian lungfish that was imported from Australia.
Sequencing large genomes is no easy feat. It requires the breakdown of the entire genomic sequence into millions of smaller pieces that are then reassembled. "Large genomes contain tremendous amounts of repetitive content. Once the sequencing machine splits the genome into tens of millions of readable (and overlapping) pieces, it is tricky to put them back together because many pieces look alike. This is further complicated by the inherent error rate these sequencing machines have," explained Siegfried Schloissnig, a computer scientist and biologist at the IMP and co-first author of the study. He likened the task to assembling a puzzle with a very homogenous pattern.
Thus, the scientists had to develop their own suite of tools – called MARVEL assembler – to stitch the sequence back together correctly and efficiently. The pieces were then assembled into larger structures, such as whole chromosomes, using a chromosome conformation capture technique, Hi-C. Hi-C belongs to a family of molecular biology techniques that can be used to analyze the organization of chromatin in a cell at the spatial level.
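The repeat problem Schloissnig describes can be illustrated with a toy example. The sketch below is not the MARVEL assembler's method; it is a minimal suffix-prefix overlap check on an invented 16-letter sequence (CATTAGGGTTTAGGAC) in which the segment TTAGG occurs twice. The reads, the function names and the minimum overlap of 4 are all assumptions chosen for illustration.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 when no overlap of at least min_len exists.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0


def successors(reads: list[str], min_len: int) -> dict[str, list[str]]:
    """Map each read to every other read that could plausibly follow it."""
    return {
        a: [b for b in reads if b != a and overlap(a, b, min_len) > 0]
        for a in reads
    }


# Hypothetical reads taken from the toy sequence CATTAGGGTTTAGGAC.
READS = ["CATTAGG", "TTAGGGT", "GGGTTTA", "TTAGGAC"]

graph = successors(READS, min_len=4)

for read, nxt in graph.items():
    label = "ambiguous" if len(nxt) > 1 else "ok"
    print(f"{read} -> {nxt} ({label})")
```

The read CATTAGG ends inside the repeated segment, so it overlaps equally well with TTAGGGT (the true continuation) and with TTAGGAC (which belongs to the second copy). In a genome where repeats dominate, this kind of tie occurs at enormous scale, and sequencing errors make the candidates even harder to tell apart.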
Lungfish are more closely related to four-legged animals than coelacanths are
The lungfish genome was found to be a staggering 43 billion base pairs in size. That is 30% larger than the Mexican axolotl's genome – previously considered to be the largest genome sequenced – and 14 times larger than the human genome. What evolutionary secrets were found hidden within this enormous genome?
"Biologists have long argued whether lungfish or coelacanths (another group of lobe-finned fish) were the closest living relatives of the first land vertebrates," said Elly Tanaka, group leader at IMP and corresponding author on the study.
The team's phylogenetic analyses confirmed that lungfish are more closely related to amphibians, reptiles, birds and mammals than coelacanths are. Lungfish branched away from the ancestral line that would lead to four-legged animals just 420 million years ago.
IMP Scientists Siegfried Schloissnig and Elly Tanaka at Haus des Meeres in Vienna. Credit: IMP.
The researchers analyzed levels of gene expression in various tissues of the lungfish, which revealed some parallels between the species and land vertebrates. "We also discovered that the number and expression levels of genes related to lung surface proteins, air-borne odor receptors and limb development were closer to those of amphibians than to other fish. These findings point to a set of adaptations that were necessary to transition to a terrestrial lifestyle," added Tanaka.
No genome too large?
When asked about potential study limitations, the authors highlighted the bioinformatics barriers they faced, as the majority of algorithms currently available for sequencing projects had not been designed to process genomes of this size.
"We would love to say to the programming world, 'please write genome analysis programs with 42 Gb genomes in mind!'" said Schloissnig.
Furthermore, the lungfish was listed as a "threatened" species under the Australian Environment Protection and Biodiversity Conservation Act in 1999 due to urban growth in the Queensland area adversely impacting aquatic ecosystems.2 Consequently, the researchers were limited in the amount of tissue and embryonic material that was at their disposal. "We would have loved to analyze gene expression in other organs such as the brain to understand how they have evolved," Schloissnig commented. Tanaka added that she hopes the genome data may contribute to the ongoing conservation efforts in Australia to preserve the lungfish.
As for next steps, the researchers are keen to understand the genetic relationship between geographically separated lungfish, including species inhabiting Africa and South America. They are now on the lookout for the next large genome that can be tackled using their custom algorithms, with their sights set on salamanders – watch this space.
Siegfried Schloissnig and Elly Tanaka were speaking to Molly Campbell, Science Writer, Technology Networks.
1. Meyer A, Schloissnig S, Franchini P, et al. Giant lungfish genome elucidates the conquest of land by vertebrates. Nature. 2021. doi:10.1038/s41586-021-03198-8.
2. Arthington AH. Australian lungfish, Neoceratodus forsteri, threatened by a new dam. Environ. Biol. Fishes. 2009;84(2):211-221. doi:10.1007/s10641-008-9414-y.