New CRISPR Systems Discovered in Coal Mine Waters

Coal mine waters.
Credit: Marcin Jozwiak / Unsplash.

Dr. Feng Zhang’s laboratory develops novel gene-editing systems that can be utilized as research tools or in drug discovery.

An important part of this work is discovery: identifying organisms, whether prokaryotes or eukaryotes, whose enzymes or other systems could be repurposed for gene editing.

Thanks to advances in next-generation sequencing (NGS), databases now hold genomic data for a wide variety of organisms. But their sheer size makes mining them time-consuming and challenging.

In a paper published in the journal Science, Zhang and colleagues at the Broad Institute of MIT and Harvard, the McGovern Institute for Brain Research at MIT and the National Center for Biotechnology Information (NCBI) at the National Institutes of Health describe a new search algorithm: Fast Locality-Sensitive Hashing-based clustering, or FLSHclust.

The algorithm uses clustering approaches to swiftly search through gargantuan amounts of genetic data. “We applied FLSHclust in a sensitive CRISPR discovery pipeline and identified 188 previously unreported CRISPR-associated systems, including many rare systems,” the researchers said.

A faster way to search for CRISPR systems

The researchers applied FLSHclust to three public databases, drawn from the NCBI (including its Whole Genome Shotgun database) and the Joint Genome Institute. These databases contain genomic information for "rare and unusual" bacteria, such as those found in dog saliva, coal mines, Antarctic lakes and breweries.

FLSHclust is based on locality-sensitive hashing, a technique used in computer science and big-data mining to efficiently identify similar objects or data points. Using it, Zhang and colleagues trawled the three databases for CRISPR-associated genes in just a few weeks, identifying a “surprising” number of CRISPR systems.

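The idea behind locality-sensitive hashing can be illustrated with a minimal sketch. The article does not describe FLSHclust's actual implementation, so the code below is a generic, assumed approach (MinHash with banding), not the researchers' method: each protein sequence is reduced to its set of overlapping k-mers, summarized as a MinHash signature and split into bands. Sequences that share an identical band land in the same bucket and become candidate cluster members, so no all-against-all comparison is needed. The function names, the toy sequences, the k-mer size and the band/row counts are all hypothetical choices for illustration.

```python
import hashlib
from collections import defaultdict

# Illustrative MinHash + banding LSH; NOT FLSHclust's actual implementation.

def kmers(seq, k=3):
    """Return the set of overlapping k-mers (substrings of length k) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)} or {seq}

def minhash_signature(items, num_hashes):
    """For each seeded hash function, keep the minimum hash value over the set."""
    sig = []
    for seed in range(num_hashes):
        sig.append(min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{x}".encode(), digest_size=8).digest(), "big"
            )
            for x in items
        ))
    return sig

def lsh_buckets(seqs, k=3, bands=8, rows=4):
    """Bucket sequences by each band (a slice of `rows` signature values)."""
    buckets = defaultdict(set)
    for name, seq in seqs.items():
        sig = minhash_signature(kmers(seq, k), bands * rows)
        for b in range(bands):
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].add(name)
    return buckets

def candidate_clusters(seqs, **kw):
    """Merge sequences that share any bucket, using union-find."""
    parent = {name: name for name in seqs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for members in lsh_buckets(seqs, **kw).values():
        members = sorted(members)
        for other in members[1:]:
            parent[find(other)] = find(members[0])

    clusters = defaultdict(set)
    for name in seqs:
        clusters[find(name)].add(name)
    return sorted(sorted(c) for c in clusters.values())

# Hypothetical toy sequences: "b" differs from "a" by one residue; "c" is unrelated.
toy = {
    "a": "MKTAYIAKQRQISFVKSHFSRQ",
    "b": "MKTAYIAKQRQISFVKSHFSRE",
    "c": "GSHMLEDPVDAFQLATAEIQNW",
}
print(candidate_clusters(toy))
```

With 8 bands of 4 rows, two sequences whose k-mer sets have Jaccard similarity s share at least one bucket with probability of roughly 1 - (1 - s^4)^8, so near-identical sequences almost always meet while unrelated ones almost never do. In a real pipeline, candidates grouped this way would typically be checked with a more careful sequence comparison.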
“This new algorithm allows us to parse through data in a time frame that’s short enough that we can actually recover results and make biological hypotheses,” said Dr. Soumya Kannan, a former graduate student in the Zhang lab who is now a postdoctoral researcher at Harvard University and co-first author of the study.

“This is a testament to what you can do when you improve on the methods for exploration and use as much data as possible,” added Dr. Han Altae-Tran, the study’s other co-first author and another former student of the Zhang lab, who is now a postdoctoral researcher at the University of Washington. “It’s really exciting to be able to improve the scale at which we search,” he said.

Finding “molecular gems” 

The researchers characterized four of the newly identified systems in the laboratory and found that they could edit DNA or RNA in human cells, among other functions.

“Some of these microbial systems were exclusively found in water from coal mines,” Kannan said. “If someone hadn’t been interested in that, we may never have seen those systems. Broadening our sampling diversity is really important to continue expanding the diversity of what we can discover.”

The researchers also discovered several new variants of Type I CRISPR systems, which use a guide RNA 32 nucleotides long. Because this guide is longer than the 20-nucleotide guide used by Cas9, the researchers say Type I systems could enable more precise gene editing and help reduce off-target effects.
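Why a longer guide might help can be illustrated with a back-of-the-envelope calculation that is not from the study. In a toy model where the genome is a uniformly random DNA sequence, a specific guide of length L matches a given position on one strand with probability 4^-L, so the expected number of chance exact matches falls fourfold with each extra nucleotide. The function name and the approximate 3.1-billion-base human genome length below are illustrative assumptions.

```python
# Toy model only: assumes a uniformly random genome and counts only exact
# matches on either strand. Real off-target activity also depends on mismatch
# tolerance, PAM requirements and chromatin context, which are ignored here.

HUMAN_GENOME_LEN = 3.1e9  # approximate haploid human genome size, in bases

def expected_random_matches(guide_len: int, genome_len: float = HUMAN_GENOME_LEN) -> float:
    """Expected number of chance exact matches of one guide, both strands."""
    positions = genome_len - guide_len + 1  # possible start positions per strand
    return 2 * positions / 4 ** guide_len   # 2 strands, match prob 4^-L per position

for length in (20, 32):
    print(f"{length}-nt guide: {expected_random_matches(length):.3g} expected chance matches")
```

Going from 20 to 32 nucleotides divides the expected count by 4^12, about 16.8 million. The model says nothing about how tolerant each system is to mismatches, which matters a great deal for off-target editing in practice.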

“The CRISPR-linked systems that we discovered represent an untapped trove of diverse biochemical activities linked to RNA-guided mechanisms, with great potential for development as biotechnologies,” the authors said.


“Biodiversity is such a treasure trove, and as we continue to sequence more genomes and metagenomic samples, there is a growing need for better tools, like FLSHclust, to search that sequence space to find the molecular gems,” Zhang concluded.


Reference: Altae-Tran H, Kannan S, Suberski AJ, et al. Uncovering the functional diversity of rare CRISPR-Cas systems with deep terascale clustering. Science. 2023;382(6673):eadi1910. doi: 10.1126/science.adi1910


This article is a rework of a press release issued by the Broad Institute. Material has been edited for length and content.