A population of cells is not as uniform as researchers once thought. Single-cell genomic DNA sequencing has offered insights into the different species of cells that can be present in a sample. Even among cells that are genetically identical, single-cell sequencing of RNA or epigenetic modifications can reveal important cellular diversity. Here, we describe the latest developments in single-cell sequencing and their applications in diverse fields such as oncology and microbiology.
How single-cell sequencing works
Before the advent of single-cell sequencing technology, bulk sequencing was used. However, when cells in tissues are sequenced in bulk, rare cells are diluted and averaged out. Single-cell sequencing is useful for biological studies as it enables cells that are rare and cannot be cultured to be identified. Additionally, it facilitates understanding of cellular heterogeneity in complex samples.
The process of single-cell sequencing includes a few main steps. Tissue dissociation into a cell suspension is required when the sample is a tissue.1 If the sample already exists as a cell suspension, it can be used as it is. Next, target cells are isolated as single cells using micromanipulation, flow cytometry sorting or microfluidics technology. Micromanipulation with a micropipette is the cheapest technique but also the most laborious, while semi-automated microfluidic methods that encapsulate single cells in droplets are becoming more popular. Individual isolated cells are then lysed to capture as many DNA or RNA molecules as possible. For single-cell RNA sequencing (scRNA-seq), the RNA is converted into complementary DNA (cDNA) by reverse transcription before undergoing PCR amplification. The DNA or cDNA library is then prepared and sequenced via next-generation sequencing technology for genomic or transcriptomic sequencing, respectively. Finally, bioinformatics approaches are used to analyze the data.
Single-cell sequencing is a rapidly growing field. Although the workflow is well-defined, better bioinformatics tools are continually being developed to generate more reliable data for cell clustering. Single-cell sequencing is also being applied to a wide range of scientific problems, from environmental analysis to cancer therapeutics.
Improved analysis by integrating data from bulk and single-cell sequencing
Each type of single-cell sequencing method provides unique information to decipher heterogeneity in a complex cell population. For instance, single-cell RNA, assay for transposase-accessible chromatin (ATAC) and chromosome conformation (Hi-C) sequencing offer information on gene expression profiles, accessible chromatin regions and chromatin contacts, respectively. However, even with single-cell RNA and ATAC experiments, our understanding of the specific regulatory networks is incomplete if we are unable to connect the 3D contacts between the active regulatory elements and gene promoters, an area where bulk sequencing provides better information.
Zeng and colleagues introduced a new method called DC3 (De-Convolution and Coupled Clustering) for an improved joint analysis of bulk and single-cell data that is able to deconvolve data from bulk profiles into sub-population-specific profiles.2 The authors applied their DC3 method in the joint analysis of scRNA-seq, single-cell ATAC sequencing (scATAC-seq) and bulk HiChIP (a chromatin conformation assay that incorporates chromatin immunoprecipitation) on mouse embryonic stem cells after four days of retinoic acid-induced differentiation.
DC3 identified three sub-populations. The authors focused on sub-population two, which expresses both the EpCAM and CD38 surface markers and is distinctly different from the other sub-populations. Next, they performed a HiChIP experiment. In the principal component analysis plot of the loop profiles, the double-positive (EpCAM+CD38+) sample had a Pearson correlation coefficient (PCC) of 0.7633 with sub-population two, significantly higher than its PCC with sub-populations one and three. This demonstrates the validity of DC3 in deconvolving bulk loop data from a complex biological sample to better classify cell types using single-cell sequencing data.
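To make the comparison above concrete, here is a minimal sketch of how a Pearson correlation coefficient between two loop profiles could be computed and used to pick the best-matching sub-population. It is not the DC3 implementation: the loop-strength values, the profile names and the `pearson` helper are all made up for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical loop strengths measured over the same five chromatin loops.
sorted_sample = [5.0, 1.0, 4.0, 0.5, 3.0]   # e.g. an experimentally sorted sample
inferred_profiles = {                        # e.g. profiles inferred per sub-population
    "sub-population 1": [1.0, 4.0, 0.5, 5.0, 1.5],
    "sub-population 2": [4.5, 1.5, 3.5, 1.0, 3.0],
    "sub-population 3": [2.0, 2.5, 2.0, 3.0, 2.5],
}

scores = {name: pearson(sorted_sample, profile)
          for name, profile in inferred_profiles.items()}
best_match = max(scores, key=scores.get)
print(best_match, round(scores[best_match], 4))
```

In this toy data the sorted sample tracks the second profile most closely, which mirrors the kind of comparison reported for the EpCAM+CD38+ sample; real loop profiles would contain many thousands of loops.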
Applying the DC3 method to gene ontology enrichment analysis, Zeng and colleagues also found that incorporating bulk loop profiles through DC3 provided better enrichment results than using scRNA-seq data alone. Specifically, when the sequencing depth (i.e., the number of times a nucleotide in the genome is read) is low, DC3-inferred loop information provided an improved characterization of the sub-populations over scRNA-seq data alone. This makes DC3 a powerful method, as it can improve the interpretation of existing single-cell RNA and ATAC sequencing data sets (especially those with low sequencing depths) by performing additional HiChIP experiments that are simple and not too costly.
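Sequencing depth, as defined above, can be illustrated with a short sketch. The two helpers below are hypothetical and only show the arithmetic: an average depth estimated from read count, read length and genome size, and a per-position depth counted from read alignments. The numbers are invented and do not come from the study.

```python
def mean_depth(num_reads, read_length, genome_size):
    """Average number of times each base is read: total bases sequenced / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * read_length / genome_size

def per_base_depth(genome_size, alignments):
    """Count how many reads cover each position.

    alignments: list of (start, end) half-open, 0-based intervals on the genome.
    """
    depth = [0] * genome_size
    for start, end in alignments:
        # Clip each read to the genome boundaries before counting.
        for pos in range(max(start, 0), min(end, genome_size)):
            depth[pos] += 1
    return depth

# One million 100-base reads over a 10-million-base genome.
print(mean_depth(1_000_000, 100, 10_000_000))

# Two overlapping reads on a toy 10-base genome.
print(per_base_depth(10, [(0, 5), (3, 8)]))
```

Positions covered by both toy reads get a depth of two, and positions that no read reaches stay at zero, which is the situation in which low-depth data sets benefit most from additional information.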
Multiplexed single-cell sequencing to understand cancer heterogeneity
Intra-tumoral heterogeneity exists in patient samples and could account for differences in therapeutic efficacy. Kinker and colleagues wanted to understand whether these differences were a result of the native tumor microenvironment or intrinsic cellular plasticity. “In particular, we wondered how well the diversity seen within patient samples can be reproduced in ‘traditional’ cell lines,” said Itay Tirosh, principal investigator at the Weizmann Institute of Science.
Using multiplexed single-cell sequencing, they investigated cellular diversity within a large number of cell lines from the Cancer Cell Line Encyclopedia (CCLE).3 The team used scRNA-seq to sequence pooled cell lines before computationally assigning the cells to cell lines based on bulk RNA expression profiles and single nucleotide polymorphisms. This method worked well, and cell line assignments were consistent for 98% of the cells. Interestingly, the authors found that cell line culturing, and even co-culture, did not significantly affect the expression patterns, suggesting that intrinsic cellular plasticity is likely to be independent of the time period of culture.
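As a rough illustration of assigning pooled cells using single nucleotide polymorphisms, the sketch below matches the alleles observed in one cell against reference genotypes and keeps the best-agreeing cell line. This is a simplified assumption, not the authors' pipeline: the function `assign_cell`, the `min_agreement` cutoff, the SNP identifiers and the genotypes are all hypothetical.

```python
def assign_cell(cell_alleles, reference_genotypes, min_agreement=0.9):
    """Return (cell_line, agreement); cell_line is None if no line agrees well enough.

    cell_alleles: dict mapping SNP id -> allele observed in the cell's reads
    reference_genotypes: dict mapping cell line -> dict of SNP id -> allele
    """
    best_line, best_score = None, 0.0
    for line, genotype in reference_genotypes.items():
        # Only SNPs covered in both the cell and the reference can be compared.
        shared = [snp for snp in cell_alleles if snp in genotype]
        if not shared:
            continue
        matches = sum(cell_alleles[snp] == genotype[snp] for snp in shared)
        score = matches / len(shared)
        if score > best_score:
            best_line, best_score = line, score
    if best_score < min_agreement:
        return None, best_score  # ambiguous or low-quality cell
    return best_line, best_score

# Hypothetical reference genotypes for two cell lines.
references = {
    "line_A": {"snp1": "A", "snp2": "G", "snp3": "T"},
    "line_B": {"snp1": "C", "snp2": "G", "snp3": "C"},
}

print(assign_cell({"snp1": "A", "snp3": "T"}, references))
print(assign_cell({"snp1": "A", "snp2": "G", "snp3": "C"}, references))
```

The second toy cell carries a mixture of alleles and is left unassigned, which is one simple way such a scheme could flag doublets or noisy cells. An expression-based assignment could then be compared with the SNP-based one to check consistency.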
To identify discrete sub-populations of cells, the authors used a variety of computational techniques, including t-distributed stochastic neighbor embedding (t-SNE) and density-based clustering. They also defined a new term, "recurrent heterogeneous program" (RHP), to cluster cells by the functional enrichment of their signature genes.
RHPs have proven useful for classifying the cell lines. For instance, an RHP related to epithelial-to-mesenchymal transition (EMT) featured strongly in a melanoma cell line. The RHP classification also helped to determine drug sensitivity. The authors screened ~2,200 bioactive molecules against cells in the high epithelial senescence program (one of the RHPs) and showed that this group of cells is sensitive to senolytic drugs and that the program is predictive of clinical drug response.
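Density-based clustering can be sketched in a few lines. The example below is a minimal, pure-Python version of the DBSCAN idea applied to made-up 2D coordinates, standing in for cells that have already been embedded (for example by t-SNE). The parameters `eps` and `min_pts` and the coordinates are assumptions for illustration; the authors' actual settings and software are not described here.

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    def neighbors(i):
        # All points within eps of point i (the point itself included).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = NOISE          # may later be claimed as a border point
            continue
        cluster += 1                   # point i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts: # only core points extend the cluster
                queue.extend(j_nbrs)
    return labels

# Hypothetical embedded cells: two dense groups and one isolated cell.
cells = [(0, 0), (0, 1), (1, 0), (1, 1),
         (10, 10), (10, 11), (11, 10), (11, 11),
         (50, 50)]
print(dbscan(cells, eps=1.5, min_pts=3))
```

Unlike methods that require the number of clusters up front, this approach finds however many dense groups exist and leaves isolated cells unlabeled, which suits the search for discrete sub-populations within a cell line.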
“Through our studies, we found that a large fraction of the heterogeneity that was seen within patient samples can indeed be modelled in traditional cell lines, and we identified the best cell line models for that. We were also able to characterize a number of different 'programs' of cellular heterogeneity that are each shared across many cell lines and thus represent a common feature of certain cancers. Some of these programs are biologically and clinically significant, such as EMT which is important for metastasis and the senescence program that we found was predictive of response to treatment,” said Tirosh. “In the future, we hope to use the identified model systems for following up on particular 'programs' of intra-tumor heterogeneity, testing further their regulation and drug responses.”
Single-cell genomic (DNA) sequencing of the ocean microbiome
“Marine microorganisms are of essential importance in geochemical cycling, nutrient remineralization and climate formation; they comprise one of the largest microbiomes on Earth and have been extensively explored by meta-omics approaches,” said Ramunas Pachiadaki, senior research scientist at the Bigelow Laboratory for Ocean Sciences.
However, culture-based microbiology methods cannot capture the diversity of environmental microbes. Existing reference genomes represent only 0.4% of marine metagenomes. The lack of better reference genomes is a limiting factor to understanding the microbiome of our environment.
Pachiadaki and team showed that single-cell genomics is a powerful, alternative method for culture-independent representation of microbial genomes.4 They generated and sequenced a large-scale, randomized library of single amplified genomes of marine planktonic bacteria and archaea, including prokaryoplankton that play important roles in processes such as geochemical cycling and climate formation, from 28 different field samples. These samples spanned the surface ocean in tropical and subtropical latitudes from 40°S to 40°N.
“Our dataset differs from earlier single-cell genomics projects in both its large scale and randomized, unbiased cell selection strategy, making it suitable for quantitative data mining that is agnostic to the original hypotheses of the study.”
From their samples, the team generated a dataset called Global Oceans Reference Genomes Tropics (GORG-Tropics) and found that most average nucleotide identity values were <80%, suggesting that few of the prokaryoplankton were closely related. For comparison, an average nucleotide identity of 94-96% is the standard commonly used to treat microbes as belonging to the same species.
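The average nucleotide identity comparison can be sketched as follows. This is a deliberately simplified illustration: real tools compute the identities by aligning genome fragments, whereas here the per-fragment identities are made-up inputs, and the 95% cutoff is an assumed midpoint of the 94-96% range quoted above.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of matching positions between two pre-aligned, equal-length sequences."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("sequences must be aligned to the same non-zero length")
    matches = sum(a == b for a, b in zip(aligned_a, aligned_b))
    return 100.0 * matches / len(aligned_a)

def average_nucleotide_identity(fragment_identities):
    """Mean percent identity over all aligned fragment pairs of two genomes."""
    if not fragment_identities:
        raise ValueError("at least one aligned fragment is required")
    return sum(fragment_identities) / len(fragment_identities)

def same_species(ani, threshold=95.0):
    """Apply an assumed species-level cutoff to an ANI value."""
    return ani >= threshold

# Identity of one toy fragment pair (8 of 10 positions match).
print(percent_identity("ACGTACGTAC", "ACGTTCGTAA"))

# Hypothetical fragment identities for two genome pairs.
distant_pair = average_nucleotide_identity([78.2, 79.5, 77.9])
close_pair = average_nucleotide_identity([97.0, 96.0, 98.0])
print(same_species(distant_pair), same_species(close_pair))
```

A pair with identities below 80%, like most pairs in the dataset described above, falls far short of any cutoff in the 94-96% range.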
The GORG-Tropics library represented an average of 40% of global prokaryoplankton diversity despite containing samples from the Atlantic and Pacific only. In particular, the authors found strong matches to metagenome fragments from the Indian Ocean, which supports the idea that microbes can be dispersed longitudinally across oceans.
The team went on to obtain complete and near-complete 16S rRNA gene sequences from the samples, which generated surprising taxonomic assignments. Most of the prevalent prokaryoplankton lineages have small genomes (1-2 megabase pairs (Mbp)), low GC content (29-35%) and small cell diameters (0.2-0.5 µm), which is consistent with previous reports. However, a good number of lineages did not conform to this pattern, having average genome sizes of >3 Mbp, GC content >45% and cell diameters >0.4 µm, suggesting specialized ecological niches and divergent adaptations.
Pachiadaki and colleagues also found no evidence of nitrogen fixation pathways in any of the prokaryoplankton that were analyzed, including members of the Planctomycetes phylum, contrary to a recent report. The GORG-Tropics library also provided clues to biosynthetic pathways, including species that contained multiple polyketide synthase systems that have shown utility as natural product antibiotics.5
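GC content, one of the genome properties compared above, is simply the share of G and C bases in a sequence. The helper below is a minimal sketch with a hypothetical name and toy sequences; ambiguous bases such as N are ignored, which is an assumption rather than a fixed convention.

```python
def gc_content(sequence):
    """Percentage of G and C among the unambiguous bases (A, C, G, T) of a sequence."""
    counted = [base for base in sequence.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A, C, G or T bases")
    gc = sum(base in "GC" for base in counted)
    return 100.0 * gc / len(counted)

# Toy sequences only; real genomes are millions of bases long.
print(gc_content("ATGCATAT"))   # 2 of 8 bases are G or C
print(gc_content("ggccNNat"))   # lower case accepted, N ignored
```

Applied to a full assembly, a value around 29-35% would place a lineage in the low-GC group described above, while a value above 45% would mark it as one of the exceptions.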
“The approach we employed here enabled the first methodical, lineage-resolved survey of gene clusters involved in energy, nitrogen and secondary metabolisms. This confirmed an earlier finding of the genomic potential for aerobic anoxygenic photosynthesis in Ca. Luxescamonaceae and showed that this lineage of Alphaproteobacteria is substantially more abundant than previously thought. The abundance and diversity of the identified biosynthetic clusters suggested an importance of secondary metabolites in the dilute environment of free-living prokaryoplankton and offered a bioprospecting roadmap for biotechnology applications. We also propose that randomized single-cell genomics should serve as a new, instrumental approach for studies of soil, plant, mammalian and other microbiomes in order to fill our major knowledge gaps about these important microbial players in the functioning of diverse ecosystems and macroorganisms, as well as in climate change and other global processes,” said Pachiadaki.
“We are currently using the obtained GORG-Tropics database to analyze global patterns of gene exchange and viral infections in marine microorganisms. We are also expanding the GORG project into the dark ocean – water column below the sunlit surface – which constitutes 90% of the entire ocean volume and remains largely unexplored,” he concluded.
1. Haque A, Engel J, Teichmann SA, Lönnberg T. A practical guide to single-cell RNA-sequencing for biomedical research and clinical applications. Genome Medicine. 2017;9(1):75. doi:10.1186/s13073-017-0467-4.
2. Zeng W, Chen X, Duren Z, Wang Y, Jiang R, Wong WH. DC3 is a method for deconvolution and coupled clustering from bulk and single-cell genomics data. Nature Communications. 2019;10(1):4613. doi:10.1038/s41467-019-12547-1.
3. Kinker GS, Greenwald AC, Tal R, et al. Pan-cancer single-cell RNA-seq identifies recurring programs of cellular heterogeneity. Nature Genetics. 2020;52(11):1208-1218. doi:10.1038/s41588-020-00726-6.
4. Pachiadaki MG, Brown JM, Brown J, et al. Charting the Complexity of the Marine Microbiome through Single-Cell Genomics. Cell. 2019;179(7):1623-1635.e11. doi:10.1016/j.cell.2019.11.017.
5. Karpiński TM. Marine Macrolides with Antibacterial and/or Antifungal Activity. Marine Drugs. 2019;17(4):241. doi:10.3390/md17040241.