Fig. 1 | Genome Biology

Fig. 1

From: Identification of cell barcodes from long-read single-cell RNA-seq with BLAZE

Experimental overview and comparison of identified cell barcodes. A BLAZE workflow. Step 1: locate putative barcodes by first finding the adaptor in each read. Putative barcodes include those originating from different cells and from empty droplets. In the schematic, putative barcodes with the same color come from the same original cell/droplet, and black blocks on putative barcodes represent basecalling errors. Step 2: select high-quality putative barcodes. Bases representing sequencing errors tend to have low quality scores, so putative barcodes whose minimum base quality score (minQ) is below 15 are filtered out (faded in the figure), and most of the remaining putative barcodes are expected to be error-free. Step 3: identify cell-associated barcodes. BLAZE counts and ranks unique high-quality putative barcodes and outputs a list of cell-associated barcodes whose counts pass a quantile-based threshold. B Schematic of experimental design. Human induced pluripotent stem cells (hiPSC) undergoing cortical neuronal differentiation were dissociated into a single-cell suspension and processed to generate single-cell full-length cDNA. Full-length cDNA was sequenced using both short- and long-read methods, and barcode whitelists were generated using Cell Ranger, BLAZE, and Sockeye, followed by gene and isoform quantification and clustering. Three nanopore sequencing runs were performed on the same cDNA sample: a higher-depth PromethION run, a lower-depth GridION run, and a higher-accuracy run using the Q20 protocol on the GridION. C Barcode upset plot comparing the different whitelists. The bar chart on the left shows the total number of barcodes found by each tool. The bar chart on the top shows the number of barcodes in the intersection of whitelists from specific combinations of methods, and the dots and lines underneath indicate those combinations. The same combination colors are used to distinguish barcodes in Fig. 1D. D Barcode rank plot. Unique barcodes are ranked based on the counts output by each method and colored by which method(s) included each barcode in their barcode whitelist(s). The colors for different combinations of methods follow those in C, and barcodes not included in any whitelist are shown in gray. Cell Ranger short-read counts, Sockeye long-read counts, and BLAZE long-read counts are shown in the left, middle, and right knee plots, respectively. Sockeye and BLAZE analyze the same long-read dataset, whereas Cell Ranger analyzes counts from a short-read library derived from the same original cDNA. Unique barcodes are ranked on the x-axis by the number of reads/unique molecules observed for each (y-axis). Small offsets were intentionally added on the x-axis so that dots of different colors do not overlap. Note that the three methods generate counts in different ways, so the three plots have different y-axis labels.
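
The three steps in panel A can be sketched in Python as below. This is a minimal illustration under stated assumptions, not BLAZE's implementation. It assumes 10x Genomics 3' chemistry, in which a 16-nt cell barcode directly follows the Read 1 adaptor; the adaptor sequence used here is an assumption. It also finds the adaptor by exact match only, although real long reads carry errors that a practical tool must tolerate. Quality strings are read as Phred+33. In place of BLAZE's quantile-based threshold, it uses a hypothetical Cell Ranger-style rule: keep barcodes whose count reaches a fraction of a high-quantile count among the top `expected_cells` barcodes. All function names and the `quantile`/`fraction` defaults are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

# Assumption: 10x 3' chemistry, where the 16-nt cell barcode directly follows
# this Read 1 adaptor sequence. Check against the kit actually used.
ADAPTOR = "CTACACGACGCTCTTCCGATCT"
BARCODE_LEN = 16
MIN_Q = 15  # minQ cutoff stated in the figure caption

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def putative_barcode(seq: str, qual: str) -> Optional[Tuple[str, str]]:
    """Step 1 (hypothetical): find the adaptor by exact match on either strand
    and return the next BARCODE_LEN bases with their quality characters."""
    for s, q in ((seq, qual), (revcomp(seq), qual[::-1])):
        pos = s.find(ADAPTOR)
        if pos != -1:
            start = pos + len(ADAPTOR)
            bc, bq = s[start:start + BARCODE_LEN], q[start:start + BARCODE_LEN]
            if len(bc) == BARCODE_LEN:
                return bc, bq
    return None


def min_phred(qual: str) -> int:
    """Lowest Phred score in a FASTQ quality string (Phred+33 encoding)."""
    return min(ord(c) - 33 for c in qual)


def high_quality_barcodes(putative: Iterable[Tuple[str, str]],
                          min_q: int = MIN_Q) -> List[str]:
    """Step 2: drop putative barcodes whose minQ is below min_q."""
    return [bc for bc, qual in putative if min_phred(qual) >= min_q]


def cell_barcodes(barcodes: List[str], expected_cells: int,
                  quantile: float = 0.95, fraction: float = 0.05) -> List[str]:
    """Step 3: count and rank unique barcodes, then keep those whose count is
    at least `fraction` x the `quantile` count among the top `expected_cells`.
    This threshold rule and its defaults are hypothetical stand-ins."""
    ranked = Counter(barcodes).most_common()  # (barcode, count), highest first
    if not ranked:
        return []
    top = sorted(n for _, n in ranked[:expected_cells])  # ascending counts
    robust_max = top[int(quantile * (len(top) - 1))]
    threshold = fraction * robust_max
    return [bc for bc, n in ranked if n >= threshold]
```

In use, one would run `putative_barcode` over each FASTQ record, pass the non-`None` results to `high_quality_barcodes`, and feed the surviving barcodes to `cell_barcodes`. Plotting the counts from `Counter.most_common()` against rank on log axes gives a knee plot of the kind shown in D.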
