Skip to main content
Fig. 2 | Genome Biology

Fig. 2

From: dropEst: pipeline for accurate estimation of molecular counts in droplet-based single-cell RNA-seq experiments

Fig. 2

Comparison of UMI collision and sequencing error correction methods. Comparison of UMI collision adjustment and UMI correction algorithms is shown using the 10x post-transplant BMMC dataset (dataset 7). a The scatter plot shows percentage error (y-axis) in estimation of the molecular counts for different genes using computationally trimmed UMIs (down to 6–9-nucleotide lengths, as designated by color) from their original 10-nucleotide length, as a function of the full-length UMI estimate (x-axis; see “Methods”). The line shows spline-smoothed dependency with the 95% confidence band. Points show median y value for a given x. The errors result from two opposing trends, with UMI sequencing errors inflating the resulting count estimates, and UMI collisions deflating the estimates. Shortened UMIs result in a larger number of collisions. b The effect of different UMI collision corrections is shown on the 6-nucleotide trimmed UMIs. c Comparison of different UMI sequence error correction methods is shown for the 8-nucleotide trimmed UMIs. UMI collisions were corrected using an empirical approach in all cases except for “no correction”. d We estimated theoretical distribution of edit distances (x-axis) between two randomly sampled UMIs. The theoretical probability of observing a given edit distance is shown as a number above each edit distance group. The histograms show relative absolute difference between this theoretical distribution and observed distributions after the different UMI correction algorithms. For each method and edit distance, the y-axis shows the absolute difference between the observed and theoretical distribution, expressed as a fraction of the theoretical probability of observing that edit distance. e Dependency of the magnitude of UMI correction (y-axis) on the expression magnitude without correction (x-axis) is shown. Each point represents a single gene within a cell, pulled across all cells. Genes with expression magnitude < 10 were omitted

Back to article page