Figure 4 | Genome Biology

From: Deep sequencing of the X chromosome reveals the proliferation history of colorectal adenomas

Accuracy of Illumina frequency estimation and mutation clustering in the four tumors. (A) Distribution of somatic mutations according to their frequency. Green bars represent clonal mutations. (B) Linear regression of mutation frequencies measured by qPCR and by Illumina sequencing for 10 proportions of the mutated allele. (C) Pipeline for mutation clustering. First, a 95% confidence interval was assigned to each mutation. Second, mutations with non-overlapping confidence intervals or, in case of overlap, the mutation with the smallest confidence interval were identified as cluster seeds. Third, mutations that unambiguously overlapped with only one seed were assigned to that seed. Finally, each mutation overlapping with more than one seed was assigned to the cluster with the highest binomial probability. (D) Clusters of mutations in the four samples. In all samples, clusters are highlighted in yellow and numbered progressively. For each cluster, the maximum frequency, the minimum frequency, and the number of mutations are shown. Green clusters contain clonal mutations. (E) Expected and observed somatic mutations for each cluster. The expected number of mutations per cluster was calculated as the number of observed mutations divided by the fraction of positions with coverage equal to or higher than the minimum coverage for those positions. The observed number of mutations matched the expected number, except for low-frequency mutations, which were fewer than expected. These mutations were likely under-represented in our datasets because they are more difficult to identify and to distinguish from random errors. (F) Clustering performance. Shown are the distributions of the number of clusters obtained from 1,000 simulations. At each iteration, the frequencies of a random 40% of mutations were varied within their 95% confidence intervals, and the mutations were re-clustered with our method. In all samples, the median of the distribution equals the observed number of clusters. Except for sample A3, clustering is robust even when a higher percentage of mutations is modified (Figure S4 in Additional file 2).
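Panel B compares Illumina frequency estimates with qPCR measurements by linear regression. The sketch below shows one way such a comparison could be computed, assuming an ordinary least-squares fit with the qPCR values as the predictor and R² as the agreement measure; the caption does not state these fitting details, and `fit_line` is a hypothetical helper name.

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least squares fit y = slope * x + intercept.

    x: qPCR-measured mutant-allele proportions (assumed predictor)
    y: frequencies estimated from Illumina read counts
    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2 points)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a simple linear fit, R^2 equals the squared Pearson correlation.
    r_squared = sxy ** 2 / (sxx * syy) if syy > 0 else float("nan")
    return slope, intercept, r_squared
```

If the two methods agree, the fit should give a slope close to 1, an intercept close to 0, and an R² close to 1.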
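The four clustering steps of panel C, together with the perturbation test of panel F, can be sketched as follows. This is an illustrative reading of the caption, not the authors' code. It rests on several assumptions. Each mutation is represented by its mutant-read and total-read counts. The 95% interval is a Wilson score interval. Step 2 is implemented greedily: intervals are visited from narrowest to widest, and any mutation that overlaps an existing seed is skipped. The binomial probability in step 4 is evaluated at each seed's observed frequency. In the simulation, a perturbed frequency is drawn uniformly within the interval. All names (`Mutation`, `cluster`, `summarize`, `robustness`) are hypothetical.

```python
from __future__ import annotations

import math
import random
from dataclasses import dataclass

Z95 = 1.959964  # two-sided 95% standard-normal quantile


@dataclass(frozen=True)
class Mutation:
    alt: int    # reads carrying the mutant allele
    depth: int  # total reads covering the position (assumed > 0)

    @property
    def freq(self) -> float:
        return self.alt / self.depth

    def ci95(self) -> tuple[float, float]:
        """Step 1: Wilson score 95% interval for the mutant-allele frequency (assumed)."""
        n, p = self.depth, self.freq
        denom = 1 + Z95 ** 2 / n
        centre = (p + Z95 ** 2 / (2 * n)) / denom
        half = Z95 * math.sqrt(p * (1 - p) / n + Z95 ** 2 / (4 * n * n)) / denom
        return max(0.0, centre - half), min(1.0, centre + half)


def overlaps(a: tuple[float, float], b: tuple[float, float]) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]


def log_binom(k: int, n: int, p: float) -> float:
    """Log binomial probability of k mutant reads out of n at allele frequency p."""
    p = min(max(p, 1e-9), 1 - 1e-9)  # guard against log(0) when p is exactly 0 or 1
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))


def cluster(mutations: list[Mutation]) -> list[list[Mutation]]:
    cis = [m.ci95() for m in mutations]
    # Step 2 (greedy reading): visit mutations from narrowest to widest interval;
    # a mutation becomes a seed only if it overlaps no seed chosen so far.
    order = sorted(range(len(mutations)), key=lambda i: cis[i][1] - cis[i][0])
    seeds: list[int] = []
    for i in order:
        if not any(overlaps(cis[i], cis[s]) for s in seeds):
            seeds.append(i)
    groups: dict[int, list[Mutation]] = {s: [] for s in seeds}
    for i, m in enumerate(mutations):
        hits = [s for s in seeds if overlaps(cis[i], cis[s])]
        if len(hits) == 1:
            best = hits[0]  # Step 3: unambiguous overlap with a single seed
        else:
            # Step 4: pick the seed whose frequency makes this read count most likely
            best = max(hits, key=lambda s: log_binom(m.alt, m.depth, mutations[s].freq))
        groups[best].append(m)
    # Order clusters from highest to lowest frequency
    return sorted(groups.values(), key=lambda g: -max(m.freq for m in g))


def summarize(clusters: list[list[Mutation]]) -> list[tuple[float, float, int]]:
    """(maximum frequency, minimum frequency, number of mutations) per cluster, as in panel D."""
    return [(max(m.freq for m in c), min(m.freq for m in c), len(c)) for c in clusters]


def robustness(mutations: list[Mutation], n_iter: int = 1000,
               fraction: float = 0.4, seed: int = 0) -> list[int]:
    """Panel F: number of clusters after each round of perturbation and re-clustering."""
    rng = random.Random(seed)
    k = round(fraction * len(mutations))
    counts = []
    for _ in range(n_iter):
        perturbed = list(mutations)
        for i in rng.sample(range(len(mutations)), k):
            lo, hi = mutations[i].ci95()
            depth = mutations[i].depth
            # Assumed: new frequency drawn uniformly within the 95% interval
            perturbed[i] = Mutation(round(rng.uniform(lo, hi) * depth), depth)
        counts.append(len(cluster(perturbed)))
    return counts


if __name__ == "__main__":
    from statistics import median
    # Toy read counts for illustration only, not data from the study
    muts = [Mutation(a, 1000) for a in (500, 495, 505, 100, 105, 98)]
    found = cluster(muts)
    print(summarize(found))
    print("median clusters under perturbation:", median(robustness(muts, n_iter=200)),
          "observed:", len(found))
```

Under the greedy reading of step 2, every mutation that is not a seed was rejected because it overlapped an earlier seed. Each mutation therefore overlaps at least one seed, so step 4 never has to choose from an empty list.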
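The correction in panel E divides the observed number of mutations by the fraction of positions whose coverage reaches the required minimum. A minimal sketch follows. It assumes the per-position coverages and the minimum-coverage threshold are supplied as inputs, because the caption does not say how the threshold is chosen. `expected_mutations` is a hypothetical name.

```python
from typing import Sequence


def expected_mutations(observed: int, coverages: Sequence[int], min_coverage: int) -> float:
    """Expected mutation count for one cluster (panel E).

    observed: mutations observed in the cluster
    coverages: read depth at every position considered (hypothetical input)
    min_coverage: minimum depth required at those positions (assumed given)
    """
    if not coverages:
        raise ValueError("no positions supplied")
    covered = sum(1 for c in coverages if c >= min_coverage)
    if covered == 0:
        raise ValueError("no position reaches min_coverage")
    fraction = covered / len(coverages)  # fraction of positions with coverage >= threshold
    return observed / fraction
```

For example, 30 observed mutations with 3 of 4 positions at or above the threshold give an expected count of 30 / 0.75 = 40.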
