Fig. 1 | Genome Biology

Fig. 1

From: MeDeCom: discovery and quantification of latent components of heterogeneous methylomes

Computational framework of MeDeCom. a The conceptual background of MeDeCom. The measured methylomes (e.g., 450K data, shown in the center) can be seen as a composition of binary single-cell methylome signatures (C) weighted by their frequencies in each sample (F). Single-cell signatures of a particular cell type form a cell-type-specific cluster in C. MeDeCom decomposes the measured methylation data into a matrix T, representing latent methylation components (LMCs), each of which corresponds to the averaged cell methylomes of a cell-type-specific cluster in C, and a matrix A, the relative proportions of the LMCs (that is, of the cell types) in each sample. b Histograms of the values in the estimated T matrices for the 500 most variable CpG sites in the cell reconstruction experiment on neuronal cells (see text). Both MeDeCom without regularization (λ=0) and RefFreeCellMix fail to match the distribution of the reference profiles (ground truth), which is concentrated near zero and one. MeDeCom with our regularizer (parameter λ chosen by cross-validation), in contrast, biases the entries of the LMCs towards zero (unmethylated) and one (methylated). The distribution of the entries of the estimated LMCs therefore approximately matches the ground truth, leading to a significantly better estimation of both T and A. c, d Geometric intuition for the different methods on a fully synthetic example with two CpGs (n=30, k=3). Each LMC corresponds to a column of T and is thus a point in [0,1]². c shows the LMCs (squares) estimated by RefFreeCellMix and by MeDeCom with λ=0 and λ=10⁻², together with the ground truth (black squares) and the data (blue dots). The data points are mixtures of the ground-truth points and thus lie in their convex hull. Factorization problem (2) (see “Methods”) is ill-posed because its solution is not unique. MeDeCom with appropriate regularization estimates T (red squares) very accurately because the solution is biased towards zero or one, whereas RefFreeCellMix and MeDeCom with λ=0 are unable to find the correct LMCs. For these two methods, this also leads to large errors in the estimated proportions, as visualized by the ternary plot for ten randomly selected data points (d). In contrast, MeDeCom with appropriate regularization estimates A very accurately.
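
The geometry described for panels c and d can be reproduced in a few lines. The following is a minimal sketch, not the authors' implementation: the ground-truth LMC coordinates, the Dirichlet sampling of proportions, and the function name `binary_penalty` are assumptions made for illustration. The quadratic penalty t(1−t) is used only because it vanishes at zero and one; the regularizer actually used by MeDeCom is defined in the article's Methods.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sizes taken from the caption: two CpGs, n = 30 samples, k = 3 LMCs.
m, n, k = 2, 30, 3

# Hypothetical ground-truth LMCs: each column of T is a point in [0, 1]^2.
T = np.array([[0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])

# Proportions: each column of A is nonnegative and sums to one.
# Sampling from a flat Dirichlet is an assumption of this sketch.
A = rng.dirichlet(np.ones(k), size=n).T  # shape (k, n)

# Noise-free data: every column of D is a convex mixture of the columns of T.
D = T @ A  # shape (m, n)

# Recover barycentric coordinates of each data point with respect to T by
# solving [T; 1] w = [d; 1]. Nonnegative weights mean the point lies in the
# convex hull of the ground-truth LMCs.
M = np.vstack([T, np.ones((1, k))])
rhs = np.vstack([D, np.ones((1, n))])
W = np.linalg.solve(M, rhs)  # shape (k, n)
in_hull = bool(np.all(W >= -1e-9))

def binary_penalty(T_hat):
    """Illustrative penalty sum of t * (1 - t): zero only for binary entries."""
    return float(np.sum(T_hat * (1.0 - T_hat)))

# A binary T has zero penalty; an intermediate T is penalized.
penalty_truth = binary_penalty(T)
penalty_blurred = binary_penalty(np.full((m, k), 0.5))
```

With three non-collinear LMCs in two dimensions, the barycentric weights `W` coincide with the proportions `A`, which is why an accurate estimate of T goes hand in hand with an accurate estimate of A in this example.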
