Fig. 1 | Genome Biology

Fig. 1

From: PAUSE: principled feature attribution for unsupervised gene expression analysis

Principled attributions complement biologically structured networks to create more interpretable unsupervised models. a An outline of a general workflow for unsupervised analysis of gene expression data, comparing classical linear approaches (left in each subpanel) and deep learning approaches (right in each subpanel). (1) First, a dimensionality reduction algorithm such as PCA (left) or a deep autoencoder (right) is applied to a dataset of gene expression values to learn a low-dimensional representation. (2) After this low-dimensional representation is learned, the learned dimensions must be ranked by their importance. PCA provides this ranking inherently, because it sequentially finds directions that maximize the remaining unexplained variance in the data. There are currently no principled approaches to provide this ranking in deep models; our novel loss attribution fills this gap in the literature. (3) After the most important latent dimensions are found, their biological meaning is interpreted. In PCA (left), the contribution of different genes to each dimension can be found by examining the magnitude of the gene loadings. For deep learning models, feature attribution methods can be applied to determine gene contributions. b In standard autoencoders, the learned latent variables have opaque meanings, as their relationships with input genes are unknown. Biologically constrained models increase the interpretability of latent variables by using sparse connections or regularization to ensure that latent dimensions correspond to pre-defined pathways. c We apply principled attribution methods to help rank the latent dimensions of autoencoder models and to interpret the biological meaning of the most important dimensions. Attributing the model’s reconstruction error to the latent dimensions quantifies the importance of each latent dimension. Attributing the output of each latent dimension to the input genes quantifies the contribution of each input gene to each learned pathway.
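
The two attribution steps described for panel c can be illustrated with a toy sketch. The example below is not the method from the article: it swaps in simple ablation (replacing one latent dimension with its mean and measuring the increase in reconstruction error) as a stand-in for a principled loss attribution, and it uses a hand-written linear autoencoder. The weights, the data, and the names `rank_latents_by_ablation` and `gene_contributions` are all hypothetical and chosen only for illustration.

```python
# Toy illustration only: a hand-written linear "autoencoder" with
# 4 genes and 2 latent dimensions. All weights and data are hypothetical.

def encode(x, w_enc):
    # w_enc[k][g]: weight from gene g to latent dimension k
    return [sum(w * v for w, v in zip(row, x)) for row in w_enc]

def decode(z, w_dec):
    # w_dec[g][k]: weight from latent dimension k to reconstructed gene g
    return [sum(w * v for w, v in zip(row, z)) for row in w_dec]

def reconstruction_error(data, w_enc, w_dec, ablate=None, baseline=None):
    """Mean squared reconstruction error over all samples.

    If `ablate` is given, that latent dimension is replaced by its
    baseline value before decoding.
    """
    total = 0.0
    for x in data:
        z = encode(x, w_enc)
        if ablate is not None:
            z[ablate] = baseline[ablate]
        x_hat = decode(z, w_dec)
        total += sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    return total / len(data)

def rank_latents_by_ablation(data, w_enc, w_dec):
    """Score each latent dimension by the error increase when it is ablated."""
    latents = [encode(x, w_enc) for x in data]
    n_latent = len(w_enc)
    # Baseline: the mean value of each latent dimension across samples.
    baseline = [sum(z[k] for z in latents) / len(latents) for k in range(n_latent)]
    full = reconstruction_error(data, w_enc, w_dec)
    scores = [
        reconstruction_error(data, w_enc, w_dec, k, baseline) - full
        for k in range(n_latent)
    ]
    order = sorted(range(n_latent), key=lambda k: scores[k], reverse=True)
    return order, scores

def gene_contributions(w_enc, k):
    """Rank genes by the magnitude of their encoder weight into latent k.

    For a linear encoder this plays the role that loadings play in PCA.
    """
    return sorted(range(len(w_enc[k])), key=lambda g: abs(w_enc[k][g]), reverse=True)

# Genes 0 and 1 vary strongly together; genes 2 and 3 vary only slightly.
data = [
    [4.0, 4.0, 1.0, 1.0],
    [0.0, 0.0, 1.2, 1.2],
    [2.0, 2.0, 0.8, 0.8],
    [-2.0, -2.0, 1.0, 1.0],
]
w_enc = [[0.6, 0.4, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
w_dec = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]

order, scores = rank_latents_by_ablation(data, w_enc, w_dec)
top_genes = gene_contributions(w_enc, order[0])
print("latent ranking:", order)
print("genes ranked for top latent:", top_genes)
```

In this toy setup the latent dimension that encodes the highly variable genes receives the larger score, mirroring how PCA ranks components by explained variance. For a nonlinear autoencoder, neither ablation nor raw weights would in general be adequate, which is the setting the caption's principled attribution methods address.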
