Fig. 7 | Genome Biology

Fig. 7

From: MSV: a modular structural variant caller that reveals nested and complex rearrangements by unifying breakends inferred directly from reads


A visual overview of our approach for inferring a folded adjacency matrix from reads. A introduces a reference genome, a sequenced genome, and a history of basic SVs (a deletion of the section \(V\), an inversion of \(X\), and an insertion of \(I\)) that transforms the former genome into the latter. The black-boxed numbers indicate the order of the breakends on the sequenced genome. A tilde over a number indicates that the corresponding breakend is on the reverse strand of the sequenced genome. The arrowheads of the genome sections \(U\), \(V\), \(W\), \(X\), \(Y\), and \(Z\) symbolize their direction on the reference; the colored boxes above and below are their nucleotides (see Additional file 2) on the forward and reverse strands, respectively. B shows the genomic rearrangement of A in the form of a diagrammatic dot-plot (details on these dot-plots are in Additional file 2). Each of the breakend pairs \(a\), \(b\), \(c\), and \(d\) of A is indicated by an equally labeled arrow. C displays the skew-symmetric graph for the genomic rearrangement of B. The dashed box on the graph highlights an exemplary pair of mate vertices. The labeled edges of the graph correspond to the equally labeled breakend pairs of A. The weights \(I\) on the edges labeled \(d\) represent the inserted sequence on the forward and reverse strands. D introduces three error-free reads \(r1\), \(r2\), and \(r3\). Their locations on the sequenced genome are visualized via gray boxes, and their MEMs are displayed by colored arrows. \(I\) is not covered by seeds because it is an insertion. E comprises the unfolded adjacency matrix for the skew-symmetric graph in C. The matrix is inferred from the three reads of D, where the MEMs can be matched by their numbers. For example, the entry \(a\) corresponds to the two breakends (1) and (2), which are discovered via the MEMs 1.1 and 1.2 of the read \(r1\). The first and last seeds of each read have no breakend on the y-axis and x-axis, respectively; such seeds are drawn with thin arrows. The edge weights \(I\) and \(\sim I\) on the mate edges labeled \(d\) denote the inserted sequences on the forward and reverse strands, respectively. The coloring scheme for the matrix entries encodes the strand information of edges as described in the “Folding of Adjacency Matrices” chapter of the methods section. F visualizes the adjacency matrix folding scheme of our approach. A step-by-step description of the folding for the matrix in E is given in Additional file 11. G depicts the folded form of the matrix E. In the folded form, the forward and reverse strands are unified. Therefore, all equally labeled entries of the matrix in E appear as a single entry in G.
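
The idea that equally labeled mate entries collapse into one entry when the strands are unified can be illustrated with a minimal Python sketch. It is not the MSV implementation and does not reproduce the folding scheme of panel F or Additional file 11. It assumes, purely for illustration, that a vertex is a hypothetical `(position, strand)` pair, that the mate of an edge `u -> v` is `mate(v) -> mate(u)`, and that \(\sim I\) is the reverse complement of \(I\). The names `Vertex`, `Edge`, `mate_vertex`, `mate_edge`, and `fold`, as well as the coordinates used, are invented for this example.

```python
from typing import Dict, Tuple

# Assumption for this sketch: a vertex is (reference position, strand).
Vertex = Tuple[int, str]          # strand is "+" (forward) or "-" (reverse)
Edge = Tuple[Vertex, Vertex]      # directed edge: from-vertex -> to-vertex

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the sequence as read on the opposite strand."""
    return seq.translate(_COMPLEMENT)[::-1]


def mate_vertex(v: Vertex) -> Vertex:
    """Same position, opposite strand."""
    pos, strand = v
    return (pos, "-" if strand == "+" else "+")


def mate_edge(e: Edge) -> Edge:
    """Mate of u -> v is mate(v) -> mate(u) (skew symmetry)."""
    u, v = e
    return (mate_vertex(v), mate_vertex(u))


def fold(unfolded: Dict[Edge, str]) -> Dict[Edge, str]:
    """Unify each edge with its mate under one canonical key.

    The canonical representative is simply the smaller of the two
    edges in tuple order; this choice is arbitrary and hypothetical.
    The inserted sequence is re-expressed on the canonical edge.
    """
    folded: Dict[Edge, str] = {}
    for edge, inserted in unfolded.items():
        mate = mate_edge(edge)
        if mate < edge:
            edge, inserted = mate, reverse_complement(inserted)
        folded.setdefault(edge, inserted)
    return folded


# Two mate entries, analogous to the pair labeled d, with made-up
# coordinates and a made-up inserted sequence.
unfolded = {
    ((10, "+"), (20, "+")): "ACG",
    ((20, "-"), (10, "-")): reverse_complement("ACG"),
}
folded = fold(unfolded)
```

With these assumptions, the two mate entries of `unfolded` end up as a single entry in `folded`, mirroring how all equally labeled entries of E appear once in G.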
