Skip to main content
Fig. 3 | Genome Biology

Fig. 3

From: Hierarchical Interleaved Bloom Filter: enabling ultrafast, approximate sequence queries

Fig. 3

Workflow with details on the HIBF Index structure. Using the HIBF is done in three steps. (1) Preprocessing: Based on the size (representative k-mer content) of the input sequences, a layout is computed with the tool Chopper (see the “Availability of data and materials” section). (2) Build: From a given layout, Raptor (see the “Availability of data and materials” section) builds an HIBF index depicted in the middle box. (3) Query: The HIBF index can then be used to query the membership of sequences inside the input samples. The exemplary HIBF with \(t_{max} = 5\) on 11 user bins (UB-A to UB-K) has 3 levels. The first level (L1) is always a single IBF (IBF-1) with exactly \(t_{max}\) technical bins (TB). This IBF stores the full data set, structured in a way such that its size is significantly smaller than that of a normal IBF. The individual boxes inside an IBF represent its TBs, which store the k-mer content of the labeled user bin(s). For example, in IBF-1 the content of UB-A is stored in two TBs (split), UB-B is stored in one TB (single), and UB-C to UB-D as well as UB-E to UB-K are collected in one TB each (merged). Subsequent levels may have several IBFs. Specifically, they will have one IBF for each merged bin on the previous level. For example, on the second level (L2), IBF-2 and IBF-3 correspond to the first and second merged bin of IBF-1, respectively. Note that the IBFs in the layout form a tree, where the root is the top-level IBF and the leaves are formed by IBFs without merged bins

Back to article page