Skip to main content

Table 1 Runtime penalty for an increasing number of bins in an IBF. The values represent ratios by which the query run time increases when using an IBF with b bins compared to an IBF with 64 bins. We simulated b equally sized (user) bins for \(b\in \{64,256,512,...,32768\}\) of random sequence content (1 GiB in total) and sampled 1  million reads of length 250. We then constructed an IBF over these b bins using 32-mers and 2 hash functions. We conducted this experiment for different false-positive rates, including 0.0001, 0.0125, and 0.3125, which resulted in IBFs of different densities. Next, we measured the time required to count the k-mer occurrences in each of the b bins for all 1 million reads. The reported values are the average of five repetitions

From: Hierarchical Interleaved Bloom Filter: enabling ultrafast, approximate sequence queries

\(\textbf{b}\)

64

128

256

512

1024

2048

4096

8192

16384

32768

\(p_{fpr}=0.0001\)

1.00

1.06

1.35

1.35

1.57

1.96

3.41

5.49

6.81

10.95

\(p_{fpr}=0.0125\)

1.00

1.01

1.17

1.34

1.83

2.70

5.39

8.62

15.05

28.33

\(p_{fpr}=0.3125\)

1.00

1.28

1.55

2.26

3.78

6.54

12.94

24.45

47.63

93.47