Skip to main content
Fig. 4 | Genome Biology

Fig. 4

From: Dashing: fast and accurate genomic distances with HyperLogLog

Fig. 4

a Relationship between maximum leading zero count (Max LZC) and set size for three randomly-generated sets of 8-bit numbers. The Max LZC roughly estimates the log2 of the set size, though with high variance; here, two of three estimates are off by 2-fold. b Schematic of HyperLogLog sketch. Input items are hashed and hash value is partitioned into prefix p and suffix q. p indexes into the array of HLL registers. A register contains the maximum leading zero count among all suffixes q that mapped there. Register-level estimates are then combined to obtain an overall cardinality estimate. c Estimating cardinalities of sets A and B, and d estimating the cardinality of their union. For intersection cardinalities using inclusion-exclusion principle, estimated set and union cardinalities are combined. e Direct estimation of intersection cardinality with Ertl’s JMLE

Back to article page