Fig. 1 | Genome Biology

Fig. 1

From: Reconstruction of private genomes through reference-based genotype imputation

Overview of the genome reconstruction attack on public imputation servers. The attack scenario demonstrated in this work consists of two stages: haplotype reconstruction (HR; A) and haplotype linking (HL; B). In each round of HR, (1) a pool of queries targeting a short genomic region that includes a low-frequency variant is constructed and (2) submitted to the imputation server, which returns imputed data. (3) A classifier processes the output patterns to predict how many reference panel (RP) haplotypes each query matched. (4) If a query matched a small number of RP haplotypes, it is strategically extended so that it matches fewer haplotypes, then submitted for imputation again. (5) If a query matched a single RP haplotype, the corresponding imputed output often reveals that haplotype exactly. The set of reconstructed haplotypes (representing whole chromosomes or sub-chromosomal chunks, depending on the configuration of the imputation server) is passed to HL. (6) HL leverages an auxiliary genomic dataset (a “relative set”) that may contain relatives of the RP individuals whose data are among the reconstructed haplotypes (the “target haplotypes”). (7) HL runs an identity-by-descent (IBD) detection algorithm to identify the segments shared by each pair of target haplotype and relative-set sample. These segments are used to compute the semi-kinship (SK) coefficient, a measure of relatedness, for each pair. (8) A probabilistic linking algorithm we developed uses these SK scores to link sets of haplotypes predicted to originate from the same individual. These sets correspond to RP genomes successfully reconstructed by the attack.
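The IBD-based scoring and grouping in steps (7) and (8) can be illustrated with a minimal Python sketch. It assumes a hypothetical segment format (target haplotype ID, relative ID, start and end in cM) and, purely for illustration, defines SK as the total IBD length a haplotype shares with a relative divided by that haplotype's genetic length; the exact SK formula used in the work may differ. The `naive_link` function is a deliberately simple stand-in that groups haplotypes by their highest-scoring relative above a threshold; it is not the probabilistic linking algorithm described above. All names and values here are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical IBD segment record: (target_haplotype_id, relative_id, start_cM, end_cM)
Segment = Tuple[str, str, float, float]


def semi_kinship(segments: List[Segment],
                 hap_lengths_cm: Dict[str, float]) -> Dict[Tuple[str, str], float]:
    """Assumed SK score: total IBD length shared by a (haplotype, relative) pair,
    divided by the genetic length of that haplotype. This normalization is an
    illustrative assumption, not the paper's definition."""
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for hap, rel, start, end in segments:
        totals[(hap, rel)] += max(0.0, end - start)
    return {(hap, rel): length / hap_lengths_cm[hap]
            for (hap, rel), length in totals.items()}


def naive_link(sk: Dict[Tuple[str, str], float],
               threshold: float) -> Dict[str, List[str]]:
    """Toy stand-in for linking (hypothetical): assign each haplotype to the
    relative with its highest SK score, ignoring scores below `threshold`,
    then group haplotypes that share the same best relative."""
    best: Dict[str, Tuple[str, float]] = {}
    for (hap, rel), score in sk.items():
        if score >= threshold and (hap not in best or score > best[hap][1]):
            best[hap] = (rel, score)
    groups: Dict[str, List[str]] = defaultdict(list)
    for hap, (rel, _) in sorted(best.items()):
        groups[rel].append(hap)
    return dict(groups)


# Usage with made-up segments (not data from the study)
segments = [
    ("hap_chr1", "rel_A", 10.0, 40.0),
    ("hap_chr1", "rel_A", 60.0, 80.0),
    ("hap_chr2", "rel_A", 5.0, 55.0),
    ("hap_chr2", "rel_B", 0.0, 3.0),
    ("hap_chr3", "rel_B", 20.0, 70.0),
]
lengths = {"hap_chr1": 200.0, "hap_chr2": 200.0, "hap_chr3": 200.0}
sk = semi_kinship(segments, lengths)
print(naive_link(sk, threshold=0.05))
```

In this toy input, `hap_chr1` and `hap_chr2` both share most of their IBD with `rel_A` and so fall into one group, while `hap_chr3` is grouped under `rel_B`. A real linking step would need to weigh competing relatives and uncertainty probabilistically rather than taking a single best match.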