Fig. 3 | Genome Biology

Fig. 3

From: Reconstruction of private genomes through reference-based genotype imputation


Our haplotype linking strategy leverages shared relatedness patterns across genomic regions to link reconstructed haplotypes from the same individual. A Distribution of semi-kinship (SK) coefficients across different degrees of relatedness (1st, 2nd, and 3rd), compared to unrelated individuals. SK coefficients are calculated separately for non-overlapping 20-Mbp chunks of each chromosome. Markers indicate the mean, and error bars indicate the standard deviation. The larger (max) and the smaller (min) of the two SK values, obtained by comparing each of the two target haplotypes against the relative, are plotted as separate distributions. Elevated SK for related pairs distinguishes reconstructed haplotypes from the same individual, enabling them to be linked by our algorithm. B Left, the average number of haplotypes linked by our algorithm (out of 310 chunks in total), by degree of available relative. Error bars indicate the standard deviation. “Incorrect” haplotypes are haplotypes assigned to the wrong individual. The rightmost bar represents unassigned (UA) sets, which were not assigned because the majority of their haplotypes did not come from the same individual; the number of such sets is indicated in parentheses. Right, estimated proportion of individuals, and of their genomes, that an adversary could expect to link successfully, given access to an nth-degree relative of those individuals. Each point (p, g) on the curve indicates that at least proportion g of the genome (in base pair length) could be linked for proportion p of the samples. These curves show smoothed cumulative distributions of the data summarized in the bar chart (left). C Estimated proportion of RP individuals, and of their genomes, that we could expect an adversary to be able to link, given access to a relative set containing a particular fraction of the population to which the RP individuals belong. Each point (p, g) on the curve indicates that at least proportion g of the genome (in base pair length) could be linked for proportion p of the samples. Our estimation leverages a population genetic model to calculate the probability of an adversary having access to relatives of different degrees, on which basis the degree-specific distributions in (B) are combined with weights.
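The reading of each curve point in (B, right) and (C) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the input layout, and the example numbers are hypothetical, and the curves in the figure are additionally smoothed, which this sketch does not do. The first function computes, for each threshold g, the proportion p of samples for which at least proportion g of the genome was linked. The second combines degree-specific curves using weights that stand in for the probability of having access to a relative of each degree; it assumes that any probability mass not covered by the weights (no relative available) contributes nothing.

```python
from typing import Dict, List, Sequence


def linkage_curve(linked_fractions: Sequence[float],
                  thresholds: Sequence[float]) -> List[float]:
    """For each threshold g, return p: the proportion of samples whose
    linked genome fraction is at least g (an unsmoothed curve)."""
    n = len(linked_fractions)
    if n == 0:
        raise ValueError("need at least one sample")
    return [sum(1 for f in linked_fractions if f >= g) / n for g in thresholds]


def combine_by_degree(curves: Dict[str, Sequence[float]],
                      weights: Dict[str, float]) -> List[float]:
    """Weighted sum of degree-specific curves evaluated on the same thresholds.

    Assumption: weights[d] is the probability that the best available
    relative is of degree d; leftover mass (no relative) adds zero.
    """
    if set(curves) != set(weights):
        raise ValueError("curves and weights must cover the same degrees")
    if sum(weights.values()) > 1.0 + 1e-9:
        raise ValueError("weights are probabilities and must sum to at most 1")
    lengths = {len(c) for c in curves.values()}
    if len(lengths) != 1:
        raise ValueError("all curves must use the same thresholds")
    combined = [0.0] * lengths.pop()
    for degree, curve in curves.items():
        for i, p in enumerate(curve):
            combined[i] += weights[degree] * p
    return combined


# Hypothetical per-sample linked fractions for one degree of relatedness.
example_curve = linkage_curve([0.2, 0.5, 0.9, 1.0], [0.0, 0.5, 1.0])

# Hypothetical degree-specific curves and access probabilities.
example_combined = combine_by_degree(
    {"1st": [1.0, 0.8], "2nd": [1.0, 0.4]},
    {"1st": 0.25, "2nd": 0.5},
)
```

With the hypothetical inputs above, the second entry of example_curve is 0.75, read as: for 75% of the samples, at least half of the genome could be linked.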
