Fig. 2 | Genome Biology

Fig. 2

From: Reconstruction of private genomes through reference-based genotype imputation


Our haplotype reconstruction strategy extracts a substantial number of chromosome-length haplotypes from imputation reference panels. We tested the strategy, which leverages the fractional dosage data output by imputation, on different reference panels (RPs), including (A) 1000 Genomes Phase 3 (1KG) and (B) two population-specific subsets of All of Us data: one including 1250 Asian American individuals and another including 5000 Black or African American individuals. (C) The “discrete imputation” version, which uses only the discrete prediction of the most likely genotype at each site, was also tested on 1KG. A reconstructed haplotype was counted as “correct” if it had no more than 100 variants with mismatching genotypes compared to its closest RP haplotype and that closest haplotype had not previously been reconstructed correctly. We chose 100 as an example cutoff so that nearly perfect reconstructions would be considered correct, but as illustrated in Additional file 1: Fig. S4, any value in the range of 0 to 500 could likely have been chosen with little effect on the results as visualized in these plots. The count of “incorrect” haplotypes was incremented if a reconstructed haplotype had more than 100 genotype differences from its closest RP match and was sufficiently different (> 100 mismatches) from the previously counted incorrect haplotypes. Horizontal dotted lines represent percentages of the total number of haplotypes in an RP. In all cases, our strategy accurately reconstructed a large portion of the RP using a realistic number of queries. The results shown are for imputing chromosome 20; analogous results for imputing only the first 20-Mbp chunk of chromosome 20 are provided in Additional file 1: Fig. S9.
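
The counting rule described in this caption can be sketched in a few lines of Python. This is only an illustrative reading of the caption, not the authors' code: the function name `tally_reconstructions`, the 0/1 allele encoding, and the choice to compare each reconstruction against its single closest RP haplotype are all assumptions made for the example.

```python
import numpy as np

def tally_reconstructions(reconstructed, panel, cutoff=100):
    """Count "correct" and "incorrect" reconstructions (hypothetical helper).

    reconstructed: iterable of allele vectors (assumed 0/1 encoded), in query order.
    panel:         2-D array with one reference-panel haplotype per row.
    cutoff:        maximum number of mismatching variants for a "correct" call.
    """
    panel = np.asarray(panel)
    found = set()          # indices of RP haplotypes already reconstructed correctly
    incorrect_kept = []    # distinct incorrect reconstructions counted so far
    n_correct = n_incorrect = 0

    for hap in reconstructed:
        hap = np.asarray(hap)
        # Number of mismatching variants against every RP haplotype.
        mismatches = np.count_nonzero(panel != hap, axis=1)
        closest = int(np.argmin(mismatches))

        if mismatches[closest] <= cutoff:
            # Near-perfect match: count it only the first time this RP haplotype is hit.
            if closest not in found:
                found.add(closest)
                n_correct += 1
        elif all(np.count_nonzero(hap != prev) > cutoff for prev in incorrect_kept):
            # Far from every RP haplotype and from all earlier incorrect haplotypes.
            incorrect_kept.append(hap)
            n_incorrect += 1

    return n_correct, n_incorrect
```

Under these assumptions, repeated reconstructions of the same RP haplotype, and near-duplicate incorrect haplotypes, do not increase either count, which matches the de-duplication described above.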
