From: Genomic variant benchmark: if you cannot measure it, you cannot improve it
Publication title | Project name | Year | DOI | PMID | Data | Number of samples | Technology | Sample status | Cell | Variants | Reference included (%) | Reference genome |
---|---|---|---|---|---|---|---|---|---|---|---|---|
A comprehensive catalogue of somatic mutations from a human cancer genome | The catalogue of somatic mutations | 2010 | | 20016485 | Whole genome sequencing | 1 sample (COLO-829) | Illumina GAII | Patient | Somatic | SNV and indel < 50 bp | N/A | NCBI36 |
A map of human genome variation from population-scale sequencing | 1000 Genomes Project | 2010 | | 20981092 | Whole genome sequencing, exon-targeted sequencing | 882 samples (low-coverage whole-genome sequencing of 179 individuals; high-coverage sequencing of two mother–father–child trios; exon-targeted sequencing of 697 individuals) | 454 GS FLX, Illumina Genome Analyzer, and AB SOLiD System | Healthy | Germline | SNV and indel < 50 bp | 85 | NCBI36 |
Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls | GIAB v.2.19 | 2014 | | 24531798 | Whole genome sequencing, exome sequencing | 1 sample (NA12878, 11 whole-genome and 3 exome) | 454, Complete Genomics, Illumina, Ion Torrent and SOLiD 4 | Healthy | Germline | SNV and indel < 50 bp | 77 | GRCh37 |
svclassify: a method to establish benchmark structural variant calls | svclassify | 2016 | | 26772178 | Whole genome sequencing | 1 sample (NA12878) | Illumina HiSeq, Moleculo and PacBio | Healthy | Germline | SV and indel < 50 bp | N/A | GRCh37 |
Extensive sequencing of seven human genomes to characterize benchmark reference materials | GIAB Public Data | 2016 | | 27271295 | Whole genome sequencing | 7 samples (HG001-7) | 10x Genomics, BioNano, Complete Genomics (paired-end and LFR), GemCode WGS, Illumina (exome and WGS paired-end, mate-pair, and synthetic long reads), Ion Proton exome, ONT, PacBio, and SOLiD | Healthy | Germline | SNV, indel, and SV | N/A | GRCh37 |
A reference data set of 5.4 million phased human variants validated by genetic inheritance from sequencing a three-generation 17-member pedigree | Platinum Genomes | 2017 | | 27903644 | Whole genome sequencing | 2 samples (2 individuals with benchmarks, but using short-read WGS from 11 children and 4 grandparents from CEPH pedigree 1463) | Illumina | Healthy | Germline | SNV and indel < 50 bp | 96.7 | GRCh37 |
A synthetic-diploid benchmark for accurate variant calling evaluation | CHM-eval, aka Syndip | 2018 | | 30013044 | Whole genome sequencing | 2 samples (Synthetic mixture of two effectively haploid hydatidiform mole cell lines) | PacBio CLR | Haploid cell lines | Germline | SNV, indel > 1 bp, and SV | 96 | GRCh37 and GRCh38 |
An open resource for accurately benchmarking small variant and reference calls | GIAB v.3.3.2 | 2019 | | 30936564 | Whole genome sequencing | 7 samples (HG001-7) | 10x Genomics, Illumina, Complete Genomics, Ion Torrent and SOLiD 4 | Healthy | Germline | SNV and indel < 50 bp | 85.4 | GRCh37 and GRCh38 |
A robust benchmark for detection of germline large deletions and insertions | NIST v0.6 SV benchmark set | 2020 | | 32541955 | Whole genome sequencing | 1 sample (HG002) | 10x Genomics, Illumina, PacBio CLR, ONT | Healthy | Germline | indel ≥ 50 bp | 86 | GRCh37 |
A diploid assembly-based benchmark for variants in the major histocompatibility complex | MHC benchmark | 2020 | https://0-doi-org.brum.beds.ac.uk/10.1038/s41467-020-18564-9 | 32963235 | Whole genome sequencing | 1 sample (HG002) | 10x Genomics, PacBio HiFi, and ONT | Healthy | Germline | SNV and indel < 50 bp | N/A | GRCh37 and GRCh38 |
Establishing community reference samples, data and call sets for benchmarking cancer mutation detection using whole-genome sequencing | SEQC2 Tumor-normal | 2021 | https://0-doi-org.brum.beds.ac.uk/10.1038/s41587-021-00993-6 | 34504347 | Whole genome sequencing, exome sequencing | 1 tumor/normal cell line pair | 10x Genomics, Illumina, Ion Torrent, and PacBio HiFi | Patient | Somatic | SNV and indel < 50 bp | N/A | GRCh38 |
A verified genomic reference sample for assessing performance of cancer panels detecting small variants of low allele frequency | SEQC2 Cancer panel | 2021 | https://0-doi-org.brum.beds.ac.uk/10.1186/s13059-021-02316-z | 33863366 | Targeted sequencing | Mixed tumor cell lines | Targeted Illumina Sequencing | Patient | Somatic | SNV and indel | N/A | GRCh37 and GRCh38 |
Benchmarking challenging small variants with linked and long reads | GIAB v.4.2.1 | 2022 | https://0-doi-org.brum.beds.ac.uk/10.1016/j.xgen.2022.100128 | 36452119 | Whole genome sequencing | 7 samples (HG001-7) | 10x Genomics, Complete Genomics, Illumina, PacBio HiFi | Healthy | Germline | SNV and indel < 50 bp | 92.2 | GRCh37 and GRCh38 |
Curated variation benchmarks for challenging medically relevant autosomal genes | CMRG v1.00 | 2022 | https://0-doi-org.brum.beds.ac.uk/10.1038/s41587-021-01158-1 | 35132260 | Whole genome sequencing | 1 sample (HG002) | PacBio HiFi | Healthy | Germline | SNV and SV | N/A | GRCh37 and GRCh38 |
A multi-platform reference for somatic structural variation detection | Somatic SV truth set | 2022 | https://0-doi-org.brum.beds.ac.uk/10.1016/j.xgen.2022.100139 | 36778136 | Whole genome sequencing | 1 sample (COLO-829) | 10x Genomics, Bionano, Illumina, ONT, PacBio | Patient | Somatic | SV and indel | N/A | GRCh37 and GRCh38 |
Haplotype-resolved assemblies and variant benchmark of a Chinese Quartet | Chinese Quartet | 2022 | | N/A | Whole genome sequencing | Two monozygotic twin daughters and their biological parents | Illumina, BGI, PacBio, and Oxford Nanopore Technology | Healthy | Germline | SNVs, indels, and SVs | N/A | GRCh38 |