Table 1 Chronological order of benchmark datasets for different variant types, including point mutations, insertions, deletions, and structural variants, for healthy and patient samples

From: Genomic variant benchmark: if you cannot measure it, you cannot improve it

| Publication title | Project name | Year | DOI | PMID | Data | Number of samples | Technology | Sample status | Cell | Variants | Reference included (%) | Reference |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| A comprehensive catalogue of somatic mutations from a human cancer genome | The catalogue of somatic mutations | 2010 | https://0-doi-org.brum.beds.ac.uk/10.1038/nature08658 | 20016485 | Whole genome sequencing | 1 sample (COLO-829) | Illumina GAII | Patient | Somatic | SNV and indel < 50 bp | N/A | NCBI36 |
| A map of human genome variation from population-scale sequencing | 1000 Genomes Project | 2010 | https://0-doi-org.brum.beds.ac.uk/10.1038/nature09534 | 20981092 | Whole genome sequencing, exon-targeted sequencing | 882 samples (low-coverage whole-genome sequencing of 179 individuals; high-coverage sequencing of two mother–father–child trios; exon-targeted sequencing of 697 individuals) | 454 GS FLX, Illumina Genome Analyzer, and AB SOLiD System | Healthy | Germline | SNV and indel < 50 bp | 85 | NCBI36 |
| Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls | GIAB v.2.19 | 2014 | https://0-doi-org.brum.beds.ac.uk/10.1038/nbt.2835 | 24531798 | Whole genome sequencing, exome sequencing | 1 sample (NA12878, 11 whole-genome and 3 exome) | 454, Complete Genomics, Illumina, Ion Torrent and SOLiD 4 | Healthy | Germline | SNV and indel < 50 bp | 77 | GRCh37 |
| svclassify: a method to establish benchmark structural variant calls | svclassify | 2016 | https://0-doi-org.brum.beds.ac.uk/10.1186/s12864-016-2366-2 | 26772178 | Whole genome sequencing | 1 sample (NA12878) | Illumina HiSeq, Moleculo and PacBio | Healthy | Germline | SV and indel < 50 bp | N/A | GRCh37 |
| Extensive sequencing of seven human genomes to characterize benchmark reference materials | GIAB Public Data | 2016 | https://0-doi-org.brum.beds.ac.uk/10.1038/sdata.2016.25 | 27271295 | Whole genome sequencing | 7 samples (HG001-7) | 10x Genomics, BioNano, Complete Genomics (paired-end and LFR), GemCode WGS, Illumina (exome and WGS paired-end, mate-pair, and synthetic long reads), Ion Proton exome, ONT, PacBio, and SOLiD | Healthy | Germline | SNV, indel, and SV | N/A | GRCh37 |
| A reference data set of 5.4 million phased human variants validated by genetic inheritance from sequencing a three-generation 17-member pedigree | Platinum Genomes | 2017 | http://0-dx-doi-org.brum.beds.ac.uk/10.1101/gr.210500.116 | 27903644 | Whole genome sequencing | 2 samples (2 individuals with benchmarks, but using short-read WGS from 11 children and 4 grandparents from CEPH pedigree 1463) | Illumina | Healthy | Germline | SNV and indel < 50 bp | 96.7 | GRCh37 |
| A synthetic-diploid benchmark for accurate variant calling evaluation | CHM-eval, aka Syndip | 2018 | https://0-doi-org.brum.beds.ac.uk/10.1038/s41592-018-0054-7 | 30013044 | Whole genome sequencing | 2 samples (Synthetic mixture of two effectively haploid hydatidiform mole cell lines) | PacBio CLR | Haploid cell lines | Germline | SNV, indel > 1 bp, and SV | 96 | GRCh37 and GRCh38 |
| An open resource for accurately benchmarking small variant and reference calls | GIAB v.3.3.2 | 2019 | https://0-doi-org.brum.beds.ac.uk/10.1038/s41587-019-0074-6 | 30936564 | Whole genome sequencing | 7 samples (HG001-7) | 10x Genomics, Illumina, Complete Genomics, Ion Torrent and SOLiD 4 | Healthy | Germline | SNV and indel < 50 bp | 85.4 | GRCh37 and GRCh38 |
| A robust benchmark for detection of germline large deletions and insertions | NIST v0.6 SV benchmark set | 2020 | https://0-doi-org.brum.beds.ac.uk/10.1038/s41587-020-0538-8 | 32541955 | Whole genome sequencing | 1 sample (HG002) | 10x Genomics, Illumina, PacBio CLR, ONT | Healthy | Germline | indel ≥ 50 bp | 86 | GRCh37 |
| A diploid assembly-based benchmark for variants in the major histocompatibility complex | MHC benchmark | 2020 | https://0-doi-org.brum.beds.ac.uk/10.1038/s41467-020-18564-9 | 32963235 | Whole genome sequencing | 1 sample (HG002) | 10x Genomics, PacBio HiFi, and ONT | Healthy | Germline | SNV and indel < 50 bp | N/A | GRCh37 and GRCh38 |
| Establishing community reference samples, data and call sets for benchmarking cancer mutation detection using whole-genome sequencing | SEQC2 Tumor-normal | 2021 | https://0-doi-org.brum.beds.ac.uk/10.1038/s41587-021-00993-6 | 34504347 | Whole genome sequencing, exome sequencing | 1 tumor/normal cell line pair | 10x Genomics, Illumina, Ion Torrent, and PacBio HiFi | Patient | Somatic | SNV and indel < 50 bp | N/A | GRCh38 |
| A verified genomic reference sample for assessing performance of cancer panels detecting small variants of low allele frequency | SEQC2 Cancer panel | 2021 | https://0-doi-org.brum.beds.ac.uk/10.1186/s13059-021-02316-z | 33863366 | Targeted sequencing | Mixed tumor cell lines | Targeted Illumina Sequencing | Patient | Somatic | SNV and indel | N/A | GRCh37 and GRCh38 |
| Benchmarking challenging small variants with linked and long reads | GIAB v.4.2.1 | 2022 | https://0-doi-org.brum.beds.ac.uk/10.1016/j.xgen.2022.100128 | 36452119 | Whole genome sequencing | 7 samples (HG001-7) | 10x Genomics, Complete Genomics, Illumina, PacBio HiFi | Healthy | Germline | SNV and indel < 50 bp | 92.2 | GRCh37 and GRCh38 |
| Curated variation benchmarks for challenging medically relevant autosomal genes | CMRG v1.00 | 2022 | https://0-doi-org.brum.beds.ac.uk/10.1038/s41587-021-01158-1 | 35132260 | Whole genome sequencing | 1 sample (HG002) | PacBio HiFi | Healthy | Germline | SNV and SV | N/A | GRCh37 and GRCh38 |
| A multi-platform reference for somatic structural variation detection | Somatic SV truth set | 2022 | https://0-doi-org.brum.beds.ac.uk/10.1016/j.xgen.2022.100139 | 36778136 | Whole genome sequencing | 1 sample (COLO-829) | 10x Genomics, Bionano, Illumina, ONT, PacBio | Patient | Somatic | SV and indel | N/A | GRCh37 and GRCh38 |
| Haplotype-resolved assemblies and variant benchmark of a Chinese Quartet | Chinese Quartet | 2022 | https://0-doi-org.brum.beds.ac.uk/10.1101/2022.09.08.504083 | N/A | Whole genome sequencing | Two monozygotic twin daughters and their biological parents | Illumina, BGI, PacBio, and Oxford Nanopore Technology | Healthy | Germline | SNVs, indels, and SVs | N/A | GRCh38 |