Table 4 Description of microarray datasets

From: A prediction-based resampling method for estimating the number of clusters in a dataset

| Dataset | Number of classes | Class sizes | Number of genes |
| --- | --- | --- | --- |
| Lymphoma* [1] (cDNA microarrays) | K = 3 | B-CLL (29), FL (9), DLBCL (43) | p = 4,682 |
| Leukemia [3] (Affymetrix chips) | K = 3 | ALL B-cell (38), ALL T-cell (9), AML (25) | p = 3,571 |
| NCI 60† [6] (cDNA microarrays) | K = 8 | Breast (7), CNS (6), colon (7), leukemia (6), melanoma (8), NSCLC (9), ovarian (6), renal (8) | p = 5,244 |
| Melanoma‡ [30] (cDNA microarrays) | K = 2 | Group A (19), Group B (12) | p = 3,613 |

*The DLBCL class for the lymphoma dataset is likely to contain two subclasses.

†For the NCI60 data, the two prostate cell lines and the unknown cell line (ADR-RES) were excluded from our analysis because of their small class size.

‡Note that for the first three datasets, tumor classes were known a priori, whereas for the melanoma dataset the two classes were inferred by Bittner et al. [30] by cluster analysis but not confirmed on an independent dataset.