
Table 2 Brief description of function prediction methods used

From: A critical assessment of Mus musculus gene function prediction using integrated genomic evidence

| Submission identifier | Approach | Name | Author initials |
|---|---|---|---|
| A | Compute several SVM kernel matrices for each data matrix, train one GO-term-specific SVM per kernel, and map the SVMs' discriminant values to probabilities using logistic regression | Calibrated ensembles of SVMs | GO, GL, JQ, CG, MJ, and WSN |
| B | Four different kernels are used per data set; the best kernels and data sources are integrated using a kernel logistic regression model | Kernel logistic regression [55] | HL, MD, TC, and FS |
| C | Construct similarity kernels, assign a weight to each kernel using linear regression, combine the weighted kernels, and use a graph-based algorithm to obtain the score vector | GeneMANIA | SM, DW-F, CG, DR, and QM |
| D | Train SVM classifiers for each GO term on individual data sets, construct several Bayesian networks that incorporate diverse data sources and hierarchical relationships, and choose for each GO term the Bayes net or SVM yielding the highest AUC | Multi-label hierarchical classification [56] and Bayesian integration | YG, CLM, ZB, and OGT |
| E | Combine an ensemble of classifiers (naïve Bayes, decision tree, and boosted tree) with guilt-by-association in a functional linkage network, taking the maximum score | Combination of classifier ensemble and gene network | WKK, CK, and EMM |
| F | Encode the relationship between functional similarity and the data in a functional linkage graph, and predict gene functions using a Boltzmann machine and simulated annealing | GeneFAS (gene function annotation system) [2, 3] | TJ, CZ, GNL, and DX |
| G | Two methods whose scores are combined by logistic regression: guilt-by-association using a weighted functional linkage graph generated by probabilistic decision trees, and random forests trained on all binary gene attributes | Funckenstein | WT, MT, FDG, and FPR |
| H | Derive pairwise similarity features for gene pairs from the available data, train a random forest classifier on gene pairs for each GO term, and base predictions on the similarity between the query gene and the positive examples for that GO term | Function prediction through query retrieval | YQ, JK, and ZB |
| I | Construct an interaction network per data set, merge the per-data-set graphs into a single graph, and apply a belief propagation algorithm to compute the probability that each protein has a specific function, given the functions assigned to the proteins in the rest of the graph | Function prediction with message passing algorithms [57] | ML and AP |

  1. AUC, area under the receiver operating characteristic curve; GO, Gene Ontology.
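Submission A in Table 2 maps SVM discriminant values to probabilities with logistic regression. A minimal sketch of such a calibration step for a single GO term follows, in the spirit of Platt scaling. It assumes you already have discriminant scores and 0/1 labels. The function names `fit_logistic_calibration` and `calibrate`, and the plain gradient-descent fit, are illustrative choices and not the submitters' code.

```python
import math
from typing import List, Tuple


def fit_logistic_calibration(scores: List[float], labels: List[int],
                             lr: float = 0.1, epochs: int = 5000) -> Tuple[float, float]:
    """Fit P(y=1 | s) = 1 / (1 + exp(-(a*s + b))) by batch gradient descent
    on the mean log-loss. `scores` are SVM discriminant values and `labels`
    are 0/1 annotations for one GO term (hypothetical helper)."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))
            grad_a += (p - y) * s  # d(log-loss)/da for this example
            grad_b += (p - y)      # d(log-loss)/db for this example
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def calibrate(score: float, a: float, b: float) -> float:
    """Map one discriminant value to a probability with the fitted (a, b)."""
    return 1.0 / (1.0 + math.exp(-(a * score + b)))
```

It is usually better to fit `a` and `b` on held-out discriminant values rather than on the SVM's own training set, because the latter tends to give over-confident probabilities. Platt's original method also smooths the 0/1 targets, and this sketch omits that step.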
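Submission C in Table 2 combines weighted similarity networks and then scores genes with a graph-based algorithm. One common graph-based scheme is label propagation. The sketch below uses it as an assumption and is a simplified illustration, not the published GeneMANIA code. The per-network weights are taken as given rather than fitted by linear regression, and edge weights are assumed nonnegative. Unlabeled genes get a prior of 0. The system (I + L)f = y, where L = D - W is the unnormalized graph Laplacian, is solved by Jacobi iteration. The names `combine_networks` and `propagate` are hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def combine_networks(networks: List[Dict[Edge, float]],
                     weights: List[float]) -> Dict[Edge, float]:
    """Weighted sum of undirected networks; each edge key is (i, j) with i < j."""
    combined: Dict[Edge, float] = {}
    for net, alpha in zip(networks, weights):
        for edge, w in net.items():
            combined[edge] = combined.get(edge, 0.0) + alpha * w
    return combined


def propagate(combined: Dict[Edge, float], y: List[float],
              iters: int = 200) -> List[float]:
    """Solve (I + L) f = y by Jacobi iteration, with L = D - W.
    Row i reads f_i = (y_i + sum_j w_ij f_j) / (1 + d_i). The iteration
    converges because I + L is strictly diagonally dominant when w >= 0."""
    n = len(y)
    nbrs: List[List[Tuple[int, float]]] = [[] for _ in range(n)]
    for (i, j), w in combined.items():
        nbrs[i].append((j, w))
        nbrs[j].append((i, w))
    degree = [sum(w for _, w in nb) for nb in nbrs]
    f = list(y)
    for _ in range(iters):
        f = [(y[i] + sum(w * f[j] for j, w in nbrs[i])) / (1.0 + degree[i])
             for i in range(n)]
    return f
```

In `y`, known positives for the GO term are +1 and known negatives are -1. Genes with larger entries in the returned score vector are stronger candidates for the term.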
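Submissions E and G in Table 2 both use guilt-by-association in a weighted functional linkage graph. A minimal sketch of one simple guilt-by-association score follows. It assumes each gene is scored by the fraction of its total edge weight that connects it to genes already annotated with the GO term. The actual submissions may weight or normalize neighbors differently, and the name `gba_scores` is hypothetical.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]


def gba_scores(edges: Dict[Edge, float], annotated: Set[str]) -> Dict[str, float]:
    """Score every gene in an undirected weighted functional linkage graph by
    the fraction of its edge weight that links to annotated genes."""
    to_annotated: Dict[str, float] = {}
    total: Dict[str, float] = {}
    for (u, v), w in edges.items():
        # Visit the edge from both endpoints because the graph is undirected.
        for gene, neighbour in ((u, v), (v, u)):
            total[gene] = total.get(gene, 0.0) + w
            if neighbour in annotated:
                to_annotated[gene] = to_annotated.get(gene, 0.0) + w
    return {g: to_annotated.get(g, 0.0) / t for g, t in total.items() if t > 0}
```

Under the same assumptions, submission E's rule of taking the maximum score would amount to something like `max(ensemble_score, gba_score)` per gene, where `ensemble_score` is a stand-in for the classifier-ensemble output on a comparable scale.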