Table 1 Statistical significance of the presence of predicted specificity residues in known interfaces of protein-protein and protein-DNA/RNA complexes

From: Determinants of protein function revealed by combinatorial entropy optimization

| PDB^a | Protein name^b | Superfamily^c | Alignment^d | S^e | C^e | Ligand^f | I^g | S&I^g | P_S&I^g | C&I^g | P_C&I^g | (S+C)&I^g | P_(S+C)&I^g |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1wq1R^1 (1 to 166) | Ras | P-loop containing nucleoside triphosphate hydrolases | Superfamily (human) 156/0.90/0.90 | 13 | 7 | 1wq1G, GDP, Mg, AF3 | 42 | 8 | 0.00434 | 5 | 0.0118 | 13 | 0.00007 |
| 1wq1G^2 (718 to 1,037) | P120Gap | GTPase activation domain, GAP | Superfamily (human) 20/0.90/0.90 | 36 | 15 | 1wq1R, GDP, Mg, AF3 | 33 | 11 | 0.00024 | 6 | 0.00183 | 17 | 0 |
| 1fvuA^3 (1 to 133) | Botrocetin α-chain | C-type lectin-like | Superfamily (swiss) 64/0.90/0.90 | 21 | 14 | 1fvuB | 39 | 10 | 0.092 | 5 | 0.391 | 15 | 0.035 |
| 1fvuB^4 (401 to 525) | Botrocetin β-chain | C-type lectin-like | Superfamily (swiss) 136/0.90/0.90 | 29 | 8 | 1fvuA, Mg | 39 | 13 | 0.077 | 3 | 0.507 | 16 | 0.0668 |
| 1a2kA^5 (10 to 121) | NTF2 | NTF2-like | Pfam 87/0.90/0.90 | 18 | 2 | 1a2kD, GDP, Mg | 16 | 7 | 0.005 | 0 | 1 | 7 | 0.0085 |
| 1a2kD^6 (12 to 170) | RAN | P-loop containing nucleoside triphosphate hydrolases | Superfamily (human) 170/0.90/0.90 | 17 | 7 | 1a2kA, GDP, Mg | 27 | 6 | 0.0445 | 6 | 0.00009 | 12 | 0.00004 |
| 1i2mB^7 (24 to 417) | RCC1 | RCC1/BLIP-II | Superfamily (nrd90) 77/0.90/0.90 | 45 | 23 | 1i2mA | 37 | 10 | 0.008 | 0 | 1 | 10 | 0.089 |
| 1i2mA^8 (12 to 170) | RAN | P-loop containing nucleoside triphosphate hydrolases | Superfamily (human) 170/0.90/0.90 | 17 | 7 | 1i2mB | 42 | 6 | 0.096 | 1 | 0.8 | 7 | 0.18 |
| 1rrpB^9 (17 to 150) | NUP358 | PH domain-like | Superfamily (nrd90+swiss) 59/0.90/0.90 | 31 | 3 | 1rrpA | 51 | 16 | 0.075 | 2 | 0.323 | 18 | 0.032 |
| 1rrpA^10 (12 to 170) | RAN | P-loop containing nucleoside triphosphate hydrolases | Superfamily (human) 170/0.90/0.90 | 17 | 7 | 1rrpB, GNP, Mg | 53 | 3 | 0.964 | 6 | 0.0058 | 9 | 0.4 |
| 1blxB^11 (41 to 72) | P19INK4D | Ankyrin repeat | PFAM (human) 1043/0.95/0.95 | 7 | 3 | 1blxA | 11 | 7 | 0 | 0 | 1 | 7 | 0 |
| 1blxB^11 (73 to 105) | P19INK4D | Ankyrin repeat | PFAM (human) 1043/0.95/0.95 | 7 | 3 | 1blxA | 7 | 5 | 0 | 0 | 1 | 5 | 0 |
| 1blxB^11 (106 to 137) | P19INK4D | Ankyrin repeat | PFAM (human) 1043/0.95/0.95 | 7 | 3 | 1blxA | 1 | 1 | 0.21 | 0 | 1 | 1 | 0.3 |
| 1blxA^12 (5 to 309) | CDK6 | Protein kinase-like (PK-like) | Superfamily (human) 81/0.90/0.95 | 31 | 25 | 1blxB | 24 | 4 | 0.19 | 0 | 1 | 4 | 0.19 |
| 2cciA^13 (4 to 286) | CDK2 | Protein kinase-like (PK-like) | Protein Kinase Resource 390 | 20 | 22 | 1h27B1, 1h27B2, TPO | 78 | 13 | 0.0003 | 11 | 0.0173 | 24 | 0 |
| 2cciB1^14 (181 to 307) | Cyclin A | Cyclin-like | Pfam N-cyclin 379/0.95/0.90 | 17 | 16 | 2cciA, 2cciF, TPO | 48 | 12 | 0.00356 | 7 | 0.396 | 19 | 0.0063 |
| 2cciB2^15 (309 to 431) | Cyclin A | Cyclin-like | Pfam C-cyclin 238/95/90 | 14 | 3 | 2cciA, TPO | 4 | 2 | 0.063 | 0 | 1 | 2 | 0.092 |
| 1n7tA^21 (14 to 98) | Erbin PDZ domain | PDZ domain | PFAM (human) 237/0.90/0.90 | 10 | 3 | peptide | 17 | 6 | 0.0036 | 1 | 0.493 | 7 | 0.0032 |
| 1g4dA^16 (13 to 81) | Repressor protein C | Putative DNA-binding domain | Superfamily (nrd90) 244/0.90/0.95 | 12 | 0 | DNA | 25 | 9 | 0.0034 | 0 | n/a | 9 | 0.0034 |
| 1e3oC^17 (104 to 160) | Oct-1 Pou | lambda repressor-like DNA-binding domains | Superfamily (swiss) 397/0.90/0.90 | 4 | 5 | DNA | 17 | 4 | 0.00603 | 3 | 0.151 | 7 | 0.0018 |
| 2up1A^18 (10 to 92) | Hnrnp A1, Up1 | RNA-binding domain (RBD) | Superfamily (swiss) 552/0.90/1.0 | 16 | 2 | DNA | 21 | 10 | 0.001 | 0 | 1 | 10 | 0.00166 |
| 1ec6A^19 (4 to 90) | NOVA-2 | Eukaryotic type KH-domain (KH-domain type I) | Superfamily (nrd90+swiss) 463/0.90/0.80 | 12 | 2 | RNA | 24 | 7 | 0.019 | 2 | 0.074 | 9 | 0.0019 |
| 1serB^20 (501 to 610) | Seryl tRNA synthetase | tRNA-binding arm | Superfamily (swiss) 96/0.90/0.90 | 18 | 8 | tRNA | 19 | 7 | 0.022 | 2 | 0.412 | 9 | 0.0106 |

^a Protein Data Bank (PDB) four-character code followed by the chain identifier.

^b Name of the protein chain in the title of the PDB file.

^c Name of the corresponding Structural Classification of Proteins (SCOP) superfamily.

^d Source of the alignment (Superfamily or Protein Families [PFAM]), the actual number of homologous sequences used in the calculations, and the fractional values of the selection filters used to clean the alignments (sequence identity and gap).

^e S and C are the numbers of specificity and conserved residues, respectively.

^f PDB identifiers of the molecular fragments and co-factors (excluding water) interacting with the corresponding protein.

^g I, S&I, C&I, and (S+C)&I stand, respectively, for the total number of interface residues (selected using an atom-atom distance threshold of ≤4.5 Å between the ligands and the protein), the number of specificity residues in the interface, the number of conserved residues in the interface, and the combined number of specificity and conserved residues in the interface. P_S&I, P_C&I, and P_(S+C)&I are the corresponding probabilities of obtaining these numbers by chance. Low values of the probabilities indicate good agreement between prediction and observation. Significant P values (< 0.05) are in bold.
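
The two quantities defined in the footnote, the interface residue set and the probability of an overlap arising by chance, can be illustrated with a minimal Python sketch. It rests on assumptions that the table does not state: the chance probability is modelled as an upper-tail hypergeometric probability, the total residue count is taken from the residue range given in the PDB column, and the function names `interface_residues` and `overlap_p_value` are hypothetical. It is not necessarily the procedure used to produce the tabulated values.

```python
from math import comb, dist


def interface_residues(protein_atoms, ligand_atoms, cutoff=4.5):
    """Residues with at least one atom within `cutoff` angstroms of a ligand atom.

    protein_atoms: iterable of (residue_id, (x, y, z)) tuples
    ligand_atoms:  iterable of (x, y, z) tuples
    """
    ligand_atoms = list(ligand_atoms)
    return {
        res_id
        for res_id, xyz in protein_atoms
        if any(dist(xyz, lig) <= cutoff for lig in ligand_atoms)
    }


def overlap_p_value(n_total, n_interface, n_predicted, n_overlap):
    """Probability of at least `n_overlap` predicted residues falling in the
    interface if the predicted residues were drawn at random.

    Assumption: a hypergeometric model (sampling without replacement);
    the source only says "probabilities of obtaining these numbers by chance".
    """
    max_overlap = min(n_interface, n_predicted)
    favourable = sum(
        comb(n_interface, k) * comb(n_total - n_interface, n_predicted - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return favourable / comb(n_total, n_predicted)


# Ras row (1wq1R): I = 42, S = 13, S&I = 8; n_total = 166 is an assumption
# taken from the residue range "1 to 166".
p_ras = overlap_p_value(n_total=166, n_interface=42, n_predicted=13, n_overlap=8)
print(f"P_S&I under the hypergeometric assumption: {p_ras:.5f}")
```

The table reports 0.00434 for this row. Whether the sketch reproduces that figure depends on how residues were counted and on which null model was actually used, so any difference should be read as a limitation of the assumptions above rather than as a discrepancy in the table.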