Author manuscript; available in PMC: 2008 Oct 10.

Published in final edited form as: Protein J. 2008 Jan;27(1):59–70. doi:10.1007/s10930-007-9108-x

Characterization of Protein–Protein Interfaces

Changhui Yan¹,✉, Feihong Wu²,³,⁴, Robert L Jernigan²,³,⁵,⁶, Drena Dobbs²,³,⁶,⁷, Vasant Honavar²,³,⁴,⁶

¹ Department of Computer Science, Utah State University, 4205 Old Main Hill, Logan, UT 84341, USA E-mail: cyan@cc.usu.edu

² Bioinformatics and Computational Biology Graduate Program, Iowa State University, Ames, IA 50010, USA

³ Center for Computational Intelligence, Learning, and Discovery, Iowa State University, Ames, IA 50010, USA

⁴ Artificial Intelligence Research Laboratory, Department of Computer Science, Iowa State University, Ames, IA 50010, USA

⁵ Department of Biochemistry, Biophysics, and Molecular Biology, Iowa State University, Ames, IA 50010, USA

⁶ Laurence H Baker Center for Bioinformatics and Biological Statistics, Iowa State University, Ames, IA 50010, USA

⁷ Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50010, USA

✉ Corresponding author.


PMCID: PMC2566606 NIHMSID: NIHMS67490 PMID:17851740

Abstract

We analyze the characteristics of protein–protein interfaces using the largest datasets available from the Protein Data Bank (PDB). We start with a comparison of interfaces with protein cores and non-interface surfaces. The results show that interfaces differ from protein cores and non-interface surfaces in residue composition, sequence entropy, and secondary structure. Since interfaces, protein cores, and non-interface surfaces have different solvent accessibilities, it is important to investigate whether the observed differences are due to the differences in solvent accessibility or differences in functionality. We separate out the effect of solvent accessibility by comparing interfaces with a set of residues having the same solvent accessibility as the interfaces. This strategy reveals residue distribution propensities that are not observable by comparing interfaces with protein cores and non-interface surfaces. Our conclusions are that there are larger numbers of hydrophobic residues, particularly aromatic residues, in interfaces, and the interactions apparently favored in interfaces include the opposite charge pairs and hydrophobic pairs. Surprisingly, Pro-Trp pairs are overrepresented in interfaces, presumably because of favorable geometries. The analysis is repeated using three datasets having different constraints on sequence similarity and structure quality. Consistent results are obtained across these datasets. We have also investigated separately the characteristics of heteromeric interfaces and homomeric interfaces.

Keywords: Heteromeric interfaces, Homomeric interfaces, Residue composition, Interface propensities, Contact preferences

1 Introduction

Protein–protein interactions play crucial roles in many biological functions. Elucidating the mechanisms of the interactions presents a challenge in molecular biology. One general approach to study the interaction between two proteins is to obtain a crystal structure of the protein–protein complex and then investigate the atomic properties of the protein–protein interface. Many studies have analyzed the characteristics of protein–protein interfaces in an effort to search for the factors that contribute to the affinity and specificity of protein–protein interactions [1–5]. These analyses show that the two surfaces of a protein–protein interface usually show a high degree of geometric and chemical complementarity. Electrostatic forces are also believed to play an important role in protein–protein interactions [6–8]. Several studies have shown that interfaces are biased in residue composition and inter-residue contacts [9,10]. Miyazawa and Jernigan [11] developed a method to extract inter-residue potentials from frequencies of contacts between different residues in proteins. Later, Keskin et al. [12] showed that the potentials of mean force for inter-residue interactions hold for both intra-molecular and inter-molecular interactions. The important role of hydrophobic forces in protein–protein interactions has been confirmed by several researchers [13,14]. However, a recent study [15] argues that it is the hydrophilic rather than the hydrophobic effect that makes the major contribution to protein–protein association. Another well-characterized property of interfaces is the existence of “hot-spot” residues, which are residues that make the largest contributions to complex formation [16].

Some studies divided the protein–protein interfaces into several subtypes and analyzed the characteristics of each subtype. Jones and Thornton [17] proposed a distinction between obligatory interactions and transient interactions. Using machine-learning methods, Block et al. [18] were able to extract physicochemical properties that are predictive of obligatory and transient interactions. Ofran and Rost [10] divided protein–protein interfaces into six types: intra-domain, domain–domain, homo-obligomer, hetero-obligomer, homo-complex, and hetero-complex. Chakrabarti and Janin [19] dissected the interfaces into a core and a rim based on solvent accessibility. Cho et al. [20] show that different functional types of protein–protein interactions have different molecular interactions specific to them.

We extracted all protein–protein interfaces from the Protein Data Bank (PDB) [21] and obtained three datasets that are much larger than any other dataset used in previous studies. Each protein was divided into three disjoint groups: interface, protein core, and non-interface surface. Comparisons show that the three groups are significantly different in residue composition, sequence entropy, and secondary structure. Since interfaces, protein cores, and non-interface surfaces have different solvent accessibilities, it is not known whether these differences are due to the differences in solvent accessibility or differences in functionality. To exclude the effect of solvent accessibility, we compared the interfaces with a set of residues that was randomly chosen from the overall residues and had the same solvent accessibility as the interfaces. The results show a clear trend that hydrophobic residues and aromatic residues are more frequent in the interfaces and hydrophilic residues are less common. Note that this trend cannot be found by comparing interfaces with protein cores and non-interface surfaces. We repeated the analysis using the three datasets and obtained consistent results. We divided the interfaces into heteromeric interfaces and homomeric interfaces based on the similarities of the interacting chains. Comparisons show significant differences between the two types of interfaces in residue composition, sequence entropy, secondary structure, size, and contact preferences.

2 Materials and Methods

2.1 Selecting Structures for Dataset100, Dataset30, and Dataset30_3

All protein complexes in the PDB with at least two protein chains having at least 50 amino acids in each chain were obtained. We tried different thresholds of minimum length ranging from 20 to 100 amino acids. No obvious differences in interface characteristics were observed. In order to eliminate crystal packing, PDB complexes were split into individual quaternary structures based on the Protein Quaternary Structure (PQS) database [22]. In the construction of the PQS database, a procedure was used to discriminate between crystal packing and biological interfaces based on the buried area, the number of buried residues, the delta-solvation energy of folding, the number of salt bridges at the interface, and the presence of disulphide bridges. Then, within each quaternary structure, a pair of protein chains is considered as interacting if the buried area on one chain is at least 200 Å². The same threshold of buried area was used in the SPIN-PP database (http://honiglab.cpmc.columbia.edu/SPIN/intro.html). A minimum buried area of 400 Å² on one chain has been used in some studies to define biological interfaces (reviewed in [3]). In this study, we also tried a minimum buried area of 400 Å². The only difference observed was in the distribution of interface sizes: fewer interfaces were small, because some small interfaces had been removed. No obvious differences in other properties were observed. The buried area was computed using NACCESS [23,24]. A dataset of interfaces was thus obtained from the set of quaternary structures. Then, sequence similarity information was obtained from the sequence clusters provided by the PDB (ftp://ftp.rcsb.org/pub/pdb/derived_data/NR/). The similarity between two interfaces is defined as the highest sequence similarity between the protein chains of the interfaces. First, redundant data were removed so that there were no identical interfaces in the dataset. The resulting dataset consists of 6,545 interfaces. This dataset is referred to as Dataset100, with 100 indicating that the similarity between any two pairs is below 100%. Interfaces with high similarity were removed from Dataset100 so that the similarity between any two interfaces is below 30%. The resulting dataset (referred to as Dataset30) has 2,557 pairs of interacting chains. Then, all the structures having resolution >3 Å were removed from Dataset30. The resulting dataset (referred to as Dataset30_3) consists of 2,310 pairs of interacting chains.
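The interaction criterion can be illustrated with a minimal sketch. It is not the pipeline used in this study: the function names and the input values are hypothetical, and it assumes that the buried area of a chain is its solvent-accessible surface area (ASA) in isolation minus its ASA within the complex, and that a pair qualifies when either chain meets the threshold.

```python
def buried_area(asa_isolated: float, asa_in_complex: float) -> float:
    """Area (in Å²) of a chain that becomes inaccessible on complex formation."""
    return asa_isolated - asa_in_complex


def is_interacting(chain_a, chain_b, min_buried=200.0):
    """Each chain is an (asa_isolated, asa_in_complex) tuple in Å²."""
    # Assumption: the pair qualifies if either chain buries enough area.
    return (buried_area(*chain_a) >= min_buried
            or buried_area(*chain_b) >= min_buried)


# Hypothetical ASA values, for illustration only.
pair_small = ((5000.0, 4900.0), (6200.0, 6090.0))  # buries 100 and 110 Å²
pair_large = ((5000.0, 4350.0), (6200.0, 5580.0))  # buries 650 and 620 Å²
```

Passing `min_buried=400.0` corresponds to the stricter threshold that was also tried.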

2.2 Protein Cores, Interfaces, and Non-interface Surfaces

We defined residue contacts as described in Ofran and Rost [10]: two residues are in contact if the distance between them is less than 6 Å. Interface residues of a protein are the residues that contact residues from the interacting proteins. Protein core residues are the non-interface residues whose relative solvent accessibility (rASA) is less than 25%. Non-interface surface residues are the non-interface residues whose rASA is at least 25%. The rASA of residues was calculated using the NACCESS program [23,24]. As in other studies, interface residues are defined based on the known interaction surfaces in PDB complexes. Some of the non-interface residues obtained may act as interface residues in other, as yet unknown, interactions. Evaluating the effect of this on the results would require complete knowledge of the interaction sites on proteins. Unfortunately, the data available today are far from complete.
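The three-way partition described above can be written as a small decision rule. This is a sketch only; the function name is hypothetical, and the rASA value is assumed to be given in percent.

```python
def classify_residue(in_interface: bool, rasa: float) -> str:
    """Assign a residue to one of the three disjoint groups.

    rasa is the relative solvent accessibility in percent.
    """
    if in_interface:
        # Interface membership takes precedence over accessibility.
        return "interface"
    return "core" if rasa < 25.0 else "surface"
```

Because interface membership is tested first, interface residues can have any rASA, whereas core and surface residues are separated at 25%.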

2.3 Heteromeric Interfaces and Homomeric Interfaces

An interface is a homomeric interface if the two interacting chains have a sequence identity greater than 95%; otherwise, it is a heteromeric interface. We used Dataset100 to compare the properties of heteromeric interfaces and homomeric interfaces. Dataset100 contains 3,990 homomeric interfaces and 2,555 heteromeric interfaces.

2.4 Interface Propensity (Raw Interface Propensity, RIP)

Let F_i be the number of residues of type i in the dataset and f_i be the number of residues of type i in the interfaces. Define w_i = f_i/Σ_m f_m and W_i = F_i/Σ_m F_m. The interface propensity of residue type i is given by log₂ (w_i/W_i). A residue’s propensities for protein cores and non-interface surfaces are computed with w_i replaced by the fractions of residue type i in the protein cores and non-interface surfaces, respectively.
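As a minimal sketch of this definition, the propensity can be computed from two lists of one-letter residue codes. The function names and the toy data are hypothetical, and the interface residues are assumed to be a subset of the overall residues, so every interface residue type also occurs in the background.

```python
import math
from collections import Counter


def composition(residues):
    """Fraction of each residue type in a sequence of one-letter codes."""
    counts = Counter(residues)
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def raw_interface_propensity(interface_residues, all_residues):
    """log2(w_i / W_i) for every residue type seen in the interfaces."""
    w = composition(interface_residues)
    W = composition(all_residues)
    return {aa: math.log2(w[aa] / W[aa]) for aa in w}


# Toy data (hypothetical): 8 residues overall, 4 of them in the interface.
all_residues = list("LLLLKKKK")
interface_residues = list("LLLK")
rip = raw_interface_propensity(interface_residues, all_residues)
```

In the toy data, Leu makes up 75% of the interface but 50% of all residues, so its propensity is positive; Lys is depleted, so its propensity is negative.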

2.5 Normalized Interface Propensity (NIP)

Residues are randomly extracted from the overall residues so that the resulting set has the same relative solvent accessibility (rASA) distribution as the interface residues. The resulting set will be referred to as SetrASA, with rASA denoting that the dataset has the same rASA distribution as the interfaces. Let s_i be the number of residues of type i in the SetrASA, and S_i = s_i/Σ_m s_m. The normalized interface propensity of residue type i is given by log₂ (w_i/S_i), where w_i is defined as above.
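One way to build such an accessibility-matched set is stratified sampling over rASA bins. The sketch below is an assumption for illustration, since the sampling procedure is not specified in detail here: the 10% bin width, the function names, and the toy data are all hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def rasa_bin(rasa, width=10.0):
    """Index of the rASA bin; values of 100% or more go in the last bin."""
    return min(int(rasa // width), int(100 // width) - 1)


def sample_set_rasa(all_residues, interface_residues, width=10.0, seed=0):
    """Draw residues so that the sample's binned rASA distribution matches
    that of the interface. Residues are (amino_acid, rasa) tuples."""
    rng = random.Random(seed)
    pool = defaultdict(list)
    for res in all_residues:
        pool[rasa_bin(res[1], width)].append(res)
    target = Counter(rasa_bin(r, width) for _, r in interface_residues)
    n_iface = len(interface_residues)
    # Largest sample size for which every bin can still be filled.
    size = min(len(pool[b]) * n_iface / target[b] for b in target)
    sample = []
    for b, count in target.items():
        k = min(int(size * count / n_iface), len(pool[b]))
        sample.extend(rng.sample(pool[b], k))
    return sample


def composition(residues):
    """Fraction of each residue type in a list of one-letter codes."""
    counts = Counter(residues)
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def normalized_interface_propensity(interface_residues, set_rasa):
    """log2(w_i / S_i) for residue types present in both sets."""
    w = composition([aa for aa, _ in interface_residues])
    S = composition([aa for aa, _ in set_rasa])
    return {aa: math.log2(w[aa] / S[aa]) for aa in w if aa in S}


# Toy data (hypothetical): buried positions are mostly Leu, exposed mostly Lys.
all_residues = ([("L", 5.0)] * 80 + [("K", 5.0)] * 20
                + [("L", 55.0)] * 20 + [("K", 55.0)] * 80)
interface_residues = [("L", 5.0)] * 10 + [("L", 55.0)] * 15 + [("K", 55.0)] * 15
set_rasa = sample_set_rasa(all_residues, interface_residues)
nip = normalized_interface_propensity(interface_residues, set_rasa)
```

In the toy data, three quarters of the interface residues are exposed, so the sampled set is also about three quarters exposed. Against that background the interface is enriched in Leu and depleted in Lys.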

2.6 Contact Preferences

Let C_ij be the number of interface-crossing contacts formed by residues of types i and j. The raw contact frequency between residues of types i and j is given by (C_ij/Σ_{m,n} C_mn). The contact preference between residues of types i and j is given by log₂ ((C_ij/Σ_{m,n} C_mn)/(w_i × w_j)), where w_i and w_j are defined as above. Note that the contact preference is the logarithm of the raw contact frequency divided by the product of the frequencies of residue types i and j.
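The contact preference can be sketched as follows. The function name and toy data are hypothetical. The sketch also assumes that each contact is counted in both orders, so that the matrix is symmetric and a pairing that is random with respect to residue type gives a preference of zero; the text does not state how ordering is handled.

```python
import math
from collections import Counter


def contact_preferences(contacts, interface_residues):
    """contacts: (type_i, type_j) pairs, one per interface-crossing contact.
    interface_residues: residue types of all interface residues."""
    counts = Counter()
    for a, b in contacts:
        # Assumption: count each contact in both orders (symmetric matrix).
        counts[(a, b)] += 1
        counts[(b, a)] += 1
    total = sum(counts.values())
    res_counts = Counter(interface_residues)
    n_res = sum(res_counts.values())
    w = {aa: n / n_res for aa, n in res_counts.items()}
    return {
        (i, j): math.log2((c / total) / (w[i] * w[j]))
        for (i, j), c in counts.items()
    }


# Toy data (hypothetical): two Arg and two Asp residues, four contacts.
interface_residues = ["R", "R", "D", "D"]
contacts = [("R", "D"), ("R", "D"), ("R", "D"), ("D", "D")]
prefs = contact_preferences(contacts, interface_residues)
```

Pairs that are never observed are omitted rather than assigned negative infinity; a real analysis would need an explicit policy for them.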

3 Results

3.1 Characteristics of Interfaces

Each protein is divided into three disjoint groups: protein core, interface, and non-interface surface. Interface properties including residue composition, secondary structure, sequence entropy, contact preferences, and size are analyzed using Dataset100.

3.1.1 Residue Composition

Figure 1A compares the residue compositions of protein cores, interfaces, and non-interface surfaces. Residues are placed in the order of increasing hydrophobicity based on the Kyte and Doolittle hydropathy index [25]. The comparisons show that among the three groups, protein cores have the highest fractions of hydrophobic residues (e.g., Met, Cys, Phe, Ile, Leu, and Val) and non-interface surfaces have the lowest. This indicates that hydrophobic residues are preferred in protein cores and disfavored on non-interface surfaces. The opposite trend is observed for hydrophilic residues (e.g., Arg, Lys, Glu, and Asp). Figure 1B shows that all residue types have opposite propensities for protein cores and non-interface surfaces and that, with His, Tyr, and Gly being notable exceptions, the propensities for interfaces are intermediate between those for protein cores and non-interface surfaces.

Fig. 1 — Residue composition and residue propensities for different locations. (A) Residue compositions of protein cores, interfaces, and non-interface surfaces. (B) Residue propensities for protein cores, interfaces, and non-interface surfaces. Residues are ordered by their increasing hydrophobicity based on the Kyte and Doolittle hydropathy index [25]. The results are shown for Dataset100. The figures show that hydrophobic residues are more frequent in protein cores and less common on non-interface surfaces. The opposite trend is observed for hydrophilic residues. Residue propensities for interfaces are intermediate between those for protein cores and non-interface surfaces, with His, Tyr, and Gly being notable exceptions

3.1.2 Sequence Entropy

Sequence entropy values for residues are extracted from the HSSP database (http://www.cmbi.kun.nl/gv/hssp/). The sequence entropy shows the conservation at each residue position. It is normalized over the range of 0–100, with the lowest sequence entropy values corresponding to the most conserved positions. Figure 2 compares the sequence entropy distributions of protein cores, interfaces, and non-interface surfaces. The comparisons show that among the three groups, protein cores have the highest fraction of residues in the low sequence entropy region (sequence entropy <40), and non-interface surfaces have the lowest. In the high sequence entropy region (sequence entropy ≥40), the opposite trend is observed. Let A » B denote that A is more conserved than B. The results indicate that the trend of conservation is protein core residues » interface residues » non-interface surface residues. In a study based on a small set of transient protein–protein complexes, Nooren et al. [26] showed that interface residues are more conserved than surface residues. Here, consistent results are obtained for a larger dataset.

Fig. 2 — Sequence entropies in protein cores, interfaces, and non-interface surfaces. Sequence entropy values for residues are extracted from the HSSP database (http://www.cmbi.kun.nl/gv/hssp/). The sequence entropy shows the conservation at each residue position in a multiple alignment. The values have been normalized over the range of 0–100, with the lowest sequence entropy values corresponding to the most conserved positions. The results are for Dataset100. The figure shows that among the three groups, protein cores have the highest fraction of residues with high conservation (lower entropy values), non-interface surfaces have the smallest, and interfaces are intermediate
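For readers unfamiliar with the measure, the sketch below computes a normalized sequence entropy for one column of a multiple alignment. It is an illustration rather than the HSSP implementation: the function name is hypothetical, gap characters are ignored, and the 0–100 scaling is assumed to divide the Shannon entropy by ln(20), its maximum for 20 residue types.

```python
import math
from collections import Counter


def normalized_entropy(column: str) -> float:
    """Shannon entropy of one alignment column, scaled to 0-100.

    Assumption: scaling divides by ln(20), the maximum for 20 residue types.
    Gap characters ('-' and '.') are ignored.
    """
    residues = [c for c in column if c not in "-."]
    counts = Counter(residues)
    n = len(residues)
    entropy = -sum((k / n) * math.log(k / n) for k in counts.values())
    return 100.0 * entropy / math.log(20)
```

A fully conserved column scores 0, and a column in which all 20 residue types are equally frequent scores 100, so the threshold of 40 used above separates relatively conserved positions from variable ones.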

3.1.3 Secondary Structure

We consider eight classes of secondary structure as defined by the DSSP program [27]. Figure 3 compares the secondary structure composition of protein cores, interfaces, and non-interface surfaces. The comparisons show that among the three groups, non-interface surfaces have the highest fraction of residues in S (Bend) and T (Turn), the protein cores have the smallest, and interfaces are intermediate. The opposite trend is observed for the class E (Extended strand). No obvious location preferences are observed for the other classes of secondary structure.

Fig. 3 — Secondary structure compositions of protein cores, interfaces, and non-interface surfaces. Secondary structures of proteins are defined using the DSSP program [27]: 3₁₀-helix (G), alpha helix (H), pi helix (I), helix-turn (T), extended beta sheet (E), beta bridge (B), bend (S), and other/loop (_). Each protein is divided into interface, protein core, and non-interface surface based on solvent accessibility and whether a residue is in the interface as described in Sect. 2. The results were obtained using Dataset100

3.1.4 Contact Preferences

Figure 4A shows the contact frequencies across the interfaces given by (C_ij/Σ_{m,n} C_mn), where C_ij is the number of contacts between residues of types i and j. Figure 4B shows the contact preferences given by log₂ ((C_ij/Σ_{m,n} C_mn)/(w_i × w_j)), where w_i and w_j are the frequencies of residue types i and j. In Fig. 4B, positive preferences are shown in red, negative in blue, and neutral in green. Residues are placed in order by increasing hydrophobicity. Comparison of Fig. 4A and B shows that normalizing the raw contact frequencies by the frequencies of individual residue types makes the high preferences for hydrophobic contacts, aromatic contacts, and the contacts between oppositely charged residues stand out clearly (red in Fig. 4B). Figure 4B shows that the contacts between hydrophobic residues are preferred in interfaces. These highly preferred contacts correspond to the red region in the lower-right corner of Fig. 4B. The fact that Cys–Cys contacts have one of the highest preferences indicates the important role that this type of contact has in protein–protein interactions. The contacts between residues with opposite charges (Arg–Asp, Arg–Glu, Lys–Asp, and Lys–Glu) are also preferred in interfaces. These contacts form several red entries near the upper-left corner in Fig. 4B. These results are consistent with the previous claim that disulfide bonds, salt bridges, and hydrophobic interactions represent the main forces in protein–protein interactions [6,9,10,28]. The face-to-face arrangement of two aromatic rings was reported to be favorable for interactions [9]. Here, high preferences for the contacts between different aromatic residues are observed. The interaction between a proline ring and an aromatic ring resembles the interaction between two aromatic rings [9], and this can be seen in the higher preference for the Pro–Trp (P-W) pair. Keskin et al. [12] investigated the residue contacts at protein–protein interfaces using “solvent-mediated” potentials and “residue-mediated” potentials. The abundance of the Cys–Cys contact, hydrophobic contacts, and aromatic contacts in interfaces observed in this study is consistent with the low values of the residue-mediated potentials for these contacts reported by Keskin et al. [12].

Fig. 4 — Residue contact preferences for interfaces. (A) Raw contact frequencies given by (C_ij/Σ_{m,n} C_mn), where C_ij is the number of contacts between residue types i and j. (B) Contact preferences given by log₂ ((C_ij/Σ_{m,n} C_mn)/(w_i × w_j)). The results are given for Dataset100. Residues are placed in order by their increasing hydrophobicity based on the Kyte and Doolittle hydropathy index [25]. Figure B shows that Cys–Cys contacts, the contacts between residues with opposite charges, the contacts between different aromatic residues, and those between hydrophobic residues are preferred in interfaces. These contacts are shown in red in Figure B. Comparison between A and B shows that normalizing raw contact frequencies by the frequencies of individual residue types makes the preferences for these contacts stand out more clearly

3.1.5 Interface Size

Interface size is calculated separately for each side of an interface. Figure 5 shows that interface sizes span a broad range and that the distribution has a peak in the range of 600–800 Å². The average interface size is 1,227 Å². Fourteen percent of the interfaces in the dataset have a size in the range of 600–800 Å². In a study based on a set of 75 hetero-complexes, Lo Conte et al. [29] found that most interfaces have a total buried area (that is, the sum of the buried areas from both sides of the interface) in the range of 1,600 (±400) Å², which is roughly equivalent to 800 (±200) Å² for each side of the interface. Here, about 25% of the interfaces have a (one-sided) size in the range of 800 (±200) Å².

Fig. 5 — Interface size distribution. Interface size is calculated separately for each side of an interface. The results are obtained for Dataset100. The distribution has a peak at 600–800 Å². About 25% of the interfaces have a (one-sided) size in the range of 800 (±200) Å²

3.2 Are the Differences in Residue Composition, Conservation, and Secondary Structure Due to the Difference in Solvent Accessibility or the Difference in Functionality?

By our definition, protein core residues have a relative solvent accessibility (rASA) below 25%, non-interface surface residues have an rASA equal to or greater than 25%, and interface residues have an rASA ranging from 0% to 100%. The results above show differences in residue composition, conservation, and secondary structure among protein cores, interfaces, and non-interface surfaces. However, since these three groups have different accessibilities, it is unknown whether these differences are due to the differences in solvent accessibility or to other reasons. To separate out the effect of solvent accessibility, we randomly extract residues from the overall residues so that the resulting residue set has the same rASA distribution as the interfaces. The resulting dataset will be referred to as SetrASA, with rASA denoting that the dataset has the same rASA distribution as the interfaces. We then compare the interfaces with the SetrASA. Five different SetrASAs were independently extracted from Dataset100. The size of each SetrASA is about 60% of that of the overall residues.

3.2.1 Residue Composition and Interface Propensity

Figure 6 compares the residue compositions of the SetrASAs and the interfaces. The comparisons show that the interfaces have more aromatic residues (Tyr, Trp, and Phe) and hydrophobic residues (Cys, Met, Ile, Leu, and Val) than do the SetrASAs. Residues with intermediate hydrophobicity (Ser, Thr, Gly, and Ala) are underrepresented in the interfaces. All charged residues, except Arg, are underrepresented in the interfaces.

Fig. 6 — Comparison of the residue compositions of the SetrASAs and of the interfaces. Five SetrASAs are extracted from Dataset100. Mean values for the SetrASAs are displayed. The standard deviations are below 0.05 (they are shown as bars in the figure but are too small to be visible). The residue types are placed in order by their increasing hydrophobicities

We calculate the interface propensities (normalized interface propensities, NIP) of residues by comparing the residue composition of the interfaces with that of the SetrASA, that is, propensity (i) = log₂ (w_i/S_i), where S_i is the fraction of residue i in the SetrASAs and w_i is the fraction of residue i in the interfaces. We name this propensity the normalized interface propensity (NIP), since the SetrASA can be considered as a version of the overall residues that is normalized according to the rASA distribution of the interfaces. The results are shown in Fig. 7 with residue types placed in order by their increasing hydrophobicity. Figure 7 shows that, according to NIP, interfaces have high preferences for hydrophobic residues, whereas hydrophilic residues are not preferred at interfaces. On the right side (the hydrophobic end) of Fig. 7, residues have high propensities for interfaces and Cys has the highest propensity overall. On the left side (the hydrophilic end), residues (except Arg and His) have negative propensities. This indicates that the interfaces are more hydrophobic than expected based on their exposure. Figure 7 also shows that aromatic residues have high propensities for interfaces.

Fig. 7 — Normalized interface propensities (NIP) of residues. The propensities are calculated by comparing interfaces with the sets (SetrASA) of residues that have the same relative solvent accessibility as the interfaces. Five SetrASAs were extracted, and mean values are displayed. The standard deviations are below 0.02 (they are shown as bars in the figure, but most of them are too small to be visible). The results show the clear trend that hydrophobic residues are preferred in interfaces and hydrophilic residues are not. Aromatic residues also have high NIP. The results are obtained using Dataset100

We compare NIP with the interface propensities (raw interface propensities, RIP) that are calculated by comparing the interfaces with all residues, which is given by log₂ (w_i/W_i), where W_i is the fraction of residue i overall, and w_i is the fraction of residue i in the interfaces. Figure 8 shows that NIP reveals the trend that hydrophobic residues are preferred in interfaces and hydrophilic residues are unfavorable in interfaces, whereas this trend is not revealed by RIP. Many residues have opposite signs in RIP and NIP. Striking differences are seen for hydrophobic and polar residues. Ile, Val, Leu, and Met have high positive NIP but negative RIP values. Asn, Asp, Gln, and Glu have negative or neutral NIP, while the corresponding values of RIP are positive or neutral. Cys and aromatic residues (Tyr, Trp, and Phe) have high positive NIP but only weak positive RIP. The difference in the definitions of RIP and NIP is that in NIP interfaces are compared with a set of residues that have the same rASA distribution as the interfaces, while in RIP interfaces are compared with the overall residues whose solvent accessibility is different from that of the interfaces. The differences between the values of RIP and NIP indicate that solvent accessibility affects the distribution of residues. Therefore, it is crucial to account for the effect of solvent accessibility when searching for the features that can distinguish interfaces from the rest of the protein.
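The relationship between the two measures follows directly from their definitions. Writing NIP_i and RIP_i for the two propensities of residue type i (a shorthand introduced here for illustration), the interface fraction w_i cancels in their difference:

```latex
\begin{aligned}
\mathrm{NIP}_i - \mathrm{RIP}_i
  &= \log_2\frac{w_i}{S_i} - \log_2\frac{w_i}{W_i} \\
  &= \log_2\frac{W_i}{S_i}
\end{aligned}
```

The two measures therefore differ for a residue type exactly to the extent that its overall fraction W_i differs from its fraction S_i in the SetrASA, that is, to the extent that solvent accessibility alone shifts its background frequency.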

Fig. 8 — Comparison of normalized interface propensities (NIP) and raw interface propensities (RIP). NIP are calculated by comparing interfaces with the set of residues (SetrASA) that has the same relative solvent accessibility as the interfaces. Five SetrASAs are extracted, and their mean values are displayed. The standard deviations are below 0.02 (they are shown as bars in the figure, but can barely be seen). RIP are calculated by comparing interfaces with all residues. While NIP reveals the trend that hydrophobic residues are preferred in interfaces and hydrophilic residues are unfavorable in interfaces, this trend is not seen in the RIP. Many residues have opposite signs in RIP and NIP. The results were obtained for Dataset100

Previous studies have drawn contradictory conclusions on interface propensities. For example, some studies showed that Ile, Val, and Leu have high positive propensities for interfaces [17,29,30], while the study of Ofran and Rost [10] showed that these residues have negative or weak positive propensities for the inter-protein interfaces. Our results show that these three residues have high positive propensities when evaluated using NIP and negative propensities when evaluated using RIP. In Ofran and Rost’s study, interface propensities were calculated using SWISS-PROT as background, so the results are similar to those based on RIP in this study, which is calculated using overall residues as background. In the studies by Jones and Thornton [31], Lo Conte et al. [29], and Bahadur et al. [32], interface propensities were calculated based on the accessible surface area of residues, and the results are similar to those obtained here using NIP.

3.2.2 Sequence Entropies

Sequence entropies of the SetrASA and the interfaces are compared in Fig. 9. The results show that interfaces have more residues with low sequence entropies (conserved). This indicates that interfaces are more conserved than the SetrASA. The results above (see Fig. 2) showed that protein cores are more conserved than interfaces, which in turn are more conserved than non-interface surfaces. Here, Fig. 9 shows that interfaces are more conserved than expected from their exposure.

3.2.3 Secondary Structures

A comparison of the secondary structure composition of the SetrASAs with that of interfaces is shown in Fig. 10. Compared with the SetrASAs, the interfaces have slightly more residues in E (Extended strand) and H (α helix) and fewer residues in S (Bend) and T (Turn). Despite this, there are no significant differences between the interfaces and the SetrASAs in terms of secondary structure composition. Although the results in a previous section (shown in Fig. 3) show some differences in secondary structure composition among protein cores, interfaces, and non-interface surfaces, here, Fig. 10 shows that interfaces do not differ much from the general situation in proteins in their secondary structure composition, after correcting for the effect of solvent accessibility. This suggests that the differences in secondary structure composition among protein cores, interfaces, and non-interface surfaces are mostly due to the differences in accessibility among the three groups rather than to different functions. Raih et al. [33] investigated the interface propensities for secondary structure types by comparing interfaces with surfaces. Their results show that _ (Loop) and S (Bend) are more frequent at interfaces. This observation may be directly attributable to the differences in the accessibilities of interfaces and surfaces.

In summary, to exclude the effect of solvent accessibility, we have compared the interfaces with residue sets (SetrASA) having the same relative solvent accessibility distribution as the interfaces. The results show that hydrophobic residues and aromatic residues have high propensities for interfaces; hydrophilic residues (except Arg and His) have negative propensities; and interfaces are more conserved than the remainder of the protein.

3.3 Are the Results Consistent Across Different Datasets?

So far, the results we have reported are obtained using Dataset100. In order to evaluate whether the results are consistent across different datasets, we analyze interface properties on three datasets with different constraints on sequence similarity and structure quality: Dataset100, Dataset30, and Dataset30_3. Figure 11 shows that the results obtained using the three datasets are consistent.

Fig. 11 — The results obtained for three different datasets are consistent. (A–C) Residue composition. (D–F) Sequence entropy distribution. (G–I) Secondary structure composition. (J–L) Interface sizes. (M–O) Raw contact frequencies given by (C_ij/Σ_{m,n} C_mn), where C_ij is the number of contacts between residue types i and j. (P–R) Contact preferences given by log₂ ((C_ij/Σ_{m,n} C_mn)/(w_i × w_j)), where w_i is the frequency of residue type i in the interfaces. A, D, G, J, M, and P are the results on Dataset100, which consists of 6,545 interfaces. B, E, H, K, N, and Q are the results on Dataset30, which consists of 2,557 interfaces. The mutual similarities among the interfaces are below 30%. C, F, I, L, O, and R are the results for Dataset30_3, which consists of 2,310 interfaces from structures having resolution better than 3.0 Å. The mutual similarities among the interfaces are below 30%

3.4 Homomeric Interfaces Compared with Heteromeric Interfaces

Some studies have shown that different types of interfaces have different characters [17,30]. We divide Dataset100 into heteromeric interfaces and homomeric interfaces based on the sequence identity between the interacting chains and compare the characteristics of the two types of interfaces (Fig. 12). Figure 12A shows the normalized interface propensities of residues. The results show that hydrophobic residues (Ile, Val, Leu, Phe, Cys, and Met) have high positive propensities for both homomeric interfaces and heteromeric interfaces and hydrophilic residues (Lys, Asn, Asp, Gln, and Glu) have negative propensities. This suggests that both types of interfaces are more hydrophobic than the rest of the protein. Figure 12A also shows that Cys and aromatic residues (Phe, Trp, and Tyr) have higher propensities for heteromeric interfaces than for homomeric interfaces. Hydrophobic residues (Ile, Val, Leu, and Met) have higher propensities for homomeric interfaces than for heteromeric interfaces and the opposite is observed for charged residues (except Arg). This indicates that homomeric interfaces are more hydrophobic than heteromeric interfaces. This result is consistent with the results of previous studies [17,30]. Figure 12B shows that heteromeric interfaces have more residues with low entropies (conserved) than homomeric interfaces, suggesting that heteromeric interfaces are more conserved than homomeric interfaces. This may be related to the fact that a heteromeric interface involves two different proteins, so a mutation in one protein requires a complementary mutation in the interacting protein to restore the interaction, whereas a homomeric interface involves two identical chains, so one mutation affects both sides of the interface. Thus, mutations are less tolerable at heteromeric interfaces. Comparison of secondary structure composition (Fig. 12C) shows that heteromeric interfaces have more loops (_) and extended strands (E) and fewer α-helices (H) than homomeric interfaces. Figure 12D shows the distributions of interface sizes for heteromeric interfaces and homomeric interfaces. Both types of interfaces have a peak value in the range 600–800 Å². However, the homomeric interfaces are larger than the heteromeric interfaces: 63% of the homomeric interfaces are larger than 800 Å², while only 53% of the heteromeric interfaces are larger than 800 Å². The average size of the homomeric interfaces is 1,311 Å², and the average size of the heteromeric interfaces is 1,112 Å². This result is consistent with the conclusion of a previous study that homomeric interfaces are larger than heteromeric interfaces [30]. Figure 12G–H show that the contacts between residues with opposite charges (Arg–Asp, Arg–Glu, Lys–Asp, and Lys–Glu) and the contacts between hydrophobic residues (the red regions at the lower-right corners of Fig. 12G–H) are preferred in both types of interfaces. Compared with homomeric interfaces, heteromeric interfaces have relatively more contacts involving Cys or aromatic residues (Phe, Tyr, and Trp). The columns and rows in Fig. 12G for these residues are more frequent (red) than the corresponding entries in Fig. 12H.

Fig. 12 — Comparisons between homomeric interfaces and heteromeric interfaces. (A) Normalized interface propensities. (B) Sequence entropies. (C) Secondary structures. (D) Interface sizes. (**E–F**) Raw contact frequencies given by $C_{ij}/\sum_{m,n} C_{mn}$, where $C_{ij}$ is the number of contacts between residue types $i$ and $j$. (**G–H**) Contact preferences given by $\log_2\left(\frac{C_{ij}/\sum_{m,n} C_{mn}}{w_i \times w_j}\right)$. The results are obtained from Dataset100. Heteromeric interfaces and homomeric interfaces have been extracted from Dataset100 based on the sequence similarities between the interacting protein chains. An interface is a homomeric interface if the two interacting chains have a sequence identity greater than 95%. Otherwise, it is considered a heteromeric interface
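The 95% identity rule stated in the caption can be written down as a short sketch. Only the threshold comes from the text. How sequence identity was computed is not specified here, so the `percent_identity` helper, which counts matches over the ungapped columns of an already aligned pair of sequences, is an assumption, as are both function names.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither chain has a gap.

    Assumes the two chains are already aligned and of equal length; this
    identity definition is an illustrative choice, not taken from the paper.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def classify_interface(identity_percent, threshold=95.0):
    """Label an interface as homomeric when the chain identity exceeds 95%."""
    return "homomeric" if identity_percent > threshold else "heteromeric"
```

With this rule, two chains at exactly 95% identity are labeled heteromeric, because the caption requires an identity greater than 95% for a homomeric interface.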

4 Discussion of Results

In this study, we compare various properties of protein cores, interfaces and non-interface surfaces, analyze interface properties by separating out the effect of solvent accessibility, and investigate the differences between homomeric interfaces and heteromeric interfaces.

Compared with previous studies, the significant aspects of this study include: (1) use of large datasets of protein–protein interfaces; (2) confirmation of the results using three datasets with different constraints on sequence similarity and structure resolution; and (3) separating out the effect of solvent accessibility in analyzing the characteristics of protein–protein interfaces.

We found that solvent accessibility affects the distribution of residues, and it is crucial to account for this effect when searching for the features that can distinguish interfaces from the rest of the protein. Generally, hydrophilic residues are more frequent in the portions of proteins that are highly solvent accessible, and hydrophobic residues are more frequent in the buried portions. Because protein core residues have lower solvent accessibility than interface residues, and non-interface surface residues have higher solvent accessibility than interface residues, the residue distributions among these groups are affected not only by the different functions of these groups but also by the difference in their solvent accessibilities. To evaluate whether residues have special preferences for the interfaces because of function, one must separate out the effect of solvent accessibility. Here, we do so by comparing protein–protein interfaces with a set of residues having the same solvent accessibilities. This allows us to separate out the effect of solvent accessibility on the distributions of residues, secondary structure, and sequence entropy. The comparison shows that hydrophobic residues are preferred in interfaces and hydrophilic residues are not. In contrast, this trend is not observed when we compare interfaces with all residues, that is, when the effect of solvent accessibility is not separated out.

The results clearly show that the interfaces have more hydrophobic residues and fewer hydrophilic residues. Hydrophobic residues in interfaces are critical for the stabilization of protein–protein complexes. The formation of a protein–protein complex in aqueous solution was reported to be an entropy-driven process [34]. The rationale is that burial of hydrophobic surface patches yields a large entropy gain, providing a driving force for the formation of protein complexes and thus stabilizing the resulting complexes. The results also show that the interfaces are more conserved. Conserved interfaces are crucial for the maintenance of protein–protein interactions during evolution.

We found that Cys–Cys contacts, contacts between residues with opposite charges, and contacts between hydrophobic residues are preferred across protein–protein interfaces. Hydrophobic interactions are widely accepted to be the main stabilizing force when two proteins interact. Some studies have shown that interactions between charged residues also contribute to protein–protein interactions [6,35]. Bahar and Jernigan [35] showed that at close distances, interactions between pairs of hydrophilic residues are predominantly important, whereas hydrophobic interactions are important at longer distances. Cys–Cys pairs can contribute to the interactions by forming disulfide bonds [9]. The results we obtained confirm that disulfide bonds, salt bridges, and hydrophobic interactions are important forces in protein–protein interactions.

We also found that aromatic residues are more frequent at interfaces. Aromatic residues can form strong hydrophobic interactions between the bulky hydrophobic side chains. In addition to the hydrophobic interactions, the parallel arrangement of two aromatic rings makes further contributions by creating tighter packing with better geometric fit. The enhanced abundance of aromatic residues in interfaces might imply more precise geometric fits are achievable for these ring structures. Frequent interactions between aromatic residues are observed in this study.

Acknowledgments

This research was supported in part by a grant from the National Institutes of Health (GM 066387) to VH, DD, and RLJ.

Abbreviations

PDB: Protein Data Bank
PQS: Protein Quaternary Structure
ΔASA: Changes in solvent accessible surface area
rASA: Relative solvent accessibility
RIP: Raw interface propensity
NIP: Normalized interface propensity

References

1. Chothia C, Janin J. Nature. 1975;256:705–708. doi: 10.1038/256705a0.
2. Wodak SJ, Janin J. Adv Protein Chem. 2002;61:9–73. doi: 10.1016/s0065-3233(02)61001-0.
3. Deremble C, Lavery R. Curr Opin Struct Biol. 2005;15:171–175. doi: 10.1016/j.sbi.2005.01.018.
4. Ponstingl H, Kabir T, Gorse D, Thornton JM. Prog Biophys Mol Biol. 2005;89:9–35. doi: 10.1016/j.pbiomolbio.2004.07.010.
5. Reichmann D, Rahat O, Cohen M, Neuvirth H, Schreiber G. Curr Opin Struct Biol. 2007;17:67–76. doi: 10.1016/j.sbi.2007.01.004.
6. Sheinerman FB, Norel R, Honig B. Curr Opin Struct Biol. 2000;10:153–159. doi: 10.1016/s0959-440x(00)00065-8.
7. Heifetz A, Katchalski-Katzir E, Eisenstein M. Protein Sci. 2002;11:571–587. doi: 10.1110/ps.26002.
8. Vizcarra CL, Mayo SL. Curr Opin Chem Biol. 2005;9:622–626. doi: 10.1016/j.cbpa.2005.10.014.
9. Glaser F, Steinberg DM, Vakser IA, Ben-Tal N. Proteins. 2001;43:89–102.
10. Ofran Y, Rost B. J Mol Biol. 2003;325:377–387. doi: 10.1016/s0022-2836(02)01223-8.
11. Miyazawa S, Jernigan RL. Macromolecules. 1985;18:534–552.
12. Keskin O, Bahar I, Badretinov AY, Ptitsyn OB, Jernigan RL. Protein Sci. 1998;7:2578–2586. doi: 10.1002/pro.5560071211.
13. Young L, Jernigan RL, Covell DG. Protein Sci. 1994;3:717–729. doi: 10.1002/pro.5560030501.
14. Berchanski A, Shapira B, Eisenstein M. Proteins. 2004;56:130–142. doi: 10.1002/prot.20145.
15. Ben-Naim A. J Chem Phys. 2006;125:24901. doi: 10.1063/1.2205860.
16. Keskin O, Ma B, Nussinov R. J Mol Biol. 2005;345:1281–1294. doi: 10.1016/j.jmb.2004.10.077.
17. Jones S, Thornton JM. Proc Natl Acad Sci USA. 1996;93:13–20.
18. Peter Block JP, Hülermeier E, Sanschagrin P, Sotriffer CA, Klebe G. Proteins. 2006;65:607–622. doi: 10.1002/prot.21104.
19. Chakrabarti P, Janin J. Proteins. 2002;47:334–343. doi: 10.1002/prot.10085.
20. Kyu-il Cho KL, Lee KH, Kim D, Lee D. Proteins. 2006;65:593–606. doi: 10.1002/prot.21056.
21. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE. Nucl Acids Res. 2000;28:235–242. doi: 10.1093/nar/28.1.235.
22. Henrick K, Thornton JM. Trends Biochem Sci. 1998;23:358–361. doi: 10.1016/s0968-0004(98)01253-5.
23. Hubbard SJ. NACCESS, department of biochemistry and molecular biology. University College; London: 1993.
24. Gutteridge A, Bartlett GJ, Thornton JM. J Mol Biol. 2003;330:719–734. doi: 10.1016/s0022-2836(03)00515-1.
25. Kyte J, Doolittle RF. J Mol Biol. 1982;157:105–132. doi: 10.1016/0022-2836(82)90515-0.
26. Nooren IMA, Thornton JM. J Mol Biol. 2003;325:991–1016. doi: 10.1016/s0022-2836(02)01281-0.
27. Kabsch W, Sander C. Biopolymers. 1983;22:2577–2637. doi: 10.1002/bip.360221211.
28. McCoy AJ, Chandana Epa V, Colman PM. J Mol Biol. 1997;268:570–584. doi: 10.1006/jmbi.1997.0987.
29. Lo Conte L, Chothia C, Janin J. J Mol Biol. 1999;285:2177–2198. doi: 10.1006/jmbi.1998.2439.
30. Bahadur RP, Chakrabarti P, Rodier F, Janin J. Proteins. 2003;53:708–719. doi: 10.1002/prot.10461.
31. Jones S, Thornton JM. J Mol Biol. 1997;272:121–132. doi: 10.1006/jmbi.1997.1234.
32. Prasad Bahadur R, Chakrabarti P, Rodier F, Janin J. J Mol Biol. 2004;336:943–955. doi: 10.1016/j.jmb.2003.12.073.
33. Raih MF, Ahmad S, Zheng R, Mohamed R. Biophys Chem. 2005;114:63–69. doi: 10.1016/j.bpc.2004.10.005.
34. Creighton T. Protein structures and molecular properties. WH Freeman; New York: 1997.
35. Bahar I, Jernigan RL. J Mol Biol. 1997;266:195–244. doi: 10.1006/jmbi.1996.0758.
