Protein structure is the three-dimensional arrangement of atoms in an amino acid-chain molecule. Proteins are polymers, specifically polypeptides, formed from sequences of amino acids, which are the monomers of the polymer. A single amino acid monomer may also be called a residue, which indicates a repeating unit of a polymer. Proteins form by amino acids undergoing condensation reactions, in which the amino acids lose one water molecule per reaction in order to attach to one another with a peptide bond. By convention, a chain under 30 amino acids is often identified as a peptide, rather than a protein.[1] To be able to perform their biological function, proteins fold into one or more specific spatial conformations driven by a number of non-covalent interactions, such as hydrogen bonding, ionic interactions, van der Waals forces, and hydrophobic packing. To understand the functions of proteins at a molecular level, it is often necessary to determine their three-dimensional structure. This is the topic of the scientific field of structural biology, which employs techniques such as X-ray crystallography, NMR spectroscopy, cryo-electron microscopy (cryo-EM) and dual polarisation interferometry to determine the structure of proteins.
Protein structures range in size from tens to several thousand amino acids.[2] By physical size, proteins are classified as nanoparticles, between 1 and 100 nm. Very large protein complexes can be formed from protein subunits. For example, many thousands of actin molecules assemble into a microfilament.
A protein usually undergoes reversible structural changes in performing its biological function. The alternative structures of the same protein are referred to as different conformations, and transitions between them are called conformational changes.
There are four distinct levels of protein structure.
The primary structure of a protein refers to the sequence of amino acids in the polypeptide chain. The primary structure is held together by peptide bonds that are made during the process of protein biosynthesis. The two ends of the polypeptide chain are referred to as the carboxyl terminus (C-terminus) and the amino terminus (N-terminus) based on the nature of the free group on each extremity. Counting of residues always starts at the N-terminal end (NH2-group), which is the end where the amino group is not involved in a peptide bond. The primary structure of a protein is determined by the gene corresponding to the protein. A specific sequence of nucleotides in DNA is transcribed into mRNA, which is read by the ribosome in a process called translation. The sequence of amino acids in insulin was discovered by Frederick Sanger, establishing that proteins have defining amino acid sequences.[3][4] The sequence of a protein is unique to that protein, and defines the structure and function of the protein. The sequence of a protein can be determined by methods such as Edman degradation or tandem mass spectrometry. Often, however, it is read directly from the sequence of the gene using the genetic code. It is strictly recommended to use the words "amino acid residues" when discussing proteins because when a peptide bond is formed, a water molecule is lost, and therefore proteins are made up of amino acid residues. Post-translational modifications such as phosphorylations and glycosylations are usually also considered a part of the primary structure, and cannot be read from the gene. For example, insulin is composed of 51 amino acids in 2 chains. One chain has 30 amino acids, and the other has 21 amino acids.
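Reading a primary structure from an mRNA sequence with the genetic code can be illustrated with a minimal Python sketch. The codon table below is a deliberately small subset of the standard genetic code, and the function name `translate` and the example sequence are illustrative assumptions rather than part of any established tool. The sketch ignores post-translational modifications, which cannot be read from the gene.

```python
# Partial standard genetic code (mRNA codon -> one-letter residue code).
# Only a few of the 64 codons are listed; this subset is an assumption
# made to keep the example short.
CODON_TABLE = {
    "AUG": "M",                                        # methionine (usual start)
    "UUU": "F", "UUC": "F",                            # phenylalanine
    "GGU": "G", "GGC": "G", "GGA": "G", "GGG": "G",    # glycine
    "GCU": "A", "GCC": "A", "GCA": "A", "GCG": "A",    # alanine
    "AAA": "K", "AAG": "K",                            # lysine
    "UAA": "*", "UAG": "*", "UGA": "*",                # stop codons
}


def translate(mrna: str) -> str:
    """Translate codon by codon; residues are listed from N- to C-terminus."""
    residues = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        residue = CODON_TABLE[codon]  # KeyError if codon is not in the subset
        if residue == "*":            # a stop codon ends the chain
            break
        residues.append(residue)
    return "".join(residues)


peptide = translate("AUGUUUGGCGCAAAAUAA")
# Each peptide bond releases one water molecule, so a chain of n residues
# corresponds to n - 1 condensation reactions.
water_lost = len(peptide) - 1
print(peptide, water_lost)
```

The residue string is written starting from the N-terminus, matching the counting convention described above.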
Secondary structure refers to highly regular local sub-structures on the actual polypeptide backbone chain. Two main types of secondary structure, the α-helix and the β-strand or β-sheet, were suggested in 1951 by Linus Pauling.[5] These secondary structures are defined by patterns of hydrogen bonds between the main-chain peptide groups. They have a regular geometry, being constrained to specific values of the dihedral angles ψ and φ on the Ramachandran plot. Both the α-helix and the β-sheet represent a way of saturating all the hydrogen bond donors and acceptors in the peptide backbone. Some parts of the protein are ordered but do not form any regular structures. They should not be confused with random coil, an unfolded polypeptide chain lacking any fixed three-dimensional structure. Several sequential secondary structures may form a "supersecondary unit".[6]
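The dihedral angles φ and ψ are torsion angles defined by four consecutive backbone atoms (φ by C, N, Cα, C and ψ by N, Cα, C, N of neighbouring residues). A minimal sketch of a generic four-point torsion calculation is given below; the function name `dihedral_degrees` and the toy coordinates are assumptions for illustration, and real coordinates would come from a structure file.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Torsion angle (degrees, in [-180, 180]) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)  # unit vector along the central bond
    # Remove the components of b0 and b2 that lie along the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Toy coordinates (not taken from a real protein): the outer atoms are
# rotated by a quarter turn about the central bond.
angle = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(round(angle, 1))
```

Computing φ and ψ for every residue in a chain and plotting the pairs against each other gives the Ramachandran plot mentioned above.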
Tertiary structure refers to the three-dimensional structure created by a single protein molecule (a single polypeptide chain). It may include one or several domains. The α-helices and β-pleated-sheets are folded into a compact globular structure. The folding is driven by the non-specific hydrophobic interactions, the burial of hydrophobic residues from water, but the structure is stable only when the parts of a protein domain are locked into place by specific tertiary interactions, such as salt bridges, hydrogen bonds, and the tight packing of side chains and disulfide bonds. The disulfide bonds are extremely rare in cytosolic proteins, since the cytosol (intracellular fluid) is generally a reducing environment.
Quaternary structure is the three-dimensional structure consisting of the aggregation of two or more individual polypeptide chains (subunits) that operate as a single functional unit (multimer). The resulting multimer is stabilized by the same non-covalent interactions and disulfide bonds as in tertiary structure. There are many possible quaternary structure organisations.[7] Complexes of two or more polypeptides (i.e. multiple subunits) are called multimers. Specifically, a multimer is called a dimer if it contains two subunits, a trimer if it contains three subunits, a tetramer if it contains four subunits, a pentamer if it contains five subunits, and so forth. The subunits are frequently related to one another by symmetry operations, such as a 2-fold axis in a dimer. Multimers made up of identical subunits are referred to with a prefix of "homo-" and those made up of different subunits are referred to with a prefix of "hetero-", for example, a heterotetramer, such as the two alpha and two beta chains of hemoglobin.
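The naming scheme for multimers can be expressed as a short Python sketch. The function name `describe_multimer`, the prefix table, and the fallback label for larger assemblies are illustrative assumptions, not a standard nomenclature tool.

```python
# Size prefixes for small assemblies; larger ones fall back to "N-mer".
SIZE_PREFIX = {2: "di", 3: "tri", 4: "tetra", 5: "penta"}


def describe_multimer(subunits):
    """Name an assembly from a list of subunit labels, one per chain."""
    n = len(subunits)
    if n < 2:
        raise ValueError("a multimer needs at least two subunits")
    # Identical subunits -> "homo-", different subunits -> "hetero-".
    kind = "homo" if len(set(subunits)) == 1 else "hetero"
    size = SIZE_PREFIX[n] + "mer" if n in SIZE_PREFIX else f"{n}-mer"
    return kind + size


# Hemoglobin: two alpha chains and two beta chains.
print(describe_multimer(["alpha", "alpha", "beta", "beta"]))
```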
An assemblage of multiple copies of a particular polypeptide chain can be described as a homomer, multimer or oligomer. Bertolini et al. in 2021[8] presented evidence that homomer formation may be driven by interaction between nascent polypeptide chains as they are translated from mRNA by adjacent ribosomes. Hundreds of proteins have been identified as being assembled into homomers in human cells.[8] The process of assembly is often initiated by the interaction of the N-terminal region of polypeptide chains. Evidence that numerous gene products form homomers (multimers) in a variety of organisms based on intragenic complementation evidence was reviewed in 1965.[9]
Proteins are frequently described as consisting of several structural units. These units include domains, motifs, and folds. Despite the fact that there are about 100,000 different proteins expressed in eukaryotic systems, there are many fewer different domains, structural motifs and folds.
A structural domain is an element of the protein's overall structure that is self-stabilizing and often folds independently of the rest of the protein chain. Many domains are not unique to the protein products of one gene or one gene family but instead appear in a variety of proteins. Domains often are named and singled out because they figure prominently in the biological function of the protein they belong to; for example, the "calcium-binding domain of calmodulin". Because they are independently stable, domains can be "swapped" by genetic engineering between one protein and another to make chimera proteins. A conservative combination of several domains that occur in different proteins, such as the protein tyrosine phosphatase domain and C2 domain pair, was called "a superdomain" that may evolve as a single unit.[10]
Structural and sequence motifs are short segments of protein three-dimensional structure or amino acid sequence that are found in a large number of different proteins.
Tertiary protein structures can have multiple secondary elements on the same polypeptide chain. The supersecondary structure refers to a specific combination of secondary structure elements, such as β-α-β units or a helix-turn-helix motif. Some of them may also be referred to as structural motifs.
A protein fold refers to the general protein architecture, like a helix bundle, β-barrel, Rossmann fold or different "folds" provided in the Structural Classification of Proteins database.[11] A related concept is protein topology.
Proteins are not static objects, but rather populate ensembles of conformational states. Transitions between these states typically occur on nanoscales, and have been linked to functionally relevant phenomena such as allosteric signaling[12] and enzyme catalysis.[13] Protein dynamics and conformational changes allow proteins to function as nanoscale biological machines within cells, often in the form of multi-protein complexes.[14] Examples include motor proteins, such as myosin, which is responsible for muscle contraction, kinesin, which moves cargo inside cells away from the nucleus along microtubules, and dynein, which moves cargo inside cells towards the nucleus and produces the axonemal beating of motile cilia and flagella. "[I]n effect, the [motile cilium] is a nanomachine composed of perhaps over 600 proteins in molecular complexes, many of which also function independently as nanomachines... Flexible linkers allow the mobile protein domains connected by them to recruit their binding partners and induce long-range allostery via protein domain dynamics."[15]
Proteins are often thought of as relatively stable tertiary structures that experience conformational changes after being affected by interactions with other proteins or as a part of enzymatic activity. However, proteins may have varying degrees of stability, and some of the less stable variants are intrinsically disordered proteins. These proteins exist and function in a relatively 'disordered' state lacking a stable tertiary structure. As a result, they are difficult to describe by a single fixed tertiary structure. Conformational ensembles have been devised as a way to provide a more accurate and 'dynamic' representation of the conformational state of intrinsically disordered proteins.[17][16]
Protein ensemble files are a representation of a protein that can be considered to have a flexible structure. Creating these files requires determining which of the various theoretically possible protein conformations actually exist. One approach is to apply computational algorithms to the protein data in order to try to determine the most likely set of conformations for an ensemble file. There are multiple methods for preparing data for the Protein Ensemble Database that fall into two general methodologies: pool and molecular dynamics (MD) approaches (diagrammed in the figure). The pool-based approach uses the protein's amino acid sequence to create a massive pool of random conformations. This pool is then subjected to more computational processing that creates a set of theoretical parameters for each conformation based on the structure. Conformational subsets from this pool whose average theoretical parameters closely match known experimental data for this protein are selected. The alternative molecular dynamics approach takes multiple random conformations at a time and subjects all of them to experimental data. Here the experimental data serve as constraints placed on the conformations (e.g. known distances between atoms). Only conformations that manage to remain within the limits set by the experimental data are accepted. This approach often applies large amounts of experimental data to the conformations, which is a very computationally demanding task.[16]
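The selection step of the pool-based approach can be illustrated with a toy Python sketch. Everything here is an assumption made for illustration: each conformation is reduced to a single precomputed theoretical parameter (a random number standing in for, say, a radius of gyration), the experimental value is invented, and plain random search replaces the more elaborate selection algorithms used in practice.

```python
import random


def select_ensemble(pool_params, experimental_value, subset_size,
                    trials=2000, seed=0):
    """Random search for the subset whose mean parameter best matches experiment.

    Returns the sorted indices of the chosen conformations and the absolute
    difference between the subset mean and the experimental value.
    """
    rng = random.Random(seed)
    best_subset, best_error = None, float("inf")
    for _ in range(trials):
        subset = rng.sample(range(len(pool_params)), subset_size)
        mean = sum(pool_params[i] for i in subset) / subset_size
        error = abs(mean - experimental_value)
        if error < best_error:
            best_subset, best_error = subset, error
    return sorted(best_subset), best_error


# Toy pool: 500 random "conformations", each summarised by one number.
pool_rng = random.Random(1)
pool = [pool_rng.uniform(10.0, 40.0) for _ in range(500)]

indices, error = select_ensemble(pool, experimental_value=25.0, subset_size=20)
print(len(indices), error)
```

A realistic implementation would compare several theoretical parameters per conformation against several experimental observables at once, but the principle of keeping the subset whose averages agree with experiment is the same.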
The conformational ensembles were generated for a number of highly dynamic and partially unfolded proteins, such as Sic1/Cdc4,[18] p15 PAF,[19] MKK7,[20] Beta-synuclein[21] and P27.[22]
As they are translated, polypeptides exit the ribosome mostly as random coils and fold into their native states.[23][24] The final structure of the protein chain is generally assumed to be determined by its amino acid sequence (Anfinsen's dogma).[25]
Thermodynamic stability of proteins represents the free energy difference between the folded and unfolded protein states. This free energy difference is very sensitive to temperature, hence a change in temperature may result in unfolding or denaturation. Protein denaturation may result in loss of function, and loss of native state. The free energy of stabilization of soluble globular proteins typically does not exceed 50 kJ/mol.[citation needed] Taking into consideration the large number of hydrogen bonds that take place for the stabilization of secondary structures, and the stabilization of the inner core through hydrophobic interactions, the free energy of stabilization emerges as a small difference between large numbers.[26]
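To give a sense of scale for these free energies, the following Python sketch converts a stabilization free energy into the fraction of molecules that are folded. It assumes a simple two-state equilibrium between folded and unfolded states, which is a simplification not stated in the text above; the function name `fraction_folded` is likewise an assumption for illustration.

```python
import math

R = 8.314462618e-3  # molar gas constant in kJ/(mol*K)


def fraction_folded(delta_g_kj_per_mol, temperature_k):
    """Fraction folded in a two-state model.

    delta_g_kj_per_mol is the stabilization free energy (unfolded minus
    folded), so positive values favour the folded state. The equilibrium
    constant is K = exp(dG / RT) and the folded fraction is K / (1 + K),
    written here in the numerically stable form 1 / (1 + exp(-dG / RT)).
    """
    return 1.0 / (1.0 + math.exp(-delta_g_kj_per_mol / (R * temperature_k)))


for dg in (0.0, 10.0, 20.0, 50.0):
    print(dg, fraction_folded(dg, 298.15))
```

In this model the stabilization free energy is an input at a given temperature; its own temperature dependence, which drives thermal denaturation, is not modelled.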
Around 90% of the protein structures available in the Protein Data Bank have been determined by X-ray crystallography.[27] This method allows one to measure the three-dimensional (3-D) density distribution of electrons in the protein, in the crystallized state, and thereby to infer the 3-D coordinates of all the atoms to a certain resolution. Roughly 7% of the known protein structures have been obtained by nuclear magnetic resonance (NMR) techniques.[28] For larger protein complexes, cryo-electron microscopy can determine protein structures. The resolution is typically lower than that of X-ray crystallography or NMR, but the maximum resolution is steadily increasing. This technique is particularly valuable for very large protein complexes such as virus coat proteins and amyloid fibers.
General secondary structure composition can be determined via circular dichroism. Vibrational spectroscopy can also be used to characterize the conformation of peptides, polypeptides, and proteins.[29] Two-dimensional infrared spectroscopy has become a valuable method to investigate the structures of flexible peptides and proteins that cannot be studied with other methods.[30][31] A more qualitative picture of protein structure is often obtained by proteolysis, which is also useful to screen for more crystallizable protein samples. Novel implementations of this approach, including fast parallel proteolysis (FASTpp), can probe the structured fraction and its stability without the need for purification.[32] Once a protein's structure has been experimentally determined, further detailed studies can be done computationally, using molecular dynamics simulations of that structure.[33]
A protein structure database is a database that is modeled around the various experimentally determined protein structures. The aim of most protein structure databases is to organize and annotate the protein structures, providing the biological community access to the experimental data in a useful way. Data included in protein structure databases often include 3D coordinates as well as experimental information, such as unit cell dimensions and angles for structures determined by X-ray crystallography. Most entries, which in this case are either proteins or specific structure determinations of a protein, also contain sequence information, and some databases even provide means for performing sequence-based queries. Nevertheless, the primary attribute of a structure database is structural information, whereas sequence databases focus on sequence information and contain no structural information for the majority of entries. Protein structure databases are critical for many efforts in computational biology such as structure-based drug design, both in developing the computational methods used and in providing a large experimental dataset used by some methods to provide insights about the function of a protein.[34]
Protein structures can be grouped based on their structural similarity, topological class or a common evolutionary origin. The Structural Classification of Proteins database[35] and CATH database[36] provide two different structural classifications of proteins. When the structural similarity is large, the two proteins have possibly diverged from a common ancestor,[37] and shared structure between proteins is considered evidence of homology. Structure similarity can then be used to group proteins together into protein superfamilies.[38] If shared structure is significant but the fraction shared is small, the fragment shared may be the consequence of a more dramatic evolutionary event such as horizontal gene transfer, and joining proteins sharing these fragments into protein superfamilies is no longer justified.[37] Topology of a protein can be used to classify proteins as well. Knot theory and circuit topology are two topology frameworks developed for classification of protein folds based on chain crossing and intrachain contacts respectively.
The generation of a protein sequence is much easier than the determination of a protein structure. However, the structure of a protein gives much more insight into the function of the protein than its sequence. Therefore, a number of methods for the computational prediction of protein structure from its sequence have been developed.[39] Ab initio prediction methods use just the sequence of the protein. Threading and homology modeling methods can build a 3-D model for a protein of unknown structure from experimental structures of evolutionarily related proteins, called a protein family.