Protein primary structure

From Wikipedia, the free encyclopedia

Linear sequence of amino acids in a peptide or protein

Figure: Diagram of protein structure, using PCNA as an example (PDB: 1AXC).

Protein primary structure is the linear sequence of amino acids in a peptide or protein.^[1] By convention, the primary structure of a protein is reported starting from the amino-terminal (N) end to the carboxyl-terminal (C) end. Protein biosynthesis is most commonly performed by ribosomes in cells. Peptides can also be synthesized in the laboratory. Protein primary structures can be directly sequenced, or inferred from DNA sequences.

Formation

Biological

Main article: Translation (biology)

Amino acids are polymerised via peptide bonds to form a long backbone, with the different amino acid side chains protruding along it. In biological systems, proteins are produced during translation by a cell's ribosomes. Some organisms can also make short peptides by non-ribosomal peptide synthesis, which often uses amino acids other than the 22 encoded ones; such peptides may be cyclised, modified and cross-linked.

Chemical

Main article: Peptide synthesis

Peptides can be synthesised chemically via a range of laboratory methods. Chemical methods typically synthesise peptides in the opposite order (starting at the C-terminus) to biological protein synthesis (starting at the N-terminus).
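The reversed order described above can be illustrated with a minimal sketch. It assumes the sequence is written in the usual N-to-C convention with one-letter codes; the function name `coupling_order` is hypothetical, and the sketch ignores all practical details of a real synthesis (protecting groups, resin attachment, and so on).

```python
def coupling_order(sequence: str) -> list[str]:
    """Return residues in the order a C-to-N chemical synthesis would add them.

    `sequence` is written N-terminus first, as is conventional, so the
    first residue to be used is the last character of the string.
    """
    return list(reversed(sequence))


# Gly-Ala-Val-Leu, written N-to-C, is assembled starting from Leu.
print(coupling_order("GAVL"))  # ['L', 'V', 'A', 'G']
```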

Notation

Protein sequence is typically notated as a string of letters, listing the amino acids starting at the amino-terminal end through to the carboxyl-terminal end. Either a three-letter code or a single-letter code can be used to represent the 22 naturally encoded amino acids, as well as mixtures or ambiguous amino acids (similar to nucleic acid notation).^[1]^[2]^[3]

Peptides can be directly sequenced, or inferred from DNA sequences. Large sequence databases now exist that collate known protein sequences.
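Inferring a protein sequence from a DNA sequence can be sketched as follows. This is a minimal illustration, assuming the standard genetic code, a coding strand already in the correct reading frame, and translation that stops at the first stop codon. The names `CODON_TABLE` and `translate` are illustrative only, and the recoding of stop codons that produces selenocysteine and pyrrolysine is not handled.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding-strand DNA string into one-letter protein notation.

    Reading starts at the first base and stops at the first stop codon.
    Codons containing unexpected characters are reported as 'X' (unknown).
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGAAATGGTGA"))  # MKW
```

The output is written N-terminus first, matching the notation convention described above.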

22 natural amino acid notation
Amino Acid	3-Letter^[4]	1-Letter^[4]
Alanine	Ala	A
Arginine	Arg	R
Asparagine	Asn	N
Aspartic acid	Asp	D
Cysteine	Cys	C
Glutamic acid	Glu	E
Glutamine	Gln	Q
Glycine	Gly	G
Histidine	His	H
Isoleucine	Ile	I
Leucine	Leu	L
Lysine	Lys	K
Methionine	Met	M
Phenylalanine	Phe	F
Proline	Pro	P
Pyrrolysine	Pyl	O
Selenocysteine	Sec	U
Serine	Ser	S
Threonine	Thr	T
Tryptophan	Trp	W
Tyrosine	Tyr	Y
Valine	Val	V
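The table above maps directly onto a lookup dictionary. The sketch below converts between the two notations; the function names and the hyphen separator for three-letter strings are assumptions made for illustration.

```python
# Three-letter to one-letter codes, taken from the table above.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C", "Glu": "E",
    "Gln": "Q", "Gly": "G", "His": "H", "Ile": "I", "Leu": "L", "Lys": "K",
    "Met": "M", "Phe": "F", "Pro": "P", "Pyl": "O", "Sec": "U", "Ser": "S",
    "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
ONE_TO_THREE = {one: three for three, one in THREE_TO_ONE.items()}


def to_one_letter(three_letter: str, sep: str = "-") -> str:
    """Convert e.g. 'Met-Ala-Sec' to 'MAU'. Raises KeyError on unknown codes."""
    return "".join(THREE_TO_ONE[code.capitalize()] for code in three_letter.split(sep))


def to_three_letter(one_letter: str, sep: str = "-") -> str:
    """Convert e.g. 'MAU' to 'Met-Ala-Sec'. Raises KeyError on unknown codes."""
    return sep.join(ONE_TO_THREE[code] for code in one_letter.upper())


print(to_one_letter("Met-Ala-Sec-Lys"))  # MAUK
print(to_three_letter("MAUK"))           # Met-Ala-Sec-Lys
```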

Ambiguous amino acid notation
Symbol	Description	Residues represented
X	Any amino acid, or unknown	All
B	Aspartate or Asparagine	D, N
Z	Glutamate or Glutamine	E, Q
J	Leucine or Isoleucine	I, L
Φ	Hydrophobic	V, I, L, F, W, M
Ω	Aromatic	F, W, Y, H
Ψ	Aliphatic	V, I, L, M
π	Small	P, G, A, S
ζ	Hydrophilic	S, T, H, N, Q, E, D, K, R, Y
+	Positively charged	K, R, H
-	Negatively charged	D, E
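As an illustration of how the ambiguity symbols above might be used, the following sketch checks whether a concrete sequence is compatible with a pattern written in this notation. The residue sets are copied from the table; the function names and the position-by-position, equal-length comparison are assumptions for the example.

```python
# Residues represented by each ambiguity symbol (from the table above).
# None means "any residue".
AMBIGUITY = {
    "X": None,
    "B": "DN", "Z": "EQ", "J": "IL",
    "Φ": "VILFWM", "Ω": "FWYH", "Ψ": "VILM", "π": "PGAS",
    "ζ": "STHNQEDKRY", "+": "KRH", "-": "DE",
}


def residue_matches(symbol: str, residue: str) -> bool:
    """True if a concrete residue is allowed by a pattern symbol."""
    if symbol not in AMBIGUITY:
        return symbol == residue  # ordinary one-letter code: exact match
    allowed = AMBIGUITY[symbol]
    return allowed is None or residue in allowed


def matches(pattern: str, sequence: str) -> bool:
    """Compare an equal-length pattern and sequence position by position."""
    return len(pattern) == len(sequence) and all(
        residue_matches(symbol, residue)
        for symbol, residue in zip(pattern, sequence)
    )


print(matches("ΦX+", "LGK"))  # True: hydrophobic, any, positively charged
print(matches("ΦX+", "DGK"))  # False: D is not in the hydrophobic set
```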

Modification

In general, polypeptides are unbranched polymers, so their primary structure can often be specified by the sequence of amino acids along their backbone. However, proteins can become cross-linked, most commonly by disulfide bonds, and the primary structure also requires specifying the cross-linking atoms, e.g., specifying the cysteines involved in the protein's disulfide bonds. Other crosslinks include desmosine.
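One way to record a primary structure together with its disulfide bonds is sketched below. The class name `PrimaryStructure`, the 1-based position convention, and the toy sequence are all hypothetical; the sketch covers a single chain only and does not represent other cross-links such as desmosine.

```python
from dataclasses import dataclass, field


@dataclass
class PrimaryStructure:
    """A one-letter sequence plus the cysteine pairs joined by disulfide bonds."""

    sequence: str
    # Each pair holds two 1-based residue positions within `sequence`.
    disulfides: list[tuple[int, int]] = field(default_factory=list)

    def validate(self) -> None:
        """Raise ValueError if any listed bond is chemically impossible here."""
        used: set[int] = set()
        for pair in self.disulfides:
            for pos in pair:
                if not 1 <= pos <= len(self.sequence):
                    raise ValueError(f"position {pos} is outside the sequence")
                if self.sequence[pos - 1] != "C":
                    raise ValueError(f"residue {pos} is not a cysteine")
                if pos in used:
                    raise ValueError(f"cysteine {pos} is used in more than one bond")
                used.add(pos)


# Toy example (not a real protein): cysteines at positions 2, 5 and 7.
toy = PrimaryStructure("GCAKCLC", disulfides=[(2, 5)])
toy.validate()  # passes silently
```

Keeping the bonds as explicit position pairs means two proteins with the same letters but different disulfide pairings are distinguishable, which is the point made in the paragraph above.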

Isomerisation

The chiral centers of a polypeptide chain can undergo racemization. Although this does not change the sequence, it does affect the chemical properties of the polypeptide. In particular, the L-amino acids normally found in proteins can spontaneously isomerize at the $\mathrm {C^{\alpha }}$ atom to form D-amino acids, which cannot be cleaved by most proteases. Additionally, proline can form stable cis-isomers at the peptide bond.

Post-translational modification

Additionally, the protein can undergo a variety of post-translational modifications, which are briefly summarized here.

The N-terminal amino group of a polypeptide can be modified covalently, e.g.,

acetylation $\mathrm {-C(=O)-CH_{3}}$

The positive charge on the N-terminal amino group may be eliminated by changing it to an acetyl group (N-terminal blocking).

formylation $\mathrm {-C(=O)H}$

The N-terminal methionine usually found after translation has an N-terminus blocked with a formyl group. This formyl group (and sometimes the methionine residue itself, if followed by Gly or Ser) is removed by the enzyme deformylase.

pyroglutamate

**Fig. 2** Formation of pyroglutamate from an N-terminal glutamine

An N-terminal glutamine can attack itself, forming a cyclic pyroglutamate group.

myristoylation $\mathrm {-C(=O)-\left(CH_{2}\right)_{12}-CH_{3}}$

Similar to acetylation. Instead of a simple methyl group, the myristoyl group has a tail of 14 hydrophobic carbons, which makes it ideal for anchoring proteins to cellular membranes.
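The group formulas given for the N-terminal modifications above can be turned into approximate mass changes. The sketch below is a worked calculation under stated assumptions: each acyl group replaces one hydrogen of the amino group, so the net addition is the group minus one H, and the atomic masses are standard monoisotopic values rounded to eight decimals. The names `MONOISOTOPIC_MASS`, `ACYL_GROUPS` and `added_mass` are illustrative.

```python
# Monoisotopic atomic masses in daltons (rounded; assumed, not from the text).
MONOISOTOPIC_MASS = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}

# Elemental composition of each acyl group as drawn above.
ACYL_GROUPS = {
    "formyl": {"C": 1, "H": 1, "O": 1},       # -C(=O)H
    "acetyl": {"C": 2, "H": 3, "O": 1},       # -C(=O)-CH3
    "myristoyl": {"C": 14, "H": 27, "O": 1},  # -C(=O)-(CH2)12-CH3
}


def added_mass(group: dict[str, int]) -> float:
    """Net mass added when the group replaces one hydrogen on the amine."""
    group_mass = sum(MONOISOTOPIC_MASS[el] * n for el, n in group.items())
    return group_mass - MONOISOTOPIC_MASS["H"]


for name, composition in ACYL_GROUPS.items():
    print(f"{name}: +{added_mass(composition):.3f} Da")
```

Under these assumptions, acetylation adds about 42.011 Da, formylation about 27.995 Da and myristoylation about 210.198 Da.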

The C-terminal carboxylate group of a polypeptide can also be modified, e.g.,

amidation (see Figure)

The C-terminus can also be blocked (thus, neutralizing its negative charge) by amidation.

glycosyl phosphatidylinositol (GPI) attachment

Glycosyl phosphatidylinositol (GPI) is a large, hydrophobic phospholipid prosthetic group that anchors proteins to cellular membranes. It is attached to the polypeptide C-terminus through an amide linkage that then connects to ethanolamine, thence to sundry sugars and finally to the phosphatidylinositol lipid moiety.

Finally, the peptide side chains can also be modified covalently, e.g.,

phosphorylation

Aside from cleavage, phosphorylation is perhaps the most important chemical modification of proteins. A phosphate group can be attached to the sidechain hydroxyl group of serine, threonine and tyrosine residues, adding a negative charge at that site and producing an unnatural amino acid. Such reactions are catalyzed by kinases and the reverse reaction is catalyzed by phosphatases. The phosphorylated tyrosines are often used as "handles" by which proteins can bind to one another, whereas phosphorylation of Ser/Thr often induces conformational changes, presumably because of the introduced negative charge. The effects of phosphorylating Ser/Thr can sometimes be simulated by mutating the Ser/Thr residue to glutamate.

glycosylation

A catch-all name for a set of very common and very heterogeneous chemical modifications. Sugar moieties can be attached to the sidechain hydroxyl groups of Ser/Thr or to the sidechain amide groups of Asn. Such attachments can serve many functions, ranging from increasing solubility to complex recognition. N-linked glycosylation (on Asn) can be blocked with certain inhibitors, such as tunicamycin.

deamidation (succinimide formation)

In this modification, an asparagine or aspartate side chain attacks the following peptide bond, forming a symmetrical succinimide intermediate. Hydrolysis of the intermediate produces either aspartate or the β-amino acid, iso(Asp). For asparagine, either product results in the loss of the amide group, hence "deamidation".

hydroxylation

Proline residues may be hydroxylated at either of two atoms, as can lysine (at one atom). Hydroxyproline is a critical component of collagen, which becomes unstable upon its loss. The hydroxylation reaction is catalyzed by an enzyme that requires ascorbic acid (vitamin C), deficiencies in which lead to many connective-tissue diseases such as scurvy.

methylation

Several protein residues can be methylated, most notably the positive groups of lysine and arginine. Arginine residues interact with the nucleic acid phosphate backbone and commonly form hydrogen bonds with the base residues, particularly guanine, in protein–DNA complexes. Lysine residues can be singly, doubly and even triply methylated. Methylation does not alter the positive charge on the side chain, however.

acetylation

Acetylation of the lysine amino groups is chemically analogous to the acetylation of the N-terminus. Functionally, however, the acetylation of lysine residues is used to regulate the binding of proteins to nucleic acids. The cancellation of the positive charge on the lysine weakens the electrostatic attraction for the (negatively charged) nucleic acids.

sulfation

Tyrosines may become sulfated on their $\mathrm {O^{\eta }}$ atom. Somewhat unusually, this modification occurs in the Golgi apparatus, not in the endoplasmic reticulum. Similar to phosphorylated tyrosines, sulfated tyrosines are used for specific recognition, e.g., in chemokine receptors on the cell surface. As with phosphorylation, sulfation adds a negative charge to a previously neutral site.

prenylation and palmitoylation $\mathrm {-C(=O)-\left(CH_{2}\right)_{14}-CH_{3}}$

The hydrophobic isoprene (e.g., farnesyl, geranyl, and geranylgeranyl groups) and palmitoyl groups may be added to the $\mathrm {S^{\gamma }}$ atom of cysteine residues to anchor proteins to cellular membranes. Unlike the GPI and myristoyl anchors, these groups are not necessarily added at the termini.

carboxylation

A relatively rare modification that adds an extra carboxylate group (and, hence, a double negative charge) to a glutamate side chain, producing a Gla residue. This is used to strengthen the binding to "hard" metal ions such as calcium.

ADP-ribosylation

The large ADP-ribosyl group can be transferred to several types of side chains within proteins, with heterogeneous effects. This modification is a target for the powerful toxins of disparate bacteria, e.g., Vibrio cholerae, Corynebacterium diphtheriae and Bordetella pertussis.

ubiquitination and SUMOylation

Various full-length, folded proteins can be attached at their C-termini to the sidechain ammonium groups of lysines of other proteins. Ubiquitin is the most common of these, and usually signals that the ubiquitin-tagged protein should be degraded.
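Because several of the side-chain modifications listed above are tied to particular residue types, candidate sites can be listed directly from the sequence. The sketch below is only a first-pass filter and not a predictor of which sites are actually modified. It lists Ser/Thr/Tyr positions as phosphorylation candidates, builds a Ser/Thr-to-Glu "phosphomimetic" mutant as described in the phosphorylation entry, and scans for asparagines in an Asn-X-Ser/Thr context. That last motif (with X being any residue except proline) is an assumption brought in from outside the text above, and all function names are illustrative.

```python
import re

# Assumed motif for N-linked glycosylation: Asn, any residue but Pro, Ser/Thr.
# The lookahead lets overlapping motifs be found.
N_GLYC_MOTIF = re.compile(r"N(?=[^P][ST])")


def phospho_candidates(sequence: str) -> list[tuple[int, str]]:
    """1-based positions of residues with a side-chain hydroxyl (S, T, Y)."""
    return [(i, aa) for i, aa in enumerate(sequence, start=1) if aa in "STY"]


def phosphomimetic(sequence: str, position: int) -> str:
    """Replace the Ser/Thr at a 1-based position with Glu."""
    if sequence[position - 1] not in "ST":
        raise ValueError(f"residue {position} is not Ser or Thr")
    return sequence[:position - 1] + "E" + sequence[position:]


def n_glyc_candidates(sequence: str) -> list[int]:
    """1-based positions of Asn residues that fit the assumed motif."""
    return [m.start() + 1 for m in N_GLYC_MOTIF.finditer(sequence)]


print(phospho_candidates("MASKTY"))      # [(3, 'S'), (5, 'T'), (6, 'Y')]
print(phosphomimetic("MASKTY", 3))       # MAEKTY
print(n_glyc_candidates("ANGSKNPTNAT"))  # [2, 9]
```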

Most of the polypeptide modifications listed above occur post-translationally, i.e., after the protein has been synthesized on the ribosome, typically occurring in the endoplasmic reticulum, a subcellular organelle of the eukaryotic cell.

Many other chemical reactions (e.g., cyanylation) have been applied to proteins by chemists, although they are not found in biological systems.
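Several of the modifications summarized in this section are described in terms of the charges they add or remove (blocking of the termini, acetylation of lysine). A rough bookkeeping sketch of that effect follows. It assumes whole-number charges near neutral pH: +1 for Lys, Arg and a free N-terminus, −1 for Asp, Glu and a free C-terminus, and neutral histidine. Real charges depend on pH and local environment, and the function and parameter names are hypothetical.

```python
def approx_net_charge(
    sequence: str,
    acetylated_lysines: frozenset[int] = frozenset(),
    n_term_blocked: bool = False,
    c_term_blocked: bool = False,
) -> int:
    """Integer estimate of net charge near neutral pH (see assumptions above).

    `acetylated_lysines` holds 1-based positions of lysines whose positive
    charge has been cancelled by acetylation.
    """
    charge = 0
    for pos, aa in enumerate(sequence, start=1):
        if aa == "R" or (aa == "K" and pos not in acetylated_lysines):
            charge += 1
        elif aa in "DE":
            charge -= 1
    if not n_term_blocked:
        charge += 1  # free amino terminus
    if not c_term_blocked:
        charge -= 1  # free carboxylate terminus
    return charge


print(approx_net_charge("AKKDE"))                                     # 0
print(approx_net_charge("AKKDE", acetylated_lysines=frozenset({2})))  # -1
print(approx_net_charge("AKKDE", n_term_blocked=True))                # -1
```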

Cleavage and ligation

In addition to those listed above, the most important modification of primary structure is peptide cleavage (by chemical hydrolysis or by proteases). Proteins are often synthesized in an inactive precursor form; typically, an N-terminal or C-terminal segment blocks the active site of the protein, inhibiting its function. The protein is activated by cleaving off the inhibitory peptide.
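Cleavage by a sequence-specific protease can be sketched as a simple string operation. The example below uses a commonly quoted simplified rule for trypsin, which is not taken from the text above: cut after Lys or Arg unless the next residue is Pro. It is meant as an illustration of how cleavage splits a primary structure into shorter peptides, not as a model of any real digestion; the function name is hypothetical.

```python
def tryptic_digest(sequence: str) -> list[str]:
    """Split a one-letter sequence after K or R, except when followed by P."""
    if not sequence:
        return []
    peptides = []
    start = 0
    # The last residue is skipped: there is no peptide bond after it to cut.
    for i in range(len(sequence) - 1):
        if sequence[i] in "KR" and sequence[i + 1] != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    peptides.append(sequence[start:])  # remaining C-terminal peptide
    return peptides


print(tryptic_digest("MKRPAGKLR"))  # ['MK', 'RPAGK', 'LR']
```

Joining the fragments in order reproduces the original sequence, since cleavage only breaks peptide bonds and does not change the residues themselves.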

Some proteins even have the power to cleave themselves. Typically, the hydroxyl group of a serine (rarely, threonine) or the thiol group of a cysteine residue will attack the carbonyl carbon of the preceding peptide bond, forming a tetrahedrally bonded intermediate [classified as a hydroxyoxazolidine (Ser/Thr) or hydroxythiazolidine (Cys) intermediate]. This intermediate tends to revert to the amide form, expelling the attacking group, since the amide form is usually favored by free energy (presumably due to the strong resonance stabilization of the peptide group). However, additional molecular interactions may render the amide form less stable; the amino group is expelled instead, resulting in an ester (Ser/Thr) or thioester (Cys) bond in place of the peptide bond. This chemical reaction is called an N-O acyl shift.

The ester/thioester bond can be resolved in several ways:

Simple hydrolysis will split the polypeptide chain, where the displaced amino group becomes the new N-terminus. This is seen in the maturation of glycosylasparaginase.
A β-elimination reaction also splits the chain, but results in a pyruvoyl group at the new N-terminus. This pyruvoyl group may be used as a covalently attached catalytic cofactor in some enzymes, especially decarboxylases such as S-adenosylmethionine decarboxylase (SAMDC) that exploit the electron-withdrawing power of the pyruvoyl group.
Intramolecular transesterification, resulting in a branched polypeptide. In inteins, the new ester bond is broken by an intramolecular attack by the soon-to-be C-terminal asparagine.
Intermolecular transesterification can transfer a whole segment from one polypeptide to another, as is seen in the Hedgehog protein autoprocessing.

History

The proposal that proteins were linear chains of α-amino acids was made nearly simultaneously by two scientists at the same conference in 1902, the 74th meeting of the Society of German Scientists and Physicians, held in Karlsbad. Franz Hofmeister made the proposal in the morning, based on his observations of the biuret reaction in proteins. Hofmeister was followed a few hours later by Emil Fischer, who had amassed a wealth of chemical details supporting the peptide-bond model. For completeness, the proposal that proteins contained amide linkages was made as early as 1882 by the French chemist E. Grimaux.^[5]

Despite these data and later evidence that proteolytically digested proteins yielded only oligopeptides, the idea that proteins were linear, unbranched polymers of amino acids was not accepted immediately. Some scientists such as William Astbury doubted that covalent bonds were strong enough to hold such long molecules together; they feared that thermal agitations would shake such long molecules asunder. Hermann Staudinger faced similar prejudices in the 1920s when he argued that rubber was composed of macromolecules.^[5]

Thus, several alternative hypotheses arose. The colloidal protein hypothesis stated that proteins were colloidal assemblies of smaller molecules. This hypothesis was disproved in the 1920s by ultracentrifugation measurements by Theodor Svedberg that showed that proteins had a well-defined, reproducible molecular weight and by electrophoretic measurements by Arne Tiselius that indicated that proteins were single molecules. A second hypothesis, the cyclol hypothesis advanced by Dorothy Wrinch, proposed that the linear polypeptide underwent a chemical cyclol rearrangement C=O + HN $\rightarrow$ C(OH)-N that crosslinked its backbone amide groups, forming a two-dimensional fabric. Other primary structures of proteins were proposed by various researchers, such as the diketopiperazine model of Emil Abderhalden and the pyrrol/piperidine model of Troensegaard in 1942. Although never given much credence, these alternative models were finally disproved when Frederick Sanger successfully sequenced insulin and by the crystallographic determination of myoglobin and hemoglobin by Max Perutz and John Kendrew.

Relation to secondary and tertiary structure

Main article: Biomolecular structure

The primary structure of a biological polymer to a large extent determines the three-dimensional shape (tertiary structure). Protein sequence can be used to predict local features, such as segments of secondary structure, or trans-membrane regions. However, the complexity of protein folding makes it difficult to predict the tertiary structure of a protein from its sequence alone. Knowing the structure of a similar homologous sequence (for example a member of the same protein family) allows highly accurate prediction of the tertiary structure by homology modeling. If the full-length protein sequence is available, it is possible to estimate its general biophysical properties, such as its isoelectric point.
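As an illustration of estimating a biophysical property from sequence alone, the sketch below approximates the isoelectric point as the pH at which the Henderson–Hasselbalch net charge is zero, found by bisection. The pKa values are assumed, textbook-style approximations (published sets differ noticeably), each ionisable group is treated as independent of its neighbours, and modifications are ignored, so the result is a rough estimate only. All names are illustrative.

```python
# Assumed approximate pKa values; different sources quote different numbers.
POSITIVE_PKA = {"K": 10.5, "R": 12.5, "H": 6.0}           # charged when protonated
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}  # charged when deprotonated
N_TERM_PKA = 8.0
C_TERM_PKA = 3.1


def net_charge(sequence: str, ph: float) -> float:
    """Net charge at a given pH, treating every group independently."""
    positive = [N_TERM_PKA] + [POSITIVE_PKA[aa] for aa in sequence if aa in POSITIVE_PKA]
    negative = [C_TERM_PKA] + [NEGATIVE_PKA[aa] for aa in sequence if aa in NEGATIVE_PKA]
    charge = sum(1.0 / (1.0 + 10 ** (ph - pka)) for pka in positive)
    charge -= sum(1.0 / (1.0 + 10 ** (pka - ph)) for pka in negative)
    return charge


def isoelectric_point(sequence: str, tolerance: float = 1e-4) -> float:
    """Bisection on pH in [0, 14]; net charge decreases steadily as pH rises."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2
        if net_charge(sequence, mid) > 0:
            low = mid   # still positive: the zero crossing lies at higher pH
        else:
            high = mid
    return (low + high) / 2


for seq in ("KKKK", "DDDD", "AKDE"):
    print(seq, round(isoelectric_point(seq), 2))
```

With these assumed constants, a lysine-rich peptide comes out with a high estimated isoelectric point and an aspartate-rich one with a low value, which is the qualitative behaviour expected.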

Notes and references

1. Sanger, F. (1952). "The arrangement of amino acids in proteins". In Anson, M.L.; Bailey, Kenneth; Edsall, John T. (eds.). Advances in Protein Chemistry. Vol. 7. pp. 1–67. doi:10.1016/S0065-3233(08)60017-0. PMID 14933251.
2. Aasland, Rein; Abrams, Charles; Ampe, Christophe; Ball, Linda J.; Bedford, Mark T.; Cesareni, Gianni; Gimona, Mario; Hurley, James H.; Jarchau, Thomas (2002-02-20). "Normalization of nomenclature for peptide motifs as ligands of modular protein domains". FEBS Letters. 513 (1): 141–144. Bibcode:2002FEBSL.513..141A. doi:10.1016/S0014-5793(01)03295-1. ISSN 1873-3468. PMID 11911894.
3. IUPAC-IUB Commission on Biochemical Nomenclature (July 1968). "A One‐Letter Notation for Amino Acid Sequences: Tentative Rules". European Journal of Biochemistry. 5 (2): 151–153. doi:10.1111/j.1432-1033.1968.tb00350.x.
4. Hausman, Robert E.; Cooper, Geoffrey M. (2004). The cell: a molecular approach. Washington, D.C.: ASM Press. p. 51. ISBN 978-0-87893-214-6.
5. Fruton, Joseph S. (May 1979). "Early theories of protein structure". Annals of the New York Academy of Sciences. 325 (1): xiv, 1–18. Bibcode:1979NYASA.325....1F. doi:10.1111/j.1749-6632.1979.tb14125.x. PMID 378063. S2CID 39125170.
