Protein Tertiary Structure: Motifs & Domains
Protein domains and motifs are two central elements of protein tertiary structure. Motifs are defined as recurring, conserved groups of secondary structure elements found in various protein families. Domains are independent folding units and represent the fundamental fold classification units.
Protein Motifs
Short turns and longer loops are very common in protein structures. By connecting different secondary structure elements, such as a β-strand to a β-strand, a β-strand to an α-helix, or an α-helix to another α-helix, they create various structural units called motifs, also known as super-secondary structures. Motifs consist of two to three secondary structure elements and serve as building blocks of protein domain tertiary structure. Often, they also have a specific role in protein function. They can be considered intermediates between the secondary and tertiary structures. Note that the presence of the same motif in the tertiary structures of two different protein families does not imply an evolutionary relationship between them.
Generally, motifs are classified into three different types (with some examples):
- α-helical motifs (helix-turn-helix, helix-loop-helix (α-hairpin), leucine-zipper, EF-hand, coiled-coil, four-helix bundle)
- β-sheet motifs (β-hairpin, Greek key, β-meander)
- Mixed α/β motifs (β-α-β, α-β-hairpin)
An example of an α-helical motif that supports protein function is the EF-hand, a helix-loop-helix motif. The loop in this motif coordinates a Ca²⁺ ion, as in calmodulin. Another example is the leucine zipper, a two-helix motif. In this motif, the helices are packed against each other, with a hydrophobic contact area formed by leucine side chains spaced every seven amino acids. Its function is dimerization (formation of dimers) in some DNA-binding proteins. An additional example is the helix-turn-helix motif, which is known for its role in DNA binding. One of the helices in the structure fits into the major groove of DNA, while the second helix stabilizes the interaction.
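The seven-residue leucine spacing described above for the leucine zipper can be illustrated with a short scan. This is a minimal sketch, not a validated motif predictor: the function name `leucine_heptad_runs`, the `min_repeats` cutoff, and the toy sequence are all assumptions made for illustration, and real zipper detection would also consider the other heptad positions and helix propensity.

```python
def leucine_heptad_runs(sequence, min_repeats=4):
    """Return (start_index, repeats) for runs of leucines spaced 7 residues apart.

    Hypothetical helper for illustration only; indices are 0-based.
    """
    seq = sequence.upper()
    runs = []
    for start in range(len(seq)):
        if seq[start] != "L":
            continue
        # Skip leucines that merely continue a run begun 7 residues earlier.
        if start >= 7 and seq[start - 7] == "L":
            continue
        repeats = 0
        pos = start
        # Walk forward in steps of 7 while a leucine is found at each step.
        while pos < len(seq) and seq[pos] == "L":
            repeats += 1
            pos += 7
        if repeats >= min_repeats:
            runs.append((start, repeats))
    return runs


# Made-up toy sequence (not a real protein): a leucine at every 7th position.
toy = "LAAAAAA" * 4 + "L"
print(leucine_heptad_runs(toy))  # → [(0, 5)]
```

A run of four or more such leucines is only a hint that a zipper may be present; the structural context still has to confirm that the residues lie on one face of an α-helix.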
In the group of the β-sheet motifs, a hairpin, shown in the left image below, is probably the most widespread motif. It serves as a basic unit of antiparallel β-sheets.
The β-meander motif consists of three or more antiparallel β-strands arranged as hairpins. As noted in the CATH database, a large sheet may comprise sequential meanders to form the up-down beta barrel.
The Greek key motif contains four antiparallel β-strands, within which one of the strand connections (between strands 1 and 2 in the right image) is not a hairpin. It was named after a similar pattern observed in some ancient Greek pottery. It is a stable structural element found in many proteins, including immunoglobulin and γ-crystallin domains (shown on the right). More β-sheet motifs with variations in β-strand and β-sheet arrangements can be found in the CATH domain classification database.


The amino acid sequences in loop regions connecting secondary structure elements in motifs are often highly variable within a protein family. As noted in the discussion of protein sequence alignment, this is clearly evident in a multiple sequence alignment, where loop regions often contain many insertions and deletions, making them difficult to align. Nevertheless, when a loop has a specific function, for example, interaction with another protein domain, its sequence may be conserved. An interesting observation is that loops in proteins from organisms living at elevated temperatures (thermophilic organisms) are usually shorter than the corresponding loops in lower-temperature (mesophilic) family members. A shorter loop provides a protein domain with additional tertiary structural stability at high temperatures, helping to prevent unfolding and denaturation.
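One simple way to see where insertions and deletions concentrate is to compute the fraction of gap characters in each alignment column. The sketch below assumes the alignment is given as equal-length strings with "-" for gaps; the function name `gap_fraction_per_column`, the 0.5 cutoff, and the four toy sequences are invented for illustration and do not come from a real protein family.

```python
def gap_fraction_per_column(alignment, gap="-"):
    """Return the fraction of gap characters in each column of an alignment."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return [
        sum(row[i] == gap for row in alignment) / len(alignment)
        for i in range(length)
    ]


# Made-up toy alignment: columns 5 and 6 mimic a variable loop region.
toy_alignment = [
    "MKVLA--GDTW",
    "MKVLAPSGDTW",
    "MKVL---GDTW",
    "MKVLA-NGDTW",
]
fractions = gap_fraction_per_column(toy_alignment)
# Columns (0-based) where at least half of the sequences have a gap.
gappy = [i for i, f in enumerate(fractions) if f >= 0.5]
print(gappy)  # → [5, 6]
```

In a real alignment, gap-rich columns often map onto loops in the tertiary structure, which can be checked by comparing the column positions with the secondary structure assignment of a known family member.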

The image shows a ribbon representation of eye lens γS-crystallin, with two C-terminal domains in the crystal asymmetric unit. Each domain is composed of two 4-stranded Greek key motifs packed against each other (PDB ID 1HA4).
Protein Domains
Secondary structure elements and motifs are arranged to form the basic level of protein tertiary structure, called a domain. From known experimental protein tertiary structures, we have learned that a domain has a unique arrangement of secondary structure elements that gives it a specific fold (or topology, discussed further in the CATH tutorial). With tens of thousands of known structures, we now understand that there is only a limited number of stable folds. A fold is typically conserved within a specific protein family and is often linked to a particular role and function. While the amino acid sequences within a protein family are typically variable, the tertiary structure fold is highly conserved. However, similar domain folds are often found in unrelated protein families. The question that arises, then, is how to interpret this kind of similarity and how to classify protein folds.
The answer is that all protein structures are related to each other at some level. They are all amino acid polypeptides that form secondary structure elements like helices and strands, and they even form similar supersecondary structures (motifs). It is the secondary structure content and the connectivity between secondary structure elements that create unique folds. Based on this idea, the first level of separation among structures is determined by the secondary structure. There are three general classes: alpha (containing mostly α-helices), beta (mostly β-sheets), or mixed alpha/beta proteins. This level is called a Class in the CATH domain classification database. The second level, called Architecture, describes the arrangement of secondary structure elements in space, irrespective of their connectivity. As we have learned from the discussion of specific motifs, the connectivity between secondary structure elements can vary widely, yielding different structural types. When we consider the connectivity, we arrive at the third level of classification, which we refer to as a fold (or Topology in CATH). However, proteins with similar folds are not necessarily evolutionarily related. This brings us to the fourth level of classification, the homologous superfamily. As the name implies, proteins belonging to the same homologous superfamily are evolutionarily related and have a common ancestor. This terminology helps us group domains by specific characteristics to better understand their relationships and functions. Keeping these relationships in mind will be helpful when we discuss some domain examples here. In the CATH tutorial, we will further explore the database.
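The four levels can be made concrete with a small sketch. It assumes that a domain's classification is written as four dot-separated numbers, one per level (Class.Architecture.Topology.Homologous superfamily), and it compares two such codes to find the deepest level they share. The names `CathCode`, `parse_cath_code`, and `shared_level` are hypothetical helpers, not part of any CATH software, and the codes in the usage lines are used only to illustrate the format.

```python
from typing import NamedTuple, Optional


class CathCode(NamedTuple):
    """Cumulative identifiers for the four classification levels."""
    class_: str
    architecture: str
    topology: str
    superfamily: str


def parse_cath_code(code: str) -> CathCode:
    """Split a four-number code such as '3.40.50.720' into its levels."""
    parts = code.strip().split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected four dot-separated numbers, got {code!r}")
    c, a, t, h = parts
    return CathCode(c, f"{c}.{a}", f"{c}.{a}.{t}", f"{c}.{a}.{t}.{h}")


def shared_level(code_a: str, code_b: str) -> Optional[str]:
    """Return the name of the deepest level two domains share, or None."""
    names = ["Class", "Architecture", "Topology", "Homologous superfamily"]
    shared = None
    for name, x, y in zip(names, parse_cath_code(code_a), parse_cath_code(code_b)):
        if x != y:
            break
        shared = name
    return shared


# Same first three numbers, different fourth: same fold, different superfamily.
print(shared_level("3.40.50.720", "3.40.50.300"))  # → Topology
# Only the first number matches: same class only.
print(shared_level("3.40.50.720", "3.20.20.70"))   # → Class
```

Two domains that share a Topology but not a Homologous superfamily are exactly the case described above: a similar fold without evidence of common ancestry.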
To summarize, a domain can be characterized by the following:
- Spatially distinguishable folding unit of a protein tertiary structure.
- Serves as the basic unit for fold classification.
- It may have a specific, evolutionarily conserved function associated with it.
- Multidomain proteins may contain a domain (or more) that has a sequence and/or structural resemblance to protein domains in other protein families.
Here, I will first provide examples of one-domain proteins and their structures, followed by a discussion of multidomain proteins.
The Helix Bundle Domain
One of the most common structural domains in protein tertiary structure is the helix bundle domain, sometimes even called a structural motif. The images below illustrate three helix bundles with different interhelical connectivity. On the left is a stand-alone 4-helix bundle of a de novo–designed protein (PDB ID 1MFT); in the middle is the X-ray structure of human O6-alkylguanine-DNA alkyltransferase, which consists of two domains, an α-β domain and a helix bundle domain (PDB ID 1QNT). On the right is the tertiary structure of the human hormone progesterone receptor complexed with its ligand (PDB ID 1A28). To explore the structures in 3D, click the images to open the respective PDB entries.
Generally, in a helix bundle domain, the helices are interconnected by loop regions (either short or long) and often create a hydrophobic core at the center of the bundle. The examples in the figures demonstrate that the loops are relatively short in the left image, significantly longer in the middle image, and of mixed length in the right image. We should also be able to recognise an anti-parallel 3-strand β-sheet in the image in the middle.
The image below shows two schematic examples of possible helix connectivity in a 4-helix bundle. The upper packing (blue) shows an antiparallel bundle. The red bundle shows an example where two helices are parallel to each other and antiparallel to the second pair.

The Rossmann Fold Domain
An example of the strand-helix-strand (or beta-alpha-beta) motif is found in the Rossmann fold domain, named after Michael G. Rossmann, a protein crystallographer who solved the tertiary structure of lactate dehydrogenase (LDH), in which this domain was identified. The Rossmann fold is the only protein fold named after the person who discovered it. This domain type is widespread and can be found in many multidomain proteins that bind nucleotide cofactors, such as NADH, FAD, FMN, ATP, and GTP. An excellent discussion of the details and history of the Rossmann fold can be found on Proteopedia. There is also a short publication by I. Hanukoglu (2015) describing the topology and some subclasses of the domain. The publication notes that Proteopedia includes a list of over 1,000 PDB structures with the Rossmann fold domain.
The left image below illustrates a schematic representation of the Rossmann fold. It consists of a parallel 6-stranded β-sheet flanked by α-helices. The right image shows the 6-stranded parallel β-sheet of the Rossmann fold in the enzyme liver alcohol dehydrogenase (LADH). The NADH molecule bound at the top of the β-sheet is shown as a stick model (PDB ID 2OHX). For clarity, the two helices positioned on top of the sheet are omitted from the image.


To explore this structure and its domains in depth, we will turn to CATH. A search by PDB ID will return a list of “matching CATH superfamilies”, “matching CATH domains”, and “matching structures”. Here, we are interested in the domains (image below). The enzyme has two subunits, each containing two domains. One of the domains is the Rossmann fold domain (the lower image), while the second is an alpha-beta-type quinone oxidoreductase fold. CATH provides an opportunity to explore each of these domains in great detail. We will look at an example in the next section.

The TIM Barrel Domain
Triose phosphate isomerase (PDB ID 1TIM) is a single-domain protein and the first representative of the TIM barrel fold. The enzyme catalyzes the isomerization of a ketose (DHAP) to an aldose (GAP). The fold is widespread and can be found in many proteins. In the tertiary structure, the strands of the β-sheet are parallel and linked by loops and helices. Details of the mechanism and function of this protein can be found on Proteopedia. The left image is a schematic representation of the fold, while the right image is a ribbon representation of the tertiary structure of the triose phosphate isomerase dimer. Clicking on the image will take you to the PDB page.
Multidomain Proteins Example: Pyruvate Kinase
This is an example of a 3-domain protein: pyruvate kinase (PDB ID 1PKN). On the right is the ribbon representation of the tertiary structure. In most organisms, the functional unit of pyruvate kinase is a tetramer (four subunits), yielding a total of 12 domains in a functional unit. The domains are well-separated and exhibit different folds. However, it may be difficult to see the details of all domains and their folds in the image, or even in the interactive PDB viewer it links to. The best way is to ask CATH for assistance again. If we enter the PDB code (1PKN) into the CATH search field and then click “Matching CATH Domains”, we get the details of all three domains of pyruvate kinase. From there, we can continue exploring each domain separately and studying its tertiary structure: fold, potential function, family, etc.
The first is the C-terminal domain. It contains a 5-stranded antiparallel β-sheet flanked by helices on both sides. Its CATH Architecture is a 3-layer (αβα) sandwich, its Topology is pyruvate kinase, and its Homologous superfamily is Pyruvate kinase, C-terminal domain. The second domain’s Architecture is an Alpha-Beta barrel, and its Topology is a TIM barrel, just like the above example of triose phosphate isomerase. The third domain has a beta-barrel Architecture, and its Topology is M1 pyruvate kinase. It is actually the kinase domain, which catalyses the reaction: ATP + pyruvate = ADP + phosphoenolpyruvate.
Of course, we can continue exploring each domain in CATH to learn more about the protein families the domains belong to, their evolutionary relationships, and much more.
The following section provides a detailed example of the use of the CATH database.

Concluding Remarks
Listed here are just a few common domain folds. Other broadly known domain folds include the immunoglobulin fold, OB fold (oligonucleotide/oligosaccharide binding), zinc finger domain fold, SH2 and SH3 domains, the PDZ domain, and many others. Currently, CATH contains more than 600,000 domains grouped into over 6,500 superfamilies. We will explore some of them in the CATH tutorial.
We should also remember that the tertiary structure is a “product” of the protein sequence. Defining the domains within a multidomain protein is just the first step in the study process. Another essential step is to analyse each domain’s amino acid sequence to establish sequence-structure relationships and the relationships with other sequences within the protein family. An alignment will reveal conserved sequence motifs, the positions of functionally important amino acids within the tertiary structure, etc. Please see our introduction to protein sequence alignment and multiple sequence alignment pages.
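As a rough illustration of how an alignment exposes conserved positions, the sketch below reports the columns in which one residue is present in at least a chosen fraction of the sequences. It is a simplified stand-in for the conservation scores used by real alignment tools: the function name `conserved_columns`, the default threshold of 1.0 (fully conserved), and the three toy sequences are assumptions made for illustration only.

```python
from collections import Counter


def conserved_columns(alignment, threshold=1.0, gap="-"):
    """Return (column_index, residue) pairs for conserved alignment columns.

    A column counts as conserved when its most common non-gap residue occurs
    in at least `threshold` of all sequences. Indices are 0-based.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    hits = []
    for i in range(length):
        # Count residues only, so that gaps can never be reported as conserved.
        counts = Counter(row[i] for row in alignment if row[i] != gap)
        if not counts:
            continue
        residue, count = counts.most_common(1)[0]
        if count / len(alignment) >= threshold:
            hits.append((i, residue))
    return hits


# Made-up toy alignment (not from a real protein family).
toy_alignment = [
    "GDSGKTA",
    "GESGKSA",
    "GDTGKTL",
]
print(conserved_columns(toy_alignment))  # → [(0, 'G'), (3, 'G'), (4, 'K')]
```

Mapping such conserved columns onto the tertiary structure of one family member is a natural next step, since residues that are conserved across a family often cluster around a binding or catalytic site.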





