Consensus sequence

From Wikipedia, the free encyclopedia
Most common variant of a genetic sequence across samples

In molecular biology and bioinformatics, the consensus sequence (or canonical sequence) is the calculated sequence of the most frequent residues, either nucleotide or amino acid, found at each position in a sequence alignment. It represents the result of a multiple sequence alignment in which related sequences are compared to each other and similar sequence motifs are calculated. Such information is important when considering sequence-dependent enzymes such as RNA polymerase.[1]
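The definition above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `consensus`, the alphabetical tie-breaking rule, and the toy alignment are all assumptions made for the example, and gap characters are treated like any other symbol.

```python
from collections import Counter


def consensus(aligned_seqs):
    """Return the most frequent residue at each column of an alignment."""
    if not aligned_seqs:
        raise ValueError("need at least one sequence")
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("aligned sequences must have equal length")

    result = []
    for column in zip(*aligned_seqs):
        counts = Counter(column)
        # Assumption: ties are broken alphabetically so the output is deterministic.
        best = min(counts, key=lambda residue: (-counts[residue], residue))
        result.append(best)
    return "".join(result)


# Hypothetical toy alignment, invented for illustration only.
alignment = ["TATAAT", "TATGAT", "TACAAT", "GATAAT"]
print(consensus(alignment))  # TATAAT
```

Each column is reduced to a single letter, so the fact that position 4 contains a G in one of the four sequences is no longer visible in the output.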

To address the limitations of consensus sequences, which reduce variability to a single residue per position, sequence logos provide a richer visual representation of aligned sequences. Logos display each position as a stack of letters (nucleotides or amino acids), where the height of a letter corresponds to its frequency in the alignment, and the total stack height reflects the information content (measured in bits). The most frequent residue appears at the top of the stack, preserving the consensus while also revealing subtle patterns, such as functionally important but less frequent residues (e.g., alternative start codons or transcription factor binding sites).[2]
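The stack heights described above can be approximated as follows. This sketch assumes a four-letter DNA alphabet and computes the information content of a column as the maximum entropy (2 bits) minus the observed Shannon entropy. It deliberately omits the small-sample correction used in published logo methods, and the function names are hypothetical.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=4):
    """Information content of one alignment column in bits.

    Assumption: no small-sample correction is applied, and every symbol
    in the column (including gaps, if present) is counted as a residue.
    """
    counts = Counter(column)
    total = len(column)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return math.log2(alphabet_size) - entropy


def letter_heights(column, alphabet_size=4):
    """Height of each letter: its frequency times the column's information."""
    info = column_information(column, alphabet_size)
    total = len(column)
    return {residue: (c / total) * info for residue, c in Counter(column).items()}


print(column_information("AAAA"))  # 2.0 (fully conserved position)
print(column_information("ACGT"))  # 0.0 (no preference for any base)
print(letter_heights("AATT"))      # {'A': 0.5, 'T': 0.5}
```

A fully conserved position reaches the full 2 bits, whereas a position where all four bases are equally common carries no information and would be drawn with zero height.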

Example of consensus sequence of nucleotides

Biological significance

A protein binding site, represented by a consensus sequence, may be a short sequence of nucleotides that is found several times in the genome and is thought to play the same role in each of its locations. For example, many transcription factors recognize particular patterns in the promoters of the genes they regulate. In the same way, restriction enzymes usually have palindromic consensus sequences, which typically correspond to the site where they cut the DNA. Transposons act in much the same manner in their identification of target sequences for transposition. Finally, splice sites (sequences immediately surrounding the exon-intron boundaries) can also be considered consensus sequences.

Thus a consensus sequence is a model for a putative DNA binding site: it is obtained by aligning all known examples of a certain recognition site and is defined as the idealized sequence that represents the predominant base at each position. Actual examples are expected to differ from the consensus by no more than a few substitutions, but counting mismatches in this way can lead to inconsistencies.[3]
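Counting substitutions relative to the consensus amounts to a Hamming distance. The sketch below uses a hypothetical function name and invented example sites. Note that the count weights every position equally, regardless of how variable that position is in the alignment, which is one reason it can be a coarse measure.

```python
def mismatches(site, consensus):
    """Number of positions at which a site differs from the consensus."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have equal length")
    return sum(1 for a, b in zip(site, consensus) if a != b)


# Hypothetical consensus and candidate sites, invented for illustration only.
consensus_seq = "TATAAT"
for site in ["TATAAT", "TATGAT", "GACAAT"]:
    print(site, mismatches(site, consensus_seq))
# TATAAT 0
# TATGAT 1
# GACAAT 2
```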

Any mutation that makes a nucleotide in the core promoter sequence look more like the consensus sequence is known as an up mutation. This kind of mutation generally makes the promoter stronger: RNA polymerase binds more tightly to the DNA to be transcribed, and transcription is up-regulated. Conversely, mutations that destroy conserved nucleotides in the consensus sequence are known as down mutations. These mutations down-regulate transcription because RNA polymerase can no longer bind as tightly to the core promoter sequence.
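As a rough illustration of the up/down distinction, the following sketch labels a mutation by whether it moves a promoter closer to or further from the consensus. It is a simplification under stated assumptions: similarity is measured only by the number of mismatches, the "neutral" label and all names are hypothetical, and real promoter strength depends on more than mismatch counts.

```python
def hamming(a, b):
    """Number of differing positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(1 for x, y in zip(a, b) if x != y)


def classify_promoter_mutation(original, mutated, consensus):
    """Label a mutation 'up', 'down', or 'neutral' by distance to the consensus.

    Assumption: fewer mismatches to the consensus stands in for a stronger
    promoter; 'neutral' is a label introduced here for the unchanged case.
    """
    before = hamming(original, consensus)
    after = hamming(mutated, consensus)
    if after < before:
        return "up"
    if after > before:
        return "down"
    return "neutral"


# Hypothetical sequences, invented for illustration only.
consensus_seq = "TATAAT"
original = "TATGAT"
print(classify_promoter_mutation(original, "TATAAT", consensus_seq))  # up
print(classify_promoter_mutation(original, "TCTGAT", consensus_seq))  # down
print(classify_promoter_mutation(original, "TATCAT", consensus_seq))  # neutral
```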

Sequence analysis

Developing software for pattern recognition is a major topic in genetics, molecular biology, and bioinformatics. Specific sequence motifs can function as regulatory sequences controlling biosynthesis, or as signal sequences that direct a molecule to a specific site within the cell or regulate its maturation. Because the regulatory function of these sequences is important, they are thought to be conserved across long periods of evolution. In some cases, evolutionary relatedness can be estimated from the degree of conservation of these sites.

Notation

Conserved sequence motifs are called consensus sequences, and they show which residues are conserved and which are variable. Consider the following example DNA sequence:

A[CT]N{A}YR

In this notation, A means that an A is always found in that position; [CT] stands for either C or T; N stands for any base; and {A} means any base except A. Y represents any pyrimidine, and R indicates any purine. In this example, the notation [CT] gives no indication of the relative frequency of C or T at that position, and the pattern cannot be written as a single consensus sequence such as ACNCCA.

An alternative method of representing a consensus sequence uses a sequence logo. This is a graphical representation of the consensus sequence in which the size of a symbol is related to the frequency with which a given nucleotide (or amino acid) occurs at a certain position. In sequence logos, the more conserved the residue, the larger its symbol is drawn; the less frequent, the smaller the symbol. Sequence logos can be generated using WebLogo, or using the Gestalt Workbench, a publicly available visualization tool written by Gustavo Glusman at the Institute for Systems Biology.[3]
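The notation above maps naturally onto a regular expression. The sketch below handles only the symbols used in the example (A, C, G, T, N, Y, R, square brackets, and curly braces). The function name and the symbol table are assumptions made for illustration, not part of any standard tool.

```python
import re

# Single-letter codes used in the example; other ambiguity codes are not handled.
CODES = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "Y": "CT", "R": "AG"}


def pattern_to_regex(pattern):
    """Translate a pattern such as 'A[CT]N{A}YR' into a compiled regex."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "[":
            # [CT]: any one of the listed bases.
            end = pattern.index("]", i)
            parts.append("[" + pattern[i + 1:end] + "]")
            i = end + 1
        elif ch == "{":
            # {A}: any base except the listed ones.
            end = pattern.index("}", i)
            excluded = set(pattern[i + 1:end])
            allowed = "".join(b for b in "ACGT" if b not in excluded)
            parts.append("[" + allowed + "]")
            i = end + 1
        elif ch in CODES:
            parts.append("[" + CODES[ch] + "]")
            i += 1
        else:
            raise ValueError(f"unsupported symbol: {ch!r}")
    return re.compile("".join(parts))


motif = pattern_to_regex("A[CT]N{A}YR")
print(motif.pattern)                       # [A][CT][ACGT][CGT][CT][AG]
print(bool(motif.fullmatch("ACGCCA")))     # True
print(bool(motif.fullmatch("ACGACA")))     # False (A at the fourth position)
```

As with the written notation, the resulting pattern records which bases are allowed at each position but not how often each one occurs.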

Software

Bioinformatics tools can calculate and visualize consensus sequences. Examples include JalView and UGENE.

References

  1. Pierce, Benjamin A. (2002). Genetics: A Conceptual Approach (1st ed.). New York: W.H. Freeman and Co.
  2. Schneider TD; Stephens RM (1990). "Sequence Logos: A New Way to Display Consensus Sequences". Nucleic Acids Res. 18 (20): 6097–6100. doi:10.1093/nar/18.20.6097. PMC 332411. PMID 2172928.
  3. Schneider TD (2002). "Consensus Sequence Zen". Appl Bioinform. 1 (3): 111–119. PMC 1852464. PMID 15130839.