Using CLUSTAL for multiple sequence alignments
- PMID:8743695
- DOI: 10.1016/s0076-6879(96)66024-8
Abstract
We have tested CLUSTAL W in a wide variety of situations, and it is capable of handling some very difficult protein alignment problems. If the data set contains enough closely related sequences that the first alignments are accurate, CLUSTAL W will usually find an alignment that is very close to ideal. Problems can still occur if the data set includes sequences of greatly different lengths or if some sequences include long regions that are impossible to align with the rest of the data set. Balancing the need for long insertions and deletions in some alignments against the need to avoid them in others remains a problem.
The default parameter values were chosen empirically, using sets of globular proteins for which some information about the correct alignment was available. They may be less appropriate for nonglobular proteins. We have argued that using one weight matrix and two gap penalties is too simplistic to be of general use in the most difficult cases. We have replaced these parameters with a large number of new parameters designed primarily to encourage gaps in loop regions. Although these new parameters are largely heuristic, they perform surprisingly well and are simple to implement. The underlying speed of the progressive alignment approach is not adversely affected.
The disadvantage is that the parameter space is now huge; the number of possible parameter combinations is more than can easily be examined by hand. We justify this by asking the user to treat CLUSTAL W as a data exploration tool rather than as a definitive analysis method. It is not sensible to derive multiple alignments automatically and to trust any particular algorithm always to get the correct answer. One must examine the alignments closely, especially in conjunction with the underlying phylogenetic tree (or an estimate of it), and try varying some of the parameters.
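The advice to try varying some of the parameters can be made systematic with a small script. The following is a minimal sketch, not part of the original chapter: it only builds command lines for a grid of settings and does not run anything. It assumes a ClustalW 1.8x/2.x style command-line interface with the `-INFILE`, `-ALIGN`, `-GAPOPEN`, `-GAPEXT`, `-MATRIX`, and `-OUTFILE` options; the executable name `clustalw2`, the input file name, the chosen values, and the output naming scheme are all illustrative assumptions, so check them against the help text of the installed version.

```python
from itertools import product


def clustalw_commands(infile,
                      gap_open=(5.0, 10.0, 15.0),
                      gap_ext=(0.1, 0.2),
                      matrices=("BLOSUM", "GONNET"),
                      exe="clustalw2"):
    """Build one command line per parameter combination.

    The executable name, the default values, and the output file
    naming are assumptions made for this illustration.
    """
    commands = []
    for go, ge, mat in product(gap_open, gap_ext, matrices):
        # Encode the settings in the output name so runs can be compared.
        outfile = f"{infile}.go{go}_ge{ge}_{mat}.aln"
        commands.append([
            exe,
            f"-INFILE={infile}",
            "-ALIGN",
            f"-GAPOPEN={go}",
            f"-GAPEXT={ge}",
            f"-MATRIX={mat}",
            f"-OUTFILE={outfile}",
        ])
    return commands


if __name__ == "__main__":
    # "sh2.fasta" is a hypothetical input file.
    for cmd in clustalw_commands("sh2.fasta"):
        print(" ".join(cmd))
```

Each printed line could be passed to `subprocess.run` once the options have been verified. The resulting alignments would still need to be inspected by eye, since the point of the sweep is exploration rather than automatic selection of a "best" result.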
Outliers (sequences that have no close relatives) should be aligned carefully, as should fragments of sequences. By default, the program automatically delays the alignment of any sequence that is less than 40% identical to every other sequence until all other sequences are aligned; the user can change this threshold from a menu. It may be useful to build up an alignment of closely related sequences first and then to add the more distant relatives one at a time or in batches, using the profile alignments and weighting scheme described earlier and perhaps a variety of parameter settings.
We give one example using SH2 domains. SH2 domains are widespread in eukaryotic signalling proteins, where they function in the recognition of phosphotyrosine-containing peptides. In the chapter by Bork and Gibson ([11], this volume), BLAST and pattern/profile searches were used to extract the set of known SH2 domains and to search for new members. (Profiles used in database searches are conceptually very similar to the profiles used in CLUSTAL W: see chapters [11] and [13] for profile search methods.) The profile searches detected SH2 domains in the JAK family of protein tyrosine kinases, which had been thought not to contain SH2 domains. Although the JAK family SH2 domains are rather divergent, they have the necessary core structural residues as well as the critical positively charged residue that binds phosphotyrosine, leaving no doubt that they are bona fide SH2 domains. The five new JAK family SH2 domains were added sequentially to the existing alignment of 65 SH2 domains using the CLUSTAL W profile alignment option. Figure 6 shows part of the resulting alignment. Despite their divergent sequences, the new SH2 domains have been aligned nearly perfectly with the old set. No insertions were placed in the original SH2 domains. In this example, the profile alignment procedure has produced better results than a one-step full alignment of all 70 SH2 domains, and in considerably less time.
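The 40% delay rule for outliers described above can be illustrated with a short sketch. This is not CLUSTAL W's own code: it assumes that pairwise percent identities are already available, that identity is counted over columns where neither sequence has a gap, and that a sequence is delayed when its best identity to any other sequence falls below the cutoff. The function names and the example values are hypothetical.

```python
def percent_identity(a, b):
    """Percent identity of two already-aligned, equal-length sequences.

    Assumption for this sketch: only columns where neither sequence
    has a gap ('-') are counted.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def split_delayed(identity, names, cutoff=40.0):
    """Separate sequences aligned in the normal order from delayed ones.

    identity maps frozenset({name1, name2}) to a percent identity.
    A sequence is delayed when its highest identity to any other
    sequence is below the cutoff. Needs at least two names.
    """
    early, delayed = [], []
    for n in names:
        best = max(identity[frozenset((n, m))] for m in names if m != n)
        (delayed if best < cutoff else early).append(n)
    return early, delayed


if __name__ == "__main__":
    # Hypothetical identities for three sequences.
    names = ["A", "B", "C"]
    identity = {
        frozenset(("A", "B")): 85.0,
        frozenset(("A", "C")): 30.0,
        frozenset(("B", "C")): 25.0,
    }
    early, delayed = split_delayed(identity, names)
    print("align first:", early)
    print("delay:", delayed)
```

With these made-up values, sequence C has no relative at or above 40% identity, so it would be held back until A and B are aligned. Raising or lowering `cutoff` mimics changing the threshold from the program menu.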
(ABSTRACT TRUNCATED)