Phred (Phil's Read Editor[1]) is a computer program for base calling, that is to say, identifying a nucleobase sequence from fluorescence "trace" data generated by an automated DNA sequencer that uses electrophoresis and the four-fluorescent-dye method.[2][3] When originally developed, Phred produced significantly fewer errors in the data sets examined than other methods, averaging 40–50% fewer errors. Phred quality scores have become widely accepted to characterize the quality of DNA sequences, and can be used to compare the efficacy of different sequencing methods.
Fluorescent-dye DNA sequencing is a molecular biology technique that involves labeling single-strand DNA sequences of varied length with 4 fluorescent dyes (corresponding to the 4 different bases used in DNA) and subsequently separating the DNA sequences by "slab gel" or capillary electrophoresis (see DNA Sequencing). The electrophoresis run is monitored by a CCD on the DNA sequencer, which produces time "trace" data (or a "chromatogram") of the fluorescent "peaks" that pass the CCD point. By examining the fluorescence peaks in the trace data, we can determine the order of individual bases (nucleobases) in the DNA. However, since the intensity, shape and location of a fluorescence peak are not always consistent or unambiguous, it is sometimes difficult or time-consuming to determine (or "call") the correct bases for the peaks accurately by hand.
Automated DNA sequencing techniques have revolutionized the field of molecular biology, generating vast amounts of DNA sequence data. However, the sequence data is produced at a significantly higher rate than it can be manually processed (i.e. by interpreting the trace data to produce the sequence data), thereby creating a bottleneck. Removing the bottleneck requires both automated software that can speed up the processing with improved accuracy and a reliable measure of that accuracy. To meet this need, many software programs have been developed. One such program is Phred.
Phred was originally conceived in the early 1990s by Phil Green, then a professor at Washington University in St. Louis. LaDeana Hillier, Michael Wendl, David Ficenec, Tim Gleeson, Alan Blanchard, and Richard Mott also contributed to the codebase and algorithm. Green moved to the University of Washington in the mid-1990s, after which development was primarily managed by himself and Brent Ewing. Phred played a notable role in the Human Genome Project, where large amounts of sequence data were processed by automated scripts. It was at the time the most widely used base-calling software program by both academic and commercial DNA sequencing laboratories because of its high base-calling accuracy.[4] Phred is distributed commercially by CodonCode Corporation, and is used to perform the "Call bases" function in the program CodonCode Aligner. It is also used by the MacVector plugin Assembler.
Phred uses a four-phase procedure, as outlined by Ewing et al., to determine a sequence of base calls from the processed DNA sequence tracing.
The entire procedure is rapid, usually taking less than half a second per trace. The results can be output as a PHD file, which contains base data as triples consisting of the base call, quality, and position.[5]
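The triples described above can be read with a few lines of code. The following Python sketch is illustrative only: it assumes that the base-call records sit between `BEGIN_DNA` and `END_DNA` marker lines, one whitespace-separated triple per line, and the names `BaseCall`, `parse_phd_dna` and the sample read are hypothetical, not part of Phred itself.

```python
from typing import List, NamedTuple


class BaseCall(NamedTuple):
    """One record from a PHD file: base call, quality, trace position."""
    base: str
    quality: int
    position: int


def parse_phd_dna(text: str) -> List[BaseCall]:
    """Collect (base, quality, position) triples from PHD-style text.

    Assumption: records are listed between BEGIN_DNA and END_DNA lines.
    Everything outside that block (comments, headers) is ignored.
    """
    calls: List[BaseCall] = []
    in_dna = False
    for raw in text.splitlines():
        line = raw.strip()
        if line == "BEGIN_DNA":
            in_dna = True
        elif line == "END_DNA":
            in_dna = False
        elif in_dna and line:
            fields = line.split()
            if len(fields) < 3:
                raise ValueError(f"malformed base-call line: {line!r}")
            base, quality, position = fields[:3]
            calls.append(BaseCall(base, int(quality), int(position)))
    return calls


# Made-up sample data for illustration; not output from a real run.
SAMPLE = """BEGIN_SEQUENCE example_read
BEGIN_DNA
t 9 6
c 12 18
a 35 30
END_DNA
END_SEQUENCE
"""

calls = parse_phd_dna(SAMPLE)
sequence = "".join(call.base for call in calls)
```

Keeping each record as a named triple makes it straightforward to filter or trim a read by quality before passing it to downstream tools.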
Phred is often used together with another software program called Phrap, which is a program for DNA sequence assembly. Phrap was routinely used in some of the largest sequencing projects in the Human Genome Sequencing Project and is currently one of the most widely used DNA sequence assembly programs in the biotech industry. Phrap uses Phred quality scores to determine highly accurate consensus sequences and to estimate the quality of the consensus sequences. Phrap also uses Phred quality scores to estimate whether discrepancies between two overlapping sequences are more likely to arise from random errors, or from different copies of a repeated sequence.
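A Phred quality score Q is conventionally related to the estimated probability P that a base call is wrong by Q = −10 log10 P, so Q = 20 corresponds to a 1 in 100 chance of error and Q = 30 to 1 in 1000. The Python sketch below illustrates that conversion and how summing per-base error probabilities gives an expected error count for a read. The function names are hypothetical, and this is not a description of how Phrap weighs scores internally.

```python
import math
from typing import Iterable


def phred_to_error_prob(quality: float) -> float:
    """Convert a Phred quality score Q to an error probability P = 10^(-Q/10)."""
    return 10 ** (-quality / 10)


def error_prob_to_phred(prob: float) -> float:
    """Convert an error probability P (0 < P <= 1) to Q = -10 * log10(P)."""
    if not 0 < prob <= 1:
        raise ValueError("probability must be in the interval (0, 1]")
    return -10 * math.log10(prob)


def expected_errors(qualities: Iterable[float]) -> float:
    """Expected number of miscalled bases: the sum of per-base error probabilities."""
    return sum(phred_to_error_prob(q) for q in qualities)


# Made-up quality values for illustration.
read_qualities = [10, 20, 30]
total = expected_errors(read_qualities)  # 0.1 + 0.01 + 0.001
```

Because the scale is logarithmic, a single low-quality base can dominate the expected error count of an otherwise high-quality read.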