Pregap4 - Pregap4-Modules-Trace Difference

Trace Difference

Description: This module compares each sequence chromatogram against a "wild type" or reference chromatogram to detect point mutations. The mutations are detected by aligning and subtracting each trace from the wild type trace to produce a "difference trace". The difference trace is then analysed to identify point mutations which are written back to the Experiment File as MUTA tags. The basic idea is explained in the paper Bonfield, J.K., Rada, C. and Staden, R. Automated detection of point mutations using fluorescent sequence trace subtraction. Nucleic Acids Res. 26, 3404-3409 (1998). This implementation is the second version of the algorithm. The previous version used basecalls to do trace alignment. This led to problems when bases were called in error (often the case around mutations). The new algorithm ignores the basecalls completely and aligns the trace signals themselves, avoiding such problems. This is much more computationally intensive, but it has proved to be fast enough for interactive use. If the input files have sequenced from both strands then two wild type sequences may be given. In order for pregap4 to choose the appropriate wild type trace it needs to know the strand for each input sequence, which is typically generated using the naming convention. A simple naming scheme is provided with pregap4 (in the lib/pregap4/naming_schemes directory) called "mutation_detection.p4t". This can be loaded from the pregap4 file menu. It assumes that trace names have an 'f' or 'r' suffix, denoting the forward and reverse strands respectively. If you need something more complex, then you'll have to create and load your own naming scheme. If pregap4 cannot determine the strand, or if only one wild type is specified, then each input sequence will be compared against the +ve strand wild type. The reference or wild type traces for tracediff are specified in the see section Reference Traces and Reference Sequences.
Option: Sensitivity: This threshold is used to determine when an above/below baseline double peak in the difference trace is considered to be a mutation. It is specified in standard deviations from the mean over the analysis window. The higher the value, the more stringent the test. This value is reduced dynamically by the algorithm in the presense of mutations since small mutations near larger ones can often be missed with a uniform sensitivity setting. It's likely that some experimentation with this parameter will be required for optimal mutation detection in your data.
Option: Noise threshold: This threshold is used to filter out low level noise during the analysis phase. It is specified as a percentage of the maximum peak-to-peak trace difference value. A high threshold will lead to fewer false positives but you run the additional risk of missing low level mutations.
Option: Analysis window length: Analysis of the trace difference is done over a local region to counter the effects of non-stationarity in the trace signal. The analysis region is defined by a short window whose length is specified in bases. The window is asymmetric in that it's located to the left of the base it's positioned on. This avoids measurement problems when mutations are encountered. The window size is a tradeoff. If it's too big, low level mutations may be missed. If it's too small, there may be insufficient data to give unbiased measurements leading to many false positives.
Option: Maximum peak alignment deviation: The centres of each individual half-peak of a double peak above and below the baseline must align reasonably well for them to be considered to be real mutations. The amount of half-peak alignment deviation allowable is specified in bases by this parameter, usually as a fraction of one base.
Option: Maximum peak width: During analysis, the width of each peak is measured to avoid problems caused by gel artifacts. These often appear as broad peaks that overlay many bases. The maximum peak width is specified in bases. A lower value will lead to fewer false positives, but you run the additional risk of missing smeared mutations towards the end of a trace.
Option: Complement bases on reverse strand tags: After mutation detection and after readings have been assembled into a GAP4 database, GAP4 displays both forward and reverse readings in a single direction in the contig editor. This makes it much easier to compare sequences and traces in both directions simultaneously. When the corresponding traces are displayed, any reverse strand traces are complemented automatically such that the bases are interchanged. In this case, the original mutation tag generated by tracediff will then be of the wrong sense, so if checked, this option complements the tag base labels to match the complemented trace displayed by GAP4.
Option: Write difference traces out to disk: After trace difference analysis, the generated traces are normally discarded and not written to disk. Checking this option lets you save the trace difference files to the same directory as the original traces. The .ZTR trace format is used for this purpose. The original filename is retained and a "_diff.ztr" suffix is appended.

This page is maintained by staden-package. Last generated on 22 October 2002.
URL: http://www.mrc-lmb.cam.ac.uk/pubseq/manual/pregap4_unix_37.html