This module compares each sequence chromatogram against a "wild type" or
reference chromatogram to detect point mutations. The mutations are
detected by aligning and subtracting each trace from the wild type trace to
produce a "difference trace". The difference trace is then analysed to
identify point mutations which are written back to the Experiment File as
MUTA tags. The basic idea is explained in the paper Bonfield,
J.K., Rada, C. and Staden, R. Automated detection of point mutations using
fluorescent sequence trace subtraction. Nucleic Acids Res. 26, 3404-3409 (1998).
This implementation is the second version of the algorithm. The previous
version used basecalls to do trace alignment. This led to problems when
bases were called in error (often the case around mutations). The new algorithm
ignores the basecalls completely and aligns the trace signals themselves,
avoiding such problems. This is much more computationally intensive, but it
has proved to be fast enough for interactive use.
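The subtraction step can be pictured with a minimal sketch. It assumes the two traces have already been aligned sample for sample (the alignment itself is not shown), and the function name, the sign convention (wild type minus sample) and the data are illustrative only, not taken from tracediff.

```python
def difference_trace(wild_type, sample):
    """Subtract a sample channel from the wild type channel, point by point.

    Assumes both lists hold the signal for the same base channel and have
    already been aligned to the same length.
    """
    if len(wild_type) != len(sample):
        raise ValueError("traces must be aligned to the same length")
    return [w - s for w, s in zip(wild_type, sample)]


# Hypothetical data: the A peak is lost in the sample, a G peak is gained.
wild_a = [0, 10, 50, 10, 0]
samp_a = [0, 2, 5, 2, 0]
wild_g = [0, 1, 3, 1, 0]
samp_g = [0, 9, 48, 9, 0]

diff_a = difference_trace(wild_a, samp_a)   # rises above the baseline
diff_g = difference_trace(wild_g, samp_g)   # dips below the baseline
```

Taken together, the two channels show the above/below baseline double peak that the analysis stage looks for.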
If the input files have been sequenced from both strands then two wild type
sequences may be given. In order for pregap4 to choose the appropriate wild
type trace it needs to know the strand for each input sequence, which is
typically derived from the naming convention. A simple naming scheme is
provided with pregap4 (in the lib/pregap4/naming_schemes directory) called
"mutation_detection.p4t". This can be loaded from the pregap4 file menu. It
assumes that trace names have an 'f' or 'r' suffix, denoting the forward and
reverse strands respectively. If you need something more complex, then you'll
have to create and load your own naming scheme. If pregap4 cannot determine
the strand, or if only one wild type is specified, then each input sequence
will be compared against the +ve strand wild type.
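The selection rule above can be sketched as follows. This is not the pregap4 naming scheme itself. It assumes the 'f' or 'r' suffix is the last character of the trace name before any file extension, and the function names are hypothetical.

```python
import os


def strand_from_name(trace_name):
    """Return '+' for an 'f' suffix, '-' for an 'r' suffix, else None."""
    # Assumption: the suffix sits just before the file extension.
    stem = os.path.splitext(os.path.basename(trace_name))[0]
    if stem.endswith("f"):
        return "+"
    if stem.endswith("r"):
        return "-"
    return None


def choose_wild_type(trace_name, forward_wt, reverse_wt=None):
    """Pick the reference trace, falling back to the +ve strand one."""
    if reverse_wt is not None and strand_from_name(trace_name) == "-":
        return reverse_wt
    # Unknown strand, or only one wild type given: use the +ve strand.
    return forward_wt
```

For example, `choose_wild_type("sample1r.ztr", "wt_f", "wt_r")` picks the reverse wild type, while a name with no recognisable suffix falls back to the forward one.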
The reference or wild type traces for tracediff are specified elsewhere;
see section Reference Traces and Reference Sequences.
- Option: Sensitivity
This threshold is used to determine when an above/below baseline double
peak in the difference trace is considered to be a mutation. It is specified
in standard deviations from the mean over the analysis window. The higher the
value, the more stringent the test. This value is reduced dynamically
by the algorithm in the presence of mutations since small mutations near
larger ones can often be missed with a uniform sensitivity setting. It's
likely that some experimentation with this parameter will be required for
optimal mutation detection in your data.
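As a rough illustration of a threshold expressed in standard deviations from a window mean, consider the sketch below. It assumes a population standard deviation and a fixed sensitivity; the dynamic reduction described above is not modelled, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def exceeds_sensitivity(value, window_values, sensitivity):
    """True if value lies more than `sensitivity` standard deviations
    from the mean of the values in the analysis window."""
    mu = mean(window_values)
    sigma = pstdev(window_values)
    if sigma == 0:
        # A perfectly flat window: any departure from it stands out.
        return value != mu
    return abs(value - mu) > sensitivity * sigma
```

With a window of `[1, -1, 1, -1]` (mean 0, standard deviation 1) and a sensitivity of 4, a value of 5 passes the test and a value of 3 does not.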
- Option: Noise threshold
This threshold is used to filter out low level noise during the analysis
phase. It is specified as a percentage of the maximum peak-to-peak trace
difference value. A high threshold will lead to fewer false positives but
you run the additional risk of missing low level mutations.
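A minimal sketch of such a filter follows. It assumes "peak-to-peak" means the maximum minus the minimum of the difference trace and that samples below the resulting floor are simply zeroed; neither detail is confirmed by the text above.

```python
def apply_noise_threshold(diff, percent):
    """Zero every sample whose magnitude is below the noise floor."""
    peak_to_peak = max(diff) - min(diff)
    floor = peak_to_peak * percent / 100.0
    return [d if abs(d) >= floor else 0 for d in diff]


# Hypothetical difference trace with a peak-to-peak range of 100.
diff = [0, 3, -2, 40, -60, 4, 0]
filtered = apply_noise_threshold(diff, 10)   # floor is 10.0
```

Here only the two large excursions survive. Raising the percentage would eventually remove them too, which is the risk of missing low level mutations noted above.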
- Option: Analysis window length
Analysis of the trace difference is done over a local region to counter
the effects of non-stationarity in the trace signal. The analysis region is
defined by a short window whose length is specified in bases. The window is
asymmetric in that it's located to the left of the base it's positioned on.
This avoids measurement problems when mutations are encountered. The window
size is a tradeoff. If it's too big, low level mutations may be missed. If
it's too small, there may be insufficient data to give unbiased measurements
leading to many false positives.
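The left-sided window can be sketched as a simple slice over per-base measurements. The sketch assumes the base under test is itself excluded from the window, which the description suggests but does not state outright; the function name is hypothetical.

```python
def left_window(values, position, length):
    """Return up to `length` values immediately to the left of `position`.

    The value at `position` is excluded, so a mutation at the base being
    tested cannot distort the statistics measured over the window.
    """
    start = max(0, position - length)
    return values[start:position]
```

Near the start of a trace the window is shorter than requested, for example `left_window([1, 2, 3], 1, 5)` holds a single value, which shows why a window that is too small gives poorly supported measurements.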
- Option: Maximum peak alignment deviation
The centres of each individual half-peak of a double peak above and below
the baseline must align reasonably well for them to be considered to be
real mutations. The amount of half-peak alignment deviation allowable is
specified in bases by this parameter, usually as a fraction of one base.
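One way to picture this check is shown below. It assumes each half-peak centre is the sample with the largest excursion and that a known number of samples per base converts the offset to bases; both are simplifications and the names are hypothetical.

```python
def half_peaks_aligned(up_trace, down_trace, samples_per_base, max_deviation):
    """True if the positive and negative half-peaks are centred within
    `max_deviation` bases of each other."""
    # Centre of the half-peak above the baseline: index of the maximum.
    up_centre = max(range(len(up_trace)), key=lambda i: up_trace[i])
    # Centre of the half-peak below the baseline: index of the minimum.
    down_centre = min(range(len(down_trace)), key=lambda i: down_trace[i])
    deviation_in_bases = abs(up_centre - down_centre) / samples_per_base
    return deviation_in_bases <= max_deviation
```

Two half-peaks one sample apart are 0.1 bases apart at 10 samples per base, so they pass a limit of 0.2 bases; at 2 samples per base the same offset is 0.5 bases and fails.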
- Option: Maximum peak width
During analysis, the width of each peak is measured to avoid problems caused
by gel artifacts. These often appear as broad peaks that overlay many bases.
The maximum peak width is specified in bases. A lower value will lead to
fewer false positives, but you run the additional risk of missing smeared
mutations towards the end of a trace.
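The sketch below shows one possible width measurement, the width of the tallest peak at half its height. How tracediff actually measures width is not stated here, so the half-height rule, the samples-per-base conversion and the function name are all assumptions.

```python
def peak_width_in_bases(trace, samples_per_base):
    """Width of the tallest peak at half its height, in bases."""
    magnitudes = [abs(v) for v in trace]
    centre = max(range(len(magnitudes)), key=lambda i: magnitudes[i])
    half = magnitudes[centre] / 2.0
    # Walk outwards from the centre while the signal stays above half height.
    left = centre
    while left > 0 and magnitudes[left - 1] >= half:
        left -= 1
    right = centre
    while right < len(magnitudes) - 1 and magnitudes[right + 1] >= half:
        right += 1
    return (right - left + 1) / samples_per_base


narrow = [0, 5, 40, 5, 0]
broad = [0, 30, 35, 40, 38, 33, 31, 0]
```

At 4 samples per base the narrow peak measures 0.25 bases and the broad one 1.5 bases, so a maximum peak width of one base would keep the first and reject the second as a likely gel artifact.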
- Option: Complement bases on reverse strand tags
After mutation detection and after readings have been assembled into a GAP4
database, GAP4 displays both forward and reverse readings in a single direction
in the contig editor. This makes it much easier to compare sequences and traces
in both directions simultaneously. When the corresponding traces are displayed,
any reverse strand traces are complemented automatically such that the bases are
interchanged. The original mutation tag generated by tracediff would then be
of the wrong sense. When checked, this option complements the tag base
labels to match the complemented trace displayed by GAP4.
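Complementing the base labels amounts to a character substitution, as in the sketch below. The label string used is hypothetical and does not reflect the actual layout of a MUTA tag.

```python
# Map each base to its complement, preserving case.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement_bases(bases):
    """Complement each base letter; other characters are left unchanged.

    The order of the letters is kept, since the labels describe a single
    position rather than a stretch of sequence to be reversed.
    """
    return bases.translate(COMPLEMENT)
```

For instance, a hypothetical label pair "AG" (wild type A, mutant G) on a reverse strand reading becomes "TC".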
- Option: Write difference traces out to disk
After trace difference analysis, the generated traces are normally discarded and not
written to disk. Checking this option lets you save the trace difference files to
the same directory as the original traces. The .ZTR trace format is used for this
purpose. The original filename is retained and a "_diff.ztr" suffix is appended.
Last generated on 22 October 2002.