- Description
-
This module determines where the sequence quality is too poor to use for
reliable assembly. It supersedes the Uncalled Base Clip module. It uses the
qclip program, which reads and writes Experiment Files. Its default
quality evaluation is based on the range of values produced by the Estimate
Base Accuracies module (quality value 70, averaged over 100 bases). For use
with phred, try lower values such as quality value 15 averaged over 50 bases.
When quality values are not available it uses the same method as the
Uncalled Base Clip module: it analyses the base calls and counts the number of
undetermined bases within a given window of sequence. Both 5' and 3' ends may
be quality clipped.
For the confidence mode of clipping the method starts from the point of
highest average quality, and then steps outwards in both directions until the
average quality is below a defined threshold.
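The confidence mode described above can be sketched roughly as follows. This
is a minimal illustration only, not the qclip source: the function name
clip_by_confidence, its parameter names, and the exact handling of window
edges are assumptions made for the example.

```python
def clip_by_confidence(conf, window=50, min_avg=15.0):
    """Return a half-open range (left, right) of good-quality bases,
    or None if no window reaches the threshold.

    Hypothetical sketch: names and boundary handling are assumptions,
    not taken from qclip.
    """
    n = len(conf)
    if n == 0:
        return None
    w = min(window, n)

    # Prefix sums make each window average an O(1) lookup.
    prefix = [0]
    for q in conf:
        prefix.append(prefix[-1] + q)

    def avg(i):
        # Average confidence of the window starting at base i.
        return (prefix[i + w] - prefix[i]) / w

    # Start from the window with the highest average quality.
    best = max(range(n - w + 1), key=avg)
    if avg(best) < min_avg:
        return None

    # Step outwards towards the 5' end while the average stays acceptable.
    lo = best
    while lo > 0 and avg(lo - 1) >= min_avg:
        lo -= 1

    # Step outwards towards the 3' end in the same way.
    hi = best
    while hi < n - w and avg(hi + 1) >= min_avg:
        hi += 1

    return lo, hi + w


print(clip_by_confidence([5] * 10 + [40] * 30 + [5] * 10, window=5, min_avg=20))
```

Because the test is on a window average, the range returned by this sketch can
include a few low-confidence bases at each edge; the real program may place
the clip points differently.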
For the sequence mode of clipping the method starts from a defined position
and steps outwards in both directions until the number of uncalled bases
within a given window length exceeds a predefined threshold. For more details
see the qclip documentation (see section qclip).
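The sequence mode can be sketched in the same spirit. Again this is an
illustration under stated assumptions rather than the qclip implementation:
the function names, the treatment of any character other than A, C, G or T as
an uncalled base, and the way windows are truncated at the ends of the reading
are all assumptions.

```python
def count_uncalled(bases):
    # Assumption: anything that is not A, C, G or T counts as uncalled.
    return sum(1 for b in bases if b.upper() not in "ACGT")


def clip_by_sequence(seq, start, win5, max5, win3, max3):
    """Return a half-open range (left, right) of usable bases.

    Hypothetical sketch: separate window lengths and thresholds are used
    for the 5' and 3' searches, mirroring the module options.
    """
    n = len(seq)
    start = max(0, min(start, n))

    # 3' search: slide a window rightwards from the start position and stop
    # at the first window holding more than max3 uncalled bases.
    right = n
    for p in range(start, n):
        if count_uncalled(seq[p:p + win3]) > max3:
            right = p
            break

    # 5' search: slide a window leftwards from the start position and stop
    # at the first window holding more than max5 uncalled bases.
    left = 0
    for p in range(start, 0, -1):
        if count_uncalled(seq[max(0, p - win5):p]) > max5:
            left = p
            break

    return left, right


reading = "NNNNN" + "ACGT" * 10 + "NNNNN"
print(clip_by_sequence(reading, start=25, win5=5, max5=2, win3=5, max3=2))
```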
Note that the Phrap assembly algorithm works best without quality clipping:
because it uses the Phred confidence values, it can make use of the full
length of readings.
- Option: Clip mode
-
This may be either "by sequence" or "by confidence". The "by sequence" mode is
equivalent to the Uncalled Base Clip module. The "by confidence" mode uses
Phred-scaled confidence values to determine the quality for clipping. This
does not work with eba confidence values.
- Option: Minimum extent
-
The lowest allowable 5' clip position.
- Option: Maximum extent
-
The largest allowable 3' clip position.
- Option: Minimum length
-
If after quality clipping the good portion of a sequence is shorter than the
specified length, then this file will be rejected with the message "qclip:
Sequence too short".
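How the Minimum extent, Maximum extent and Minimum length options might
combine can be sketched as below. The function apply_limits is hypothetical,
and the order of operations (restricting the clip points first, then testing
the length) is an assumption; only the error message text comes from the
description above.

```python
def apply_limits(left, right, min_extent, max_extent, min_length):
    """Restrict clip points to the allowed extents, then check the length.

    Hypothetical sketch: (left, right) is a half-open range of good bases.
    """
    # The 5' clip point may not fall below the minimum extent.
    left = max(left, min_extent)
    # The 3' clip point may not exceed the maximum extent.
    right = min(right, max_extent)

    # Reject the reading if the remaining good portion is too short.
    if right - left < min_length:
        raise ValueError("qclip: Sequence too short")
    return left, right


print(apply_limits(8, 42, min_extent=10, max_extent=40, min_length=20))
```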
- Option: Window length
-
The window length over which the confidence will be averaged.
This option is only relevant for the "clip by confidence" mode.
- Option: Average confidence
-
The minimum average confidence (over "window length" bases) for sequence to be
accepted as good quality.
This option is only relevant for the "clip by confidence" mode.
- Option: Start offset
-
The base number to start the 5' and 3' good quality searches from.
This option is only relevant for the "clip by sequence" mode.
- Option: 3' window length
-
The window length in which to count uncalled bases.
This option is only relevant for the "clip by sequence" mode.
- Option: 3' number of uncalled bases
-
The maximum allowed count of uncalled bases in a single window length.
This option is only relevant for the "clip by sequence" mode.
- Option: 5' window length
-
The window length in which to count uncalled bases.
This option is only relevant for the "clip by sequence" mode.
- Option: 5' number of uncalled bases
-
The maximum allowed count of uncalled bases in a single window length.
This option is only relevant for the "clip by sequence" mode.
This page is maintained by staden-package.
Last generated on 22 October 2002.
URL: http://www.mrc-lmb.cam.ac.uk/pubseq/manual/pregap4_unix_24.html