qclip -- an Experiment File sequence clipper
Usage when confidence values are available (default mode):
qclip [-c] [-vt] [-m minimum_extent] [-M maximum_extent] [-w window_length]
[-q average_quality] file ...
Usage when confidence values are not available or are to be ignored:
qclip [-n] [-vt] [-m minimum_extent] [-M maximum_extent] [-s start_offset]
[-R r_length] [-r r_unknown] [-L l_length] [-l l_unknown] file ...
Qclip is a simple program to decide how much of the 5' and 3' ends of a
sequence, stored as an Experiment File, should be clipped off, i.e. marked
to be ignored during assembly.
The decision is made either by analysing the average confidence levels
stored in the Experiment File (or an associated trace file), or by
counting the number of unknown bases (e.g. - or N) found within
windows slid left to right along the sequence.
Large numbers of files can be processed in a single run; each file
argument is assumed to be a valid Experiment File. The sequence
is read from the Experiment File SQ record and the trace is read
using the LN and LT identifiers. Clipping is then performed
and QL and QR identifiers are appended to the file.
For the default mode of clipping by confidence levels, the program first
finds the region of highest average quality. A window is then slid from this
point both rightwards and leftwards until the average quality over that
window length (specified with the -w argument) drops below the
average_quality argument (specified with -q). The exact position of the clip
point within that window is determined by successively decreasing the window
length.
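The confidence-based search described above can be sketched in Python. This
is only an illustration under stated assumptions, not the qclip source: the
function names, the 0-based half-open return value, the ">=" comparison
against the threshold, and the way the window is shrunk one base at a time
are all choices made for this sketch. How the result maps onto the QL and QR
values is not shown.

```python
def best_window_start(quals, w):
    """Start index of the length-w window with the highest total quality."""
    total = sum(quals[:w])
    best_total, best_start = total, 0
    for start in range(1, len(quals) - w + 1):
        # Update the running total as the window moves one base right.
        total += quals[start + w - 1] - quals[start - 1]
        if total > best_total:
            best_total, best_start = total, start
    return best_start


def slide_until_poor(quals, start, w, threshold):
    """Slide a window rightwards from `start`, shrinking it whenever its
    average falls below `threshold`; return the index where a window of
    length 1 finally fails (or len(quals) if the end is reached)."""
    pos = start
    while w > 0:
        while pos + w <= len(quals) and sum(quals[pos:pos + w]) >= threshold * w:
            pos += 1
        w -= 1  # assumption: narrow the window by one base and continue
    return pos


def clip_by_quality(quals, window_length=30, average_quality=10):
    """Return (left, right), a half-open span of bases to keep."""
    n = len(quals)
    w = min(window_length, n)
    if w == 0:
        return 0, 0
    seed = best_window_start(quals, w)
    right = slide_until_poor(quals, seed, w, average_quality)
    # The leftward pass is the rightward pass run on the reversed list.
    reversed_end = slide_until_poor(quals[::-1], n - (seed + w), w, average_quality)
    left = n - reversed_end
    # Guard for this sketch only: never return an inverted span.
    return min(left, right), right
```

For example, with five poor bases on each side of ten good ones,
clip_by_quality([5]*5 + [20]*10 + [5]*5, window_length=3) keeps the span
(5, 15) in this sketch.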
When confidence values are not available, or when the -n argument is
used, only the sequence base calls are analysed. In this
case the right clip position is calculated by sliding a window of
length r_length rightwards along the sequence, starting from base
start_offset, and stopping when a window containing at least
r_unknown unknown bases is found.
The left clip position is calculated by
sliding a window leftwards from base start_offset. The
algorithm used is identical to that used for the right clip position, except
that the l_unknown and l_length parameters are used.
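The unknown-base search can be sketched in Python as follows. This is an
illustration rather than the qclip implementation: the function names, the
0-based indexing, the set of characters treated as unknown, and the choice
of reporting the window edge nearest start_offset as the clip position are
assumptions of this sketch. No default values are given because the text
above does not state them.

```python
UNKNOWN = frozenset("-N")  # assumption: only these count as unknown bases


def count_unknown(window):
    """Number of unknown base calls in a string of bases."""
    return sum(base in UNKNOWN for base in window)


def right_clip(seq, start_offset, r_length, r_unknown):
    """Slide a window of r_length rightwards from start_offset and return
    the start of the first window holding at least r_unknown unknown bases,
    or len(seq) if no such window exists."""
    for start in range(start_offset, len(seq) - r_length + 1):
        if count_unknown(seq[start:start + r_length]) >= r_unknown:
            return start
    return len(seq)


def left_clip(seq, start_offset, l_length, l_unknown):
    """Slide a window of l_length leftwards from start_offset and return
    the (exclusive) end of the first window holding at least l_unknown
    unknown bases, or 0 if no such window exists."""
    for end in range(start_offset, l_length - 1, -1):
        if count_unknown(seq[end - l_length:end]) >= l_unknown:
            return end
    return 0
```

With seq = "NNANN" + "ACGT" * 5 + "NNCNN", a start_offset of 15, and both
windows set to length 5 with 3 unknowns, this sketch gives a left position
of 6 and a right position of 24, so seq[6:24] contains no unknown bases.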
The default arguments are "-c -m 0 -M 9999 -w 30 -q 10".
-v
-t
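-c
    Clip by confidence levels (the default mode).
-n
    Ignore confidence values and analyse only the sequence base calls.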
-m extent
    If clipping gives a QL clip value of less than extent, use extent as the
    QL value.
-M extent
    If clipping gives a QR clip value of more than extent, use extent as the
    QR value.
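-w window_length
    Window length used when clipping by confidence levels.
-q average_quality
    Average quality below which a window marks a clip point.
-s offset
    Base (start_offset) from which the windows are slid when counting
    unknown bases.
-R length
    Window length (r_length) used to find the right clip position.
-r unknown
    Number of unknown bases (r_unknown) in a window that sets the right clip
    position.
-L length
    Window length (l_length) used to find the left clip position.
-l unknown
    Number of unknown bases (l_unknown) in a window that sets the left clip
    position.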
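The effect of the -m and -M limits amounts to clamping the computed clip
values. A minimal Python sketch, in which the function name apply_extents is
hypothetical and the defaults are taken from the default arguments listed
above:

```python
def apply_extents(ql, qr, minimum_extent=0, maximum_extent=9999):
    """Raise QL to at least minimum_extent and lower QR to at most
    maximum_extent, leaving values already inside the limits unchanged."""
    return max(ql, minimum_extent), min(qr, maximum_extent)
```

For instance, apply_extents(12, 480, minimum_extent=20) returns (20, 480),
which corresponds to running with -m 20.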
To clip a batch of sequences listed in the `fofn' file with a minimum left clip value of 20 bases use:
qclip -m 20 `cat fofn`
See section ExperimentFile(4).