
Qclip

NAME

qclip -- an Experiment File sequence clipper

SYNOPSIS

Usage when confidence values are available (default mode):

qclip [-c] [-vt] [-m minimum_extent] [-M maximum_extent] [-w window_length] [-q average_quality] file ...

Usage when confidence values are not available or are to be ignored:

qclip [-n] [-vt] [-m minimum_extent] [-M maximum_extent] [-s start_offset] [-R r_length] [-r r_unknown] [-L l_length] [-l l_unknown] file ...

DESCRIPTION

Qclip is a simple program that decides how much of the 5' and 3' ends of a sequence, stored as an Experiment File, should be clipped off, i.e. marked to be ignored during assembly.

The decision is made either by analysing the average confidence levels stored in the Experiment File (or an associated trace file), or by counting the number of unknown bases (e.g. - or N) found within windows slid along the sequence.

Many files can be processed in a single run; each file argument is assumed to be a valid Experiment File. For each file, the sequence is read from the SQ record and the trace is located using the LN and LT identifiers. Once the clip points have been calculated, QL and QR identifiers are appended to the file.

In the default mode of clipping by confidence levels, the program first finds the region of highest average quality. A window is then slid from this point both rightwards and leftwards until the average quality over the window length (specified with the -w argument) drops below the average_quality value (specified with the -q argument). The exact position of the clip point within that window is then determined by successively decreasing the window length.
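The confidence-mode search described above can be sketched in Python as follows. This is an illustrative reading of the description, not the qclip source: the function names, the 0-based half-open return range, the handling of windows at the ends of the sequence, and the empty result when even the best window is below the threshold are all assumptions. No attempt is made to convert the result to QL and QR values.

```python
def scan_right(quals, start, window, min_avg):
    """Slide rightwards from `start`, shrinking the window each time it fails.

    Returns the exclusive right end of the region to keep (assumed convention).
    """
    n = len(quals)
    pos = start
    for w in range(window, 0, -1):
        # Advance while the window [pos, pos + w) fits and is good enough.
        while pos + w <= n and sum(quals[pos:pos + w]) >= min_avg * w:
            pos += 1
    return pos


def scan_left(quals, end, window, min_avg):
    """Mirror image of scan_right; `end` is the exclusive right edge."""
    pos = end
    for w in range(window, 0, -1):
        # Retreat while the window [pos - w, pos) fits and is good enough.
        while pos - w >= 0 and sum(quals[pos - w:pos]) >= min_avg * w:
            pos -= 1
    return pos


def clip_by_confidence(quals, window=30, min_avg=10):
    """Return (left, right): 0-based half-open range of bases to keep.

    The defaults mirror the documented -w 30 and -q 10.
    """
    n = len(quals)
    if n == 0:
        return 0, 0
    window = max(1, min(window, n))  # assumption: shrink for short reads
    # Step 1: start of the window with the highest average quality.
    best = max(range(n - window + 1), key=lambda s: sum(quals[s:s + window]))
    if sum(quals[best:best + window]) < min_avg * window:
        return 0, 0  # assumption: nothing is good enough to keep
    # Step 2: extend in both directions from the best window.
    right = scan_right(quals, best, window, min_avg)
    left = scan_left(quals, best + window, window, min_avg)
    if left >= right:
        return 0, 0  # defensive: report an empty range if the scans cross
    return left, right
```

For a read whose confidence values are 5 for the first 20 bases, 40 for the next 50 and 5 for the last 30, clip_by_confidence(quals, window=10, min_avg=10) returns (20, 70) in this sketch.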

When confidence values are not available, or when the -n argument is used, only the sequence base calls are analysed. In this case the right clip position is calculated by sliding a window of length r_length rightwards along the sequence, starting from base start_offset, and stopping when a window containing at least r_unknown unknown bases is found. The left clip position is calculated in the same way, by sliding a window leftwards from base start_offset, except that the l_length and l_unknown parameters are used.
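The unknown-base mode can be sketched in the same way. Again this is only a reading of the description under stated assumptions: the function names are hypothetical, positions are 0-based, only "-" and "N" are treated as unknown, and the clip point is taken to be the edge of the first offending window nearest start_offset. No default values are given for these parameters because this page does not state them.

```python
UNKNOWN = frozenset("-N")  # assumption: only these two symbols count as unknown


def count_unknown(seq, start, length):
    """Number of unknown bases in seq[start:start + length]."""
    return sum(1 for base in seq[start:start + length] if base in UNKNOWN)


def right_clip(seq, start_offset, r_length, r_unknown):
    """Slide a window rightwards from start_offset.

    Returns the 0-based start of the first window holding at least
    r_unknown unknown bases, or len(seq) if no such window exists.
    """
    for pos in range(start_offset, len(seq) - r_length + 1):
        if count_unknown(seq, pos, r_length) >= r_unknown:
            return pos
    return len(seq)


def left_clip(seq, start_offset, l_length, l_unknown):
    """Slide a window leftwards from start_offset.

    Returns the exclusive end of the first window holding at least
    l_unknown unknown bases, or 0 if none is found. An l_length of
    zero disables the search.
    """
    if l_length == 0:
        return 0
    for end in range(start_offset, l_length - 1, -1):
        if count_unknown(seq, end - l_length, l_length) >= l_unknown:
            return end
    return 0
```

With seq = "NN-NN" + "ACGT" * 5 + "NNNNN", a start_offset of 10, window lengths of 5 and thresholds of 3, this sketch gives a left clip of 7 and a right clip of 23.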

The default arguments are "-c -m 0 -M 9999 -w 30 -q 10".

OPTIONS

-v
Enable verbose output. This outputs information on which files are currently being clipped.
-t
Test mode. The QL and QR information is written to stdout instead of being appended to the Experiment file.
-c
Clip by confidence levels. This is the default mode of operation.
-n
Clip by unknown base calls, even when confidence values are available.
-m extent
If the clip algorithm returns a QL clip value of less than extent, use extent as the QL value.
-M extent
If the clip algorithm returns a QR clip value of more than extent, use extent as the QR value.
-w window_length
Only used for the confidence level clipping mode. The window length over which the average confidence value is computed.
-q average_quality
Only used for the confidence level clipping mode. The minimum average confidence that a window must have to be considered good quality sequence.
-s offset
Only used for the unknown base clipping mode. Force the first window to start the calculations from position offset in the sequence. This can be useful to avoid poor data at the 5' end of a sequence.
-R length
Only used for the unknown base clipping mode. Set the length of the first rightwards window to length.
-r unknown
Only used for the unknown base clipping mode. Stop sliding the first rightwards window when the current window contains at least unknown unknown bases.
-L length
Only used for the unknown base clipping mode. Set the length of the second (leftwards) window to length. Setting this value to zero prevents the second window calculations from being performed.
-l unknown
Only used for the unknown base clipping mode. Stop sliding the second (leftwards) window when the current window contains at least unknown unknown bases.
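
The effect of the -m and -M options on the calculated clip values can be illustrated with a short Python sketch. The function name is hypothetical; the default values are the documented defaults of 0 and 9999.

```python
def apply_extents(ql, qr, min_extent=0, max_extent=9999):
    """Apply the -m and -M limits to calculated QL and QR values."""
    if ql < min_extent:
        ql = min_extent  # -m: QL is never less than min_extent
    if qr > max_extent:
        qr = max_extent  # -M: QR is never more than max_extent
    return ql, qr
```

For example, apply_extents(5, 300, min_extent=20) returns (20, 300).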

EXAMPLE

To clip a batch of sequences whose file names are listed in the file `fofn', with a minimum left clip value of 20 bases, use:

qclip -m 20 `cat fofn`
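
The backquoted `cat fofn` expands to the whitespace-separated file names held in fofn. The same argument list can be built from a script, as in the following Python sketch. The helper name is hypothetical, and it assumes that the file names contain no whitespace, as the shell form does.

```python
def build_qclip_command(fofn_text, min_left=20):
    """Build the qclip argument list from the contents of a file of file names."""
    names = fofn_text.split()  # same word splitting as `cat fofn`
    return ["qclip", "-m", str(min_left), *names]


# To run the command (requires qclip on the PATH):
#   import subprocess
#   with open("fofn") as handle:
#       subprocess.run(build_qclip_command(handle.read()), check=True)
```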

SEE ALSO

See section ExperimentFile(4).


This page is maintained by staden-package. Last generated on 22 October 2002.
URL: http://www.mrc-lmb.cam.ac.uk/pubseq/manual/manpages_unix_13.html