Programming with Gap4 - G4Comm-get

get_consensus

get_consensus
 -io            io_handle
 -contigs       identifiers:strings
 -outfile       filename:string
?-type          type:string(normal)?
?-mask          mask:string(none)?
?-tag_types     types:string()?
?-win_size      length:integer(0)?
?-max_dashes    count:integer(0)?
?-format        format:integer(3)?
?-annotations   annotations:integer(0)?
?-truncate      truncate:integer(0)?

This command calculates the consensus sequence for one or more contigs and saves it to a file. The function returns no value but will generate a Tcl error if an error occurs.

-io io_handle

The database IO handle returned from a previous open_db call.

-contigs identifiers

This specifies the list of contigs for which the consensus is calculated. The {contig start end} syntax may be used for an identifier to process only a region of the contig; otherwise the whole contig is used.
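As an illustration of the identifier syntax, the following Python sketch assembles a -contigs value from a mix of whole-contig identifiers and (contig, start, end) regions. It is not part of Gap4: format_contigs and the reading names in the example are hypothetical, and the sketch rejects identifiers that would need Tcl quoting rather than attempting to quote them.

```python
def _check_name(name):
    # No Tcl quoting is attempted, so refuse characters that would need it.
    if not name or any(c.isspace() or c in '{}[]"$\\;' for c in name):
        raise ValueError("identifier needs Tcl quoting: %r" % (name,))


def format_contigs(identifiers):
    """Build a Tcl list string for the get_consensus -contigs option.

    Each item is either a plain identifier (the whole contig is used)
    or a (name, start, end) tuple, emitted as {name start end}.
    """
    parts = []
    for ident in identifiers:
        if isinstance(ident, tuple):
            name, start, end = ident
            _check_name(name)
            if start > end:
                raise ValueError("start must not exceed end: %r" % (ident,))
            parts.append("{%s %d %d}" % (name, start, end))
        else:
            _check_name(ident)
            parts.append(ident)
    return " ".join(parts)
```

For example, format_contigs(["xb54a3.s1", ("xb62b4.s1", 1, 500)]) returns "xb54a3.s1 {xb62b4.s1 1 500}", which would be given to the command as -contigs {xb54a3.s1 {xb62b4.s1 1 500}}.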

-outfile filename

Specifies the filename to write the consensus sequence to. This has no default value.

-type type

This specifies the final output type for the consensus algorithm. Valid types are:

normal

The standard consensus sequence consisting of A, C, G, T, - and *.

extended

As for normal, except that the cutoff data at the ends of contigs is used to extend the consensus sequence beyond the well-defined contig ends.

unfinished

The consensus sequence in single-stranded regions is output as a, c, g and t, whilst the consensus for finished regions is listed as d, e, f and i (for a, c, g and t respectively). The quality of each base is output instead of the consensus base. The base quality is listed as a single letter from the following table, which shows the quality of each strand independently.

a: Good Good (in agreement)
b: Good Bad
c: Bad Good
d: Good None
e: None Good
f: Bad Bad
g: Bad None
h: None Bad
i: Good Good (disagree)
j: None None
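Decoding this table can be sketched in Python as below, assuming a string containing only quality codes (one per base, no line breaks). QUALITY_CODES, decode_quality and weak_positions are hypothetical names; the strands are called "first" and "second" only in the order of the table columns, since the table does not say which strand is which.

```python
# Quality codes from the table: (first strand, second strand, note).
QUALITY_CODES = {
    "a": ("Good", "Good", "agree"),
    "b": ("Good", "Bad", None),
    "c": ("Bad", "Good", None),
    "d": ("Good", "None", None),
    "e": ("None", "Good", None),
    "f": ("Bad", "Bad", None),
    "g": ("Bad", "None", None),
    "h": ("None", "Bad", None),
    "i": ("Good", "Good", "disagree"),
    "j": ("None", "None", None),
}


def decode_quality(codes):
    """Return one (first, second, note) tuple per quality code."""
    return [QUALITY_CODES[c] for c in codes]


def weak_positions(codes):
    """Return 1-based positions where neither strand is rated Good."""
    return [pos for pos, c in enumerate(codes, start=1)
            if "Good" not in QUALITY_CODES[c][:2]]
```

For example, weak_positions("aafjd") returns [3, 4], the two positions coded f (Bad Bad) and j (None None).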

-win_size length

-max_dashes count

These options are only used with the extended consensus type. They determine how much cutoff sequence is output: the cutoff data is used only for as long as no region of length bases contains more than count unknown ("-") bases. The defaults are 0 for both, which means that only used data is output.
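The win_size/max_dashes rule admits more than one reading. The Python sketch below implements one of them, as an assumption rather than Gap4's actual algorithm: scanning outward from the contig end, each cutoff base is accepted while the window of win_size bases ending at it contains at most max_dashes "-" characters. cutoff_extent is a hypothetical helper, and a win_size of 0 yields no cutoff data, in line with the defaults.

```python
def cutoff_extent(cutoff, win_size, max_dashes):
    """Return how many cutoff bases to append beyond a contig end.

    cutoff is the cutoff consensus read outward from the contig end.
    A base is accepted while the window of win_size bases ending at it
    holds at most max_dashes '-' characters (an assumed interpretation).
    """
    if win_size <= 0:
        return 0  # no window: output only the used data
    for i in range(len(cutoff)):
        window = cutoff[max(0, i - win_size + 1):i + 1]
        if window.count("-") > max_dashes:
            return i  # stop before the base that breaks the rule
    return len(cutoff)
```

For example, cutoff_extent("ACGT--A-CG", 4, 1) returns 5: the window "GT--" ending at the sixth base holds two dashes, so only "ACGT-" is kept.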

-format format

Specifies the output format of the file to be created. Any format can be written for any consensus type, but some combinations may not be legal (e.g. Fasta files containing quality codes instead of sequence). The available formats are:

1: Staden format
2: Fasta format
3: Experiment File format

The default is 3.

-annotations annotations

This controls whether to output annotations. This is only of use with the Experiment File output format. Note that with the extended consensus type the annotation positions are still those of the normal consensus; this is a bug which will only be fixed if it is considered useful. A non-zero value will output annotations. The default is 0, which is to not output annotations.

-truncate truncate

This controls whether annotations within or overlapping the cutoff data are output. A non-zero value suppresses annotations within the cutoff data. The default is 0.
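One reading of -truncate is sketched below in Python. It assumes tag positions are 1-based, inclusive and expressed in the same coordinates as the used region (used_start to used_end), and that a tag overlapping the cutoff data is dropped rather than clipped; the text above does not say which. annotations_to_output is a hypothetical helper, not a Gap4 function.

```python
def annotations_to_output(tags, used_start, used_end, truncate):
    """Select the (start, end) tags to write for one contig.

    With truncate == 0 every tag is kept. Otherwise only tags lying
    entirely within used_start..used_end survive; tags inside or
    overlapping the cutoff data are dropped.
    """
    if not truncate:
        return list(tags)
    return [(s, e) for (s, e) in tags if used_start <= s and e <= used_end]
```

For example, with a used region of 10..100 and a non-zero truncate, the tags (1, 5), (10, 20) and (95, 105) reduce to [(10, 20)]: the first lies wholly in the left cutoff and the last overlaps the right cutoff.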

-mask mask

-tag_types types

If types is a non-blank list of tag types, then masking or marking is applied to the sequence covered by tags of these types. When mask is "mask", the sequence is converted to an alternative character set (d, e, f and i for Experiment File and Staden formats, and n characters for Fasta format). When mask is "mark", the sequence is written in lowercase. The defaults are "none" for mask and a blank string for the tag types, which disables masking and marking. Masking and marking are only applied to the normal and extended consensus types.
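The masking and marking rules can be sketched in Python as follows. The sketch assumes that masking maps A, C, G and T to d, e, f and i in that order (the same correspondence given for the unfinished type), that other characters such as "-" and "*" are left unmasked, and that tag regions arrive as 1-based inclusive (start, end) pairs. apply_tags and MASK_CHARS are hypothetical names.

```python
# Assumed mask characters for Experiment File and Staden output.
MASK_CHARS = {"A": "d", "C": "e", "G": "f", "T": "i"}


def apply_tags(seq, regions, mode, fasta=False):
    """Mask or mark the tagged regions of a consensus string.

    mode "mark" lowercases the region; mode "mask" replaces A/C/G/T
    with d/e/f/i, or with 'n' when fasta is True; mode "none" returns
    seq unchanged.
    """
    if mode not in ("none", "mask", "mark"):
        raise ValueError("mode must be none, mask or mark: %r" % (mode,))
    if mode == "none":
        return seq
    out = list(seq)
    for start, end in regions:
        # Convert 1-based inclusive coordinates to a clamped 0-based range.
        for i in range(max(start, 1) - 1, min(end, len(out))):
            base = out[i].upper()
            if mode == "mark":
                out[i] = out[i].lower()
            elif base in MASK_CHARS:
                out[i] = "n" if fasta else MASK_CHARS[base]
    return "".join(out)
```

For example, with a tag covering bases 2 to 4 of "ACGTACGT", marking gives "AcgtACGT", masking gives "AefiACGT", and masking for Fasta output gives "AnnnACGT".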

This page is maintained by staden-package. Last generated on 1 March 2001.
URL: http://www.mrc-lmb.cam.ac.uk/pubseq/manual/scripting_89.html