get_consensus
-io io_handle
-contigs identifiers:strings
-outfile filename:string
?-type type:string(normal)?
?-mask mask:string(none)?
?-tag_types types:string()?
?-win_size length:integer(0)?
?-max_dashes count:integer(0)?
?-format format:integer(3)?
?-annotations annotations:integer(0)?
?-truncate truncate:integer(0)?
This command calculates the consensus sequence for one or more contigs and
saves it to a file. The command returns no value; if anything goes wrong it
raises a Tcl error.
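A minimal sketch of a call using only the required arguments. It assumes that io already holds a handle obtained from open_db; the proc name write_consensus, the contig identifiers and the output filename are hypothetical. Because failure is reported as a Tcl error, the call is wrapped in catch.

```tcl
# Hypothetical helper: write the consensus for a list of contigs to a file.
# Returns 1 on success and 0 on failure instead of propagating the Tcl error.
proc write_consensus {io contigs outfile} {
    if {[catch {
        get_consensus -io $io -contigs $contigs -outfile $outfile
    } err]} {
        puts stderr "get_consensus failed: $err"
        return 0
    }
    return 1
}

# Example use inside a gap4 script (identifiers are placeholders):
#   write_consensus $io {contig_a contig_b} consensus.out
```

Scripts that would rather abort on failure can call get_consensus directly and let the error propagate.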
-io
io_handle
-
The database IO handle returned from a previous open_db call.
-contigs
identifiers
-
This specifies the list of contigs to process. The {contig start end}
syntax may be used for an identifier to restrict the consensus to a region
of the contig, otherwise all of it is used.
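The two identifier forms can be mixed in one list. The sketch below builds such a list; contig_spec is a hypothetical helper and the contig names and positions are placeholders.

```tcl
# Hypothetical helper: build one element of the -contigs list.
# With no range it returns the bare identifier; with a range it returns
# the three-element {contig start end} form.
proc contig_spec {contig {start ""} {end ""}} {
    if {[string equal $start ""] || [string equal $end ""]} {
        return $contig
    }
    return [list $contig $start $end]
}

# All of one contig plus positions 100 to 500 of another.
set contigs [list [contig_spec contig_a] [contig_spec contig_b 100 500]]
# $contigs is now: contig_a {contig_b 100 500}
```

The resulting list is passed as the single value of -contigs.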
-outfile
filename
-
Specifies the filename to write the consensus sequence to. This has no
default value, so it must always be supplied.
-type
type
-
This specifies the final output type for the consensus algorithm. Valid
types are:
normal
-
The standard consensus sequence consisting of A, C, G, T, - and *.
extended
-
As per normal, except the cutoff data at the ends of contigs is used to
provide consensus sequence beyond the well defined contig ends.
unfinished
-
The consensus sequence in single stranded regions is output as a, c, g and
t whilst the consensus for finished regions is listed as d, e, f and i (for
a, c, g and t respectively).
quality
-
The quality of each base is output instead of the consensus base. The base
quality is listed as a single letter from the following table showing the
quality of each strand independently.
Code  First strand  Second strand
a     Good          Good (in agreement)
b     Good          Bad
c     Bad           Good
d     Good          None
e     None          Good
f     Bad           Bad
g     Bad           None
h     None          Bad
i     Good          Good (disagree)
j     None          None
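As an illustration of how the quality letters might be consumed, the sketch below tallies them. It assumes the string of letters has already been read from the output file; the array name quality_code and the proc tally_quality are hypothetical, and the table contents are copied from the list above.

```tcl
# Strand status for each quality code: first strand, second strand, and
# (where both are Good) whether the strands agree.
array set quality_code {
    a {Good Good agree}
    b {Good Bad}
    c {Bad Good}
    d {Good None}
    e {None Good}
    f {Bad Bad}
    g {Bad None}
    h {None Bad}
    i {Good Good disagree}
    j {None None}
}

# Hypothetical helper: count how often each code occurs in a quality string.
# Characters outside a..j are ignored. Returns a flat "code count" list.
proc tally_quality {seq} {
    global quality_code
    foreach ch [split $seq ""] {
        if {[info exists quality_code($ch)]} {
            if {![info exists count($ch)]} { set count($ch) 0 }
            incr count($ch)
        }
    }
    set result {}
    foreach code [lsort [array names count]] {
        lappend result $code $count($code)
    }
    return $result
}

puts [tally_quality "aaabdjja"]   ;# prints: a 4 b 1 d 1 j 2
```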
-win_size
length
-
-max_dashes
count
-
These are only of use with the extended consensus type. The amount of
cutoff sequence to output is chosen as the portion in which no more than
count unknown ("-") bases are found within a region of length bases. The
defaults are 0 for both, which implies that only used data should be output.
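A sketch of an extended consensus request using these two options. The proc name write_extended_consensus and the example values are hypothetical; suitable thresholds depend on the data.

```tcl
# Hypothetical helper: request the extended consensus, including cutoff
# sequence where a region of win_size bases holds at most max_dashes
# "-" characters.
proc write_extended_consensus {io contigs outfile win_size max_dashes} {
    get_consensus -io $io -contigs $contigs -outfile $outfile \
        -type extended -win_size $win_size -max_dashes $max_dashes
}

# Example use inside a gap4 script (values are placeholders):
#   write_extended_consensus $io $contigs extended.out 100 5
```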
-format
format
-
Specifies the output format of the file to be created. All formats can be
written for all consensus types, but some combinations may not produce legal
files (e.g. Fasta files containing quality codes instead of sequence). The
available formats are:
1
-
Staden format
2
-
Fasta format
3
-
Experiment File format
The default is 3.
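Since the formats are selected by number, a script may prefer to refer to them by name. The mapping below is a small sketch; the array consensus_format and the proc consensus_format_code are hypothetical names, and only the three numeric codes come from the list above.

```tcl
# Numeric -format codes as listed above.
array set consensus_format {staden 1 fasta 2 experiment 3}

# Hypothetical helper: translate a format name into its numeric code,
# raising a Tcl error for names that are not in the list.
proc consensus_format_code {name} {
    global consensus_format
    set key [string tolower $name]
    if {![info exists consensus_format($key)]} {
        error "unknown consensus format \"$name\""
    }
    return $consensus_format($key)
}

# Example use inside a gap4 script:
#   get_consensus -io $io -contigs $contigs -outfile cons.fa -format [consensus_format_code fasta]
```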
-annotations
annotations
-
This controls whether to output annotations. This is only of use in the
Experiment File output format. Note that with the extended consensus type
the annotation positions are still for the normal consensus; this is a bug
which will only be fixed if it is considered useful. A non-zero value will
output annotations. The default is 0, which is to not output annotations.
-truncate
truncate
-
This controls whether annotations within or overlapping the cutoff data will
be output. A non-zero value suppresses annotations within the cutoff data.
The default is 0.
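The annotation options only make sense together with the Experiment File format, so a wrapper can set all three at once. This is a sketch; the proc name write_annotated_consensus and its skip_cutoff_tags parameter are hypothetical.

```tcl
# Hypothetical helper: write an Experiment File (format 3) with annotations.
# Pass a non-zero skip_cutoff_tags to leave out annotations in cutoff data.
proc write_annotated_consensus {io contigs outfile {skip_cutoff_tags 0}} {
    get_consensus -io $io -contigs $contigs -outfile $outfile \
        -format 3 -annotations 1 -truncate $skip_cutoff_tags
}

# Example use inside a gap4 script:
#   write_annotated_consensus $io $contigs consensus.exp 1
```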
-mask
mask
-
-tag_types
types
-
If types is a non-blank list of tag types then masking or marking will
be applied to the sequence covered by tags of these types. When mask is
"mask" the sequence is converted to an alternative character set (d, e, f
and i for Experiment Files and Staden format, and n characters for Fasta
format). When mask is "mark" the sequence is output in lowercase. The
defaults are "none" for mask and a blank string for the tag types, which
disables masking and marking. Masking and marking are only used in the
normal and extended consensus types.
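A sketch of a masked or marked consensus request. The proc name write_masked_consensus is hypothetical, and the tag types in the usage comment are placeholders for whichever tag types the database uses.

```tcl
# Hypothetical helper: write a normal consensus with the sequence under the
# given tag types either masked (alternative characters) or marked
# (lowercase). Any other mode is rejected before get_consensus is called.
proc write_masked_consensus {io contigs outfile tag_types {mode mask}} {
    if {[lsearch -exact {mask mark} $mode] < 0} {
        error "mode must be \"mask\" or \"mark\", got \"$mode\""
    }
    get_consensus -io $io -contigs $contigs -outfile $outfile \
        -type normal -mask $mode -tag_types $tag_types
}

# Example use inside a gap4 script (tag types are placeholders):
#   write_masked_consensus $io $contigs masked.out {REPT ALUS} mark
```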
This page is maintained by
staden-package.
Last generated on 1 March 2001.
URL: http://www.mrc-lmb.cam.ac.uk/pubseq/manual/scripting_89.html