copy_reads -- copies overlapping reads from a source database to a destination database
Usage:
copy_reads
[-win]
[-source_trace_dir directory of source traces]
[-contigs_from file of contigs in source database]
[-min_contig_len minimum contig length]
[-min_average_qual minimum average read quality]
[-contigs_to file of contigs in destination database]
[-mask masking mode]
[-tag_types list of tag types]
[-word_length word length]
[-min_overlap minimum overlap]
[-max_pmismatch maximum percentage mismatch]
[-min_match minimum match]
[-band use banding algorithm]
[-display_cons display consensus alignments]
[-align_max_mism maximum percent mismatch]
[-display_seq display reading alignments]
source database
destination database
During large scale sequencing projects where the genome is cloned into, for
example, BACs prior to being subcloned into sequencing vectors, it is generally
the case that the ends of the DNA from one BAC will overlap that of two other
BACs. Unless it is being used for quality control, it is a waste of time to
sequence the overlapping regions twice, and so most labs transfer the relevant
data between the adjacent gap4 databases. This is the function of copy_reads,
which copies readings from a "source" database to a "destination" database.
The consensus sequences for user selected contigs in each of the two
databases are compared in both orientations. If an overlapping region is
found, readings of sufficient quality are automatically assembled into the
destination database. In the source database, readings which have been added
to the destination database will be tagged with a "LENT" tag, and the
equivalent readings in the destination database will be tagged with a "BORO"
(borrowed) tag.
-win
Bring up a dialogue window.
-source_trace_dir directory of source traces
The location of the traces of the source database can be specified by
giving the directory name. If it is not specified, it is determined from
the rawdata note (see section Trace File Location) held within the database.
The program will add the location of the source traces to the rawdata note
of the destination database. If the environment variable RAWDATA is set,
it will be taken to be the location of the destination database traces and
will also be added to the rawdata note of the destination database. If
there are no traces for the source database, no rawdata note will be created.
-contigs_from file of contigs in source database
One or more contigs from the source database can be compared. These are
selected by providing a file containing a list of contig names (a contig
is named by any reading name from within it, typically the first reading
name). If no file is specified, all contigs will be compared.
-min_contig_len minimum contig length
Only contigs in the source database over a user defined length will be
used. The default is 2000 bases.
-min_average_qual minimum average read quality
A minimum reading quality can be set so that only readings with an
average quality over the specified amount will be entered into the
destination database. The default is 30.0.
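The quality filter described above can be illustrated with a minimal Python sketch. It assumes that "average quality" is the arithmetic mean of the per-base quality values and that the comparison is strict ("over the specified amount"); the function name and the sample reads are hypothetical and are not part of copy_reads.

```python
from statistics import mean

def passes_quality(qualities, min_average_qual=30.0):
    """Return True if the mean per-base quality is strictly above the threshold.

    Assumption: the average is a plain arithmetic mean over all bases.
    """
    if not qualities:
        return False  # a reading with no quality values cannot pass
    return mean(qualities) > min_average_qual

# Hypothetical readings with per-base quality values.
reads = {
    "read_a": [40, 38, 35, 30],  # mean 35.75
    "read_b": [20, 25, 31, 28],  # mean 26.0
}
kept = [name for name, quals in reads.items() if passes_quality(quals)]
print(kept)
```

With the default threshold only read_a would be kept in this toy data set.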
-contigs_to file of contigs in destination database
One or more contigs from the destination database can be compared. These are
selected by providing a file containing a list of contig names (a contig
is named by any reading name from within it, typically the first reading
name). If no file is specified, all contigs will be compared.
-mask masking mode
The consensus sequence is determined for each contig in both databases
using either the standard consensus algorithm (none) or "Mask active tags" (mask).
Masking the active tags means that all segments covered by tags that are
"active" will not be used by the matching algorithms. A typical use of
this mode is to avoid finding matches in segments covered by tags of type
ALUS (i.e. segments thought to be Alu sequence) or REPT (i.e. segments
that are known to be repeated elsewhere in the data) (see section Tag types).
The default is none.
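As an illustration of what masking means for the comparison, the following Python sketch replaces every consensus position covered by an active tag with a placeholder character so that later matching cannot seed there. This is only a sketch under stated assumptions: the tag representation (type, 0-based start, exclusive end), the mask character, and the sample data are hypothetical and do not describe gap4's internal data structures.

```python
def mask_consensus(consensus, tags, active_types, mask_char="-"):
    """Return the consensus with segments under active tags masked out.

    tags: iterable of (tag_type, start, end), 0-based start, exclusive end.
    active_types: set of tag types to mask (as given via -tag_types).
    """
    seq = list(consensus)
    for tag_type, start, end in tags:
        if tag_type in active_types:
            # Clamp the tag to the sequence bounds before masking.
            for i in range(max(start, 0), min(end, len(seq))):
                seq[i] = mask_char
    return "".join(seq)

consensus = "ACGTACGTACGT"
tags = [("ALUS", 2, 5), ("COMM", 7, 9)]  # hypothetical tags
masked = mask_consensus(consensus, tags, {"ALUS", "REPT"})
print(masked)  # AC---CGTACGT
```

Only the ALUS segment is masked here because COMM is not in the active list.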
-tag_types list of tag types
A list of tag types to be used when the -mask option (above) is specified
to be in "mask" mode.
-word_length word length
The consensus searching parameters are equivalent to those found in the
find internal joins algorithm (see section Find Internal Joins).
The search algorithm first finds matching words of length Word
length. Possible values are 4 or 8. The default is 8.
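The idea of first finding matching words can be sketched in Python as follows. This is a generic word-index search, not the actual find internal joins implementation; the function name and the sequences are hypothetical, and only one orientation is searched.

```python
from collections import defaultdict

def find_word_matches(seq_a, seq_b, word_length=8):
    """Return (pos_a, pos_b) pairs where both sequences share an exact word."""
    # Index every word of seq_a by its start position.
    index = defaultdict(list)
    for i in range(len(seq_a) - word_length + 1):
        index[seq_a[i:i + word_length]].append(i)
    # Look up every word of seq_b in the index.
    matches = []
    for j in range(len(seq_b) - word_length + 1):
        for i in index.get(seq_b[j:j + word_length], []):
            matches.append((i, j))
    return matches

a = "TTTTACGTACGGAAAA"
b = "CCACGTACGGCC"
seeds = find_word_matches(a, b, word_length=8)
print(seeds)  # [(4, 2)]
```

A shorter word length finds more seeds, including more chance matches, so it trades speed for sensitivity.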
-min_overlap minimum overlap
The search algorithm only considers overlaps of length at least
Minimum overlap. The default is 20.
-max_pmismatch maximum percentage mismatch
Only alignments better than Maximum percent mismatch will be reported.
The default is 30.0.
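How the overlap and mismatch thresholds might combine can be shown with a small Python sketch. It assumes that the percentage mismatch is the number of differing alignment columns (pads counted as mismatches) divided by the alignment length; the exact definition used by the program is not stated here, and the function names are hypothetical.

```python
def percent_mismatch(aligned_a, aligned_b):
    """Percentage of alignment columns that differ (pads count as mismatches)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        return 0.0
    mismatches = sum(1 for x, y in zip(aligned_a, aligned_b) if x != y)
    return 100.0 * mismatches / len(aligned_a)

def is_reported(aligned_a, aligned_b, min_overlap=20, max_pmismatch=30.0):
    """Apply the two thresholds: long enough, and better than the mismatch limit."""
    return (len(aligned_a) >= min_overlap
            and percent_mismatch(aligned_a, aligned_b) < max_pmismatch)

pm = percent_mismatch("ACGT-ACGTA", "ACGTTACCTA")
print(pm)  # 20.0
```

The toy alignment above has 2 differing columns out of 10, so it is under the default 30.0 limit but would still be rejected by the default minimum overlap of 20.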
-min_match minimum match
The algorithm considers in its initial phase only matching segments of
length Minimum initial match length. However it
does a dynamic programming alignment of all the chunks between the
matching segments, and so produces an optimal alignment. The default is
15.
-band use banding algorithm
A banded dynamic programming algorithm can be selected, but as this only
applies to the chunks between matching segments, which for good alignments
will be very short, it should make little difference to the speed. Possible
values are 0 (no) or 1 (yes). The default is 1.
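To show what banding means in general, here is a minimal Python sketch of a banded dynamic programming computation (an edit distance restricted to cells within a fixed distance of the main diagonal). It is a textbook illustration only and is not the scoring scheme or band definition used by copy_reads; the function name and band width are hypothetical.

```python
def banded_edit_distance(a, b, band=5):
    """Edit distance computed only for cells with |i - j| <= band.

    Returns None when the length difference exceeds the band, since the
    final cell then lies outside the band.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    if abs(n - m) > band:
        return None
    # Row 0: only cells inside the band are reachable.
    prev = [j if j <= band else inf for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [inf] * (m + 1)
        lo, hi = max(0, i - band), min(m, i + band)
        for j in range(lo, hi + 1):
            if j == 0:
                cur[j] = i
                continue
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j - 1] + cost,  # match or substitution
                         prev[j] + 1,         # deletion from a
                         cur[j - 1] + 1)      # insertion into a
        prev = cur
    return prev[m]

print(banded_edit_distance("ACGTACGT", "ACGTTACGT", band=2))  # 1
```

Because only a strip of the matrix is filled, the work grows with the band width rather than with the full product of the two lengths, which matters little when the chunks being aligned are already short.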
-display_cons display consensus alignments
This allows the alignments between the consensus sequences to be
displayed.
-align_max_mism maximum percent mismatch
If a match between two consensus sequences is found, the
readings in that overlap are assembled into the destination database
using the "directed assembly" function (see section Directed Assembly). Only readings for which the maximum
percent mismatch is not exceeded, and which have an average
reading quality higher than the specified minimum, will be entered into
the database. The default value is 10.0.
-display_seq display reading alignments
This allows the alignments between the source database readings and the
destination consensus to be displayed.
To copy readings from "source_db" to "destination_db" and display
the consensus match:
copy_reads -display_cons source_db destination_db
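When copy_reads is driven from a script, the command line can be assembled programmatically. The Python sketch below only builds and prints the argument list; it does not run the program. The helper name is hypothetical, the option names are those listed on this page, and treating an option as a bare switch (as -display_cons is used in the example above) is an assumption for any other option.

```python
import shlex

def build_copy_reads_command(source_db, destination_db, **options):
    """Build an argument list for copy_reads from keyword options.

    A value of True emits the option as a bare switch; any other value is
    emitted as "-name value". The two databases always come last.
    """
    argv = ["copy_reads"]
    for name, value in options.items():
        flag = "-" + name
        if value is True:
            argv.append(flag)
        else:
            argv.extend([flag, str(value)])
    argv.extend([source_db, destination_db])
    return argv

cmd = build_copy_reads_command(
    "source_db", "destination_db",
    display_cons=True, min_contig_len=2000, min_average_qual=30.0,
)
print(shlex.join(cmd))
```

The resulting list could be passed to subprocess.run on a system where the Staden package is installed.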
This page is maintained by
staden-package.
Last generated on 22 October 2002.
URL: http://www.mrc-lmb.cam.ac.uk/pubseq/manual/manpages_unix_3.html