Copy_reads

NAME

copy_reads -- copies overlapping reads from a source database to a destination database

SYNOPSIS

copy_reads [-win] [-source_trace_dir directory of source traces] [-contigs_from file of contigs in source database] [-min_contig_len minimum contig length] [-min_average_qual minimum average read quality] [-contigs_to file of contigs in destination database] [-mask masking mode] [-tag_types list of tag types] [-word_length word length] [-min_overlap minimum overlap] [-max_pmismatch maximum percentage mismatch] [-min_match minimum match] [-band use banding algorithm] [-display_cons display consensus alignments] [-align_max_mism maximum percent mismatch] [-display_seq display reading alignments] source database destination database

DESCRIPTION

In large-scale sequencing projects in which the genome is first cloned into, for example, BACs and then subcloned into sequencing vectors, the ends of the DNA from one BAC will generally overlap those of two other BACs. Unless it is being done for quality control, sequencing the overlapping regions twice is a waste of time, so most labs transfer the relevant data between the adjacent gap4 databases. This is the function of copy_reads, which copies readings from a "source" database to a "destination" database.

The consensus sequences for user-selected contigs in each of the two databases are compared in both orientations. If an overlapping region is found, readings of sufficient quality are automatically assembled into the destination database. In the source database, readings which have been added to the destination database will be tagged with a "LENT" tag, and the equivalent readings in the destination database will be tagged with a "BORO" (borrowed) tag.
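
The idea of comparing two consensus sequences in both orientations can be illustrated with a minimal Python sketch. This is not the copy_reads implementation: the helper names (reverse_complement, shared_words, best_orientation) are hypothetical, and counting exact shared words is only a simple stand-in for the real overlap search.

```python
# Illustrative only: hypothetical helpers, not part of copy_reads or gap4.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def shared_words(a, b, word_length=8):
    """Count distinct words of the given length that occur in both sequences."""
    words_a = {a[i:i + word_length] for i in range(len(a) - word_length + 1)}
    words_b = {b[i:i + word_length] for i in range(len(b) - word_length + 1)}
    return len(words_a & words_b)


def best_orientation(source_cons, dest_cons, word_length=8):
    """Compare the source consensus to the destination in both orientations.

    Returns a tuple (orientation, number_of_shared_words), where orientation
    is "forward" or "reverse" (source reverse-complemented).
    """
    forward = shared_words(source_cons, dest_cons, word_length)
    reverse = shared_words(reverse_complement(source_cons), dest_cons, word_length)
    return ("forward", forward) if forward >= reverse else ("reverse", reverse)
```

A source contig that only shares words with the destination after reverse complementing would be reported as "reverse" here; the real program goes on to align the overlap and assemble the readings.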

OPTIONS

-win: Bring up a dialogue window
-source_trace_dir directory of source traces: The location of the traces of the source database can either be specified by giving the directory name or if this is not specified, determined from the rawdata note (see section Trace File Location) held within the database. The program will add the location of the source traces into the rawdata note of the destination database. If the environment variable RAWDATA is set, this will be taken to be the location of the destination database traces and will also be added to the rawdata note of the destination database. If there are no traces for the source database, no rawdata note will be created.
-contigs_from file of contigs in source database: One or more contigs from the source database can be compared. These are selected by providing a file containing a list of contig names (a contig can be named by any reading within it, typically the first reading). If no file is specified, all contigs will be compared.
-min_contig_len minimum contig length: Only contigs in the source database over a user defined length will be used. The default is 2000 bases.
-min_average_qual minimum average read quality: A minimum reading quality can be set so that only readings with an average quality over the specified amount will be entered into the destination database. The default is 30.0.
-contigs_to file of contigs in destination database: One or more contigs from the destination database can be compared. These are selected by providing a file containing a list of contig names (a contig can be named by any reading within it, typically the first reading). If no file is specified, all contigs will be compared.
-mask masking mode: The consensus sequence is determined for each contig in both databases using either the standard consensus algorithm (none) or "Mask active tags" (mask). Masking the active tags means that all segments covered by tags that are "active" will not be used by the matching algorithms. A typical use of this mode is to avoid finding matches in segments covered by tags of type ALUS (i.e. segments thought to be Alu sequence) or REPT (i.e. segments that are known to be repeated elsewhere in the data) (see section Tag types). The default is none.
-tag_types list of tag types: A list of tag types to be used when the -mask option (above) is specified to be in "mask" mode.
-word_length word length: The consensus searching parameters are equivalent to those found in the find internal joins algorithm (see section Find Internal Joins). The search algorithm first finds matching words of length Word length. Possible values are 4 or 8. The default is 8.
-min_overlap minimum overlap: The search algorithm only considers overlaps of length at least Minimum overlap. The default is 20.
-max_pmismatch maximum percentage mismatch: Only alignments better than Maximum percent mismatch will be reported. The default is 30.0.
-min_match minimum match: The algorithm considers in its initial phase only matching segments of length Minimum initial match length. However it does a dynamic programming alignment of all the chunks between the matching segments, and so produces an optimal alignment. The default is 15.
-band use banding algorithm: A banded dynamic programming algorithm can be selected, but as it applies only to the chunks between matching segments, which for good alignments will be very short, it should make little difference to the speed. Possible values are 0 (no) or 1 (yes). The default is 1.
-display_cons display consensus alignments: This allows the alignments between the consensus sequences to be displayed.
-align_max_mism maximum percent mismatch: If a match between two consensus sequences is found, the readings in that overlap are assembled into the destination database using the "directed assembly" function (see section Directed Assembly). Only readings for which the maximum percent mismatch is not exceeded, and which have an average reading quality higher than the specified minimum, will be entered into the database. The default value is 10.0.
-display_seq display reading alignments: This allows the alignments between the source database readings and the destination consensus to be displayed.
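
The two per-reading thresholds described above (-min_average_qual and -align_max_mism) can be illustrated with a short Python sketch. It is a sketch under stated assumptions, not the gap4 code: the function names are hypothetical, the inputs are assumed to be already-aligned strings of equal length, and every differing column (including padding characters) is counted as a mismatch, which may not match how the program scores alignments.

```python
# Illustrative only: hypothetical helper names; not the gap4 implementation.


def average_quality(qualities):
    """Mean of a reading's per-base quality values (0.0 for an empty list)."""
    return sum(qualities) / len(qualities) if qualities else 0.0


def percent_mismatch(aligned_read, aligned_cons):
    """Percentage of aligned columns where the reading and consensus differ."""
    if len(aligned_read) != len(aligned_cons):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_read:
        return 0.0
    mismatches = sum(1 for r, c in zip(aligned_read, aligned_cons) if r != c)
    return 100.0 * mismatches / len(aligned_read)


def accept_reading(qualities, aligned_read, aligned_cons,
                   min_average_qual=30.0, align_max_mism=10.0):
    """Apply both thresholds, using the documented defaults.

    The average quality must be over the minimum, and the percent mismatch
    must not exceed the maximum.
    """
    return (average_quality(qualities) > min_average_qual
            and percent_mismatch(aligned_read, aligned_cons) <= align_max_mism)
```

With the default values, a ten-base reading with one mismatch (10 percent) and an average quality of 40 passes, while the same reading with an average quality of exactly 30 does not, because the quality must be over the threshold.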

EXAMPLE

To copy readings from `source_db' to `destination_db' and display the consensus match:

copy_reads -display_cons source_db destination_db
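
When copy_reads is driven from a script, the command line can be assembled programmatically. The following Python sketch uses a hypothetical helper, build_copy_reads_argv, that takes only option names listed under OPTIONS. It assumes that -win, -display_cons and -display_seq are bare switches (as in the example above) and that all other options take a single value; the threshold values in the demonstration are arbitrary.

```python
# Illustrative only: build_copy_reads_argv is a hypothetical helper.


def build_copy_reads_argv(source_db, destination_db, **options):
    """Build an argument list for copy_reads.

    Keyword names are option names without the leading "-". A value of True
    emits a bare switch (assumed for win, display_cons and display_seq); any
    other value is converted to a string and placed after the option name.
    """
    argv = ["copy_reads"]
    for name, value in options.items():
        argv.append("-" + name)
        if value is not True:
            argv.append(str(value))
    # The two database names always come last, source first.
    argv += [source_db, destination_db]
    return argv


argv = build_copy_reads_argv(
    "source_db", "destination_db",
    display_cons=True, min_contig_len=5000, min_average_qual=35.0,
)
print(" ".join(argv))
```

If copy_reads is installed and on the PATH, the resulting list could be passed to subprocess.run(argv, check=True) from the Python standard library.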

This page is maintained by staden-package. Last generated on 22 October 2002.
URL: http://www.mrc-lmb.cam.ac.uk/pubseq/manual/manpages_unix_3.html
