copy_reads -- copies overlapping reads from a source database to a destination database
Usage:
copy_reads [-win] [-source_trace_dir directory of source traces]
               [-contigs_from file of contigs in source database] 
               [-min_contig_len minimum contig length] 
               [-min_average_qual minimum average read quality] 
               [-contigs_to file of contigs in destination database] 
               [-mask masking mode] 
               [-tag_types list of tag types] 
               [-word_length word length] 
               [-min_overlap minimum overlap] 
               [-max_pmismatch maximum percentage mismatch] 
               [-min_match minimum match] 
               [-band use banding algorithm] 
               [-display_cons display consensus alignments] 
               [-align_max_mism maximum percent mismatch] 
               [-display_seq display reading alignments] 
               source database
               destination database
In large-scale sequencing projects where the genome is cloned into, for
example, BACs before being subcloned into sequencing vectors, the ends of the
DNA from one BAC generally overlap those of two other BACs. Unless it is
being used for quality control, it is a waste of time to sequence the
overlapping regions twice, so most labs transfer the relevant data between
the adjacent gap4 databases. This is the function of copy_reads, which
copies readings from a "source" database to a "destination" database.
The consensus sequences for user-selected contigs in each of the two
databases are compared in both orientations. If an overlapping region is
found, readings of sufficient quality are automatically assembled into the
destination database. In the source database, readings which have been added
to the destination database are tagged with a "LENT" tag, and the equivalent
readings in the destination database are tagged with a "BORO" (borrowed) tag.
- -win
     Bring up a dialogue window
- -source_trace_dir directory of source traces
     The location of the traces of the source database can either be
     specified by giving the directory name or, if this is not specified,
     determined from the rawdata note (see section Trace File Location) held
     within the database. The program will add the location of the source
     traces to the rawdata note of the destination database. If the
     environment variable RAWDATA is set, it is taken to be the location of
     the destination database traces and is also added to the rawdata note
     of the destination database. If there are no traces for the source
     database, no rawdata note will be created.
- -contigs_from file of contigs in source database
     One or more contigs from the source database can be compared. These are
     selected by providing a file containing a list of contig names (any
     reading name from within that contig, typically the first reading
     name). If no file is specified, all contigs will be compared.
- -min_contig_len minimum contig length
     Only contigs in the source database over a user defined length will be 
     used. The default is 2000 bases.
- -min_average_qual minimum average read quality
     A minimum reading quality can be set so that only readings with an
     average quality over the specified amount will be entered into the
     destination database. The default is 30.0.
- -contigs_to file of contigs in destination database
     One or more contigs from the destination database can be compared. These
     are selected by providing a file containing a list of contig names (any
     reading name from within that contig, typically the first reading
     name). If no file is specified, all contigs will be compared.
- -mask masking mode
     The consensus sequence is determined for each contig in both databases
     using either the standard consensus algorithm (none) or "Mask active
     tags" (mask). Masking the active tags means that all segments covered
     by tags that are "active" will not be used by the matching algorithms.
     A typical use of this mode is to avoid finding matches in segments
     covered by tags of type ALUS (i.e. segments thought to be Alu sequence)
     or REPT (i.e. segments that are known to be repeated elsewhere in the
     data) (see section Tag types). The default is none.
- -tag_types list of tag types
     A list of tag types to be used when the -mask option (above) is specified
     to be in "mask" mode.
- -word_length word length
     The consensus searching parameters are equivalent to those found in the
     find internal joins algorithm (see section Find Internal Joins). 
     The search algorithm first finds matching words of length Word
     length. Possible values are 4 or 8. The default is 8. 
- -min_overlap minimum overlap
     The search algorithm only considers overlaps of length at least 
     Minimum overlap. The default is 20.
- -max_pmismatch maximum percentage mismatch
     Only alignments with a percentage mismatch below Maximum percent
     mismatch will be reported. The default is 30.0.
- -min_match minimum match
     In its initial phase the algorithm considers only matching segments of
     length Minimum initial match length. However, it then performs a
     dynamic programming alignment of all the chunks between the matching
     segments, and so produces an optimal alignment. The default is 15.
- -band use banding algorithm
     A banded dynamic programming algorithm can be selected, but it only
     applies to the chunks between matching segments, which for good
     alignments will be very short, so it should make little difference to
     the speed. Possible values are 0 (no) or 1 (yes). The default is 1.
- -display_cons display consensus alignments
     This allows the alignments between the consensus sequences to be 
     displayed.
- -align_max_mism maximum percent mismatch
     If a match between two consensus sequences is found, the
     readings in that overlap are assembled into the destination database
     using the "directed assembly" function (see section Directed Assembly). Only readings for which the maximum
     percent mismatch is not exceeded, and which have an average
     reading quality higher than the specified minimum, will be entered into 
     the database. The default value is 10.0.
- -display_seq display reading alignments
     This allows the alignments between the source database readings and the 
     destination consensus to be displayed.
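The selection thresholds described above can be combined in one invocation.
The following is a minimal sketch, not a recommended setting: the database
names are placeholders and the threshold values are arbitrary, chosen only to
be stricter than the documented defaults (2000 bases, quality 30.0, 10.0
percent mismatch). The guard around the call is an addition for
illustration, so that the snippet only prints the command when copy_reads is
not installed.

```shell
# Placeholder database names (assumptions, not real data).
SOURCE_DB=source_db
DEST_DB=destination_db

# Illustrative thresholds, stricter than the documented defaults:
#   -min_contig_len   default 2000
#   -min_average_qual default 30.0
#   -align_max_mism   default 10.0
ARGS="-min_contig_len 5000 -min_average_qual 35.0 -align_max_mism 5.0 $SOURCE_DB $DEST_DB"

if command -v copy_reads >/dev/null 2>&1; then
    # $ARGS is deliberately unquoted so the shell splits it into words.
    copy_reads $ARGS
else
    echo "would run: copy_reads $ARGS"
fi
```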
To copy readings from `source_db' to `destination_db' and display
the consensus match:
copy_reads -display_cons source_db destination_db
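To restrict the comparison to particular contigs, list files can be passed
with -contigs_from and -contigs_to. The sketch below rests on assumptions:
the reading names and file names are hypothetical, and the one-name-per-line
layout is assumed rather than stated in this page, which says only that the
file contains a list of contig names. The guard around the call is an
addition for illustration.

```shell
# Hypothetical reading names identifying the contigs to compare.
# One name per line is an assumption about the list file layout.
printf '%s\n' xb54a3.s1 xb56b6.s1 > source_contigs.txt
printf '%s\n' xc04a1.s1 > dest_contigs.txt

# Compare only the listed contigs and display the consensus alignments.
ARGS="-contigs_from source_contigs.txt -contigs_to dest_contigs.txt -display_cons source_db destination_db"

if command -v copy_reads >/dev/null 2>&1; then
    # $ARGS is deliberately unquoted so the shell splits it into words.
    copy_reads $ARGS
else
    echo "would run: copy_reads $ARGS"
fi
```

If either list file is omitted, all contigs in the corresponding database
are compared, as described under -contigs_from and -contigs_to.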
 
   
   
   
   
This page is maintained by
staden-package.
Last generated on 22 October 2002.
URL: http://www.mrc-lmb.cam.ac.uk/pubseq/manual/manpages_unix_3.html