
Find Internal Joins

This function, invoked from the gap4 View menu, uses sequences already in the database to find possible joins between contigs. These are generally joins that were missed or judged unsafe during assembly; the function lets users examine the overlaps and decide whether the joins should be made. During assembly, joins may have been missed because of poor data, or deliberately not made because the sequence was repetitive. It may also be possible to find potential joins by extending the consensus sequences with data from the 3' ends of readings that was considered too unreliable to align during assembly, i.e. by searching in the "hidden data".

If the Contig Selector has not already been transformed into the Contig Comparator, using this function will do so automatically. Each match found is plotted as a diagonal line in the Contig Comparator and written as an alignment in the Output Window. The length of the diagonal line is proportional to the length of the aligned region. If the two contigs in a match are in the same orientation, the line is parallel to the main diagonal; if they are not, the line is perpendicular to it. The matches displayed in the Contig Comparator can be used to invoke the Join Editor (see section The Join Editor) or the Contig Editor (see section Editing in gap4). Alternatively, the "Next" button at the top left of the Contig Comparator can be used to select each result in turn, from best to worst. While stepping through results in this way, users can locate the match that corresponds to the next result by placing the cursor over the Next button: the plotted match and the contigs involved turn white.

[picture]

A typical display from the Contig Comparator is shown in the figure above.

All match positions are numbered relative to base one of the contig: matches off the left end of the contig (i.e. in the hidden data) have negative positions, and matches off the right end (again in the hidden data) have positions greater than the contig length. The convention for reporting overlap positions is as follows. If neither contig needs to be complemented, the positions are as shown. If the program says "contig x in the - sense", the positions shown assume that contig x has been complemented. For example, in the results given below the positions for the first overlap are as reported, but those for the second assume that the contig in the minus sense (i.e. 443) has been complemented.
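
The numbering convention can be illustrated with a short Python sketch. It is not part of gap4: the function names and the contig lengths used are hypothetical, and the conversion is simply the usual 1-based reverse-complement mapping, assumed here to apply to positions reported for a contig "in the - sense". Treating position 0 as hidden data is also an assumption; the text above only says that such positions are negative.

```python
def classify_position(pos: int, contig_length: int) -> str:
    """Say whether a reported position lies in the consensus or in hidden data.

    Positions are relative to base one of the contig. Anything before base
    one is off the left end; anything beyond contig_length is off the right
    end. (Counting position 0 as hidden data is an assumption.)
    """
    if pos < 1:
        return "left hidden data"
    if pos > contig_length:
        return "right hidden data"
    return "consensus"


def to_original_orientation(pos: int, contig_length: int) -> int:
    """Map a position on the complemented contig back to the original strand.

    Standard 1-based reverse-complement mapping: base 1 of the complemented
    contig is base contig_length of the original, and vice versa.
    """
    return contig_length - pos + 1


# Hypothetical contig lengths, chosen only for illustration.
print(classify_position(-127, 500))         # left hidden data
print(to_original_orientation(3610, 4000))  # 391
```

Under this mapping a position in the left hidden data of a complemented contig falls in the right hidden data of the original orientation, which is what one would expect when the contig is turned around.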

Possible join between contig   445 in the + sense and contig   405
Percentage mismatch after alignment =  4.9
       412        422        432        442        452        462
    405  TTTCCCGACT GGAAAGCGGG CAGTGAGCGC AACGCAATTA ATGTGAG,TT AGCTCACTCA
          ::::::::: : ::::::::  ::::: ::: :::::::::: :::::::::: ::::::::::
    445  *TTCCCGACT G,AAAGCGGG TAGTGA,CGC AACGCAATTA ATGTGAG*TT AGCTCACTCA
      -127       -117       -107        -97        -87        -77
       472        482        492        502        512
    405  TTAGGCACCC CAGGCTTTAC ACTTTATGCT TCCGGCTCGT AT
         :::::::::: :::::::::: :::::::::: :::::::::: ::
    445  TTAGGCACCC CAGGCTTTAC ACTTTATGCT TCCGGCTCGT AT
       -67        -57        -47        -37        -27
Possible join between contig   443 in the - sense and contig   423
Percentage mismatch after alignment = 10.4
        64         74         84         94        104        114
    423  ATCGAAGAAA GAAAAGGAGG AGAAGATGAT TTTAAAAATG AAACG*CGAT GTCAGATGGG
         :::: ::::: :::::::::: :::::::::: ::::::  :: ::::: :::: :::::::::
    443  ATCG,AGAAA GAAAAGGAGG AGAAGATGAT TTTAAA,,TG AAACGACGAT GTCAGATGG,
      3610       3620       3630       3640       3650       3660
       124        134        144        154        164
    423  TTG*ATGAAG TAGAAGTAGG AG*AGGTGGA AGAGAAGAGA GTGGGA
         ::: :::::: :::::::::: :: :::::::  ::: ::::: :: ::
    443  TTGGATGAAG TAGAAGTAGG AGGAGGTGGA ,GAG,AGAGA GTTGG*
      3670       3680       3690       3700       3710
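
For users who save the Output Window text and want to tabulate the reported joins, the header lines shown above can be picked out with a small Python script. This is a minimal sketch based only on the two header lines visible in the example output; the function name, the dictionary keys and the regular expressions are assumptions, not part of gap4, and other gap4 versions may word these lines differently.

```python
import re

# Patterns modelled on the two header lines in the example output above.
JOIN_RE = re.compile(
    r"Possible join between contig\s+(\d+) in the ([+-]) sense and contig\s+(\d+)"
)
MISMATCH_RE = re.compile(r"Percentage mismatch after alignment =\s*([\d.]+)")


def parse_joins(text: str) -> list:
    """Return one dict per reported join, in the order they appear."""
    joins = []
    for line in text.splitlines():
        m = JOIN_RE.search(line)
        if m:
            joins.append({
                "contig_a": int(m.group(1)),   # contig whose sense is stated
                "sense": m.group(2),           # "+" or "-"
                "contig_b": int(m.group(3)),
                "mismatch": None,              # filled in by the next header line
            })
            continue
        m = MISMATCH_RE.search(line)
        if m and joins:
            joins[-1]["mismatch"] = float(m.group(1))
    return joins


sample = """\
Possible join between contig   445 in the + sense and contig   405
Percentage mismatch after alignment =  4.9
Possible join between contig   443 in the - sense and contig   423
Percentage mismatch after alignment = 10.4
"""

joins = parse_joins(sample)
for j in joins:
    print(j["contig_a"], j["sense"], j["contig_b"], j["mismatch"])
```

The alignment lines themselves are ignored. Sorting the resulting list by the "mismatch" value gives one possible ranking, but this section does not state how gap4 itself orders results from best to worst.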

This page is maintained by staden-package. Last generated on 22 October 2002.
URL: http://www.mrc-lmb.cam.ac.uk/pubseq/manual/gap4_unix_98.html