The purpose of this function (which is invoked from the gap4 View menu) is to use sequences already in the database to find possible joins between contigs. Generally these are joins that were missed or judged unsafe during assembly, and the function lets users examine the overlaps and decide whether they should be made. During assembly, joins may have been missed because of poor data, or not made because the sequence was repetitive. It may also be possible to find potential joins by extending the consensus sequences with data from the 3' ends of readings that were considered too unreliable to align during assembly, i.e. by searching in the "hidden data".
If this has not already happened, using this function automatically transforms the Contig Selector into the Contig Comparator. Each match found is plotted as a diagonal line in the Contig Comparator and is written as an alignment in the Output Window. The length of the diagonal line is proportional to the length of the aligned region. If the match is between two contigs in the same orientation, the line is parallel to the main diagonal; if they are in opposite orientations, the line is perpendicular to the main diagonal. The matches displayed in the Contig Comparator can be used to invoke the Join Editor (see section The Join Editor) or the Contig Editor (see section Editing in gap4). Alternatively, the "Next" button at the top left of the Contig Comparator can be used to select each result in turn, starting with the best and ending with the worst. While stepping through results in this way, users can locate the match in the Contig Comparator that corresponds to the next result by placing the cursor over the Next button: the plotted match and the contigs involved turn white.
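The plotting rule described above can be sketched in a few lines of Python. This is only an illustration and not gap4's drawing code: the function names `match_diagonal` and `line_length`, the parameters, and the choice of which end of an opposite-orientation match is drawn first are all assumptions made for the example.

```python
import math

def match_diagonal(x_start, y_start, length, same_orientation):
    """Return the two endpoints of the line drawn for one match.

    x_start, y_start: hypothetical plot coordinates of the match on the
    horizontal and vertical contig axes.
    length: length of the aligned region.
    """
    if same_orientation:
        # Slope +1: parallel to the main diagonal.
        return (x_start, y_start), (x_start + length, y_start + length)
    # Slope -1: perpendicular to the main diagonal.
    return (x_start, y_start + length), (x_start + length, y_start)

def line_length(p, q):
    """Euclidean length of the plotted line between endpoints p and q."""
    return math.hypot(q[0] - p[0], q[1] - p[1])
```

In this sketch a match of length 200 is drawn twice as long as a match of length 100, whichever orientation it has, which is the proportionality the text describes.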
A typical display from the Contig Comparator is shown in the figure above.
To define the match, all numbering is relative to base number one in the contig: matches off the left end of the contig (i.e. in the hidden data) have negative positions, and matches off the right end of the contig (i.e. also in the hidden data) have positions greater than the contig length. The convention for reporting the positions of overlaps is as follows: if neither contig needs to be complemented, the positions are as shown. If the program says "contig x in the - sense", then the positions shown assume that contig x has been complemented. For example, in the results given below the positions for the first overlap are as reported, but those for the second assume that the contig in the minus sense (i.e. 443) has been complemented.
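The numbering convention can be illustrated with a short Python sketch. The function names and the contig lengths used here are hypothetical, and the mapping back to the uncomplemented orientation assumes a plain reversal of coordinates (position p on the complemented contig corresponds to length + 1 - p on the original), which the text above does not state explicitly.

```python
def classify_position(pos, contig_length):
    """Say where a reported position lies relative to the contig consensus.

    Base number one is the first consensus base, so anything below 1 is
    off the left end and anything above contig_length is off the right end.
    """
    if pos < 1:
        return "left hidden data"
    if pos > contig_length:
        return "right hidden data"
    return "consensus"

def uncomplemented_position(pos, contig_length):
    """Map a position reported for a complemented contig back to the
    original orientation. Assumption: complementing simply reverses the
    coordinate axis of the contig."""
    return contig_length + 1 - pos

# Hypothetical lengths, chosen only to exercise the functions.
print(classify_position(-127, 2000))        # left hidden data
print(classify_position(3610, 3700))        # consensus
print(uncomplemented_position(3610, 3700))  # 91
```

Under that assumption, a position near the right end of a complemented contig maps to a position near the left end of the same contig in its original orientation.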
Possible join between contig 445 in the + sense and contig 405
Percentage mismatch after alignment = 4.9

       412        422        432        442        452        462
   405 TTTCCCGACT GGAAAGCGGG CAGTGAGCGC AACGCAATTA ATGTGAG,TT AGCTCACTCA
        ::::::::: : ::::::::  ::::: ::: :::::::::: :::::::::: ::::::::::
   445 *TTCCCGACT G,AAAGCGGG TAGTGA,CGC AACGCAATTA ATGTGAG*TT AGCTCACTCA
       -127       -117       -107       -97        -87        -77

       472        482        492        502        512
   405 TTAGGCACCC CAGGCTTTAC ACTTTATGCT TCCGGCTCGT AT
       :::::::::: :::::::::: :::::::::: :::::::::: ::
   445 TTAGGCACCC CAGGCTTTAC ACTTTATGCT TCCGGCTCGT AT
       -67        -57        -47        -37        -27

Possible join between contig 443 in the - sense and contig 423
Percentage mismatch after alignment = 10.4

       64         74         84         94         104        114
   423 ATCGAAGAAA GAAAAGGAGG AGAAGATGAT TTTAAAAATG AAACG*CGAT GTCAGATGGG
       :::: ::::: :::::::::: :::::::::: ::::::  :: ::::: :::: :::::::::
   443 ATCG,AGAAA GAAAAGGAGG AGAAGATGAT TTTAAA,,TG AAACGACGAT GTCAGATGG,
       3610       3620       3630       3640       3650       3660

       124        134        144        154        164
   423 TTG*ATGAAG TAGAAGTAGG AG*AGGTGGA AGAGAAGAGA GTGGGA
       ::: :::::: :::::::::: :: :::::::  ::: ::::: :: ::
   443 TTGGATGAAG TAGAAGTAGG AGGAGGTGGA ,GAG,AGAGA GTTGG*
       3670       3680       3690       3700       3710
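As a cross-check on the listing above, the following Python sketch counts the alignment columns in which the two rows carry different characters. The text does not say how gap4 computes "Percentage mismatch after alignment", so the column-counting rule and the function name `percent_mismatch` are assumptions; the sequences are copied from the two alignments shown.

```python
def percent_mismatch(row_a, row_b):
    """Percentage of alignment columns whose two characters differ.

    Spaces are only block separators in the listing and are ignored.
    Pad characters ('*' and ',') are compared like any other symbol.
    """
    a = row_a.replace(" ", "")
    b = row_b.replace(" ", "")
    if not a or len(a) != len(b):
        raise ValueError("rows must be non-empty and of equal length")
    differing = sum(1 for x, y in zip(a, b) if x != y)
    return 100.0 * differing / len(a)

# First overlap: contig 405 against contig 445.
row_405 = ("TTTCCCGACT GGAAAGCGGG CAGTGAGCGC AACGCAATTA ATGTGAG,TT AGCTCACTCA "
           "TTAGGCACCC CAGGCTTTAC ACTTTATGCT TCCGGCTCGT AT")
row_445 = ("*TTCCCGACT G,AAAGCGGG TAGTGA,CGC AACGCAATTA ATGTGAG*TT AGCTCACTCA "
           "TTAGGCACCC CAGGCTTTAC ACTTTATGCT TCCGGCTCGT AT")

# Second overlap: contig 423 against complemented contig 443.
row_423 = ("ATCGAAGAAA GAAAAGGAGG AGAAGATGAT TTTAAAAATG AAACG*CGAT GTCAGATGGG "
           "TTG*ATGAAG TAGAAGTAGG AG*AGGTGGA AGAGAAGAGA GTGGGA")
row_443 = ("ATCG,AGAAA GAAAAGGAGG AGAAGATGAT TTTAAA,,TG AAACGACGAT GTCAGATGG, "
           "TTGGATGAAG TAGAAGTAGG AGGAGGTGGA ,GAG,AGAGA GTTGG*")

print(round(percent_mismatch(row_405, row_445), 1))  # 4.9  (5 of 102 columns)
print(round(percent_mismatch(row_423, row_443), 1))  # 10.4 (11 of 106 columns)
```

Under this rule both computed values agree with the figures in the report. Note that the column pairing "," with "*" in the first overlap is counted as a difference here even though the listing marks it with a colon, so the match line and the percentage may follow slightly different rules.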