
Ordering and Joining Contigs

After the initial rounds of assembly, the data for a sequencing project will probably still not be contiguous. To minimise the number of experiments required to finish the project, it is useful to get as much as possible from the existing data. The functions described in this section can help to put the current set of contigs into a consistent left-to-right order, discover joins between contigs that were missed or overlooked by the assembly engines, and analyse repeats that may cause problems for assembly. It is one of the strengths of gap4 that the results from several of these independent types of analysis can be combined in a single display (see section Contig Comparator), and where they are seen to reinforce one another, users can feel more confident in their decisions.

[picture]

A typical Contig Comparator display is shown in the figure above. It shows results from other functions as well as the ones described in this section.

The first function (see section Order Contigs) automatically orders contigs based on read-pair data. The orderings found can be examined in the Template Display (see section Template Display).
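
To make the idea of ordering from read-pair data concrete, here is a minimal sketch in Python. It is not gap4's algorithm: it assumes the input is simply a list of contig-name pairs, one per template whose two reads fall in different contigs, and it ignores read orientation and insert size, which real read-pair data also provides. The function name order_contigs and the contig names are hypothetical.

```python
from collections import Counter

def order_contigs(pairs):
    """Greedily chain contigs, strongest read-pair links first.

    pairs: iterable of (contig_a, contig_b), one entry per template whose
    two reads lie in different contigs (an assumed, simplified input).
    Returns a list of chains; each chain is a list of contig names.
    The left/right direction of a chain is arbitrary in this sketch.
    """
    # Count how many templates support each unordered contig pair.
    links = Counter(frozenset(p) for p in pairs if p[0] != p[1])
    chains = {}  # contig name -> the chain (list) that contains it
    ranked = sorted(links.items(), key=lambda kv: (-kv[1], sorted(kv[0])))
    for link, _count in ranked:
        a, b = sorted(link)
        ca = chains.setdefault(a, [a])
        cb = chains.setdefault(b, [b])
        if ca is cb:
            continue  # both already in one chain; joining would make a cycle
        if a not in (ca[0], ca[-1]) or b not in (cb[0], cb[-1]):
            continue  # a contig in the middle of a chain has no free end
        if ca[-1] != a:
            ca.reverse()  # put a at the right-hand end of its chain
        if cb[0] != b:
            cb.reverse()  # put b at the left-hand end of its chain
        ca.extend(cb)
        for name in cb:
            chains[name] = ca
    # Collect each distinct chain once.
    result = []
    for chain in chains.values():
        if not any(chain is seen for seen in result):
            result.append(chain)
    return result

# Hypothetical data: three templates link c1-c2, two link c2-c3, one links c1-c3.
example = [("c1", "c2")] * 3 + [("c2", "c3")] * 2 + [("c1", "c3")]
print(order_contigs(example))
```

The weakest link (c1-c3) is ignored here because both contigs are already in the same chain. In practice such conflicting evidence is exactly what a user would want to inspect in the Template Display.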

The next function (see section Find read pairs) also examines read-pair data, but instead of automatically ordering the contigs, plots out their relationships in the Contig Comparator, from where the user can invoke the Template Display to check them, and use the Contig Selector to reorder them.

Sometimes assembly engines will miss some joins, or regard weak joins as too uncertain to be made. The Find Internal Joins function (see section Find Internal Joins) compares contigs, including their hidden data, to find matches between the ends of contigs. Again, results are presented in the Contig Comparator, and users can invoke the Contig Joining Editor (see section The Join Editor) to examine and make joins.
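
The notion of a match between contig ends can be illustrated with a short Python sketch. This is a deliberate simplification and not how Find Internal Joins is implemented: it looks only for an exact match between the right-hand end of one sequence and the left-hand end of another, whereas the real function aligns sequences (tolerating mismatches) and can use hidden data. The function name end_overlap, the min_len parameter and the sequences are hypothetical.

```python
def end_overlap(left, right, min_len=4):
    """Return the length of the longest exact overlap in which the end of
    `left` matches the start of `right`, or 0 if it is shorter than min_len.
    """
    longest = min(len(left), len(right))
    # Try the longest candidate first so the first hit is the best one.
    for n in range(longest, min_len - 1, -1):
        if left[-n:] == right[:n]:
            return n
    return 0

# Hypothetical consensus fragments sharing "TGCA" at their ends.
print(end_overlap("ACGTTGCA", "TGCAGGTT"))
```

A fuller sketch would also compare each contig against the reverse complement of the other, since contigs may be stored in either orientation.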

Whereas Find Internal Joins makes sure that alignments between contigs continue right to their ends, another search, Find Repeats (see section Find Repeats), finds any identical segments of sequence, wherever they lie in the consensus. This has several uses. It gives another way of finding potential joins, and it provides a way of annotating (tagging) repeats so that their positions are obvious to users and can be taken into account by other search procedures. Again, results are presented in the Contig Comparator, and users can invoke the Contig Joining Editor (see section The Join Editor) to examine and make joins.
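
As an illustration of searching for identical segments anywhere in the consensus, the following Python sketch indexes every fixed-length word and reports those that occur more than once. It is an assumption-laden example, not the Find Repeats implementation: the function name find_repeats, the word length k and the sequences are hypothetical, positions are 0-based, and reverse-complement matches are not considered.

```python
from collections import defaultdict

def find_repeats(consensus, k=8):
    """Report every length-k word that occurs at more than one position.

    consensus: dict mapping contig name -> consensus sequence (assumed input).
    Returns a dict mapping each repeated word to a list of
    (contig name, 0-based start position) tuples.
    """
    index = defaultdict(list)
    for name, seq in consensus.items():
        for start in range(len(seq) - k + 1):
            index[seq[start:start + k]].append((name, start))
    # Keep only words seen at two or more positions.
    return {word: hits for word, hits in index.items() if len(hits) > 1}

# Hypothetical consensus sequences sharing the segment "ACGTACGG".
example = {"c1": "AAACGTACGGTTT", "c2": "CCACGTACGGCC"}
print(find_repeats(example))
```

Positions reported this way are the kind of information that could be recorded as tags, so that repeats remain visible to users and to later searches.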


This page is maintained by staden-package. Last generated on 22 October 2002.
URL: http://www.mrc-lmb.cam.ac.uk/pubseq/manual/gap4_unix_93.html