After the initial rounds of assembly it is likely that the data for a sequencing project will still not be contiguous. In order to minimise the number of experiments required to finish the project it is useful to be able to get as much from the existing data as possible. The functions described in this section can help to get the current set of contigs into a consistent left to right order, can discover joins between contigs which were missed or overlooked by the assembly engines, and can help in the analysis of repeats which may cause problems for assembly. It is one of the strengths of gap4 that the results from several of these independent types of analysis can be combined in a single display (see section Contig Comparator), and where they are seen to reinforce one another, users can feel more confident in their decisions.
A typical Contig Comparator display is shown in the figure above. It
shows results from other functions as well as the ones described
in this section.
The first function
(see section Order Contigs)
automatically orders contigs based on read-pair data. The orderings
found can be examined in the Template Display
(see section Template Display).
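The idea of ordering contigs from read-pair data can be illustrated
with a minimal Python sketch. It is not gap4's algorithm: the function
names, the input layout (a list of read-name pairs plus a read-to-contig
map) and the greedy chaining strategy are all assumptions made for
illustration, and read orientation and template size are ignored.

```python
from collections import Counter


def count_links(read_pairs, read_to_contig):
    """Count templates whose two reads lie in different contigs.

    read_pairs and read_to_contig are hypothetical inputs, not gap4 data
    structures. Keys of the result are unordered contig pairs.
    """
    links = Counter()
    for fwd, rev in read_pairs:
        a = read_to_contig.get(fwd)
        b = read_to_contig.get(rev)
        if a is None or b is None or a == b:
            continue  # unplaced read, or both reads in the same contig
        links[frozenset((a, b))] += 1
    return links


def greedy_order(contigs, links):
    """Chain contigs by repeatedly attaching the best-linked unplaced
    contig to the left or right end of the current chain."""
    if not contigs:
        return []
    order = [contigs[0]]
    remaining = list(contigs[1:])
    while remaining:
        best = None  # (link count, contig, side)
        for c in remaining:
            for end, side in ((order[0], "left"), (order[-1], "right")):
                n = links.get(frozenset((c, end)), 0)
                if best is None or n > best[0]:
                    best = (n, c, side)
        _, c, side = best
        remaining.remove(c)
        if side == "left":
            order.insert(0, c)
        else:
            order.append(c)
    return order


read_to_contig = {"r1": "c1", "r2": "c3", "r3": "c3",
                  "r4": "c2", "r5": "c1", "r6": "c3"}
read_pairs = [("r1", "r2"), ("r3", "r4"), ("r5", "r6")]
links = count_links(read_pairs, read_to_contig)
order = greedy_order(["c1", "c2", "c3"], links)
```

In this toy data two templates link c1 to c3 and one links c3 to c2, so
the chain places c3 between the other two contigs. An ordering suggested
this way would still need checking, which is the role the Template
Display plays in gap4.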
The next function
(see section Find read pairs)
also examines read-pair data, but instead of automatically ordering the
contigs, plots out their relationships in the Contig Comparator, from
where the user can invoke the Template Display to check them, and use
the Contig Selector
to reorder them.
Sometimes assembly engines will miss weak joins, or regard them as too
uncertain to be made. The Find Internal Joins function
(see section Find Internal Joins)
compares contigs, including their hidden data, to find matches between
the ends of contigs.
Again results are presented in the Contig
Comparator, and users can invoke the Contig Joining Editor
(see section The Join Editor)
to examine and make joins.
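As a rough illustration of searching for matches between contig ends,
the Python sketch below looks for the longest exact overlap between the
right end of one consensus and the left end of another. It is a
simplification under stated assumptions, not the Find Internal Joins
algorithm: the names end_overlap, candidate_joins and min_len are
hypothetical, only exact matches on one strand are considered, and
hidden data is not used.

```python
def end_overlap(left, right, min_len=4):
    """Return the length of the longest suffix of `left` that is
    identical to a prefix of `right`, or 0 if shorter than min_len."""
    for n in range(min(len(left), len(right)), min_len - 1, -1):
        if left[-n:] == right[:n]:
            return n
    return 0


def candidate_joins(contigs, min_len=4):
    """List (left contig, right contig, overlap length) for every
    ordered pair of distinct contigs with an exact end overlap."""
    joins = []
    for a, seq_a in contigs.items():
        for b, seq_b in contigs.items():
            if a == b:
                continue
            n = end_overlap(seq_a, seq_b, min_len)
            if n:
                joins.append((a, b, n))
    return joins


# Toy consensus sequences, invented for this example.
contigs = {"c1": "ACGTTGCA", "c2": "TGCAGGCT"}
joins = candidate_joins(contigs)
```

Here the last four bases of c1 match the first four bases of c2, so one
candidate join is reported. A practical search would also compare the
complementary strand and tolerate mismatches and padding.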
Whereas Find Internal Joins makes sure that alignments between contigs
continue right to their ends, another search, Find Repeats
(see section Find Repeats)
finds any identical segments of sequence, wherever they lie in the
consensus. This has several uses. It gives another way of finding
potential joins, and it provides a way of annotating (tagging) repeats so
that their positions are obvious to users, and can be taken into account
by other search procedures.
Again results are presented in the Contig
Comparator, and users can invoke the Contig Joining Editor
(see section The Join Editor)
to examine and make joins.
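The notion of identical segments lying anywhere in the consensus can be
sketched in Python by indexing fixed-length words. This is an
illustrative assumption, not a description of how Find Repeats is
implemented: the function name repeated_words, the word length k and the
0-based positions are choices made for this example, and matches are not
extended to their full length.

```python
from collections import defaultdict


def repeated_words(contigs, k=8):
    """Map each length-k word that occurs more than once to the list of
    (contig name, 0-based start) positions where it occurs."""
    index = defaultdict(list)
    for name, seq in contigs.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((name, i))
    return {word: hits for word, hits in index.items() if len(hits) > 1}


# Toy consensus sequences, invented for this example.
contigs = {"c1": "AAACGTACGTTT", "c2": "GGACGTACCC"}
repeats = repeated_words(contigs, k=6)
```

In the toy data the word ACGTAC occurs once in each contig, away from
the contig ends, so it is reported even though it would not be found by
a search restricted to end overlaps.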
This page is maintained by
staden-package.
Last generated on 22 October 2002.
URL: http://www.mrc-lmb.cam.ac.uk/pubseq/manual/gap4_unix_93.html