
Summary of Gap4's Functions

The tasks which gap4 can perform can be roughly divided into assembly, finishing, and editing. However, gap4 also contains many other functions which help to complete a sequencing project with the minimum amount of effort, and some of these are listed below.

Readings are entered into the gap4 database using the assembly algorithms. In general these algorithms build the largest contigs they can by finding overlaps between the readings. However, some joins between contigs, perhaps the more doubtful ones, may be missed; these can be discovered, checked and made using Find Internal Joins, Find Repeats and Join Contigs. Find Internal Joins compares the ends of contigs to see if there are possible overlaps and then presents each overlap in the Contig Joining Editor, from where the user can view the traces, make edits and join the contigs. Find Repeats can be used in a similar way, but unlike Find Internal Joins it does not require the matches it finds to continue to the ends of contigs.
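To make the idea of comparing contig ends concrete, here is a minimal Python sketch. It is not gap4's algorithm: it assumes exact matching only (no mismatches, gaps or reverse complements), and the function name `end_overlap`, the `min_len` parameter and the two example sequences are all hypothetical.

```python
def end_overlap(left: str, right: str, min_len: int = 4) -> int:
    """Return the length of the longest suffix of `left` that exactly
    matches a prefix of `right`, or 0 if no match of at least `min_len`
    bases exists. `min_len` must be at least 1."""
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    max_len = min(len(left), len(right))
    # Try the longest candidate overlap first and shrink until one matches.
    for length in range(max_len, min_len - 1, -1):
        if left[-length:] == right[:length]:
            return length
    return 0


# Hypothetical contig consensus sequences, for illustration only.
contig_a = "ACGTTGCAAGGCTTAC"
contig_b = "GGCTTACCGTAGGA"
overlap = end_overlap(contig_a, contig_b)
print(f"right end of contig_a overlaps left end of contig_b by {overlap} bases")
```

A real comparison of this kind would need to tolerate base-call errors and consider both orientations, which is why candidate joins are presented for inspection rather than made blindly.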

Read-pair data can be used to automatically put contigs into the correct order, and information about contigs which share templates can be plotted out. The relationships of readings and templates, within and between contigs, can also be shown by the Template Display, which has a wide selection of display modes and uses.

Problems with the assembly can be revealed by use of Check Assembly, Find Repeats, and the Restriction Enzyme map. Check Assembly compares every reading with the segment of the consensus it overlaps to see how well it aligns. Those that align poorly are plotted out in the Contig Comparator. Find Repeats also presents its results in the Contig Comparator, so if used in conjunction with Check Assembly, it can show cases where readings have been assembled into the wrong copy of a repeated element. At the end of a project the Restriction Enzyme map function can be used to compare the consensus sequence with a restriction digest of the target sequence. Problems can also be found by use of the various Coverage Plots available in the Consistency Display. These plots will show regions of low or high reading coverage, places with data for only one strand, or places where there is no read-pair coverage. Errors can be corrected using Disassemble Readings, which removes readings from contigs or from the database, and Break Contig, which breaks a contig.
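As an illustration of what a reading coverage plot summarises, the following Python sketch computes per-base reading depth and reports regions below a threshold. It is an assumption-laden toy, not gap4 code: readings are represented as hypothetical half-open `(start, end)` intervals on the consensus, and the names `coverage_depth` and `low_coverage_regions` are invented for this example.

```python
def coverage_depth(contig_length, readings):
    """Count how many readings cover each consensus position.
    Each reading is a half-open (start, end) interval, 0-based."""
    depth = [0] * contig_length
    for start, end in readings:
        # Clip each reading to the contig so stray coordinates are ignored.
        for pos in range(max(start, 0), min(end, contig_length)):
            depth[pos] += 1
    return depth


def low_coverage_regions(depth, min_depth):
    """Return half-open (start, end) regions where depth < min_depth."""
    regions = []
    region_start = None
    for pos, d in enumerate(depth):
        if d < min_depth:
            if region_start is None:
                region_start = pos
        elif region_start is not None:
            regions.append((region_start, pos))
            region_start = None
    if region_start is not None:
        regions.append((region_start, len(depth)))
    return regions


# Hypothetical reading positions on a 20-base contig.
readings = [(0, 8), (5, 12), (10, 20)]
depth = coverage_depth(20, readings)
low = low_coverage_regions(depth, min_depth=2)
print(low)
```

The same approach could be applied per strand, by counting forward and reverse readings separately, to locate places with data for only one strand.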

The general level of completeness of the consensus sequence can be seen diagrammatically using the Quality Plot, and the confidence values for each base in the consensus sequence can be plotted.

The most powerful component of gap4 is its Contig Editor, which has many display modes and search facilities to enable very rapid discovery and fixing of base call errors.

If working on a protein coding sequence, the consensus can be analysed using the Stop Codon Map, and its translation viewed using the Contig Editor.
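The kind of information a stop codon map conveys can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the standard genetic code's stop codons (TAA, TAG, TGA), scans the three forward reading frames only, and the function name `stop_codon_map` and the example sequence are hypothetical rather than taken from gap4.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def stop_codon_map(sequence):
    """Return {frame: [positions]} giving the 0-based start position of
    every stop codon in each of the three forward reading frames."""
    seq = sequence.upper()
    result = {}
    for frame in range(3):
        # Step through complete codons only, starting at the frame offset.
        result[frame] = [
            i
            for i in range(frame, len(seq) - 2, 3)
            if seq[i:i + 3] in STOP_CODONS
        ]
    return result


# Hypothetical consensus fragment, for illustration only.
consensus = "ATGTAACCTGAGTAG"
stops = stop_codon_map(consensus)
for frame, positions in stops.items():
    print(f"frame {frame}: stop codons at {positions}")
```

In a genuine coding region, the correct frame should be free of stop codons over a long stretch, so an unexpected stop in that frame can point to a base call error worth checking.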

The final result from a sequencing project is a consensus sequence.


This page is maintained by staden-package. Last generated on 22 October 2002.
URL: http://www.mrc-lmb.cam.ac.uk/pubseq/manual/mini_unix_5.html