The tasks gap4 performs can be roughly divided into assembly (see section Assembly Introduction), finishing (see section Finishing Experiments), and editing (see section Editor introduction). However, gap4 also contains many other functions that help to complete a sequencing project with the minimum amount of effort, and some of these are listed below.
Readings are entered into the gap4 database using the assembly algorithms (see section Assembly Introduction). In general these algorithms build the largest contigs they can by finding overlaps between the readings. However, some joins between contigs, perhaps the more doubtful ones, may be missed; these can be discovered, checked and made using Find Internal Joins (see section Find Internal Joins), Find repeats (see section Find repeats) and Join Contigs (see section The Join Editor). Find Internal Joins compares the ends of contigs to look for possible overlaps and then presents each overlap in the Contig Joining Editor, where the user can view the traces, make edits and join the contigs. Find Repeats can be used in a similar way, but unlike Find Internal Joins it does not require the matches it finds to extend to the ends of contigs.
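To illustrate what "comparing the ends of contigs" means, here is a minimal sketch that looks for the longest overlap between the end of one contig and the start of another, tolerating a small fraction of mismatches. This is not gap4's algorithm: the function name `end_overlap` and its `min_overlap` and `max_mismatch` parameters are hypothetical, and the sketch ignores reverse-complement orientation and gapped alignment, both of which a real join search has to handle.

```python
def end_overlap(left: str, right: str, min_overlap: int = 20,
                max_mismatch: float = 0.05) -> int:
    """Hypothetical helper: length of the longest overlap in which the end of
    `left` matches the start of `right`, allowing up to `max_mismatch` of the
    overlapping bases to differ. Returns 0 if no overlap qualifies.

    Simplifications (assumptions, not gap4 behaviour): same orientation only,
    no insertions or deletions, brute-force scan over all overlap lengths.
    """
    best = 0
    for length in range(min_overlap, min(len(left), len(right)) + 1):
        tail = left[-length:]   # last `length` bases of the left contig
        head = right[:length]   # first `length` bases of the right contig
        mismatches = sum(1 for a, b in zip(tail, head) if a != b)
        if mismatches <= max_mismatch * length:
            best = length       # lengths increase, so this keeps the longest
    return best
```

For example, `end_overlap("ACGTTGCAGGATCCAT", "GGATCCATTTTT", min_overlap=4)` finds the shared `GGATCCAT` and returns 8. A candidate like this would then be shown to the user for checking rather than joined blindly, which is the role the Contig Joining Editor plays.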
Read-pair data can be used to put contigs into the correct order automatically (see section Ordering Contigs), and information about contigs which share templates can be plotted out (see section Find Read Pairs). The relationships of readings and templates, within and between contigs, can also be shown by the Template Display (see section Template Display), which has a wide selection of display modes and uses.
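The idea behind "contigs which share templates" can be sketched with a simple index: group readings by template, then report every pair of contigs that holds readings from the same template. This is an illustrative assumption about the data, not gap4's internal representation; the tuple layout `(reading, template, contig)` and the function name `shared_templates` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def shared_templates(readings):
    """Hypothetical helper. `readings` is an iterable of
    (reading_name, template_name, contig_name) tuples.

    Returns {(contig_a, contig_b): {template, ...}} for every pair of contigs
    that contain readings from the same template.
    """
    # Which contigs does each template have readings in?
    contigs_by_template = defaultdict(set)
    for _reading, template, contig in readings:
        contigs_by_template[template].add(contig)

    # A template seen in two or more contigs links each pair of them.
    links = defaultdict(set)
    for template, contigs in contigs_by_template.items():
        for a, b in combinations(sorted(contigs), 2):
            links[(a, b)].add(template)
    return dict(links)
```

With forward and reverse readings of templates `t1` and `t2` split between contigs `c1` and `c2`, and both readings of `t3` in `c2`, the result is `{("c1", "c2"): {"t1", "t2"}}`. Several templates linking the same pair of contigs is stronger evidence for their relative order than a single one; a real ordering step would also use the readings' positions, strands and expected insert sizes, which this sketch omits.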
Problems with the assembly can be revealed by use of Check Assembly (see section Checking Assemblies), Find repeats (see section Find repeats), and Restriction Enzyme mapping (see section Plotting Restriction Enzymes). Check Assembly compares every reading with the segment of the consensus it overlaps to see how well it aligns, and those that align poorly are plotted out in the Contig Comparator. Find Repeats also presents its results in the Contig Comparator, so if it is used in conjunction with Check Assembly, it can show cases where readings have been assembled into the wrong copy of a repeated element. At the end of a project the Restriction Enzyme map function can be used to compare the consensus sequence with a restriction digest of the target sequence. Problems can also be found by use of the various Coverage Plots available in the Consistency Display (see section Consistency Display). These plots show regions of low or high reading coverage (see section Reading Coverage Histogram), places with data for only one strand (see section Strand Coverage), and places with no read-pair coverage (see section Read-Pair Coverage Histogram). Errors can be corrected using Disassemble Readings (see section Disassembling Readings), which removes readings from contigs or from the database, and Break Contig (see section Breaking Contigs), which breaks a contig in two.
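The reading-coverage and strand-coverage plots can be understood through a small sketch: count, at each consensus position, how many readings cover it on each strand, then flag positions with too little or too much coverage and positions covered by only one strand. The reading layout `(start, end, strand)` with 0-based, end-exclusive coordinates, the function names, and the `low`/`high` thresholds are all assumptions for illustration; gap4's own plots are configured through its dialogues.

```python
def coverage_profiles(contig_length, readings):
    """Hypothetical helper. `readings` is a list of (start, end, strand)
    tuples, 0-based and end-exclusive, with strand "+" or "-".

    Returns (depth, forward, reverse): per-position reading counts in total
    and on each strand.
    """
    forward = [0] * contig_length
    reverse = [0] * contig_length
    for start, end, strand in readings:
        counts = forward if strand == "+" else reverse
        for pos in range(max(0, start), min(end, contig_length)):
            counts[pos] += 1
    depth = [f + r for f, r in zip(forward, reverse)]
    return depth, forward, reverse


def flag_regions(depth, forward, reverse, low=2, high=50):
    """Return (low_cov, high_cov, single_strand) position lists.

    `low` and `high` are illustrative thresholds, not gap4 defaults.
    A position is single-stranded when exactly one strand has coverage.
    """
    low_cov = [i for i, d in enumerate(depth) if d < low]
    high_cov = [i for i, d in enumerate(depth) if d > high]
    single_strand = [i for i, (f, r) in enumerate(zip(forward, reverse))
                     if (f == 0) != (r == 0)]
    return low_cov, high_cov, single_strand
```

For a 10-base contig with forward readings at 0-6 and 0-4 and a reverse reading at 4-10, positions 6 to 9 have a depth of only 1, and positions 0 to 3 and 6 to 9 are covered by one strand only. Regions like these are natural places to plan further experiments.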
The general level of completeness of the consensus sequence can be seen diagrammatically using the Quality Plot (see section Quality Plot), and the confidence values for each base in the consensus sequence can be plotted (see section Confidence Values Graph).
The most powerful component of gap4 is its Contig Editor (see section Editor introduction), which has many display modes and search facilities that enable very rapid discovery and correction of base-call errors.
If working on a protein coding sequence, the consensus can be analysed using the Stop Codon Map (see section Stop Codon Map), and its translation viewed using the Contig Editor (see section Status Line).
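The principle behind a stop codon map is simple: scan each reading frame and record where the stop codons TAA, TAG and TGA (standard genetic code) occur. Long stretches without stops suggest open reading frames, and an unexpected stop inside a known coding region often points to a base-call error. The sketch below is a hypothetical helper, not gap4's implementation; it scans only the three forward frames and treats any non-ACGT character as a non-match. The complementary strand could be handled by scanning the reverse complement.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def stop_codon_map(seq):
    """Hypothetical helper: return {frame: [positions]} giving the 0-based
    start of every stop codon in forward frames 0, 1 and 2 of `seq`."""
    seq = seq.upper()
    result = {}
    for frame in range(3):
        # The last complete codon starts at len(seq) - 3.
        result[frame] = [i for i in range(frame, len(seq) - 2, 3)
                         if seq[i:i + 3] in STOP_CODONS]
    return result
```

For `"ATGTAAGTGA"` this finds `TAA` at position 3 in frame 0, `TGA` at position 7 in frame 1, and no stops in frame 2.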
The final result from a sequencing project is a consensus sequence (see section The Consensus Calculation).