One of our main objectives with the new program was to provide many more visual cues to the current state of a sequencing project and to allow users to interact with their data in more intuitive ways. We were particularly interested in the problems of dealing with repetitive sequences, and wanted to supply tools to display and manipulate the various types of data that might help to solve difficult assemblies. To this end we have introduced new displays and a new gap data item, the "contig order". The new displays are the "contig selector", the "Contig Comparator", the "template display", the "restriction enzyme map" and the "stop codon map". We have also made it possible to have any number of contig editors and contig joining editors running at once. The same contig can be viewed in several editors simultaneously, allowing repetitive regions to be compared.
In previous versions of our assembly programs the user had no control over the relative order of contigs during processing and, even had it been possible, there was no functionality to make use of it. The new gap stores the "contig order" in its database, and through a new type of display, the "contig selector", this information is always visible while the program is running. The "contig order" is simply the relative order of the contigs. In the "contig selector" all contigs are shown, each represented by a horizontal line proportional to its length, and the left to right order of these lines defines the contig order. Users can reorder the contigs by dragging these lines to new positions within the contig selector display. The contig selector can also be used to select contigs for processing, and tags can be displayed in the contig selector window.
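As an illustration, the contig order and the dragging operation can be sketched as follows. This is a minimal Python sketch under our own assumptions, not gap's implementation: the `Contig` record, the `layout` and `drop_contig` helpers, the `scale` and `gap` parameters, and the midpoint rule used to decide where a dropped contig lands are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Contig:
    name: str
    length: int  # length in bases


def layout(order, scale=1.0, gap=10.0):
    """Return (name, x_start, x_end) for each contig, left to right.

    Each contig is drawn as a horizontal line whose length is proportional
    to its length in bases; `gap` is a fixed space between adjacent lines.
    """
    spans, x = [], 0.0
    for contig in order:
        width = contig.length * scale
        spans.append((contig.name, x, x + width))
        x += width + gap
    return spans


def drop_contig(order, name, x, scale=1.0, gap=10.0):
    """Return a new order with contig `name` dragged to horizontal position x."""
    moved = next(c for c in order if c.name == name)
    rest = [c for c in order if c.name != name]
    # Insert before the first remaining contig whose midpoint lies right of x;
    # if there is none, the dragged contig goes to the end of the order.
    index = len(rest)
    for i, (_, start, end) in enumerate(layout(rest, scale, gap)):
        if x < (start + end) / 2:
            index = i
            break
    rest.insert(index, moved)
    return rest
```

For example, with contigs `c1`, `c2` and `c3` of 1000, 500 and 2000 bases, dropping `c3` at x = 0 gives the new order `c3, c1, c2`. The original order is left unchanged, so the stored contig order need only be updated once the drag is complete.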
The Contig Comparator is used to display the results of comparing contigs. It is our solution to the problem of displaying multiple types of data about the possible relationships between contigs. It can currently show the results of searches for templates that have readings in more than one contig, the results of the old "find internal joins" function, the results of searches for repeats and the results of "Check Assembly". These searches reveal information about the possible relative order of the contigs or the positions of problems, and the Contig Comparator allows all of their results to be displayed and manipulated together. When any of these searches is performed, the contig selector automatically converts to a Contig Comparator by duplicating itself in the vertical direction, and the results are plotted in the rectangular display created in this process. Contigs can still be reordered manually by dragging, as described above, and the plotted results associated with a dragged contig move with it to its new location in the display. As explained below, this greatly facilitates contig ordering and can help users understand difficult assemblies and plan experiments. The Contig Comparator can also be used to invoke the join editor, the contig editor and the template display.
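Why the plotted results follow a dragged contig can be shown with a short sketch. It assumes (our assumption, not a description of gap's internals) that each result is stored in contig-relative coordinates and converted to display coordinates only from the current contig order. `Match`, `offsets` and `plot_points` are hypothetical names, and for simplicity no space is left between contigs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    """A hypothetical search result relating a position in one contig to a
    position in another (e.g. a repeat or a template spanning two contigs)."""
    contig1: str
    pos1: int
    contig2: str
    pos2: int
    kind: str


def offsets(order, lengths):
    """Start coordinate of each contig along an axis, given the contig order."""
    starts, x = {}, 0
    for name in order:
        starts[name] = x
        x += lengths[name]
    return starts


def plot_points(matches, order, lengths):
    """Convert contig-relative results into (x, y, kind) display points.

    The x axis is the original contig selector and the y axis its vertical
    duplicate, so both axes use the same contig order.
    """
    start = offsets(order, lengths)
    return [(start[m.contig1] + m.pos1, start[m.contig2] + m.pos2, m.kind)
            for m in matches]
```

Because nothing in a `Match` refers to display coordinates, reordering the contigs and recomputing the points is all that is needed to move each result along with its contigs.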
The template display shows a schematic of all the readings and templates for a single contig, each represented by a horizontal line proportional to its length. Colour coding shows strandedness and arrows indicate the direction of each reading. Templates that appear in more than one contig are also colour coded. Selected tags can be plotted, as can the quality plot that was available in the previous programs (now colour coded) and a new restriction enzyme display. This display can also be used to select readings for processing.
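Finding the templates that receive this extra colour coding amounts to grouping readings by template. A minimal sketch, assuming a hypothetical `Reading` record that carries the names of each reading's template and contig:

```python
from collections import defaultdict
from typing import NamedTuple


class Reading(NamedTuple):
    name: str
    template: str
    contig: str


def spanning_templates(readings):
    """Return {template: sorted contig names} for every template whose
    readings lie in more than one contig."""
    contigs_by_template = defaultdict(set)
    for reading in readings:
        contigs_by_template[reading.template].add(reading.contig)
    return {template: sorted(contigs)
            for template, contigs in contigs_by_template.items()
            if len(contigs) > 1}
```

The same grouping could also supply the pairs of contigs reported by a search for templates with readings in more than one contig.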
For those who use restriction enzyme mapping data to aid their assembly projects, we have added functions to locate and display the positions of restriction sites. Selected sites can be converted to tags, which can then be displayed in all the usual ways.
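Locating sites essentially reduces to searching the consensus for each enzyme's recognition sequence on both strands. The following is a simplified sketch, not gap's code: `find_sites` is a hypothetical helper, positions are 0-based start coordinates on the forward strand, and ambiguity codes in recognition sequences (such as N) are not handled.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(consensus, site):
    """Return sorted 0-based start positions of `site` on either strand."""
    consensus = consensus.upper()
    site = site.upper()
    hits = set()
    # A palindromic site equals its reverse complement, so the set holds
    # one pattern; otherwise both orientations are searched.
    for pattern in {site, reverse_complement(site)}:
        start = consensus.find(pattern)
        while start != -1:  # step by one base so overlapping sites are found
            hits.add(start)
            start = consensus.find(pattern, start + 1)
    return sorted(hits)
```

For the palindromic EcoRI site GAATTC only one pattern is searched; for a non-palindromic site such as GGTCTC, occurrences of its reverse complement GAGACC are reported as well.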
A stop codon plot is available to display stop codons in three or six reading frames. It can be linked to the contig editor so that the plot is updated to reflect edits made there.
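A stop codon map of this kind can be computed by scanning the codons in each frame. In this minimal sketch the function name, the frame labels (1 to 3 for the forward strand, -1 to -3 for the reverse strand) and the 0-based position convention are our assumptions. One way to keep such a plot in step with the editor would be to rerun the scan on the edited consensus.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def stop_codons(seq, six_frames=False):
    """Return {frame: [positions]} of stop codons.

    Frames 1-3 are read on the forward strand starting at offsets 0-2;
    frames -1 to -3 are read the same way on the reverse complement.
    Every position is the 0-based forward-strand coordinate of the codon's
    leftmost base.
    """
    seq = seq.upper()
    n = len(seq)
    result = {}
    for frame in range(3):
        result[frame + 1] = [i for i in range(frame, n - 2, 3)
                             if seq[i:i + 3] in STOPS]
    if six_frames:
        rc = seq.translate(COMPLEMENT)[::-1]
        for frame in range(3):
            # A codon at rc[i:i+3] covers forward positions n-i-3 .. n-i-1.
            result[-(frame + 1)] = sorted(n - i - 3 for i in range(frame, n - 2, 3)
                                          if rc[i:i + 3] in STOPS)
    return result
```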
The contig editor contains several selectable status lines, which display information about the readings contributing to the consensus and translations in any of the six reading frames.
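Translation in any of the six frames can be sketched with the standard genetic code. This is an illustration only: the `translate` function and its frame numbering are hypothetical, and codons containing ambiguity or padding characters are shown as `X`.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq, frame):
    """Translate seq in frame 1, 2, 3 (forward) or -1, -2, -3 (reverse strand).

    Stop codons are shown as '*'; any codon containing a character other
    than A, C, G or T (e.g. N or a pad) is shown as 'X'.
    """
    if frame not in (1, 2, 3, -1, -2, -3):
        raise ValueError("frame must be 1, 2, 3, -1, -2 or -3")
    seq = seq.upper()
    if frame < 0:
        seq = seq.translate(COMPLEMENT)[::-1]
    start = abs(frame) - 1
    return "".join(CODE.get(seq[i:i + 3], "X")
                   for i in range(start, len(seq) - 2, 3))
```

A status line would additionally need to align each amino acid with its codon in the editor; that layout step is not shown here.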
A further new feature of gap is its ability to create and use "lists". Users of our package will be familiar with the idea of "files of file names" and know their value for processing batches of data. For the new gap we have extended this concept so that many of its commands operate on lists of items. To facilitate this mode of work we have provided routines to create and manage lists.
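A list in this sense can be modelled very simply. The sketch below is hypothetical throughout (`read_list`, `ListStore` and `apply_to_list` are not gap commands): it reads a "file of file names" into a named list and applies a per-item command to every entry.

```python
def read_list(text):
    """Parse a 'file of file names': one item per line, blank lines ignored."""
    return [line.strip() for line in text.splitlines() if line.strip()]


class ListStore:
    """Hypothetical manager for named lists of items (e.g. reading names)."""

    def __init__(self):
        self._lists = {}

    def create(self, name, items):
        self._lists[name] = list(items)

    def get(self, name):
        return list(self._lists[name])  # return a copy

    def delete(self, name):
        del self._lists[name]

    def names(self):
        return sorted(self._lists)


def apply_to_list(store, list_name, command):
    """Run a per-item command over every item in a named list."""
    return [command(item) for item in store.get(list_name)]
```

Keeping lists as named objects, rather than re-reading files each time, lets the same batch of items be passed to several commands in turn.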