Assembling and Adding Readings to a Database

Assembly is performed by selecting one of the functions from the Assembly menu. The options available are:

The data for a project is stored in an assembly database (see section Gap Database Files). All modes of assembly except CAP2, CAP3 and FAKII can either assemble all the readings for a project in a single operation or add batches of new data as they are produced. CAP2, CAP3 and FAKII can only assemble all the data for a project in a single operation.

For all modes the names of the readings to assemble are read from a list or a file of file names, and the names of readings that fail to be entered are written to a list or a file of file names. If only a single reading is to be assembled, the "single" button may be pressed and the reading's file name entered instead of the file of file names.
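As a minimal sketch of preparing such input outside gap4, the following Python assumes that a file of file names is plain text with one reading file name per line. The helper names and the example file names are hypothetical and are not part of the package.

```python
from pathlib import Path


def write_fofn(names, path):
    """Write reading file names to a file of file names, one per line (assumed format)."""
    Path(path).write_text("".join(f"{name}\n" for name in names))


def read_fofn(path):
    """Read a file of file names back into a list, skipping blank lines."""
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def entered_readings(submitted, failed):
    """Return the submitted readings that are not listed among the failures."""
    failed_set = set(failed)
    return [name for name in submitted if name not in failed_set]
```

Comparing the submitted names with the names written to the failures file, as in entered_readings, gives the readings that were actually entered in a given batch.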

Now that enough readings to get close to contiguity can be obtained quite quickly, and more repetitive genomes are being sequenced, it is sensible to use a "global" algorithm for assembly, such as CAP2, CAP3, FAKII or Phrap. These algorithms compare each reading against all of the others to work out their most likely left-to-right order, and so have a better chance of correctly assembling repetitive elements than an algorithm that only compares each new reading to those already assembled.
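The idea of comparing every reading against every other can be illustrated with a toy sketch. It is not the method used by CAP2, CAP3, FAKII or Phrap: it assumes error-free readings on one strand, scores pairs by exact suffix/prefix overlap, and orders them greedily. All function names are hypothetical.

```python
from itertools import permutations


def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def all_pairs_overlaps(reads, min_len=3):
    """Score every ordered pair of readings, not just new-versus-assembled pairs."""
    return {(i, j): overlap(reads[i], reads[j], min_len)
            for i, j in permutations(range(len(reads)), 2)}


def left_to_right_order(reads, min_len=3):
    """Greedy ordering: start at the reading with the weakest incoming overlap,
    then repeatedly follow the strongest outgoing overlap."""
    n = len(reads)
    scores = all_pairs_overlaps(reads, min_len)
    incoming = [max((scores[(i, j)] for i in range(n) if i != j), default=0)
                for j in range(n)]
    current = min(range(n), key=lambda j: incoming[j])
    order = [current]
    remaining = set(range(n)) - {current}
    while remaining:
        nxt = max(remaining, key=lambda j: scores[(current, j)])
        if scores[(current, nxt)] == 0:
            break  # no further overlap: the chain (one contig) ends here
        order.append(nxt)
        remaining.discard(nxt)
        current = nxt
    return order
```

For example, left_to_right_order(["CAGGTCC", "GATTACAGG", "GTCCATG"]) places the second reading first, because no other reading overlaps its left end. Because all pairs are scored before any ordering decision is made, the input order of the readings does not affect which overlaps are considered.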

There is no limit to the length of the individual readings that can be assembled. Hence reference sequences for use in mutation studies, or for use as guide sequences, can also be assembled.

Note that the following modes may require the parameters maxseq and maxdb to be set beforehand (see section Set Maxseq): Normal shotgun assembly (see section Normal Shotgun Assembly), Assemble independently (see section Assembly Independently), Assembly into single stranded regions (see section Assembly Single), Screen only (see section Screen Only), and Put all readings in separate contigs (see section Assembly new). The maxseq parameter defines the maximum length of consensus that can be created, and the maxdb parameter defines the maximum combined number of readings and contigs that the database can hold (i.e. number of readings + number of contigs).
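The arithmetic behind these two limits can be sketched as follows. The function names are hypothetical, and the worst-case estimate is an assumption made for illustration (no readings join, so each forms its own contig and contributes its full length to the consensus). It is not a rule taken from the package.

```python
def fits_limits(num_readings, num_contigs, consensus_length, maxdb, maxseq):
    """True if readings + contigs fit within maxdb and the consensus fits within maxseq."""
    return (num_readings + num_contigs) <= maxdb and consensus_length <= maxseq


def worst_case_requirements(reading_lengths):
    """Upper bounds assuming no readings join: one contig per reading and a
    consensus as long as all the readings laid end to end."""
    n = len(reading_lengths)
    return {"maxdb": 2 * n, "maxseq": sum(reading_lengths)}
```

For three readings of lengths 500, 600 and 450, the worst case needs room for 6 database entries and 1550 bases of consensus. Real assemblies normally need less, since overlapping readings share contigs and consensus.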

This page is maintained by staden-package. Last generated on 22 October 2002.
URL: http://www.mrc-lmb.cam.ac.uk/pubseq/manual/gap4_unix_79.html