
Phrap and Gap4 Integration


The 1998.0 and later releases of Gap4 include changes that make it easy to combine Phil Green's Phrap assembly engine with the Gap4 editing and finishing tools. These changes provide a graphical user interface within Gap4 for running Phrap, and extend the display and search functions of the contig editor to make further use of confidence values. Integrating Phrap with Gap4 requires a modified version of Phrap, named "gcphrap".

Obtaining gcphrap

To obtain gcphrap, the following procedure may be used. It applies only to Phrap version 0.98* and newer.

  1. Obtain Phrap from Phil Green and unpack the Phrap source code.
  2. Download the phrap_extras.tar.gz file and place it in the Phrap source code directory.
  3. Extract the phrap_extras.tar.gz file with "gzip -cd phrap_extras.tar.gz | tar xvf -".
  4. Build the new Phrap with "make gcphrap".
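
Steps 2 to 4 above can be collected into a small shell function. This is only a sketch: the function name build_gcphrap and both path arguments are illustrative assumptions, and it presumes that the Phrap source has already been unpacked (step 1) and that phrap_extras.tar.gz has already been downloaded.

```shell
# Hypothetical helper wrapping steps 2-4 of the procedure above.
#   $1  unpacked Phrap source directory (step 1, done beforehand)
#   $2  path to the downloaded phrap_extras.tar.gz
# Example call (paths are placeholders):
#   build_gcphrap /path/to/phrap /path/to/phrap_extras.tar.gz
build_gcphrap() {
    phrap_src=$1
    extras=$2

    # Step 2: place the extras archive in the Phrap source directory.
    cp "$extras" "$phrap_src/phrap_extras.tar.gz" || return 1

    # Run steps 3 and 4 in a subshell so the caller's working
    # directory is left unchanged.
    (
        cd "$phrap_src" || exit 1
        # Step 3: extract the extras over the Phrap source.
        gzip -cd phrap_extras.tar.gz | tar xvf - || exit 1
        # Step 4: build the modified Phrap.
        make gcphrap
    )
}
```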

    Phrap from Pregap4

    The Pregap4 program contains interfaces to phred, gcphrap and cross_match.

    The "Phred" module in Pregap4 re-calls the bases from the traces and stores the new base calls and confidence values in the SCF file. The "Cross_match" module compares the sequences against a fasta file of vector sequences and processes the output to produce SL and SR lines in the Experiment File. The "Phrap assembly" module runs gcphrap on the Experiment Files to produce a new set of aligned Experiment Files in Gap4 "Directed Assembly" format. The "Enter Assembly" module enters the Phrap assembly into a Gap4 database. See the Pregap4 documentation for further details.

    Phrap from Gap4

    Gap4 includes an interface for assembling with Phrap. It works in a similar way to the other assembly engines available from within Gap4: you provide a file or list of Experiment File filenames and any parameters you wish to pass to Phrap, and Gap4 handles the rest of the communication. Gap4 has no interfaces to phred or cross_match, as these are considered part of the sequence pre-processing steps. For large assemblies it is probably wise to use the Phrap module in Pregap4 instead, running Pregap4 in batch mode, as this avoids the need for a graphical display.

    Gap4 also has a command named "Phrap Reassemble", which allows a particular set of readings to be assembled in isolation using Phrap. Edits are preserved, including base changes, confidence value changes, and annotations (tags).

    Many other features of Gap4 allow for faster editing when using Phred and Phrap. Specifically, the consensus algorithm may use Phred quality values to produce consensus confidence values, which in turn may be scanned automatically to bring up the traces in the lowest confidence regions. See the Gap4 documentation for full details.

    Using gcphrap on the command line

    The gcphrap command works in much the same way as the standard Phrap command. As it is built from the same source code, only the input and output formats differ. The key difference is that gcphrap automatically detects whether its input is a file containing fasta sequences or a file containing a list of Experiment File filenames. The extra "-exp" option may be used (regardless of input format) to output the assembly as a series of Experiment Files suitable for assembly into Gap4 using the "Directed Assembly" option. For example:

    	$ gcphrap seqs_fasta.screen -exp efiles > phrap.out

    The above command will produce a directory named efiles containing one Experiment File per sequence. These Experiment Files have 'AP' lines added to them to define the assembly positions.
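
The same invocation can be wrapped for reuse, together with a quick check that the output files carry 'AP' lines. This is a sketch only: the function names assemble_to_exp and count_ap_files are hypothetical, and the check assumes that each record type (such as AP) starts at the beginning of a line in an Experiment File.

```shell
# Hypothetical wrapper around the command shown above.
#   $1  input file: fasta sequences or a list of Experiment File filenames
#   $2  directory to receive the output Experiment Files
# Phrap's own report is written to phrap.out, as in the example above.
assemble_to_exp() {
    input=$1
    outdir=$2
    gcphrap "$input" -exp "$outdir" > phrap.out
}

# Hypothetical check: print how many files in directory $1 contain at
# least one line starting with "AP" (the assembly position record).
count_ap_files() {
    grep -l '^AP' "$1"/* | wc -l | tr -d ' '
}
```

A typical session would call "assemble_to_exp seqs_fasta.screen efiles" followed by "count_ap_files efiles", then compare the count against the number of input sequences before running "Directed Assembly" in Gap4.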
