Version-1998.0 Release Notes
Gap4
The most significant changes in Gap4 include an interface to phrap and a
corresponding consensus algorithm to use phred style confidence values, an
entirely new Find Internal Joins function, and an increased limit of 32
characters in reading names (was 16).
Phred produces values on the -10*log10(error_rate) scale. With these confidence
values Gap4 can produce the most probable consensus along with confidence
values for these consensus bases. The consensus output functions can now strip
the pads from the consensus sequence. This means that it is not necessary to
spend time removing columns of pads when it is obvious to both users and Gap4
that there really is no base there.
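As a rough illustration of the confidence scale described above, the following Python sketch converts between a phred-style confidence value and an error probability. The function names are hypothetical and are not part of Gap4 or Phred; only the -10*log10(error_rate) relationship comes from the notes above.

```python
import math

def phred_to_error(q):
    """Error probability implied by a phred-style confidence value q."""
    return 10 ** (-q / 10.0)

def error_to_phred(p):
    """Phred-style confidence value for an error probability p (0 < p <= 1)."""
    return -10.0 * math.log10(p)

if __name__ == "__main__":
    # Print the error probability implied by a few typical confidence values.
    for q in (10, 20, 30, 40):
        print(q, phred_to_error(q))
```

On this scale a confidence of 20 corresponds to an error rate of 1 in 100, and a confidence of 30 to 1 in 1000.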
The Contig Editor has also been improved to provide a considerable increase in
speed of editing when dealing with the new 'confidence' consensus. The
consensus can be shaded to indicate its confidence. Coupled with the
"consensus quality" search it provides a means of detecting the consensus
bases that are most likely to be incorrect. Gap4 can also provide an
estimation of the number of errors in the consensus and an estimated error
rate. This allows editing up to a desired error rate and no further. The List
Confidence function gives a quick summary of the number of bases that would
need to be checked in order to achieve your desired error rate.
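The kind of estimate described above can be sketched in Python as follows. This is not Gap4's implementation: the function names are hypothetical, and the sketch assumes that each consensus confidence is on the -10*log10(error_rate) scale, that errors are independent, and that a base contributes no remaining error once it has been checked.

```python
def expected_errors(confidences):
    """Expected number of consensus errors, summing per-base error probabilities."""
    return sum(10 ** (-q / 10.0) for q in confidences)

def bases_to_check(confidences, target_rate):
    """Count the lowest-confidence bases to check to reach target_rate.

    Assumption: a checked base is treated as correct afterwards, so its
    error probability is removed from the running total.
    """
    n = len(confidences)
    remaining = expected_errors(confidences)
    checked = 0
    for q in sorted(confidences):          # worst bases first
        if remaining / n <= target_rate:
            break
        remaining -= 10 ** (-q / 10.0)
        checked += 1
    return checked

if __name__ == "__main__":
    consensus_conf = [40, 10, 30, 20]
    print(expected_errors(consensus_conf))
    print(bases_to_check(consensus_conf, 0.005))
```

Checking the worst bases first gives the smallest number of checks under these assumptions, which is why the list is sorted by confidence before counting.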
The Find Internal Joins function has been completely rewritten. The most
significant change is that it now only finds joins that reach the ends of both
contigs, i.e. real potential joins instead of internal repeats. If you find
that the old functionality of Find Internal Joins was desirable (in that it
found good matches in contigs that did not extend to the ends) then we suggest
the use of Find Repeats instead (which has been sped up).
Sip4
The most significant changes in Sip4 include a substantial speed-up in
"find matching words", especially when comparing a sequence against itself,
and the ability to send sequences from Gap4 and Nip4 to Sip4 and link their
respective cursors together to allow synchronous scrolling between programs.
Nip4
The most significant changes in Nip4 include the additional functions:
find start codons, find open reading frames, change the genetic code and
count the dinucleotide frequencies.
Gap4 Changes
Main changes
-
New 'confidence' consensus algorithm. This uses Phred-like quality
values (on the -10*log10(error_rate) scale) to produce the most
probable consensus base along with a consensus confidence value in the
same scale. The Contig Editor has search methods for finding poor
quality consensus bases. A List Confidence command (in both the main
Gap4 menus and in the editor) gives estimates of the expected number of
errors and frequency charts for the confidence values. Note that all
the consensus algorithms now use IUB codes instead of the Staden codes.
-
Phrap assembly interface to allow assembly of new sequences or
reassembly of existing sequences. Requires newest version of Phrap.
-
The consensus can now be output with pads removed. This includes
adjusting the positions and lengths of annotations to ensure that they
still mark the correct sequence fragment.
-
The maximum reading name length is increased from 16 characters to 32
characters.
-
New Find Internal Joins algorithm. The key feature is that it now
checks that all matches are real joins rather than a possible internal
repeat.
-
Added a Contig Editor information line to display information on
readings, the consensus, and annotations when the mouse pointer is
moved over the relevant object. The contents and format of the
information displayed are configurable.
-
New Difference Clip and Quality Clip commands. These adjust the cutoff
data based on disagreements with consensus or average quality values.
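The pad-removal item in the list above involves remapping annotation positions as well as deleting characters. A minimal Python sketch of that idea follows. It assumes '*' as the pad character and a hypothetical representation of annotations as (start, length) pairs with 0-based starts; it is not the Gap4 code.

```python
def strip_pads(consensus, annotations, pad="*"):
    """Remove pads and remap (start, length) annotations to the new coordinates."""
    # prefix[i] = number of non-pad characters before padded position i
    prefix = [0]
    for ch in consensus:
        prefix.append(prefix[-1] + (ch != pad))

    stripped = consensus.replace(pad, "")
    adjusted = []
    for start, length in annotations:
        new_start = prefix[start]
        new_length = prefix[start + length] - prefix[start]
        adjusted.append((new_start, new_length))
    return stripped, adjusted

if __name__ == "__main__":
    seq, tags = strip_pads("AC*GT**A", [(1, 4), (5, 2)])
    print(seq, tags)
```

An annotation that covered only pads ends up with length zero in this sketch; how such annotations should be handled is left open here.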
Minor updates
-
The Editor Highlight Disagreements mode can now also highlight by
foreground colour and can be switched into a case insensitive mode.
-
The Contig Editor can now adjust individual confidence values and
allows control over the default confidence values used for replaced
or inserted bases.
-
Improved support for compressed files in the Trace Display. Files no
longer need to be renamed after compression as Gap4 will
automatically search for .gz, .bz, .Z and .z extensions.
-
It is now easier to assemble, disassemble or extract a single reading
as this can now be done without creating a file of filenames or a
list.
-
Contig Editor discrepancies search now works in conjunction with
auto-display traces.
-
New View menu in Restriction Enzyme plot for easier access to the
textual output functions.
-
A sequence base confidence of 100 now forces the consensus base type
to be that sequence base type. Two conflicting bases of confidence 100
will set the consensus to '-'.
-
Check Database is no longer run when opening a database in read-only
mode on the command line. Two new command line switches, -check and
-nocheck, also control this behaviour.
-
Sped up Find Repeats and Auto Assemble.
-
The Join Contig command can now be used in read-only mode to view
contig joins.
-
Renamed the Editor "verify &" and "Verify |" searches to "Evidence for
Edits (1)" and "Evidence for Edits (2)".
-
Added a popup menu (containing complement, edit, etc) to the consensus
ruler in the Quality, Restriction Enzyme and Stop Codon plots.
-
Directed assembly now reallocates space for the consensus on the fly,
removing the need to set maxseq beforehand.
-
The contig selector now displays the number of readings in a contig in
its information line.
-
Added a TRACE_DISPLAY.DIRECTION definition which can be either top or
bottom (the default). Controls where new traces are displayed.
-
Editor "Next Problem" search is now "Next Search" and can be used with
any type of search.
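The compressed trace file lookup mentioned in the list above can be pictured with the following Python sketch. Only the four extensions come from these notes; the function name, the search order and the injectable exists argument are assumptions made for illustration.

```python
import os

# Extensions listed in the release notes; the order tried is an assumption.
COMPRESSED_EXTENSIONS = (".gz", ".bz", ".Z", ".z")

def find_trace_file(path, exists=os.path.exists):
    """Return path, or the first compressed variant that exists, else None."""
    if exists(path):
        return path
    for ext in COMPRESSED_EXTENSIONS:
        candidate = path + ext
        if exists(candidate):
            return candidate
    return None

if __name__ == "__main__":
    # Hypothetical file name, used only to show the call.
    print(find_trace_file("xb54a3.s1SCF"))
```

Passing exists as a parameter keeps the lookup easy to exercise without touching the file system.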
Bugs fixed
-
Ran "purify" and fixed the detected memory leaks or bad memory
accesses.
-
The Contig Editor sometimes would corrupt the annotation lists
resulting in 'annotation neither used or free' type messages. This
could result in corrupted databases.
-
Fixed the "Locking an inuse record" error.
-
BUSY files were incorrectly handled when opening several databases in
succession without exiting the program.
-
Check assembly incorrectly calculated the percentage mismatch.
-
Colours on systems using CDE should now be correct.
-
Better error recovery with dialogues that enter the 'busy mode' -
especially for Find Internal Joins.
-
Joining contigs with the multi-contig template display running was
crashing.
-
Search by annotation (in the Contig Editor) crashed on Solaris.
-
Saving tags from Find Repeats often crashed, or saved no tags.
-
Over-enthusiastic clicking to bring up traces sometimes resulted
in a crash.
-
Occasional contig ordering crash fixed.
-
Tags containing comments with very long lines were truncated when
written to an Experiment File.
-
The manual Join Contigs command now brings up the displays at the
requested reading names rather than the contig starts.
-
The old consensus algorithm dealt with dashes on the reverse strand
incorrectly when in 'compare strands' mode.
-
The stop codon plot now takes note of the start and end ranges in the
dialogue.
-
Improved the Trace Display code; better resize handling and faster.
-
Editor search 'by file' mode now handles reading numbers as well as
names.
-
Finding names in the restriction enzyme map by moving the cursor over
the cut line now works correctly.
-
The restriction enzyme dialogue box would "hang" if the contig
identifier was deleted and the mouse left the contig identifier
entry box.
-
If the quality plot or stop codon plot windows were not positioned
quickly enough, the scaling of the results would go wrong.
Sip4 Changes
Main changes
-
Sped up "find matching words" and especially the case of comparing
a sequence against itself.
-
Sequences can be sent to Sip4 from Gap4 and Nip4.
-
Added cursors to the sip plot and sequence display.
-
Added the ability to select the sequence and sequence range in each of
the comparison routine dialogue boxes. The sequence range is
remembered for the next use of that sequence.
-
The sip plot now contains the features of the nip plot, i.e. the ability
to move plots around, and double clicking on the raster with the
middle mouse button will invoke a sequence display.
-
Added the ability to create a new sequence over a specific range in the
sequence manager.
Minor updates
-
If a sequence is entered which has the same name as one already loaded
its name is changed by the addition of '#number' where 'number' is a
unique identifier.
-
Previously Sip4 insisted that at least 2 sequences were loaded but
now if only 1 sequence is loaded, the comparison functions will compare
this sequence against itself.
-
Sip4 now saves the results from "find best diagonals" and
"find matching words" and treats them the same as the other
comparison functions. Previously the results were only drawn to the
sip plot and any changes to the plot, such as zooming, would destroy
the results.
-
Print out the amino acid composition if a protein sequence is entered.
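The renaming of duplicate sequence names described in the list above could look something like the following Python sketch. The '#number' suffix comes from these notes; how the unique number is chosen is not stated, so the counter used here is an assumption, as is the function name.

```python
def unique_name(name, loaded):
    """Return name, or name + '#number' if name is already in loaded."""
    if name not in loaded:
        return name
    n = 1                      # assumption: numbering starts at 1
    while f"{name}#{n}" in loaded:
        n += 1
    return f"{name}#{n}"

if __name__ == "__main__":
    loaded = {"hsproperd", "hsproperd#1"}
    print(unique_name("hsproperd", loaded))
```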
Nip4 Changes
Main changes
-
Added the ability to select the sequence as well as the sequence range
in each of the function dialogue boxes. The sequence range is remembered
for the next use of that sequence.
-
Added a function to search for start codons.
-
Added a function to search for open reading frames and save as
a feature table or save the amino acids in fasta file format.
-
Added a function to change the genetic code used internally within
the program.
-
Added a function to count the dinucleotide frequencies.
-
Added ability to create a new sequence over a specific range in the
sequence manager.
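As an illustration of the dinucleotide frequency count listed above, here is a short Python sketch. It assumes overlapping pairs are counted along a single strand and ignores case; whether Nip4 counts in exactly this way is not stated in these notes, and the function names are hypothetical.

```python
from collections import Counter

def dinucleotide_counts(seq):
    """Count overlapping dinucleotides along the sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))

def dinucleotide_frequencies(seq):
    """Convert the counts to fractions of all dinucleotides observed."""
    counts = dinucleotide_counts(seq)
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()} if total else {}

if __name__ == "__main__":
    print(dinucleotide_counts("ACGCG"))
```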
Minor updates
-
If a sequence is entered which has the same name as one already loaded
its name is changed by the addition of '#number' where 'number' is a
unique identifier.
-
Added a results menu to the restriction enzyme plot, and added the
"Output enzyme by enzyme" and "Output ordered on position" commands.
-
Start and stop codons are now plotted automatically onto an
existing gene search plot.
-
Added the "busy" mechanism.
Bugs fixed
-
Fixed several scaling bugs which occurred when moving plots around.
-
Saving to a file from the sequence manager now deals with ranges
correctly.
-
If all the results have been removed from a plot, it will be closed.
-
String searching on the reverse strand now works correctly.
Trev
Minor features
-
Trace colours are now configurable in .tk_utilsrc.
-
The font adjustment mechanism is now consistent with the other
programs.
Bug fixes
-
Shutting down using the window manager instead of the Exit command now
terminates the process correctly.
-
Saving an ABI or ALF file to an Experiment File was giving invalid
files. Saving from SCF worked.
Others
-
Added a -normalise option to makeSCF to remove trace background and to
scale the peak heights.
-
Extract_seq -good_only now gives the correct region.
-
Improvements in reading ALF files (from any program). If the "ALF
Processed data" cannot be found the "ALF Raw data" is read instead.
-
Pregap now adds (if known) chemistry information during the Augment
stage.
-
The vector clipping program now has a fail-safe (when "-m -1" is used):
if no match is found, the left sequencing vector cutoff is set to the
distance between the primer and cut sites. Used in pregap.
-
Added ability to search for word* and *word* on entrynames in the
sequence library searching dialogue.