Version 1999.0 Release Notes
Note: A patch for the 1999.0 version was released on 17th May 1999.
This contains several bug fixes. Please see the
end of this document for more details.
Summary
This release includes a complete rewrite of the pregap program, now known as
pregap4. It now has both a graphical interface and a batch mode. Pregap4
is used for preparing sequences for assembly, and includes all the old pregap
tasks (trace file conversion, vector clipping, quality clipping, etc.) and also
interfaces to new modules, such as E. coli screening, repeat masking, and
batch-mode assembly (via programs such as phrap, fakII, cap2, cap3 and gap4).
Several other programs have been updated during the development of pregap4,
including vector_clip (which has more sensitive searching capabilities) and
screen_seq (a new program).
The Gap4 program has had several major additions. The most significant are an
increase of the maximum single sequence fragment length from 4096 bases to
30000 bases; a "notes" data type to store notebook style comments attached to
sequences, contigs, or just the database as a whole; an interface to cap3;
a better trace display showing any number of traces and with an improved
'lock' mode; and numerous other additions. See the text below for full
details.
Programs using the sequence library browser (Nip4, Sip4, and Slim) can now
optionally read SRS indices. This allows for integration with a local SRS
setup and therefore extends the library searching capabilities to any SRS
search. This also means that these programs can read more file formats than
before. At present, accessing SRS results over the WWW is not possible with
these programs (although they can read local files, and so saving a WWW SRS
search result to disk will work).
A new version of vector_clip is included. It uses several new methods to
improve accuracy and, in some cases, speed. For the 3' sequencing vector it now
uses dynamic programming (it previously used hashing). The cloning vector clip
now calculates the expected score for a given probability for each length of
diagonal. Vector_clip also includes a new way to define the sequencing vector
and primers: it can search multiple vector/primer pairs against each sequence,
reading this data from the vector/primer pair file rather than from the lines
in the Experiment files. The Experiment file is updated to record which
vector/primer pair gave the best (and acceptable) match.
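The notes do not say which statistical model vector_clip uses. As a rough illustration of what an expected score "for a given probability for each length of diagonal" can look like, the sketch below assumes a simple binomial model: on a random diagonal of length L, each position matches independently with probability p (0.25 for uniform random DNA). It reports the smallest number of matches that a random diagonal would reach with probability at most alpha. The function name `min_matches` and the model itself are assumptions made for this example, not vector_clip's code.

```shell
# Sketch only: smallest number of matching bases k on a diagonal of
# length L such that a random diagonal scores >= k with probability
# <= alpha. ASSUMPTION: positions match independently with probability
# p (binomial model); this is illustrative, not vector_clip's method.
# The function name is hypothetical.
#
# Usage: min_matches L p alpha
min_matches() {
    awk -v L="$1" -v p="$2" -v a="$3" 'BEGIN {
        # Log of the binomial pmf, built iteratively from
        # pmf(k) = pmf(k-1) * (L-k+1)/k * p/(1-p), so long diagonals
        # do not overflow the binomial coefficient.
        lp[0] = L * log(1 - p)
        for (k = 1; k <= L; k++)
            lp[k] = lp[k-1] + log((L - k + 1) / k) + log(p / (1 - p))
        # Walk down from k = L, accumulating the upper tail P(X >= k)
        # and keeping the lowest k whose tail is still <= alpha.
        best = L + 1          # L + 1 means no score is significant
        tail = 0
        for (k = L; k >= 0; k--) {
            tail += exp(lp[k])
            if (tail > a) break
            best = k
        }
        print best
    }'
}
```

For example, `min_matches 10 0.25 0.01` prints 7: under this model a random 10-base diagonal has roughly a 0.35% chance of 7 or more matches but about a 2% chance of 6 or more. Because the threshold depends on L, it has to be calculated separately for each diagonal length; working in log space keeps the calculation stable for long diagonals.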
Pregap4
This is a completely new program designed to replace pregap. For
documentation, see "Pregap4 - Table of Contents".
Screen_seq
A new program for filtering out readings that contain the sequences of
contaminants such as E. coli. Batches of readings can be compared to sets
of possible contaminant sequences of any length. The search is very fast.
Pregap4 includes an interface to screen_seq.
Gap4
Main changes
Minor updates
- The Find Internal Joins dialogue now has an extra option called
"minimum diagonal score". This may need adjusting when looking at very
weak matches.
- Added a pull-down menu to the editor names display, offering "remove
reading" and "list notes" commands.
- The unpadded base position can be found in the contig editor by
pressing the middle mouse button on a base. The editor attempts to
cache the consensus sequence when possible, so this operation may be
slow the first time but should be faster thereafter.
- The main text output window now has a "Bell" button to turn off the
ringing of the bell whenever errors are displayed (such as failed
assemblies).
- Added a Control-Q binding to the editor for rapid toggling of the tag
display. Pressing it turns off the displaying of all tag types in the
editor, allowing the quality codes underneath them to be seen.
Pressing Control-Q again reveals the tags once more.
- Clicking on a reading or consensus base in the contig editor now
displays the confidence and probability values for this base in the
editor information line. This is also updated after editor searches.
- Removed the "maximum sequence length" limitation (500,000 bases) from
the stop codon and restriction enzyme plots.
- The "Difference" clipping method now uses a simple quality clip on the
first and last readings to tidy up the low quality contig edges.
- The addition of DIFF tags from Difference Clip is now optional (and by
default is off).
- The editor search dialogue remembers the last value used for each
search (within a single run of the program). Defaults may also be
specified individually for each search type.
- Copy Database in garbage collection mode (ie the copy_db utility) now
has more error checking and auto-fixing code.
- Improved error handling. Unix signals are trapped and added to the log
files. Upon crashes, we also output the program stack trace on Solaris
machines, and a stack hex dump on Alphas.
- Large speedups to the Template Display when dealing with templates
containing many readings.
- A new gaprc option allows the Quality Plot to be shown automatically
when the Template Display is first opened. Because of speed and display
size constraints, this only works when showing a single contig.
- New gaprc options allow for the editor status lines to be displayed by
default.
- The editor manual "align" command now inserts pads with a confidence
automatically calculated from their surrounding bases. Previously
these were added with 100% confidence.
- The Gap4 FakII interface has now been tested to work with the latest
FakII release (4.2).
- The default confidence for sequences with no confidence values is now
2 (was 99). This corrects problems when mixing phred base calls with
other base calls. Also confidence value 0 now really means "ignore",
rather than just "probably not this base".
- The BUSY file now contains the machine name and process id. This is
useful for testing whether the database is really busy (eg after a
possible crash).
- The "Sort Matches" command for the Find Internal Joins, Find Oligos,
Find Repeats and Check Assembly plots is now applied automatically.
This means that the Contig Comparator "Next" button will also start on
the most significant match.
- The differences line in the join editor no longer displays "!"s in
the regions beyond the ends of contigs.
- The contig editor menus are now specified in the
$STADENROOT/tables/gaprc file. This allows for adding new commands to
the editor, or removing ones which you never use.
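Assuming Gap4 confidence values use the phred convention, Q = -10 log10(P_error), the change of default confidence from 99 to 2 is easiest to see as an error probability. The helper below is a hypothetical one-liner for illustration, not part of the package.

```shell
# Sketch: convert a confidence value to the probability that the base
# call is wrong. ASSUMPTION: confidence values are on the phred scale,
# Q = -10 * log10(P_error), so P_error = 10^(-Q/10).
# The function name is hypothetical.
#
# Usage: conf_to_perror Q
conf_to_perror() {
    # %.3g prints three significant figures, switching to exponent
    # notation for very small probabilities.
    awk -v q="$1" 'BEGIN { printf "%.3g\n", 10 ^ (-q / 10) }'
}
```

Under that assumption, `conf_to_perror 99` prints 1.26e-10, a claim of near certainty, while `conf_to_perror 2` prints 0.631. This is consistent with the note that the old default caused problems when phred and non-phred base calls were mixed. A value of 0 gives 1, meaning no information, which fits its new meaning of "ignore".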
Bug Fixes
- Assembling twice in a row using the output file from the previous
assembly caused a crash.
- Saving tag changes from the contig editor would sometimes corrupt the
tag list.
- Plotting restriction enzymes on a contig range (instead of the whole
contig) was plotting in the wrong place.
- Improved protection from quitting editors before they've finished
starting up. This fixes problems with excessive clicking on the
contig comparator "Next" button.
- Editor consensus quality and discrepancy searches now round to the
nearest integer value, rather than downwards to the previous integer
value. Hence the searches now agree with the list confidence output.
- The highlight disagreements mode of the editor with case insensitivity
enabled was still case sensitive when viewing in "by dots" mode.
- Improved robustness of Extract Readings (only really an issue when
salvaging corrupted databases).
- It was possible to (attempt to) adjust the confidence value for the
consensus sequence. Now blocked.
- Fixed complementing of IUB codes: Y was not being complemented to R.
- The contig editor Highlight disagreements by colour works better when
cutoff data is shown.
- Fixed a problem with consensus tag comments being truncated to 40
characters when splitting or joining contigs.
- Shuffle pads, when used with the confidence consensus algorithm, would
sometimes erroneously delete columns that were not 100% pads (although
they still had to be mostly pads).
- Adjusting the colour schemes now updates the trace and editor
displays.
- The repeat search option when given a single segment of a contig could
report matches in the wrong position.
- Consensus tags are now visible when the consensus quality is shown.
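The IUB complementing fix matters because complementing has to cover the ambiguity codes as well as A, C, G and T: R (A/G) pairs with Y (C/T), K (G/T) with M (A/C), B with V, and D with H, while S, W and N complement to themselves. A minimal sketch using tr, with hypothetical function names (not Staden Package code):

```shell
# Sketch: complement and reverse-complement DNA strings containing IUB
# ambiguity codes. Each code maps to its partner (A<->T, C<->G, R<->Y,
# K<->M, B<->V, D<->H); S, W and N are self-complementary and are not
# listed, so tr leaves them unchanged. Function names are hypothetical.
complement_iub() {
    printf '%s\n' "$1" |
        tr 'ACGTRYKMBVDHacgtrykmbvdh' 'TGCAYRMKVBHDtgcayrmkvbhd'
}

# Reverse-complement: complement each base, then reverse the string.
revcomp_iub() {
    complement_iub "$1" |
        awk '{ s = ""; for (i = length($0); i > 0; i--) s = s substr($0, i, 1); print s }'
}
```

For example, `revcomp_iub AYR` prints YRT. The string is reversed with awk rather than rev(1) so the sketch only needs POSIX tools.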
Nip4
Main changes
- (Also applies to Sip4).
Nip4 can now make use of local SRS indices (although at present it
cannot make use of remote SRS servers). This therefore will allow the
databases to be in any format that SRS can parse, including GCG.
The options menu allows choice of "srs" and "staden" local formats.
- (Also applies to Sip4).
The dialogue for reading in sequences has been split into several
dialogues, allowing for Simple database fetch, Complex database search
and fetch, and personal files.
Minor updates
- Removed the "maximum sequence length" limitation (500,000 bases) from
the restriction enzyme plot.
- (Also applies to Sip4).
Added a busy cursor when searching for hits and entrynames.
- Added the ability to translate over a specific region.
- (Also applies to Sip4).
Added a rotate option to both the sequence manager and the main menus.
- (Also applies to Sip4).
Commands from the sequence manager popup menu are now also available
from the main menus.
Bug Fixes
- Calculating the codon content table for the reverse strand did not
work correctly.
- Added protection against loading protein sequences.
- (Also applies to Sip4).
The scramble function didn't create a unique name if the same sequence
was scrambled more than once.
- (Also applies to Sip4).
Fixed a problem with using sequence translate and rotate after
sequences had been deleted.
- Dragging plots with the middle mouse button sometimes produced Tk
errors.
- (Also applies to Sip4).
Fetching sequences from the EMBL library with names that are prefixes
of other names (eg SC086, prefix of SC0863) sometimes fetched the
wrong sequence.
- Controlling the sequence display from the restriction enzyme map
sometimes failed.
Sip4
Main changes
(See also the Nip4 changes)
- Added a local alignment algorithm, based on Huang's SIM.
Minor updates
- Sped up the "Find best diagonals" search.
Bug Fixes
- Fixed a minor problem with the naming of personal files. Reading in a
file named 'seq' would internally name the sequence as "seq]".
- Setting sequence ranges now operates on the correct sequence (it was
picking the first loaded sequence).
- The graphical plot cursors now update when moving the sequence display
cursor using the ">>" buttons.
- Creating plot number 10 (and onwards) was giving Tcl errors.
Trev
Main changes
- Trev now handles multiple files when started with (for example)
"trev *.scf". It adds "Next", "Prev" and "Goto" buttons to allow quick
switching from file to file.
- Postscript output of traces, including multiple pages if required.
- An "Undo Clipping" command may be used for undoing any clip
adjustments. (This does not work for sequence edits.)
Bug Fixes
- SCF traces containing no sequence data (only traces) can now be
viewed.
- Fixed handling of "Cancel" from the Save window.
- Fixed a bug with the WM_DELETE_WINDOW handler (clicking on the 'x' on
the window manager, or using the window manager to select "delete
window").
- Adjusting the colour schemes now updates the trace display.
Vector_clip
Main changes
- Improved the speed of cloning vector searches.
- We now handle data with large sections of rubbish at the 5' end before
the primer. (Tested on 100 "bad" sequences with ~60 bases of 5' junk
and the 100 "testpackage" sequences with 100% success rate.)
- A new "probability mode" allows for finding the most probable matches.
This solves problems when small matches in the corner of a dot-plot
matrix were considered better than weak, but long, matches elsewhere.
- Vector clip can now scan each sequence against a set of vectors and
primers. These are listed in a single file.
Bug Fixes
- Fixed a vector_clip "out by one" error when finding the right hand
sequencing vector clip point.
- Primer walks are now treated correctly - we do not attempt to find any
vector at the 5' end.
Misc
- Easier zooming mechanism on many plots. We now have "+10%" and "+50%"
zooming buttons. The old mechanism of Control right mouse button still
works.
- The get_scf_field program can now read compressed SCF files.
- Io_lib change to allow programs to read (for example) just the
sequence from a trace file. This gives huge speed ups for programs
such as init_exp.
- Further speedups to the file browser. It is now approximately twice as
fast to start up.
- The GCG file format has changed again. Nip4 and Sip4 now support the
newer format.
- Io_lib now enforces IUB codes when converting from any trace file
format to the Experiment File format.
- Extract_seq now copes with sequences containing no SQ line, instead of
just crashing.
- Reading ABI files (and conversion of them) now also reads the Run
Date, Run time, Comments, Lane, Matrix File, and base caller (SPAC 2)
attributes. These are added to the SCF comments and can be viewed from
Trev and Gap4.
- Increased the maximum file name length in repe from 80 to 1024.
- The restriction enzyme map now has a configurable font, thus making
it possible to configure how many enzymes can be displayed in a plot
without needing the scrollbar.
- When loading an Experiment File referencing a trace file via LN/LT
lines we now first search for the trace in the local directory
containing the Experiment file.
- Trace file accessing now also auto-recognises bzip2 (in addition to
the earlier bzip program).
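Recognising a compressed trace file automatically usually means checking its first few bytes (its "magic number") rather than trusting the file name. The sketch below shows the idea for bzip2, with the well-known gzip and compress signatures for comparison. It is a hypothetical shell helper, not io_lib's implementation, and it omits the older bzip format because its signature is not given here.

```shell
# Sketch: identify how a file is compressed from its leading bytes.
# The function name is hypothetical; io_lib's own detection may differ.
# Signatures checked:
#   gzip     1f 8b
#   compress 1f 9d        (.Z files)
#   bzip2    42 5a 68     ("BZh")
compression_of() {
    # od -An -tx1 -N3 prints the first three bytes as hex; tr strips
    # the spacing so the result can be matched as a single string.
    magic=$(od -An -tx1 -N3 "$1" | tr -d ' \n')
    case $magic in
        1f8b*)  echo gzip ;;
        1f9d*)  echo compress ;;
        425a68) echo bzip2 ;;
        *)      echo none ;;
    esac
}
```

Only the leading bytes are read, so the check stays cheap even for large trace files.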
1999.0p1 Patch
This is an update to the 1999.0 release. Attempting to add this update to
earlier releases will cause problems. The update fixes several problems
related directly to the 1999.0 release, but also some bugs which existed in
earlier package releases.
The patch may be downloaded by anonymous ftp from
ftp://ftp.mrc-lmb.cam.ac.uk/pub/staden/patches/1999.0/1999.0p1.tar.gz.
This will download the 1999.0p1.tar.gz file which should be placed in
the staden home directory ($STADENROOT).
To install the update, follow these steps:
- Change to the Staden Package home directory.
    cd $STADENROOT
- If not already done, extract the 1999.0p1.tar.gz file in this directory.
    gzip -cd 1999.0p1.tar.gz | tar xvf -
- Execute the install_update script with the update.1999.0p1.tar file as an
argument:
    ./install_update update.1999.0p1.tar
- If required, search for files ending in ".1999.0p0". These are files which
the install_update script has detected as being locally modified since the
main 1999.0 release. Any local modifications will need to be manually applied
to the updated copies.
    find . -name "*.1999.0p0" -print
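Re-applying local modifications is easier with the differences in front of you. The loop below is a sketch that assumes the updated copy of each file keeps the original name, that is, the saved name minus the ".1999.0p0" suffix; the function name is hypothetical.

```shell
# Sketch: after running install_update, show how each saved, locally
# modified file (suffix ".1999.0p0") differs from the updated copy.
# ASSUMPTION: the updated copy has the same name without the suffix.
# The function name is hypothetical.
#
# Usage: review_local_mods [directory]   (defaults to the current directory)
review_local_mods() {
    find "${1:-.}" -name '*.1999.0p0' -print |
    while IFS= read -r saved; do
        updated=${saved%.1999.0p0}
        echo "=== $updated"
        # diff exits 1 when the files differ, which is expected here.
        diff -u "$saved" "$updated" || true
    done
}
```

The output also contains the changes made by the patch itself, so treat it as a guide for manual review rather than as a patch to apply.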
Bugs fixed
New features/Changes
- Gap4: The trace display can now display multiple columns of traces.
This is not yet documented, but it should be self evident. The "save
layout" button will store the current row and column settings to be
used in future Gap4 sessions. Traces may now also be brought up by
double-clicking the left mouse button.
- Gap4: We now complement the smaller contig when displaying join
editors, contig editors or template displays from the contig
comparator plots.
- Gap4: Due to disabling the consensus caching (a bug fix - see above),
the unpadded base position report in the contig editor is now slower.
To reduce problems caused by this, the binding to report this
information has been moved from the middle mouse button to the Enter
key.
- Repe: The maximum reading length has been increased from 2000 to 4096.