Version-1996.2 Release Notes
This release sees many bug fixes, but few truly new features. The most
noticeable gap4 changes are the considerable speed improvements and two
new assembly engines (forthcoming). Other additions include a new text-based
contig selector to supplement the existing graphical display, an
improved check assembly function and the removal of a few size limitations.
The new assembly engines are Xiaoqiu Huang's CAP2 and Gene Myers' FAKII
programs. For both of these we have written the first versions of the gap4
interface. However, we are not distributing either the CAP2 or FAKII binaries,
and hence the Gap4 options are currently "greyed out". Both assembly systems
have been modified to support Experiment Files. We hope to make a cap2 binary
available ourselves soon. FAKII will be obtainable directly from its
authors; however, our modified version is not yet available.
Announcements for both will appear on the
bionet.software.staden newsgroup
when we are ready.
The second new assembly engine is Gene Myers' FAKII program. Once again, the
Gap4 section is complete (first draft). We have made changes to the FAKII
source code to output Experiment File format. At some (unspecified) later date
the modified FAKII program will be available direct from Gene Myers.
For sip4 the most notable improvements, bug fixes excluded, are the ability to
display alignments in a graphical fashion, the ability to now use personal
sequence files of any length and to search them as libraries (entryname only),
and a new exact match search in the sequence browser.
The trial versions of the programs that examine traces to determine where to
clip readings based on their quality (trace_clip and scale_trace_clip) have
been substantially altered.
Finally, several new programs are available. These are:
copy_db | Copies and garbage collects gap4 databases.
extract_seq | Extracts the sequence component from trace files or experiment files.
getABIstring | Displays arbitrary string fields from an ABI trace file.
getABIcomment | Displays the comments from an ABI trace file. Equivalent to getABIstring CMNT.
getABIdate | Displays the run date from an ABI trace file.
get_scf_field | Extracts data from the SCF comment section.
scf_info | Displays details stored in the header of an SCF file.
scf_dump | Displays the entire SCF file contents in a human readable format.
scf_update | Converts between SCF file versions (2 to 3 and vice versa).
toad | Relabels lanes in SCF traces.
Several programs have been removed. These are:
bap, cop, cop-bap, dap, sapf, sipl,
splitp1, splitp2, xbap, xdap and xsap.
The lookup command has been renamed to pregap_lookup.
Here follows a detailed, but not entirely complete, list of changes. The more
important ones are displayed in bold text.
Speed and memory improvements
-
Opening databases is considerably faster. The new code has a better
algorithmic complexity, so the speed-up is more significant for larger
databases. (gap/xgap too)
-
Reduced memory overhead within the IO component of the programs
when opening a database by ~25% to ~40%. (gap/xgap too)
-
General IO throughput speed ups and tidy up - approx. 5 - 10%
increase. (gap/xgap too)
-
Gap4 specific IO speedups gained by delaying data flushes, use of
data caching, and various other changes. Speedups vary from 17% faster
for assembly to over 900% for enter tags. Disk I/O should also be
reduced. See News: Gap4 Speed Improvements for a detailed
summary.
-
Suggest Probes is now around 60% faster.
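The delayed flushes and data caching mentioned in the Gap4 IO item above can be illustrated with a generic write-back cache. This is a minimal Python sketch and not gap4's actual code: the class name WriteBackCache, its methods, and the dict standing in for the database file are all hypothetical.

```python
class WriteBackCache:
    """Generic write-back cache: writes stay in memory until flush()."""

    def __init__(self, store):
        self.store = store        # dict-like stand-in for the database file
        self.cache = {}           # records currently held in memory
        self.dirty = set()        # keys changed since the last flush
        self.store_writes = 0     # number of writes that reached the store

    def read(self, key):
        # Serve repeated reads from memory instead of the backing store.
        if key not in self.cache:
            self.cache[key] = self.store[key]
        return self.cache[key]

    def write(self, key, value):
        # Record the change in memory only; the store is untouched for now.
        self.cache[key] = value
        self.dirty.add(key)

    def flush(self):
        # Push each changed record to the store once, however many
        # times it was rewritten since the previous flush.
        for key in list(self.dirty):
            self.store[key] = self.cache[key]
            self.store_writes += 1
        self.dirty.clear()
```

With this arrangement, rewriting the same record a hundred times before a flush costs a single write to the backing store, which is the general reason delayed flushing reduces disk I/O.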
Contig Editor
-
Attempting to view an experiment file as a trace in gap4 no longer
crashes.
-
Possible memory corruption fixed in editor when resizing the editor to
a large (over double the original size) width.
-
The editor now quits automatically when the same contig is about to be
edited by another means and the editor has not yet been used to make changes.
-
Stop codons in the translation status lines are now shown in red.
-
Improved dialogue for 'join contigs'. It's now possible to easily
select both contigs using the contig selector by clicking on the
appropriate box in the join contigs dialogue.
-
The "Verify Or" search missed certain cases. Fixed.
-
Traces can now be displayed with line widths greater than 1.
-
Complemented traces now also have their numbers complemented.
-
The tag editor sometimes had problems displaying the comments (non
NULL terminated). Fixed.
-
Now harder (impossible?) to create tags off the ends of readings.
-
Searching for tags missed those that overlap the cutoff data at
the left end when reveal cutoffs was not set. Fixed.
-
The temporary tags created on the consensus by the Select Primer
routine caused in-memory tag list confusion. Either the editor would
crash later in routines such as align, or another consensus tag would
be completely lost.
-
Join highlighting of reading names/lines between the editor and the
template displays was causing harmless warnings when highlighting the
consensus line.
-
Fixed odd font issue under Solaris.
-
Undoing cutoff adjustments in the join editor now maintains lock
position properly.
Misc
-
Removed beta status.
-
Added Huang's CAP2 assembly function. The cap2 binary isn't available
yet though.
-
Joins made during assembly would sometimes corrupt the contig order.
Now fixed. (NB: to recover from contig order corruptions simply select
"Reset contig order" from the commands menu of Doctor Database.)
-
Fixed bug with the editor cursor as displayed in the template display.
It had problems with negative cursor positions and didn't always draw
itself until the editor was used to move the cursor.
-
We can now double middle click on the template display (including the
quality plot) to bring up a contig editor at a specific point or to
move the existing editor to the new point.
-
The oligo selection dialogue in both the manual (editor) and automatic
modes has extra parameters available.
-
Removed the limitation of 100,000 records per database (also fixed for
gap/xgap).
-
Removed tags and disassembled readings are now reclaimed in the
database properly for subsequent usage.
-
Default Unix permissions for gap databases now follow
whatever the user's umask value requests. Previously we blocked
read-write access for group and others even when umask stated
otherwise.
-
Directed assembly no longer places complemented sequences without 3'
poor quality data 1 base out of position.
-
Fixed 'database open' detection code. It crashed when we closed a
database by using 'new' (and cancel it) followed by 'open'. We also
now close any current database before using the file browser (was
after) to fix busy file confusion.
-
Fixed an uninitialised variable in "assemble all readings in one
contig" which caused the routine to get stuck on SGI machines.
-
Changed the alignment penalties to 1 (gap open) and 8 (extend).
-
Directed assembly improvements and crash fixes when dealing with poor
alignments, "assembles beyond tolerance" cases.
-
Check assembly now uses a window to slide along the used data when
checking for mismatch percentages. This means that long, good
matching, readings with diverging sequences at one end are easily
detectable.
-
Disassemble readings and directed assembly both crashed when the
database was left blank after the action (either all readings removed, or
none assembled). Fixed.
-
The output window can now display the parameters used to produce
certain types of output (right button).
-
Fixed an uninitialised variable that in rare cases caused the confidence
values not to be read from SCF files during assembly.
-
The Suggest Probes function now has a user interface to select which
oligos to write to a file and tags to create.
-
Alignment output when joining two contigs now displays the percentage
mismatch instead of a score.
-
Automatic scaling of the stop codon and restriction enzyme plots now
occurs when specifying the ranges in the contig to display.
-
Contigs outputted by calculate consensus (and other functions) are now
generated in the order shown in the contig selector.
-
Added a new form of contig selector that allows selection from a
textual list rather than the existing graphical plot.
-
Enter Preassembled data was incorrectly using the vector information
for determining 'good' regions of sequence. In rare cases these may be
different from the QL/QR lines and hence used to cause misalignments
after assembly.
-
Removed the limitation on the maximum size of tags (was 1024 letters).
-
Extract readings no longer outputs zero length tags. These don't exist
under normal circumstances anyway.
-
Contigs of length 1 base no longer crash the template display or
the contig selector.
-
Removed problems when attempting to 'move' a contig in the contig
selector when it's the only one present.
-
Annotations outputted using Calculate Consensus experiment file format
had the "all annotations" and "annotations except in hidden" switched.
-
Extract readings can now handle reading numbers as well as names in
the input file/list.
-
Enter tags would happily modify data that an editor is using (without
complaining), didn't update the contig selector window, and would also
crash when tags were entered for unknown reading names.
-
Various minor fixes to the restriction enzymes dialogue, including
fixes for badly formatted files (these used to crash gap4) and better
updating of the selection list when using the file browser.
-
Removed memory corruption in OSP (primer searching) code. Symptoms
were typically hangs in OSP or crashes when quitting a contig editor.
-
Force dumping of core when receiving a fatal database error. This is
deliberate to aid debugging.
-
Assembly now handles the distinction between filenames and reading
names correctly. Previously, when checking if a reading was already
entered, it used the filename to compare against the list of currently
entered readings.
-
Assembly into one contig now works on non-blank databases.
-
Suggest Probes on extremely short contigs used to produce incorrect
results.
-
Added a 'remove all' button to the 2D sip plot.
-
Several minor cross-hair and cursor bugs fixed for the sip plot.
-
Listing alignments with sequence names greater than 15 characters now
truncates the names rather than ruin the output.
-
Can now read in experiment files written out by the Gap/Gap4
'output consensus' option.
-
Now supports multi-sequence 'staden format' files (as produced by
gap4).
-
New function to interconvert T and U in sequences.
-
Fixed memory corruption in Find Similar Spans.
-
Align sequences now plots the alignment in the sip plot.
-
Removed limitation of sequence length on personal files (was 100,000
bases).
-
The sequence display window now correctly identifies matching letters
when they differ in case.
-
We can now search personal files (only for entry names) to treat them
as a library. Note that this is case sensitive.
-
Selecting more than 1 entryname for "Accept" in the library browser no
longer causes errors.
-
A new search option of "word" is available in the library browser to
find exact matches.
-
Right mouse button in the sip plot used to cause errors.
-
Sequences written to disk are now folded into lines of 60 characters
or less.
-
User now has control over match and mismatch scores for alignments.
-
The entry names associated with a particular result could become
switched when an error occurred during an 'accept'.
-
When comparing dna versus protein we now make sure that the horizontal
and vertical sequences selected are identical to the ones before the
comparison.
-
Improved error handling and fixed user-input buffer overflows in various
places.
-
Find similar spans no longer crashes with extremely short sequences.
-
"Hide All" in the sip plot now always works correctly. Added a Reveal
All.
-
Comparing DNA against protein using Find Best Diagonals now plots all
three translations (it used to only plot one).
-
Improved error checking when all input files fail.
-
Now gracefully handles cases where sample names contain the
slash (/) character. They are replaced by underscores.
-
SCF files with no "NAME=" comment now produce correct experiment file
filenames.
-
Now blocks against attempting to edit sequences when the edits line is
not displayed.
-
The default edit mode is now 'right cutoff'.
-
Attempting to edit the sequence or position the cursor when no trace
has been loaded no longer causes errors.
-
Sped up redrawing the cursor in the trace display. This speeds up
the gap4 display too.
-
Added -editscf command line option to allow writing of the SCF format.
Disabled by default as it can be dangerous without appropriate
knowledge.
-
Fixed help.
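The windowed mismatch check described above for Check assembly can be illustrated as follows. This is a minimal Python sketch and not gap4's implementation: the function names, the default window length of 30, and the assumption that the reading and consensus are already aligned to equal length are all hypothetical choices made for the example.

```python
def window_mismatch_percentages(reading, consensus, window=30):
    """Return a list of (start, percent_mismatch) for each window position.

    Both sequences are assumed to be aligned and of equal length
    (an assumption of this sketch).
    """
    if len(reading) != len(consensus):
        raise ValueError("sequences must be aligned to the same length")
    if window < 1:
        raise ValueError("window must be at least 1")
    n = len(reading)
    if n == 0:
        return []
    window = min(window, n)
    # 1 where the bases differ, 0 where they agree.
    mismatches = [int(a != b) for a, b in zip(reading, consensus)]
    count = sum(mismatches[:window])
    results = [(0, 100.0 * count / window)]
    for start in range(1, n - window + 1):
        # Slide by one base: add the base entering, drop the base leaving.
        count += mismatches[start + window - 1] - mismatches[start - 1]
        results.append((start, 100.0 * count / window))
    return results


def worst_window(reading, consensus, window=30):
    """Return the (start, percent_mismatch) of the worst-matching window."""
    return max(window_mismatch_percentages(reading, consensus, window),
               key=lambda item: item[1])
```

For a 100-base reading that matches the consensus except for its last 10 bases, the mismatch over the whole reading is only 10%, while a 10-base window reports 100% at the diverging end. That difference is why a sliding window makes such readings easy to detect.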
Io_lib
-
The extract_seq program is now distributed as a binary (we only
supplied source before). It now also accepts parameters for clipping
good quality data from experiment files.
-
The experiment file output of io_lib (used by trev and gap4 'extract
readings' amongst other things) could write tags incorrectly. The code
now also knows which line types should be written only once and which
could be written many times.
-
Writing to plain sequence format (io_lib) no longer produces an error
message. It worked before, but claimed otherwise.
-
Added back code to update the private data in the SCF file format.
This was 'lost' sometime after io_lib v1.2.
-
Removed a small memory leak in 'deallocating' SCF structures.
-
Experiment file output sometimes produced too much output for ON and
AV lines when they ended exactly on a new line boundary. This affected
the Gap4 extract readings command.
-
Better error recovery where the comments in an SCF file are missing.
This used to crash.
Misc
-
Can now search for text in the output window for both gap4 and sip4.
-
The time output format of the output windows (gap4 and sip4) is now
more similar to the Unix 'date' command - it contains week days as
well as day of month.
-
The 'scale' controls in gap4, sip4, etc, now take up less room on the
screen. They also handle key bindings better. This means that the
larger dialogues, such as Gap4's Find Internal Joins have now shrunk
and should fit better on small screens.
-
New copy database (copy_db) program. This allows copying of single
databases with some database garbage collection, and also allows simple
cases of merging databases.
-
Large revamp of the source building mechanism (Makefiles). General
tidy up of code too.
-
Alfsplit now only uses the name field in the ALF files when it
contains printable characters with no white space or slash characters.
Otherwise the filename is used as the reading name.
-
Sip (NB: not sip4) was crashing due to an uninitialised variable.
-
Added staden2exp conversion script (written in perl) to convert the
old gel reading format into experiment files.
-
Various user interfaces now support ~ and $VAR in the filenames.
-
New versions of the trace_clip and scale_trace_clip programs.
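The ~ and $VAR support in filenames mentioned above corresponds to ordinary shell-style expansion. The package's own implementation is not shown in these notes, so the following is only a Python illustration using the standard library; the helper name expand_filename is hypothetical.

```python
import os


def expand_filename(name):
    """Expand $VAR references and a leading ~ in a filename."""
    # Expand variables first, so a variable whose value begins
    # with ~ is also resolved by expanduser.
    return os.path.expanduser(os.path.expandvars(name))
```

On a Unix system, a name such as $HOME/traces/a.scf or ~/traces/a.scf would resolve to a path under the user's home directory. Undefined variables are left in the name unchanged by os.path.expandvars.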
Version-1996.2.1 Update
This consists of a minor update to 1996.2 containing a few bug fixes:
-
Shuffle pads had a memory overflow that could occasionally corrupt
the data within the editor. This does not directly corrupt the database
permanently, but will do so if the data in the editor is then saved.
-
Tags added to experiment files prior to assembly could be placed
in the wrong location (by a few bases, depending on the number of pads
to the reading). This only affects the normal shotgun assembly mode.
-
The default font for the tk dialogue boxes does not exist on
Solaris systems without the optional SUNWxwoft package installed.
Changing the font from -Adobe-Times-... to -*-Times-... has hopefully
fixed the problem.