Version-1997.1 Release Notes
This is only our second release for 1997, but it contains important
additions to the package and some new features that illustrate future
growth areas in our software. (Note that we anticipate that the time to
the next release will be much shorter, and expect it to contain a proper
gap4 interface to Phil Green's phrap assembly engine (for which we
recently obtained the sources), and an improved version of our mutation
detection program.) Of particular importance, the 1997.1 release
contains a new program, Nip4. Changes to the existing programs consist
mainly of a steady stream of improvements and bug fixes, but there are
also some new functions. The more important updates are listed below,
with the list of bugs fixed at the end of these release notes.
All the newer graphical programs have been updated to use Tcl/Tk version 8.0.
The most obvious change to the user is that the fonts may appear to be
different. However, Gap4, Nip4, Sip4 and Trev also have font and colour
configuration menus which can be used to change the styles permanently for the
current user. Your comments on this configuration interface will help us as we
extend it to other program options.
SunOS 4
We now support the package on SunOS 4.x, Solaris 2.x, Digital Unix 3.0 to
4.0, Irix 5, Irix 6 and Linux. The number of users of every one of these
operating systems except SunOS 4 is increasing. It would help us to reduce
the number of machines we maintain (and the heat they generate) if we could
phase out SunOS 4 in the near future. A version also exists for Solaris x86
(Solaris on Intel PCs); please contact us if you would like us to support it.
Nip4, interprogram communication and request for algorithms
The most important addition in this release is a new program, nip4, which
will replace our old nucleotide sequence analysis program, nip. In its
present form it contains a variety of the analytical functions that would
be expected from a DNA analysis program, but by no means all. In this
first release the functions implemented have been chosen to cover a
reasonable range of the algorithms needed and to enable us to establish
the main components of the user interface. Now that this work is largely
finished, we expect to increase the range of options in the program
rapidly, and we hope users will contact us with their own priorities for
new functions.
The time we have spent on nip4 over the last few months should not be
taken to imply that we think our work on gap4 is any less important than
in the past. On the contrary, we see nip4 as a means of increasing the
usefulness of gap4, for the reasons explained below.
A great deal of sequence has been assembled and edited using gap4 and
its predecessors. These consensus sequences are written out by gap4 and
analysed using external programs. Often there will be queries about the
accuracy of the sequence at particular places where, for example, the
analysis suggests the possibility of a frame shift error. At present it
is quite laborious to then go back into gap4 to check the evidence for
the consensus sequence. Ideally we would like to check the evidence
(i.e. look at the aligned readings and trace data) from within the
analytical program, and this is what can be done from nip4: all the
cursor positions, both graphical and textual, in nip4 and gap4 can be
linked so that they move together. Gap4 can send a consensus to nip4,
which can perform various forms of analysis, and if there is a query
about a section of the sequence, the contig editor and trace display
cursors in gap4 can be moved by cursors in nip4 so that the original
data covering the precise base in question can be checked. We believe
this will save a great deal of time, both during the original analysis,
and later when new results call the sequence into doubt.
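The cursor linking described above amounts to an observer arrangement: each display registers with a shared cursor, and moving the cursor in one display updates all the others. The C sketch below illustrates only that idea. shared_cursor, cursor_register, cursor_move and demo_view are hypothetical names, not the gap4 or nip4 interfaces, and the mechanism the programs actually use to exchange cursor positions is not described here.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LISTENERS 8

/* Called when the shared cursor moves; view is the listener's own data. */
typedef void (*cursor_cb)(void *view, int position);

/* A cursor shared by several displays (e.g. a nip4 plot and a gap4 editor). */
typedef struct {
    int position;                  /* current base position */
    size_t n;                      /* number of registered displays */
    cursor_cb cb[MAX_LISTENERS];   /* update callback for each display */
    void *view[MAX_LISTENERS];     /* per-display data passed to cb */
} shared_cursor;

/* Register a display; returns its id, or -1 if the table is full. */
int cursor_register(shared_cursor *c, cursor_cb cb, void *view) {
    assert(c != NULL && cb != NULL);
    if (c->n == MAX_LISTENERS)
        return -1;
    c->cb[c->n] = cb;
    c->view[c->n] = view;
    return (int)c->n++;
}

/* Move the cursor on behalf of display `from` and update all the others. */
void cursor_move(shared_cursor *c, int from, int position) {
    assert(c != NULL && position >= 0);
    c->position = position;
    for (size_t i = 0; i < c->n; i++)
        if ((int)i != from)
            c->cb[i](c->view[i], position);
}

/* Example display that just records where its cursor was last drawn. */
typedef struct { int shown_at; } demo_view;

static void demo_view_update(void *view, int position) {
    ((demo_view *)view)->shown_at = position;
}
```

The display that initiated a move is skipped, since it has already positioned its own cursor; this also avoids a move bouncing back and forth between two linked displays.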
Obviously this is most useful if nip4 can show results from the very
best analytical methods. We hope to achieve this either by encouraging
others to write their algorithms so that they can be used from within
nip4 (see below), or by enabling nip4 to import the results of external
programs.
For us the main tasks are (1) to provide an interface to the sequence
libraries that is independent of the EMBL CDROM indexing systems that we
have used for the last several years; (2) to provide a feature table
creator, editor, parser and display routine; and (3) to define and
provide methods to import the results of other analytical programs and
to use them inside nip4. Of particular interest are BLAST and FASTA
results and those from programs that search for unknown genes in genomic
DNA.
Importing results from other programs is one way of enabling nip4 to grow,
but there is another, perhaps more important, route open to us, which
results from the way we have designed and programmed all of our new
programs: gap4, sip4 and nip4. The programs' user interface is provided
by Tcl and Tk, and their algorithms are written in C. The link between
Tcl and C is made by adding the algorithms to the Tcl language, i.e. by
extending the set of commands understood by the Tcl interpreter. In
conjunction with conventions for defining the items that appear in the
program menus, this organisation of the code makes it straightforward for
other groups to add their own algorithms to the programs. Coupled with
our use of dynamic loading, this means that these additional functions
can be made available by their authors as compiled code, allowing them
to retain complete control of their distribution. We believe this will
be an attractive option for many people developing new algorithms: they
can concentrate on the algorithms without worrying about the user
interface, and they can make them available as an additional option to
a (we hope!) widely used program.
Gap4
Main changes
-
An automatic contig ordering function is now available. It uses the
read pair information, and should be regarded as an alpha release, so
please send comments and report any problems.
-
In the contig comparator there is now a 'next' button. This provides
much of the old xgap find internal joins functionality: it steps through
each match in turn, and works for any of the 2D plot types. To switch
'next' from one set of results to another, select the "use for 'Next'"
item in the results manager.
The 'next' scheme remembers any matches that have already been examined,
either by itself or by manual double clicking, and skips them. To clear
this 'visited' information, select "reset 'Next'" in the results manager.
The "sort results" option (results manager) orders results by
importance so that, for example, you can view the best joins first.
If desired, use this before using 'next'.
Any comments on this system are most welcome.
-
The bug fixes from the 1997.0patch1 update have been integrated. These
include a fix for a serious tag corruption problem.
-
More gap4 scripting language updates, including the ability to
register Tcl functions to be called when data changes or editor cursors
move, allowing inter-program synchronisation.
-
New editor search methods: "consensus quality" finds places where the
quality of the consensus is below a given threshold; "discrepancies"
finds places where two or more bases disagree and the disagreeing bases
have quality scores above a given threshold; "file" reads a 'hits' file
of the format "reading_name position optional_message" and steps to the
next or previous position in turn.
-
Fonts are now adjustable in the options menu. They update on the fly,
but at present changes to the editor font (sheet_font) and the trace font
do not take effect until you restart the editor. You may also find that
the fonts are not the same as before. This is due to the switch to Tcl 8:
fonts are now specified in point sizes instead of pixel sizes, so whether
they look different depends on your X server. They can be adjusted
permanently by editing the tables/tk_utilsrc file.
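The 'hits' file read by the "file" search method has one entry per line in the form "reading_name position optional_message". A minimal C sketch of parsing such a line follows. It assumes whitespace-separated fields and treats everything after the position as the message; the hit type, parse_hit and the field sizes are hypothetical, not gap4's own code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One entry of a hits file: "reading_name position optional_message".
 * Field sizes are arbitrary choices for this sketch. */
typedef struct {
    char name[64];
    int position;
    char message[256];
} hit;

/* Parse one hits-file line. Returns 1 on success, 0 if malformed. */
int parse_hit(const char *line, hit *h) {
    int consumed = 0;
    const char *rest;

    assert(line != NULL && h != NULL);
    if (sscanf(line, "%63s %d%n", h->name, &h->position, &consumed) != 2)
        return 0;

    /* Everything after the position is the optional free-text message. */
    rest = line + consumed;
    rest += strspn(rest, " \t");
    strncpy(h->message, rest, sizeof h->message - 1);
    h->message[sizeof h->message - 1] = '\0';
    h->message[strcspn(h->message, "\r\n")] = '\0';  /* drop line ending */
    return 1;
}
```

A caller would typically read the file line by line with fgets() and skip any line for which parse_hit() returns 0.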
Minor updates
-
Major rewrite of cursors. Cursors are now visible when any two plots
of the same template are shown; for example, the template display can
control a stop codon plot without the editor running. When a plot is
zoomed in, it now auto-scrolls when the cursor moves off the edge of the
screen. Finally, a mechanism has been set up for cursors to communicate
with external programs, so that Gap4 can control foreign plots.
-
The traces from the pair of editors in a join editor now appear in the
same window.
-
The 'select tag types' system has been completely rewritten. Each
option now has its own tag list, settable in gaprc or temporarily
within gap4 itself. The default is now to turn tags off for the contig
selector, but on for most other things.
-
Sped up the consensus calculation by between 8 and 16%.
-
Doctor Database now has an option to make Check Database report all
errors as minor, so that operations that can repair databases (such as
Break Contig) will attempt to work on corrupted databases.
-
The results manager is now also available as an on-the-fly generated
menu from the contig comparator window.
-
Gap4 can now save the consensus trace to disk.
-
The xgap 'trace-cursor-flash' system is now in gap4: double clicking on
a reading whose trace is already displayed briefly flashes the cursor in
that trace to show which one it is.
-
The editor search functions (e.g. next problem) now always centre the
editor screen, even when the found base is already on the screen.
-
Dump Contig (in the editor) can now output at any line length.
-
The maxseq parameter is adjustable from within gap4 itself. See the
options menu.
-
There is a "Show Edits" editor setting to show edited bases in a
tag-like manner. This is done on the fly and does not require tags.
-
The contig editor now reads the colours for the quality grey scales
from the gaprc file. This allows user or site control over the quality
colouring system.
-
Editor searching using Control-S is now possible. We could add a
Control-R binding for reverse searching, but this conflicts with an
existing key binding, so for now Escape+Control-S does a reverse search.
Your comments on this are welcome.
There is also a save binding (Control-X Control-S).
-
The editor now shows Emacs-style "-**-" edit status information, which
is useful for knowing when a save has finished.
-
The editor undo information now takes up only about half the memory it
did in the previous release. It is still very memory hungry, and
rewriting it is on our list of things to do.
-
There's a 'change directory' command on the File menu. Gap4 now
switches to the directory containing the opened database.
-
A "group readings by template" setting in the editor allows visible
readings from the same template to be displayed vertically adjacent to
each other. This is mainly designed for people resequencing the same
segment many times.
It is somewhat experimental; in particular, the up and down cursor keys
do not behave as expected.
-
Complementing contigs with many consensus tags has been greatly sped
up.
-
Made the file browser approximately twice as fast on large
directories.
-
Disabled tk_focusFollowsMouse, as it played havoc with the default CDE
setup by raising whichever window the mouse moved over. As a result,
click-to-type is now required on all systems when moving between (e.g.)
entry boxes in the same window.
-
The tag editor now remembers the last tag type used, making it easier to
create large numbers of tags of the same type.
-
New editor key binding: control-d to delete the base under the
cursor (like Emacs).
-
Moving a single contig in the Contig Selector no longer requires
it to be selected first.
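One simple way to produce the "group readings by template" layout is to sort the visible readings by template while keeping their original order within each template. The sketch below assumes each reading carries a numeric template identifier and its original display row; visible_reading, by_template and group_by_template are hypothetical, and ordering the groups by template number is an arbitrary choice that gap4 may not share.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical record for one reading visible in the editor. */
typedef struct {
    int template_id;  /* template the reading was sequenced from */
    int row;          /* original display row, used as a tie-break */
} visible_reading;

/* Order by template, then by original row within a template. */
static int by_template(const void *a, const void *b) {
    const visible_reading *x = a, *y = b;
    if (x->template_id != y->template_id)
        return x->template_id < y->template_id ? -1 : 1;
    return (x->row > y->row) - (x->row < y->row);
}

/* Reorder rows so readings from the same template become adjacent. */
void group_by_template(visible_reading *r, size_t n) {
    assert(r != NULL || n == 0);
    qsort(r, n, sizeof *r, by_template);
}
```

The row tie-break keeps the result deterministic, since qsort() is not guaranteed to be a stable sort.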
Bug fixes
-
Fixed minor memory leaks (found by Purify).
-
Bringing up Template Displays from the read pair plots now orders them
correctly.
-
Trace display 'trace_draw_numbers' accessed beyond array boundary.
Reported crashes.
-
Array bounds read overflow when searching for restriction enzymes.
Potential crash. Never seen.
-
OSP oligo searching - array read underflow. Potential crash. Never
seen.
-
Array read overflow in common code used by suggest primers and double
stranding. Potential crash. Never seen.
-
Array read overflow for masked assembly with tags overlapping ends of
the consensus. Reported crashes.
-
Wild pointer read in defunct code during editor startup. Could not
corrupt memory or data. Reported crashes.
-
Added a check for attempting to use break contig within the editor
when the cursor is on the consensus sequence.
-
Reset TCL_LIBRARY and TK_LIBRARY before the Tcl_Init call. This allows
us to force use of $STADLIB/{tcl,tk} when using the scripting language
without needing to reset them in a shell script (e.g. the "gap4" one)
first.
-
Added a funcheader to the change directory text output. It also now
changes the default directory for the file browser.
-
Added some experimental 'grab release' code to the file browser. On
some window managers (the Sanger Centre reported twm and fvwm) the file
browser was hanging the whole X display while starting up, which can
take some time.
-
The tag editor, when saving, didn't 'unnormalise' the direction of the
tag. Hence editing a tag on the complementary strand would reverse its
order.
-
Assembling readings with an ID line containing several words now
works properly. Only the first word is taken as the reading name.
-
Fixed a bug in the vTcl_SetResult function to handle results greater
than 8K. This mainly affects directed assembly, which was crashing at
the end of assembling large sets of data.
-
The restriction enzyme function gave a Tcl error when reading in a
personal enzyme file.
-
Highlighting a reading in the contig editor while the template display
was up positioned the editing cursor in the template display at the
start of the contig. This was due to a missing "break" statement in the
case REG_HIGHLIGHT_READ.
-
In the template display, there were missing readings on spanning
templates due to freeing t_changes in the wrong place.
-
Template display: uninitialised variable (i) in display_single_ruler
for the contig order tag (c_), which in this case is always 1, as only a
single ruler is ever displayed.
-
Contig selector and comparator zooming information is no longer lost when
a contig is edited (e.g. in the contig editor, or by joining).
-
Find internal joins "probe with single segment" failed when trying to
use a segment more than 4Kb into a contig.
-
Added check for read-only database to the "create tags" function
within the stand alone restriction enzyme plot. The Edit menu is
disabled if the database is read-only.
-
Fixed a tag editor bug where a newline was always added to tags when
editing, which prevented tag comments from being completely removed
(and deallocated).
-
Find next/prev edit in contig editor fix. It would sometimes position
the cursor incorrectly.
-
Enter tags was entering the comment for the last tag twice.
-
The vertical contig ruler in the contig comparator did not always
resize itself correctly. Added an extra "resize_canvas" when calling
the ContigComparator function in tcl.
-
Made template display more robust in dealing with databases with no
template information.
-
The padding introduced by the join editor align command was sometimes
in incorrect (but close) places.
-
Fixed the "information" output of the gap4 readpair plot. It was
displaying the direction of readings wrongly.
-
Fixed a memory crash when displaying the ruler in the quality plot.
-
Selecting "Information" on a template in the template display could
crash if the template contained a large number (more than 14) of readings.
-
Better handling of large numbers of tag types. Tag menus now
automatically cascade when too long.
-
Adjusting cutoffs of sequences exactly 4096 bases long could crash
the editor.
-
Cut and pasting a consensus sequence from the editor can now
support any length. Previously it stopped at 3999 bases.
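The vTcl_SetResult fix concerns results larger than 8K. Assuming the old limit came from a fixed-size buffer, a common C technique for lifting it is to call vsnprintf() once with a zero-length buffer to measure the formatted length and then allocate exactly enough space. The sketch below illustrates that general approach only; it is not the actual vTcl_SetResult code, format_result is a hypothetical name, and handing the string to the Tcl interpreter is omitted.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* printf-style formatting into a heap buffer sized to fit the result,
 * however long it is. Returns a malloc()ed string or NULL on failure. */
char *format_result(const char *fmt, ...) {
    va_list ap, ap2;
    int len;
    char *buf;

    assert(fmt != NULL);
    va_start(ap, fmt);
    va_copy(ap2, ap);                      /* a va_list can be used once */
    len = vsnprintf(NULL, 0, fmt, ap);     /* first pass: measure only */
    va_end(ap);
    if (len < 0) {
        va_end(ap2);
        return NULL;
    }
    buf = malloc((size_t)len + 1);
    if (buf != NULL)
        vsnprintf(buf, (size_t)len + 1, fmt, ap2);  /* second pass: write */
    va_end(ap2);
    return buf;
}
```

The caller is responsible for passing the string on and then calling free() on it.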
Copy_db
Bug fixes
-
More robust when finding unreadable annotations. These are reset to
blank annotations.
-
Corrects problems with readings longer than 4Kb, but does not check
whether this introduces 'holes' into the contig. Also resets the reading
sequence_length parameter to end-start+1, correcting any such
inconsistencies.
Sip4
Updates
-
Added an option to plot, or not to plot, the second half of plots that
compare a sequence against itself. By default, only half the plot is
shown.
-
Added scrollbars to sequence display.
-
Added GCG reading code (missed out of last release).
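A comparison of a sequence against itself is symmetric: if window i matches window j then window j matches window i, so the two halves of the plot on either side of the main diagonal are mirror images. The C sketch below counts exact word matches to show what drawing only half the plot saves. self_matches and its exact-match scoring are stand-ins, not sip4's own comparison algorithms.

```c
#include <assert.h>
#include <string.h>

/* Count exact matches of length-w words when seq is compared with itself.
 * With half != 0 only points with j >= i (the main diagonal and one side
 * of it) are counted; the other side is a mirror image of this one. */
size_t self_matches(const char *seq, size_t w, int half) {
    size_t n, count = 0;

    assert(seq != NULL && w > 0);
    n = strlen(seq);
    if (n < w)
        return 0;
    for (size_t i = 0; i + w <= n; i++)
        for (size_t j = half ? i : 0; j + w <= n; j++)
            if (memcmp(seq + i, seq + j, w) == 0)
                count++;
    return count;
}
```

For "ACGTACGT" with w = 4 the full plot has 7 points (5 on the diagonal plus the mirrored pair (0,4) and (4,0)), while the half plot has 6.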
Bug fixes
-
Removed ability to do reverse compare similar spans (which used to
complement the second sequence) because of the difficulties in
displaying the data correctly.
-
Wouldn't read in personal files.
-
The sequence library component (the standalone 'slim' program)
couldn't save sequence information to disk.
-
"Find nearest match" in the sequence display could fail with large
sequences.
-
Removed minor memory leak in SipTranslateSeq, and a larger one in the
plotting code.
-
Pressing button 1 in the raster plot with the sequence display up will
position the sequence display at the position indicated by the cursor
(instead of only positioning due to motion events).
-
Fixed a crash in the sequence display when scrolling beyond the
left end of a sequence, when locked to another sequence.
-
Turning off crosshairs in the 2D plot no longer disables the
auto-scrolling of a connected sequence display.
-
Bringing up two sequence displays from a 2D plot no longer produces
Tcl error messages.
-
Zooming with a zoom box partially off the left margin no longer hangs.
Pregap
Updates
-
Changed vector clip minimum score parameter from 0.35 to 0.2.
Bug fixes
-
Fixed a bug in pregap's find_scf_name function, which determines the
entry name from an SCF file. This was fixed earlier (11th April 1996),
but the fix was lost along the way.
-
Now uses /bin/rm instead of "rm", as the latter can cause problems under
bash when an exported function named rm is defined (e.g. to run rm -i).
-
Optional sort by quality option. Turn on by adding "do_sort=Yes" to
the .pregaprc file.
-
Better handling of ABI sample names containing forward slashes.
Init_exp
Updates
-
Now writes out an AQ line containing average quality.
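The AQ value is an average of per-base quality values. The sketch below computes an arithmetic mean over a range of bases. Whether init_exp averages the whole reading or only its clipped region, and how it rounds the value written on the AQ line, are not stated here, so the range arguments and the double return value are assumptions; average_quality is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of per-base quality values q[from..to-1].
 * Returns 0.0 for an empty range. */
double average_quality(const unsigned char *q, size_t from, size_t to) {
    unsigned long sum = 0;

    assert(q != NULL && from <= to);
    if (from == to)
        return 0.0;
    for (size_t i = from; i < to; i++)
        sum += q[i];
    return (double)sum / (double)(to - from);
}
```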
Io_lib
(Used by all programs accessing trace files and experiment file formats.)
Bug fixes
-
Added a new ALF file type checker to the type detection routine.
Previously some valid ALF format files were not auto-detected as ALF
format.
-
Writing to plain format files from io_lib now writes out only the good
(quality and vector clipped) sequence. This is more useful than writing
all of the sequence; use experiment file format if all of it is needed.
-
The exp2read function produced invalid rightCutoff values (INT_MAX)
when no QR line is present. It now correctly sets it to 0.
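The plain-format change means that only bases inside both the quality clips and the vector clips are written. The sketch below assumes the experiment file convention that QL and SL give the last base clipped on the left (0 if none) and QR and SR give the first base clipped on the right (length+1 if none), with 1-based positions; please check these conventions against the experiment file documentation. clip_good is a hypothetical function, not part of io_lib.

```c
#include <assert.h>
#include <string.h>

/* Copy the good part of seq into out (which needs strlen(seq)+1 bytes).
 * Assumed convention, 1-based: ql/sl = last base clipped on the left
 * (0 if none); qr/sr = first base clipped on the right (len+1 if none).
 * Returns the number of bases copied. */
size_t clip_good(const char *seq, int ql, int qr, int sl, int sr, char *out) {
    assert(seq != NULL && out != NULL);
    int len = (int)strlen(seq);
    int left = ql > sl ? ql : sl;    /* innermost left clip */
    int right = qr < sr ? qr : sr;   /* innermost right clip */
    size_t n = 0;

    if (left < 0)
        left = 0;
    if (right > len + 1)
        right = len + 1;
    for (int pos = left + 1; pos < right; pos++)  /* 1-based positions */
        out[n++] = seq[pos - 1];
    out[n] = '\0';
    return n;
}
```

Taking the larger left clip and the smaller right clip means a base is written only if neither the quality clip nor the vector clip excludes it.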
Vector_clip
Updates
-
Changed the 5' vector search score from 15% to 50%. We know where the
sequence is, so any poor homology is almost certainly due to poor
quality rather than to a chance match.
-
Now prints "." for passes and "!" for fails so that it integrates
better with the pregap output.
Bug fixes
-
Fixed bug in vector_clip when the cloning site is exactly at the end
(or start) of the sequence (a rotate_seq() bug).
Repe
Bug fixes
-
Readings too short to scan are now classified as errors. Previously
they were listed in neither the pass nor the fail file.
Trace_diff
Updates
-
Now has a -c argument to specify that the start and end of the range
for finding differences should be taken as the innermost of the
user-defined start and end and the experiment file QL and QR values.
This helps in cases where the same fixed region of sequence is present,
but is not always at the start of the sequence file.
-
New -a argument to request outputting all differing bases regardless
of whether the base call itself has changed.
-
Allow use of consensus traces containing pads.
-
Now outputs MUTN tags (was MUTR).
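The -c rule takes the innermost of two ranges, that is the larger of the two start values and the smaller of the two end values. A minimal sketch with hypothetical names follows. It uses QL and QR exactly as given; whether trace_diff adjusts them by one base to allow for the clip convention is not stated here.

```c
#include <assert.h>

/* An inclusive range of base positions. */
typedef struct { int start, end; } range;

/* Innermost of the user range and the experiment file QL..QR values.
 * Returns an empty range (end < start) if the two do not overlap. */
range innermost_range(range user, int ql, int qr) {
    range r;

    assert(user.start <= user.end);
    r.start = user.start > ql ? user.start : ql;
    r.end = user.end < qr ? user.end : qr;
    if (r.end < r.start)
        r.end = r.start - 1;
    return r;
}
```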
Bug fixes
-
Trying to difference a trace against itself was giving division by
zero errors.
-
Fixed a couple of memory corruptions.
Trace_clip and scale_trace_clip
Bug fixes
-
Both had variables MIN_LEFT and MAX_RIGHT which were not reinitialised
for each reading processed, and so their values slowly drifted! There
was a similar problem with START_POINT.
-
Fixed an invalid memory free which caused problems on Linux systems.
Mep
Bug fixes
-
Fixed two bugs in mep dating from when masking was introduced. Results
for masked motifs will have been incorrect.
Nipl
Updates
-
Increased the maximum number of divisions to 40. This allows the
latest EMBL release to work with nipl.