Version-2002.0 Release Notes, October 2002
James Bonfield, Kathryn Beal, Yaping Cheng, Mark Jordan and Rodger Staden
The most visible additions to this release are our improved methods for
automatic mutation detection and for the visualisation and reporting of
mutations within Gap4. For more information on this topic we have prepared
a separate web page demonstrating the new mutation features. These changes
affect both Gap4 and Pregap4. (Be sure to remove (or edit) any
pregap4.config or config.pg4 files to see the new modules listed within
Pregap4.)
Other key changes include: (i) the reading length limit of 30,000 bases in
gap4 has been removed, so "readings" of any length can now be assembled (as
before, the length of a contig is effectively unlimited); (ii) the ability to
assemble EMBL format sequence files and their feature tables into gap4
databases; (iii) a new program, copy_reads, for copying useful finishing data
from one gap4 database to another; (iv) a new program, polyA_clip, for
trimming polyA/T from readings.
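The basic idea of polyA/T trimming can be illustrated with a minimal Python sketch. This is not the polyA_clip source: the function name polya_clip_points and the min_run threshold of 8 are hypothetical, and unlike a production tool this sketch only trims exact, uninterrupted runs (a poly-T head at the 5' end and a poly-A tail at the 3' end).

```python
def polya_clip_points(seq: str, min_run: int = 8) -> tuple[int, int]:
    """Return (left, right) so that seq[left:right] is the region kept.

    Hypothetical sketch: trims an exact poly-T run at the 5' end and an
    exact poly-A run at the 3' end, but only if each run is >= min_run long.
    """
    s = seq.upper()
    # Length of the uninterrupted T run at the start of the reading.
    t_run = len(s) - len(s.lstrip("T"))
    left = t_run if t_run >= min_run else 0
    # Length of the uninterrupted A run at the end of the reading.
    a_run = len(s) - len(s.rstrip("A"))
    right = len(s) - a_run if a_run >= min_run else len(s)
    return left, right


seq = "T" * 9 + "GATTACAGGC" + "A" * 11
left, right = polya_clip_points(seq)
print(left, right, seq[left:right])  # 9 19 GATTACAGGC
```

Returning clip points rather than a trimmed string mirrors how readings keep their full sequence with hidden (clipped) ends, although whether polyA_clip records its result this way is an assumption here.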
A more detailed list of the program changes made is given below.
This is also the first official release of the MacOSX version (10.2) of
the package. Initially we created and made available an Aqua
version, but we felt this was too buggy (although many users
said that having the functionality of our package available on
the Mac made the bugs seem insignificant), so this version
requires X11. See the
Apple XFree86 site to download X11.
As far as we have checked, the MacOSX-X11 version appears to be as bug-free
as the other versions of the package. Many of the bugs
in the Aqua version were in the Tk libraries and it looked as
though fixes would require greater knowledge of the Mac than we
possess. We would be interested to receive comments
about fixing these bugs and the need for an Aqua version. As the Mac
version was built on 10.2 we are not sure if it will work correctly with
10.1.
Our experiment suggestion / finishing program, now named
"prefinish", is being used to help complete around 500 clones
per month at the Wellcome Trust Sanger Institute. The program is
still under development, but is clearly very useful, and we
welcome requests from others interested in trying it.
A large number of bugs have been fixed in this release (some are listed
below), many with the help of Purify and Valgrind.
Program version numbers
- Gap4 4.7
- Spin 1.1
- Pregap4 1.3
- Trev 1.7
Operating systems
The binaries for this beta release have been created in the following build
environments. Newer environments for the same operating system should
typically work, but older ones may not. (For example, the binaries will not
run under RedHat Linux 5.x, but will run on RedHat Linux 7.x.)
- Digital Unix 4.0E
- RedHat Linux 7.1
- Solaris 8
- Windows 2000
- MacOSX 10.2
Demo data sets
The course (see course/*_docs/*.pdf) may be run in demonstration mode (ie
without needing a licence). Specifically, all demonstration data files are
treated as valid sequences and so are exempt from the licence restrictions.
All pathnames listed below are relative to the installation root for the
package.
Here is a list of sequences which may be loaded into spin:
- userdata/5H1E_HUMAN.seq
- userdata/atpase.seq
- userdata/cemyo1.seq
- userdata/ECAE129.seq
- userdata/ecoli.0*
- userdata/lambda.seq
- userdata/mysd_caeel.seq
- userdata/mysa_human.seq
- userdata/mysa_drome.seq
- userdata/spin_dna.seq
- tables/vectors/lorist6.seq
- tables/vectors/m13mp18.seq
- tables/vectors/m13mp7.seq
- tables/vectors/pBs.seq
- tables/vectors/pgem3zfm.seq
- tables/vectors/pgem3zfp.seq
- tables/vectors/pgem5zfm.seq
- tables/vectors/psc194.seq
- tables/vectors/puc18.seq
For a good example of protein-protein similarity plots, try using
mysa_drome.seq and mysa_human.seq.
For DNA-protein plots, try using cemyo1.seq against mysd_caeel.seq.
To see how spin handles large sequences try using ecoli.00003 and
lambda.seq. This is a large comparison: 250Kb against 48.5Kb. Hence
the slower searches, such as Find Similar Spans, will take a long
time. We suggest searching with Find Matching Words using a word
length of 12.
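Find Matching Words is fast because it looks only for exact shared words rather than scoring every possible alignment. A rough sketch of word matching, assuming a simple hash table of every length-k word in the first sequence (an illustration only, not spin's implementation; the function name matching_words is hypothetical):

```python
def matching_words(seq1: str, seq2: str, word_length: int = 12) -> list[tuple[int, int]]:
    """Return 0-based (i, j) pairs where seq1[i:i+k] == seq2[j:j+k]."""
    k = word_length
    # Index every word of length k in seq1 by its start positions.
    index: dict[str, list[int]] = {}
    for i in range(len(seq1) - k + 1):
        index.setdefault(seq1[i:i + k], []).append(i)
    # Slide along seq2 and look each of its words up in the index.
    hits = []
    for j in range(len(seq2) - k + 1):
        for i in index.get(seq2[j:j + k], []):
            hits.append((i, j))
    return hits


print(matching_words("ACGTACGT", "TTACGTT", word_length=4))
# [(3, 1), (0, 2), (4, 2)]
```

Each lookup is a single dictionary probe, so the cost grows roughly with the combined sequence lengths plus the number of hits. Under the simplifying assumption of random sequence with equal base frequencies, a word length of 12 would give only about 250,000 × 48,500 / 4^12 ≈ 720 chance hits (one strand) for the ecoli.00003 against lambda.seq comparison.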
Gap4 in demonstration mode allows access to:
- demo/gap4/traces.tar
A tar of trace files base called with phred. Gap4 will automatically
read the files directly from the tar file (via the traces.tar.index
lookup file).
- demo/gap4/DEMO.0*
This database is a section from a C. elegans cosmid, still in several
contigs. You may try joining and editing contigs. The full
functionality of Gap4 is available except for assembling or
disassembling sequences.
- course/data/shotgun_data/* trace and experiment files
- course/data/long_reads/* trace and experiment files (long reads)
- course/data/ABI_Data/* original ABI files
- course/data/phred_data/* trace files base called with phred
- course/data/mutations/* scf files for mutation studies
Pregap4 in demonstration mode allows access to the same files listed for
Gap4.
Change log
Here is a list of changes since the 2001.0 release.
Gap4
Changes
-
Removed the maximum single-sequence length. This was 30000, but now we
can handle sequences of any length (which makes split_seq largely
redundant).
-
Added Report Mutations function to tabulate mutations (from MUTA/HETE
tags or from differences to the reference sequence).
-
Greatly improved "reference sequence" base numbering in the editor.
-
Can now right-click on sequence names to specify sequences as a
reference sequence or trace.
-
We can now assemble EMBL files. The features get turned into (sometimes
several) tags.
-
The Y-scaling on difference traces now zooms centred on the baseline.
-
New associated program: copy_reads. This is used with gap4 databases for
overlapping clones (eg two BAC clones). It copies the overlapping
sequences from one database into the other.
-
More chemistry types now supported and listed appropriately: BigDye v3
and MegaBACE ET.
-
The editor "show reading quality" command now does so regardless of
whether the quality cutoff is -1.
-
Re-enabled the clear button in the list editor for the "readings" list.
-
Under Unix, BUSY files belonging to a database that is genuinely in use
are now distinguished from stale BUSY files left behind after an abort.
Gap4 will override the BUSY file in the latter case.
-
Disabled automatic execution of OPEN and CLOS notes unless the
"-exec_notes" command line switch is used.
-
New "Save Settings" option in the editor.
-
The buttons around traces are now hidden in popup menus. This provides
more room for traces (especially in 4-column mode). The old-style
look is still available for those who prefer it.
-
Improved tag selector window to cope with having many more tag types.
-
Pads are now stripped from the sequence string search options (both in
the contig editor search tool and the main gap4 menus).
-
The Template Display now has an option to turn off auto contig
positioning (based on read-pair overlaps).
-
The default consensus algorithm is now to use confidence values (as EBA
has been recalibrated).
-
New "List Contigs" window to replace the old one. This has
several columns showing name, length and number of
sequences. Clicking on a column sorts the list.
-
Tweaked the Template Display colours so that they are easier
for screen projection and hopefully easier for colour-blind users.
-
Zero cutters are now listed separately in the restriction
enzyme "Output enzyme by enzyme" output.
-
After assembly we now detect contigs that have the vast
majority of fwd/rev reading pairs aligned in the same
orientation (such as would be the norm for a mutation
detection study). Having found such cases we complement these
if necessary to "correct" the orientation.
-
All restriction enzymes may now be selected by using the
Control-A binding.
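The orientation check performed after assembly amounts to a majority vote over the readings in a contig. The Python sketch here illustrates the idea only: the Reading fields, the function name should_complement_contig and the 0.9 "vast majority" threshold are all hypothetical, and it assumes the desired convention is that forward readings appear uncomplemented in the contig.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    name: str
    forward: bool       # hypothetical: True for a forward-primer reading
    complemented: bool  # True if the reading is reverse-complemented in the contig


def should_complement_contig(readings: list[Reading], threshold: float = 0.9) -> bool:
    """Return True if most readings sit in the 'wrong' orientation.

    'Wrong' here (an assumption) means forward readings complemented and
    reverse readings uncomplemented; complementing the contig fixes them all.
    """
    fwd = [r for r in readings if r.forward]
    rev = [r for r in readings if not r.forward]
    wrong = sum(r.complemented for r in fwd) + sum(not r.complemented for r in rev)
    total = len(fwd) + len(rev)
    # Only act on a clear majority; a contig with mixed orientations is left alone.
    return total > 0 and wrong / total >= threshold


reads = [Reading(f"fwd{i}", True, True) for i in range(10)] + [Reading("rev0", False, False)]
print(should_complement_contig(reads))  # True
```

Requiring a clear majority rather than a simple one means that contigs from ordinary shotgun projects, where orientations are roughly evenly mixed, are never flipped.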
Bug fixes
-
Fixed out-by-one error in reporting of the length of spanning
read-pairs.
-
Fixed determination of (in)consistent status for spanning templates when
shown in the template display.
-
Contig selector now remembers "display diagonal" setting after a clear
command.
-
Removed a potential crash in Find Read Pairs.
-
The scrollbar arrows in the trace display now work properly.
-
Fixed crash in Find Internal Joins when sequences at the end of the
contig contained more than 2050 base pairs of hidden data.
-
Fixed a rare crash in the trace display.
-
Reporting of probability values for consensus bases in the contig editor
now works correctly (instead of sometimes reporting zero).
-
The "Clear All" command of the contig selector no longer loses the
status line.
-
Fixed various out-by-one errors in directed assembly.
-
Fixed crashes in saving the consensus as experiment file format when
tags exist without comments.
-
Improved handling of long filenames during assembly when the sequences
being assembled are in one directory and the program was started from
another directory.
-
Contig selector: fixed a crash in "list contigs" when tags have been
selected to draw in the contig selector.
-
Prevent attempts to update the contig order using the template display
when opening databases in read-only mode.
-
Better checking of maxseq before shotgun assembly starts (to
avoid a crash).
-
Disassemble readings now corrects maxseq if it requires increasing.
-
Quality Clip can no longer adjust the contig length. This
cures shifting of consensus tags.
-
Disabled use of sequence ranges when selecting a single contig
for Suggest Primers. This case was buggy, but the code is
being superseded anyway.
-
Disabled use of sequence ranges for Double Stranding (as it
was buggy).
-
Fixed reading confidence values from non-SCF files that are
referenced from an Experiment file TN record when the
Experiment file does not have confidence values itself. These
would sometimes incorrectly get set to "2".
-
Fixed the positioning of plots in Strand Coverage when
contig sub-ranges were used.
-
Fixed a rare crash in the manual primer selection code (in the
contig editor).
-
Many plots now only zoom in X when Y zooming is inappropriate
(eg consistency plots).
-
Fixed a few bugs where zoom or selection drag-out boxes could
be left on the screen.
-
Restriction enzymes that have recognition sequences within a
sequence range, but have their cut site outside the sequence range
are now visible in the graphical plots.
-
Fixed a corruption of sequence "notes" when disassembling the
last reading in the database.
Trev
Changes
-
"Information" now displays the number of bases, number of samples, the
baseline (used for difference traces) and the maximum trace amplitude.
-
Minor menu reorganisation. File->Save As is now a cascading
menu including the File Type (to avoid problems with selecting
this on some operating systems). The View->Display menu is now
part of the main View menu. It is also no longer possible to
turn off the trace portion (which was somewhat pointless).
Bug fixes
-
Some page printing parameters could cause divide-by-zero errors.
-
Fixed a bug caused by attempts to perform edit operations before loading
a trace.
-
Confidence values are now loaded from experiment files instead of the
associated trace file. Fixed problems in saving them too.
Spin
Changes
-
May now select multiple files from the sequence load function.
-
Merged the "Configure max number of matches" and "Configure default
number of matches" dialogues into "Configure matches".
-
The sequence editor search command now gives the same output as string
search when finding results on the bottom strand.
-
Increased the size of entry names for fasta files from 20 to 50. Also
added a scrollbar to the personal file browser.
-
Can now set the sequence structure (linear or circular) via the
Sequences menu.
-
Added "Change Directory" dialogue to File menu.
-
Removed size constraints on the sequence display window.
-
Now uses drop-down boxes for selecting sequence names.
-
Can now also use the left mouse button to drag plots around (in addition
to the existing middle button binding).
-
Changed the allocation of colours for plots. These are now specified in
the 'spinrc' file.
-
Renamed "Count base composition" to be "Count sequence composition" to
reflect that it is not DNA specific.
-
Allow use of the left mouse button for dragging cursors. This
should make things much easier for Mac users.
Bug fixes
-
Dropping a plot onto non-plot type window (eg the main text output
window) no longer produces Tk errors.
-
Better handling of tabbed notebooks in the auto-generated EMBOSS
interface (eg for primer3). Updated prebuilt interface files to
EMBOSS 2.5.1.
-
The "all" key box in plots now only brings up a menu for results in that
window (rather than all results).
-
Fixed several bugs relating to dragging and dropping plots.
-
Aligning a short protein against a long protein now correctly reads in
the new aligned sequences, instead of possibly confusing it with a DNA
sequence.
-
Fixed problems with multiple sequence displays showing multiple,
different, lists of enzymes.
-
Fixed a rare graphics display bug in the Author Test whereby a
horizontal line could be drawn across a plot.
-
Fixed the "rescan matches" option of Find Similar Spans. It produced
an incorrect plot when a sequence sub-range had been chosen.
-
String Search now works if a sequence sub-range is used.
-
The Restriction Enzyme Map now works if a sequence sub-range is used.
-
Fixed a problem where temporary "lists" were not deleted when using
backslashes in Windows pathnames.
-
Fixed some zooming problems in the consistency displays.
-
Linear/Circular status is now inherited when new sequences are
created (eg by complementing).
Pregap4
Changes
-
New modules: reference traces, trace difference and heterozygous
scanner. These form part of the new mutation detection methods. (See
elsewhere for details.) This replaces the old trace_diff module.
-
New module: polyA clip.
-
The Phred module now allows for additional phred arguments to be
specified. It also behaves better when finding non-trace files.
-
Added extra suffixes to the Sanger Centre naming scheme to support the
BigDye v3 and MegaBACE ET chemistries.
-
Estimate Base Accuracies (eba) now produces values that are
scaled to have equivalent magnitude to phred scores. (The old
scale is still available as an option.)
-
Removed the non-compact (separate) window layout style.
-
We can now read FASTA files into pregap4. The Initialise
Experiment file module automatically splits them into separate
experiment files.
-
Added Select All Modules and Deselect All Modules options.
-
Rationalised the menus a bit. It may take a bit of getting
used to, but now things like Run are in the Modules menu and
the various Save options are in File.
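For reference, phred scores use the scale Q = -10 log10(P), where P is the estimated probability that the base call is wrong, so Q20 corresponds to a 1 in 100 chance of error and Q30 to 1 in 1000. A minimal Python sketch of the conversion in both directions (the function names are hypothetical, and how EBA itself estimates P is not shown here):

```python
import math


def error_prob_to_phred(p_error: float) -> float:
    """Phred-scale quality Q = -10 * log10(P_error)."""
    if not 0.0 < p_error <= 1.0:
        raise ValueError("p_error must be in (0, 1]")
    return -10.0 * math.log10(p_error)


def phred_to_error_prob(q: float) -> float:
    """Inverse of error_prob_to_phred: P_error = 10 ** (-Q / 10)."""
    return 10.0 ** (-q / 10.0)


for p in (0.1, 0.01, 0.001):
    print(p, round(error_prob_to_phred(p)))
```

Because the scale is logarithmic, every 10 points corresponds to a tenfold drop in the expected error rate, which is what lets EBA values and phred scores be compared directly once both use this scale.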
Bug fixes
-
The Edit Experiment File Line Types window is now modal to prevent
corruptions by attempting to do other things at the same time.
-
Disabled the vector clipping modules' rejection of sequences shorter than
16 base pairs (as this is a task for the quality clipping module).
-
Fixed the "del_temp_files" error message from the convert trace
module.
-
Fixed a rare crash in screen against vector.
Misc
Changes
-
NT experiment file line type (Gap4 "NoTe"s) is now read by the assembly
functions.
-
Retired the Repe program - we recommend use of RepeatMasker instead.
-
Upgraded the Tcl/Tk versions used to 8.4.0. This should not
have any visible effect except perhaps for a few graphical bug
fixes on Windows (ticks in menu, for example).
Bug fixes
-
Ran all the programs through Purify and Valgrind and removed any bugs
spotted (mostly small memory leaks).
Prefinish
-
There is still no graphical interface to this, but considerable
improvements have been made in the experiment suggestions. Please
contact jkb@mrc-lmb.cam.ac.uk if you are interested in testing this
component.