 
 
Version-2000.0 Release Notes
James Bonfield, Kathryn Beal, Matthew Betts, Mark Jordan and Rodger Staden
This release has been slow to come out because we wanted it to
coincide with, and be identical to, the first Windows 9X/NT release of
the package. Achieving this has taken longer than planned, but has
left us with the same source code for all systems, and so in a strong
position for future developments. We have also given the manual a
major overhaul and produced separate versions for UNIX and
Windows. The different manuals include their system-specific
screendumps and we have also introduced alternatives for users with
two-button mice. (The middle mouse button may now be simulated by
using the Alt key with the left mouse button.) A new "mini-manual",
giving a quick (45 page) introduction to the programs, has also been
written.  Finally, we now also include some notes and data for a
course in using the sequence assembly tools. These may be found in the
'course' subdirectory.
Note that only the "modern" programs have been ported to Windows
(i.e. pregap4 and its ancillary programs like vector_clip,
screen_seq,..., trev, gap4, nip4 and sip4). To complete the
equivalence between the UNIX and Windows versions of the package we
have removed all the "old" (mostly FORTRAN) programs from the UNIX
release. These old programs (such as nip, sip,...) are now freely
available to anyone as Digital UNIX, IRIX, Solaris and Linux binaries
from our ftp site and will not be upgraded in the future. The Windows
version will only be available through commercial distributors, but
the UNIX versions will still be available directly from us at LMB.
A major change is the addition of a software licensing system. This
means that the package can be made available to anybody in
demonstration mode, and can be switched to full functionality with a
key specific to that copy of the package on that machine. We can also
provide temporary licence keys that will enable full functionality for
a limited period. We plan for these temporary licences to be made
available to applicants who simply fill in a web form.
To accompany the demo versions of the package we have included a set of data
files. In demonstration mode, only these files can be loaded into the programs.
Program version numbers
Gap4	4.5
Nip4	1.2
Sip4	1.3
Pregap4	1.1
Trev	1.5
Operating systems
The binaries have been created in the following build environments. Typically
newer environments for the same operating system should work fine, but not
necessarily older systems. (For example, the binaries will not run under
RedHat Linux 5.x.)
- Digital Unix 4.0E
- Irix 5.3
- RedHat Linux 6.1
- Solaris 2.6
Demo data sets
All pathnames listed below are relative to the installation root for the
package.
Here is a list of sequences which may be loaded into Sip4.
- 	userdata/atpase.embl
- 	userdata/blue.seq
- 	userdata/cemyo1.seq
- 	userdata/ecoli.0*
- 	userdata/lambda.seq
- 	userdata/lorist6.seq
- 	userdata/m13mp18.seq
- 	userdata/m13mp7.seq
- 	userdata/mysa_drome.embl
- 	userdata/mysa_human.embl
- 	userdata/mysd_caeel.seq
- 	userdata/pjb8.seq
- 	userdata/puc18.seq
	For a good example of protein-protein similarity plots, try using
	mysa_drome.embl and mysa_human.embl.
	For DNA-protein plots, try using cemyo1.seq against mysd_caeel.seq.
	To see how Sip4 handles large sequences try using ecoli.00003 and
	lambda.seq. This is a large comparison: 250Kb against 48.5Kb. Hence
	the slower searches, such as Find Similar Spans, will take a long
	time. We suggest searching with Find Matching Words using a word
	length of 12.
For Nip4, the following sequences are accepted.
- 	userdata/atpase.embl
- 	userdata/blue.seq
- 	userdata/cemyo1.seq
- 	userdata/ecoli.0*
- 	userdata/lambda.seq
- 	userdata/lorist6.seq
- 	userdata/m13mp18.seq
- 	userdata/m13mp7.seq
- 	userdata/pjb8.seq
- 	userdata/puc18.seq
Gap4 in demonstration mode only allows access to one data set at present.
This has been base called with phred. The trace files may also be viewed in
Trev.
	This database is a section from a C. elegans cosmid, still in several
	contigs. You may try joining and editing contigs. The full
	functionality of Gap4 is available except for assembling or
	disassembling sequences.
For Pregap4 there is no demonstration mode. Pregap4 may be started and
configured, but it will not process any data.
Sequence Library Access Using SRS
We have not upgraded our sequence library interface to SRS 6, and
are aware of some problems in our use of SRS 5. We hope to include
a Web-based interface in a later release.
Linux, Gnome and Enlightenment
There is a known problem with the Gap4 contig editor when using Enlightenment
as the window manager. This is the default for Gnome, but it is not known
whether the problem arises when using Enlightenment in other environments. The
symptom is that the program will terminate instantly upon starting the contig
editor with a complaint about X_ConfigureEvents.
The solution is to change window managers, which may be adjusted using the
Gnome control panel.
Change log
Here is a list of changes since the 1999.0 (patched) release.
Gap4
Changes
- 	New plot: confidence values.
- 	New plot: reading coverage.
- 	New plot: readpair coverage.
- 	Primer suggestion from within the editor now ignores padding
	characters.
- 	Searching by sequence from the editor now ignores padding characters
	and searches both strands. Optionally it may also find non-exact
	matches (mismatches, but not insertions or deletions) and can search
	top, bottom or both strands.
- 	An extra editor search method is available: unpadded position
	search. This makes it possible to jump to a specific unpadded position
	within a padded consensus.
- 	The contig editor base numbers may now be shown as unpadded
	positions. Note this may be slow on very large projects as it will
	require frequent recalculation of the complete consensus.
- 	Confidence values can now be plotted in the Trace Display.
- 	The trace display now lists the original base confidence in its
	information line.
- 	Primer and Chemistry information is now visible as single character
	codes in the trace display.
- 	The full reading name is now shown in the trace display. It is
	superimposed over the top-left corner of the trace.
- 	The trace file search path is now adjustable from the options menu
	(although the RAWDATA environment variable may still be used if
	desired). This writes a database note (of type RAWD) and so the search
	path will be remembered.
- 	New commands in the option menu: set alignment scores and set
	genetic code.
- 	The alignment weight tables for Gap4 are now configurable; stored in
	$STADTABL/nuc_matrix. The alignment gap open and gap extension
	penalties may also be changed. The matrix file and penalties are
	stored in tables/gaprc as the ALIGNMENT.MATRIX_FILE,
	ALIGNMENT.OPEN.COST and ALIGNMENT.EXTEND.COST variables.
- 	Improved Find Internal Joins alignment algorithm. This is now a banded
	alignment, and so is much faster. Also improved the "extended
	consensus" calculation such that there are no more discontinuities in
	the sequence (caused by padding). This does not fix any bugs, but the
	change improves the sensitivity of find internal joins.
- 	The template display now separates out the plotting of consistent and
	inconsistent templates.
- 	The colours in the Template Display for reverse and custom-reverse
	primers have been changed from shades of grey to orange and
	orange-red.
- 	The consensus algorithm now has a "Display IUB codes in consensus"
	mode. The exact definition depends on the consensus algorithm being
	used.
- 	The consensus algorithm may now be chosen from within the contig
	editor.
- 	The consensus, when produced in fasta format, now uses the contig
	identifier as a fasta sequence name instead of a reading number. The
	reading number is included as a fasta comment after the contig
	identifier.
- 	Gap4 now has "user levels"; currently only "beginner" and
	"expert". Gap4 will start in beginner mode. Use the Options menu to
	change to expert mode. Beginner mode has several of the less-often
	used functions removed.
- 	The contig editor now has the notion of a reference sequence. This is
	the sequence to which base numbering should be applied. The sequence
	numbering can optionally start from an offset other than 1 and may
	wrap-around at a predefined length. All these details are written into
	the contig REFS note.
- 	Pressing the middle mouse button on the reading names in the editor
	will now 'copy' the name to the paste buffer, allowing for easy
	cut-n-paste.
- 	The control-h keybinding to remove readings from within the contig
	editor now also works when the mouse pointer is above the
	sequence names sub-window.
- 	Using the popup-menu (right mouse button) in the contig editor now
	also moves the editor cursor. This makes displaying of tag information
	(for example) more intuitive.
- 	Added a gaprc parameter to control whether the Delete key should act
	in a Motif/Windows style or an Emacs style. Defaults on for Windows,
	off for Unix.
- 	Added an assembly interface to cap3. (Cap3 is not included.)
- 	Multiple files for assembly may be picked directly from the assembly
	dialogue without needing to first create a file of filenames.
- 	Added a new gap4 command line switch "-menu_file". This may be used to
	customise local menu configurations. Try "gap4 -menu_file mito" for
	example; see $STADENROOT/tables/gaprc_menu_mito.
Bug fixes
- 	Specifying ranges for masked consensus was masking the wrong
	regions. (This worked correctly when all the consensus was output.)
	This affected both saving of masked consensus and the masked assembly
	options.
- 	Attempting to align sequences in the join editor with an overlap of
	greater than 8Kb could crash Gap4. Now works with any length (although
	it may be slow).
- 	Removed a problem with the "lock" mode of the join editor: by
	extending cutoffs it was possible to `break' the lock position.
- 	The colours used to highlight disagreements in the join editor have
	now been fixed.
- 	Fixed problems with unneeded cursors being left in various graphical
	plots after joining contigs.
- 	The trace display could crash when scrolling traces immediately after
	quitting an individual trace display.
- 	Diff against consensus trace (contig editor) was sometimes crashing.
- 	Fixed crash when trying to display multiple diffs between the same
	pair of sequences.
- 	Using "Quality Clip" on databases that were not base called with a
	phred-scaled confidence system would incorrectly clip.
- 	Fixed a crash with searching by reading name within the editor.
- 	Changed the focus bindings for the contig editor. This should prevent
	the auto-raising problems that some systems have.
- 	Added back the "lost" editor menu item: group readings by template.
- 	Fixed a problem with the editor "search by file" mechanism and
	specifying reading names when there are more than 10,000 sequences
	assembled.
- 	Dump Contig (contig editor) using line lengths of > 300 would
	sometimes crash gap4. It now supports up to 1000 (the limit allowed by
	the dialogue).
- 	The day field of dates listed in the Note Selector window was
	incorrect. (NOTE: This was not a year 2000 problem, but simply a mixup
	of the date formatting.)
- 	Fixed a crash in Find Repeats when it found many copies of a
	repeat on both strands.
- 	The contig comparator now ensures that all items are visible (even if
	it means moving them). Previously there were some cases where joins
	found in the cutoff data of the right-most contig were not visible
	(although the "next" button worked).
- 	Fixed bug with the crosshairs in contig selector/comparator, which
	when zoomed up displayed the position of the crosshairs incorrectly.
- 	Fixed a problem when trying to use the popup-menus from the
	restriction enzyme plot.
- 	The restriction enzyme selector sometimes had problems updating the
	enzyme list when using the filebrowser to pick a "personal" enzyme
	file.
- 	Removed a crash in the restriction enzyme plot, triggered when leaving
	the plot displaying while making a contig join.
- 	Fixed window resizing problems in the restriction enzyme plot.
- 	The default confidence value for sequences without confidence values
	is now 2, instead of 99.
- 	Improved the error messages when attempting to assemble non-existent
	files.
- 	The debug "log file" was switched after a copy database command:
	subsequent logs went to the copy's log file.
- 	Local GTAGDB files did not correctly override tags found in the tables
	directory.
- 	The show relationships output was sometimes poorly tabulated.
- 	The difference clip function no longer quality clips the extreme start
	and end sequences. This depended on correct confidence values and
	sometimes failed when they were not present.
Nip4
Changes
- 	Added a general-purpose weight matrix search function.
- 	A new program, make_weights, is available to produce weight matrices
	from a set of aligned sequences.
- 	The weight matrix searching functions now use log-odds scores.
- 	The splice junction search is improved for human data - using a
	better weight matrix (which is now also user-definable). The new
	search uses log-odds scores.
- 	The gene search functions now all use log-odds scores, which means
	they may be compared to one another easily. The codon preference
	search has been generalised; it now has the ability to make use of
	coding and non-coding tables (automatically generating non-coding
	tables from the coding table if it is missing), and hence now the base
	preference search is a sub-option of the codon preference search.
	There are now also options to normalise to average amino acid
	composition and to use the amino acid composition alone.
- 	Stop codons may now automatically be plotted when requesting a gene
	search plot.
- 	The author test gene search now requests percentage error instead of
	window length.
- 	Filenames may now be specified on the command line.
- 	Codon tables may now be appended to existing tables. This concatenated
	codon table file is required for some of the newer search options.
- 	Codon tables may now be written to the output window without needing
	to save them to disk.
- 	Improved the genetic code selection.
- 	The string search options can now distinguish between IUB codes and
	"literal" characters.
- 	Changed the default scoring for the tRNA search and improved the use
	of conserved bases.
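As an illustration of log-odds scoring, the sketch below builds a weight matrix from a set of aligned sequences and scores every window of a target sequence. It is written in Python and is not the make_weights or Nip4 implementation: the uniform background frequencies, the pseudocount of 1, the use of natural logarithms and all function names are assumptions made for the example.

```python
import math
from collections import Counter

BASES = "ACGT"


def build_log_odds(aligned, background=None, pseudocount=1.0):
    """Build a per-column log-odds table from equal-length aligned DNA motifs."""
    if background is None:
        background = {b: 0.25 for b in BASES}  # assumed uniform background
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all aligned sequences must have the same length")
    total = len(aligned) + pseudocount * len(BASES)
    matrix = []
    for col in range(length):
        counts = Counter(s[col].upper() for s in aligned)
        # log of (observed frequency / background frequency) for each base
        matrix.append({
            b: math.log(((counts[b] + pseudocount) / total) / background[b])
            for b in BASES
        })
    return matrix


def scan(sequence, matrix):
    """Return (1-based start, score) for every window of the sequence."""
    width = len(matrix)
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing ambiguity codes
        score = sum(matrix[i][b] for i, b in enumerate(window))
        hits.append((start + 1, score))
    return hits
```

A window scores above zero when it resembles the aligned set more than the background does, and below zero otherwise. Scores expressed on this common scale are what allow the results of different searches to be compared.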
Bug fixes
- 	Fixed bug in the stop codons plot where the determination of the
	reading frame was incorrect if the starting position was not 1.
- 	Fixed window resizing problems in the restriction enzyme plot.
- 	Fixed bug with base bias search if all the results are 0 - this caused
	the program to crash.
- 	Fixed miscellaneous bugs dealing with moving and superimposing plots.
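The stop codon fix above concerns how a reading frame is assigned when a plot starts part-way along a sequence. A minimal Python sketch of the idea follows. It assumes frames are numbered 1 to 3 from the first base of the whole sequence, and the function name is hypothetical; Nip4's internal code may differ.

```python
STOPS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def stop_codons(sequence, start=1, end=None):
    """Return (position, frame) for each stop codon lying within start..end.

    Positions are 1-based. The frame is taken from the absolute position
    of the codon, so it does not change when a different start is chosen.
    """
    seq = sequence.upper()
    end = len(seq) if end is None else end
    if start < 1 or end > len(seq):
        raise ValueError("range lies outside the sequence")
    hits = []
    for pos in range(start, end - 1):  # the codon occupies pos..pos+2
        if seq[pos - 1:pos + 2] in STOPS:
            hits.append((pos, (pos - 1) % 3 + 1))
    return hits
```

Deriving the frame from the absolute base position, rather than from the offset into the plotted range, gives the same frame for a codon whatever start position is chosen.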
Sip4
Changes
- 	Filenames may now be specified on the command line. The last two will
	be the horizontal and vertical sequences.
- 	Added a "nearest dot" option to the sequence display which moves to
	the visibly-nearest point in the 2D dotplot, rather than the
	mathematically nearest match.
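One way to picture the difference between the visibly-nearest and the mathematically nearest match is a plot whose two axes are drawn at different scales. The Python sketch below is one reading of the idea rather than Sip4's code; the scale parameters (pixels per base) and the function name are assumptions.

```python
import math


def nearest_match(cursor, matches, x_scale=1.0, y_scale=1.0):
    """Return the match closest to the cursor after scaling each axis.

    cursor and matches are (horizontal, vertical) sequence positions.
    With both scales left at 1.0 the distance is measured in bases; with
    the plot's pixels-per-base values it is measured on the screen.
    """
    cx, cy = cursor
    return min(
        matches,
        key=lambda m: math.hypot((m[0] - cx) * x_scale, (m[1] - cy) * y_scale),
    )
```

When a long horizontal sequence is squeezed into the window, a match that is many bases away horizontally can still be the closest dot on the screen, so the two measures may pick different matches.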
Bugs fixed
- 	The local alignment code now properly handles mixed case alignments.
- 	If a comparison function was performed on a sequence which doesn't
	have a library (e.g. an aligned sequence), sip4 would crash.
- 	Switched the default values for the gap start and gap extend penalties
	(they were the wrong way around).
- 	Both sip4 and nip4 no longer complain when the default sequence
	library cannot be found.
Pregap4
Changes
- 	New module: qclip (Quality Clip) which replaces the old clip module.
	This can clip a sequence based on the average confidence values
	assigned by eba, phred, ATQA or other similar tools.
- 	Added an ATQA module.
- 	Added an extract_seq module.
- 	Wrote a new program, find_renz, which searches for a restriction
	enzyme site in a vector file, and added a corresponding hook to
	pregap4 so that we can now type (e.g.) SmaI into the cut-site box
	and pregap4 will work things out accordingly.
- 	Included a more complete vector-primer file. The sequencing vector
	clip module now has an interface to select subsets from this file.
- 	Multiple files may now be picked for processing directly from the
	pregap4 GUI without needing to first create a file of filenames.
- 	Improved handling of files contained within remote directories.
	Pregap4 now has an output directory, allowing the results from
	processing data spread over several directories to be stored within a
	single output directory.
- 	The E. coli genome used by Screen Sequences has now been split up into
	several small chunks, which slightly speeds up the search and uses
	much less memory.
- 	The blast screen module can now output tags if desired. This tags all
	matches, regardless of whether the sequence meets the specified "match
	fraction". Hence by setting the match fraction to greater than 1.0
	this module may be used to add arbitrary match tags to sequences
	without rejecting any of them.
- 	The E value is now adjustable in the blast module.
- 	The cross_match module now works with the newer cross_match releases
	(which no longer support experiment file format).
- 	The cross_match module now has a gap_size parameter (adjustable only
	from the pregap config files). Matches found within $gap_size of the
	end of the sequence or within $gap_size of another match are stitched
	together. gap_size is initially set to 15 bases.
- 	Added an "other arguments" interface to the phrap module.
- 	We now report simple status information such as "module x needs
	configuring" in the blank space between the Run and Help buttons at
	the bottom of the configure window.
- 	The "save all parameters" buttons have had their names changed to
	(hopefully) become clearer.
- 	A new function has been added, named "Load Naming Convention". This
	is identical to "Include Config Component" except for the default
	location for the file-browser.
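The stitching rule for the cross_match gap_size parameter can be sketched as follows. This is an illustrative Python version, not the pregap4 module itself. It assumes matches are 1-based inclusive (start, end) pairs and that "within gap_size" means at most gap_size unmatched bases; the function name is hypothetical.

```python
def stitch_matches(matches, seq_length, gap_size=15):
    """Stitch together matches that lie close to each other or to an end.

    matches is a list of (start, end) pairs, 1-based and inclusive.
    Returns a new sorted list of stitched (start, end) pairs.
    """
    stitched = []
    for start, end in sorted(matches):
        if start - 1 <= gap_size:
            start = 1  # close to the start of the sequence: extend to base 1
        if seq_length - end <= gap_size:
            end = seq_length  # close to the end of the sequence: extend to it
        if stitched and start - stitched[-1][1] - 1 <= gap_size:
            # close to the previous match: merge the two
            prev_start, prev_end = stitched[-1]
            stitched[-1] = (prev_start, max(prev_end, end))
        else:
            stitched.append((start, end))
    return stitched
```

With the default of 15, matches at 10 to 40 and 50 to 80 in a 500 base sequence are stitched into a single match covering 1 to 80, since the first lies within 15 bases of the start and the two are 9 bases apart.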
Bug fixes
- 	Removing a module with the Add/Remove modules command no longer gives
	Tcl error messages when the module being removed is the currently
	active module.
- 	The saving of the sequencing vector clip module parameters now
	correctly saves the vector-primer file details.
- 	Fixed problems when dealing with configuration files produced on
	different systems with different $STADENROOT settings.
- 	Improved error handling when dealing with third-party tools which have
	not been installed on the local system (e.g. Phred, RepeatMasker, etc.).
- 	Improved error handling in phred module.
Trev
Changes
- 	Now displays sequence confidence values.
- 	Can now handle multiple files from within the "open" command.
- 	Printed trace files may now have a title, which defaults to the trace
	file name.
- 	Saving in plain-text format now only writes out the good
	quality region.
Bugs fixed
- 	Trev failed to correctly write edited confidence values back to
	experiment files.
- 	The "unable to configure trace options" message no longer appears when
	failing to load a file.
- 	The "undo clipping" command failed when the previous right-hand clip
	point was the very end of the sequence.
- 	Certain "corrupt" trace files would cause crashes in the trace
	printing code. Specifically when base calls are positioned beyond the
	end of the 'sample' data.
Misc
Changes
- 	We now include a full tutorial on using the sequence assembly
	tools, including documentation and data. This may be found in the
	"course" subdirectory, but please also check for newer versions at
	ftp://ftp.mrc-lmb.cam.ac.uk/pub/staden/course/
- 	Vector_clip now has a minimum length of 5' sequence to match when
	using vector_primer files. This allows us to over-specify the vector
	sequence so that we can use the same vector_primer file line for
	multiple primer sites (with the same cut site).
- 	Vector_clip now writes SF and PR records to the experiment files
	(derived from the matches found). If no match is found, no PR record
	is written. If the vector rearrangement search finds there is no SF
	record in an experiment file it writes a CC record to that effect,
	but writes the reading name to the passed file of file names.
- 	Vector_primer file format simplified: the numeric values included
	in case they were needed in the future have been dropped.
- 	Screen_seq will now add tags to the failing sequences.
- 	Extract_seq can now handle multiple files on the command line. It
	also has a -fasta_out option which, when combined with specifying
	multiple files, provides a handy way of converting many Experiment
	files into a single fasta file.
- 	Init_exp now has a "-conf" option to force the confidence values to be
	written to the Experiment Files (even when no edits have been made).
- 	The order of SCF comments written by makeSCF has been sanitised. The
	textual date format has also been fixed (it was two months out,
	although the numerical date output was correct).
- 	Online help now automatically looks for Netscape for viewing HTML
	documents.
Bugs fixed
- 	Gap4, Sip4 and Nip4 work better with small screen sizes (e.g. 800x600)
	although this still isn't perfect. We recommend a screen resolution of
	at least 1024x768.
- 	The pathname expansion functions no longer expand environment
	variables unless a dollar symbol precedes them.
- 	In vector_clip, default 5' positions that are higher than the sequence
	length were causing crashes.
- 	Fixed a bug with the cosmid clipping in extract_seq. It worked fine
	with CS lines, but was incorrect for Experiment files using a CL and
	CR notation.
- 	The sequence format recognition in Nip4 and Sip4 would fail when a
	protein sequence in plain text format started with "SQ".
- 	More robust handling of ABI format data. We make up data with 0 level
	samples when the base data extends beyond the stored sample data.
- 	Improved the sequence format recognising code used within nip4 and
	sip4.
- 	MakeSCF no longer adds duplicate MACH fields to the SCF comments.
- 	Fixed a trace rescaling bug in MakeSCF. This only caused problems when
	using the -normalise command line switch.
- 	Using non-default colours now works better. (Certain dialogue
	components were ignoring the user-defined colours.)
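The dollar-prefix rule for pathname expansion can be illustrated with a short Python sketch. It is not the package's own expansion code: the function name is hypothetical, and leaving unknown variables untouched is an assumption.

```python
import os
import re

_VAR = re.compile(r"\$(\w+)")  # a dollar symbol followed by a variable name


def expand_path(path, env=None):
    """Expand $NAME in a pathname; words without a dollar are left alone."""
    env = os.environ if env is None else env
    # Unknown variables are kept as written (an assumption of this sketch).
    return _VAR.sub(lambda m: env.get(m.group(1), m.group(0)), path)
```

Under this rule a directory that happens to share its name with an environment variable, such as "tables" below, is not rewritten unless it is written as $tables.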
 
