Version-1.4 (aka 2004.0) Release Notes, February 2004

This release marks the first Staden Package release made available since Rodger Staden's group disbanded at MRC (due to funding issues). Since then MRC has released the package under an Open Source licence and the package has migrated to SourceForge.net.

One important change is the focus of development. The main body of changes here has been implemented by myself (James Bonfield) at the Wellcome Trust Sanger Institute, with the exception of the Contig Navigation function (by Mark Griffiths, also of WTSI). It is clear that nearly all changes apply to Gap4 and Prefinish. This is a direct consequence of the group disbanding and my new position at WTSI. If there are areas of the package that you feel are now being neglected (which there certainly are!) and you are in a position to help, please email me using 'jkbonfield at users.sourceforge.net'.

One other important change is the level of support available. Basically no support is guaranteed, and no support should be expected. However PLEASE do submit any bugs you find (even if it's in part of the package no longer being actively worked on) to the SourceForge bug tracking system. It may be found at: https://sourceforge.net/tracker/?group_id=100316&atid=627058

Currently the package has been built for most of the Unix systems we previously supported (possibly all; see the SourceForge download links to confirm). However, MS Windows support is likely to lag behind. This is primarily because I do not yet have the resources available to maintain a set of MS Windows binaries. Again, if you can assist in this area please contact me.

Now, on to the changes themselves. The full list is rather long and is only included below in case you are a glutton for punishment. A summarised version follows.

Gap4 Tag Editing

This has had quite a major overhaul. The most obvious change is that the Edit Tag and Delete Tag functions have now become cascading menus showing all the tags under the current editing cursor position. This solves the problem of how to manipulate tags that are not the top-most displayed ones, but it does mean editing is slower.

Therefore to speed up editing the F11 and F12 keys are bound to edit (F11) and delete (F12) the top-most visible tag. Note that you will need to enable F12 in the Edit Modes menu as by default it is disabled.

Tag Macros are another new feature. Using Shift-F1 to Shift-F10 a tag editor window will appear. From here you can set the tag type, strand and comment and then use "Save Macro" to remember those settings. They are remembered for this session only, so use the editor Settings -> Save Macros to store these to disk. Once defined, a tag macro may be applied by underlining the region you wish to tag and then pressing the appropriate function key, F1 to F10.

By positioning the editor cursor above a tag and using Control-F1 to Control-F10 you can take a copy of the tag underneath the cursor and store it in the appropriate tag macro (without bringing up the macro editor). Combined with F1 to F10 this then provides an equivalent to tag cut-and-paste.

The Tag Editor window itself has also undergone some changes. The existing Save command works just as before, but there are now two new ways to save: Move and Copy. To use these, first underline the new region that you wish to tag. Move then moves the tag to that region while Copy creates a duplicate at that region. The underlined region does not need to be within the same sequence, or even the same contig editor. (Although it does need to be an editor within the same Gap4!)

Contig Navigation

At the editor level right clicking on a sequence name now shows a submenu "goto...". This contains a list of the readings sharing that template. The sub-menu can be torn off if you wish.

Hyperlinks have also been added to the reading list produced in the editor using Settings -> Set Active Readings. Just double click on a name in the list to go to that reading in the contig editor. Similarly, hyperlinks are now on lists loaded in via the main Lists menu.

However by far the most flexible way of navigating through contigs using an external file is the Contig Navigation function (in main gap4 View menu). This needs a filename, which should contain a series of lines containing the reading name, position, and then an arbitrary text comment. Once loaded a new dialogue allows stepping forwards and backwards along the list. This is a replacement for the original contig editor Search By File method.
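As a rough illustration of the input file described above, the following Python sketch parses lines of reading name, position and comment. It assumes the fields are whitespace separated and that the comment is simply the rest of the line; the release notes do not spell out the delimiter rules, and the names NavEntry and parse_nav_lines are made up for this example.

```python
from typing import List, NamedTuple


class NavEntry(NamedTuple):
    reading: str   # reading name
    position: int  # position to jump to
    comment: str   # arbitrary text, may be empty


def parse_nav_lines(lines: List[str]) -> List[NavEntry]:
    """Parse 'name position comment...' lines (assumed whitespace separated)."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split(None, 2)  # at most three fields; comment keeps its spaces
        if len(fields) < 2:
            raise ValueError(f"expected at least a name and a position: {line!r}")
        comment = fields[2] if len(fields) > 2 else ""
        entries.append(NavEntry(fields[0], int(fields[1]), comment))
    return entries


# Hypothetical reading names, for illustration only.
example = [
    "read_a.s1 120 possible SNP",
    "read_b.s1 4031 check low quality region",
]
for entry in parse_nav_lines(example):
    print(entry.reading, entry.position, entry.comment)
```

Stepping forwards and backwards through the resulting list is then just a matter of moving an index up and down.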

Reading selection and Disassemble Readings

The Disassemble Readings function has been completely rewritten (along with Break Contig too). It is now much faster, but the key functional changes are the "Move readings to new contigs" and "Split into single-read contigs" options. Moving readings to new contigs will try to keep the assembly together as far as is possible. So for example it is now possible to select all members of a falsely assembled repeat and to disassemble the repeat to form one new contig (rather than one per reading as in the previous Gap4, and as per the "Split into single-read contigs" option).
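To give a feel for what "keeping the assembly together" means, here is a small Python sketch that groups a set of selected readings into new contigs wherever their positions overlap. It is only an illustration based on start and end coordinates; it is not the Gap4 implementation, and the function name and the input layout are assumptions made for the example.

```python
from typing import Dict, List, Tuple


def group_overlapping(reads: Dict[str, Tuple[int, int]]) -> List[List[str]]:
    """Group readings (name -> (start, end), inclusive) into overlapping sets."""
    groups: List[List[str]] = []
    current_end = None
    # Walk the readings from left to right by start position.
    for name, (start, end) in sorted(reads.items(), key=lambda kv: kv[1][0]):
        if current_end is None or start > current_end:
            groups.append([name])    # no overlap: start a new contig
            current_end = end
        else:
            groups[-1].append(name)  # overlaps: stays in the same new contig
            current_end = max(current_end, end)
    return groups


# Hypothetical selection: two overlapping readings and one isolated reading.
selected = {"r1": (1, 500), "r2": (300, 800), "r3": (2000, 2500)}
print(group_overlapping(selected))
```

With the "Split into single-read contigs" option each selected reading would instead end up in a contig of its own.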

From this it follows that disassembling a reading and all readings to its right is directly equivalent to the old Break Contig function. Indeed Break Contig has now been rewritten to do just this, but it is still available as before.

Accordingly, the production of the lists of readings to pass into Disassemble Readings has also undergone a much needed improvement.

Manual generation of lists of readings from within the editor has been simplified. There is no longer a difference between clicking using the left mouse button and the middle mouse button. Additionally the popup menu (via the right mouse button) on the reading names panel contains options to "Select this reading and all to right" and "Deselect this reading and all to right". (It is recommended that you use these while in the editor "sort by positions" mode so that the consequences will be obvious.)

Together these functions provide an easy way of selecting all readings within designated regions. Furthermore, once selected the list can be manually adjusted in the usual way by clicking on/off the highlight for individual sequences.

Template information in the editor

The Contig Editor name panel has a coloured character (between the reading number and name) to encode the consistency of the template. It is white when consistent, so only inconsistent templates will have a colour associated with them. The template consistency is also shown in the status line of the editor when hovering the mouse over a reading name.

The meanings are:

D / Light Grey: Length of template ("D"istance). The size is computed either from using the position of forward and reverse readings or from the SVEC vector tags on a single reading.
S / Red: Strand. Either two forward (or two reverse) sequences are on opposing strands, or a forward and reverse read pair are on the same strand.
P / Blue: Primer position. If the start or end of the template is computed using multiple sequences (eg via two forward readings) and the start or end is not consistently determined within 100bp then this is flagged.
? / Dark Grey: More than one of the above inconsistencies.
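The table above can be summarised as a small lookup. The Python sketch below is purely illustrative: the three boolean inputs and the function name are assumptions, and only the letters and colours are taken from the list above.

```python
from typing import Optional, Tuple

# Letter and colour for each single inconsistency, as listed above.
CODES = {
    "distance": ("D", "light grey"),
    "strand": ("S", "red"),
    "primer": ("P", "blue"),
}


def template_status(distance_bad: bool, strand_bad: bool,
                    primer_bad: bool) -> Optional[Tuple[str, str]]:
    """Return (letter, colour) for an inconsistent template, else None."""
    flags = {"distance": distance_bad, "strand": strand_bad, "primer": primer_bad}
    problems = [name for name, bad in flags.items() if bad]
    if not problems:
        return None                  # consistent: nothing to flag
    if len(problems) > 1:
        return ("?", "dark grey")    # more than one inconsistency
    return CODES[problems[0]]


print(template_status(False, True, False))
print(template_status(True, False, True))
```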

When making a join in the join editor, using quit/join to make the join will now inform you how many read-pairs span this contig, and how many of those are consistent or inconsistent. (This information is shown before you accept the join.)

- - -

And finally... the bionet/usenet newsgroup (bionet.software.staden) seems very underutilised, and it can also be hard to post there sometimes. As an alternative, the SourceForge forums are available at: https://sourceforge.net/forum/?group_id=100316

If you wish to be kept up to date with new package releases and any announcements, please either subscribe to the staden-package mailing list held at SourceForge here: https://sourceforge.net/mail/?group_id=100316

or create a SourceForge account and add a monitor to the 'staden' package. ~ James Bonfield

Full Change List

Gap4

New features

Added Contig Navigation to the main view menu. This is a GUI to the GapNav (Sanger) tool, but it can equally well take input from any similarly formatted file. (It is the same format that Search by File in the editor uses.)
Added Tag Editor "Move" and "Copy" buttons. To use these underline the new tag location in the contig editor and then hit the appropriate button in the tag editor. This may be used to move tags to different sequences, the consensus, or even different contigs.
Tag Macros have been added to the contig editor. Pressing Shift-F1 to Shift-F10 brings up the macro editor. Then underlining a region and pressing F1 to F10 generates the appropriate tag type. (Your window manager may block some of these keys, so experimentation will be called for.) Tag Macros can also be generated by highlighting a tag and pressing Control-F1 to Control-F10. This copies the tag underneath the editing cursor to the appropriate macro number, and so can be considered as a rudimentary cut and paste.
Fast editing and deletion of tags via the F11 (edit) and F12 (delete) keys. Note that F12 needs to be enabled in the Edit Modes menu first.
The Contig Editor names window now has a "goto..." submenu allowing jumping to other readings from the same template (whether or not they are in the same contig).
Added a "Group by" submenu (of settings) to the Contig Editor. This allows the reading names to be sorted by position, name, template and strand.
Rewrote disassemble readings. It is now much faster and is also more flexible, allowing the marked readings to be moved en masse to new contigs while keeping their overlaps intact. (Break contig is now just a special case of disassemble readings, and has been rewritten accordingly.)
Improved reading selection in the Contig Editor names window. In addition to the usual left-click to highlight a reading name, there is now a right-click menu containing a variety of selection steps, such as all readings on this template and all readings to the right of this point (useful for a more fine-grained break contig via the disassemble readings function). The 'set active list' output has been improved too, so that this list now has "hyperlinks" when viewed.
Rewrote the Options->Set Fonts dialogue.
Template status is shown in the editor name panel and the status line.
The count of consistent and inconsistent contig-spanning templates is reported by the Join Editor before making the join.
The Edit Tag command in the contig editor is now a cascading menu listing all tags underneath the editing cursor. Similarly for Delete Tag.
The X11-based Tk file browser is now MUCH faster (between 80 and 160 fold).
New command "N-base clip" in the main edit menu. This is to work around a bug in phrap where it commonly adds long runs of -s or Ns just to include one extra base that, by chance, agrees with the consensus.
"Find sequence" (main view menu) now removes pads from the query sequence before searching. This function also now allows searching for matches within the reading sequences in addition to within the contig sequence.
The tables/*rc file loading now looks for *rc.local too. This makes upgrading from one release to another easier.
Save consensus now has the option to use the left-most template name instead of left-most reading name as the contig identifier. This is useful primarily for cDNA based projects.
Reading numbers in the contig editor now have an explicit sign (ie "+10" instead of "10").
Added GC_Clamp, self_any, self_end, max_poly_x and max_end_stability options (from Primer3) to the oligo selection dialogue. It now also remembers the user's inputs for other parts of this dialogue.
The List Load function now automatically adds hyperlinks to the input reading names, meaning that it is a very useful way of stepping through a list of sequences to inspect/edit. (However see Contig Navigation for an even better way.)
add_tags and enter_tags API functions now allow for tags to be loaded with unpadded sequence coordinates.
Improved output to the database .log file. It now contains user name and hostnames.
The contig identifier component of dialogues now automatically has focus and selection, allowing for quicker overtyping of the contig name.
Right-clicking on a tag in the contig selector now allows for "edit contig at this tag".
Remember the editor Search window values (within a single session) to speed up searching for the same thing over multiple contigs.
Check Database now checks that all sequences contain printable characters. This slows it down by approximately 25%.
The RAWDATA search path now allows for traces to be fetched directly from the ensembl trace repository or via any specified URL (via the wget utility).
The List Contigs window now has a save button to save the contig order (useful after sorting by column headings).
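The "Find sequence" entry above mentions removing pads from the query before searching. A minimal Python sketch of that idea follows. It assumes that pads are written as '*' and that matching is a plain case-insensitive exact match; both are assumptions for the sake of the example rather than a description of the Gap4 search code.

```python
from typing import List

PAD = "*"  # assumed pad character


def depad(seq: str) -> str:
    """Remove pad characters from a sequence."""
    return seq.replace(PAD, "")


def find_all(query: str, target: str) -> List[int]:
    """Return 1-based start positions of the depadded query within target."""
    query = depad(query).upper()
    target = target.upper()
    if not query:
        return []
    hits = []
    pos = target.find(query)
    while pos != -1:
        hits.append(pos + 1)               # report 1-based coordinates
        pos = target.find(query, pos + 1)  # allow overlapping matches
    return hits


# A query pasted from a padded display still matches once depadded.
print(find_all("AC*GT", "ttacgtacgt"))
```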

Bug fixes

The consensus algorithm was treating bases in columns with pads a little differently than bases in columns without pads. This has now been sanitised to be consistent, although it may yield confidence values 1 or 2 lower than before in some cases.
Improved error reporting from disassemble readings and break contig.
Improved error reporting of tag input.
Various template calculations (consistency, coordinates) have been bug-fixed. This is primarily a fix for prefinish, but the bug was in gap4 and so this improves other parts of gap4 too. Also the orientation for inconsistent templates is now set to the most likely orientation rather than "?".
The trace display should now load faster due to removal of the file format checking (which caused every trace to be loaded twice).
Find Internal Joins was missing some matches. Additionally there are tighter constraints to stop it using up lots of memory and/or CPU time. It also depads sequences prior to searching for alignments, which helps with very deep alignments.
Fixed a problem causing database files to gradually grow unnecessarily.
Sped up the output for extract_readings when saving in directed assembly format.
Restriction enzyme map: corrected the textual output to count in unpadded base coordinates (was padded).
Removed a buffer overflow in the tag-drawing code of the contig selector.
Fixed generation of the mutation report when the assembly contained sequences without traces or with missing traces.
Tag searches (and maybe others) in the editor could miss hits when Group Readings by Templates was enabled.
The "Select All" and "Clear All" buttons in the tag selector windows are back. (They were removed by accident.)
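The restriction enzyme map fix above concerns the difference between padded and unpadded coordinates. As a small illustration, the Python sketch below converts a padded position into an unpadded one by subtracting the pads seen so far. It assumes '*' is the pad character and 1-based coordinates; the function name is hypothetical.

```python
PAD = "*"  # assumed pad character


def padded_to_unpadded(consensus: str, padded_pos: int) -> int:
    """Convert a 1-based padded position to a 1-based unpadded position.

    If padded_pos itself falls on a pad, the result is the unpadded position
    of the nearest real base to its left (0 if there is none).
    """
    if not 1 <= padded_pos <= len(consensus):
        raise ValueError("position outside the consensus")
    return padded_pos - consensus[:padded_pos].count(PAD)


consensus = "AC*GT*A"
for pos in (2, 4, 7):
    print(pos, "->", padded_to_unpadded(consensus, pos))
```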

Prefinish

Changes/New features

Apply a cost increase to experiments that do not connect to at least one end of the problem region. The reason is that a problem from 1-1000 with reading length 600 can be solved in 2 experiments, but choosing the first experiment (due to minor reasons like a better primer) from 200-800 requires 2 extra experiments to solve it.
Added a two-pass method to generate_experiments. Pick the most appropriate end for single-stranded experiments (as before), but if that fails we now work outwards from the other end too. This helps to pick experiments when our contig is covered entirely by a single-stranded region.
New option -skip_fake_templates to reject templates that do not contain at least one reading referring to a trace file. This is designed to be an easy way of filtering out assembled consensus sequences.
Added a notion of a result type having a desired number of solutions to use. This is distinct from the number of items in a group. Eg we may want to pick 2 primers, and use 3 templates for each. The main purpose for this though is for picking more than 1 resequence experiment for each problem. Added -reseq_nsolutions, -long_nsolutions and -pwalk_nsolutions as configuration parameters.
Improved the score of experiments that have large groups. Eg more templates for a primer-walk are better than one.
Added the filter_words algorithm into prefinish so that poly-A, GT-rich, etc can be filtered on. These are now also defined classification types and we check for different characters other than '#' in the finish_walk algorithms too.
Added a bonus score for experiments that fix all mandatory problems within a +/- 100 base pair region. This gives an encouragement to not leave tiny problems which require another experiment.
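The reasoning behind the cost increase in the first entry above can be checked with some simple arithmetic. The Python sketch below counts how many experiments are needed in total once the first experiment has been placed. It is a simplification: it assumes every reading covers exactly read_len bases and ignores strands, primers and any need for overlap between readings. The function names are made up for the example.

```python
import math
from typing import Tuple


def experiments_needed(start: int, end: int, read_len: int) -> int:
    """Minimum readings of read_len bases needed to cover start..end inclusive."""
    length = end - start + 1
    if length <= 0:
        return 0
    return math.ceil(length / read_len)


def total_with_first(problem: Tuple[int, int], first: Tuple[int, int],
                     read_len: int) -> int:
    """Total experiments when the first one covers the range 'first'."""
    p_start, p_end = problem
    f_start, f_end = first
    left = experiments_needed(p_start, f_start - 1, read_len)   # gap to the left
    right = experiments_needed(f_end + 1, p_end, read_len)      # gap to the right
    return 1 + left + right


problem = (1, 1000)
print(total_with_first(problem, (1, 600), 600))    # first experiment anchored at an end
print(total_with_first(problem, (200, 800), 600))  # first experiment in the middle
```

Anchoring the first experiment at one end leaves a single gap to fill, whereas placing it in the middle leaves a gap on both sides.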

Bug fixes

Template start and end coordinates for inconsistent templates are now computed via the min/max observed locations rather than the computed ranges, as the computation is often wrong with inconsistent data.
Fixed a bug in the use of strand vs sense. We now check sense for resequencing experiments as it doesn't matter if it's fwd or reverse sequence, just that it heads in the right direction.
Speed up by moving the dust filtering to before primer picking instead of after. This removes the generation of lots of false primers.
Added code to correctly score sequences that extend the contig (if extending is required) even though the problem being solved may not be the contig-extend problem. In this case it also correctly clears the CONTIG_LEFT_END and CONTIG_RIGHT_END classification flags.
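The first fix in this list, using observed locations for inconsistent templates, amounts to taking the extremes of where the readings actually lie. A minimal Python sketch, assuming each reading is given as a (start, end) pair of contig positions in either order (the function name and input layout are assumptions):

```python
from typing import List, Tuple


def observed_template_range(readings: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Return (min, max) contig positions covered by a template's readings."""
    if not readings:
        raise ValueError("template has no readings")
    lowest = min(min(start, end) for start, end in readings)
    highest = max(max(start, end) for start, end in readings)
    return lowest, highest


# Hypothetical template with a forward and a reverse reading.
print(observed_template_range([(100, 650), (2400, 1850)]))
```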

Mutscan

Changes/New features

Filters clusters of tags as these are invariably due to alignment failures.

Bug fixes

Correct for divide by zero and log(0) in various places.

Io_lib

Added LG (Ligation - a combination of LI and LE) to the experiment file format.
Added a -fofn option to extract_seq.
Better error checking for writing compressed files.
Protect against the base spacing being listed as a negative number in the ABI file.
Added support for reading phred-style confidence values from ABI files.
io_lib can now fetch traces directly via a URL or from the ensembl trace repository using either URL=%s or ARC=%s:%d (host:port) syntax.

Misc

New program "stops". It searches for likely 'stop' regions within trace files.