Pregap4 - Pregap4-Database-Simple

Simple Text Database

This option allows interrogation of a very simple format text database with one line per sequence. The sequence identifier is the first word of a line with one or more additional columns of information relating to specific information about that sequence. All columns in the database file must have the same format and only one database file may be used at any one time.

For example, we may wish to store the primer type, primer site, template name and the number of strands on the template for each sequence. This corresponds to the PR, SP, TN and ST Experiment File line types. We could then create a text database looking something like the following:

# ID            PR      SP      TN        ST
xb54a3.s1       1        41     xb54a3    1
xb54b12.s1      1        41     xb54b12   2
xb54b12.r1      2       -24     xb54b12   2
xb54b12.r1L     2       -24     xb54b12   2

(The first line, starting with # is just a comment. Pregap4 does not use this; it is purely so that we know which information is in which column.)

We can then direct pregap4 to extract the information from each of these four columns for each reading being processed and to store this information in the Experiment File. This information can then be utilised by the vector clipping and assembly modules.

[picture]

The Simple Text Database interface consists of an entry box to specify the database file name, add and delete buttons, and a line type selector for each column in the database (excluding the reading name column). The above picture contains the database setup for extracting the primer type, primer position, template name and number of strands as described in the above example.

The "Add column" button adds a new line type selector at the bottom of the window. This contains an option menu which can be clicked to choose a new Experiment File line type and a label indicating the column number. The "Delete column" button removes the bottom-most line type selector.

The "Ok" button will accept this configuration and will also write the details to the current pregap4 configuration file. To disable a previously setup Simple Database Configuration press delete until there are no line types listed and then press Ok once more.

The simple text database does not need to include a record for every reading and special characters can be used to encode names so that readings produced in similar ways can be grouped. For example, if the first 6 letters of the name encode a "plate" name, and all the sequences on that plate have been sequenced using the same vector then we could create a database file as follows.

# ID            SF              SC      SP
6abz91*         m13mp18.seq     6249    41/-24
6aca68*         puc18.seq       248     40/-28
6aca69*         puc18.seq       248     40/-28
6aca70*         puc18.seq       248     40/-28
6acb21*         m13mp18.seq     6249    41/-24
6acd49*         puc18.seq       248     40/-28
6acd51*         puc18.seq       248     40/-28

The sequence identifier (ID) is searched for using a pattern matching rule (as dictated by the Tcl string match command). The pattern matching uses special characters as follows:

*: Matches any sequence of characters in the reading identifier, including an empty string.
?: Matches any single character in the reading identifier.
[chars]: Matches any character in the set given by chars. If a sequence of the form r-v appears in chars, then any character between r and v, inclusive, will match (rstuv).
\x: Matches the single character x. This provides a way of avoiding the special interpretation of the characters *?[]\ in the reading identifier.

This page is maintained by staden-package. Last generated on 22 October 2002.
URL: http://www.mrc-lmb.cam.ac.uk/pubseq/manual/pregap4_unix_55.html