Note that the assembly program gap4 (see section Gap4 introduction)
cannot operate to its full effect unless it is given all the necessary
data. For example, gap4 contains many functions that analyse the
positions and relative orientations of readings from the same template
in order to check the correctness of the assembly and determine the
contig order. However, if the records that name templates, give their
estimated lengths, and define the primers used to obtain readings from
them are missing, none of these analyses can be performed reliably. One
way to ensure that all the necessary fields are present is to use the
program pregap4 (see section Pregap4 introduction).
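As a rough illustration of the point above, the following Python sketch reports which template-related records are absent from an experiment file. It assumes that TN, SI and PR (described in the list in this section) are the records that carry the template name, its estimated length and the primer type, and that each line begins with its two-letter record type. The function name and the sample data are hypothetical.

```python
# Assumption: these are the records that name the template, give its
# estimated (insert) length and define the primer type.
TEMPLATE_RECORDS = {
    "TN": "Template Name",
    "SI": "Sequencing vector Insertion length",
    "PR": "PRimer type",
}


def missing_template_records(text):
    """Return the template-related record types not found in the text."""
    # Assumption: the first two characters of each non-blank line are
    # the record type.
    present = {line[:2] for line in text.splitlines() if line.strip()}
    return sorted(code for code in TEMPLATE_RECORDS if code not in present)


# Made-up example containing a template name but no SI or PR record.
example = "ID   read1\nTN   template7\n"
print(missing_template_records(example))  # ['PR', 'SI']
```

A check of this kind is only a convenience; pregap4 remains the route the manual recommends for producing complete files.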
In the descriptions below, records marked * are those read into the
database during normal assembly; those marked ** are extra items
required when entering pre-assembled data; those marked *** are read
from SCF files (see section SCF introduction) after the experiment file
has been read to obtain the SCF file name; and the record marked **** is
an extra item required for Directed Assembly.
The order of records in the file is not important. They are listed
here in alphabetical order with, where possible, an indication of the
origin of their names. Several are redundant, and no group is likely
to make use of them all. Further records can be added in the future:
initially they might be of local use only, but if their use becomes
wider they can be added to the standard set. Standard EMBL records such
as FT are assumed to be included.
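Because record order does not matter, a reader of the file can simply group values by record type. The sketch below shows one way to do this in Python. It assumes EMBL-style lines in which the first two characters are the record type and the rest of the line is the value; it makes no attempt to handle records whose value spans several lines. The function name and the sample values are hypothetical.

```python
from collections import defaultdict


def parse_records(text):
    """Group record values by their two-letter type, ignoring record order."""
    records = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        # Assumption: the first two characters are the record type and the
        # remainder of the line, stripped of whitespace, is its value.
        code, value = line[:2], line[2:].strip()
        records[code].append(value)
    return dict(records)


# Two made-up files holding the same records in a different order.
first = "ID   read1\nTN   template7\nPR   1\n"
second = "PR   1\nID   read1\nTN   template7\n"
print(parse_records(first) == parse_records(second))  # True
```

Keeping a list of values per type allows for record types that occur more than once in a file.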
- AC: ACcession number
- AP: Assembly Position ****
- AQ: Average Quality for bases 100..200
- AV: Accuracy values for externally assembled data **, ***
- BC: Base Calling software
- CC: Comment line
- CF: Cloning vector sequence File
- CH: Special CHemistry
- CL: Cloning vector Left end
- CN: Clone Name
- CR: Cloning vector Right end
- CS: Cloning vector Sequence present in sequence *
- CV: Cloning Vector type
- DR: Direction of Read
- DT: DaTe of experiment
- EN: Entry Name
- EX: EXperimental notes
- FM: sequencing vector Fragmentation Method
- ID: IDentifier *
- LE: was Library Entry, but now identifies a well in a microtitre dish
- LI: was subclone LIbrary, but now identifies a microtitre dish
- LN: Local format trace file Name *
- LT: Local format trace file Type *
- MC: MaChine on which experiment ran
- MN: Machine generated trace file Name
- MT: Machine generated trace file Type
- ON: Original base Numbers (positions) **
- OP: OPerator
- PC: Position in Contig **
- PD: Primer Data (the sequence of a primer)
- PN: Primer Name
- PR: PRimer type *
- PS: Processing Status
- QL: poor Quality sequence present at Left (5') end *
- QR: poor Quality sequence present at Right (3') end *
- RS: Reference Sequence for numbering and mutation detection
- SC: Sequencing vector Cloning site
- SE: SEnse (i.e. whether complemented) **
- SF: Sequencing vector sequence File
- SI: Sequencing vector Insertion length *
- SL: Sequencing vector sequence present at Left (5') end *
- SP: Sequencing vector Primer site (relative to cloning site)
- SQ: SeQuence *
- SR: Sequencing vector sequence present at Right (3') end *
- SS: Screening Sequence
- ST: STrands *
- SV: Sequencing Vector type *
- TG: Gel reading Tag *
- TC: Contig Tag *
- TN: Template Name *
- WT: Wild Type trace
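The asterisk markers in the list above can be captured in a small lookup table, which may be convenient when checking files programmatically. The Python sketch below transcribes the markers from the list; the names MARKED_RECORDS and markers_for are hypothetical and not part of any Staden package tool.

```python
# Record types grouped by the asterisk markers used in the list above.
MARKED_RECORDS = {
    # * read into the database during normal assembly
    "*": {"CS", "ID", "LN", "LT", "PR", "QL", "QR", "SI",
          "SL", "SQ", "SR", "ST", "SV", "TC", "TG", "TN"},
    # ** extra items required when entering pre-assembled data
    "**": {"AV", "ON", "PC", "SE"},
    # *** read from SCF files
    "***": {"AV"},
    # **** extra item required for Directed Assembly
    "****": {"AP"},
}


def markers_for(code):
    """Return the markers attached to a record type (empty if unmarked)."""
    return [marker for marker, codes in MARKED_RECORDS.items() if code in codes]


print(markers_for("AV"))  # ['**', '***']
print(markers_for("AC"))  # []
```

Record types that return an empty list, such as AC, carry no marker in the list.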
This page is maintained by staden-package.
Last generated on 22 October 2002.
URL: http://www.mrc-lmb.cam.ac.uk/pubseq/manual/formats_unix_19.html