File Formats - Exp-Records

Records

The assembly program gap4 (see section Gap4 introduction) cannot operate to full effect unless it is given all the necessary data. For example, gap4 contains many functions that analyse the positions and relative orientations of readings from the same template, both to check the correctness of the assembly and to determine the contig order. However, if the records that name templates, give their estimated lengths, and define the primers used to obtain readings from them are missing, none of these analyses can be performed reliably. One way to ensure that all the necessary fields are present is to use the program pregap4 (see section Pregap4 introduction).

In the descriptions below, records marked * are those read into the database during normal assembly; those marked ** are extra items required when entering pre-assembled data; those marked *** are read from SCF files (see section SCF introduction) after the experiment file has been read to obtain the SCF file name; and the record marked **** is an extra item required for Directed Assembly.

The order of records in the file is not important. They are listed here in alphabetical order with, where possible, an indication of the origin of their names. Several are redundant and no group is likely to make use of them all. Others can be added in the future: initially they might be of local use only, but if their use becomes wider they can be added to the standard set. Standard EMBL records such as FT are assumed to be included.
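Because record order carries no meaning, a reader of these files can simply group lines by record type. The following is a minimal sketch in Python, not part of the package. It assumes an EMBL-like layout in which each record line begins with a two-letter upper-case record type followed by white space and the value; that layout, the function name parse_records and the sample values are assumptions made for illustration. Lines that do not fit the assumed layout (blank lines, continuation lines) are skipped.

```python
from collections import defaultdict


def parse_records(text):
    """Group record values by two-letter record type.

    Assumed layout (not taken from this page): a record line starts with
    two upper-case letters, then white space, then the value.
    """
    records = defaultdict(list)
    for line in text.splitlines():
        code, rest = line[:2], line[2:]
        is_record_type = len(code) == 2 and code.isalpha() and code.isupper()
        if is_record_type and (not rest or rest[0].isspace()):
            records[code].append(rest.strip())
        # Anything else is ignored in this sketch.
    return dict(records)


# Hypothetical sample data: the same three records in two different orders.
sample_a = "ID   read1\nTN   tmpl1\nPR   1\n"
sample_b = "PR   1\nTN   tmpl1\nID   read1\n"

# Both orderings give the same mapping from record type to values.
same_result = parse_records(sample_a) == parse_records(sample_b)
```

A record type that occurs several times, such as CC, keeps all of its values in file order within its own list.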

AC: ACcession number
AP: Assembly Position ****
AQ: Average Quality for bases 100..200
AV: Accuracy values for externally assembled data **, ***
BC: Base Calling software
CC: Comment line
CF: Cloning vector sequence File
CH: Special CHemistry
CL: Cloning vector Left end
CN: Clone Name
CR: Cloning vector Right end
CS: Cloning vector Sequence present in sequence *
CV: Cloning Vector type
DR: Direction of Read
DT: DaTe of experiment
EN: Entry Name
EX: EXperimental notes
FM: sequencing vector Fragmentation Method
ID: IDentifier *
LE: was Library Entry, but now identifies a well in a micro titre dish
LI: was subclone LIbrary but now identifies a micro titre dish
LN: Local format trace file Name *
LT: Local format trace file Type *
MC: MaChine on which experiment ran
MN: Machine generated trace file Name
MT: Machine generated trace file Type
ON: Original base Numbers (positions) **
OP: OPerator
PC: Position in Contig **
PD: Primer data (the sequence of a primer)
PN: Primer Name
PR: PRimer type *
PS: Processing Status
QL: poor Quality sequence present at Left (5') end *
QR: poor Quality sequence present at Right (3') end *
RS: Reference Sequence for numbering and mutation detection
SC: Sequencing vector Cloning site
SE: SEnse (ie whether complemented) **
SF: Sequencing vector sequence File
SI: Sequencing vector Insertion length *
SL: Sequencing vector sequence present at Left (5') end *
SP: Sequencing vector Primer site (relative to cloning site)
SQ: SeQuence *
SR: Sequencing vector sequence present at Right (3') end *
SS: Screening Sequence
ST: STrands *
SV: Sequencing Vector type *
TC: Contig Tag *
TG: Gel reading Tag *
TN: Template Name *
WT: Wild type trace
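
The markers in the list above can be turned into a simple completeness check. The following Python sketch is illustrative only: the set contents are transcribed from the list, but the function name missing_records and the mode names are hypothetical, and a record reported as absent is not necessarily an error, since the list states only which records are read, not that each must always be present.

```python
# Records marked * : read into the database during normal assembly.
READ_IN_NORMAL_ASSEMBLY = frozenset({
    "CS", "ID", "LN", "LT", "PR", "QL", "QR", "SI",
    "SL", "SQ", "SR", "ST", "SV", "TC", "TG", "TN",
})
# Records marked ** : extra items for pre-assembled data.
EXTRA_FOR_PREASSEMBLED = frozenset({"AV", "ON", "PC", "SE"})
# Record marked **** : extra item for Directed Assembly.
EXTRA_FOR_DIRECTED = frozenset({"AP"})


def missing_records(present, mode="normal"):
    """Return, sorted, the marked record types absent from `present`.

    `mode` is a hypothetical switch: "normal", "preassembled" or "directed".
    """
    wanted = set(READ_IN_NORMAL_ASSEMBLY)
    if mode == "preassembled":
        wanted |= EXTRA_FOR_PREASSEMBLED
    elif mode == "directed":
        wanted |= EXTRA_FOR_DIRECTED
    elif mode != "normal":
        raise ValueError(f"unknown mode: {mode!r}")
    return sorted(wanted - set(present))


# Example: a file that lacks the template name and primer type records.
present = READ_IN_NORMAL_ASSEMBLY - {"TN", "PR"}
absent = missing_records(present)
```

In this example absent holds PR and TN, which are among the records whose absence prevents the template-based checks described at the top of this section.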

This page is maintained by staden-package. Last generated on 22 October 2002.
URL: http://www.mrc-lmb.cam.ac.uk/pubseq/manual/formats_unix_19.html