
Explanation of Records

Record
AC, ACcession line
Format
AC string
Explanation
A unique identifier for the reading.

Record
AP, Assembly Position
Format
AP Name_of_anchor_reading sense offset tolerance
Explanation
For readings whose position has been mapped by an external program, these records tell the "directed assembly" algorithm where to assemble the data. Each position is given relative to an "anchor reading" (the name of any reading already in the database) as an orientation (sense, + or -), an offset and a tolerance. The reading is aligned at the relative position offset, plus or minus tolerance.
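As an illustration only, the following Python sketch reads an AP line and works out the window of left-end positions it allows. It assumes that the four fields are separated by whitespace and that offset is measured from the left end of the anchor reading in contig coordinates; parse_ap, allowed_window and the anchor position passed in are hypothetical names, not part of gap4.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class AssemblyPosition:
    anchor: str      # name of a reading already in the database
    sense: str       # "+" or "-"
    offset: int      # offset relative to the anchor reading (assumed: from its left end)
    tolerance: int   # allowed deviation from the offset


def parse_ap(line: str) -> AssemblyPosition:
    """Parse a line such as 'AP   read_17 + 250 10' (whitespace-separated fields assumed)."""
    fields = line.split()
    if len(fields) != 5 or fields[0] != "AP":
        raise ValueError(f"not an AP record: {line!r}")
    _, anchor, sense, offset, tolerance = fields
    if sense not in ("+", "-"):
        raise ValueError(f"sense must be + or -, got {sense!r}")
    return AssemblyPosition(anchor, sense, int(offset), int(tolerance))


def allowed_window(ap: AssemblyPosition, anchor_start: int) -> Tuple[int, int]:
    """Inclusive range of left-end positions for the new reading, given the
    (hypothetical) contig position of the anchor reading's left end."""
    centre = anchor_start + ap.offset
    return centre - ap.tolerance, centre + ap.tolerance
```

For example, parse_ap("AP   read_17 + 250 10") with the anchor starting at contig position 1000 gives the window (1240, 1260).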

Record
AQ, Average Quality of the reading.
Format
AQ Numeric value in range 1 - 99.
Explanation
The average value of the "numerical estimate of base calling accuracy" as calculated by program eba. The value is useful for monitoring data quality and could also be used to decide the order of assembly, for example by assembling the highest quality readings first.
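A minimal sketch of the ordering idea above, assuming the AQ values have already been read into a dictionary keyed by reading name (assembly_order is a hypothetical helper, not a Staden function). Ties are broken by name so that the order is reproducible.

```python
from typing import Dict, List


def assembly_order(aq_by_reading: Dict[str, int]) -> List[str]:
    """Return reading names sorted by AQ value, highest first; ties sorted by name."""
    return sorted(aq_by_reading, key=lambda name: (-aq_by_reading[name], name))
```

With {"r1": 40, "r2": 85, "r3": 85} this returns ["r2", "r3", "r1"].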

Record
AV, Accuracy Values
Format
AV q1 q2 q3 ... or a1,c1,g1,t1 a2,c2,g2,t2 ...
Explanation
The accuracy values lie in the range 1-99. There is either one value per base (eg 89 50 ...) or four values per base, one each for A, C, G and T (eg 0,89,5,2 50,3,7,10). See Bonfield, J.K. and Staden, R. The application of numerical estimates of base calling accuracy to DNA sequencing projects. Nucleic Acids Res. 23, 1406-1410 (1995).
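The two AV layouts can be told apart by the presence of commas. The following Python sketch, which is illustrative rather than taken from any Staden program, assumes the text after "AV" has already been split off and that a single line uses only one layout; parse_av is a hypothetical name.

```python
from typing import List, Tuple, Union

Quad = Tuple[int, int, int, int]  # accuracy values for A, C, G, T at one base


def parse_av(values: str) -> Union[List[int], List[Quad]]:
    """Parse AV values: '89 50' -> one value per base,
    '0,89,5,2 50,3,7,10' -> four values per base in A,C,G,T order."""
    tokens = values.split()
    if all("," not in t for t in tokens):
        return [int(t) for t in tokens]
    quads: List[Quad] = []
    for t in tokens:
        parts = [int(p) for p in t.split(",")]
        if len(parts) != 4:  # also rejects a mix of the two layouts
            raise ValueError(f"expected a,c,g,t but got {t!r}")
        quads.append((parts[0], parts[1], parts[2], parts[3]))
    return quads
```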

Record
BC, Base Calling software

Record
CC, Comment line
Format
CC string
Explanation
Any comments can be added on any number of lines.

Record
CF, Cloning vector sequence File
Format
CF string
Explanation
The name of the file containing the sequence of the cloning vector, to be used by vector_clip (see section Screening Against Vector Sequences).

Record
CH, Special CHemistry
Format
CH number
Explanation
Used to flag readings as having been sequenced using a "special chemistry". The number is a bit pattern with a bit for each chemistry type, thus allowing combinations of chemistries to be listed. Currently bit 0 is used to distinguish between dye-primer (0) and dye-terminator (1) chemistries. Bits 1 to 4 inclusive indicate the type of chemistry: unknown (0, 0000), ABI Rhodamine (1, 0001), ABI dRhodamine (2, 0010), BigDye (3, 0011), Energy Transfer (4, 0100) and LiCor (5, 0101). So, for example, a BigDye terminator has the bit pattern 00111, which is 7 in decimal.
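The arithmetic behind the BigDye terminator example (type 3 shifted into bits 1-4, plus 1 in bit 0) can be written out as a short Python sketch. encode_ch, decode_ch and the CHEMISTRY_TYPES table are illustrative names built from the list above, not part of any Staden library.

```python
from typing import Tuple

# Values of bits 1-4, as listed above
CHEMISTRY_TYPES = {
    0: "unknown",
    1: "ABI Rhodamine",
    2: "ABI dRhodamine",
    3: "BigDye",
    4: "Energy Transfer",
    5: "LiCor",
}


def encode_ch(terminator: bool, chem_type: int) -> int:
    """Bit 0 = dye-terminator flag, bits 1-4 = chemistry type."""
    if not 0 <= chem_type <= 0b1111:
        raise ValueError("chemistry type must fit in bits 1-4")
    return (chem_type << 1) | int(terminator)


def decode_ch(value: int) -> Tuple[str, str]:
    """Split a CH value back into (primer/terminator, chemistry name)."""
    kind = "dye-terminator" if value & 1 else "dye-primer"
    chem_type = (value >> 1) & 0b1111
    return kind, CHEMISTRY_TYPES.get(chem_type, f"type {chem_type}")
```

encode_ch(True, 3) gives 7, and decode_ch(7) gives ("dye-terminator", "BigDye").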

Record
CL, Cloning vector Left end
Format
CL number
Explanation
The base position in the sequence that contains the last base in the cloning vector. Currently gap4 only uses the CS line.

Record
CN, Clone Name
Format
CN string
Explanation
The name of the segment of DNA that the reading has been derived from. Typically the name of a physical map clone.

Record
CR, Cloning vector Right end
Format
CR number
Explanation
The base position in the sequence that contains the first base in the cloning vector. Currently gap4 only uses the CS line.

Record
CS, Cloning vector Sequence present in sequence
Format
CS range
Explanation
Regions of sequence found by vector_clip (see section Screening Against Vector Sequences) to be cloning vector. Used in assembly to exclude unwanted sequence.

Record
CV, Cloning Vector type
Format
CV string
Explanation
The type of the cloning vector used.

Record
DR, Direction of Read
Format
DR direction
Explanation
Whether a forward or reverse primer was used. This allows forward and reverse reads off the same template to be matched. NOTE, however, that we do not encourage the use of this method, as the terms direction, sense and strand can be confusing. Instead we encourage the use of the PRimer type (PR) line.

Record
DT, DaTe of experiment
Format
DT dd-mon-yyyy
Explanation
Any date information.

Record
EN, Entry Name
Format
EN string
Explanation
The name given to the reading.

Record
EX, EXperimental notes
Format
EX string
Explanation
Another type of comment line for additional information.

Record
FM, sequencing vector Fragmentation Method
Format
FM string
Explanation
The fragmentation method used to create the sequencing library.

Record
ID, IDentifier
Format
ID string
Explanation
This is the name given to the reading inside the assembly database and is equivalent to the ID line of an EMBL entry.

Record
LE, Can be used to identify the location of materials
Format
LE string
Explanation
Originally a micro titre dish well number. Used in combination with LI.

Record
LI, Can be used to identify the location of materials
Format
LI string
Explanation
Originally a micro titre dish identifier. Used in combination with LE.

Record
LN, Local format trace file Name
Format
LN string
Explanation
The name of the local format trace file. This information is passed on to gap4, and allows local formats to be used.

Record
LT, Local format trace file Type
Format
LT string
Explanation
The type of the local format trace file (usually SCF).

Record
MC, MaChine on which sequencing experiment was run
Format
MC string
Explanation
The lab's name for the sequencing machine used to create the data. Used for logging the performance of individual machines.

Record
MN, Machine generated trace file Name
Format
MN string
Explanation
The name of the trace file generated by the sequencing machine MC.

Record
MT, Machine generated trace file Type
Format
MT string
Explanation
The type of machine generated trace file.

Record
ON, Original base Numbers (positions)
Format
ON (eg) 1..43 0 45..63 65..74 0 75..536
Explanation
The A..B notation means the values A to B inclusive, so this example reads that bases 1 to 43 are unchanged, there is a change at 44, and so on.
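The ON notation can be expanded into one original base number per current base. This Python sketch assumes, as the example suggests, that a lone 0 stands for a base with no corresponding original base; expand_on and changed_positions are hypothetical helper names.

```python
from typing import List


def expand_on(spec: str) -> List[int]:
    """Expand e.g. '1..3 0 5..6' into [1, 2, 3, 0, 5, 6].

    'A..B' becomes A, A+1, ..., B (inclusive); a lone number is kept as-is.
    """
    positions: List[int] = []
    for token in spec.split():
        if ".." in token:
            start, end = (int(x) for x in token.split(".."))
            positions.extend(range(start, end + 1))
        else:
            positions.append(int(token))
    return positions


def changed_positions(spec: str) -> List[int]:
    """Current 1-based positions whose original base number is 0."""
    return [i for i, orig in enumerate(expand_on(spec), start=1) if orig == 0]
```

For the example line above, expand_on yields 536 entries and changed_positions returns [44, 74].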

Record
OP, OPerator
Format
OP string
Explanation
Someone's name, possibly the person who ran the sequencing machine. Useful, with an expanded string field, for monitoring the performance of individuals!

Record
PC, Position in Contig
Format
PC number
Explanation
For preassembled data, the position to put the left end of the reading.

Record
PD, Primer Data
Format
PD sequence
Explanation
The primer sequence.

Record
PN, Primer Name
Format
PN string
Explanation
Name of primer used, using local naming convention. Could be a universal primer.

Record
PR, PRimer type
Format
PR number
Explanation
This record shows the direction of the reading and distinguishes between primers from the ends of the insert and those that are internal. It is important for the analysis of the relative orientations and positions of readings on templates. When the positions of readings on templates are analysed (see section Find read pairs) primer types 1,2,3 and 4 are represented using the symbols F,R,f and r respectively.
0: Unknown
1: Forward from beginning of insert
2: Reverse from end of insert
3: Custom forward, i.e. a forward primer other than type 1
4: Custom reverse, i.e. a reverse primer other than type 2
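The primer types and their display symbols can be captured in a small lookup table. In this Python sketch the "?" returned for type 0 or unlisted values is an assumption of the example, not a documented symbol, and primer_symbol and is_forward are hypothetical names.

```python
PRIMER_SYMBOLS = {1: "F", 2: "R", 3: "f", 4: "r"}  # from the PR type list


def primer_symbol(pr: int) -> str:
    """Symbol for a PR value; '?' (an assumption here) for unknown or unlisted types."""
    return PRIMER_SYMBOLS.get(pr, "?")


def is_forward(pr: int) -> bool:
    """Types 1 (insert end) and 3 (custom) are forward primers."""
    return pr in (1, 3)
```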

Record
PS, Processing Status
Format
PS explanation
Explanation
Indication of processing status.

Record
QL, poor Quality sequence present at Left (5') end
Format
QL position
Explanation
The sequence up to and including the base at the marked position is considered to be of too poor quality to be used. It may overlap with other marked sequences - CS, SL or SR. Used in assembly to exclude unwanted sequence.

Record
QR, poor Quality sequence present at Right (3') end
Format
QR position
Explanation
The sequence from and including the base at the marked position to the end is considered to be of too poor quality to be used. It may overlap with other marked sequences - CS, SL or SR. Used in assembly to exclude unwanted sequence.

Record
RS, Reference Sequence
Format
RS string
Explanation
The name of a sequence, usually in EMBL format, used to define the target sequence, base numbering and feature table data for a project. Used to define the numbering and changes produced by mutations in individual sequence readings (see section Introduction to mutation detection).

Record
SC, Sequencing vector Cloning site
Format
SC position
Explanation
The cloning site of the sequencing vector. Used by vector_clip (see section Screening Against Vector Sequences).

Record
SE, SEnse (ie whether complemented)
Format
SE number
Explanation
For preassembled data, the sense of the reading (0 for forward, 1 for reverse).
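One way to use SE is to bring a preassembled reading into the orientation of the contig. The Python sketch below assumes that SE 1 means the reading must be reverse complemented and that only A, C, G, T, N and "-" occur in the sequence; oriented_sequence is a hypothetical name.

```python
COMPLEMENT = str.maketrans("ACGTacgtNn-", "TGCAtgcaNn-")


def oriented_sequence(seq: str, se: int) -> str:
    """Return seq unchanged for SE 0, reverse complemented for SE 1."""
    if se == 0:
        return seq
    if se == 1:
        return seq.translate(COMPLEMENT)[::-1]
    raise ValueError("SE must be 0 or 1")
```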

Record
SF, Sequencing vector sequence File
Format
SF string
Explanation
The name of the file containing the sequence of the sequencing vector, to be used by vector_clip (see section Screening Against Vector Sequences).

Record
SI, Sequencing vector Insertion length
Format
SI range
Explanation
Expected insertion length of sequence in sequencing vector. Useful for selecting templates for further experiments.

Record
SL, Sequencing vector sequence present at Left (5') end
Format
SL position
Explanation
The sequence up to and including the base at the marked position is considered to be sequencing vector. Written by vector_clip (see section Screening Against Vector Sequences).

Record
SP, Sequencing vector Primer site (relative to cloning site)
Format
SP position
Explanation
Location of the sequencing primer relative to the cloning site. Used by vector_clip (see section Screening Against Vector Sequences).

Record
SQ, SeQuence
Format
SQ \nsequence blocks...\n//\n
Explanation
Complete sequence, as determined by the sequencing machine. The sequence is broken into blocks of 10 bases with 6 blocks per line separated by a space (see the example below).
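The block layout can be produced and read back with a few lines of Python. This is a sketch: the five-space indent before each sequence line is an assumption of the example (adjust the indent argument to match your files), and format_sq and parse_sq are hypothetical names.

```python
def format_sq(seq: str, indent: str = "     ") -> str:
    """Write an SQ record: blocks of 10 bases, 6 blocks per line, ended by '//'."""
    lines = ["SQ"]
    for i in range(0, len(seq), 60):
        chunk = seq[i:i + 60]
        blocks = [chunk[j:j + 10] for j in range(0, len(chunk), 10)]
        lines.append(indent + " ".join(blocks))
    lines.append("//")
    return "\n".join(lines) + "\n"


def parse_sq(text: str) -> str:
    """Read an SQ record back, removing spaces, up to the '//' line."""
    body = []
    for line in text.splitlines()[1:]:  # skip the 'SQ' line itself
        if line.strip() == "//":
            break
        body.append("".join(line.split()))
    return "".join(body)
```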

Record
SR, Sequencing vector sequence present at Right (3') end
Format
SR position
Explanation
The sequence from and including the base at the marked position to the end is considered to be sequencing vector. Written by vector_clip (see section Screening Against Vector Sequences).
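Because QL and SL mark the last excluded base at the left end, and QR and SR the first excluded base at the right end, the part of a reading left for assembly is bounded by the larger left mark and the smaller right mark. The Python sketch below shows that arithmetic under the assumption of 1-based positions; it ignores CS ranges for brevity, and usable_region is a hypothetical name.

```python
from typing import Optional, Tuple


def usable_region(length: int,
                  ql: Optional[int] = None, qr: Optional[int] = None,
                  sl: Optional[int] = None, sr: Optional[int] = None) -> Tuple[int, int]:
    """1-based inclusive (start, end) not excluded by QL/SL or QR/SR.

    A missing left mark counts as 0 and a missing right mark as length + 1.
    If start > end, nothing usable remains.
    """
    left = max(x for x in (ql, sl, 0) if x is not None)
    right = min(x for x in (qr, sr, length + 1) if x is not None)
    return left + 1, right - 1
```

For a 500 base reading with QL 30, SL 20, QR 450 and SR 480 this gives (31, 449).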

Record
SS, Screening Sequence
Format
SS string
Explanation
Note that in earlier versions of this documentation this field was explained incorrectly. Due to this the field is not currently being used by any of our programs. The original meaning was to specify a sequence to screen against. Any number of SS lines could be present to denote any number of screening sequences. In the future we may change the meaning of this field to be a single SS line containing a file of filenames of screening sequences. If this causes problems for people then we will choose a new line type, so please inform us now. Also note that contrary to previous documentation, vector_clip does not use this field (it uses the SF field instead).

Record
ST, STrands
Format
ST number
Explanation
Denotes whether this is a single or double stranded template. This is useful for deducing suitable templates for later experiments.

Record
SV, Sequencing Vector type
Format
SV string
Explanation
Type of sequencing vector used. Can be used for choosing templates for custom primer experiments.

Record
TC, Tag to be placed on the Consensus.
Format
TC TYPE S start..end
Explanation
These lines instruct gap4 to place tags on the consensus. The format defines the tag type (a 4 character identifier, which should start at column position 5), its strand ("+", "-" or "=", which means both strands), its start position and the position of its end. These two values are separated by "..". Following lines starting with TC and space characters up to column 10 are written into the comment field of the tag. For example, the next three lines define a tag of type comment (COMM) that is to be on both strands over the range 100 to 110, and whose comment field will contain "This comment contains several lines".
TC   COMM = 100..110
TC        This comment contains
TC          several lines
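A rough Python sketch of reading such tag lines follows. It assumes the header is whitespace separated as in the example, and that text after column 10 of each continuation line is joined into the comment; parse_tags, Tag and HEADER are hypothetical names rather than gap4 code.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Header line: record, 4-character type, strand, start..end
HEADER = re.compile(r"^(TC|TG)\s+(\S{4})\s+([+\-=])\s+(\d+)\.\.(\d+)\s*$")


@dataclass
class Tag:
    record: str     # "TC" (consensus) or "TG" (reading)
    tag_type: str   # e.g. "COMM"
    strand: str     # "+", "-" or "=" (both strands)
    start: int
    end: int
    comment: List[str] = field(default_factory=list)


def parse_tags(lines: List[str]) -> List[Tag]:
    tags: List[Tag] = []
    for line in lines:
        m = HEADER.match(line)
        if m:
            rec, tag_type, strand, start, end = m.groups()
            tags.append(Tag(rec, tag_type, strand, int(start), int(end)))
        elif tags and line[:2] == tags[-1].record:
            # Continuation line: text after column 10 goes into the comment
            tags[-1].comment.append(line[10:].strip())
    return tags
```

Applied to the three example lines, this yields one COMM tag on strand "=" from 100 to 110, whose comment lines join to "This comment contains several lines".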

Record
TG, Tag to be placed on the reading.
Format
TG TYPE S start..end
Explanation
These lines instruct gap4 to place tags on the reading. See TC for further information.

Record
TN, Template Name
Format
TN string
Explanation
The name of the template used in the experiment.

Record
WT, Wild Type trace file
Format
WT string
Explanation
The filename of the wild type trace file. Used for mutation studies.

This page is maintained by staden-package. Last generated on 22 October 2002.
URL: http://www.mrc-lmb.cam.ac.uk/pubseq/manual/formats_unix_20.html