Copyright (C) 1999-2002, Medical Research Council, Laboratory of Molecular Biology.
The original version of these methods was described in James K Bonfield, Cristina Rada and Rodger Staden, "Automated detection of point mutations using fluorescent sequence trace subtraction", Nucleic Acids Res. 26, 3404-3409, 1998. More recent work has been done by Mark Jordan and James Bonfield, with advice from Graham Taylor, Andrew Wallace, Will Wang and others.
Our methods for detecting mutations are based on the alignment and comparison of the fluorescent traces produced by Sanger DNA sequencing. To use clinical terminology, samples from patients are compared to standard reference traces. Patient and reference traces should be produced using the same primers and sequencing chemistry, ideally from both strands of the DNA. The data shown in the examples below is from exon 11 of the BRCA1 gene.
The basic idea is illustrated in the following two figures, which are screen dumps from our program gap4 (see section Gap4 introduction). The first shows a sample containing a point mutation and the second contains a heterozygous base position. The displays are bisected vertically: at the top left is the sample trace from one strand of the DNA, below that the reference trace for that strand, and underneath the difference between these traces, which is obtained by subtracting one from the other. On the right is the corresponding data from the other DNA strand (shown complemented).
Figure 1. Top and bottom strand differences for a point mutation.
Figure 2. Top and bottom strand differences for a heterozygous base.
As can be seen, although no vertical scaling is performed, the difference trace
is either quite flat or consistently above or below the mid-line, except
at the sites of mutations. Near these sites there are strong peaks, but note that
only at the mutated base itself are there peaks both above and below the mid-line;
the context effects caused by the mutation produce peaks in only one direction.
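The following Python sketch illustrates the subtraction and the "peaks in both directions" criterion. It is not the algorithm used by the Staden programs. It assumes the sample and reference traces have already been aligned and resampled to a common set of sample points. Each trace is represented as a hypothetical dict of four channel lists, and the threshold and window size are illustrative values only.

```python
# Hypothetical sketch, not the Staden algorithm: subtract an aligned
# reference trace from a sample trace and flag bases whose difference
# trace has strong peaks both above and below the mid-line.
# Assumes both traces are already aligned and resampled to the same
# sample points; threshold and window size are illustrative only.
from typing import Dict, List

Trace = Dict[str, List[float]]  # channel ('A', 'C', 'G', 'T') -> intensities


def difference_trace(sample: Trace, reference: Trace) -> Trace:
    """Sample minus reference, computed channel by channel."""
    return {b: [s - r for s, r in zip(sample[b], reference[b])] for b in "ACGT"}


def flag_two_sided_peaks(diff: Trace, base_positions: List[int],
                         threshold: float, half_window: int = 3) -> List[int]:
    """Return indices into base_positions whose window contains a value above
    +threshold and a value below -threshold (in any of the four channels)."""
    n = len(diff["A"])
    flagged = []
    for i, pos in enumerate(base_positions):
        lo, hi = max(0, pos - half_window), min(n, pos + half_window + 1)
        window = [v for b in "ACGT" for v in diff[b][lo:hi]]
        if max(window) > threshold and min(window) < -threshold:
            flagged.append(i)
    return flagged


# Toy data: an A>G point mutation at sample point 6, plus a one-sided
# context effect (a taller T peak in the sample) at sample point 10.
reference = {
    "A": [0, 0, 0, 0, 0, 0, 100, 0, 0, 0, 0, 0],
    "C": [0, 0, 100, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "G": [0] * 12,
    "T": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 80, 0],
}
sample = {
    "A": [0] * 12,
    "C": [0, 0, 100, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "G": [0, 0, 0, 0, 0, 0, 100, 0, 0, 0, 0, 0],
    "T": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 130, 0],
}
diff = difference_trace(sample, reference)
print(flag_two_sided_peaks(diff, [2, 6, 10], threshold=30))  # prints [1]
```

Only the mutated base is flagged. The neighbouring base has a positive bump but no matching negative one, so it is not flagged.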
It is perhaps necessary to point out that analysis of the traces is essential
because base callers make mistakes: they can assign the wrong base types and
also assign single bases where the DNA is heterozygous. An example of the latter
can be observed in Figure 2: on one strand the base caller has assigned
a "-" symbol at position 251, at least indicating uncertainty, but on the
other strand it has assigned "T". The DNA is clearly heterozygous at this
position. This means that simply looking for differences between patient
sequences and reference sequences will miss point mutations and heterozygous
bases wherever the base caller has got them wrong (and, of course, base calling
errors will also create false differences).
These trace displays alone are very useful for visual inspection of the data,
and are all that some users want. However, we also have programs which
automatically analyse the trace differences and tag the bases which have
significant peaks as possible sites of mutation.
Trace viewing is initiated from within the gap4 editor (see section Editing in gap4).
Each record in the editor shows an individual reading with its number and name
at the left. Negative numbers denote readings which have been complemented.
Several sequences have special status. At the top is a sequence labelled with
a letter S at the left edge. This is the reference sequence, here the EMBL
entry HSLBRCA1 which covers the entirety of the BRCA1 gene. The numbering
at the top of the display corresponds to positions in this reference sequence.
The program has also coloured all exons on the reference sequence green.
The bottom DNA sequence in the editor is labelled "CONSENSUS". For mutation
detection work this sequence is forced to be identical to the reference.
Below the CONSENSUS sequence is the amino acid sequence for the reference.
This is calculated on the fly using the feature table of the reference
sequence, and so only the exons are translated, each in its correct reading frame.
Two other sequences (near the top) are labelled R and F. These are the readings
providing the reverse and forward reference traces for this segment of the data.
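The on-the-fly translation of the reference can be pictured as two steps: splice the exon segments named in the feature table, then translate the result codon by codon. The Python sketch below makes several simplifying assumptions that are not taken from gap4. Exons are given as 1-based inclusive (start, end) pairs on the forward strand, in transcript order. The first exon is assumed to begin at the first base of a codon, and the standard genetic code applies. The function names are hypothetical.

```python
# Hypothetical sketch of translating only the exons, in frame.
# Assumptions: forward-strand gene, exons as 1-based inclusive (start, end)
# pairs in transcript order, first exon starting on a codon boundary.
# Real feature tables (e.g. EMBL CDS join(...) entries) carry more detail.
from typing import List, Tuple

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def spliced_coding_sequence(genomic: str, exons: List[Tuple[int, int]]) -> str:
    """Concatenate the exon segments (1-based, inclusive coordinates)."""
    return "".join(genomic[start - 1:end] for start, end in exons)


def translate_exons(genomic: str, exons: List[Tuple[int, int]]) -> str:
    """Translate the spliced exons; unknown codons become 'X'."""
    cds = spliced_coding_sequence(genomic.upper(), exons)
    usable = len(cds) - len(cds) % 3  # drop any trailing partial codon
    return "".join(CODON_TABLE.get(cds[p:p + 3], "X") for p in range(0, usable, 3))


# Lower case marks intronic/flanking bases in this toy sequence.
print(translate_exons("ccATGAAAttttGAGTAAgg", [(3, 8), (13, 18)]))  # prints MKE*
```

Because the exons are spliced before translation, a codon split across an intron is still translated correctly.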
Figure 3. A set of aligned sequence readings displayed in the gap4 editor.
At the very bottom of the editor is an information line which is used to
display data about items touched by the mouse cursor. Here it is showing
data about one of the positions tagged as possibly being heterozygous.
It includes the observed base types (G and A) and the scores achieved by the
automated analysis.
The editor can be set to show only differences between readings and the
reference; all matching bases appear as dots. For example, Figure 4
shows the same data as Figure 3, but with the editor set to show differences
and the information line showing details about a possible mutation.
Figure 4. An alternative view of aligned sequence readings in the gap4 editor.
One column contains several bases tagged in red, signifying possible
heterozygotes, and some in orange denoting possible point mutations.
During visual inspection the program can be made to move the cursor from
one tag to the next and to display the aligned traces as shown
above in Figures 1 and 2.
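The "show differences" mode amounts to replacing every base that matches the reference with a dot. Here is a minimal Python sketch. It assumes the reading and the reference are already aligned to the same length, and the function name is hypothetical; this is not gap4's code.

```python
# Hypothetical sketch of a "show differences" display: bases that match
# the reference are replaced by a dot, differences are shown as-is.
# Assumes reading and reference are already aligned to equal length.


def show_differences(reading: str, reference: str, match_char: str = ".") -> str:
    if len(reading) != len(reference):
        raise ValueError("reading and reference must be aligned to equal length")
    return "".join(
        match_char if r.upper() == ref.upper() else r
        for r, ref in zip(reading, reference)
    )


print(show_differences("ACGTRACG", "ACGTAACG"))  # prints ....R...
```

Hiding matching bases makes a column of tagged differences, such as the heterozygous position in Figure 4, stand out immediately.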
It is also possible to use positive controls when displaying the trace
differences, i.e. reference traces which contain the mutation. In this case the
traces appear as shown in Figure 5, where the forward and reverse positive
controls are shown to the right of the normal plots. In Figure 5 the positive
control difference plots are quite flat, which in this case confirms the
presence of the heterozygous base.
Figure 5. Top and bottom strand differences and positive control for a heterozygous base.
As mentioned above, the package contains programs which can automatically
compare the traces with their reference traces. The output from these
programs consists of the tags shown in the editor. Users can check the traces
at these positions using the displays shown in Figures 1, 2 and 5, removing or
adding tags if necessary. Alternatively, users can rely entirely on visual
inspection and create all the tags themselves.
Once all the mutations are correctly tagged, the program can produce a report
which includes the reading names, the mutation positions relative to the
reference sequence, the actual change, its effect, and the evidence. An example
is shown below in Figure 6.
Figure 6. How gap4 reports mutations.
Here the first record is for reading 001321_11aF, position 33885, where T has
changed to T and C (i.e. the base is heterozygous, written Y) without changing
the amino acid, with evidence coming only from the complementary strand. The
last record is for reading 000256_11eF, position 36749, where A has changed to
G, producing an amino acid change from K to R, with evidence from both strands.
The penultimate record denotes a heterozygote (T>K) in a noncoding region.
001321_11aF 33885T>Y (silent F) (strand - only)
001321_11aF 34407G>K (expressed E>[ED]) (strand - only)
001321_11cF 35512T>Y (silent L) (double stranded)
001321_11cF 35813C>Y (expressed P>[PL]) (double stranded)
001321_11dF 36314A>R (expressed E>[EG]) (double stranded)
001321_11eF 36749A>R (expressed K>[KR]) (double stranded)
001321_11eF 37313T>K (noncoding) (strand - only)
000256_11eF 36749A>G (expressed K>R) (double stranded)
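The report notation can be reproduced with a small sketch. A heterozygous position is written with its IUPAC ambiguity code (R = A/G, Y = C/T, K = G/T, and so on). The effect is found by substituting each observed base into the reference codon. The Python below illustrates this under stated assumptions and is not the Staden implementation. It assumes the gene is on the forward strand and that the coding sequence has already been spliced from the exons. `cds_index` is a hypothetical 0-based offset of the position within that coding sequence (None for noncoding positions), and only single-base substitutions are handled.

```python
# Hypothetical sketch: format one report entry such as
# "36749A>R (expressed K>[KR])". Not the Staden implementation.
# Assumes a forward-strand gene, a pre-spliced coding sequence `cds`,
# and single-base substitutions only.
from typing import Optional

BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# IUPAC ambiguity codes for two-base (heterozygous) observations.
IUPAC = {frozenset(pair): code for pair, code in [
    ("AG", "R"), ("CT", "Y"), ("GT", "K"),
    ("AC", "M"), ("CG", "S"), ("AT", "W"),
]}


def observed_code(observed: str) -> str:
    """One base -> that base; two distinct bases -> their IUPAC code."""
    bases = frozenset(observed.upper())
    return next(iter(bases)) if len(bases) == 1 else IUPAC[bases]


def describe(ref_pos: int, ref_base: str, observed: str,
             cds: str, cds_index: Optional[int] = None) -> str:
    """Return '<pos><ref>><code> (effect)' in the style of the report."""
    change = f"{ref_pos}{ref_base}>{observed_code(observed)}"
    if cds_index is None:
        return f"{change} (noncoding)"
    offset = cds_index % 3
    codon = cds[cds_index - offset:cds_index - offset + 3].upper()
    old_aa = CODON_TABLE[codon]
    # Reference base first, so heterozygotes print as e.g. K>[KR].
    ordered = sorted(set(observed.upper()),
                     key=lambda b: (b != ref_base.upper(), b))
    new_aas = []
    for base in ordered:
        aa = CODON_TABLE[codon[:offset] + base + codon[offset + 1:]]
        if aa not in new_aas:
            new_aas.append(aa)
    if new_aas == [old_aa]:
        return f"{change} (silent {old_aa})"
    new = new_aas[0] if len(new_aas) == 1 else "[" + "".join(new_aas) + "]"
    return f"{change} (expressed {old_aa}>{new})"


# Toy coding sequences, chosen so the outputs mirror records in the report.
print(describe(36749, "A", "AG", "ATGAAAGAG", 4))  # 36749A>R (expressed K>[KR])
print(describe(36749, "A", "G", "ATGAAAGAG", 4))   # 36749A>G (expressed K>R)
print(describe(33885, "T", "TC", "TTT", 2))        # 33885T>Y (silent F)
print(describe(37313, "T", "TG", "", None))        # 37313T>K (noncoding)
```

The strand evidence column ("double stranded" or "strand - only") depends on which readings carry the tag, so it is left out of this sketch.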
This page is maintained by staden-package.
Last generated on 22 October 2002.
URL: http://www.mrc-lmb.cam.ac.uk/pubseq/manual/mutations_unix_1.html