
Positional base preferences

This method for finding protein coding regions is a variant of the codon usage method. Here, instead of measuring the closeness to a table of codon frequencies whose main discriminating power is due to codon preferences, we look for similarity to the codon usage that would be expected from a protein sequence of average amino acid composition, but with no codon preference. The method is surprisingly effective: when tested against all the E. coli sequences in the EMBL sequence library it correctly identified the coding frame for 91% of window positions. (The E. coli sequences were chosen only for technical reasons: we have no reason to think the method would work less well on other organisms with roughly even base composition.)

Reference: Staden R. (1990) Finding protein coding regions in genomic sequences. In Doolittle, R.F. (ed.), Methods in Enzymology, 183, Academic Press, San Diego, CA, 163-180.

The average amino acid composition used to derive the values in the codon table is that described by McCaldon and Argos (1988), Proteins 4, 99-122.
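The derivation of such a table, and its use to pick a reading frame, can be sketched as follows. This is a minimal illustration, not the package's implementation. The names `codon_table` and `frame_probabilities` are hypothetical, `UNIFORM_COMPOSITION` is a placeholder and not the McCaldon and Argos values, and the small floor given to stop codons and the normalisation of the three frame scores so that they sum to 1 are assumptions about details the text does not spell out.

```python
import math
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Placeholder composition (assumption): every amino acid equally frequent.
# Substitute the published average composition to reproduce the real table.
UNIFORM_COMPOSITION = {aa: 1.0 / 20 for aa in set(AMINO_ACIDS) if aa != "*"}


def codon_table(composition):
    """Expected codon frequencies with no codon preference: each amino
    acid's frequency is shared equally among its synonymous codons."""
    n_synonyms = {}
    for aa in GENETIC_CODE.values():
        n_synonyms[aa] = n_synonyms.get(aa, 0) + 1
    return {
        codon: (0.0 if aa == "*" else composition[aa] / n_synonyms[aa])
        for codon, aa in GENETIC_CODE.items()
    }


def frame_probabilities(window, table, floor=1e-4):
    """Relative probability of each of the three frames for one window.
    `floor` (assumption) stands in for stop codons and unknown triplets."""
    n_codons = (len(window) - 2) // 3  # same codon count in every frame
    log_scores = []
    for frame in range(3):
        total = 0.0
        for j in range(n_codons):
            codon = window[frame + 3 * j: frame + 3 * j + 3]
            total += math.log(max(table.get(codon, floor), floor))
        log_scores.append(total)
    top = max(log_scores)  # subtract the maximum to avoid underflow
    weights = [math.exp(score - top) for score in log_scores]
    norm = sum(weights)
    return [w / norm for w in weights]


table = codon_table(UNIFORM_COMPOSITION)
window = "GCTGAA" * 5 + "GC"  # frame 3 of this toy window contains TGA stops
probs = frame_probabilities(window, table)
best_frame = probs.index(max(probs)) + 1  # frames numbered 1 to 3
```

Sliding the window along the sequence and plotting the three values at each position would give one curve per frame, with the highest curve marking the most likely coding frame.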

[picture]

The plot above shows the result of applying this method to the C. elegans sequence analysed previously with a codon preference table. Note that, as would be expected, the main difference is that the range of observed scores is very much reduced.


This page is maintained by staden-package. Last generated on 22 October 2002.
URL: http://www.mrc-lmb.cam.ac.uk/pubseq/manual/spin_unix_28.html