This method for finding protein coding regions is a variant of the codon usage method. Here, instead of measuring the closeness to a table of codon frequencies whose main discriminating power comes from codon preferences, we look for similarity to the codon usage that would be expected from a protein sequence of average amino acid composition, but with no codon preference (all synonymous codons for an amino acid equally frequent). The method is surprisingly effective: when tested against all the E. coli sequences in the EMBL sequence library, it correctly identified the coding frame for 91% of window positions. (The E. coli sequences were chosen only for technical reasons: we have no reason to think the method would work less well on other organisms with roughly even base composition.)
Staden R. (1990) Finding protein coding regions in genomic sequences. In Doolittle, R.F. (ed.), Methods in Enzymology, 183, Academic Press, San Diego, CA, 163-180.
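A minimal Python sketch of how one window might be scored against such a table, assuming the frame-scoring rule of the codon usage method: the table values for the codons in each of the three frames are multiplied together, the three products are normalised to sum to 1, and the highest-scoring frame is taken as the predicted coding frame. The function `frame_probabilities`, its `floor` parameter (substituted for codons with zero expected frequency) and the two-entry toy table are illustrative assumptions, not Staden's code or data.

```python
import math

def frame_probabilities(window, table, floor=1e-6):
    """Return the relative probability that each of the three reading
    frames of `window` is the coding frame, scored against `table`
    (a dict mapping codon -> expected frequency). Hypothetical helper."""
    window = window.upper()
    # Use the same number of codons in every frame so the scores are comparable.
    n_codons = (len(window) - 2) // 3
    log_scores = []
    for frame in range(3):
        total = 0.0
        for k in range(n_codons):
            start = frame + 3 * k
            codon = window[start:start + 3]
            # Codons missing from the table or with zero expected frequency
            # (e.g. stop codons) get an assumed floor value instead of log(0).
            total += math.log(max(table.get(codon, 0.0), floor))
        log_scores.append(total)
    # Normalise the three products so they sum to 1, working in log space
    # to avoid underflow on long windows.
    top = max(log_scores)
    weights = [math.exp(s - top) for s in log_scores]
    norm = sum(weights)
    return [w / norm for w in weights]

# Toy usage with a hypothetical two-entry table (not real codon values).
toy_table = {"ATG": 0.05, "GAT": 0.025}
probs = frame_probabilities("ATGATGATGATG", toy_table)
print([round(p, 3) for p in probs])  # -> [0.889, 0.0, 0.111]
```

In a full scan, the window would be slid along the sequence and the winning frame recorded at each position.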
The average amino acid composition used to derive the values in the codon table is that described by McCaldon and Argos (1988), Proteins 4, 99-122.
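The derivation of the codon table can be sketched in Python as follows, assuming "no codon preference" means each amino acid's share is divided equally among its synonymous codons under the standard genetic code, with stop codons given zero. The function name `no_preference_table` and the uniform composition in the usage lines are hypothetical placeholders; the McCaldon and Argos averages are not reproduced here.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons listed in TCAG order;
# '*' marks the three stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def no_preference_table(composition):
    """Expected codon frequencies for a protein with the given amino acid
    composition (one-letter code -> fraction), assuming no codon preference:
    each amino acid's fraction is split equally among its synonymous codons."""
    synonyms = {}
    for codon, aa in GENETIC_CODE.items():
        synonyms.setdefault(aa, []).append(codon)
    table = {}
    for aa, codons in synonyms.items():
        # Stop codons ('*') are not in the composition, so they get 0.
        share = composition.get(aa, 0.0)
        for codon in codons:
            table[codon] = share / len(codons)
    return table

# Placeholder composition: all 20 amino acids equally frequent.
# These are NOT the McCaldon and Argos averages; substitute the real values.
uniform = {aa: 1 / 20 for aa in "ACDEFGHIKLMNPQRSTVWY"}
table = no_preference_table(uniform)
print(table["ATG"], table["CTG"])  # Met has 1 codon, Leu has 6
```

Because synonymous codons share their amino acid's frequency equally, any difference between the resulting values reflects only amino acid composition and the degeneracy of the genetic code.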
Above is the result of applying this method to the C. elegans sequence
analysed above with a codon preference table. Note that, as would be expected
for a table that carries no codon preference information, the main difference
is that the range of observed scores is very much reduced.
This page is maintained by staden-package.
Last generated on 22 October 2002.
URL: http://www.mrc-lmb.cam.ac.uk/pubseq/manual/spin_unix_28.html