Gene finding

Many years ago we separated methods for searching for genes and their control regions into two classes: "gene search by signal" and "gene search by content" (Staden R. (1984) Graphic methods to determine the function of nucleic acid sequences. Nucl. Acids Res. 12, 521-538; Staden R. (1985) Computer methods to locate genes and signals in nucleic acid sequences, Genetic Engineering: Principles and Methods Vol. 7, Edited by J. K. Setlow and A. Hollaender, Plenum Publishing Corp.). Signal searches look for short sequence segments such as promoters, ribosome binding sites and splice junctions, whereas content searches look for the sequence patterns that are characteristic of protein coding regions or RNA genes. Protein coding sequences encode particular amino acid sequences, often using preferred codons, and this leaves patterns in the DNA that can be used to distinguish them from non-protein-coding DNA. tRNA genes must produce stable cloverleaf structures, and "standard" tRNAs must contain particular (conserved) bases at fixed locations within the cloverleaf. These features can be used to locate tRNA genes, and other RNA genes could probably be sought in a similar way.
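The difference between the two classes can be illustrated with a minimal sketch. It is not the algorithm used by spin: the function names (signal_search, train_codon_scores, content_search), the use of a plain regular expression as the "signal", and the log-odds codon score against a uniform background are all assumptions made for illustration only.

```python
import math
import re
from collections import Counter


def signal_search(seq, pattern):
    """Signal search: report 0-based start positions of a short motif.

    `pattern` is a regular expression; the lookahead lets overlapping
    matches be reported. (Hypothetical helper, not part of spin.)
    """
    return [m.start() for m in re.finditer(f"(?=(?:{pattern}))", seq)]


def train_codon_scores(coding_examples):
    """Build log-odds scores for all codons seen in known coding sequences.

    Each score compares the observed codon frequency (with an add-one
    pseudocount) against a uniform background of 1/64. Returns a function
    mapping a codon to its score. (Assumed scoring scheme.)
    """
    counts = Counter()
    for s in coding_examples:
        for i in range(0, len(s) - 2, 3):
            counts[s[i:i + 3]] += 1
    total = sum(counts.values())

    def score(codon):
        freq = (counts[codon] + 1) / (total + 64)
        return math.log(freq / (1 / 64))

    return score


def content_search(seq, score):
    """Content search: total codon score of `seq` in each of three frames.

    A frame that reads through preferred codons scores higher than one
    that does not.
    """
    return [
        sum(score(seq[i:i + 3]) for i in range(frame, len(seq) - 2, 3))
        for frame in range(3)
    ]


if __name__ == "__main__":
    # Signal: look for a Shine-Dalgarno-like motif (illustrative choice).
    print(signal_search("TTAGGAGGTTAGGAGG", "AGGAGG"))

    # Content: score each frame using codon preferences learned from
    # a toy "known gene".
    score = train_codon_scores(["GCTGCTGCTGAAGAA"])
    print(content_search("GCTGAAGCTGAAGCT", score))
```

In this toy case the signal search reports where the motif occurs, while the content search gives the highest score to the reading frame whose codons resemble the training sequence. Real methods use position-specific weights for signals and much larger codon tables for content, but the division of labour is the same.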

The methods described in the following sections are either "content" or "signal" searches, and spin's graphical presentation of results can be used to see whether, taken together, they produce a consistent gene prediction.

