When a sequence is read into the program, its composition is displayed in the Output Window as a simple check that the data has been read correctly. The values can also be requested from the "Statistics" menu, where a dialogue allows a subsection of the sequence to be analysed. The results are displayed as shown below.
============================================================
Wed 12 Nov 17:10:25 1997: sequence composition
------------------------------------------------------------
A 1966 (24.17%)
C 1996 (24.54%)
G 2185 (26.86%)
T 1987 (24.43%)
- 0 (0.00%)

Or for protein sequences:

============================================================
Mon 14 Oct 17:11:04 2002: sequence composition
------------------------------------------------------------
Sequence MYSA_DROME: 1 to 2411
Protein
AA     A     B     C     D     E     F     G     H     I     K     L     M     N
N    201     0    30   150   281    74   127    45   126   233   243    43   121
%    8.3   0.0   1.2   6.2  11.7   3.1   5.3   1.9   5.2   9.7  10.1   1.8   5.0
M  14287     0  3094 17263 36281 10891  7246  6171 14258 29865 27498  5642 13807

AA     P     Q     R     S     T     V     W     Y     Z     X     *     -
N     55   167   141    96    93   108    14    63     0     0     0     0
%    2.3   6.9   5.8   4.0   3.9   4.5   0.6   2.6   0.0   0.0   0.0   0.0
M   5341 21398 22022  8360  9403 10706  2607 10280     0     0     0     0
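The nucleotide report above can be reproduced in outline with a few lines of Python. This is only a minimal sketch, not the program's own code: the function names `base_composition` and `format_composition` are hypothetical, and it assumes that percentages are taken over the full length of the selected region and that the symbols reported are A, C, G, T and the pad character "-".

```python
from collections import Counter


def base_composition(seq, symbols="ACGT-"):
    """Return {symbol: (count, percent)} for each symbol in `symbols`.

    Assumption: percentages are relative to the whole region, so any
    symbol outside `symbols` still contributes to the total.
    """
    seq = seq.upper()
    counts = Counter(seq)
    total = len(seq)
    return {
        s: (counts[s], 100.0 * counts[s] / total if total else 0.0)
        for s in symbols
    }


def format_composition(comp):
    """Format the counts one symbol per line, e.g. 'A 1966 (24.17%)'."""
    return "\n".join(f"{s} {n} ({p:.2f}%)" for s, (n, p) in comp.items())


if __name__ == "__main__":
    print(format_composition(base_composition("ACGTGGCA")))
```

To analyse a subsection, as the "Statistics" dialogue allows, a slice of the sequence can be passed instead, for example `base_composition(seq[start - 1:end])` for 1-based inclusive positions.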
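This routine counts the dinucleotide frequencies in the selected region of the sequence and calculates the distribution expected from the base composition alone. In the example the values are percentages, and each expected value agrees with the product of the two base frequencies (for AA, 24.17% x 24.17% = 5.84%). The output looks like: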
          A                 C                 G                 T
      Obs  Expected     Obs  Expected     Obs  Expected     Obs  Expected
A    7.91      5.84    5.64      5.93    5.05      6.49    5.57      5.91
C    5.91      5.93    5.14      6.02    7.38      6.59    6.10      5.99
G    6.11      6.49    7.56      6.59    6.30      7.22    6.90      6.56
T    4.24      5.91    6.18      5.99    8.14      6.56    5.86      5.97
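A table of this kind can be sketched as follows. The code is illustrative only and makes several assumptions not stated in the text: dinucleotides are counted in overlapping windows, only pairs made entirely of A, C, G and T are counted, values are percentages, and the expected value is the product of the two base frequencies (which is consistent with the figures in the example). The function name `dinucleotide_table` is hypothetical.

```python
from collections import Counter
from itertools import product


def dinucleotide_table(seq, bases="ACGT"):
    """Return {pair: (observed %, expected %)} for all 16 dinucleotides.

    Assumptions: overlapping windows; pairs containing any symbol not in
    `bases` are ignored; expected % = 100 * f(first) * f(second).
    """
    seq = seq.upper()
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    keys = ["".join(p) for p in product(bases, repeat=2)]
    n_pairs = sum(pairs[k] for k in keys)
    n_bases = sum(seq.count(b) for b in bases)
    if n_pairs == 0 or n_bases == 0:
        raise ValueError("region contains no countable dinucleotides")
    freq = {b: seq.count(b) / n_bases for b in bases}
    return {
        k: (100.0 * pairs[k] / n_pairs, 100.0 * freq[k[0]] * freq[k[1]])
        for k in keys
    }


if __name__ == "__main__":
    table = dinucleotide_table("AACGTTGCAACGGT")
    for pair, (obs, exp) in table.items():
        print(f"{pair} {obs:6.2f} {exp:6.2f}")
```

A dinucleotide whose observed value is well above or below its expected value is over- or under-represented relative to what the base composition alone would predict.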