
Sample Points.

The trace information is stored at byte offset Header.samples_offset from the start of the file. For each sample point there is one value for each of the four bases. Header.sample_size holds the precision of the sample values, which must be either 1 (unsigned byte) or 2 (unsigned short). The sample points need not be normalised to any particular value, though they are assumed to represent non-negative values; that is, they are of unsigned type.
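As an illustration of how the precision field is used, the sketch below decodes one value from a raw sample block that has already been read from Header.samples_offset. The function names sample_value_at and sample_block_bytes are hypothetical, and the big-endian byte order for 2-byte values is an assumption not stated in this section.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode the unsigned value at position `index` in a
 * raw sample block, where `sample_size` is Header.sample_size (1 or 2).
 * ASSUMPTION: 2-byte values are stored big-endian (most significant byte
 * first). Returns -1 for an unsupported precision. */
long sample_value_at(const unsigned char *block, size_t index, int sample_size)
{
    if (sample_size == 1) {
        return block[index];                        /* unsigned byte */
    }
    if (sample_size == 2) {
        const unsigned char *p = block + 2 * index; /* unsigned short */
        return ((long)p[0] << 8) | p[1];
    }
    return -1;
}

/* Hypothetical helper: size in bytes of the whole sample block,
 * with four values (A, C, G, T) per sample point. */
size_t sample_block_bytes(size_t num_points, int sample_size)
{
    return num_points * 4 * (size_t)sample_size;
}
```

For example, with sample_size 2 the bytes 0x01 0x02 decode to 258, whereas with sample_size 1 they are read as two separate values, 1 and 2.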

With the introduction of SCF version 3.00, and in order to make the files compress more efficiently, the sample points are stored in A, C, G, T order; i.e. all the values for base A, followed by all those for C, and so on. In addition, they are stored not as their original magnitudes but as differences between successive values, with the differencing applied twice (delta-delta encoding). The C code used to transform the values for precision 2 samples is shown below.

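#include <stdint.h>

/* Stand-ins so the function compiles on its own; the io library supplies
 * its own uint_2 type and job constants (the value here is an assumption). */
typedef uint16_t uint_2;
#define DELTA_IT 1

void delta_samples2 ( uint_2 samples[], int num_samples, int job) {

    /* If job == DELTA_IT:
     *  change a series of sample points to a series of delta delta values,
     *  i.e. change them in two steps:
     *  first: delta = current_value - previous_value
     *  then: delta_delta = delta - previous_delta
     * else
     *  do the reverse
     * All arithmetic wraps modulo 65536, so the reverse step restores the
     * original values exactly.
     */

    int i;
    uint_2 p_delta, p_sample;

    if ( DELTA_IT == job ) {
        /* Pass 1: replace each sample by its difference from the previous
         * original sample (p_delta holds that previous sample here). */
        p_delta  = 0;
        for (i=0;i<num_samples;i++) {
            p_sample = samples[i];
            samples[i] = samples[i] - p_delta;
            p_delta  = p_sample;
        }
        /* Pass 2: apply the same differencing to the deltas. */
        p_delta  = 0;
        for (i=0;i<num_samples;i++) {
            p_sample = samples[i];
            samples[i] = samples[i] - p_delta;
            p_delta  = p_sample;
        }
    }
    else {
        /* Undo pass 2, then pass 1: each is a running (prefix) sum. */
        p_sample = 0;
        for (i=0;i<num_samples;i++) {
            samples[i] = samples[i] + p_sample;
            p_sample = samples[i];
        }
        p_sample = 0;
        for (i=0;i<num_samples;i++) {
            samples[i] = samples[i] + p_sample;
            p_sample = samples[i];
        }
    }
}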
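To see why this helps compression, note that a trace rising linearly, such as 10, 20, 30, 40, becomes 10, 10, 10, 10 after the first pass and 10, 0, 0, 0 after the second, and long runs of small values compress well. The sketch below applies the same two-pass scheme to 1-byte samples, ASSUMING precision 1 samples are treated analogously; the names delta_delta_encode1 and delta_delta_decode1 are hypothetical.

```c
#include <stdint.h>

/* Hypothetical 1-byte analogue of delta_samples2 (DELTA_IT direction).
 * ASSUMPTION: precision 1 samples use the same two-pass scheme.
 * Arithmetic wraps modulo 256, so decoding is an exact inverse. */
void delta_delta_encode1(uint8_t s[], int n)
{
    for (int pass = 0; pass < 2; pass++) {
        uint8_t prev = 0;                  /* predecessor before this pass */
        for (int i = 0; i < n; i++) {
            uint8_t cur = s[i];
            s[i] = (uint8_t)(cur - prev);  /* difference from predecessor */
            prev = cur;
        }
    }
}

/* Reverse direction: two running sums undo the two differencing passes. */
void delta_delta_decode1(uint8_t s[], int n)
{
    for (int pass = 0; pass < 2; pass++) {
        uint8_t sum = 0;
        for (int i = 0; i < n; i++) {
            sum = (uint8_t)(sum + s[i]);
            s[i] = sum;
        }
    }
}
```

Because both directions work modulo 256, values that "wrap" during encoding (for example a drop from 250 to 3) are still restored exactly on decoding.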

The io library defines one in-memory structure per sample precision, each holding the four base values for a single sample point:

/*
 * Type definition for the Sample data
 */
typedef struct {
        uint_1 sample_A;           /* Sample for A trace */
        uint_1 sample_C;           /* Sample for C trace */
        uint_1 sample_G;           /* Sample for G trace */
        uint_1 sample_T;           /* Sample for T trace */
} Samples1;

typedef struct {
        uint_2 sample_A;           /* Sample for A trace */
        uint_2 sample_C;           /* Sample for C trace */
        uint_2 sample_G;           /* Sample for G trace */
        uint_2 sample_T;           /* Sample for T trace */
} Samples2;
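These structures keep the four values of one sample point together, whereas version 3.00 files store all A values first, then all C, G and T values. The sketch below shows one way to regroup an array of Samples2 records into that per-base order; the helper name samples2_to_channels is hypothetical, and uint_2 is defined locally as a stand-in for the io library type. Each resulting channel of num_points values could then be delta-delta encoded on its own, assuming (as its num_samples parameter suggests) that delta_samples2 is applied one channel at a time.

```c
#include <stdint.h>

typedef uint16_t uint_2;   /* stand-in for the io library type */

typedef struct {
        uint_2 sample_A;   /* Sample for A trace */
        uint_2 sample_C;   /* Sample for C trace */
        uint_2 sample_G;   /* Sample for G trace */
        uint_2 sample_T;   /* Sample for T trace */
} Samples2;

/* Hypothetical helper: copy num_points interleaved records into `out`
 * (length 4 * num_points) in A, C, G, T channel order. */
void samples2_to_channels(const Samples2 *in, int num_points, uint_2 *out)
{
    for (int i = 0; i < num_points; i++) {
        out[0 * num_points + i] = in[i].sample_A;
        out[1 * num_points + i] = in[i].sample_C;
        out[2 * num_points + i] = in[i].sample_G;
        out[3 * num_points + i] = in[i].sample_T;
    }
}
```

For two points {1, 2, 3, 4} and {5, 6, 7, 8}, the output is 1, 5, 2, 6, 3, 7, 4, 8.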

This page is maintained by staden-package. Last generated on 22 October 2002.
URL: http://www.mrc-lmb.cam.ac.uk/pubseq/manual/formats_unix_4.html