Database I/O in C

Introduction and Overview

[General notes to go somewhere: It is better to test for the success return code than for a failure code. Failure codes vary between functions (-1, 1, >0, etc.), but most functions return 0 for success.]
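As an illustration of the note above, here is a minimal sketch in C. The functions lock_record(), write_record() and update_record() are hypothetical stand-ins invented for this example, not Gap4 functions; the only thing assumed about them is that they return 0 for success and some other value for failure.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins (not Gap4 functions). Both return 0 on
 * success, but they signal failure with different values. */
static int lock_record(int rec)  { return rec > 0 ? 0 : 2;  }
static int write_record(int rec) { return rec > 0 ? 0 : -1; }

/* Returns 0 only if every step succeeded. Each call is compared
 * against the success value, so the differing failure values of the
 * two helpers never need to be known here. */
int update_record(int rec)
{
    if (lock_record(rec) != 0) {
        fprintf(stderr, "lock failed for record %d\n", rec);
        return -1;
    }
    if (write_record(rec) != 0) {
        fprintf(stderr, "write failed for record %d\n", rec);
        return -1;
    }
    return 0;
}

/* Usage sketch: a positive record number succeeds, zero does not. */
void update_record_demo(void)
{
    assert(update_record(1) == 0);
    assert(update_record(0) != 0);
}
```

A test such as `== -1' would silently miss the failure from lock_record(), which is the situation the note warns about.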

Gap4 I/O access from within C consists of several layers. These layers break the tasks down into discrete methods and hide most of the implementation details. For the programmer wishing to extend Gap4, only the higher layers are of interest. Hence the lowest levels are described only briefly.

"g" Level - Raw Database Access

At the lowest level of any I/O is the actual code to read and write information to the disk. In Gap4 this is handled through a library named "g". This contains code for reading, writing, locking and updating of the physical database. It does not describe the structures contained in the gap database format itself, but rather provides functions to read and write arbitrary blocks of data. Don't delve into this unless you're feeling brave!

The code for this library is contained within the `src/g' directory. No documentation is currently available on these functions.

"Communication" Level - Interfaces to the "g" Level

This level of code describes the real Gap4 data structures and handles the interfacing with the g library. Generally this code should not be used directly.

This code is contained within the `src/gap4' directory and breaks down as follows:

Interface functions with the g library. These provide support for a local (i.e. compiled in) or remote (unimplemented) database server.
Contains GAP_READ and GAP_WRITE functions in byte swap and non byte swap forms (depending on the system architecture). The gap_io_init() function automatically determines the machine's endianness and sets up function pointers to call the correct functions.
Definitions of GAP_ERROR and GAP_ERROR_FATAL functions.
Functions for creation, initialisation, and copying of database files.
VERY USEFUL! The definitions of the gap structures that are stored in the database.
Initialises communication with the "g" database server by use of gap_init(), gap_open_server() and gap_shutdown_server() functions.

No documentation is currently available on these functions.
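The endianness detection and function pointer setup described for gap_io_init() can be sketched as follows. This is not the Gap4 source: every name below is hypothetical, and the sketch assumes, purely for illustration, that values are stored on disk in big-endian order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reader type: decode one 4-byte on-disk value. */
typedef uint32_t (*read_u32_fn)(const unsigned char *buf);

/* Reverse the byte order of a 32-bit value. */
static uint32_t swap_u32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Non byte swap form: host order matches the assumed disk order. */
static uint32_t read_u32_noswap(const unsigned char *buf)
{
    uint32_t v;
    memcpy(&v, buf, sizeof v);
    return v;
}

/* Byte swap form: host order is the reverse of the disk order. */
static uint32_t read_u32_swap(const unsigned char *buf)
{
    uint32_t v;
    memcpy(&v, buf, sizeof v);
    return swap_u32(v);
}

/* Returns 1 if the least significant byte is stored first. */
static int host_is_little_endian(void)
{
    const uint32_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Selected once at start-up, then used for every read. */
static read_u32_fn read_u32;

void io_init_sketch(void)
{
    read_u32 = host_is_little_endian() ? read_u32_swap : read_u32_noswap;
}

uint32_t decode_u32(const unsigned char *buf)
{
    return read_u32(buf);
}

/* Usage sketch: the same bytes decode identically on either host. */
void decode_u32_demo(void)
{
    const unsigned char bytes[4] = { 0x12, 0x34, 0x56, 0x78 };
    io_init_sketch();
    assert(decode_u32(bytes) == 0x12345678u);
}
```

Choosing the function once at initialisation keeps the endianness test out of every individual read or write.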

Basic Gap4 I/O

This level contains the basic functions for reading, writing, creation and deletion of the Gap4 structures, such as readings and templates as well as higher level functions built on top of these. It is this level of code that should generally be used by the programmer. The implementation of this level has function code and prototypes spread over a variety of files, but the programmer should only #include the `IO.h' file.

The primary functions are:

Opening/creation, closing and deletion of databases.
GT_Read, GT_Write, GT_Write_cached
TextRead, TextAllocRead, TextWrite
DataRead, DataWrite
ArrayRead, ArrayWrite
BitmapRead, BitmapWrite
The basic I/O calls. Note that the GT ones are for handling structures (e.g. GReadings) and the others are for data of the associated type.
Some functions for initialising new data structures. These in turn call the allocate() function to create new database records.
Reads and writes sequence information.
Fetches the trace type and name values for a reading.
Reading and writing of annotations (also known as tags).
Allocation and deallocation of records.
Flushes changes back to disk. The various write commands write the data to disk, but until a flush occurs the new data is not committed as the up-to-date copy.
Converts between a C GapIO pointer and an integer handle which can be passed around in Tcl and Fortran. The integer handle is the form used in the Tcl scripting language.
get_gel_num, lget_gel_num
get_contig_num, lget_contig_num
Converts single or lists of reading identifiers into reading or contig numbers (with start and end ranges).
Converts a list of reading identifiers to contig numbers.
Converts a structure number into its textual name.
Finds the left most reading number in a contig from a given reading number.
Converts from a reading number into a contig number.
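The conversion between a GapIO pointer and an integer handle, listed above, is commonly done with a small lookup table. The following is a minimal sketch of that general technique only. DemoIO, MAX_HANDLES and the demo_* functions are hypothetical names for this example and say nothing about how Gap4 actually implements its handles.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_HANDLES 16   /* hypothetical table size */

/* Hypothetical stand-in for the GapIO structure. */
typedef struct DemoIO { int dummy; } DemoIO;

/* Slot h holds the pointer registered under handle h, or NULL. */
static DemoIO *handle_table[MAX_HANDLES];

/* Register io in the first free slot.
 * Returns the handle (>= 0), or -1 if io is NULL or the table is full. */
int demo_get_handle(DemoIO *io)
{
    int h;
    if (io == NULL)
        return -1;
    for (h = 0; h < MAX_HANDLES; h++) {
        if (handle_table[h] == NULL) {
            handle_table[h] = io;
            return h;
        }
    }
    return -1;
}

/* Look up a handle. Returns NULL for an unknown or out of range handle. */
DemoIO *demo_io_from_handle(int h)
{
    if (h < 0 || h >= MAX_HANDLES)
        return NULL;
    return handle_table[h];
}

/* Free a slot so that its handle can be reused. */
void demo_release_handle(int h)
{
    if (h >= 0 && h < MAX_HANDLES)
        handle_table[h] = NULL;
}

/* Usage sketch: round trip from pointer to handle and back. */
void handle_demo(void)
{
    DemoIO io = { 0 };
    int h = demo_get_handle(&io);
    assert(h >= 0);
    assert(demo_io_from_handle(h) == &io);
    demo_release_handle(h);
    assert(demo_io_from_handle(h) == NULL);
}
```

A plain integer survives being passed through a scripting language or Fortran, whereas a raw C pointer generally does not, and the table lookup lets stale or invalid handles be rejected.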

Other I/O Functions

Still more I/O functions exist that aren't listed under the "Basic Gap4 I/O" header. This is primarily due to code structure rather than any particular grouping based on functionality. Specifically, these functions cannot be linked into "external" applications without a considerable amount of effort.

The file breakdown is as follows.

Complements, in memory, a sequence and associated structures.
Modifies in memory sequence details.
Modifies a single base in a sequence on the disk.
Inserts pads to the consensus sequence and all the readings at that point.
Removes a contig structure.
Fetches miscellaneous information for reads (primers, insert size, etc), vectors and clones.
Returns the right cutoff of a reading, found by checking the cut points and any vector tags.
Modifies the cutoffs of readings.
Updates a reading name in memory and disk.
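As a small illustration of the first item above, complementing a sequence in memory can be sketched as below. This assumes that "complement" means the usual reverse-complement of a DNA sequence, and it leaves any unrecognised character (pads, ambiguity codes) unchanged. The function names are hypothetical, and the sketch does not touch the associated structures (positions, tags and so on) that the real code also has to update.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Complement of a single base. Characters other than A, C, G, T
 * (in either case) are returned unchanged. */
static char complement_base(char b)
{
    switch (b) {
    case 'A': return 'T';
    case 'C': return 'G';
    case 'G': return 'C';
    case 'T': return 'A';
    case 'a': return 't';
    case 'c': return 'g';
    case 'g': return 'c';
    case 't': return 'a';
    default:  return b;
    }
}

/* Reverse-complement seq[0..len-1] in place. */
void reverse_complement(char *seq, size_t len)
{
    size_t i, j;

    if (len == 0)
        return;

    /* Swap complemented bases from both ends, working inwards. */
    for (i = 0, j = len - 1; i < j; i++, j--) {
        char tmp = complement_base(seq[i]);
        seq[i] = complement_base(seq[j]);
        seq[j] = tmp;
    }

    /* Odd length: the middle base is complemented where it stands. */
    if (i == j)
        seq[i] = complement_base(seq[i]);
}

/* Usage sketch: applying the operation twice restores the input. */
void reverse_complement_demo(void)
{
    char seq[] = "AACGT*";
    reverse_complement(seq, strlen(seq));
    assert(strcmp(seq, "*ACGTT") == 0);
    reverse_complement(seq, strlen(seq));
    assert(strcmp(seq, "AACGT*") == 0);
}
```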

Compiling and Linking with Other Programs

To use the Gap4 I/O functions in a program other than Gap4 itself, you need to compile so that the function prototypes are found and link so that the Gap4 functions are added to your binary. At present, the object files required for database access are not packaged as a library.

The compiler include search path needs adjusting to add the `$STADENROOT/src/gap4' directory and possibly the `$STADENROOT/src/g' directory. Once your own object files are compiled, they need to be linked with the following gap4 object files.


Finally, a library search path of `$STADENROOT/lib/$MACHINE-binaries' should be used to link the -lg -ltext_utils -lmisc libraries.

All of the above definitions have been added to a single Makefile held in `$STADENROOT/src/mk/' as the GAPDB_EXT_INC, GAPDB_EXT_OBJS and GAPDB_EXT_LIBS variables. When possible, these should be used in preference to hard coding the variable object filenames as this provides protection against future coding changes. So for example, if we have a program held in the file `demo.c' we could have a simple Makefile as follows.

include $(SRCROOT)/mk/
include $(SRCROOT)/mk/$(MACHINE).mk

OBJS = $(O)/demo.o


$(O)/demo: $(OBJS)
        $(CLD) -o $@ $(OBJS) $(LIBS) $(LIBSC)

If we now extend this program so that it requires the Gap4 I/O routines, the Makefile should be modified to:

include $(SRCROOT)/mk/
include $(SRCROOT)/mk/$(MACHINE).mk
include $(SRCROOT)/mk/


OBJS = $(O)/demo.o $(GAPDB_EXT_OBJS)


$(O)/demo: $(OBJS)
        $(CLD) -o $@ $(OBJS) $(LIBS) $(LIBSC)

If you require an example of a program that utilises the Gap4 I/O functions, see the convert program in `$STADENROOT/src/convert/'.

This page is maintained by staden-package. Last generated on 1 March 2001.