
Database I/O in C

Introduction and Overview

[General notes to go somewhere: It is better to test return codes for success rather than for failure. The failure values vary from function to function (-1, 1, >0, etc.), but most functions return 0 for success.]

Gap4 I/O access from within C consists of several layers. These layers break the tasks down into discrete methods and hide most of the implementation details. For the programmer wishing to extend Gap4, only the higher layers are of interest. Hence the lowest levels are described only briefly.

"g" Level - Raw Database Access

At the bottom of any I/O is the code that actually reads and writes information on disk. In Gap4 this is handled by a library named "g", which contains code for reading, writing, locking and updating the physical database. It does not describe the structures contained in the Gap4 database format itself, but rather provides functions to read and write arbitrary blocks of data. Don't delve into this unless you're feeling brave!

The code for this library is contained within the `src/g' directory. No documentation is currently available on these functions.

"Communication" Level - Interfaces to the "g" Level

This level of code deals with describing the real Gap4 data structures and the interfacing with the g library. Generally this code should not be used.

This code is contained within the `src/gap4' directory and breaks down as follows:

`gap-if.c'
`gap-local.c'
`gap-remote.c'
Interface functions to the g library. These provide support for a local (i.e. compiled in) or remote (unimplemented) database server.
`gap-io.c'
Contains the GAP_READ and GAP_WRITE functions in byte-swapping and non-byte-swapping forms (which one is needed depends on the system architecture). The gap_io_init() function automatically determines the machine's endianness and sets up function pointers to call the correct functions.
`gap-error.c'
Definitions of GAP_ERROR and GAP_ERROR_FATAL functions.
`gap-dbstruct.c'
`gap-create.c'
Functions for creation, initialisation, and copying of database files.
`gap-dbstruct.h'
VERY USEFUL! The definitions of the gap structures that are stored in the database.
`gap-init.c'
Initialises communication with the "g" database server by use of gap_init(), gap_open_server() and gap_shutdown_server() functions.

No documentation is currently available on these functions.

Basic Gap4 I/O

This level contains the basic functions for reading, writing, creation and deletion of the Gap4 structures, such as readings and templates as well as higher level functions built on top of these. It is this level of code that should generally be used by the programmer. The implementation of this level has function code and prototypes spread over a variety of files, but the programmer should only #include the `IO.h' file.

The primary functions are:

`IO.c'
open_db
close_db
del_db
Opening/creation, closing and deletion of databases.
GT_Read, GT_Write, GT_Write_cached
TextRead, TextAllocRead, TextWrite
DataRead, DataWrite
ArrayRead, ArrayWrite
BitmapRead, BitmapWrite
The basic I/O calls. Note that the GT functions handle structures (e.g. GReadings) and the others handle data of the associated type.
io_init_contig
io_init_annotations
io_init_reading
Some functions for initialising new data structures. These in turn call the allocate() function to create new database records.
io_read_seq
io_write_seq
Reads and writes sequence information.
io_read_rd
Fetches the trace type and name values for a reading.
io_read_annotation
io_write_annotation
Reading and writing of annotations (also known as tags).
allocate
deallocate
io_deallocate_reading
Allocation and deallocation of records.
flush2t
Flushes changes back to disk. The various write commands do write their data to disk, but that data is not committed as the up-to-date copy until a flush occurs.
`io_handle.c'
io_handle
handle_io
Converts between a C GapIO pointer and an integer handle which can be passed around in Tcl and Fortran. The integer handle is the form used in the Tcl scripting language.
`io_utils.[ch]'
get_gel_num, lget_gel_num
get_contig_num, lget_contig_num
Converts a single reading identifier, or a list of them, into reading or contig numbers (with start and end ranges).
to_contigs_only
Converts a list of reading identifiers to contig numbers.
get_read_name
get_contig_name
get_vector_name
get_template_name
get_clone_name
Converts a structure number into its textual name.
chain_left
Finds the leftmost reading number in a contig from a given reading number.
rnumtocnum
Converts from a reading number into a contig number.

Other I/O Functions

Still more I/O functions exist that aren't listed under the "Basic Gap4 I/O" header. This is primarily due to code structure rather than any particular grouping based on functionality. Specifically, these functions cannot be linked into "external" applications without a considerable amount of effort.

The file breakdown is as follows.

`IO2.c'
io_complement_seq
Complements, in memory, a sequence and associated structures.
io_insert_seq
io_delete_seq
io_replace_seq
Modifies sequence details in memory.
io_insert_base
io_modify_base
io_delete_base
Modifies a single base in a sequence on the disk.
pad_consensus
Inserts pads to the consensus sequence and all the readings at that point.
io_delete_contig
Removes a contig structure.
`IO3.c'
get_read_info
get_vector_info
get_clone_info
Fetches miscellaneous information for reads (primers, insert size, etc), vectors and clones.
io_get_extension
Returns the right cutoff of a reading. Found by checking the cut points and any vector tags.
io_mod_extension
Modifies the cutoffs of readings.
write_rname
Updates a reading name both in memory and on disk.

Compiling and Linking with Other Programs

To use the Gap4 I/O functions in a program other than Gap4 itself, you will need to compile and link in particular ways, both to pick up the function prototypes and to add the Gap4 functions to your binary. At present, the object files required for database access are not packaged as a library.

The compiler include search path needs adjusting to add the `$STADENROOT/src/gap4' directory and possibly the `$STADENROOT/src/g' directory. Once your own object files are compiled, they need to be linked with the following gap4 object files.

$STADENROOT/src/gap4/$MACHINE-binaries/actf.o
$STADENROOT/src/gap4/$MACHINE-binaries/gap-create.o
$STADENROOT/src/gap4/$MACHINE-binaries/gap-dbstruct.o
$STADENROOT/src/gap4/$MACHINE-binaries/gap-error.o
$STADENROOT/src/gap4/$MACHINE-binaries/gap-if.o
$STADENROOT/src/gap4/$MACHINE-binaries/gap-init.o
$STADENROOT/src/gap4/$MACHINE-binaries/gap-io.o
$STADENROOT/src/gap4/$MACHINE-binaries/gap-local.o
$STADENROOT/src/gap4/$MACHINE-binaries/gap-remote.o
$STADENROOT/src/gap4/$MACHINE-binaries/IO.o
$STADENROOT/src/gap4/$MACHINE-binaries/io_handle.o
$STADENROOT/src/gap4/$MACHINE-binaries/io-reg.o
$STADENROOT/src/gap4/$MACHINE-binaries/io_utils.o
$STADENROOT/src/gap4/$MACHINE-binaries/text-io-reg.o

Finally, a library search path of `$STADENROOT/lib/$MACHINE-binaries' should be used to link the -lg -ltext_utils -lmisc libraries.

All of the above definitions have been added to a single Makefile held in `$STADENROOT/src/mk/gap4_defs.mk' as the GAPDB_EXT_INC, GAPDB_EXT_OBJS and GAPDB_EXT_LIBS variables. When possible, these should be used in preference to hard coding the object filenames, as this protects against future changes to the code. So for example, if we have a program held in the file `demo.c' we could have a simple Makefile as follows.

SRCROOT=$(STADENROOT)/src
include $(SRCROOT)/mk/global.mk
include $(SRCROOT)/mk/$(MACHINE).mk

OBJS = $(O)/demo.o

LIBS = $(MISC_LIB)

$(O)/demo: $(OBJS)
        $(CLD) -o $@ $(OBJS) $(LIBS) $(LIBSC)

If we now extend this program so that it requires the Gap4 I/O routines, the Makefile should be modified to:

SRCROOT=$(STADENROOT)/src
include $(SRCROOT)/mk/global.mk
include $(SRCROOT)/mk/$(MACHINE).mk
include $(SRCROOT)/mk/gap4_defs.mk

INCLUDES_E += $(GAPDB_EXT_INC)

OBJS = $(O)/demo.o $(GAPDB_EXT_OBJS)

LIBS = $(MISC_LIB) $(GAPDB_EXT_LIBS)

$(O)/demo: $(OBJS)
        $(CLD) -o $@ $(OBJS) $(LIBS) $(LIBSC)

If you require an example of a program that utilises the Gap4 I/O functions, see the convert program in `$STADENROOT/src/convert/'.


This page is maintained by staden-package. Last generated on 1 March 2001.
URL: http://www.mrc-lmb.cam.ac.uk/pubseq/manual/scripting_111.html