Index of /pub/blast-1.3/dfa

      Name                    Last modified      Size  Description
Parent Directory - CRN 20-Feb-1992 08:58 1.2K Makefile 27-Jan-1995 11:08 1.3K README.orig 15-Jun-1993 08:02 3.6K accepts.c 19-Aug-1992 11:51 3.0K check.c 19-Aug-1992 11:56 1.9K contains.c 06-Apr-1993 17:44 3.4K dfa.3 18-Jan-1994 20:17 21K dfa.3ps 18-Jan-1994 20:17 55K dfa.c 08-Feb-1993 15:07 32K dfa.h 18-Jan-1994 20:39 9.8K dfa.pdf 12-May-2005 08:55 30K dfatest.c 26-Jul-1992 17:20 4.0K dump.c 26-Jul-1992 17:30 4.3K errstr.c 18-Jan-1994 20:39 2.6K extent.c 26-Jul-1992 17:31 2.0K fscanf.c 26-Jul-1992 17:31 1.8K libinp.msc 18-Feb-1992 10:28 154 makefile.msc 18-Feb-1992 10:28 1.2K makefile.msw 18-Feb-1992 10:28 1.1K nextstat.c 26-Jul-1992 17:31 2.1K opstate.c 19-Aug-1992 11:55 1.5K perror.c 26-Jul-1992 17:31 1.8K pscanf.c 26-Jul-1992 17:32 2.6K scanf.c 26-Jul-1992 17:33 1.8K size.c 26-Jul-1992 17:33 2.1K sscanf.c 26-Jul-1992 17:33 2.6K
This DFA (deterministic finite-state automata) function library provides a
general purpose facility for searching for one or more fixed- or
variable-length pattern strings expressed in arbitrary, user-defined
alphabets.  The computational complexity (essentially the speed) of searching
for multiple patterns is the same as searching for a single pattern; and
initial construction of the DFA requires time and storage that are no worse
than proportional to the sum of the lengths of the patterns.

The DFA library was used by early versions of BLAST (Altschul et al., 1990)
for biological sequence similarity searching.  As such, the fixed-length
words used by BLAST do not exercise the full potential of the DFA library.

dfa.tar.Z is a compressed UNIX tar archive of all of the files splayed beneath
the explode subdirectory.  A single UNIX command pipeline can be used to
unpack the archive:  zcat dfa.tar | tar xf -

The DFA source code requires header files from the stripped-down ncbi library
(see /pub/ncbi) and my personal header files from gish.h (see /pub/gish).


CAVEAT EMPTOR:  the documentation file, dfa.3, a UNIX-style manual page,
has not been closely examined for consistency with the source code.

The DFA function library distribution includes the following files:

    Makefile - makefile for SunOS and other UNIXes
    dfa.3 - manual page (nroff -man macros) for section 3 of the manual
    dfa.h - generally used header file
    dfa.c - core functions (dfa_new,dfa_begin,dfa_add,dfa_close)

Additional source files:


    dfatest.c - a small test program (not even close to being comprehensive!)

When compiling, the C preprocessor variable DFA_THENEED4SPEED can be defined
to omit some error checking for a modest improvement in execution speed.

Warren Gish
September 15, 1991

Modification History
Added a few functions to dfa.c, e.g. dfa_addx().

Several changes made, primarily to accommodate the NCBI standard alphabets
for representation of bio-sequences.

o dfa_open() no longer exists for creating a new dfa structure.  It is replaced
by the function dfa_new(), along with a few ancillary functions.  dfa_new()
has only two arguments, the minimum and maximum letter values (both signed
integers) in the working alphabet.

o the library no longer pre-defines any alphabets.  The specification
of an alphabet is left entirely to the developer.

o when dfa_open() was discarded, so were its special-case, obscurely
defined flags.  A saner, more flexible arrangement now exists in the form
of 2 new functions, dfa_ignore() and dfa_halton(), which are called upon
to indicate letters in the input stream that either should be ignored
or should halt a search.  For instance, when searching a plain text
file being read with getchar(), hyphens might be ignored and EOF (-1)
should halt the search.

o elements in the typedef for the DFA structure have been "opened up"
to the developer in the header file dfa.h, so that they can be directly
referenced without the need to interrogate values via calls to functions
in the dfa library.  As a result, several of the dfa functions could be
and have been discarded, e.g. dfa_statesize() and dfa_nstates() are
unnecessary because references of the form dp->statesize and dp->nstates,
where dp is a DFAPtr, can be made instead.  The private header file dfa_priv.h
has been eliminated.

o source code and documentation are each about 33% smaller.

Corrected the name of dfa_restore() (formerly called dfa_restart)
and corrected some but probably not all inconsistencies in the documentation
files dfa.3 and dfa.3ps.

Modified the DFA_Accept structure in preparation for implementing
variable-length pattern identifiers.

Corrected some of the remaining inaccuracies/errors in the dfa.3 manual page.

Changed the "pattern" argument to dfa_contains() and dfa_accepts()
from CharPtr to unsigned char *.  Fixed bug in dfa_contains().

A few changes in preparation for migrating to the Mac and MS-Windows.