Index of /pub/blast-1.3/dfa

      Name                    Last modified      Size  Description
Parent Directory - CRN 1992-02-20 08:58 1.2K Makefile 1995-01-27 11:08 1.3K README.orig 1993-06-15 08:02 3.6K accepts.c 1992-08-19 11:51 3.0K check.c 1992-08-19 11:56 1.9K contains.c 1993-04-06 17:44 3.4K dfa.3 1994-01-18 20:17 21K dfa.3ps 1994-01-18 20:17 55K dfa.c 1993-02-08 15:07 32K dfa.h 1994-01-18 20:39 9.8K dfa.pdf 2005-05-12 08:55 30K dfatest.c 1992-07-26 17:20 4.0K dump.c 1992-07-26 17:30 4.3K errstr.c 1994-01-18 20:39 2.6K extent.c 1992-07-26 17:31 2.0K fscanf.c 1992-07-26 17:31 1.8K libinp.msc 1992-02-18 10:28 154 makefile.msc 1992-02-18 10:28 1.2K makefile.msw 1992-02-18 10:28 1.1K nextstat.c 1992-07-26 17:31 2.1K opstate.c 1992-08-19 11:55 1.5K perror.c 1992-07-26 17:31 1.8K pscanf.c 1992-07-26 17:32 2.6K scanf.c 1992-07-26 17:33 1.8K size.c 1992-07-26 17:33 2.1K sscanf.c 1992-07-26 17:33 2.6K
This DFA (deterministic finite-state automata) function library provides a
general purpose facility for searching for one or more fixed- or
variable-length pattern strings expressed in arbitrary, user-defined
alphabets.  The computational complexity (essentially the speed) of searching
for multiple patterns is the same as searching for a single pattern; and
initial construction of the DFA requires time and storage that are no worse
than proportional to the sum of the lengths of the patterns.

The DFA library was used by early versions of BLAST (Altschul et al., 1990)
for biological sequence similarity searching.  As such, the fixed-length
words used by BLAST do not exercise the full potential of the DFA library.

dfa.tar.Z is a compressed UNIX tar archive of all of the files splayed beneath
the explode subdirectory.  A single UNIX command pipeline can be used to
unpack the archive:  zcat dfa.tar | tar xf -

The DFA source code requires header files from the stripped-down ncbi library
(see /pub/ncbi) and my personal header files from gish.h (see /pub/gish).

=================================

CAVEAT EMPTOR:  the documentation file, dfa.3, a UNIX-style manual page,
has not been closely examined for consistency with the source code.

The DFA function library distribution includes the following files:

    README
    Makefile - makefile for SunOS and other UNIXes
    dfa.3 - manual page (nroff -man macros) for section 3 of the manual
    dfa.h - generally used header file
    dfa.c - core functions (dfa_new,dfa_begin,dfa_add,dfa_close)

Additional source files:

    accepts.c
    contains.c
    destruct.c
    dump.c
    check.c
    nextstat.c
    errstr.c
    extent.c
    fscanf.c
    perror.c
    pscanf.c
    scanf.c
    size.c
    sscanf.c

    dfatest.c - a small test program (not even close to being comprehensive!)

When compiling, the C preprocessor variable DFA_THENEED4SPEED can be defined
to omit some error checking for a modest improvement in execution speed.

Warren Gish
September 15, 1991

Modification History
4/6/93
Added a few functions to dfa.c, e.g. dfa_addx().

2/18/92
Several changes made, primarily to accommodate the NCBI standard alphabets
for representation of bio-sequences.

o dfa_open() no longer exists for creating a new dfa structure.  It is replaced
by the function dfa_new(), along with a few ancillary functions.  dfa_new()
has only two arguments, the minimum and maximum letter values (both signed
integers) in the working alphabet.

o the library no longer pre-defines any alphabets.  The specification
of an alphabet is left entirely to the developer.

o when dfa_open() was discarded, so were its special-case, obscurely
defined flags.  A saner, more flexible arrangement now exists in the form
of 2 new functions, dfa_ignore() and dfa_halton(), which are called upon
to indicate letters in the input stream that either should be ignored
or should halt a search.  For instance, when searching a plain text
file being read with getchar(), hyphens might be ignored and EOF (-1)
should halt the search.

o elements in the typedef for the DFA structure have been "opened up"
to the developer in the header file dfa.h, so that they can be directly
referenced without the need to interrogate values via calls to functions
in the dfa library.  As a result, several of the dfa functions could be
and have been discarded, e.g. dfa_statesize() and dfa_nstates() are
unnecessary because references of the form dp->statesize and dp->nstates,
where dp is a DFAPtr, can be made instead.  The private header file dfa_priv.h
has been eliminated.

o source code and documentation are each about 33% smaller.

6/10/92
Corrected the name of dfa_restore() (formerly called dfa_restart)
and corrected some but probably not all inconsistencies in the documentation
files dfa.3 and dfa.3ps.

6/15/92
Modified the DFA_Accept structure in preparation for implementing
variable-length pattern identifiers.

7/7/92
Corrected some of the remaining inaccuracies/errors in the dfa.3 manual page.

7/20/92
Changed the "pattern" argument to dfa_contains() and dfa_accepts()
from CharPtr to unsigned char *.  Fixed bug in dfa_contains().

7/26/92
A few changes in preparation for migrating to the Mac and MS-Windows.