UNIX-compatible source code for the Basic Local Alignment Search Tool (BLAST)
family of sequence database search programs, along with some support utilities,
is posted here.  See the HISTORY file for brief descriptions of changes that
have been made.  Some of the earliest source distributions are archived beneath
the "/pub/blast/archive" directory.

Three separate support libraries are necessary to compile and link the BLAST
programs:  "ncbi", "gish", and "dfa".  Source code for them is posted on this
server in the same directory as the "blast" distribution, in files named
ncbi.tar.Z, gish.tar.Z, and dfa.tar.Z, respectively.  Exploded views of the
contents of these tar files are present beneath the /pub/ncbi, /pub/gish, and
/pub/dfa directories.

The file blast.tar.Z is a UNIX tar archive, compressed with the Lempel-Ziv
"compress" utility, that contains all of the files laid out beneath the
"explode" subdirectory.  FTP this file to your
local computer in binary mode, uncompress it, then untar it.  Do the same for
the three support libraries mentioned above.  (VMS-compatible versions of the
compress and tar utilities are posted on this server in the /toolbox/vms_util
directory.  Note, however, that the BLAST software itself is UNIX-compatible
only.)
Alternatively, a single UNIX command pipeline unpacks the archive directly:
 zcat blast.tar.Z | tar xf -

Building of the BLAST software should begin with the "ncbi" library, then the
"gish" library, the "dfa" library and, finally, the "blast" software itself.
For more precise installation instructions, see the README or INSTALL files
included with each library.  Some pre-built versions of the BLAST programs are
available beneath the /pub/blast/executables directory; however, it is not
possible to keep these programs up to date for all platforms.
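
As a minimal sketch, the unpacking step can be applied to all four archives in
the build order given above.  It assumes the four .tar.Z files have already
been transferred in binary mode into the current directory; the function name
unpack_all is purely illustrative and is not part of the distribution.

```shell
#!/bin/sh
# Sketch only: unpack the three support libraries, then BLAST itself,
# in the recommended build order (ncbi, gish, dfa, blast).
# Assumption: the .tar.Z archives sit in the current directory.
# unpack_all is a hypothetical helper name, not a distributed script.
unpack_all() {
    for pkg in ncbi gish dfa blast; do
        archive="$pkg.tar.Z"
        # Stop early if an archive has not been downloaded yet.
        if [ ! -f "$archive" ]; then
            echo "missing $archive" >&2
            return 1
        fi
        echo "unpacking $archive"
        # Decompress to standard output and let tar read from it.
        zcat "$archive" | tar xf - || return 1
    done
}
```

Running unpack_all in the download directory unpacks each archive in turn.
Building each package is a separate step; follow the README or INSTALL file
inside each unpacked library.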

Please send bug reports or requests for electronic mail distribution to:

  Dr. Warren Gish, gish *AT* ncbi.nlm.nih.gov
        or
  Dr. Stephen Altschul, altschul *AT* ncbi.nlm.nih.gov

  National Center for Biotechnology Information
  National Library of Medicine
  Bldg. 38A Rm 8N-806
  8600 Rockville Pike
  Bethesda, MD 20894-0001
  (301) 496-2475


The people who played a role in bringing this fine software to you:

  Samuel Karlin, Dept. of Mathematics, Stanford Univ., Stanford, CA 94305
  Stephen Altschul, NCBI, NLM, Bethesda, MD 20894
  Webb Miller, Dept. of CS, Penn. State Univ., University Park, PA 16802
  Gene Myers, Dept. of CS, Univ. of Arizona, Tucson, AZ 85721
  Warren Gish, NCBI, NLM, Bethesda, MD 20894
  David Lipman, NCBI, NLM, Bethesda, MD 20894


Some of the substitution matrix files beneath the "matrix" directory were
adapted from:

  Dayhoff, M. O., R. M. Schwartz and B. C. Orcutt.  in Atlas of protein
  sequence and structure.  Vol. 5, Suppl. 3, Ed. M. O. Dayhoff (1978).

  Gonnet, G. H., M. A. Cohen and S. A. Benner.  Exhaustive matching
  of the entire protein sequence database.  Science 256:1443-1445 (1992).

  Henikoff, S. and J. G. Henikoff.  Amino acid substitution matrices from
  protein blocks.  PNAS 89:10915-10919 (1992).

** This is not the official distribution point for these substitution matrices.
The authors may have revised versions, or versions that differ from those
provided here.  For instance, the BLOSUM matrices are currently posted by the
Henikoffs in the NCBI Data Repository on ncbi.nlm.nih.gov beneath the
/repository/blocks/blosum directory; and the original matrix published by
Gonnet et al. actually contains decimal fractions, which have been rounded to
the nearest integers in the matrix provided here. **


Brief descriptions of the BLAST programs and utilities:

blastp:  compare an amino acid query sequence against a protein sequence
database.

blastn:  compare a nucleotide query sequence against a nucleotide sequence
database.

blastx:  compare a nucleotide query sequence translated in all 6 reading
frames (3 on each strand) against a protein sequence database.

tblastn:  compare an amino acid query sequence against a nucleotide sequence
database translated in all 6 reading frames.

blast3:  compare an amino acid query sequence against a protein sequence
database to identify statistically significant 3-way sequence alignments
(the query sequence plus two database sequences) in which the component
pairwise alignments are statistically insignificant.

setdb:  produce a protein sequence database for use by blastp, blastx,
and blast3 from a multi-sequence file in FASTA format.

pressdb:  produce a nucleotide sequence database for use by blastn and
tblastn from a multi-sequence file in FASTA format.

pam:  generate a PAM matrix of any desired distance (from 2 to 511) and scale.

pir2fasta:  produce a file in FASTA format from one in NBRF PIR(R) format.

gb2fasta:  produce a file in FASTA format from one in GenBank(R) flat file
           format.

sp2fasta:  produce a file in FASTA format from one in SWISS-PROT(R)
           or EMBL flat file format.

gt2fasta:  produce a file in FASTA format from the CDS (coding sequence)
           features in a GenBank(R) flat file.

memfile:  manage the loading, updating, and dropping of files mapped into
shared memory segments.


Other files:

    blast.1:  UNIX style manual page (using nroff's -man macros) describing
    blastp, blastn, blastx, and tblastn.

    blast.1ps:  PostScript(R) version of blast.1

    blast3.1:  UNIX style manual page describing blast3.

    blast3.1ps:  PostScript version of blast3.1

    pir2fasta.nawk:  a nawk (new awk) script for converting NBRF PIR files
    into FASTA format

    gb2fasta.nawk:  a nawk (new awk) script for converting GenBank files
    into FASTA format

    dxch.aa:  a sample protein sequence from the PIR in FASTA format

Modification history:
    This information has been moved into the file named HISTORY.