UNIX-compatible source code for the Basic Local Alignment Search Tool
(BLAST) family of sequence database search programs, along with some
support utilities, is posted here. See the HISTORY file for brief
descriptions of changes that have been made. Some of the earliest source
distributions are archived beneath the "/pub/blast/archive" directory.

Three separate support libraries are necessary to compile and link the
BLAST programs: "ncbi", "gish", and "dfa". Source code for them is posted
on this server in the same directory as the "blast" distribution, in files
named ncbi.tar.Z, gish.tar.Z, and dfa.tar.Z, respectively. Exploded views
of the contents of these tar files are present beneath the /pub/ncbi,
/pub/gish, and /pub/dfa directories.

The file blast.tar.Z is an L-Z compressed UNIX tar archive containing all
of the files splayed beneath the "explode" subdirectory. FTP this file to
your local computer in binary mode, uncompress it, then untar it. Do the
same for the three support libraries mentioned above. (VMS-compatible
versions of the compress and tar utilities are posted on this server in
the /toolbox/vms_util directory. Note, however, that the BLAST software
itself is only UNIX-compatible.)

A single UNIX command pipeline can be executed to unpack the archive:

    zcat blast.tar.Z | tar xf -

Building of the BLAST software should begin with the "ncbi" library, then
the "gish" library, then the "dfa" library and, finally, the "blast"
software itself. For more precise installation instructions, see the
README or INSTALL files included with each library.

Some pre-built versions of the BLAST programs are available beneath the
/pub/blast/executables directory; however, it is not possible to keep
these programs up to date for all platforms.

Please send bug reports or requests for electronic mail distribution to:

    Dr. Warren Gish, gish *AT* ncbi.nlm.nih.gov
    or
    Dr. Stephen Altschul, altschul *AT* ncbi.nlm.nih.gov

    National Center for Biotechnology Information
    National Library of Medicine
    Bldg. 38A Rm 8N-806
    8600 Rockville Pike
    Bethesda, MD 20894-0001
    (301) 496-2475

The people who played a role in bringing this fine software to you:

    Samuel Karlin, Dept. of Mathematics, Stanford Univ., Stanford, CA 94305
    Stephen Altschul, NCBI, NLM, Bethesda, MD 20894
    Webb Miller, Dept. of CS, Penn. State Univ., University Park, PA 16802
    Gene Myers, Dept. of CS, Univ. of Arizona, Tucson, AZ 85721
    Warren Gish, NCBI, NLM, Bethesda, MD 20894
    David Lipman, NCBI, NLM, Bethesda, MD 20894

Some of the substitution matrix files beneath the "matrix" directory were
adapted from:

    Dayhoff, M. O., R. M. Schwartz and B. C. Orcutt. In Atlas of Protein
    Sequence and Structure. Vol. 5, Suppl. 3, Ed. M. O. Dayhoff (1978).

    Gonnet, G. H., M. A. Cohen and S. A. Benner. Exhaustive matching of
    the entire protein sequence database. Science 256:1443-1445 (1992).

    Henikoff, S. and J. G. Henikoff. Amino acid substitution matrices from
    protein blocks. PNAS 89:10915-10919 (1992).

** This is not the official distribution point for these substitution
matrices. The authors may have versions that are revised or otherwise
different from those provided here. For instance, the BLOSUM matrices are
currently posted by the Henikoffs in the NCBI Data Repository on
ncbi.nlm.nih.gov beneath the /repository/blocks/blosum directory; and the
original matrix published by Gonnet et al. contains decimal fractions,
which have been rounded to the nearest integers in the matrix provided
here. **

Brief descriptions of the blast programs and utilities:

    blastp:     compare an amino acid query sequence against a protein
                sequence database.

    blastn:     compare a nucleotide query sequence against a nucleotide
                sequence database.

    blastx:     compare a nucleotide query sequence translated in all 6
                reading frames (3 on each strand) against a protein
                sequence database.

    tblastn:    compare an amino acid query sequence against a nucleotide
                sequence database translated in all 6 reading frames.

    blast3:     compare an amino acid query sequence against a protein
                sequence database to identify statistically significant
                3-way sequence alignments (the query sequence plus two
                database sequences) in which the component pairwise
                alignments are statistically insignificant.

    setdb:      produce a protein sequence database for use by blastp,
                blastx, and blast3 from a multi-sequence file in FASTA
                format.

    pressdb:    produce a nucleotide sequence database for use by blastn
                and tblastn from a multi-sequence file in FASTA format.

    pam:        generate a PAM matrix of any desired distance (from 2 to
                511) and scale.

    pir2fasta:  produce a file in FASTA format from one in NBRF PIR(R)
                format.

    gb2fasta:   produce a file in FASTA format from one in GenBank(R) flat
                file format.

    sp2fasta:   produce a file in FASTA format from one in SWISS-PROT(R)
                or EMBL flat file format.

    gt2fasta:   produce a file in FASTA format from the CDS (coding
                sequence) features in a GenBank(R) flat file.

    memfile:    manage the loading, updating, and dropping of files mapped
                into shared memory segments.

Other files:

    blast.1:         UNIX style manual page (using nroff's -man macros)
                     describing blastp, blastn, blastx, and tblastn.

    blast.1ps:       PostScript(R) version of blast.1.

    blast3.1:        UNIX style manual page describing blast3.

    blast3.1ps:      PostScript version of blast3.1.

    pir2fasta.nawk:  a nawk (new awk) script for converting NBRF PIR files
                     into FASTA format.

    gb2fasta.nawk:   a nawk (new awk) script for converting GenBank files
                     into FASTA format.

    dxch.aa:         a sample protein sequence from the PIR in FASTA
                     format.

Modification history: this information has been moved into the file named
HISTORY.
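
Unpacking sketch: the unpacking instructions earlier in this file (one zcat/tar pipeline per archive, handled in the order ncbi, gish, dfa, blast) could be wrapped in a small shell helper along the following lines. This is only a sketch under stated assumptions: the names PACKAGES, unpack_cmd, and unpack_all are hypothetical and not part of the distribution, all four .tar.Z files are assumed to sit in one directory, and the local zcat is assumed to understand L-Z compressed (.Z) files. It covers unpacking only; for the build itself, follow the README or INSTALL file in each library.

```shell
# Order given in this README: ncbi first, then gish, dfa, and finally blast.
# (PACKAGES, unpack_cmd and unpack_all are illustrative names only.)
PACKAGES="ncbi gish dfa blast"

# Print the pipeline that unpacks one archive, e.g. "unpack_cmd ncbi".
unpack_cmd() {
    printf 'zcat %s.tar.Z | tar xf -\n' "$1"
}

# Unpack every archive found in directory $1 (default: current directory).
# A missing archive is reported on stderr and skipped; a failed pipeline
# stops the loop and makes the function return non-zero.
unpack_all() {
    dir=${1:-.}
    for pkg in $PACKAGES; do
        if [ -f "$dir/$pkg.tar.Z" ]; then
            echo "unpacking $pkg.tar.Z"
            (cd "$dir" && zcat "$pkg.tar.Z" | tar xf -) || return 1
        else
            echo "skipping $pkg.tar.Z (not found in $dir)" >&2
        fi
    done
}
```

After defining these functions, running "unpack_all /path/to/downloads" would unpack whichever of the four archives are present in that directory, and "unpack_cmd blast" simply prints the pipeline so it can be inspected before anything is run.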