Basic Local Alignment Search Tool
   from Advanced Biocomputing, LLC


Description

AB-BLAST 3.0 is a powerful software package for gene and protein identification, using sensitive, selective and rapid similarity searches of protein and nucleotide sequence databases. The feature list for AB-BLAST is long and continues to expand, while performance is improved. Much of this is outlined below. A complete suite of BLAST search programs (blastp, blastn, blastx, tblastn and tblastx) is provided in the package, along with several database management and support programs that include nrdb, patdb, xdformat, xdget, seg, dust and xnu.

AB-BLAST has been built to be the most trusted database search tool in your software toolbox, doing what you tell it, reporting precisely what it’s doing — even telling you what it could not do because of specific parameter restrictions you might wish to change — and able to handle even your biggest jobs with aplomb. Users of other BLAST implementations have suffered every few years through a series of expensive and time-consuming rewrites, alpha releases, beta releases, format changes, specialized spin-off programs, and bewildering arrays of new program parameters, options, behaviors and interactions. Meanwhile AB-BLAST was built from scratch to offer consistently superior performance and flexibility — combined with painstaking effort by the developer to ensure near-absolute backward compatibility with every AB-BLAST release for over 17 years.

AB-BLAST represents the most rigorous, sensitive implementation of BLAST available, yet it typically runs faster than the rest. AB-BLAST has a simple, easy-to-use command line structure; offers consistent behavior across all search modes; runs on general purpose computer hardware; can uniquely categorize and filter results based on biological criteria; and much more. All of these features help AB-BLAST users be more productive and save money.

AB-BLAST is not a re-hashed version of NCBI-BLAST. AB-BLAST shares virtually no code with NCBI-BLAST except for some portions that both packages copied from the public domain, ungapped BLAST version 1.4 released in 1994 (W. Gish, unpublished). A brief history of AB-BLAST development is available here.


Licensing

Please see https://blast.advbiocomp.com/licensing/ for complete licensing information.


Key Features

Some of the key features of AB-BLAST are described below.

To support XDF databases, the database formatting tool named xdformat is provided with AB-BLAST. Among other distinct capabilities and advantages to using XDF and xdformat are:

A reverse chronological list of changes to the AB-BLAST software is available in the file named HISTORY that comes bundled with the software. When possible, any bugs that have been found have typically been fixed within 24 hours of their being reported.

Please send us bug reports, questions, or suggestions.




Manifest

The AB-BLAST 3.0 package includes the following data analysis and utility programs:


AB-BLAST Command Line Options and Parameters

A complete list of command line options and parameters for modifying the behavior of the AB-BLAST search programs is available here.


Comparable AB/NCBI BLAST Parameters

A brief comparison of some of the most important parameters for controlling the sensitivity, selectivity and speed of AB-BLAST and NCBI BLAST is available here.


Environment Variables

AB-BLAST can utilize the settings of a few environment variables to adapt its behavior to different computing environments: BLASTDB, BLASTFILTER and BLASTMAT. To allow for triple AB/WU/NCBI BLAST installations, AB-BLAST also supports the environment variables ABBLASTDB, ABBLASTFILTER and ABBLASTMAT, as well as WUBLASTDB, WUBLASTFILTER and WUBLASTMAT. Settings of the AB versions of these variables take precedence over all others, and WU variable settings take precedence over the corresponding base name variables.
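The precedence rule can be sketched in a few lines of Python. This is only an illustration of the order described above (AB first, then WU, then the base name); the function name resolve_blast_setting is hypothetical, and treating an empty value the same as an unset one is an assumption.

```python
import os

# Prefixes in decreasing order of precedence: AB, then WU, then the base name.
PREFIXES = ("AB", "WU", "")


def resolve_blast_setting(base_name, environ=None):
    """Return the value of the highest-precedence variable that is set, else None.

    base_name is one of "BLASTDB", "BLASTFILTER" or "BLASTMAT".
    """
    env = os.environ if environ is None else environ
    for prefix in PREFIXES:
        value = env.get(prefix + base_name)
        if value:  # assumption: an empty value is treated like an unset one
            return value
    return None
```

With all three of BLASTDB, WUBLASTDB and ABBLASTDB set, this sketch returns the ABBLASTDB value.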

In AB-BLAST, the BLASTDB (or ABBLASTDB) environment variable can be a list of one or more directory names in which the programs are to look for database files. In UNIX parlance, such an environment variable might be called a path for the database files. Directory names should be delimited from one another by a colon (“:”) and listed in the order that they should be searched. If the BLASTDB environment variable is not set, the programs use a default path of .:/usr/ncbi/blast/db, such that the programs first look in the current working directory (“.”) for the requested database and then look in the /usr/ncbi/blast/db directory. For backward compatibility with programs that expect BLASTDB to be a single directory specification and not a path, if the user has set a value for BLASTDB but omitted the current working directory, AB-BLAST will still look for database files in the current working directory as a last resort. This usage is unchanged from NCBI/WU BLAST version 1.4 (1994), except multiple directories could be specified with the BLASTDB variable beginning with WU-BLAST 2.0 ca. 1997.
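As a rough model of the search order just described, the following Python sketch turns a BLASTDB value into an ordered list of directories. The function name is hypothetical, and the sketch assumes the current working directory is recognized only when spelled as "."; how the programs themselves detect it is not stated here.

```python
DEFAULT_DB_PATH = ".:/usr/ncbi/blast/db"


def database_search_dirs(blastdb=None):
    """Return the directories to search, in order, for a given BLASTDB value."""
    if not blastdb:
        # Variable not set: current working directory first, then the default.
        return DEFAULT_DB_PATH.split(":")
    dirs = [d for d in blastdb.split(":") if d]
    if "." not in dirs:
        # Backward compatibility: the current working directory is still
        # consulted as a last resort when the user omitted it.
        dirs.append(".")
    return dirs
```

For example, a single-directory setting such as /data/db yields /data/db followed by the current working directory.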

The BLASTFILTER (or ABBLASTFILTER) environment variable can be set to the directory containing the sequence filter programs, such as seg and xnu. The default directory for the filter programs is /usr/ncbi/blast/filter. This usage is unchanged from NCBI/WU BLAST version 1.4.

The BLASTMAT (or ABBLASTMAT) environment variable can be set to the parent directory for all scoring matrix files. The default directory for these files is /usr/ncbi/blast/matrix, beneath which are expected nt and aa subdirectories for storing scoring matrix files for nucleotide and amino acid alphabets, respectively. This usage is unchanged from NCBI/WU-BLAST version 1.4.

For more information about environment variables, see the Installation instructions.


Filters and Masks

AB-BLAST provides a highly flexible means of applying both “hard” and “soft” masks to a query sequence, and of supporting alternative, user-defined filter programs as well as non-standard parameters to the standard filters. The filter (for hard masking) and wordmask (for soft masking) command line options provide the basic interface. Multiple specifications of each type are acceptable on the BLAST command line. Furthermore, individual filter and wordmask specifications may consist of entire pipelines of commands.

For example, three filters are used in succession by this pipeline:

      filter="myfilter1 | myfilter2 | myfilter3 -x5 -"

The first two filters in this case expect to read their input from UN*X standard input (also known as stdin), whereas myfilter3 apparently needs to be told to read data from stdin, using the usual “-” or hyphen argument. The standard output (stdout) from myfilter1 will be read via stdin by myfilter2, which in turn processes the query before handing its results to myfilter3; finally, myfilter3 reports its results to stdout, which the BLAST program itself reads to obtain the fully masked sequence. The final output from the filter pipeline is expected by the BLAST program to be in FASTA format.

Instead of running all 3 filters in the above example as part of one pipeline, they could instead be specified as three separate filter options like this:

    filter=myfilter1  filter=myfilter2  filter="myfilter3 -x5 -"

The same choice of running as a pipeline or running separately is available for wordmasks, too. Naturally, the two approaches can also be combined on the same command line. An advantage to using the pipeline approach is that all 3 filters in the example above may complete a little bit faster, because much of the I/O overhead is avoided. Furthermore, when used in the pipeline, there is no requirement that the output from myfilter1 and myfilter2 actually be in FASTA format. Those two programs could potentially pass any information between themselves and to myfilter3. The only absolute requirements are that the first filter in the pipeline, myfilter1, must read FASTA data from stdin, and the last filter in the pipeline, myfilter3, must output FASTA data (that is also of the same length as the query!) to stdout.
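When developing a custom filter, it can help to check the two requirements on the final output (FASTA format, same length as the query) before handing the filter to BLAST. The Python sketch below is one way to do that; the function names are hypothetical, and it assumes a single-record query.

```python
def read_single_fasta(text):
    """Parse one FASTA record and return (header, sequence)."""
    lines = [ln.strip() for ln in text.strip().splitlines()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("not FASTA: missing '>' header line")
    header = lines[0][1:]
    sequence = "".join(lines[1:])
    return header, sequence


def filter_output_is_acceptable(query_fasta, filtered_fasta):
    """True if the filtered text is FASTA and as long as the query sequence."""
    try:
        _, query_seq = read_single_fasta(query_fasta)
        _, masked_seq = read_single_fasta(filtered_fasta)
    except ValueError:
        return False
    return len(query_seq) == len(masked_seq)
```

The check is deliberately limited to what the text requires of the last filter in a pipeline; intermediate filters may exchange any format they like.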

It should be noted that with some filter programs, passing the query sequence sequentially through a pipeline of filters may yield a different result than processing the query independently with each filter and OR-ing the results. The script seg+xnu included in the filter/ directory provides an example with which to test this. Specifying filter=seg+xnu on the BLAST command line invokes a seg and xnu pipeline that is built-in to the search programs; whereas specifying filter="seg+xnu -" causes the seg+xnu script to be invoked on the query, which independently executes seg and xnu, then logically “ORs” the results with the pmerge utility program. (The echofilter option can be used to see the results of filtering displayed in search program output). The built-in seg+xnu pipeline is historically the way these two filters have been invoked together, but the somewhat slower method employed by the seg+xnu script with pmerge may be more desirable.
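The logical OR of two independently produced masks can be illustrated with a short Python sketch. This is not the pmerge implementation; it simply assumes that a position counts as masked whenever a filter's output differs from the original residue, and the function name and the mask character "X" are illustrative choices.

```python
def or_masks(original, masked_a, masked_b, mask_char="X"):
    """Mask every position that either filter masked."""
    if not (len(original) == len(masked_a) == len(masked_b)):
        raise ValueError("all three sequences must have the same length")
    merged = []
    for orig, a, b in zip(original, masked_a, masked_b):
        # A position is treated as masked when either output differs
        # from the original residue at that position.
        was_masked = (a != orig) or (b != orig)
        merged.append(mask_char if was_masked else orig)
    return "".join(merged)
```

In a sequential pipeline, by contrast, the second filter sees the sequence already masked by the first, which is why the two approaches may give different results.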


Precomputed Statistical Parameters

Nucleotide Scoring Systems

Precomputed values for λ, K and H are available for BLASTN searches with the following match,mismatch (M,N) scoring systems, using the sets of gap penalties {Q,R}:


Precomputed Nucleotide Scoring Systems
M     N      {Q,R}
+1    −3     {3,3}  {3,2}  {3,1}  {7,2}
+1    −2     {2,2}  {2,1}  {1,1}
+3    −5     {10,5}  {6,3}  {5,5}
+4    −5     {10,5}
+1    −1     {3,1}  {2,1}
+5    −4     {20,10}  {10,10}
+5    −11    {22,22}  {22,11}  {12,2}  {11,11}
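For scripts that assemble BLASTN command lines, the table above can be kept as a small lookup. The Python sketch below only transcribes the listed combinations; the names are hypothetical, and it says nothing about what the search programs do when a combination is not listed.

```python
# (M, N) -> set of (Q, R) gap penalty pairs, transcribed from the table above.
PRECOMPUTED_BLASTN = {
    (1, -3): {(3, 3), (3, 2), (3, 1), (7, 2)},
    (1, -2): {(2, 2), (2, 1), (1, 1)},
    (3, -5): {(10, 5), (6, 3), (5, 5)},
    (4, -5): {(10, 5)},
    (1, -1): {(3, 1), (2, 1)},
    (5, -4): {(20, 10), (10, 10)},
    (5, -11): {(22, 22), (22, 11), (12, 2), (11, 11)},
}


def has_precomputed_stats(m, n, q, r):
    """True if the table lists gap penalties {q, r} for match m, mismatch n."""
    return (q, r) in PRECOMPUTED_BLASTN.get((m, n), set())
```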


Precomputed values are also available for a Purine-Pyrimidine scoring matrix named “pupy”:

PuPy Matrix
Q     R
20    10
10    10


Protein Scoring Systems

Precomputed values for λ, K and H are available for protein-level searches (BLASTP, BLASTX, TBLASTN and TBLASTX) with the following scoring matrix and gap penalty combinations (or gap penalty ranges for R) {Q, R}:


BLOSUM50
Q     R
16    1–4
15    1–4, 6, 8
14    1–5, 8
13    1–5, 8
12    2–5, 7
11    2–4, 6, 8
10    2–6, 8
9     3–5, 7
8     4–8
7     6, 7


BLOSUM55
Q     R
16    1–4
15    1–4, 6, 8
14    1–5, 7
13    2–5, 8
12    2–5, 8
11    2–6, 8
10    3–6, 9
9     3–5, 7
8     4–8
7     7


BLOSUM62
Q     R
12    1–3
11    1–3
10    1–4
9     1–5
8     2–7
7     2–6
6     3–5
5     5


BLOSUM80
Q     R
12    2–12
11    2–11
10    2–10
9     3–9
8     4–8
7     5–7


PAM40
Q     R
12    1, 2, 6
11    1, 2, 7
10    1–3, 7
9     1–3, 6
8     1–4
7     1–4
6     2–5
5     2–5
4     3, 4


PAM120
Q     R
12    1, 2, 4
11    1–3
10    1–3, 5
9     1–3, 5
8     1–4, 6
7     2–4, 6
6     2–5
5     3–5


PAM250
Q     R
16    1–4
15    1–5
14    1–6
13    1–6
12    2–7
11    2–7
10    3–8
9     3–7
8     5–7
7     7

Bugs

AB-BLAST is certainly not bug-free, but historically bugs have typically been fixed within 24 hours of being reported. The currently known bugs are:

If you think you might be experiencing the effects of a bug, please contact us.

AB-BLAST exhibits a few different behaviors worth mentioning here, because they could trip up or confuse even the most knowledgeable of BLAST users. Any unexpected behavior might rightfully be construed as being a bug, so the following information is provided here in the Bugs section to help avoid the unexpected. If you should encounter problems or confusing areas other than those described below, or if you have questions or suggestions for improvement, please send them to us.


Supported Platforms for Standard & Enterprise Editions

The computing platforms currently supported for AB-BLAST Standard Edition and Enterprise Edition are listed below. (The list of platforms supported for AB-BLAST Personal Edition is much shorter). Software for computing platforms other than those listed here may be available upon request, but additional charges may apply.

The list of supported platforms is subject to change without notice.
Multiple processors (multithreading or parallel processing) are effectively and efficiently supported by AB-BLAST on all of the above platforms. AB-BLAST also supports large files (files greater than 2 GB in size), when the host operating system and file system support large files.

Installation

To install AB-BLAST, the first step is to download the UN*X tar archive of executable files appropriate for your computing platform from the Advanced Biocomputing, LLC website. To locate the software, licensed users will have received a confidential URL via e-mail. Please note that scoring matrix files and documentation, which are not generally platform-specific, are nevertheless included in each package. No databases are included, however.

Unpack the tar archive in a new, empty directory. For convenience, precompiled and optimized versions of the low-complexity sequence filters (e.g., seg, xnu, and dust) are included (see the filter/ subdirectory that gets created), along with two sequence redundancy removal programs nrdb and patdb.

Users of Mac OS X 10.6 and 10.7 (“Snow Leopard” and “Lion”) Only
To ensure proper, complete unpacking of tar archives on normal, case-insensitive HFS+ file systems, use the Terminal app to execute the command:

          gnutar zxf archive.tar.gz

where archive.tar.gz is substituted with the name of the AB-BLAST archive you downloaded. The use of gnutar is needed to avoid a bug in the version of tar currently distributed with Snow Leopard (at least up to version 10.6.1) that involves the treatment of hard-linked files. If your web browser uncompressed the archive after downloading, the file will lack the .gz extension, in which case the “z” should be omitted from the gnutar command.

The executable programs from the tar archive may be moved as desired into any directory listed in the PATH environment variable, whether this means adding the newly created directory to the PATH or moving the executables into an existing directory already listed in the PATH. (Lots of information about interrogating and setting environment variables — and about the PATH environment variable itself — can be found in Google and other search engines using the query “path environment variable”). If the software is installed in a directory that was already listed in the PATH, it may be necessary to exit the currently open shell and open a new one in order for the shell to recognize the existence of the newly installed programs.

Note that the files blastp, blastn, blastx, tblastn and tblastx are actually “hard links” to the same executable program, blasta, that encodes the integrated capabilities of all 5 search methods. If desired, the links can be renamed, as long as the original names appear as substrings within the new names. Alphabetic case is unimportant. For instance, a link named ab-blastp will still invoke blasta in its blastp operational mode.
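The name-based dispatch can be modeled as a case-insensitive substring test. The Python sketch below is an illustration only, not the logic inside blasta: it assumes the longer names are tested first, since "tblastn" and "tblastx" themselves contain "blastn" and "blastx", and the function name is hypothetical.

```python
import os

# Longer names first, because "tblastn" and "tblastx" contain
# "blastn" and "blastx" as substrings.
MODES = ("tblastn", "tblastx", "blastp", "blastn", "blastx")


def mode_from_program_name(argv0):
    """Return the search mode implied by the invoked name, or None."""
    name = os.path.basename(argv0).lower()  # alphabetic case is unimportant
    for mode in MODES:
        if mode in name:
            return mode
    return None
```

Under this model a link named ab-blastp maps to blastp, while the bare name blasta maps to no mode at all.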

A Note to Mac OS X Users
AB-BLAST software is intended to be invoked via a CLI (command line interface). Programs will need to be invoked either using the Terminal application (located in the /Applications/Utilities folder) or from within a script or other application provided by a third party. The programs bundled with AB-BLAST are not themselves intended to be double-clicked to execute.

A Note About File Permissions and File Copying
The AB-BLAST package is copyrighted and only available under license. To help ensure users of the software do not unintentionally copy or distribute it, all copies of binary files are recommended to be maintained with execute-only permissions. As delivered in the software archives from Advanced Biocomputing, LLC, execute-only permissions have already been set, but if the binary files should be copied by you, these permissions may become altered and thus allow other users to then copy the software in an unauthorized manner. Restoration of execute-only permissions to an executable program file can be accomplished by running the command:

    chmod 0111 filename

where filename is the name of the executable file.
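Where permissions need to be restored or audited from a script, the same operation can be sketched in Python with the standard os and stat modules. The function names are hypothetical; the mode set is the same 0111 used by the chmod command above.

```python
import os
import stat


def set_execute_only(path):
    """Equivalent of `chmod 0111 path`: execute for user, group and other."""
    os.chmod(path, stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


def is_execute_only(path):
    """True if the permission bits of path are exactly 0111."""
    return stat.S_IMODE(os.stat(path).st_mode) == 0o111
```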


If you already had AB-BLAST (or WU-BLAST) installed (with BLAST-able databases), your installation or update of AB-BLAST is essentially complete. If you did not have AB-BLAST or WU-BLAST installed, read on...

Unpacking the tar archive creates a matrix/ subdirectory containing scoring matrix files. Wherever this directory ultimately resides, the BLASTMAT (or ABBLASTMAT) environment variable should be set to point there. In the absence of this environment variable being set, AB-BLAST programs first look for scoring matrix files in any matrix/ subdirectory of the directory in which the search programs reside and then in the /usr/ncbi/blast/matrix directory.

Low-complexity sequence filters or masking programs — e.g., seg, xnu and dust — are now included in the tar archives described here. The bundled versions of these programs are precompiled and optimized. While these filter programs are not required for running the search programs, they can enormously reduce the amount of garbage output produced, memory used, and search time taken. Hence, it is highly recommended that these programs be made available to users. Whatever directory you install the filter programs in, the BLASTFILTER (or ABBLASTFILTER) environment variable should be set to point there. In the absence of this environment variable being set, the programs look for masking programs in any filter/ subdirectory of the directory in which the search programs themselves reside and then in /usr/ncbi/blast/filter.
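The lookup order for scoring matrices and filter programs described above can be summarized in a short Python sketch. It is a model under stated assumptions rather than the programs' actual code: the function name is hypothetical, and it assumes that a set environment variable replaces the fallback locations entirely.

```python
import os


def support_search_dirs(kind, program_dir, environ=None):
    """Return the directories consulted for kind ("matrix" or "filter")."""
    env = os.environ if environ is None else environ
    base = {"matrix": "BLASTMAT", "filter": "BLASTFILTER"}[kind]
    # AB-prefixed variables take precedence, then WU-prefixed, then the base name.
    for prefix in ("AB", "WU", ""):
        value = env.get(prefix + base)
        if value:
            return [value]
    # No variable set: first the subdirectory beside the search programs,
    # then the historical default location.
    return [os.path.join(program_dir, kind), "/usr/ncbi/blast/" + kind]
```

Here program_dir stands for the directory in which the search programs reside.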

NOTE: unlike NCBI BLAST, the AB-BLAST search programs do not employ sequence filtering by default. This behavior might change in the future, though. In case the search programs are updated on your system without warning and you wish to guarantee in an automated analysis pipeline that no filtering will ever be performed, just specify filter=none on the command line.

The databases themselves are obviously not included with the software. Once the source databases have been downloaded from any of many Internet sites, the database files are typically uncompressed and processed into FASTA format, if they are not in FASTA format already. Included in the tar archives are several utility programs for converting textual database files:

The NCBI software Toolbox also contains some relevant parsers. One of these is asn2fsa, which converts both nucleotide and peptide sequences in GenBank ASN.1 format into FASTA format files. The asn2ff parser, which converts GenBank ASN.1 data into other flat file formats, may also come in handy, especially if you are inclined to parse GenBank into FASTA using your own routines or to use the gb2fasta and gt2fasta programs mentioned above.

All of the above parsers can read from standard input (sometimes signified by a single dash, “-”), so their input files can be maintained on disk in compressed format and dynamically zcat-ed or gunzip-ed directly into the parsers, thus saving the time and storage required for the uncompressed data. Because a dash is often used to signify the start of each command line option, if a dash is needed to specify standard input for the required input file name argument, some of these programs require that a double-dash (--) be specified on the command line before the single-dash. This double-dash signifies the end of the command line options and the start of the required arguments.

Once a source database is in FASTA format, the xdformat program should be used to convert it into “blastable” format. Concise usage instructions for xdformat (and xdget) can be obtained by invoking each program without any command line arguments. By default, xdformat produces 3 output files whose names are derived from the name of the FASTA input file. The 3 output files have distinct file name extensions and together comprise the blastable database. If sequence identifiers are optionally indexed during database creation, the blastable database will consist of a total of 4 output files. Databases formatted by xdformat retain full ambiguity code information within the blastable database files.

By default, if any unrecognized amino acid or nucleotide codes are encountered or if the FASTA input file should otherwise appear corrupt, xdformat will emit an error message and halt. In such cases, if the blastable database was to be newly created, xdformat will remove the blastable database files before halting. If an existing blast database was being appended with new sequences when the error arose, the blastable database will be rolled back to its original state prior to the update attempt with none of the new sequences appended.

While formatting the database, the xdformat program can optionally (-I option) index the sequence identifiers for later identifier-based retrieval with the xdget program. XDF databases that were formatted without an identifier index can have an index created post hoc by xdformat with its -X option. It may be of interest to note for the purposes of their maintenance that xdformat and xdget are actually one-and-the-same program file, merely invoked under the two different names to obtain the two different program behaviors. This helps ensure that the index created with xdformat will be compatible with xdget. See the file "FAQ-Indexing.html" for more details on identifier indexing.

For compatibility with legacy BLAST installations, the xdformat program can function in a setdb- and pressdb-compatibility mode, wherein its behavior is similar to that of setdb and pressdb. In its compatibility mode, a similar command line structure is used and the output files produced have the same names as those produced by setdb and pressdb. Compatibility mode is invoked when xdformat is renamed or has links pointing to it named setdb and pressdb. While the files produced in compatibility mode have the same file names as those produced by the original setdb and pressdb programs (setdb.real and pressdb.real), the content of these files is always XDF. Versions of the BLASTA search program dated on or after 1999-12-14 are able to work with the more-capable XDF databases.

Note that two XDF databases — one protein and one nucleotide — can be created with the exact same name and exist in the exact same directory, because the 3-letter extensions of XDF database file names are distinct for protein sequence databases and nucleotide sequence databases.

If xdformat and the legacy setdb and pressdb programs have all been used to create databases with the same name that reside in the same directory, the BLAST search programs will preferentially search the databases created with xdformat, which will have the standard XDF database file name extensions.

Using the -t option to xdformat, a descriptive name or title can be assigned to a database that will appear in BLAST search output. The title of an existing database can be changed after its creation, by appending an empty FASTA database and specifying the -t option with the desired new title. For example,


     xdformat -n -a mydb -t "Fancy New Title" /dev/null

The blastable database files can be placed anywhere, but for convenience the BLASTDB environment variable should include their directory location. If the BLASTDB environment variable is not set, the programs look for databases by default in /usr/ncbi/blast/db and in the current working directory. If the old pressdb program (instead of xdformat) is used to create the blastable database, the associated nucleotide sequence FASTA file must be located in the same directory as the three output files from pressdb, if the BLAST search programs are to find the FASTA file. It may sometimes be useful to maintain the FASTA files in a separate directory — even on another disk partition — and provide UNIX soft links in the BLASTDB directory that point to the real location of the FASTA files. In addition, on systems where NCBI BLAST will not be in use, blastable databases can be maintained in multiple directories listed in the BLASTDB environment variable, with each directory name delimited from the next by a colon (:), just as directory names are often delimited in the PATH environment variable.

On multi-processor computer systems, the search programs will employ as many CPUs as are installed; when more than about 4 CPUs are used, this default behavior can cause the efficiency of hardware utilization to be quite low compared to running individual single-threaded BLAST jobs on each CPU. Memory use also increases linearly with the number of CPUs or threads employed. One way to govern the number of processors employed is to wrap the search programs in a shell script that sets a lower number of CPUs via the cpus=# command line option. Another, simpler approach to changing the default number of CPUs for all users follows below, for implementation by BLAST system managers possessing “root” or “SuperUser” privileges.
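The wrapper approach amounts to rewriting the argument list before the search program is launched. The Python sketch below shows only that rewriting step; the function name is hypothetical, the argument lists in the usage note are made-up examples, and matching the option by the exact lowercase spelling cpus= is an assumption.

```python
def cap_cpus(args, limit):
    """Return a copy of args in which cpus=# never exceeds limit.

    If no cpus=# option is present, one is appended with the limit,
    so the default of using every installed CPU does not apply.
    """
    result = []
    seen = False
    for arg in args:
        if arg.startswith("cpus="):
            seen = True
            requested = int(arg.split("=", 1)[1])
            arg = "cpus=%d" % min(requested, limit)
        result.append(arg)
    if not seen:
        result.append("cpus=%d" % limit)
    return result
```

A wrapper would pass the returned list on to the real program. As noted in the text, users can bypass such a wrapper by invoking the search programs directly.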

Distributions of AB-BLAST include a sample file named sysblast.sample, that illustrates the system-wide configuration parameters that can be established to govern the execution of BLAST jobs and, thereby, provide a more productive, trouble-free level of service. When the sysblast file is installed under the name /etc/sysblast, all BLAST jobs executed on a given computer system can be made subject to the parameters:

The sysblast file is only effective when installed in the /etc directory. The /etc directory generally resides locally to any given computer system, so parameter settings can be tailored to each computer, even if the BLAST software is maintained on a shared disk partition. The /etc directory should only be writable by “root”. Unlike the shell script wrapper approach described above, the limits set in /etc/sysblast typically can not be circumvented by normal (non-root) users of a computer system. See the comments included in the sample sysblast file for further details.


Differences between AB-BLAST and WU-BLAST

Apart from bug fixes, the most outward differences in usage and appearance of AB-BLAST and WU-BLAST include:


Citing BLAST

Citations or acknowledgments of AB-BLAST usage are greatly appreciated, as are any personal accounts of how the software is being used that you might wish to share. When URLs are acceptable, please cite with:

   Gish, W. (1996-2009) http://blast.advbiocomp.com

When URLs are not acceptable, please use:

   Gish, W. (unpublished).

In scientific communications, it is important to report both the program name and the specific version used. In the case of AB-BLAST, the version is a combination of the version number, edition (Personal, Standard, or Enterprise), release date, target platform, and build date. The release date is the first (left-most) date displayed on the first line of output and corresponds to the completion date of the source code. The build date is the second date reported and corresponds to the date and time the executables were built for the indicated target platform. Both dates are reported in ISO 8601 format.

For example, consider this introductory line of output from AB-BLAST 3.0 Standard Edition:

  BLASTN 3.0SE-AB [2009-05-29] [sol10-x64-ILPF64 2009-05-30T01:25:46]

Here the program name is BLASTN, the software version is “3.0SE” from “AB” (Advanced Biocomputing, LLC), the release date is May 29, 2009, and the build date of the 64-bit Solaris 10 X64 binary is May 30, 2009, at 1:25 AM. “ILPF64” in the target platform description indicates integers (I), long integers (L), memory pointers (P), and file pointers (F) were all compiled with 64-bit precision.

The first line of output from AB-BLAST Personal Edition substitutes the letters “PE” for SE, as shown in this example:

   BLASTP 3.0PE-AB [2009-09-27] [linux26-x64-ILPF64 2009-09-27T18:03:31]

The first line of output from AB-BLAST Enterprise Edition substitutes the letters “EE” for SE:

   TBLASTX 3.0EE-AB [2009-09-27] [linux26-x64-ILPF64 2009-09-27T18:03:31]
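
For record keeping, the version details can be pulled out of the first line of output programmatically. The Python sketch below is written against the three example lines shown above only; the regular expression, the function name and the dictionary keys are illustrative assumptions, and other releases may format the line differently.

```python
import re
from datetime import date, datetime

# Pattern inferred from the example banner lines; not an official grammar.
BANNER = re.compile(
    r"^\s*(?P<program>\S+)\s+"
    r"(?P<version>\d+(?:\.\d+)*)(?P<edition>[A-Z]{2})-(?P<vendor>\S+)\s+"
    r"\[(?P<release>\d{4}-\d{2}-\d{2})\]\s+"
    r"\[(?P<platform>\S+)\s+(?P<build>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\]\s*$"
)

EDITIONS = {"PE": "Personal", "SE": "Standard", "EE": "Enterprise"}


def parse_banner(line):
    """Split a banner line into program, version, edition, dates and platform."""
    m = BANNER.match(line)
    if m is None:
        raise ValueError("unrecognized banner line: %r" % line)
    return {
        "program": m.group("program"),
        "version": m.group("version"),
        "edition": EDITIONS.get(m.group("edition"), m.group("edition")),
        "vendor": m.group("vendor"),
        # Both dates are in ISO 8601 format, so the standard parsers apply.
        "release_date": date.fromisoformat(m.group("release")),
        "platform": m.group("platform"),
        "build_time": datetime.fromisoformat(m.group("build")),
    }
```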

Historical Notes


References

Altschul, SF, and W Gish (1996). Local alignment statistics. ed. R. Doolittle. Methods Enzymol. 266:460–80.

Altschul, SF, and DJ Lipman (1990). Protein database searches for multiple alignments. Proc. Natl. Acad. Sci. USA 87:5509–13.

Altschul, SF, Gish, W, Miller, W, Myers, EW, and DJ Lipman (1990). Basic local alignment search tool. J. Mol. Biol. 215:403–10.

Altschul, SF, Madden, TL, Schäffer, AA, Zhang, J, Zhang, Z, Miller, W, and DJ Lipman (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucl. Acids Res. 25(17):3389–402.

Claverie, JM, and DJ States (1993). Information enhancement methods for large scale sequence analysis. Computers in Chemistry 17:191–201.

Collins, JF, and AF Coulson (1990). Significance of protein sequence similarities. Methods Enzymol. 183:474–7.

Dembo, A, and S Karlin (1991). Strong limit theorems of empirical functionals for large exceedances of partial sums of i.i.d. variables. Ann. Probab. 19:1737–55.

Dembo, A, and S Karlin (1992). Limit distributions of maximal segmental score among Markov dependent partial sums. Adv. Appl. Probab. 24:113–40.

Gish, W, and DJ States (1993). Identification of protein coding regions by database similarity search. Nat. Genet. 3:266–72.

Hancock, JM, and JS Armstrong (1994). SIMPLE34: an improved and enhanced implementation for VAX and Sun computers of the SIMPLE algorithm for analysis of clustered repetitive motifs in nucleotide sequences. Comput. Appl. Biosci. 10:67–70.

Karlin, S, and SF Altschul (1990). Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. Proc. Natl. Acad. Sci. USA 87:2264–8.

Karlin, S, and SF Altschul (1993). Applications and statistics for multiple high-scoring segments in molecular sequences. Proc. Natl. Acad. Sci. USA 90:5873–7.

Karlin, S, Dembo, A, and T Kawabata (1990). Statistical composition of high scoring segments from molecular sequences. Ann. Stat. 18:571–81.

Mott, RF (1992). Maximum-likelihood estimation of the statistical distribution of Smith-Waterman local sequence similarity scores. Bull. Math. Biol. 54:59–75.

Smith, TF, and MS Waterman (1981). Identification of common molecular subsequences. J. Mol. Biol. 147:195–7.

States, DJ, and W Gish (1994). Combined use of sequence similarity and codon bias for coding region identification. J. Comp. Biol. 1:39–50.

Waterman, MS, and M Vingron (1994). Rapid and accurate estimates of statistical significance for sequence data base searches. Proc. Natl. Acad. Sci. USA 91:4625–8.

Wootton, JC, and S Federhen (1993). Statistics of local complexity in amino acid sequences and sequence databases. Computers in Chemistry 17:149–63.

Wootton, JC, and S Federhen (1996). Analysis of compositionally biased regions in sequence databases. ed. R. Doolittle. Methods Enzymol. 266:554–71.

Zhang, Z, Schäffer, AA, Miller, W, Madden, TL, Lipman, DJ, Koonin, EV, and SF Altschul (1998). Protein sequence similarity searches using patterns as seeds. Nucl. Acids Res. 26:3986–90.

