AB-BLAST Parameter Descriptions

AB-BLAST 3.0 Parameters

Command Line Options

The complete list of command line options supported by AB-BLAST 3.0 is provided in the tables below. The information provided here comprises their definitive description. This information should be considered valid only for the current (most recent) version of the software. This file is bundled in AB-BLAST software distributions. If you find an inconsistency between the advertised behavior and the actual behavior of the software, first be sure you are using the latest software version, as indicated by the date of the latest release shown at http://blast.advbiocomp.com. If the inconsistency persists after upgrading, please report it to Technical Support. If you wish to continue using an older version of the software instead of upgrading, please consult the copy of parameters.html that came bundled with that version; it may be more accurate for your purposes than the on-line documentation. Where differences arise between the bundled file and the on-line version, they may be due to improvements or corrections that have been made to the documentation itself or due to actual differences between versions of the software.

When this web page cannot be conveniently accessed, terse descriptions of most items may be obtained by entering the relevant BLAST program name alone on the command line without any arguments or options. The most recent version of the page you are viewing is located here. For most of the options, a logical diagram indicates where each imparts its effect.

Command line options for the obsolete NCBI- and WU-BLAST 1.4, first released in 1994, often apply unchanged to AB-BLAST 3.0, just as they did for WU-BLAST 2.0, which affords a degree of compatibility spanning nearly 30 years.

Command Line Syntax

The basic AB-BLAST command line syntax is:

	<program> <database> <query> [options...]

where <program> is one of blastp, blastn, blastx, tblastn and tblastx; <database> is the name of the database to search (previously formatted with xdformat); <query> is the name of a file containing one or more query sequences in FASTA format; and [options...] is a list of zero or more command line options and parameter settings.

Aside from the first two command line arguments (database name and query filename), everything else on the command line is optional. For most options, the order in which they are specified on the command line doesn't matter, but parsing is left-to-right, so the right-most setting takes precedence.

The AB-BLAST search programs support a flexible syntax for command line options and parameters — even too flexible at times. Parameters and options are typically interpreted case-independently. As a matter of habit, parameter values should be set using equals signs (=). An equals sign avoids the occasional ambiguity that arises due to the overabundance of syntax flexibility. A leading hyphen (-) can be included on parameters and options to improve human readability, since this is the custom with many other programs. Combined use of hyphens and equals signs is even allowed, does not need to be consistently applied across a given command line, and will probably be interpreted by the software as you intend, but please get in the habit of just using equals signs and no leading hyphens. It might become necessary in the future to eliminate the other forms of expression.

Large integer values can be specified using floating point representation (e.g., 1e9 instead of 1000000000).
A value of “infinity” (as in “B=infinity”) is interpreted to mean “unlimited” or the maximum value that can be represented by the data type used in the program.

As examples of syntax flexibility, the following command lines are all valid and equivalent. Only the first line uses the recommended equals-sign syntax:

      blastp nr myquery.aa  v=10  b=100  filter=seg    e=1e-10    nogaps
      blastp nr myquery.aa  V=10  B=100  filter=seg    E=1e-10    nogaps
      blastp nr myquery.aa -V=10 -B=100 -filter=seg   -E=1e-10   -nogaps
      blastp nr myquery.aa -V10  -B100  -filter seg   -E1e-10    -nogaps
      blastp nr myquery.aa -V 10 -B 100 -filter seg   -E 1e-10   -nogaps
      blastp nr myquery.aa  V 10  B 100  filter seg    E 1e-10    nogaps
      blastp nr myquery.aa  -v10  B=100  FILTER=seg   -e=1e-10   -nogaps

An example of an ambiguous situation that can be avoided by using equals-sign notation is:

      blastp nr myquery.aa  -E2

It's unclear here whether the user intends to set parameter E to the value 2 (which the software happens to assume) or if the user intended to set a value for parameter E2 and neglected to provide a value for it. Unambiguous ways to set values for E and E2 are:

      blastp nr myquery.aa  E=2

      blastp nr myquery.aa  E2=0.1
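
For pipelines that assemble command lines programmatically, the recommended form (name=value pairs, no leading hyphens) can be generated rather than typed by hand. The following is a minimal sketch only; the helper name build_command and its arguments are hypothetical and not part of AB-BLAST.

```python
def build_command(program, database, query, params=None, flags=()):
    """Assemble an argv list in the recommended style:
    <program> <database> <query> name=value ... flag ...
    """
    allowed = {"blastp", "blastn", "blastx", "tblastn", "tblastx"}
    if program not in allowed:
        raise ValueError(f"unknown program: {program}")
    argv = [program, database, query]
    # name=value pairs, written with equals signs and no leading hyphens
    for name, value in (params or {}).items():
        argv.append(f"{name}={value}")
    # value-less options such as nogaps
    argv.extend(flags)
    return argv


cmd = build_command(
    "blastp", "nr", "myquery.aa",
    params={"V": 10, "B": 100, "filter": "seg", "E": "1e-10"},
    flags=["nogaps"],
)
print(" ".join(cmd))
```

Because parsing is left-to-right and the right-most setting takes precedence, a later entry for the same parameter overrides an earlier one.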

Table of Options

altscore	dbrecmin	gapH	hitdist	msgstyle	postsw	S2	T
B	dbslice	gapK	hspmax	N	progress	seqtest	top
bottom	dbtop	gapL	hspsepQmax	nogaps	prune	shortqueryok	topcomboE
C	E	gaps	hspsepSmax	nonnegok	putenv	soffset	topcomboN
cdb	E2	gapS2	K	nosegs	pvalues	sort_by_count	ucdb
compat1.3	echofilter	gapW	kap	noseqs	Q	sort_by_highscore	V
compat1.4	endgetenv	gapX	L	notes	qframe	sort_by_pvalue	W
compat2.0	endputenv	getenv	lcfilter	novalidctxok	qoffset	sort_by_subjectlength	warnings
consistency	errors	gi	lcmask	nwlen	qrecmax	sort_by_totalscore	wink
cpus	evalues	globalexit	links	nwstart	qrecmin	span	wordmask
ctxfactor	filter	golfraction	M	O	qres	span1	wstrict
dbbottom	gapall	golmax	maskextra	olfraction	qtype	span2	X
dbchunks	gapdecayrate	gspmax	matrix	olmax	R	spoutmax	xmlcompact
dbgcode	gapE	H	mformat	pingpong	restest	stats	Y
dbrecmax	gapE2	haltonfatal	mmio	poissonp	S	sump	Z

Table of Options with Descriptions

Option Description

altscore="score_spec" alter individual scores or entire rows or columns of scores in a scoring matrix, without editing the scoring matrix file itself. Score_spec is a quoted character string consisting of three components, each separated by white space: (1) a letter in the query sequence alphabet; (2) a letter in the subject sequence alphabet; (3) the new pairwise score to be assigned to the alignment of these two letters. If the query (subject) letter is specified as the special word any, the altered score will be assigned to the entire column (row) of the scoring matrix. (N.B. Scoring matrices are stored in row=query, column=subject orientation.) If the indicated score is the special word min (max), the new assigned score will be the minimum (maximum) score observed in the matrix. If the score is given as na, the alignment of the indicated letters will be not allowed, effectively assigning to them an infinite negative score. Multiple altscore options can be specified on the command line and will be applied to the scoring matrix successively in left-to-right order. As an example of the option's use, to assign an alignment score of zero (0) to the presence of a stop codon in either the query or database sequence, these two specifications can be used together: altscore="* any 0" altscore="any * 0".
See also: matrix, M and N.
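
The effect of successive altscore specifications can be pictured with a small sketch. It models a scoring matrix as a dictionary keyed by (query letter, subject letter) and applies each specification in left-to-right order. The function name, the toy matrix and the decision to ignore previously disallowed (na) cells when evaluating min and max are assumptions made for illustration; this is not the program's actual implementation.

```python
import math


def apply_altscore(matrix, spec):
    """Return a copy of matrix with one altscore spec applied.

    matrix: dict mapping (query_letter, subject_letter) -> score
    spec:   "<query> <subject> <score>", e.g. "* any 0"
    """
    q, s, score = spec.split()
    finite = [v for v in matrix.values() if v != -math.inf]
    if score == "min":
        new = min(finite)
    elif score == "max":
        new = max(finite)
    elif score == "na":
        new = -math.inf          # alignment of these letters not allowed
    else:
        new = int(score)
    out = dict(matrix)
    for (mq, ms) in matrix:
        # "any" as the query letter selects a whole column,
        # "any" as the subject letter selects a whole row
        if (q == "any" or mq == q) and (s == "any" or ms == s):
            out[(mq, ms)] = new
    return out


letters = ["A", "W", "*"]
toy = {(q, s): (4 if q == s else -4) for q in letters for s in letters}
for spec in ['* any 0', 'any * 0']:
    toy = apply_altscore(toy, spec)
print(toy[("*", "A")], toy[("A", "*")], toy[("A", "A")])
```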

B=<b> set the maximum number of database sequences for which any alignments will be reported to b. The default limit is 250. The maximum number of alignments that may be saved and reported per database sequence is governed by other parameters.
See also: V, hspmax, gspmax, spoutmax and noseqs.

bottom used to restrict the search of a nucleotide sequence to the bottom (-) strand. In the TBLASTX search mode, where both query and subject are nucleotide sequences, the bottom option only affects the query sequence.
See also: top, dbtop, dbbottom and qframe.

C=<gcid>

use the indicated genetic code to translate the query sequence in the BLASTX and TBLASTX search modes. gcid is a numerical identifier for the desired code. A list of the genetic codes and their identifiers is displayed if C=list is specified on an otherwise syntactically correct command line. (Example: blastx foo foo c=list). In the TBLASTN search mode, the C parameter can be substituted for the dbgcode parameter.

The available genetic codes are:
   1. Standard*
   2. Vertebrate Mitochondrial
   3. Yeast Mitochondrial
   4. Mold Mitochondrial; Protozoan Mitochondrial; Coelenterate Mitochondrial;
        Mycoplasma; Spiroplasma
   5. Invertebrate Mitochondrial
   6. Ciliate Nuclear; Dasycladacean Nuclear; Hexamita Nuclear
   9. Echinoderm Mitochondrial
  10. Euplotid Nuclear
  11. Bacterial and Plant Plastid
  12. Alternative Yeast Nuclear
  13. Ascidian Mitochondrial
  14. Flatworm Mitochondrial
  15. Blepharisma Macronuclear
  16. Chlorophycean Mitochondrial
  21. Trematode Mitochondrial
  22. Scenedesmus obliquus mitochondrial
  23. Thraustochytrium mitochondrial
  1001. Codon2004
*The default genetic code (1).
Specify the desired genetic code by its number.

The Codon2004 code provides preliminary support for a draft alphabet for working precisely with each of the 64 possible codons, rather than mapping the codons to the usual 20 common amino acids. Scoring matrix files to use the Codon2004 alphabet with a translated query sequence in BLASTX should be placed in a subdirectory named ca, located parallel to the usual aa and nt subdirectories of the matrix directory. For use in TBLASTN searches, the scoring matrix should reside in an ac subdirectory; and for TBLASTX searches, the subdirectory should be cc. (Notice the use of the letter “c” for the codon alphabet, the letter “a” for the amino acid alphabet, and the query-subject ordering of the two letters to create the subdirectory name). For “codon-ized” scoring matrices derived from the BLOCKS database and appropriate for use “as is” with TBLASTX, please go here. For more information about the Codon2004 alphabet, please see Dennis Maeder’s pages.
See also: dbgcode.

cdb force nucleotide sequence databases to be searched in their compressed form. This option is only effective in the BLASTN search mode for word lengths ≥ 7. Users should generally avoid specifying this option themselves, letting the software decide when to employ this search strategy.
See also: ucdb.

compat1.3 perform a BLAST version 1.3-style search (no gaps and significance estimated using Poisson statistics), but with bug fixes, performance enhancements and new options available. To ensure consistent results, compat1.3 should be the first option specified on the command line after the required query argument.
See also: compat1.4 and compat2.0.

compat1.4 perform a BLAST version 1.4-style search (no gaps in the alignments), but with bug fixes, performance enhancements and new options available. To ensure consistent results, compat1.4 should be the first option specified on the command line after the required query argument.
See also: compat1.3 and compat2.0.

compat2.0 perform a search parameterized like WU-BLAST version 2.0, but with the bug fixes, performance enhancements and new options available in AB-BLAST 3.0. This option restores the default, classical 1-hit BLAST algorithm (hitdist=0) for protein-level searches (BLASTP, BLASTX, TBLASTN, TBLASTX). The option also affects the default scoring system used by BLASTN and the default gapX drop-off score used in all search modes. To ensure predictable results, compat2.0 should be the first option specified on the command line after the required query argument.
See also: compat1.3, compat1.4 and hitdist.

consistency turn off the determination of “consistent” sets of HSPs, effectively lumping all HSPs found for a given database sequence into one set. Use of this option also disables a combinatorial adjustment that is otherwise made to the Sum and Poisson statistics to account for the consistent arrangement of the HSPs out of all possible relative arrangements. This option has no effect if Sum or Poisson statistics are not being used.

cpus=<n> request that n processors (or threads) be employed for the search. Within the limits and conditions described below, the default behavior is to employ as many threads as the computer system has processors (or processor cores). In addition, the default behavior for BLASTN is to use up to 4 threads, even if the computer has many more processor cores available, because BLASTN searches can so easily become I/O bound. The default number of threads employed may be altered by setting a specific value for cpus in a system-wide file named /etc/sysblast; see the sysblast.sample example file included in AB-BLAST 3.0 software distributions for further information. NOTE: Memory consumption increases linearly with the number of threads; the actual number of threads employed may be automatically reduced by the software if memory resources are seen to be limiting.

ctxfactor=<c> set the “context factor” that is used as a Bonferroni-like correction in the statistics to c, to account for the number of contexts searched. Each distinct reading frame-to-reading frame or strand-to-strand combination between query and subject sequences constitutes one “context”. Thus, one context exists in a BLASTP search, as many as two contexts (two distinct strand combinations) exist in a BLASTN search, up to 6 contexts (one for each reading frame) exist in a BLASTX or TBLASTN search, and up to 6x6 = 36 contexts exist in a TBLASTX search. The maximum default value for ctxfactor then is 1 for BLASTP, 2 for BLASTN, 6 for BLASTX and TBLASTN, and 36 for TBLASTX. Restricting a search to a single strand of the query and/or database reduces the number of contexts accordingly for that search. More accurately, however, the contribution of any given context to the default value for ctxfactor is the fraction of residues in the query (or reading frame of the query) that are unambiguous (up to a maximum value of 1.0). (N.B. this fraction is computed after any optional filtering has been applied to the query). The default ctxfactor is then merely the sum of these fractions for every context involved in the search.
The software should normally be allowed to set the value of this parameter itself, unless the user has a compelling reason to change it. One rationale for explicitly setting a value for ctxfactor might be to ensure a constant value is used in the statistics across multiple searches, where the results from the searches need to be examined and compared for their statistical significance on a common basis.
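
The arithmetic behind the default ctxfactor can be sketched for the simplest case, a BLASTN query searched on one or both strands. The function names are hypothetical, and treating only A, C, G and T as unambiguous is an assumption; how subject reading frames enter the sum in translated searches is not modeled here.

```python
def unambiguous_fraction(seq, alphabet="ACGT"):
    """Fraction of residues that are unambiguous (after any filtering)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(1 for c in seq if c in alphabet) / len(seq)


def default_ctxfactor(fractions):
    """Sum the per-context fractions, each capped at 1.0."""
    return sum(min(f, 1.0) for f in fractions)


query = "ACGTNNNNAC"                      # 6 of 10 letters are unambiguous
frac = unambiguous_fraction(query)
print(default_ctxfactor([frac, frac]))    # both strands searched
print(default_ctxfactor([frac]))          # search restricted to one strand
```

A fully unambiguous query searched on both strands gives the maximum BLASTN value of 2.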

dbbottom used to restrict the search to the bottom (-) strand of all database sequences.
See also: dbtop, top, bottom and qframe.

dbchunks=<nchunks> establishes the granularity of the database, as it is divided into slices for assignment to individual threads, to make more efficient use of all CPUs when multiple CPUs are employed for a given search. Higher values are appropriate when the database contains relatively few sequences and/or when the sequences vary greatly in length, composition or content (e.g., genomic contigs). Lower values are appropriate when the database contains many sequences of comparable length (e.g., the EST division of GenBank). The minimum assignable value is the number of threads employed, but this setting is ill-advised; the optimal value for any given search type is likely to be a large multiple of the number of threads employed and need not be an exact multiple. The default value is 500 (AB-BLAST Standard Edition) or 1000 (AB-BLAST Enterprise Edition). When searching databases of large unfinished genomes consisting of numerous contigs of widely varying sizes, more efficient use of all processors might be achieved by using a larger value such as 1000.
Users generally need not be concerned with this parameter.

dbgcode=<gcid> use the indicated genetic code to translate database sequences in the TBLASTN and TBLASTX search modes. gcid is a numerical identifier for the desired code. A list of the genetic codes and their identifiers is displayed if dbgcode=list is specified on an otherwise syntactically correct command line. (Example: tblastn foo foo dbgcode=list).
See also: C.

dbrecmax=<last_record> search the database until last_record, where database records are numbered starting with 1. By default, databases are searched completely. If last_record is greater than the actual number of records in the database, the database is simply searched until its end. It is an error for the requested last_record to be less than the first record requested to be searched in the database. Records in virtual databases are numbered with respect to the entire virtual database.
See also: dbrecmin.

dbrecmin=<first_record> search the database beginning at first_record, where database records are numbered starting with 1. By default, databases are searched completely. It is an error for the requested first_record to be greater than the last record requested to be searched in the database (re: the dbrecmax parameter) or to point beyond the end of the database. Records in virtual databases are numbered with respect to the entire virtual database.
See also: dbrecmax.

dbslice=m/n
dbslice=a-b/n at run time, for expressions of the form m/n, logically divide the database into n equivalent-sized slices and search only the m^th slice, where 1 ≤ m ≤ n ≤ 10000000 (1e7). Alternatively, for expressions of the form a-b/n, search slices a through b (inclusive), where 1 ≤ a ≤ b ≤ n. Slice size is determined solely by the number of sequence records contained within and is not a function of sequence length; this can produce significant disparities in the computational workload associated with searching different slices, which may be alleviated by randomizing the order of sequences in the database before formatting for BLAST. In distributed computing environments, when the same, large database is to be searched repeatedly, overall throughput may benefit from consistently assigning the same slice(s) to the same client/worker nodes for each search; improved efficiency results from the file caching activity that is typically performed by operating systems when the database files are first read from disk or over a network. Logically breaking the database into slices at run time means that each client node need only have sufficient unused memory in which to cache its assigned slice(s) — instead of the entire database — and that the database need not be repartitioned and reformatted into many smaller sub-databases whenever the number of available client nodes changes.
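
In a distributed setting, the dbslice arguments for a fixed pool of workers can be generated so that each worker always receives the same slices. The sketch below divides slices 1 through n into contiguous runs, one per worker, and emits the m/n or a-b/n form accordingly. The function name and the policy of contiguous, near-equal runs are assumptions for illustration.

```python
def dbslice_args(n_slices, n_workers):
    """Return one dbslice=... argument per worker, covering slices 1..n_slices."""
    if not (1 <= n_workers <= n_slices <= 10_000_000):
        raise ValueError("need 1 <= n_workers <= n_slices <= 1e7")
    base, extra = divmod(n_slices, n_workers)
    args = []
    start = 1
    for worker in range(n_workers):
        # the first `extra` workers take one additional slice
        size = base + (1 if worker < extra else 0)
        end = start + size - 1
        if start == end:
            args.append(f"dbslice={start}/{n_slices}")
        else:
            args.append(f"dbslice={start}-{end}/{n_slices}")
        start = end + 1
    return args


for arg in dbslice_args(10, 3):
    print(arg)
```

Keeping the mapping from worker to slices stable across searches is what allows each node's file cache to stay warm for its part of the database.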

dbtop used to restrict the search to the top (+) strand of all database sequences.
See also: dbbottom, top, bottom and qframe.

E=<e> set the expectation threshold for reporting database hits to e. A database sequence will only be reported if an ascribed E-value for at least one of its alignments — or groups of alignments, if Sum or Poisson statistics are being used — is ≤ E. Lower E-values are more significant (less likely to occur by chance). The default threshold is E=10, such that if the search algorithm exhibited 100% sensitivity and the statistics applied perfectly to the sequences being studied, results involving 10 database sequences would be reported merely by chance.
See also: S.

E2=<e> set the expectation threshold for saving ungapped HSPs to e. In the initial, ungapped alignment phase of a search, individual HSPs will only be saved for further use if their score is ≥ S2, where the default value of S2 is computed from E2. The default value for E2 varies between BLAST search modes; the resultant value for S2 will depend on the scoring system, as well. If both E2 and S2 are specified on the command line, the one corresponding to the more restrictive (higher) score threshold will be used.
See also: gapE2, S2 and gapS2.
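
The relationship between an expectation threshold and a score threshold can be illustrated with the standard Karlin-Altschul relation E = K·m·n·e^(-λS), solved for S. This is only a sketch: the values of K, lambda, m and n below are illustrative placeholders, and the exact procedure the program uses to derive S2 from E2 (effective lengths, rounding) is not described here and is not reproduced.

```python
import math


def score_for_evalue(e, k, lam, m, n):
    """Smallest integer score S with K*m*n*exp(-lam*S) <= e."""
    return math.ceil(math.log(k * m * n / e) / lam)


def effective_s2(e2=None, s2=None, **stats):
    """Pick the more restrictive (higher) score threshold of E2 and S2."""
    candidates = []
    if e2 is not None:
        candidates.append(score_for_evalue(e2, **stats))
    if s2 is not None:
        candidates.append(s2)
    if not candidates:
        raise ValueError("specify e2, s2 or both")
    return max(candidates)


# Placeholder statistics: hypothetical query length m and database length n
stats = dict(k=0.134, lam=0.318, m=300, n=1_000_000)
print(effective_s2(e2=10, **stats))
print(effective_s2(e2=10, s2=500, **stats))
```

A smaller E2 always maps to a higher score threshold, which is why the more restrictive of the two settings corresponds to the higher score.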

echofilter display the query sequence in the BLAST report, after all hard masks have been applied.
See also: filter and lcfilter.

endgetenv ignore any subsequent getenv options found on the command line during left-to-right parsing. This option may be useful in client-server situations when the command line is open to users to modify and the administrator does not wish to expose the values of environment variables to possible unauthorized interrogation.
See also: endputenv, getenv and putenv.

endputenv for security in WWW server installations, where the command line may sometimes be left open to users, ignore any subsequent putenv options found on the command line during left-to-right parsing.
See also: endgetenv, getenv and putenv.

errors suppress all ERROR messages. These messages should rarely, if ever, arise and indicate severe conditions (typically internal software bugs) that should be given immediate attention. When they do arise, parsers may break. If any ERRORs arise with this option, the number SUPPRESSED will be reported at the end of the search.
See also: notes and warnings.

evalues report E-values (expectations) instead of P-values (probabilities) in the initial one-line descriptions section of output.
See also: pvalues.

filter=<filter> “hard mask” the query sequence using the specified filter. The filter program may alter the sequence in composition but not in length. For protein-level searches (BLASTP, BLASTX, TBLASTN and TBLASTX), the supported filter programs include: seg and xnu. For nucleotide-level (BLASTN) searches, supported filter programs include: dust and seg. If multiple filter specifications are made on the command line, their independent results are logically OR-ed to produce the final, masked query sequence.
filter=none causes any earlier specifications (to the left) on the command line to be ignored.
NOTE: By default, no filtering is performed.
Arbitrary user-defined filter programs can be utilized here, if the program input and output are in FASTA/Pearson sequence format and if input and output are tied to stdin and stdout, respectively. Complete command lines with options can be specified with the filter parameter by enclosing the entire command in quotes.
The location of all filter programs (including user-defined programs) is governed by the BLASTFILTER environment variable, which can be set to a colon-delimited list of directories that the BLAST programs will successively examine to find filters.
See also: wordmask, lcfilter, lcmask and echofilter.
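
A user-defined filter only has to read FASTA from stdin, write FASTA to stdout, and leave sequence lengths unchanged. The following sketch masks long single-letter runs with X. It is a toy example under stated assumptions: the run-length rule, the function names and the 60-column line width are all hypothetical choices, and the sketch is not a substitute for seg, xnu or dust.

```python
import re
import sys


def mask_runs(seq, min_run=10, mask_char="X"):
    """Replace every run of min_run or more identical letters with mask_char."""
    pattern = re.compile(r"(.)\1{%d,}" % (min_run - 1))
    return pattern.sub(lambda m: mask_char * len(m.group(0)), seq)


def mask_fasta(text, min_run=10, mask_char="X", width=60):
    """Mask each record of a FASTA string; sequence lengths are preserved."""
    out, header, chunks = [], None, []

    def flush():
        if header is not None:
            seq = mask_runs("".join(chunks), min_run, mask_char)
            out.append(header)
            out.extend(seq[i:i + width] for i in range(0, len(seq), width))

    for line in text.splitlines():
        if line.startswith(">"):
            flush()
            header, chunks = line, []
        else:
            chunks.append(line.strip())
    flush()
    return "".join(line + "\n" for line in out)


def main():
    """Entry point when installed as a filter: stdin in, stdout out."""
    sys.stdout.write(mask_fasta(sys.stdin.read()))
```

To use such a script as a filter, it would call main() at the bottom of the file, be made executable, and be placed in one of the directories listed in BLASTFILTER.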

gapall effectively generate a gapped alignment for every ungapped HSP found (up to hspmax). This is the default behavior.
See also: gapE.

gapdecayrate=<r> define r to be the common ratio of the terms in a geometric progression used in altering probabilities as a function of the number of Poisson events involved (typically the number of “consistent” HSPs in a set), according to a method suggested by Phil Green. An initial Poisson probability for n HSPs is weighted by the quantity T_n, which is itself the reciprocal of the n^th term in the progression t_n = (1-r)r^(n-1). The default value for r is 0.5, such that the default weights are successively T₁=2, T₂=4, T₃=8, T₄=16, and so on. These weights provide a conservative Bonferroni-like correction to the probabilities, in case multiple trials are performed in determining the set of HSPs yielding the lowest P-value for a given database sequence. That the geometric progression contains an infinite number of terms allows this correction method to satisfy the need for any number of tests, when this number is unknown prior to the search.
The value for gapdecayrate affects the statistics when the default Sum statistics or optional Poisson statistics are used, but not when multiple HSP statistics have been turned off with the kap option.
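
The weights can be checked numerically. Reading the progression as t_n = (1-r)·r^(n-1), which reproduces the default weights quoted above, the sketch below computes T_n = 1/t_n for the first few n. The function name is hypothetical.

```python
def decay_weights(r=0.5, count=4):
    """Return [T_1, ..., T_count], where T_n = 1 / ((1 - r) * r**(n - 1))."""
    if not 0.0 < r < 1.0:
        raise ValueError("r must lie strictly between 0 and 1")
    return [1.0 / ((1.0 - r) * r ** (n - 1)) for n in range(1, count + 1)]


print(decay_weights())          # default r = 0.5
print(decay_weights(r=0.25))    # a smaller r penalizes larger sets more heavily
```

Because the terms t_n sum to 1 over all n, the correction remains valid however many set sizes end up being tested.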

gapE=<gapE> generate gapped alignments for all HSPs between sequences whose expected frequency of chance occurrence is ≤ gapE. Default value is gapE=infinity — i.e., gapall is in effect.
See also: gapall.

gapE2=<e> set the E-value for saving gapped HSPs to e. In the secondary, gapped alignment phase of a search, individual gapped HSPs will only be saved for further use if their score is ≥ gapS2, where the default gapS2 is computed from gapE2. The default value for gapE2 varies between BLAST search modes; the resultant gapS2 will depend on the scoring system, as well. If both gapE2 and gapS2 are specified on the command line, the one corresponding to the more restrictive (higher) score threshold will be used.
See also: gapS2, E2 and S2.

gapH=<h> set the value of the relative entropy, H, used in evaluating the statistical significance of gapped alignment scores.
See also: H.

gapK=<k> set the value of the extreme value statistics K parameter (Karlin and Altschul, 1990) used in evaluating the significance of gapped alignment scores. Useful when precomputed values are unavailable in the internal tables for the chosen scoring matrix and gap penalty combination.
See also: K.

gapL=<lambda> use lambda for the value of the λ parameter in the extreme value statistics used to evaluate the significance of gapped alignment scores (Altschul and Gish, 1996). Useful when precomputed values are unavailable in the internal tables for the chosen scoring matrix and gap penalty combination.
See also: L.

gaps produce gapped alignments (the default behavior), negating the effect of any previously specified nogaps option.
See also: nogaps and gapall.

gapS2=<s> set the score threshold for saving gapped HSPs to s. In the secondary, gapped alignment phase of a search, individual gapped HSPs will only be saved for further use if their score is ≥ gapS2. The default score threshold is computed from gapE2 and will depend on the scoring system. If both gapE2 and gapS2 are specified on the command line, the one corresponding to the more restrictive (higher) score threshold will be used.
See also: gapE2, E2 and S2.

gapW=<gapW> set the window width (or band width) within which gapped alignments are computed by dynamic programming (default is gapW=32 for protein comparisons, gapW=16 for BLASTN). Note: gapW is the full bandwidth, not the half-width.

gapX=<x> set the drop-off score for gapped alignment extensions to x. Gapped extension of ungapped HSPs found between query and subject sequences continues until the cumulative alignment score deteriorates from the maximum value seen thus far by a quantity gapX or more. The default value for gapX is the score associated with 15 bits of significance (2^-15 ≈ 3x10^-5 probability) for protein-level searches or 30 bits of significance (2^-30 ≈ 10^-9 probability) for nucleotide-level (BLASTN) searches. Higher values for gapX will increase sensitivity at the expense of run time.
See also: X and gapW.
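
The drop-off stopping rule can be illustrated in one dimension. The sketch below walks along a list of per-position scores and stops once the running total has fallen gapX or more below the best total seen so far. Real gapped extension explores a banded dynamic programming matrix, which this toy function (a hypothetical name) does not attempt to model.

```python
def xdrop_extend(scores, x):
    """Return (best_score, best_length) under a drop-off of x."""
    total = 0
    best = 0
    best_len = 0
    for i, s in enumerate(scores, start=1):
        total += s
        if total > best:
            best, best_len = total, i
        elif best - total >= x:
            break                # score has deteriorated by x or more
    return best, best_len


path = [5, 5, -4, -4, -4, 20]
print(xdrop_extend(path, 10))    # stops inside the poor-scoring stretch
print(xdrop_extend(path, 15))    # survives it and reaches the later gain
```

The larger drop-off tolerates the poor-scoring stretch and finds the higher-scoring, longer extension, at the cost of examining more positions.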

getenv="NAME" display the value of the environment variable named NAME. This may be useful for verifying that the settings of environment variables on a web server or in an analysis pipeline have been propagated all the way to the BLAST search program.
See also: endgetenv, putenv and endputenv.

gi report NCBI “gi” (GenInfo) identifiers for sequences, when present in sequence definition lines. Normally these identifiers are suppressed from output, but they represent one of the best, stable identifiers available for the GenBank/EMBL/DDBJ databases (with ACCESSION.VERSION being the other stable identifier).

globalexit when processing a file containing multiple query sequences and globalexit has been specified, if any search encounters a FATAL error, then after all queries have been processed, the line "EXIT CODE 12" is appended to the output and a testable exit status of 12 will be provided to the command shell; if the exit status is 0 for the complete run or if the last line of output is not "EXIT CODE 12", then it can be assumed all queries succeeded. Without the globalexit option, it may be necessary to scan the output in its entirety for instances of EXIT CODE with a non-zero argument, in order to know whether any queries failed. With the globalexit option, scanning of the output is only necessary when one wishes to identify the specific query (or queries) that failed and what the individual reason codes were.
See also: haltonfatal.
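
A pipeline that runs multi-query searches with globalexit can test for failure in either of the two ways described: by exit status or by the final line of output. The sketch below assumes the appended marker line reads EXIT CODE 12; the function names are hypothetical.

```python
GLOBAL_FAILURE_STATUS = 12
GLOBAL_FAILURE_LINE = "EXIT CODE 12"


def failed_by_status(returncode):
    """True if the exit status signals that at least one query hit a FATAL error."""
    return returncode == GLOBAL_FAILURE_STATUS


def failed_by_output(output):
    """True if the last non-empty line of the report is the failure marker."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    return bool(lines) and lines[-1] == GLOBAL_FAILURE_LINE


report = "...report for query 1...\n...report for query 2...\nEXIT CODE 12\n"
print(failed_by_status(12), failed_by_output(report))
```

In practice the exit status would come from whatever launched the search, for example the returncode attribute of the object returned by subprocess.run. Only when a failure is detected does the full output need to be scanned to find which queries failed.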

golfraction=<g> maximum fractional length of overlap, g, of two gapped alignments for them to be considered independent and mutually “consistent” and their joint (Sum or Poisson) probability to be computed. The default value is 0.125 (maximum 12.5% of the length from either end of either HSP). For any given pair of HSPs, the more restrictive of golfraction and golmax is used. To eliminate golfraction from consideration, set its value to 1, to indicate the acceptability of even a complete, 100% overlap.
See also: golmax, olfraction, and olmax.

golmax=<len> set the maximum permitted length of overlap (in residues), len, of two gapped alignments for their joint (Sum or Poisson) probability to be computed. The default is unlimited length, with the maximum extent of overlap being governed only by the golfraction parameter.
See also: golfraction, olfraction, and olmax.
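
How golfraction and golmax combine can be sketched as a single test on a pair of gapped alignments. The sketch assumes the fractional limit is taken relative to the shorter of the two alignments, which is one plausible reading of "either HSP"; the function name and this choice are assumptions, not a statement of the program's exact rule.

```python
def overlap_allowed(len_a, len_b, overlap, golfraction=0.125, golmax=None):
    """True if two alignments overlapping by `overlap` residues may be joined.

    golmax=None stands for the default of unlimited length.
    """
    limit = golfraction * min(len_a, len_b)
    if golmax is not None:
        limit = min(limit, golmax)      # the more restrictive setting wins
    return overlap <= limit


print(overlap_allowed(80, 200, 10))             # at the default 12.5% limit
print(overlap_allowed(80, 200, 11))             # just over it
print(overlap_allowed(80, 200, 6, golmax=5))    # golmax is more restrictive
```

Setting golfraction=1 removes the fractional limit from consideration, leaving golmax (if set) as the only restriction.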

gspmax=<gspmax> establish gspmax as the maximum number of GSPs (gapped HSPs) to report per subject sequence or pairwise sequence comparison. If more than gspmax GSPs are found, only the best-scoring GSPs are retained for subsequent processing and reporting. The setting of gspmax will have no effect if the nogaps option is specified or if the setting of hspmax is more restrictive.
The default value for gspmax is 0, which implies no limit.
See also: hspmax, spoutmax.
NOTE: the B and V options limit the number of subject sequences for which any results whatsoever are reported, regardless of the number of HSPs or GSPs found.

H=<h> use h for the value of the relative entropy, H, when computing the statistics of ungapped alignments.
NOTE: In BLAST 1.4 and earlier, the H option was used to invoke the display of a histogram of search results; this functionality is no longer supported.
See also: gapH.

haltonfatal when processing a file containing multiple query sequences, use this option to halt further processing at the first occurrence of a FATAL error. Processing will otherwise resume with the next query sequence when a FATAL error arises.
See also: globalexit.

hitdist=<hitdist> A positive value for hitdist invokes a 2-hit BLAST algorithm similar to — but slightly more sensitive and using much less memory than — that of Altschul et al. (1997). With the 2-hit algorithm, two word hits on the same diagonal and within hitdist residues of each other are required to trigger an ungapped extension and potentially find an HSP. The 2-hit algorithm is an option in all search modes, including BLASTN.
The default value in all search modes is hitdist=0, such that the classical 1-hit BLAST algorithm is utilized. Only a single word hit is required to trigger extension.
Explicitly specifying hitdist=0 in analysis pipelines will ensure the classical 1-hit BLAST algorithm is still used after updating to any future AB-BLAST version that may change the default behavior.
The 1-hit BLAST algorithm will always be more sensitive than the 2-hit algorithm, with all else equal. In protein-level searches, the 2-hit algorithm requires a smaller value for the word score threshold T to achieve comparable sensitivity. Smaller values for T generate more neighborhood words, which require more memory and reduce search speed. On balance, the 1-hit algorithm achieves the best sensitivity for the memory used; but with the default BLOSUM62 scoring matrix and at comparable sensitivity, the 1-hit algorithm incurs roughly a 25% speed penalty. (The speed penalty is not the 3X suggested by Altschul et al. (1997) through their omission of comparable data for the 1-hit algorithm.) Relative speeds of the two algorithms at comparable sensitivity have not been assessed for scoring systems other than BLOSUM62.
See also: wink.
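
The triggering condition of the two algorithms can be contrasted with a small sketch. Word hits are given as (query position, subject position) pairs; two hits lie on the same diagonal when the difference of their coordinates is equal. Details such as the handling of overlapping words or of hits falling inside an already extended region are not modeled, and the function name is hypothetical.

```python
def extension_triggers(hits, hitdist):
    """Return the word hits that would trigger an ungapped extension.

    hitdist == 0: classical 1-hit behavior, every hit triggers.
    hitdist  > 0: a hit triggers only if an earlier hit lies on the same
                  diagonal within hitdist residues.
    """
    ordered = sorted(hits)
    if hitdist <= 0:
        return ordered
    last_on_diag = {}
    triggers = []
    for q, s in ordered:
        diag = q - s
        prev = last_on_diag.get(diag)
        if prev is not None and 0 < q - prev <= hitdist:
            triggers.append((q, s))
        last_on_diag[diag] = q
    return triggers


hits = [(10, 30), (18, 38), (50, 70), (12, 90)]
print(extension_triggers(hits, hitdist=0))
print(extension_triggers(hits, hitdist=10))
```

With hitdist=10, only the second of the two nearby hits on the shared diagonal triggers an extension, which is why the 2-hit algorithm performs fewer extensions and needs a lower T to reach comparable sensitivity.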

hspmax=<hspmax> establishes hspmax as the maximum number of ungapped HSPs that will be saved per subject sequence or pairwise sequence comparison. Saved HSPs are then fed to the gapped alignment phase of the program or are statistically evaluated if gapped alignments are not to be performed. If more than hspmax HSPs are found, only the best-scoring HSPs are retained for subsequent processing.
The default value is 1000; a value of 0 signifies no limit.
See also: gspmax and spoutmax.
NOTE: This usage of hspmax is subtly, but importantly, different from the parameter's classical interpretation, wherein all ungapped HSPs that satisfied the S2 score threshold were saved; hspmax merely limited the number of HSPs (gapped or ungapped) that would be reported. The new interpretation was instituted to provide vastly improved speed on large problems, while imparting no effect on small problems and many medium-sized problems. The new behavior can help guard against horrendously slow searches resulting from an inadvertent omission of a low-complexity filter. Adverse effects on sensitivity may be obtained, however, if every HSP is sacred. To restore classical behavior, specify hspmax=0. As a compromise between sensitivity and speed, set a higher value than the default.
NOTE: the B and V options limit the number of database or subject sequences for which any results are reported, regardless of the number of HSPs or GSPs found.
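The retention rule for hspmax (keep only the best-scoring HSPs, with 0 meaning no limit) can be sketched as below. The function name and the (score, label) tuple layout are hypothetical; how the software breaks ties between equal scores is not stated in this description and is left to heapq here.

```python
import heapq

def retain_best_hsps(hsps, hspmax=1000):
    """Keep at most hspmax HSPs, preferring higher scores.

    hsps:   list of (score, label) tuples (a hypothetical representation).
    hspmax: maximum number to keep; 0 signifies no limit.
    """
    if hspmax == 0 or len(hsps) <= hspmax:
        return list(hsps)
    # nlargest returns the selected items in descending score order
    return heapq.nlargest(hspmax, hsps, key=lambda h: h[0])
```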

hspsepQmax=<d> maximum allowed separation along the query sequence between two HSPs (gapped or ungapped) that will be clustered into a “consistent” set. Distance is measured here in units of residues at the level of the actual sequence comparison — i.e., in nucleotides for BLASTN and in peptides (or codons) for all other search modes. This option is useful for improving the statistical power to discriminate clusters that have potential biological interest from random background clusters, when the query sequence is significantly longer than the features of interest. Without this restriction, HSPs may be linked that arise from very distant portions of the query sequence. Depending on the specific search performed, distant links may be desirable, but often a reasonable setting for this parameter might be the expected maximum length of an intron. A distance restriction not only avoids clustering HSPs that would be widely separated but improves the statistics of those HSPs that still can be clustered.

hspsepSmax=<d> maximum allowed separation along the subject (database) sequence between two HSPs (gapped or ungapped) that will be clustered into a consistent set. Distance is measured here in units of residues at the level of the actual sequence comparison — i.e., in nucleotides for BLASTN and in peptides (or codons) for all other search modes. This option is useful for improving the statistical power to discriminate clusters that have potential biological interest from random background clusters, when the database contains sequences significantly longer than the features of interest. Without this restriction, HSPs may be linked that arise from very distant portions of a subject sequence. Depending on the specific search performed, distant links may be desirable, but often a reasonable setting for this parameter might be the expected maximum length of an intron. A distance restriction not only avoids clustering HSPs that would be widely separated but improves the statistics of those HSPs that still can be clustered.

K=<k> set the value for extreme value statistics K parameter (Karlin and Altschul, 1990) used in computing the statistics of ungapped alignments.
See also: gapK, L and H.

kap use basic Karlin and Altschul (1990) statistics on individual alignment scores (i.e., do not evaluate the joint probability of multiple consistent HSP scores, such as with Poisson or the default Karlin and Altschul (1993) “Sum” statistics); in order to be reported, each HSP must pass the significance test on its own; these basic statistics are an option in all search modes.
See also: poissonp and sump.

L=<lambda> use lambda for the value of the λ parameter in the extreme value statistics (Karlin and Altschul, 1990) used to compute the significance of ungapped alignments.
See also: gapL, K and H.

lcfilter replace any lower case letters in the input query sequence with the appropriate ambiguity code for “any” residue (N for nucleotide sequences; X for protein sequences).
See also: lcmask, filter, wordmask and echofilter.

lcmask when generating the neighborhood word list for the query sequence, do not process any portions of the query that were represented in lower case letters in the input file. Lower case letters in the query sequence remain unchanged by this “soft masking” procedure and can therefore participate in alignments seeded by word hits that occur in flanking regions.
See also: lcfilter, wordmask, filter, maskextra and echofilter.

links

report consistent link information for each alignment, indicating the set of “consistent” alignments used in joint statistical significance calculations. Links information appears on its own line for each HSP and begins with the keyword Links. Each HSP involving the query and a given subject sequence is numbered from 1 to n, where n is the total number of HSPs reported for the pair of sequences. When the links option is specified, the current HSP number is enclosed in parentheses.

For example, the links information for an HSP might look like the following, where the HSP number 1 enclosed in parentheses indicates that this information accompanied the first HSP reported for the given subject sequence. It is evident in this example that a total of at least 8 HSPs were reported for the subject sequence (re: the 8 in the links list), but only 3 consistent HSPs (numbers 8, 2 and 1, in that order) were involved in obtaining the Sum statistics P-value of 0.15.

		 Score = 72 (30.4 bits), Expect = 0.16, Sum P(3) = 0.15
		 Identities = 41/174 (23%), Positives = 74/174 (42%)
		 Links = 8-2-(1)

NOTE: While all link lists describe sets of consistent HSPs, unless one of the topcomboN or topcomboE options is used, only the list reported for HSPs in the most significant set for each subject sequence is guaranteed to represent the precise set of HSPs for which the joint statistics were computed; all other link lists often do correctly describe the set of HSPs involved but may have one or more missing or extraneous HSPs.
See also: hspsepQmax, hspsepSmax, topcomboE and topcomboN.
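For automated processing, a Links line of the form shown in the example above might be parsed along these lines. The sketch assumes the hyphen-separated layout of that single example, with the current HSP number in parentheses; the function name is hypothetical and other layouts are not handled.

```python
import re

def parse_links(line):
    """Parse a line such as 'Links = 8-2-(1)'.

    Returns (hsp_numbers_in_listed_order, current_hsp_number).
    """
    m = re.match(r"\s*Links\s*=\s*(.+?)\s*$", line)
    if m is None:
        raise ValueError("not a Links line: %r" % line)
    numbers, current = [], None
    for token in m.group(1).split("-"):
        token = token.strip()
        if token.startswith("(") and token.endswith(")"):
            # the parenthesized entry marks the HSP this line belongs to
            current = int(token[1:-1])
            numbers.append(current)
        else:
            numbers.append(int(token))
    return numbers, current
```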

M=<m> set the positive reward score for matching nucleotides in the BLASTN search mode to m, with default value +5.
For compatibility with earlier versions of BLAST, in search modes other than BLASTN the M parameter is synonymous with the matrix parameter, but the use of M for this purpose is deprecated. To provide a fully specified scoring matrix to BLASTN, the matrix parameter itself must be used.
See also: N, matrix and altscore.

maskextra=<extra> soft mask for an additional extra letters to each side of regions that are soft masked by the lcmask and wordmask options. This reduces the incidence of high scoring alignments in low-complexity regions that would be initiated by spurious word hits in otherwise unmasked flanking regions.
See also: wordmask, lcmask and lcfilter.
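The effect of maskextra on a set of soft-masked regions can be pictured with the following sketch. It assumes masked regions are held as half-open (start, end) intervals in 0-based coordinates and that widened regions which touch or overlap are merged; both are illustrative choices, not documented behavior.

```python
def extend_masks(intervals, extra, seq_len):
    """Widen each soft-masked interval by `extra` letters on each side.

    intervals: list of half-open (start, end) pairs (assumed representation).
    The result is clipped to [0, seq_len) and overlapping regions are merged.
    """
    widened = sorted((max(0, s - extra), min(seq_len, e + extra))
                     for s, e in intervals)
    merged = []
    for s, e in widened:
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged
```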

matrix=<name> use the 2-dimensional matrix named name to score residue pairs in gapped and ungapped alignments. The default matrix for protein-level searches is BLOSUM62 (Henikoff and Henikoff, 1992). For BLASTN searches, the default scoring matrix is computed dynamically from a +5/-4 match/mismatch scoring system which can be altered using the M and N parameters. BLASTN can also use fully specified scoring matrices of the user's own design, by providing the name of the matrix with the matrix option. After unpacking the software, see the matrix/nt subdirectory for some examples of nucleotide scoring matrices.
NOTE: matrices need not be symmetric about their major diagonal. The row-column format of a matrix corresponds to query-subject letter pairs.
See also: altscore, M and N.

mformat=<m>[,outfile]

used to select an output format by numerical identifier, m, and optionally the name of the file where the output should be written, outfile. Multiple formats (multiple mformat options) may be requested for simultaneous output during a single search, as long as a different outfile is indicated for each format. If no outfile is specified, either standard output (stdout) or the setting of the O option (if set) is used. At most one mformat specification on a given command line may lack an outfile. If outfile contains any white space (e.g., blanks or tabs), the entire token should be enclosed in quotes, to prevent command line interpreters from breaking it into separate arguments.

The available output formats are listed if mformat=list is specified on an otherwise syntactically correct command line. (Example: blastp foo foo mformat=list). Setting mformat=0 clears any mformat specification(s) appearing to the left on the command line.

Depending on the output format, some command line options cause additional elements to appear in the output. These options include: topcomboN, topcomboE and links.

This example produces 3 different output streams (myq.out, myq.tab and myq.xml) from a single search:

      blastp swissprot myq.aa mformat=1 mformat=3,myq.tab mformat=7,myq.xml > myq.out

The mformat=1 specification causes the normal human-readable output to be written to stdout, which is redirected (“>”) into the file named myq.out.

The available choices for m and their associated formats are:

<m>	output format
list	output this list and halt
0	reset to default output only
1	pairwise, human-readable (default)
2	tabular (see description)
3	tabular with comments (see description)
4	PostScript™ graphics* (see description)
5	neighborhood word listings*
7	XML conforming to NCBI_BlastOutput.dtd (see example; best viewed in Firefox)

*Formats that are subject to change or removal without notice.

See also: msgstyle, O and xmlcompact.
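When a search is launched from a script, the two constraints stated above for mformat (at most one specification without an outfile, and a different outfile for each format) can be checked before the command is run. The helper below is a hypothetical sketch; it only assembles the argument list for the example search shown earlier and does not invoke any program.

```python
def build_mformat_args(specs):
    """Turn (format_id, outfile_or_None) pairs into mformat arguments."""
    outfiles = [out for _, out in specs if out is not None]
    if len(specs) - len(outfiles) > 1:
        raise ValueError("at most one mformat may lack an outfile")
    if len(set(outfiles)) != len(outfiles):
        raise ValueError("each mformat needs a different outfile")
    return ["mformat=%s" % m if out is None else "mformat=%s,%s" % (m, out)
            for m, out in specs]

# Argument vector for the example search; database and query names are
# taken from the example command line above.
argv = ["blastp", "swissprot", "myq.aa"] + build_mformat_args(
    [(1, None), (3, "myq.tab"), (7, "myq.xml")])
```

Passing such a list to a process-launching call that accepts an argument vector (for instance Python's subprocess.run) avoids shell word splitting, so an outfile containing white space would need no extra quoting in that case.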

mmio turns off the use of memory-mapped I/O when reading database files. Use of this option will usually slow the search, particularly when multiple processors are being used, but it serves both to demonstrate the effectiveness of this form of I/O and to validate the associated I/O routines. Note that no special daemon or support programs (such as the old memfile program) are required to take full advantage of memory-mapped I/O. When running 32-bit versions of the BLAST software, the mmio option might free up important virtual address space for use as working storage or heap memory.
For the vast majority of users, this option should never be used.

msgstyle=<n>

used to select, by numerical identifier n, the style of informatory messages (i.e., NOTEs, WARNINGs, etc.) to produce.
The available choices for n and their associated styles are:

    0 => line-wrapped (default)
    1 => single-line with the query sequence identifier embedded (if available)

N=<n> set the negative penalty score for mismatching nucleotides in the BLASTN search mode to n, with default value −4.
See also: M, matrix, and altscore.

nogaps do not create gapped alignments, in essence reverting to WU-BLAST 1.4 behavior.
See also: gaps and gapall.

nonnegok Do not abort processing with a FATAL error when the expected score is non-negative. Formally, for Karlin-Dembo-Altschul statistics to apply to the evaluation of the alignment scores found during a search, the expected score for a sequence having the same residue composition as the query must be negative, but this condition does not always hold with unusual scoring matrices or query sequences. Use the nonnegok option to cause the search to proceed even under these unusual conditions.
See also: novalidctxok and shortqueryok.

nosegs do not segment the query sequence on hyphens (-). By default, hyphens in the query sequence create insurmountable barriers for sequence alignment. As an example of where this feature is useful, multiple contigs may be concatenated together into one sequence with a hyphen separating each contig; no alignment will then extend beyond a contig boundary.
CAUTION: do not confuse this option with the similarly appearing noseqs option.

noseqs produce abbreviated output by omitting the sequence alignments. The result is often correctly interpretable by parsers of normal output.
CAUTION: do not confuse this option with the similarly appearing nosegs option.

notes suppress all NOTE messages. Important recommendations from the software may be missed if this option is used. If any NOTEs arise with this option, the number SUPPRESSED will be reported at the end of the search.
See also: errors and warnings.

novalidctxok do not treat it as a FATAL error when none of the “contexts” (e.g., strands or reading frames) of the query are valid. A valid context is one in which the threshold score for saving alignments can be achieved under ideal circumstances (typically if an alignment of 100% identity were to be found).
See also: nonnegok and shortqueryok.

nwlen=<len> generate neighborhood words (or seed words) starting from the beginning of the query sequence (or from the location specified with the nwstart parameter) and continuing for the distance len or to the end of the sequence, whichever comes first. While this parameter can be used to restrict the region in which word hits occur for seeding ungapped alignments (and indirectly gapped alignments), it does not restrict alignments from extending beyond this region.
See also: nwstart.

nwstart=<start> generate neighborhood words (or seed words) starting from coordinate position start in the query sequence and continuing to the end of the sequence (or for the distance specified with the nwlen parameter). While this parameter can be used to restrict the region in which word hits occur for seeding ungapped alignments (and indirectly gapped alignments), it does not restrict alignments from extending beyond this region.
See also: nwlen.

O=<outfile> output results to the file named outfile instead of standard output (stdout).

olfraction=<f> set the maximum fractional length of overlap, f, of two ungapped alignments for them to be considered independent and mutually “consistent” and their joint (Sum or Poisson) probability to be computed. The default f is 0.1 (maximum 10% of the length from either end of either HSP). For any given pair of HSPs, the more restrictive of olfraction and olmax is used. To eliminate olfraction from consideration, set its value to 1, to indicate the acceptability of even a complete, 100% overlap.
See also: golfraction, golmax, and olmax.

olmax=<len> set the maximum permitted length of overlap (in residues), len, of two ungapped alignments for their joint (Sum or Poisson) probability to be computed. The default is unlimited length, with the maximum extent of overlap being governed only by the olfraction parameter.
See also: golfraction, golmax, and olfraction.
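The interplay of olfraction and olmax ("the more restrictive of the two is used") can be sketched as a single test. The description does not spell out exactly which length the fraction is applied to, so this sketch assumes the shorter of the two HSPs; the function name and arguments are hypothetical.

```python
def overlap_is_consistent(len_a, len_b, overlap, olfraction=0.1, olmax=None):
    """Decide whether two ungapped HSPs overlap little enough to be combined.

    len_a, len_b: HSP lengths in residues.
    overlap:      length of their overlap in residues.
    olmax=None models the default of unlimited length.
    The fraction is applied to the shorter HSP (an assumption of this sketch).
    """
    limit = olfraction * min(len_a, len_b)
    if olmax is not None:
        limit = min(limit, olmax)   # the more restrictive setting wins
    return overlap <= limit
```

Setting olfraction=1 makes the fractional limit equal to the full length of the shorter HSP, leaving olmax (if set) as the only effective restriction.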

pingpong Perform additional work to help ensure the gapped alignments produced are locally optimal. This option typically adds 3-10% to the execution time without affecting the results, as only rarely with “normal” scoring parameters will the score of an alignment be improved.

poissonp use Poisson statistics (Karlin and Altschul, 1990) to compute joint P-values of consistent sets of alignments; Poisson statistics are an option in all search modes.
See also: kap and sump.

postsw perform full Smith-Waterman alignment of sequences and re-rank the database matches accordingly prior to output (currently supported in BLASTP only)

progress=<s> provide an indication that the search is alive by outputting an asterisk (“*”) every s seconds during a search, if some other indication of activity has not been provided in the meantime. Such “keepalive” indicators may be useful when the software is invoked over a network connection. The default behavior (obtained with progress=0) is only to report the actual progress made through the database, using periods (“.”) and reports of percentages.

prune do not prune HSP lists, but instead report all HSPs, even those that were not involved in satisfying the statistical significance threshold necessary for reporting the database sequence. NOTE: When the default Sum statistics are used, the normal pruning activity is robust; when Poisson statistics are used, some HSPs may get through the pruning process and be reported that were not involved in satisfying the statistical significance threshold.
See also: span, span1 and span2.

putenv="NAME=VALUE" set the environment variable named NAME to the value VALUE in the local environment of the BLAST search program.
See also: endgetenv, endputenv and getenv.

pvalues report P-values (the default) in the initial one-line descriptions section of output.
See also: evalues.

Q=<q> set the penalty for a gap of length one to q (default Q=9 for proteins; Q=7 for BLASTN).
See also: R.

qframe=<f> search with the query sequence translated in the single reading frame f. This parameter is useful for speeding up a search and improving both the biological and statistical significance of the findings, when the reading frame of a translation product in the query is known in advance, such as when the query sequence entails a complete ORF. Reading frames on the top (plus) strand of the query are numbered 1, 2, 3; reading frames on the bottom (minus) strand are numbered -1, -2, -3.
See also: top, bottom, dbtop and dbbottom.

Qoffset=<i> adjust all query sequence coordinates in the output by the fixed quantity i (default 0).

qrecmax=<n> in a multi-sequence query file, end database searches with the query sequence numbered n.

qrecmin=<m> in a multi-sequence query file, start database searches using the query sequence numbered m. Records are numbered starting with 1.

qres treat as a FATAL error when the query sequence contains any invalid residue codes. By default, WARNINGs are issued for invalid residue codes, which are then skipped.

qtype treat as a FATAL error if the query sequence appears from its letter composition to be of the wrong type (peptide or nucleotide).

R=<r> set the per-residue penalty for extending a gap to r (default R=2 for proteins; R=2 for BLASTN)
See also: Q.
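Taken together, Q and R define an affine gap cost: the first gapped residue costs Q and each further residue costs R. A minimal sketch, using the protein defaults given above as default arguments (the function name is hypothetical):

```python
def gap_penalty(length, q=9, r=2):
    """Total penalty for a gap of `length` residues.

    q: penalty for a gap of length one (Q).
    r: per-residue penalty for extending the gap (R).
    """
    if length < 1:
        raise ValueError("gap length must be at least 1")
    return q + (length - 1) * r
```

For example, with Q=9 and R=2 a gap of four residues costs 9 + 3 × 2 = 15.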

restest causes a Bonferroni-like correction used in computing statistical significance to depend upon the relative lengths in residues of a given database sequence and the total length (in residues) of the database. restest is the default correction method used in the BLASTN, TBLASTN, and TBLASTX search modes. In all search modes, if the Z parameter is set, the size correction method defaults to restest. This behavior can be overridden by the seqtest option.
See also: seqtest, Y and Z.

S=<s> set the score-equivalent E-value threshold for reporting database hits to s. Hits for a given database sequence will only be reported if the statistical significance ascribed to a group of alignments — or a single alignment, if the kap option is used — is at least as high as that of a single alignment with score S. Unlike the score thresholds S2 and gapS2, which establish fundamental lower limits on individual ungapped and gapped alignment scores, comparisons to S are performed indirectly using E-values in the final stage of screening database hits by their statistical significance. When the user sets a value for S, it is converted to a corresponding E-value using Karlin-Dembo-Altschul statistics, which depends on the scoring system, length of the query sequence and size of the database. During the search, the E-values computed for alignments (singly or in groups) are screened against the E-value computed from S. If both E and S are specified on the command line, the one corresponding to the more restrictive (lower) E-value is used in the comparisons. If neither E nor S is specified on the command line, the default value for E (10) is used, as a default value for S is not defined.
See also: E, gapS2, kap and S2.
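The conversion of S to an E-value and the choice between E and S can be illustrated with the basic Karlin-Altschul relation E = K·m·n·exp(−λS). The sketch below is deliberately simplified: it ignores any length or database-size corrections the software may apply, and the function names, argument names and the numeric values used in the usage lines are made up for illustration.

```python
import math

def score_to_evalue(score, lam, k, query_len, db_len):
    """Basic Karlin-Altschul estimate: E = K * m * n * exp(-lambda * S)."""
    return k * query_len * db_len * math.exp(-lam * score)

def effective_e_threshold(e=None, s=None, default_e=10.0, **stats):
    """Pick the more restrictive (lower) E-value implied by E and S.

    stats carries lam, k, query_len and db_len for the S conversion.
    If neither E nor S is given, the default E of 10 applies.
    """
    candidates = []
    if e is not None:
        candidates.append(e)
    if s is not None:
        candidates.append(score_to_evalue(s, **stats))
    return min(candidates) if candidates else default_e

# Illustrative values only (not taken from any real search)
threshold = effective_e_threshold(e=10, s=100, lam=0.3, k=0.1,
                                  query_len=100, db_len=1000)
```

In this made-up case the E-value equivalent to S=100 is far below 10, so S is the setting that governs reporting.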

S2=<s> set the score threshold for saving ungapped HSPs to s. In the initial, ungapped alignment phase of a search, individual HSPs will only be saved for further use if their score is ≥ S2. The default score threshold is computed from the default value for E2 and will depend on the scoring system. If both E2 and S2 are specified on the command line, the one corresponding to the more restrictive (higher) score threshold will be used.
See also: gapS2, E2 and gapE2.

seqtest causes a Bonferroni-like correction used in computing statistical significance to depend upon the number of sequences in the database. seqtest is the default correction method in the BLASTP and BLASTX search modes. This behavior can be overridden by the restest option.
NOTE: In all search modes, including BLASTP and BLASTX, for backward compatibility with legacy BLAST software: if the Z parameter is specified, the value for Z is expected to be expressed in units of residues, unless seqtest is also specified on the command line.
See also: restest, Y and Z.

shortqueryok do not treat it as a FATAL error when the query sequence is shorter than the BLAST algorithm word length.
See also: novalidctxok and nonnegok.

Soffset=<i> adjust all subject sequence coordinates in the output by the fixed quantity i (default 0).

sort_by_count sort database sequences from highest to lowest by the number of HSPs identified. Multiple sort_by* options may be specified and take precedence in the order specified.

sort_by_highscore sort database sequences from highest to lowest by the highest HSP score found. Multiple sort_by* options may be specified and take precedence in the order specified.

sort_by_pvalue sort database sequences from lowest to highest by their best P-value. Multiple sort_by* options may be specified and take precedence in the order specified. sort_by_pvalue is the default primary sort key.

sort_by_subjectlength sort database sequences from longest to shortest. Multiple sort_by* options may be specified and take precedence in the order specified.

sort_by_totalscore sort database sequences from highest to lowest by the sum total score of all HSPs found. Multiple sort_by* options may be specified and take precedence in the order specified.
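The precedence rule shared by the sort_by* options (keys apply in the order specified) behaves like a multi-key sort. The sketch below models it with a tuple key; the dictionary field names are hypothetical and chosen only for this example.

```python
# Each option maps to a key function; negation gives highest-first ordering.
SORT_KEYS = {
    "sort_by_count": lambda h: -h["hsp_count"],
    "sort_by_highscore": lambda h: -h["high_score"],
    "sort_by_pvalue": lambda h: h["pvalue"],
    "sort_by_subjectlength": lambda h: -h["length"],
    "sort_by_totalscore": lambda h: -h["total_score"],
}

def sort_hits(hits, options=("sort_by_pvalue",)):
    """Sort hits by the given options, earlier options taking precedence."""
    return sorted(hits, key=lambda h: tuple(SORT_KEYS[o](h) for o in options))
```

For instance, options=("sort_by_highscore", "sort_by_pvalue") orders hits by highest HSP score and uses the P-value only to break ties.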

span retain HSPs (ungapped or gapped) regardless of whether they span or are spanned by any other HSP. When this option is specified, memory requirements may increase dramatically to accommodate an increased number of HSPs that must be tracked, particularly when the sequences being compared contain short periodicity repeats and low complexity regions.
See also: span1 and span2.

span1 discard an HSP (ungapped or gapped) when it spans or is spanned by another HSP along either the query or the subject sequence (or both). When a pair of such HSPs is found, the one with the lower score is discarded; if their scores are equal, the longer, less information-dense HSP is discarded.
See also: span and span2.

span2 discard an HSP (ungapped or gapped) when it spans or is spanned by another HSP along both the query and subject sequences. When a pair of such HSPs is found, the one with the lower score is discarded; if their scores are equal, the longer, less information-dense HSP is discarded.
span2 is the default behavior.
See also: span and span1.
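The difference between span, span1 and span2 can be summarized in a small decision function. This is a sketch under stated assumptions: HSPs are dictionaries with inclusive (start, end) coordinates on the query ("q") and subject ("s"), length is measured along the query, and when score and length are both equal the first HSP is the one discarded. None of these details are specified in the descriptions above.

```python
def spans(a, b):
    """True if interval a contains b or b contains a (inclusive coordinates)."""
    return (a[0] <= b[0] and b[1] <= a[1]) or (b[0] <= a[0] and a[1] <= b[1])

def hsp_to_discard(h1, h2, mode="span2"):
    """Return the HSP to discard for a pair, or None to keep both.

    mode: "span"  keeps everything;
          "span1" tests spanning on either sequence;
          "span2" (the default) tests spanning on both sequences.
    """
    if mode == "span":
        return None
    on_q = spans(h1["q"], h2["q"])
    on_s = spans(h1["s"], h2["s"])
    hit = (on_q or on_s) if mode == "span1" else (on_q and on_s)
    if not hit:
        return None
    if h1["score"] != h2["score"]:
        return h1 if h1["score"] < h2["score"] else h2
    # equal scores: drop the longer, less information-dense HSP
    len1 = h1["q"][1] - h1["q"][0]
    len2 = h2["q"][1] - h2["q"][0]
    return h1 if len1 >= len2 else h2
```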

spoutmax=<spoutmax> establishes spoutmax as the maximum number of segment pairs to report in program output per subject sequence or pairwise comparison, independent of the number of HSPs or GSPs actually found and evaluated. If more than spoutmax segment pairs are found, the segment pairs are sorted by the criteria in effect for the search and only the first spoutmax segment pairs will be reported. The setting of spoutmax will have no effect if either hspmax or gspmax is more restrictive.
The default value for spoutmax is 0, which signifies no limit.
See also: hspmax and gspmax.

stats gather a variety of statistics about the search (e.g., the number of word hits in each reading frame, the highest score observed, etc.) and report them in the output. Use of this option marginally impacts search speed.

sump use Sum statistics (Karlin and Altschul, 1993) to compute joint P-values of consistent sets of alignments; the use of Sum statistics is the default behavior in all search modes.
See also: kap and poissonp.

T=<t> set the neighborhood word score threshold for the ungapped BLAST algorithm to t. For a given word of length W in the query sequence, its neighborhood words are defined as the set of words that have scores ≥ T when aligned with it. Neighborhood words become the seed words used to find ungapped alignments by the BLAST algorithm. Lower values for T tend to yield a larger neighborhood (more seed words) and improved sensitivity for lower scoring alignments, but at the expense of increased memory use and run time. Higher values for T will yield a smaller (possibly empty) neighborhood word list and faster execution, at the expense of reduced sensitivity. The default T varies with the scoring matrix, word length, and between search modes. For improved sensitivity and to obtain behavior that better satisfies user expectations, identical words are included with neighborhood words in the list of potential seeds, if their score is positive but happens to be less than T.
No neighborhood words (only exactly matching words) are used by default in the BLASTN search mode; however, neighborhood words can be used even by BLASTN if a value for T is specified on the BLASTN command line. CAUTION: for the long word lengths typically employed with BLASTN, the memory required for neighborhood words can easily be prohibitive and may only be possible for short query sequences. If the T option is to be used with BLASTN, the use of a short word length, W, is also advised.
See also: W.
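The definition of a neighborhood (all words of length W scoring at least T against the query word, plus the identical word when its positive score falls below T) can be made concrete with a brute-force sketch. It enumerates every possible word, which is practical only for tiny alphabets and word lengths and is not how a production search would build its word list; the function and its arguments are hypothetical.

```python
from itertools import product

def neighborhood(word, alphabet, score, t):
    """Return the sorted neighborhood of `word` for threshold t.

    score(a, b) gives the pairwise letter score. The word itself is added
    when its self-score is positive but below t, as described for T.
    """
    def word_score(other):
        return sum(score(a, b) for a, b in zip(word, other))

    words = {"".join(w) for w in product(alphabet, repeat=len(word))
             if word_score(w) >= t}
    if 0 < word_score(word) < t:
        words.add(word)
    return sorted(words)

# +5/-4 match/mismatch scoring, as used by default in BLASTN
def nt_score(a, b):
    return 5 if a == b else -4
```

The enumeration visits len(alphabet) ** W candidate words, which gives a feel for why neighborhood words become expensive at the long word lengths typical of BLASTN.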

top used to restrict the search of a nucleotide sequence to the top (+) strand. In the TBLASTX search mode, where both query and subject are nucleotide sequences, the top option only affects the query sequence.
See also: bottom, dbtop, dbbottom and qframe.

topcomboE=<E_ratio> E_ratio is the maximum ratio of E_current/E_best for which the current “topcombo” group of consistent (colinear) local alignments will be reported for a given database sequence. The "best" group is reported in the output as "Group = 1" and tends to be the most statistically significant. The default behavior is to impose no limit on this ratio, in which case all topcombo groups satisfying E are reported (up to a maximum of topcomboN groups, if specified).
See also: links and topcomboN.

topcomboN=<n> report at most n “topcombo” groups of consistent (colinear) local alignments (HSPs). Each local alignment is allowed to be a member of only one group. Use of this option causes the addition of a "Group = #" indicator in the output for each HSP. Groups of HSPs tend to be assembled in decreasing order of statistical significance. Members of the most significant group thus tend to be reported with "Group = 1".
See also: links and topcomboE.
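How topcomboE and topcomboN jointly limit the reported groups can be sketched as a filter over the groups' E-values. The sketch assumes E_best is the E-value of Group 1 and that groups are supplied in the order they were assembled; the function name is hypothetical, and the separate screening of groups against E is not modeled.

```python
def select_topcombo_groups(group_evalues, e_ratio=None, n=None):
    """Return the 1-based numbers of the groups that would be reported.

    group_evalues: E-values of the groups, Group 1 first.
    e_ratio:       maximum allowed E_current / E_best (None = no limit).
    n:             maximum number of groups to report (None = no limit).
    """
    reported = []
    for number, e in enumerate(group_evalues, start=1):
        if n is not None and len(reported) >= n:
            break
        # written as a product so an E_best of 0 does not divide by zero
        if e_ratio is not None and e > e_ratio * group_evalues[0]:
            continue
        reported.append(number)
    return reported
```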

ucdb force nucleotide sequence databases to be searched in their uncompressed form, with any-and-all ambiguity codes in place. This option is only effective in the BLASTN search mode for word lengths ≥ 7. Users should generally avoid specifying this option themselves, letting the software decide when to employ this search strategy. This option can increase sensitivity when ambiguity codes are present in database sequences, at the expense of memory and possibly speed. Searching the uncompressed database is the only available behavior for word lengths ≤ 7. This option offers improved sensitivity only when searching databases in XDF format that contain ambiguity codes. The option is accepted by the software but offers no improvement in sensitivity for databases in the earlier BLAST 1.4 database format.
See also: cdb.

V=<v> set the maximum number of one-line descriptions of significant database sequences to report in the first section of program output to v. The default limit is 500.
See also: B.

W=<w> set the seed word length for the ungapped BLAST algorithm to w. The default word length for protein-level searches is 3 amino acids; for BLASTN searches, the default length is 11 nucleotides. Shorter word lengths may increase sensitivity, at the expense of increased run time. In all search modes, the acceptable range of word lengths is 1 ≤ w ≤ 1024.
See also: T.

warnings suppress all WARNING messages.
CAUTION: important advisories may be missed if this option is used; however, if any WARNING situations should arise, the number SUPPRESSED will be reported at the end of the search.
See also: errors and notes.

wink=<wink> generate word hits at every wink^th residue position along the query, where the default wink=1 produces neighborhood words at every position. For best sensitivity, wink should not be adjusted. Wink settings greater than 1 are best used to find identical or nearly identical sequences more rapidly. When used in conjunction with the hitdist option to obtain the highest search speed, care should be taken that desirable alignments are not precluded by these parameters.
NOTE: When using BLASTN to search compressed nucleotide sequence databases in their compressed form, an increase in speed (and concomitant decrease in sensitivity) will not be observed unless wink is set to a value greater than the compression ratio, which is usually 4.
CAUTION: Some versions of WU-BLASTN (those prior to [15-Oct-2004]) were associated with a major bug in their handling of the wink parameter.
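The effect of wink on the number of query positions that contribute words can be seen in a short sketch. It assumes 0-based coordinates and that stepping begins at the first query position; the function name is hypothetical.

```python
def word_start_positions(query_len, w, wink=1):
    """Start positions of query words of length w, taking every wink-th one."""
    return list(range(0, query_len - w + 1, wink))
```

For a query of 10 residues and W=3, wink=1 yields 8 word positions while wink=4 yields only 2, which is the source of both the speed gain and the loss of sensitivity.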

wordmask=<filter> “soft mask” the query sequence using the indicated filter. A copy of the query sequence is passed through the filter program and any letters converted by it to ambiguity codes are skipped during neighborhood word or seed word generation. Unlike the filter option, the query sequence itself remains unaltered and available for alignment. Usage of the wordmask parameter is otherwise identical to that of filter, with the same set of filtering methods available for use.
See also: filter, lcmask, lcfilter and maskextra.

wstrict when searching a nucleotide database sequence that contains one or more ambiguous residues, require that every ungapped alignment found during the initial, ungapped phase of a search actually contain an identical word hit (in the usual case of BLASTN usage) or neighborhood word hit (in the case of TBLASTN and TBLASTX). The wstrict option has no effect whatsoever on BLASTX and has no effect on BLASTP when gapped alignments (the default) are to be produced. When ungapped alignments are the desired end product from BLASTP (i.e., the -nogaps option is specified), wstrict will prevent the software from exhaustively searching diagonals that are found to contain HSPs in an effort to find other HSPs that would not be seeded by neighborhood word hits.

X=<x> set the drop-off score for the ungapped BLAST algorithm to x. Ungapped extension of initial neighborhood word hits or seed word hits between the query and subject sequences continues until the cumulative alignment score deteriorates from the maximum value seen thus far during the extension by a quantity X or more. The default value for X is the score associated with 10 bits of significance (2^-10 ≈ 10^-3 probability) for protein-level searches (all but BLASTN) and 20 bits of significance (2^-20 ≈ 10^-6 probability) for nucleotide-level searches (BLASTN only). Higher values for X will increase sensitivity at the expense of execution time, but both tend to diminish rapidly in their rate of change as X is further increased.
See also: gapX.
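The drop-off rule for X can be illustrated with a one-directional ungapped extension. This is a simplified sketch, not the program's extension routine: it extends only to the right of a starting position, uses plain match/mismatch scores rather than a full matrix, and the function and argument names are hypothetical.

```python
def extend_right(query, subject, q0, s0, x, match=5, mismatch=-4):
    """Extend rightward from (q0, s0) without gaps.

    Extension stops once the running score has fallen x or more below the
    best score seen so far. Returns (best_score, length_at_best_score).
    """
    best = running = 0
    best_len = 0
    i = 0
    while q0 + i < len(query) and s0 + i < len(subject):
        running += match if query[q0 + i] == subject[s0 + i] else mismatch
        i += 1
        if running > best:
            best, best_len = running, i
        elif best - running >= x:
            break
    return best, best_len
```

With the sequences AAAATTAAAA and AAAACCAAAA, a drop-off of 8 stops at the two mismatches, whereas a drop-off of 9 lets the extension continue through them to the matching residues beyond, illustrating how a larger X can recover a higher-scoring alignment at the cost of more work.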

xmlcompact omit newline and white space characters normally reported between entities in XML documents produced with mformat=7. Their purpose is merely to improve human readability of a document when viewed with an XML-ignorant program, but these characters often comprise a substantial fraction (30% is not uncommon) of the bytes in a document and they are completely extraneous for the purposes of automated parsing and viewing with XML-aware software.
See also: mformat.

Y=<y> set the effective length of the querY sequence (in units of residues) used in statistical significance calculations to y. The interpretation of y as being in units of residues is unaffected by any other options or parameter settings, including the setting of Z or seqtest.
See also: restest, seqtest and Z.

Z=<z> set the effective size of the database (databaZe) used in statistical significance calculations to z. Caution: use of the Z parameter fundamentally changes the way database size is measured and used in statistical calculations in searches of protein sequence databases (but not nucleotide sequence databases). Users of this parameter are strongly urged to read about the seqtest and restest options. Unless overridden by the seqtest option, the unit of measure for z is residues. If seqtest is also specified, the unit of measure for z becomes sequences instead.
See also: restest, seqtest and Y.

Last modified: 2023-07-27
