The complete list of command line options supported by
AB-BLAST 3.0 is provided in the
tables below.
The information provided here
comprises their definitive description.
This information should be considered valid only
for the current (most recent) version of the software.
This file is bundled in AB-BLAST software distributions.
If you find an inconsistency between the advertised
behavior and the actual behavior of the software,
first be sure you are using the latest software version,
as indicated by the date of the latest release shown at
http://blast.advbiocomp.com.
If the inconsistency persists after upgrading,
please report it to
Technical Support.
If you wish to continue using an older version of the software
instead of upgrading,
please consult the copy of parameters.html
that came bundled with that version;
it may be more accurate for your purposes than the on-line documentation.
Where differences arise between the bundled file and the on-line version,
they may be due to improvements or corrections that have been made
to the documentation itself
or due to actual differences between versions of the software.
When this web page cannot be conveniently accessed, terse descriptions of most items may be obtained by entering the relevant BLAST program name alone on the command line, without any arguments or options. The most recent version of the page you are viewing is located here. For most of the options, a logical diagram indicates where each imparts its effect.
Command line options for the obsolete NCBI- and WU-BLAST 1.4, first released in 1994, often apply unchanged to AB-BLAST 3.0, just as they did for WU-BLAST 2.0, which affords a degree of compatibility spanning nearly 30 years.
The basic AB-BLAST command line syntax is:
<program> <database> <query> [options...]
where <program> is one of blastp, blastn, blastx, tblastn and tblastx;
<database> is the name of the database to search
(previously formatted with xdformat);
<query> is the name of a file containing one or more query
sequences in FASTA format;
and [options...] is a list of zero or more command line options
and parameter settings.
Aside from the first two command line arguments (database name and query filename), everything else on the command line is optional. For most options, the order in which they are specified on the command line doesn't matter, but parsing is left-to-right, so the right-most setting takes precedence.
The AB-BLAST search programs support a flexible syntax for command line options and parameters — even too flexible at times. Parameters and options are typically interpreted case-independently. As a matter of habit, parameter values should be set using equals signs (=). An equals sign avoids the occasional ambiguity that arises due to the overabundance of syntax flexibility. A leading hyphen (-) can be included on parameters and options to improve human readability, since this is the custom with many other programs. Combined use of hyphens and equals signs is even allowed, does not need to be consistently applied across a given command line, and will probably be interpreted by the software as you intend, but please get in the habit of just using equals signs and no leading hyphens. It might become necessary in the future to eliminate the other forms of expression.
Large integer values can be specified using floating point representation
(e.g., 1e9 instead of 1000000000).
A value of “infinity” (as in “B=infinity”)
is interpreted to mean “unlimited” or the maximum value that can
be represented by the data type used in the program.
For examples of syntax flexibility, each of the following command lines is valid, and all are equivalent. Only the first line uses the recommended equals-sign syntax:
blastp nr myquery.aa v=10 b=100 filter=seg e=1e-10 nogaps
blastp nr myquery.aa V=10 B=100 filter=seg E=1e-10 nogaps
blastp nr myquery.aa -V=10 -B=100 -filter=seg -E=1e-10 -nogaps
blastp nr myquery.aa -V10 -B100 -filter seg -E1e-10 -nogaps
blastp nr myquery.aa -V 10 -B 100 -filter seg -E 1e-10 -nogaps
blastp nr myquery.aa V 10 B 100 filter seg E 1e-10 nogaps
blastp nr myquery.aa -v10 B=100 FILTER=seg -e=1e-10 -nogaps
An example of an ambiguous situation that can be avoided by using equals-sign notation is:
blastp nr myquery.aa -E2
It's unclear here whether the user intends to set parameter E
to the value 2 (which the software happens to assume)
or if the user intended to set a value for parameter E2
and neglected
to provide a value for it.
Unambiguous ways to set values for E
and E2
are:
blastp nr myquery.aa E=2
or
blastp nr myquery.aa E2=0.1
Option | Description
altscore= "score_spec" |
alter individual scores or entire rows or columns
of scores in a scoring matrix,
without editing the scoring matrix file itself.
Score_spec is a quoted character string
consisting of three components, each separated by white space:
(1) a letter in the query sequence alphabet;
(2) a letter in the subject sequence alphabet;
(3) the new pairwise score to be assigned to the alignment
of these two letters.
If the query (subject) letter is specified
as the special word any,
the altered score will be assigned
to the entire column (row) of the scoring matrix.
(N.B. Scoring matrices are stored in row=query, column=subject
orientation.)
If the indicated score is the special word min (max),
the new assigned score will be the minimum (maximum) score observed
in the matrix.
If the score is given as na,
the alignment of the indicated letters will not be allowed,
effectively assigning them an infinitely negative score.
Multiple altscore options can be specified
on the command line and will be applied to the scoring matrix
successively in left-to-right order.
As an example of the option's use, to assign an alignment score
of zero (0) to the presence of a stop codon
in either the query or database sequence,
these two specifications can be used together:
altscore="* any 0" altscore="any * 0".
See also: matrix, M and N.
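To make the row/column semantics concrete, here is a minimal Python sketch (not AB-BLAST's implementation) that applies altscore specifications to a matrix stored as a dictionary of rows keyed by query letter, each row mapping subject letters to scores. Assumptions made for illustration only: min and max are taken over the matrix as it stands when each specification is applied, entries already set to na are ignored when doing so, and na is represented as negative infinity. The helper names apply_altscore and apply_all are hypothetical.

```python
import math

NA = -math.inf  # assumption: "na" modeled as an infinitely negative score


def apply_altscore(matrix, spec):
    """Apply one spec "query_letter subject_letter score" to matrix in place.

    matrix[q][s] is the score for aligning query letter q with subject
    letter s (row = query, column = subject).
    """
    q_letter, s_letter, value = spec.split()
    # Scores currently present in the matrix, ignoring disallowed (na) pairs.
    observed = [v for row in matrix.values() for v in row.values() if v != NA]
    if value == "min":
        new_score = min(observed)
    elif value == "max":
        new_score = max(observed)
    elif value == "na":
        new_score = NA
    else:
        new_score = int(value)

    # "any" as the query letter selects every row, i.e. a whole column;
    # "any" as the subject letter selects every column, i.e. a whole row.
    rows = list(matrix) if q_letter == "any" else [q_letter]
    for q in rows:
        cols = list(matrix[q]) if s_letter == "any" else [s_letter]
        for s in cols:
            matrix[q][s] = new_score


def apply_all(matrix, specs):
    # Specifications are applied successively, in left-to-right order.
    for spec in specs:
        apply_altscore(matrix, spec)
    return matrix


# Two-letter toy matrix: score the stop codon (*) as 0 against everything.
m = {"A": {"A": 4, "*": -4}, "*": {"A": -4, "*": 1}}
apply_all(m, ["* any 0", "any * 0"])
print(m)  # {'A': {'A': 4, '*': 0}, '*': {'A': 0, '*': 0}}
```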
B= <b> |
set the maximum number of database sequences for which any alignments will be reported to b.
The default limit is 250.
The maximum number of alignments that may be saved and reported per
database sequence is governed by other parameters.
See also: V, hspmax, gspmax, spoutmax and noseqs.
bottom |
used to restrict the search of a nucleotide sequence
to the bottom (-) strand.
In the TBLASTX search mode, where both query and subject
are nucleotide sequences, the bottom option only affects
the query sequence.
See also: top, dbtop, dbbottom and qframe.
C= <gcid> |
use the indicated genetic code to translate the query
sequence in the BLASTX and TBLASTX search modes.
gcid is a numerical identifier for the desired code.
A list of the genetic codes and their
identifiers is displayed if C=list is specified
on an otherwise syntactically correct command line.
(Example: blastx foo foo c=list).
In the TBLASTN search mode, the C parameter can be
substituted for the dbgcode parameter.
The available genetic codes are:
1. Standard*
2. Vertebrate Mitochondrial
3. Yeast Mitochondrial
4. Mold Mitochondrial; Protozoan Mitochondrial; Coelenterate Mitochondrial; Mycoplasma; Spiroplasma
5. Invertebrate Mitochondrial
6. Ciliate Nuclear; Dasycladacean Nuclear; Hexamita Nuclear
9. Echinoderm Mitochondrial
10. Euplotid Nuclear
11. Bacterial and Plant Plastid
12. Alternative Yeast Nuclear
13. Ascidian Mitochondrial
14. Flatworm Mitochondrial
15. Blepharisma Macronuclear
16. Chlorophycean Mitochondrial
21. Trematode Mitochondrial
22. Scenedesmus obliquus Mitochondrial
23. Thraustochytrium Mitochondrial
1001. Codon2004
*The default genetic code (1).
Specify the desired genetic code by its number.
The Codon2004 code provides preliminary support
for a draft alphabet for working precisely
with each of the 64 possible codons,
rather than mapping the codons to the usual 20 common amino acids.
Scoring matrix files to use the Codon2004 alphabet
with a translated query sequence in BLASTX should be
placed in a subdirectory named ca,
located parallel to the usual aa and nt
subdirectories of the matrix directory.
For use in TBLASTN searches, the scoring matrix should reside
in an ac subdirectory;
and for TBLASTX searches, the subdirectory should be cc.
(Notice the use of the letter “c” for the codon alphabet,
the letter “a” for the amino acid alphabet,
and the query-subject ordering of the two letters to create
the subdirectory name).
For “codon-ized” scoring matrices derived from the BLOCKS database
and appropriate for use “as is” with TBLASTX,
please go
here.
For more information about the Codon2004 alphabet, please see
Dennis Maeder’s
pages.
See also: dbgcode.
cdb |
force nucleotide sequence databases to be searched in their compressed form.
This option is only effective in the BLASTN search mode for word lengths ≥ 7.
Users should generally avoid specifying this option themselves,
letting the software decide when to employ this search strategy.
See also: ucdb.
compat1.3 |
perform a BLAST version 1.3-style search (no gaps and significance estimated using Poisson statistics),
but with bug fixes, performance enhancements and new options available.
To ensure consistent results,
compat1.3 should be the first option specified
on the command line after the required query argument.
See also: compat1.4 and compat2.0.
compat1.4 |
perform a BLAST version 1.4-style search (no gaps in the alignments),
but with bug fixes, performance enhancements and new options available.
To ensure consistent results,
compat1.4 should be the first option specified
on the command line after the required query argument.
See also: compat1.3 and compat2.0.
compat2.0 |
perform a search parameterized like WU-BLAST version 2.0,
but with the bug fixes, performance enhancements
and new options available in AB-BLAST 3.0.
This option restores the default, classical 1-hit BLAST algorithm
(hitdist=0) for protein-level searches (BLASTP,
BLASTX, TBLASTN, TBLASTX).
The option also affects the default scoring system used by BLASTN
and the default gapX
drop-off score used in all search modes.
To ensure predictable results,
compat2.0 should be the first option specified
on the command line after the required query argument.
See also: compat1.3, compat1.4 and hitdist.
consistency |
turn off the determination of “consistent” sets of HSPs, effectively lumping all HSPs found for a given database sequence into one set. Use of this option also disables a combinatorial adjustment that is otherwise made to the Sum and Poisson statistics to account for the consistent arrangement of the HSPs out of all possible relative arrangements. This option has no effect if Sum or Poisson statistics are not being used.
cpus= <n> |
request that n processors (or threads) be employed for the search.
Within the limits and conditions described below,
the default behavior is to employ as many threads as the computer system has
processors (or processor cores).
In addition, the default behavior for BLASTN is to use at most 4 threads,
even if the computer has many more processor cores available,
because BLASTN searches can so easily become I/O bound.
The default number of threads employed may be altered by setting a specific value for cpus
in a system-wide file named /etc/sysblast;
see the sysblast.sample example file included in
AB-BLAST 3.0 software distributions for further information.
NOTE:
Memory consumption increases linearly with the number of threads;
the actual number of threads employed may be automatically reduced
by the software if memory resources are seen to be limiting.
ctxfactor= <c> |
set the “context factor” that is
used as a Bonferroni-like correction in the statistics to c,
to account for the number of contexts searched.
Each distinct reading frame-to-reading frame or strand-to-strand combination
between query and subject sequences constitutes one “context”.
Thus, one context exists in a BLASTP search,
as many as two contexts (two distinct strand combinations) exist in a BLASTN search,
up to 6 contexts (one for each reading frame) exist in a BLASTX or TBLASTN search,
and up to 6x6 = 36 contexts exist in a TBLASTX search.
The maximum default value for ctxfactor then is 1 for BLASTP,
2 for BLASTN, 6 for BLASTX and TBLASTN,
and 36 for TBLASTX.
Restricting a search to a single strand of the query and/or database
reduces the number of contexts accordingly for that search.
More accurately, however,
the contribution of any given context to the default value
for ctxfactor is the fraction of residues in the query
(or reading frame of the query) that are unambiguous (up to a maximum value of 1.0).
(N.B. this fraction is computed after any optional filtering
has been applied to the query).
The default ctxfactor is then merely the sum of these fractions for every context involved in the search.
The software should normally be allowed to set the value of this parameter itself, unless the user has a compelling reason to change it. One rationale for explicitly setting a value for ctxfactor
might be to ensure a constant value is used in the statistics
across multiple searches,
where the results from the searches need to be examined
and compared for their statistical significance on a common basis.
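The default calculation described above can be sketched in Python as follows. This is an illustration under stated assumptions, not AB-BLAST code: only A, C, G and T are treated as unambiguous nucleotides, and hard-masked positions are assumed to appear as N or X, so that they count as ambiguous (consistent with the fraction being computed after filtering). The function names are hypothetical.

```python
# Assumption: only A, C, G and T count as unambiguous nucleotides.
NUCLEOTIDES = frozenset("ACGT")


def unambiguous_fraction(seq, alphabet):
    """Fraction of residues in seq that belong to alphabet (capped at 1.0)."""
    if not seq:
        return 0.0
    count = sum(1 for c in seq.upper() if c in alphabet)
    return min(1.0, count / len(seq))


def default_ctxfactor(contexts, alphabet):
    """Sum the unambiguous fraction over every context searched."""
    return sum(unambiguous_fraction(s, alphabet) for s in contexts)


def blastn_ctxfactor(query, strands=2):
    # The reverse complement contains the same number of ambiguous residues,
    # so each searched strand contributes the same fraction.
    return strands * unambiguous_fraction(query, NUCLEOTIDES)


print(blastn_ctxfactor("ACGTNNACGT"))             # 1.6 (both strands, 0.8 each)
print(blastn_ctxfactor("ACGTNNACGT", strands=1))  # 0.8 (top or bottom only)
```

Setting ctxfactor explicitly, as suggested above, simply replaces this computed sum with a fixed value.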
dbbottom |
used to restrict the search to the bottom (-) strand of all database sequences.
See also: dbtop, top, bottom and qframe.
dbchunks= <nchunks> |
establishes the granularity of the database, as it is divided into
slices for assignment to individual threads,
to make more efficient use of all CPUs when multiple CPUs
are employed for a given search.
Higher values are appropriate when the database contains relatively
few sequences and/or when the sequences vary greatly in length,
composition or content (e.g., genomic contigs).
Lower values are appropriate when the database contains many
sequences of comparable length
(e.g., the EST division of GenBank).
The minimum assignable value is the number of threads employed,
but this setting is ill-advised;
the optimal value for any given search type is likely to be
a large multiple of the number of threads employed
and need not be an exact multiple.
The default value is 500 (AB-BLAST Standard Edition)
or 1000 (AB-BLAST Enterprise Edition).
When searching databases of large unfinished genomes
consisting of numerous contigs of widely varying sizes,
more efficient use of all processors might be achieved by using
a larger value such as 1000.
Users generally need not be concerned with this parameter.
dbgcode= <gcid> |
use the indicated genetic code to translate database
sequences in the TBLASTN and TBLASTX search modes.
gcid is a numerical identifier for the desired code.
A list of the genetic codes and their
identifiers is displayed if dbgcode=list is specified
on an otherwise syntactically correct command line.
(Example: tblastn foo foo dbgcode=list).
See also: C.
dbrecmax= <last_record> |
search the database until last_record,
where database records are numbered starting with 1.
By default, databases are searched completely.
If last_record is greater than the actual
number of records in the database, the database is simply
searched until its end.
It is an error for the requested last_record to be
less than the first record requested to be searched in the database.
Records in virtual databases are numbered with respect to the
entire virtual database.
See also: dbrecmin.
dbrecmin= <first_record> |
search the database beginning at first_record,
where database records are numbered starting with 1.
By default, databases are searched completely.
It is an error for the requested first_record to be
greater than the last record requested to be searched in the database
(re: the dbrecmax parameter)
or to point beyond the end of the database.
Records in virtual databases are numbered with respect to the
entire virtual database.
See also: dbrecmax.
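The rules stated for dbrecmin and dbrecmax can be summarized in a small Python helper. The function resolve_record_range is hypothetical and only restates the documented behavior; for a virtual database, nrecords would be the record count of the entire virtual database.

```python
def resolve_record_range(nrecords, dbrecmin=1, dbrecmax=None):
    """Return the (first, last) 1-based record numbers that would be searched.

    Hypothetical helper restating the documented dbrecmin/dbrecmax rules.
    """
    if dbrecmin < 1:
        raise ValueError("records are numbered starting with 1")
    if dbrecmin > nrecords:
        raise ValueError("dbrecmin points beyond the end of the database")
    if dbrecmax is None:
        last = nrecords  # default: search to the end of the database
    else:
        if dbrecmax < dbrecmin:
            raise ValueError("dbrecmax is less than dbrecmin")
        # A dbrecmax past the end is not an error; the search just stops at the end.
        last = min(dbrecmax, nrecords)
    return dbrecmin, last


print(resolve_record_range(1000))                               # (1, 1000)
print(resolve_record_range(1000, dbrecmin=501, dbrecmax=5000))  # (501, 1000)
```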
dbslice= m/n
dbslice= a-b/n |
at run time, for expressions of the form m/n, logically divide the database into n equivalent-sized slices and search only the mth slice, where 1 ≤ m ≤ n ≤ 10000000 (1e7). Alternatively, for expressions of the form a-b/n, search slices a through b (inclusive), where 1 ≤ a ≤ b ≤ n. Slice size is determined solely by the number of sequence records contained within and is not a function of sequence length; this can produce significant disparities in the computational workload associated with searching different slices, which may be alleviated by randomizing the order of sequences in the database before formatting for BLAST. In distributed computing environments, when the same, large database is to be searched repeatedly, overall throughput may benefit from consistently assigning the same slice(s) to the same client/worker nodes for each search; improved efficiency results from the file caching activity that is typically performed by operating systems when the database files are first read from disk or over a network. Logically breaking the database into slices at run time means that each client node need only have sufficient unused memory in which to cache its assigned slice(s), instead of the entire database, and that the database need not be repartitioned and reformatted into many smaller sub-databases whenever the number of available client nodes changes.
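A Python sketch of how an m/n or a-b/n specification could map onto record numbers. The parsing enforces the ranges stated above; the exact boundary arithmetic AB-BLAST uses is not documented here, so the integer division below is an assumption, chosen only to yield slices whose record counts differ by at most one. Both function names are hypothetical.

```python
import re

MAX_SLICES = 10_000_000  # 1e7, the documented upper limit for n


def parse_dbslice(spec):
    """Parse "m/n" or "a-b/n" into (first_slice, last_slice, n)."""
    match = re.fullmatch(r"(\d+)(?:-(\d+))?/(\d+)", spec.strip())
    if match is None:
        raise ValueError(f"bad dbslice value: {spec!r}")
    a = int(match.group(1))
    b = int(match.group(2)) if match.group(2) else a
    n = int(match.group(3))
    if not 1 <= a <= b <= n <= MAX_SLICES:
        raise ValueError(f"dbslice out of range: {spec!r}")
    return a, b, n


def slice_records(a, b, n, nrecords):
    """Assumed mapping of slices a..b (out of n) onto 1-based record numbers.

    Slices are sized by record count only, never by sequence length.
    Returns (first, last); first > last would mean an empty range.
    """
    first = (a - 1) * nrecords // n + 1
    last = b * nrecords // n
    return first, last


for spec in ("1/4", "2/4", "3/4", "4/4", "2-3/4"):
    a, b, n = parse_dbslice(spec)
    print(spec, slice_records(a, b, n, nrecords=10))
```

Under this assumed mapping, ten records split into slices of 2, 3, 2 and 3 records, however long the individual sequences are, which is the source of the workload disparity described above.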
dbtop |
used to restrict the search to the top (+) strand of all database sequences.
See also: dbbottom, top, bottom and qframe.
E= <e> |
set the expectation threshold for reporting database hits to e.
A database sequence will only be reported if an ascribed E-value
for at least one of its alignments
— or groups of alignments, if Sum or Poisson statistics are being used —
is ≤ E.
Lower E-values are more significant (less likely to occur by chance).
The default threshold is E=10,
such that if the search algorithm exhibited 100% sensitivity
and the statistics applied perfectly to the sequences being studied,
on average about 10 database sequences would be reported merely by chance.
See also: S.
E2= <e> |
set the expectation threshold for saving ungapped HSPs to e.
In the initial, ungapped alignment phase of a search,
individual HSPs will only be saved for further use
if their score is ≥ S2,
where the default value of S2 is computed from E2.
The default value for E2 varies between BLAST search modes;
the resultant value for S2 will depend on the scoring system, as well.
If both E2 and S2 are specified on the command line,
the one corresponding to the more restrictive (higher) score threshold
will be used.
See also: gapE2, S2 and gapS2.
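The relationship between E2 and S2 can be illustrated with the Karlin-Altschul relation E = K·m·n·e^(-λS), solved for the smallest integer score whose expectation does not exceed E2. This is a simplified sketch: it ignores ctxfactor, effective-length corrections and anything else AB-BLAST may apply, and the λ, K, m and n values are illustrative only. It does reproduce the documented rule that, when both E2 and S2 are given, the higher score threshold wins. Function names are hypothetical.

```python
import math


def score_for_evalue(e, lam, k, m, n):
    """Smallest integer score S with K*m*n*exp(-lam*S) <= e (simplified model)."""
    return math.ceil(math.log(k * m * n / e) / lam)


def evalue_for_score(s, lam, k, m, n):
    """Expected number of chance HSPs scoring >= s (simplified model)."""
    return k * m * n * math.exp(-lam * s)


def effective_s2(lam, k, m, n, e2=None, s2=None):
    """Score threshold used when E2 and/or S2 are given: the higher of the two."""
    candidates = []
    if e2 is not None:
        candidates.append(score_for_evalue(e2, lam, k, m, n))
    if s2 is not None:
        candidates.append(s2)
    if not candidates:
        raise ValueError("give E2, S2, or both")
    return max(candidates)


# Illustrative parameters only: lam and k roughly as often quoted for ungapped
# BLOSUM62, a 300-residue query and a 1e8-residue database.
params = dict(lam=0.3176, k=0.134, m=300, n=1e8)
print(effective_s2(**params, e2=0.05))          # threshold derived from E2 alone
print(effective_s2(**params, e2=0.05, s2=100))  # S2=100 is higher, so it is used
```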
echofilter |
display the query sequence in the BLAST report, after all hard masks have been applied.
See also: filter and lcfilter.
endgetenv |
ignore any subsequent getenv options found on the command line during left-to-right parsing.
This option may be useful in client-server situations when the command line
is open to users to modify and the administrator does not wish to expose the values
of environment variables to possible unauthorized interrogation.
See also: endputenv, getenv and putenv.
endputenv |
for security in WWW server installations, where the command line may sometimes be left open to users,
ignore any subsequent putenv options found on the command line during left-to-right parsing.
See also: endgetenv, getenv and putenv.
errors |
suppress all ERROR messages. These messages should rarely, if ever, arise and indicate severe conditions (typically internal software bugs) that should be given immediate attention. When they do arise, parsers may break.
If any ERRORs arise with this option, the number SUPPRESSED will be reported
at the end of the search.
See also: notes and warnings.
evalues |
report E-values (expectations) instead of P-values (probabilities) in the initial one-line descriptions section of output.
See also: pvalues.
filter= <filter> |
“hard mask” the query sequence using the specified filter.
The filter program may alter the sequence in composition but not in length.
For protein-level searches (BLASTP, BLASTX, TBLASTN and TBLASTX), the supported filter programs include
seg and xnu.
For nucleotide-level (BLASTN) searches, supported filter programs include
dust and seg.
If multiple filter specifications are made on the command line, their independent results are logically OR-ed
to produce the final, masked query sequence.
filter=none causes any earlier specifications (to the left) on the command line to be ignored.
NOTE: By default, no filtering is performed. Arbitrary user-defined filter programs can be utilized here, if the program input and output are in FASTA/Pearson sequence format and if input and output are tied to stdin and stdout, respectively. Complete command lines with options can be specified with the filter parameter by
enclosing the entire command in quotes.
The location of all filter programs (including user-defined programs) is governed by the BLASTFILTER
environment variable, which can be set to a colon-delimited list of directories that the BLAST programs will successively examine to find filters.
See also: wordmask, lcfilter, lcmask and echofilter.
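As a sketch of a user-defined filter meeting the stated requirements (FASTA on stdin, FASTA on stdout, composition may change but length may not), the hypothetical Python script below hard-masks runs of identical residues. The script name maskruns.py, the masking rule and the default run length of 8 are invented for illustration. The or_masks helper mimics the documented rule that the results of multiple filters are logically OR-ed.

```python
"""maskruns.py: hypothetical user-defined BLAST filter (a sketch).

Reads FASTA from stdin, hard-masks every run of >= min_run identical
residues, and writes FASTA to stdout. Sequence lengths never change.
"""
import sys


def read_fasta(lines):
    """Return a list of (header_line, sequence) pairs from an iterable of lines."""
    records, header, chunks = [], None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line, []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def mask_runs(seq, min_run, mask_char="X"):
    """Replace each run of >= min_run identical residues with mask_char."""
    out = list(seq)
    i = 0
    while i < len(seq):
        j = i
        while j < len(seq) and seq[j].upper() == seq[i].upper():
            j += 1
        if j - i >= min_run:
            out[i:j] = mask_char * (j - i)  # same length: hard mask only
        i = j
    return "".join(out)


def or_masks(original, masked_versions, mask_char="X"):
    """A position is masked if any of the filters masked it."""
    return "".join(
        mask_char if any(m[i] == mask_char for m in masked_versions) else c
        for i, c in enumerate(original)
    )


def main():
    min_run = int(sys.argv[1]) if len(sys.argv) > 1 else 8
    for header, seq in read_fasta(sys.stdin):
        masked = mask_runs(seq, min_run)
        sys.stdout.write(header + "\n")
        for start in range(0, len(masked), 60):
            sys.stdout.write(masked[start:start + 60] + "\n")
```

To run it as an actual filter, the script would also need the usual `if __name__ == "__main__": main()` line, execute permission, and a location listed in BLASTFILTER; it might then be invoked with something like filter="maskruns.py 12" (quoted because it includes an argument). For BLASTN queries, N would be the more natural mask character.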
gapall |
effectively generate a gapped alignment for every ungapped HSP found (up to hspmax). This is the default behavior.
See also: gapE.
gapdecayrate= <r> |
define r to be the common ratio of the terms in a geometric progression
used in altering probabilities as a function of the number of Poisson
events involved (typically the number of “consistent” HSPs in a set),
according to a method suggested by Phil Green.
An initial Poisson probability for n HSPs is weighted
by the quantity $T_n$, which is itself
the reciprocal of the $n$th term in the progression
$t_n = (1-r)\,r^{n-1}$.
The default value for r is 0.5, such that
the default weights are successively $T_1 = 2$,
$T_2 = 4$,
$T_3 = 8$,
$T_4 = 16$, and so on.
These weights provide a conservative Bonferroni-like correction to the probabilities,
in case multiple trials are performed in determining the set
of HSPs yielding the lowest P-value for a given database sequence.
That the geometric progression contains an infinite number of terms
allows this correction method to satisfy the need for any number
of tests, when this number is unknown prior to the search.
The value for gapdecayrate affects the statistics
when the default Sum statistics or optional Poisson statistics are used,
but not when multiple HSP statistics have been turned off
with the kap option.
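A short Python sketch of the weighting scheme. Because the terms t_n of the progression sum to 1 for any 0 < r < 1, multiplying the probability for n HSPs by T_n = 1/t_n amounts to giving the n-HSP test a share t_n of the overall significance budget, which is why the correction holds however many tests end up being made. Capping the weighted value at 1.0 and rejecting r outside (0, 1) are assumptions of this sketch; the function names are hypothetical.

```python
def decay_terms(r, count):
    """t_n = (1 - r) * r**(n - 1) for n = 1..count."""
    if not 0.0 < r < 1.0:
        raise ValueError("gapdecayrate must lie strictly between 0 and 1")
    return [(1.0 - r) * r ** (n - 1) for n in range(1, count + 1)]


def decay_weights(r, count):
    """T_n = 1 / t_n, the weight applied to the probability for n HSPs."""
    return [1.0 / t for t in decay_terms(r, count)]


def weighted_probability(p, n, r=0.5):
    """Weight an initial Poisson probability p for a set of n HSPs.

    Capping at 1.0 is an assumption made here to keep a valid probability.
    """
    t_n = (1.0 - r) * r ** (n - 1)
    return min(1.0, p / t_n)


print(decay_weights(0.5, 4))          # default r: weights 2, 4, 8, 16
print(sum(decay_terms(0.5, 60)))      # partial sum of the t_n, approaching 1
print(weighted_probability(1e-3, 3))  # 1e-3 weighted by T_3 = 8
```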
gapE= <gapE> |
generate gapped alignments for all HSPs between sequences whose expected frequency of chance occurrence is ≤ gapE.
Default value is gapE=infinity (i.e., gapall is in effect).
See also: gapall.
gapE2= <e> |
set the E-value for saving gapped HSPs to e.
In the secondary, gapped alignment phase of a search,
individual gapped HSPs will only be saved for further use
if their score is ≥ gapS2,
where the default gapS2 is computed from gapE2.
The default value for gapE2 varies between BLAST search modes;
the resultant gapS2 will depend on the scoring system, as well.
If both gapE2 and gapS2 are specified on the command line,
the one corresponding to the more restrictive (higher) score threshold
will be used.
See also: gapS2, E2 and S2.
gapH= <h> |
set the value of the relative entropy, H, used in evaluating the statistical significance of gapped alignment scores.
See also: H.
gapK= <k> |
set the value of the extreme value statistics K parameter
(Karlin and Altschul, 1990)
used in evaluating the significance of gapped alignment scores.
Useful when precomputed values are unavailable in the internal tables for the chosen scoring matrix and gap penalty combination.
See also: K.
gapL= <lambda> |
use lambda for the value of the
λ parameter in the extreme value statistics
used to evaluate the significance of gapped alignment scores
(Altschul and Gish, 1996).
Useful when precomputed values are unavailable in the internal tables for the chosen scoring matrix and gap penalty combination.
See also: L.
gaps |
produce gapped alignments (the default behavior),
negating the effect of any previously specified nogaps option.
See also: nogaps and gapall.
gapS2= <s> |
set the score threshold for saving gapped HSPs to s.
In the secondary, gapped alignment phase of a search,
individual gapped HSPs will only be saved for further use
if their score is ≥ gapS2.
The default score threshold is computed from gapE2
and will depend on the scoring system.
If both gapE2 and gapS2 are specified on the command line,
the one corresponding to the more restrictive (higher) score threshold
will be used.
See also: gapE2, E2 and S2.
gapW= <gapW> |
set the window width (or band width) within which gapped alignments are computed by dynamic programming (default is gapW=32 for protein comparisons, gapW=16 for BLASTN). Note: gapW is the full bandwidth, not the half-width.
gapX= <x> |
set the drop-off score for gapped alignment extensions to x.
Gapped extension of ungapped HSPs found
between query and subject sequences
continues until the cumulative alignment score deteriorates
from the maximum value seen thus far by a quantity gapX or more.
The default value for gapX is the score associated with 15 bits
of significance ($2^{-15} \approx 3\times10^{-5}$ probability) for protein-level searches
or 30 bits of significance ($2^{-30} \approx 10^{-9}$ probability)
for nucleotide-level (BLASTN) searches.
Higher values for gapX will increase sensitivity at the expense
of run time.
See also: X and gapW.
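The drop-off rule can be illustrated in Python on a one-dimensional sequence of per-position scores, as in an ungapped extension; the real gapped extension applies the same stopping rule within a dynamic-programming band (see gapW), which this sketch does not model. The bits-to-score conversion uses the standard relation that a score difference of b bits corresponds to b·ln 2/λ raw score units; the λ value in the example is hypothetical, as are the function names.

```python
import math


def bits_to_raw_score(bits, lam):
    """Convert a drop-off expressed in bits to raw score units: bits * ln 2 / lambda."""
    return bits * math.log(2) / lam


def xdrop_extend(scores, x):
    """Extend along per-position scores, stopping once the running total
    falls x or more below the best total seen so far.

    Returns (best_score, index_of_best_end); index -1 means no positive score.
    """
    best, best_end, running = 0, -1, 0
    for i, s in enumerate(scores):
        running += s
        if running > best:
            best, best_end = running, i
        elif best - running >= x:
            break  # drop-off reached: terminate the extension
    return best, best_end


print(round(bits_to_raw_score(15, lam=0.267), 1))  # about 38.9 (hypothetical lambda)
path = [2, 3, -1, -1, 4, -5, -5, -5, 1]
print(xdrop_extend(path, x=2))  # (5, 1): stops before the dip is recovered
print(xdrop_extend(path, x=7))  # (7, 4): tolerates the dip, finds a better score
```

The two calls on the same score path show the trade-off described above: a larger drop-off lets the extension survive a temporary dip and reach a higher score, at the cost of examining more positions.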
getenv= "NAME" |
display the value of the environment variable named NAME. This may be useful for verifying that the settings of environment variables
on a web server or in an analysis pipeline have been propagated all the way to the BLAST search program.
See also: endgetenv, putenv and endputenv.
gi |
report NCBI “gi” (GenInfo) identifiers for sequences,
when present in sequence definition lines.
Normally these identifiers are suppressed from output,
but they represent one of the best, stable identifiers available
for the GenBank/EMBL/DDBJ databases
(with ACCESSION.VERSION being the other stable identifier).
globalexit |
when processing a file containing multiple query sequences and globalexit
has been specified,
if any search encounters a FATAL error,
then after all queries have been processed,
the line "EXIT CODE 12" is appended to the output and
a testable exit status of 12 will be provided to the command shell;
if the exit status is 0 for the complete run or if the last line
of output is not "EXIT CODE 12",
then it can be assumed all queries succeeded.
Without the globalexit option,
it may be necessary to scan the output in its entirety for instances
of EXIT CODE with a non-zero argument,
in order to know whether any queries failed.
With the globalexit option, scanning of the output
is only necessary when one wishes to identify the specific query (or queries)
that failed and what the individual reason codes were.
See also: haltonfatal.
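Both checks described above can be scripted. The Python sketch below assumes that each per-query result ends with a line of the form EXIT CODE n, as the description suggests; the function names and the sample report are hypothetical, and 16 is just an arbitrary non-zero code used for the example.

```python
import re

# One "EXIT CODE n" line, allowing trailing spaces or tabs.
EXIT_LINE = re.compile(r"^EXIT CODE[ \t]+(\d+)[ \t]*$", re.MULTILINE)


def nonzero_exit_codes(report):
    """All non-zero EXIT CODE values found anywhere in the output
    (the full scan needed when globalexit is not used)."""
    return [int(code) for code in EXIT_LINE.findall(report) if int(code) != 0]


def globalexit_all_succeeded(returncode, report):
    """With globalexit: success if the exit status is 0, or if the last
    non-blank line of output is not EXIT CODE 12."""
    lines = [line.strip() for line in report.splitlines() if line.strip()]
    last = lines[-1] if lines else ""
    return returncode == 0 or last != "EXIT CODE 12"


sample = "...report for query 1...\nEXIT CODE 0\n...report for query 2...\nEXIT CODE 16\n"
print(nonzero_exit_codes(sample))  # [16]
```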
golfraction= <g> |
maximum fractional length of overlap, g, of two gapped alignments for them to be considered independent and mutually “consistent” and their joint (Sum or Poisson) probability to be computed. The default value is 0.125 (maximum 12.5% of the length from either end of either HSP).
For any given pair of HSPs, the more restrictive of golfraction
and golmax is used.
To eliminate golfraction from consideration,
set its value to 1,
to indicate the acceptability of even a complete, 100% overlap.
See also: golmax, olfraction and olmax.
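A Python sketch of the overlap test for one pair of gapped alignments, given as 1-based inclusive coordinate ranges on the same sequence. The description above does not pin down which alignment's length the fraction refers to; this sketch assumes the shorter one, and treats golmax=None as the default unlimited length. The function names are hypothetical.

```python
def overlap_length(a, b):
    """Overlap, in residues, of two inclusive 1-based intervals (start, end)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def overlap_acceptable(a, b, golfraction=0.125, golmax=None):
    """True if two gapped alignments overlap little enough to be combined.

    Assumption: the fractional limit is relative to the shorter alignment.
    golmax=None stands for the documented default of unlimited length.
    """
    shorter = min(a[1] - a[0] + 1, b[1] - b[0] + 1)
    limit = golfraction * shorter
    if golmax is not None:
        limit = min(limit, golmax)  # the more restrictive limit applies
    return overlap_length(a, b) <= limit


print(overlap_acceptable((1, 100), (95, 200)))                # True: 6 <= 12.5
print(overlap_acceptable((1, 100), (95, 200), golmax=5))      # False: 6 > 5
print(overlap_acceptable((1, 100), (80, 200)))                # False: 21 > 12.5
print(overlap_acceptable((1, 100), (1, 100), golfraction=1))  # True: full overlap allowed
```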
golmax= <len> |
set the maximum permitted length of overlap (in residues), len, of two gapped alignments for their joint (Sum or Poisson) probability to be computed. The default is unlimited length, with the maximum extent of overlap being governed only by the golfraction parameter.
See also: golfraction, olfraction and olmax.
gspmax= <gspmax> |
establish gspmax as the maximum number of GSPs (gapped HSPs)
to report per subject sequence or pairwise sequence comparison.
If more than gspmax GSPs are found,
only the best-scoring GSPs are retained for subsequent processing and reporting.
The setting of gspmax will have no effect
if the nogaps option is specified or
if the setting of hspmax is more restrictive.
The default value for gspmax is 0, which implies no limit.
See also: hspmax and spoutmax.
NOTE: the B and V options limit the number
of subject sequences for which any results whatsoever are reported,
regardless of the number of HSPs or GSPs found.
H= <h> |
use h for the value of the relative entropy, H,
when computing the statistics of ungapped alignments.
NOTE: In BLAST 1.4 and earlier, the H option was used to invoke the display of a histogram of search results; this functionality is no longer supported.
See also: gapH.
haltonfatal |
when processing a file containing multiple query sequences, use this option to
halt further processing at the first occurrence of a FATAL error.
Processing will otherwise resume with the next query sequence
when a FATAL error arises.
See also: globalexit.
hitdist= <hitdist> |
A positive value for hitdist invokes a 2-hit BLAST algorithm
similar to
— but slightly more sensitive and using much less memory than —
that of
Altschul et al. (1997).
With the 2-hit algorithm, two word hits on the same diagonal and
within hitdist residues of each other
are required to trigger
an ungapped extension and potentially find an HSP.
The 2-hit algorithm is an option in all search modes,
including BLASTN.
The default value in all search modes is hitdist=0,
such that the classical 1-hit BLAST algorithm is utilized.
Only a single word hit is required to trigger extension.
Explicitly specifying hitdist=0 in analysis pipelines
will ensure the classical 1-hit BLAST algorithm is still used
after updating to any future AB-BLAST version
that may change the default behavior.
The 1-hit BLAST algorithm will always be more sensitive than the 2-hit algorithm, with all else equal. In protein-level searches, the 2-hit algorithm requires a smaller value for the word score threshold T
to achieve comparable sensitivity.
Smaller values for
T
generate more neighborhood words,
which require more memory and reduce search speed.
On balance,
the 1-hit algorithm achieves the best sensitivity
for the memory used;
but with the default BLOSUM62 scoring matrix and
at comparable sensitivity,
the 1-hit algorithm incurs roughly a 25% speed penalty.
(The speed penalty is not the 3X suggested
by Altschul et al. (1997)
through their omission of comparable data for the 1-hit algorithm.)
Relative speeds of the two algorithms at comparable sensitivity
have not been assessed for other scoring systems than BLOSUM62.
See also: wink.
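The difference between the 1-hit and 2-hit triggers can be sketched in Python over a list of word hits given as (query position, subject position) pairs. Assumptions made here: hits are processed in order of subject position, and "within hitdist residues" is measured between the starting positions of two hits on the same diagonal, with overlapping hits allowed. The actual AB-BLAST rules may differ, and the function name is hypothetical.

```python
def extension_triggers(word_hits, hitdist):
    """Return the word hits (qpos, spos) that would trigger an ungapped extension.

    hitdist <= 0: classical 1-hit mode, every word hit triggers.
    hitdist > 0: 2-hit mode, a hit triggers only if an earlier hit on the
    same diagonal (spos - qpos) lies within hitdist residues of it.
    """
    if hitdist <= 0:
        return sorted(word_hits)
    last_on_diagonal = {}  # diagonal -> subject position of most recent hit
    triggers = []
    for q, s in sorted(word_hits, key=lambda h: (h[1], h[0])):
        diagonal = s - q
        previous = last_on_diagonal.get(diagonal)
        if previous is not None and 0 < s - previous <= hitdist:
            triggers.append((q, s))
        last_on_diagonal[diagonal] = s
    return triggers


hits = [(10, 20), (30, 40), (15, 25), (100, 50)]
print(extension_triggers(hits, hitdist=0))   # all four hits trigger
print(extension_triggers(hits, hitdist=10))  # [(15, 25)]
```

In the 2-hit run, (30, 40) shares a diagonal with earlier hits but lies 15 residues from the nearest one, and (100, 50) is alone on its diagonal, so only one extension is attempted instead of four.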
hspmax= <hspmax> |
establishes hspmax as the maximum number of ungapped HSPs
that will be saved per subject sequence or pairwise sequence comparison.
Saved HSPs are then fed to the gapped alignment phase of the program
or are statistically evaluated
if gapped alignments are not to be performed.
If more than hspmax HSPs are found,
only the best-scoring HSPs are retained for subsequent processing.
The default value is 1000; a value of 0 signifies no limit.
See also: gspmax and spoutmax.
NOTE: This usage of hspmax is subtly,
but importantly,
different from the parameter's classical interpretation,
wherein all ungapped HSPs that satisfied the S2 score threshold
were saved; hspmax merely limited
the number of HSPs (gapped or ungapped) that would be reported.
The new interpretation was instituted to provide
vastly improved speed on large problems,
while imparting no effect on small problems
and many medium-sized problems.
The new behavior can help guard against horrendously slow searches
resulting from an inadvertent omission of a low-complexity filter.
Adverse effects on sensitivity may result, however,
if every HSP is sacred.
To restore classical behavior, specify hspmax=0 .
As a compromise between sensitivity and speed, set a higher
value than the default.
NOTE: the B and V options limit the number
of database or subject sequences for which any results are reported,
regardless of the number of HSPs or GSPs found.
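The retention rule amounts to keeping the top-scoring HSPs through a bounded selection. The `(score, label)` tuples and the function name `retain_hsps` are hypothetical, and this page does not describe AB-BLAST's internal data structures or its tie-breaking.

```python
import heapq

def retain_hsps(hsps, hspmax=1000):
    """Keep at most hspmax of the best-scoring HSPs for one subject.

    hsps: list of (score, label) tuples; hspmax=0 signifies no limit.
    Results are returned in descending score order.
    """
    if hspmax == 0:
        return sorted(hsps, key=lambda h: h[0], reverse=True)
    return heapq.nlargest(hspmax, hsps, key=lambda h: h[0])


hsps = [(30, "a"), (85, "b"), (42, "c"), (17, "d")]
print(retain_hsps(hsps, hspmax=2))  # [(85, 'b'), (42, 'c')]
```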
|
||||||||||||||||||
hspsepQmax= <d> |
maximum allowed separation along the query sequence between two HSPs (gapped or ungapped) that will be clustered into a “consistent” set. Distance is measured here in units of residues at the level of the actual sequence comparison, i.e., in nucleotides for BLASTN and in amino acids (or codons) for all other search modes. This option is useful for improving the statistical power to discriminate clusters that have potential biological interest from random background clusters, when the query sequence is significantly longer than the features of interest. Without this restriction, HSPs arising from very distant portions of the query sequence may be linked. Depending on the specific search performed, distant links may be desirable, but a reasonable setting for this parameter is often the expected maximum length of an intron. A distance restriction not only avoids clustering HSPs that are widely separated but also improves the statistics of those HSPs that can still be clustered. | ||||||||||||||||||
hspsepSmax= <d> |
maximum allowed separation along the subject (database) sequence between two HSPs (gapped or ungapped) that will be clustered into a consistent set. Distance is measured here in units of residues at the level of the actual sequence comparison, i.e., in nucleotides for BLASTN and in amino acids (or codons) for all other search modes. This option is useful for improving the statistical power to discriminate clusters that have potential biological interest from random background clusters, when the database contains sequences significantly longer than the features of interest. Without this restriction, HSPs arising from very distant portions of a subject sequence may be linked. Depending on the specific search performed, distant links may be desirable, but a reasonable setting for this parameter is often the expected maximum length of an intron. A distance restriction not only avoids clustering HSPs that are widely separated but also improves the statistics of those HSPs that can still be clustered. | ||||||||||||||||||
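A minimal sketch of the distance test applied by hspsepQmax and hspsepSmax. It assumes that separation means the number of residues lying strictly between the two HSPs (zero when they abut or overlap). It checks distance only and leaves out the colinearity and other consistency requirements. The `HSP` record and the function names are hypothetical.

```python
from typing import NamedTuple, Optional

class HSP(NamedTuple):
    q_start: int
    q_end: int
    s_start: int
    s_end: int

def separation(a_start, a_end, b_start, b_end):
    """Residues lying strictly between two closed intervals (0 if they
    abut or overlap)."""
    return max(0, max(a_start, b_start) - min(a_end, b_end) - 1)

def may_cluster(a: HSP, b: HSP, sep_q_max: Optional[int] = None,
                sep_s_max: Optional[int] = None) -> bool:
    """Distance test only; None models an option that was not set."""
    if sep_q_max is not None and separation(a.q_start, a.q_end, b.q_start, b.q_end) > sep_q_max:
        return False
    if sep_s_max is not None and separation(a.s_start, a.s_end, b.s_start, b.s_end) > sep_s_max:
        return False
    return True


# Two HSPs adjacent on the query but 1940 residues apart on the subject,
# as when a query spans an intron in a genomic subject sequence.
exon1 = HSP(1, 60, 1000, 1059)
exon2 = HSP(61, 120, 3000, 3059)
print(may_cluster(exon1, exon2, sep_s_max=2000))  # True
print(may_cluster(exon1, exon2, sep_s_max=1000))  # False
```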
K= <k> |
set the value of the K parameter of the extreme value statistics
(Karlin and Altschul, 1990)
used in computing the statistics of ungapped alignments.
See also: gapK ,
L
and H .
|
||||||||||||||||||
kap |
use basic
Karlin and Altschul (1990)
statistics on individual alignment scores (i.e., do not evaluate the joint probability of multiple consistent HSP scores, such as with Poisson or the default
Karlin and Altschul (1993)
“Sum” statistics);
in order to be reported,
each HSP must pass the significance test on its own;
these basic statistics are an option in all search modes.
See also: poissonp and
sump .
|
||||||||||||||||||
L= <lambda> |
use lambda for the value of the
λ parameter in the extreme value statistics
(Karlin and Altschul, 1990)
used to compute the significance of ungapped alignments.
See also: gapL ,
K
and H .
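The roles of K and λ can be illustrated with the standard Karlin-Altschul relationships for a single ungapped HSP of score S: E = K·m·n·e^(−λS), and the bit score S' = (λS − ln K)/ln 2. The sketch omits effective-length (edge) corrections and the Sum and Poisson statistics. The K and λ values are illustrative placeholders, not AB-BLAST defaults.

```python
import math

def evalue(score, K, lam, m, n):
    """Karlin-Altschul expectation: K * m * n * exp(-lambda * score)."""
    return K * m * n * math.exp(-lam * score)

def bit_score(score, K, lam):
    """Normalized score in bits: (lambda * score - ln K) / ln 2."""
    return (lam * score - math.log(K)) / math.log(2)

def pvalue(E):
    """Probability of at least one chance HSP, given expectation E."""
    return 1.0 - math.exp(-E)


K, lam = 0.13, 0.318      # illustrative ungapped values, not defaults
m, n = 250, 50_000_000    # query length and database size in residues
E = evalue(60, K, lam, m, n)
print(round(bit_score(60, K, lam), 1), E, pvalue(E))
```

The two formulas are consistent with each other: E = m·n·2^(−S'). The bit score therefore lets E-values be compared across scoring systems with different K and λ.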
|
||||||||||||||||||
lcfilter |
replace any lower case letters in the input query sequence
with the appropriate ambiguity code for “any” residue
(N for nucleotide sequences; X for protein sequences).
See also: lcmask ,
filter ,
wordmask and
echofilter .
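The substitution performed by lcfilter amounts to the following. The function name and the `is_nucleotide` flag are hypothetical, since the software itself determines the sequence type.

```python
def lcfilter(seq: str, is_nucleotide: bool) -> str:
    """Replace lower-case letters with the 'any residue' code."""
    any_code = "N" if is_nucleotide else "X"
    return "".join(any_code if c.islower() else c for c in seq)


print(lcfilter("ACGTacgtAC", is_nucleotide=True))   # ACGTNNNNAC
print(lcfilter("MKVllaG", is_nucleotide=False))     # MKVXXXG
```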
|
||||||||||||||||||
lcmask |
when generating the neighborhood word list for the query sequence,
do not process any portions of the query that were represented
in lower case letters in the input file.
Lower case letters in the query sequence remain unchanged
by this “soft masking” procedure and can therefore participate in alignments
seeded by word hits that occur in flanking regions.
See also: lcfilter ,
wordmask ,
filter ,
maskextra and
echofilter .
|
||||||||||||||||||
links |
report consistent link information for each alignment, indicating the set of “consistent” alignments used in joint statistical
significance calculations.
Links information appears on its own line for each HSP
and begins with the keyword Links .
Each HSP involving the query and a given subject sequence
is numbered from 1 to n,
where n is the total number of HSPs reported
for the pair of sequences.
When the links option is specified,
the current HSP number is enclosed in parentheses.
For example, the links information for an HSP might look like the following, where the HSP number 1 enclosed in parentheses indicates that this information accompanied the first HSP reported for the given subject sequence:

    Score = 72 (30.4 bits), Expect = 0.16, Sum P(3) = 0.15
    Identities = 41/174 (23%), Positives = 74/174 (42%)
    Links = 8-2-(1)

It is evident in this example that a total of at least 8 HSPs were reported for the subject sequence (re: the 8 in the links list), but only 3 consistent HSPs (numbers 8, 2 and 1, in that order) were involved in obtaining the Sum statistics P-value of 0.15.
NOTE: While all link lists describe sets of consistent HSPs, unless one of the topcomboN
or topcomboE options is used,
only the list reported for HSPs in the most significant set
for each subject sequence is guaranteed to represent the
precise set of HSPs for which the joint statistics were computed;
all other link lists often do correctly describe the set of HSPs
involved but may have one or more missing or extraneous HSPs.
See also: hspsepQmax ,
hspsepSmax ,
topcomboE and
topcomboN .
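Scripts that post-process normal output can split the Links line into its HSP numbers. This parser is a sketch that assumes the exact layout of the example above: numbers separated by hyphens, with the current HSP in parentheses. It has not been checked against every output format, and the function name is hypothetical.

```python
import re

LINKS_RE = re.compile(r"Links\s*=\s*([-()\d]+)")

def parse_links(line):
    """Return (hsp_numbers, current_hsp), or None if no Links field."""
    match = LINKS_RE.search(line)
    if match is None:
        return None
    numbers, current = [], None
    for token in match.group(1).split("-"):
        if token.startswith("("):
            current = int(token.strip("()"))
            numbers.append(current)
        else:
            numbers.append(int(token))
    return numbers, current


print(parse_links("Links = 8-2-(1)"))  # ([8, 2, 1], 1)
```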
|
||||||||||||||||||
M= <m> |
set the positive reward score for matching nucleotides in the BLASTN
search mode to m, with default value +5.
For compatibility with earlier versions of BLAST, in search modes other than BLASTN the M parameter is synonymous with the
matrix parameter, but the use of M for this purpose is deprecated.
To provide a fully specified scoring matrix to BLASTN,
the matrix parameter itself must be used.
See also: N ,
matrix and
altscore .
|
||||||||||||||||||
maskextra= <extra> |
soft mask an additional extra letters
on each side of regions that are soft masked by the
lcmask and wordmask options.
This reduces the incidence of high scoring alignments
in low-complexity regions that would be
initiated by spurious word hits
in otherwise unmasked flanking regions.
See also: wordmask ,
lcmask and
lcfilter .
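The combined effect of lcmask and maskextra on seeding can be sketched as follows. The sketch assumes that a seed word is skipped when any of its letters falls in a masked position. Positions are 0-based and the function names are hypothetical. The query string itself is never altered, so masked letters can still appear inside alignments seeded elsewhere.

```python
def masked_positions(seq, extra=0):
    """0-based positions soft-masked by lcmask, widened by maskextra."""
    masked = set()
    for i, c in enumerate(seq):
        if c.islower():
            lo = max(0, i - extra)
            hi = min(len(seq), i + extra + 1)
            masked.update(range(lo, hi))
    return masked

def seed_word_starts(seq, W, extra=0):
    """Start positions of length-W words that may still seed alignments."""
    masked = masked_positions(seq, extra)
    return [i for i in range(len(seq) - W + 1)
            if not masked.intersection(range(i, i + W))]


print(seed_word_starts("ACGTaaACGT", 3))           # [0, 1, 6, 7]
print(seed_word_starts("ACGTaaACGT", 3, extra=1))  # [0, 7]
```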
|
||||||||||||||||||
matrix= <name> |
use the 2-dimensional matrix named name to score residue pairs
in gapped and ungapped alignments.
The default matrix for protein-level searches is BLOSUM62
(Henikoff and Henikoff, 1992).
For BLASTN searches,
the default scoring matrix is computed
dynamically from a +5/-4 match/mismatch scoring system
which can be altered using the M and N parameters.
BLASTN can also use fully specified scoring matrices
of the user's own design,
by providing the name of the matrix with the matrix option.
After unpacking the software, see the matrix/nt subdirectory
for some examples of nucleotide scoring matrices.
NOTE: matrices need not be symmetric about their major diagonal. The row-column format of a matrix corresponds to query-subject letter pairs. See also: altscore ,
M and N .
|
||||||||||||||||||
mformat= <m>[,outfile] |
used to select an output format by numerical identifier, m, and optionally
the name of the file where the output should be written, outfile.
Multiple formats (multiple mformat options) may be requested for simultaneous output during a single search,
as long as a different outfile is indicated for each format.
If no outfile is specified, either standard output
(stdout)
or the setting of the O option (if set) is used.
At most one mformat specification on a given command line
may lack an outfile.
If outfile contains any white space (e.g., blanks or tabs),
the entire token should be enclosed in quotes, to prevent command line interpreters
from breaking it into separate arguments.
The available output formats are listed if
Depending on the output format, some command line options cause additional elements to appear in the output.
These options include:
This example produces 3 different output streams (myq.out, myq.tab and myq.xml) from a single search:

    blastp swissprot myq.aa mformat=1 mformat=3,myq.tab mformat=7,myq.xml > myq.out

The mformat=1 specification causes the normal human-readable output to be written to stdout, which is redirected (“>”) into the file named myq.out.
The available choices for m and their associated formats are:
*Formats that are subject to change or removal without notice. See also: msgstyle,
O and
xmlcompact .
|
||||||||||||||||||
mmio |
turns off the use of memory-mapped I/O when reading database files.
Use of this option will usually slow the search, particularly when multiple processors are being used, but it serves both to demonstrate the effectiveness of this form of I/O and to validate the associated I/O routines. Note that no special daemon or support programs (such as the old memfile program) are required to take full advantage of memory-mapped I/O.
When running 32-bit versions of the BLAST software,
the mmio option might free up important virtual address
space for use as working storage or heap memory.
For the vast majority of users, this option should never be used. |
||||||||||||||||||
msgstyle= <n> |
used to select by numerical identifier, n, the style of informational messages to produce (e.g., NOTEs and WARNINGs).
The available choices for n and their associated styles are:
0 => line-wrapped (default)
1 => single-line with the query sequence identifier embedded (if available) |
||||||||||||||||||
N= <n> |
set the negative penalty score for mismatching nucleotides
in the BLASTN search mode to n, with default value −4.
See also: M ,
matrix , and
altscore .
|
||||||||||||||||||
nogaps |
do not create gapped alignments, in essence reverting to WU-BLAST 1.4 behavior.
See also: gaps and gapall .
|
||||||||||||||||||
nonnegok |
Do not abort processing with a FATAL error when the expected score
is non-negative.
Formally, for Karlin-Dembo-Altschul statistics to apply to the
evaluation of the alignment scores found during a search,
the expected score for a sequence having the same residue composition
as the query must be negative, but this condition does not always
hold with unusual scoring matrices or query sequences.
Use the nonnegok option to cause the search to proceed
even under these unusual conditions.
See also: novalidctxok and shortqueryok .
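The condition being tested can be illustrated with a BLASTN-style match/mismatch system. For simplicity, the sketch assumes that both sequences share the query's nucleotide composition, and it ignores ambiguity codes. It uses the +5/−4 match/mismatch values named under the matrix option. A low-complexity query such as a poly-A run produces a positive expected score, which is the kind of case nonnegok lets proceed. The function name is hypothetical.

```python
from collections import Counter

def expected_score(query: str, M: int = 5, N: int = -4) -> float:
    """Expected per-position score when both sequences have the query's
    composition: P(match) * M + P(mismatch) * N."""
    counts = Counter(c for c in query.upper() if c in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("query contains no A, C, G or T")
    p_match = sum((counts[b] / total) ** 2 for b in "ACGT")
    return p_match * M + (1.0 - p_match) * N


print(expected_score("ACGTACGTAC"))  # negative: the usual statistics apply
print(expected_score("AAAAAAAAAA"))  # 5.0: the search would need nonnegok
```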
|
||||||||||||||||||
nosegs |
do not segment the query sequence on hyphens (-).
By default, hyphens in the query sequence create insurmountable
barriers for sequence alignment.
As an example of where this feature is useful,
multiple contigs may be concatenated together into one sequence
with a hyphen separating each contig;
no alignment will then extend beyond a contig boundary.
CAUTION: do not confuse this option with the similarly appearing noseqs option.
|
||||||||||||||||||
noseqs |
produce abbreviated output by omitting the sequence alignments.
The result is often correctly interpretable by parsers of normal output.
CAUTION: do not confuse this option with the similarly appearing nosegs option.
|
||||||||||||||||||
notes |
suppress all NOTE messages. Important recommendations from the software may be missed if this option is used.
If any NOTEs arise with this option, the number SUPPRESSED will be reported at the end of the search.
See also: errors
and
warnings .
|
||||||||||||||||||
novalidctxok |
do not treat it as a FATAL error when none of the “contexts”
(e.g., strands or reading frames) of the query are valid.
A valid context is one in which the threshold score for saving
alignments can be achieved under ideal circumstances (typically
if an alignment of 100% identity were to be found).
See also: nonnegok and shortqueryok .
|
||||||||||||||||||
nwlen= <len> |
generate neighborhood words (or seed words) starting from
the beginning of the query sequence (or from the location specified
with the nwstart parameter) and continuing
for the distance len or to the end of the sequence,
whichever comes first.
While this parameter can be used to restrict the region in which
word hits occur for seeding ungapped alignments (and indirectly gapped alignments),
it does not restrict alignments from extending beyond this region.
See also: nwstart .
|
||||||||||||||||||
nwstart= <start> |
generate neighborhood words (or seed words) starting from
coordinate position start in the query sequence and continuing
to the end of the sequence (or for the distance specified with the nwlen parameter).
While this parameter can be used to restrict the region in which
word hits occur for seeding ungapped alignments (and indirectly gapped alignments),
it does not restrict alignments from extending beyond this region.
See also: nwlen .
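The region covered by nwstart and nwlen can be pictured as the list of query positions at which seed words are generated. Two details here are assumptions made for illustration: that a word must lie wholly inside the region, and how the region combines with the wink option. The function name is hypothetical. As noted above, alignments seeded in this region may still extend outside it.

```python
def seed_word_positions(query_len, W, nwstart=1, nwlen=None, wink=1):
    """1-based start positions of query words used to generate seeds.

    Assumptions: only words lying wholly inside the region
    [nwstart, nwstart + nwlen - 1] (clipped to the query) are used, and
    wink keeps every wink-th position counted from nwstart.
    """
    end = query_len if nwlen is None else min(query_len, nwstart + nwlen - 1)
    return list(range(nwstart, end - W + 2, wink))


print(seed_word_positions(20, W=3))                        # 1 .. 18
print(seed_word_positions(20, W=3, nwstart=5, nwlen=10))   # 5 .. 12
```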
|
||||||||||||||||||
O= <outfile> |
output results to the file named outfile instead of standard output (stdout ).
|
||||||||||||||||||
olfraction= <f> |
set the maximum fractional length of overlap, f, of two ungapped alignments
for them to be considered independent and mutually “consistent” and their joint (Sum or Poisson) probability to be computed.
The default f is 0.1 (maximum 10% of the length from either end of either HSP).
For any given pair of HSPs, the more restrictive of olfraction
and olmax is used.
To eliminate olfraction from consideration,
set its value to 1,
to indicate the acceptability of even a complete, 100% overlap.
See also: golfraction ,
golmax ,
and
olmax .
|
||||||||||||||||||
olmax= <len> |
set the maximum permitted length of overlap (in residues), len, of two ungapped alignments
for their joint (Sum or Poisson) probability to be computed.
The default is unlimited length, with the maximum extent of overlap being governed only
by the olfraction parameter.
See also: golfraction ,
golmax ,
and
olfraction .
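The overlap test shared by olfraction and olmax can be sketched under two assumptions the text does not state. First, the fractional limit is taken relative to the shorter of the two HSPs. Second, coordinates are inclusive along a single sequence. Whichever limit is more restrictive applies. The function names are hypothetical.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Number of residues shared by two closed intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)

def overlap_ok(a, b, olfraction=0.1, olmax=None):
    """a, b: (start, end) coordinates of two HSPs along one sequence.

    olmax=None models the default of unlimited overlap length.
    """
    shorter = min(a[1] - a[0] + 1, b[1] - b[0] + 1)
    limit = olfraction * shorter
    if olmax is not None:
        limit = min(limit, olmax)
    return overlap(a[0], a[1], b[0], b[1]) <= limit


print(overlap_ok((1, 100), (95, 200)))           # True: 6 <= 10
print(overlap_ok((1, 100), (95, 200), olmax=5))  # False: 6 > 5
```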
|
||||||||||||||||||
pingpong |
Perform additional work to help ensure the gapped alignments produced are locally optimal. This option typically adds 3-10% to the execution time without affecting the results, as only rarely with “normal” scoring parameters will the score of an alignment be improved. | ||||||||||||||||||
poissonp |
use Poisson statistics
(Karlin and Altschul, 1990)
to compute joint P-values of consistent sets of alignments;
Poisson statistics are an option in all search modes.
See also: kap and
sump .
|
||||||||||||||||||
postsw |
perform full Smith-Waterman alignment of sequences and re-rank the database matches accordingly prior to output (currently supported in BLASTP only). | ||||||||||||||||||
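This page does not document how postsw works internally. To show what a full Smith-Waterman comparison computes, here is a textbook sketch using Gotoh's affine-gap recurrences that returns the best local alignment score. It uses a simple match/mismatch scheme rather than a scoring matrix, and charges a gap of length k as Q + (k − 1)·R, following the convention of the Q and R options. The function name and parameter values are illustrative. The time is proportional to the product of the sequence lengths, which is why a full Smith-Waterman pass costs far more than the BLAST heuristic.

```python
def smith_waterman(a, b, Q, R, match=5, mismatch=-4):
    """Best local alignment score with affine gaps (Gotoh).

    A gap of length k costs Q + (k - 1) * R.
    """
    NEG = float("-inf")
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]    # best score ending at (i, j)
    E = [[NEG] * (m + 1) for _ in range(n + 1)]  # ending in a gap within a
    F = [[NEG] * (m + 1) for _ in range(n + 1)]  # ending in a gap within b
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(H[i][j - 1] - Q, E[i][j - 1] - R)
            F[i][j] = max(H[i - 1][j] - Q, F[i - 1][j] - R)
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


print(smith_waterman("ACGTTACGT", "ACGTACGT", Q=9, R=2))   # 31: 8 matches, 1 gap
print(smith_waterman("ACGTTACGT", "ACGTACGT", Q=20, R=2))  # 25: ungapped is better
```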
progress= <s> |
provide an indication that the search is alive by outputting an asterisk (“*”) every s seconds during a search,
if some other indication of activity has not been provided in the meantime.
Such “keepalive” indicators may be useful when the software
is invoked over a network connection.
The default behavior
(obtained with progress=0 )
is only to report the actual progress made through the database,
using periods (“.”) and reports of percentages.
|
||||||||||||||||||
prune |
do not prune HSP lists, but instead report all HSPs, even
those that were not involved
in satisfying the statistical significance threshold necessary
for reporting the database sequence.
NOTE: When the default Sum statistics are used,
the normal pruning activity is robust;
when Poisson statistics are used,
some HSPs may get through the pruning process and be reported
that were not involved
in satisfying the statistical significance threshold.
See also: span ,
span1 and
span2 .
|
||||||||||||||||||
putenv= "NAME=VALUE" |
in the local environment to the BLAST search program, set the environment variable named NAME to the value VALUE.
See also: endgetenv ,
endputenv and getenv .
|
||||||||||||||||||
pvalues |
report P-values (the default) in the initial one-line descriptions section of output.
See also: evalues .
|
||||||||||||||||||
Q= <q> |
set the penalty for a gap of length one to q
(default Q=9 for proteins; Q=7 for BLASTN).
See also: R . |
||||||||||||||||||
qframe= <f> |
search with the query sequence translated in the single reading frame f.
This parameter is useful for speeding up a search and improving both the
biological and statistical significance of the findings,
when the reading frame of a translation product in the query
is known in advance,
such as when the query sequence entails a complete ORF.
Reading frames on the top (plus) strand of the query are numbered 1, 2, 3;
reading frames on the bottom (minus) strand are numbered -1, -2, -3.
See also: top ,
bottom ,
dbtop and
dbbottom .
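The six reading frames can be sketched as follows. The numbering used here is a common convention assumed for illustration: +1/+2/+3 start at offsets 0/1/2 of the top strand, and −1/−2/−3 start at offsets 0/1/2 of the reverse complement. Consult the program's output for the exact coordinates it reports. The sketch stops at codons rather than translating them, and the function name is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def frame_codons(seq: str, frame: int) -> list:
    """Codons read in one of the six frames (assumed numbering:
    +1/+2/+3 start at offsets 0/1/2 of the top strand, -1/-2/-3 at
    offsets 0/1/2 of the reverse complement)."""
    if frame not in (1, 2, 3, -1, -2, -3):
        raise ValueError("frame must be 1, 2, 3, -1, -2 or -3")
    strand = seq if frame > 0 else seq.translate(COMPLEMENT)[::-1]
    usable = strand[abs(frame) - 1:]
    return [usable[i:i + 3] for i in range(0, len(usable) - 2, 3)]


print(frame_codons("ATGAAACCC", 1))   # ['ATG', 'AAA', 'CCC']
print(frame_codons("ATGAAACCC", -1))  # ['GGG', 'TTT', 'CAT']
```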
|
||||||||||||||||||
Qoffset= <i> |
adjust all query sequence coordinates in the output by the fixed quantity i (default 0). | ||||||||||||||||||
qrecmax= <n> |
in a multi-sequence query file, end database searches with the query sequence numbered n. | ||||||||||||||||||
qrecmin= <m> |
in a multi-sequence query file, start database searches using the query sequence numbered m. Records are numbered starting with 1. | ||||||||||||||||||
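Taken together, qrecmin and qrecmax select an inclusive, 1-based range of query records. The sketch below represents records as a plain list and uses a hypothetical function name.

```python
def select_queries(records, qrecmin=1, qrecmax=None):
    """Return records qrecmin..qrecmax (1-based, inclusive)."""
    if qrecmin < 1:
        raise ValueError("records are numbered starting with 1")
    stop = len(records) if qrecmax is None else qrecmax
    return records[qrecmin - 1:stop]


records = ["q1", "q2", "q3", "q4"]
print(select_queries(records, qrecmin=2, qrecmax=3))  # ['q2', 'q3']
```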
qres |
treat as a FATAL error when the query sequence contains any invalid residue codes.
By default, WARNING s are issued for invalid residue codes,
which are then skipped.
|
||||||||||||||||||
qtype |
treat as a FATAL error if the query sequence appears from its letter composition to be of the wrong type (peptide or nucleotide).
|
||||||||||||||||||
R= <r> |
set the per-residue penalty for extending a gap to r
(default R=2 for proteins; R=2 for BLASTN).
See also: Q .
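Under the Q and R definitions above, a gap of k residues costs Q + (k − 1)·R. The helpers below (hypothetical names) spell out that arithmetic. They also give the equivalent "open + k × extend" form, with open = Q − R and extend = R, which helps when comparing with software that states gap penalties that way.

```python
def gap_cost(k: int, Q: int, R: int) -> int:
    """Penalty for a gap of k residues: Q for the first, R for each extra."""
    if k < 1:
        raise ValueError("gap length must be at least 1")
    return Q + (k - 1) * R


def as_open_extend(Q: int, R: int) -> tuple:
    """The same costs written as open + k * extend."""
    return Q - R, R


print([gap_cost(k, Q=9, R=2) for k in range(1, 5)])  # [9, 11, 13, 15]
```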
|
||||||||||||||||||
restest |
causes a Bonferroni-like correction used in computing
statistical significance to depend upon the relative lengths
in residues of a given database sequence
and the total length (in residues) of the database.
restest
is the default correction method used
in the BLASTN, TBLASTN, and TBLASTX search modes.
In all search modes, if the Z parameter is set, the size correction
method defaults to restest .
This behavior can be overridden by the seqtest option.
See also: seqtest ,
Y
and
Z .
|
||||||||||||||||||
S= <s> |
set to s the score whose equivalent E-value serves as the threshold for reporting database hits.
Hits for a given database sequence will only be reported
if the statistical significance ascribed to a group of alignments
— or a single alignment, if the kap option is used —
is at least as high as that of a single alignment with score S .
Unlike the score thresholds S2 and gapS2 ,
which establish fundamental lower limits
on individual ungapped and gapped alignment scores,
comparisons to S are performed indirectly using E-values
in the final stage of screening database hits by their statistical significance.
When the user sets a value for S ,
it is converted to a corresponding E-value
using Karlin-Dembo-Altschul statistics,
which depends on the scoring system,
length of the query sequence and size of the database.
During the search, the E-values computed for alignments (singly or in groups)
are screened against the E-value computed from S .
If both E and S are specified
on the command line, the one corresponding to the more restrictive (lower)
E-value is used in the comparisons.
If neither E nor S is specified on the command line,
the default value for E (10) is used,
as a default value for S is not defined.
See also: E ,
gapS2 ,
kap
and
S2 .
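The interplay of E and S can be sketched with the single-alignment Karlin-Altschul formula E = K·m·n·e^(−λS). The simplified lengths m and n (with no effective-length correction), the example values of K and λ, and the function names are all assumptions for illustration.

```python
import math

def evalue_for_score(S, K, lam, m, n):
    """E-value of a single alignment with score S (no edge correction)."""
    return K * m * n * math.exp(-lam * S)

def reporting_threshold(K, lam, m, n, E=None, S=None):
    """E-value cutoff applied in the final screening stage.

    S is converted to an E-value; when both E and S are given the lower
    (more restrictive) E-value wins; with neither, E=10 applies.
    """
    cutoffs = []
    if E is not None:
        cutoffs.append(E)
    if S is not None:
        cutoffs.append(evalue_for_score(S, K, lam, m, n))
    return min(cutoffs) if cutoffs else 10.0


K, lam = 0.13, 0.318      # illustrative values only
m, n = 300, 100_000_000   # query length, database size (residues)
print(reporting_threshold(K, lam, m, n, E=10.0, S=60))  # S=60 is laxer: 10.0
print(reporting_threshold(K, lam, m, n, E=10.0, S=70))  # S=70 is stricter
```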
|
||||||||||||||||||
S2= <s> |
set the score threshold for saving ungapped HSPs to s.
In the initial, ungapped alignment phase of a search,
individual HSPs will only be saved for further use
if their score is ≥ S2 .
The default score threshold is computed from the default value for E2
and will depend on the scoring system.
If both E2 and S2 are specified on the command line,
the one corresponding to the more restrictive (higher) score threshold
will be used.
See also: gapS2,
E2 and
gapE2 .
|
||||||||||||||||||
seqtest |
causes a Bonferroni-like correction used in computing
statistical significance to depend upon
the number of sequences in the database.
seqtest is the default correction method
in the BLASTP and BLASTX search modes.
This behavior can be overridden by the restest option.
NOTE: In all search modes, including BLASTP and BLASTX, for backward compatibility with legacy BLAST software: if the Z parameter is specified, the value for
Z is expected to be expressed in units of residues,
unless seqtest is also specified on the command line.
See also: restest ,
Y
and
Z .
|
||||||||||||||||||
shortqueryok |
do not treat it as a FATAL error when the query sequence is
shorter than the BLAST algorithm word length.
See also: novalidctxok and
nonnegok . |
||||||||||||||||||
Soffset= <i> |
adjust all subject sequence coordinates in the output by the fixed quantity i (default 0). | ||||||||||||||||||
sort_by_count |
sort database sequences from highest to lowest by the number of HSPs identified.
Multiple sort_by* options may be specified and take precedence in the order specified.
|
||||||||||||||||||
sort_by_highscore |
sort database sequences from highest to lowest by the highest HSP score found.
Multiple sort_by* options may be specified and take precedence in the order specified.
|
||||||||||||||||||
sort_by_pvalue |
sort database sequences from lowest to highest by their best P-value.
Multiple sort_by* options may be specified
and take precedence in the order specified.
sort_by_pvalue is the default primary sort key.
|
||||||||||||||||||
sort_by_subjectlength |
sort database sequences from longest to shortest.
Multiple sort_by* options may be specified and take precedence in the order specified.
|
||||||||||||||||||
sort_by_totalscore |
sort database sequences from highest to lowest by the sum total score of all HSPs found.
Multiple sort_by* options may be specified and take precedence in the order specified.
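Each sort_by option contributes one key, and earlier options take precedence. The combined ordering therefore behaves like a lexicographic sort on a tuple of keys. The `Hit` record and its field names are hypothetical. This page does not say how AB-BLAST breaks ties that remain after every key is applied; Python's stable sort simply preserves input order.

```python
from typing import NamedTuple

class Hit(NamedTuple):
    name: str
    pvalue: float
    highscore: int
    count: int
    totalscore: int
    length: int

# Direction of each criterion, as documented for the sort_by options.
SORT_KEYS = {
    "sort_by_pvalue": lambda h: h.pvalue,           # lowest first
    "sort_by_highscore": lambda h: -h.highscore,    # highest first
    "sort_by_count": lambda h: -h.count,            # highest first
    "sort_by_totalscore": lambda h: -h.totalscore,  # highest first
    "sort_by_subjectlength": lambda h: -h.length,   # longest first
}

def sort_hits(hits, options=("sort_by_pvalue",)):
    """Earlier options take precedence; later ones break ties."""
    return sorted(hits, key=lambda h: tuple(SORT_KEYS[o](h) for o in options))


hits = [Hit("a", 1e-5, 50, 1, 50, 300),
        Hit("b", 1e-5, 80, 2, 120, 500),
        Hit("c", 1e-9, 40, 1, 40, 200)]
print([h.name for h in sort_hits(hits, ("sort_by_pvalue", "sort_by_highscore"))])
```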
|
||||||||||||||||||
span |
retain HSPs (ungapped or gapped) regardless of whether they
span or are spanned by any other HSP.
When this option is specified, memory requirements may increase
dramatically to accommodate an increased number of HSPs that must
be tracked, particularly when the sequences being compared
contain short periodicity repeats and low complexity regions.
See also: span1 and span2 .
|
||||||||||||||||||
span1 |
discard an HSP (ungapped or gapped) when it spans or is spanned by
another HSP along either the query or the subject sequence (or both).
When a pair of such HSPs is found, the one with the lower score
is discarded;
if their scores are equal, the longer, less information-dense HSP is discarded.
See also: span and span2 .
|
||||||||||||||||||
span2 |
discard an HSP (ungapped or gapped) when it spans or is spanned by
another HSP along both the query and subject sequences.
When a pair of such HSPs is found, the one with the lower score
is discarded; if their scores are equal, the longer, less information-dense HSP is discarded.
span2 is the default behavior.
See also: span and span1 .
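The pairwise decision made by span, span1 and span2 can be sketched as follows. The sketch assumes that "spans" means one closed coordinate interval contains the other, and that length in the equal-score tie-break is measured along the query. The `HSP` record and the function names are hypothetical.

```python
from typing import NamedTuple, Optional

class HSP(NamedTuple):
    score: int
    q_start: int
    q_end: int
    s_start: int
    s_end: int

def contains(a_start, a_end, b_start, b_end):
    """True if either closed interval contains the other."""
    return ((a_start <= b_start and b_end <= a_end) or
            (b_start <= a_start and a_end <= b_end))

def hsp_to_discard(a: HSP, b: HSP, mode: str = "span2") -> Optional[HSP]:
    """Return the HSP that would be discarded, or None to keep both.

    "span" keeps both; "span1" tests the query OR the subject;
    "span2" (the documented default) tests the query AND the subject.
    """
    if mode == "span":
        return None
    on_query = contains(a.q_start, a.q_end, b.q_start, b.q_end)
    on_subject = contains(a.s_start, a.s_end, b.s_start, b.s_end)
    spanned = (on_query or on_subject) if mode == "span1" else (on_query and on_subject)
    if not spanned:
        return None
    if a.score != b.score:
        return a if a.score < b.score else b
    # Equal scores: drop the longer (less information-dense) HSP.
    return a if (a.q_end - a.q_start) > (b.q_end - b.q_start) else b


big = HSP(40, 1, 100, 1, 100)
inner = HSP(30, 10, 50, 500, 540)   # inside `big` on the query only
print(hsp_to_discard(big, inner, "span2"))  # None
print(hsp_to_discard(big, inner, "span1"))  # the lower-scoring `inner`
```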
|
||||||||||||||||||
spoutmax= <spoutmax> |
establishes spoutmax
as the maximum number of segment pairs to report
in program output per subject sequence or pairwise comparison,
independent of the number of HSPs or GSPs actually found and evaluated.
If more than spoutmax segment pairs are found,
the segment pairs are sorted by the criteria in effect
for the search and only the first spoutmax
segment pairs will be reported.
The setting of spoutmax will have no effect
if either hspmax or gspmax
is more restrictive.
The default value for spoutmax is 0,
which signifies no limit.
See also: hspmax and
gspmax .
|
||||||||||||||||||
stats |
gather a variety of statistics about the search (e.g., the number of word hits in each reading frame, the highest score observed, etc.) and report them in the output. Use of this option marginally impacts search speed. | ||||||||||||||||||
sump |
use Sum statistics
(Karlin and Altschul, 1993)
to compute joint P-values of consistent sets of alignments;
the use of Sum statistics is the default behavior
in all search modes.
See also: kap and
poissonp .
|
||||||||||||||||||
T= <t> |
set the neighborhood word score threshold for the ungapped BLAST algorithm to t.
For a given word of length W in the query sequence,
its neighborhood words are defined as the set of words
that have scores ≥ T when aligned with it.
Neighborhood words become the seed words used to find ungapped alignments
by the BLAST algorithm.
Lower values for T tend to yield a larger neighborhood
(more seed words) and improved sensitivity for lower scoring alignments,
but at the expense of increased memory use and run time.
Higher values for T will yield a smaller (possibly empty) neighborhood word list and faster execution, at the expense of reduced sensitivity.
The default T varies with the scoring matrix, word length, and
between search modes.
For improved sensitivity and to obtain behavior that better satisfies user expectations,
identical words are included with neighborhood words in the list
of potential seeds,
if their score is positive but happens to be less than T .
No neighborhood words (only exactly matching words) are used by default in the BLASTN search mode; however, neighborhood words can be used even by BLASTN if a value for T is specified on the BLASTN command line.
CAUTION: for the long word lengths typically employed
with BLASTN, the memory required
for neighborhood words can easily be prohibitive and may only be
possible for short query sequences.
If the T option is to be used with BLASTN,
the use of a short word length, W , is also advised.
See also: W .
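Here is a brute-force sketch of neighborhood generation: it enumerates every word of length W over the alphabet and keeps those scoring at least T against the query word. It uses a +5/−4 nucleotide scheme only because the results are small enough to check by hand. Protein searches use a 20-letter alphabet and a matrix such as BLOSUM62, and AB-BLAST need not enumerate words this way. The enumeration also illustrates the CAUTION above: with W=11 there are 4^11 = 4,194,304 candidate words to screen per query position.

```python
from itertools import product

def pair_score(u, v, score):
    return sum(score(x, y) for x, y in zip(u, v))

def neighborhood(word, alphabet, score, T):
    """Words scoring >= T against `word`, plus `word` itself when its
    self-score is positive but below T."""
    words = set()
    for letters in product(alphabet, repeat=len(word)):
        candidate = "".join(letters)
        if pair_score(word, candidate, score) >= T:
            words.add(candidate)
    if 0 < pair_score(word, word, score) < T:
        words.add(word)
    return words

def plus5_minus4(x, y):
    """+5/-4 match/mismatch scores, used here for a small example."""
    return 5 if x == y else -4


print(len(neighborhood("ACG", "ACGT", plus5_minus4, T=6)))  # 10
print(neighborhood("ACG", "ACGT", plus5_minus4, T=16))      # {'ACG'}
```

With T=6, the exact word (score 15) and the nine single-mismatch words (score 6) qualify. With T=16, no word qualifies, but the identical word is still kept as a seed, as described above.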
|
||||||||||||||||||
top |
used to restrict the search of a nucleotide sequence
to the top (+) strand.
In the TBLASTX search mode, where both query and subject
are nucleotide sequences, the top option only affects
the query sequence.
See also: bottom ,
dbtop ,
dbbottom and
qframe .
|
||||||||||||||||||
topcomboE= <Eratio> |
Eratio is the maximum ratio of Ecurrent/Ebest for which
the current “topcombo” group of consistent (colinear) local alignments will be reported
for a given database sequence.
The "best" group is reported in the output as "Group = 1"
and tends to be the most statistically significant.
The default behavior is to impose no limit on this ratio, in which case all topcombo groups satisfying E are reported (up to a maximum of topcomboN groups, if specified).
See also: links and
topcomboN .
|
||||||||||||||||||
topcomboN= <n> |
report at most n “topcombo” groups of consistent (colinear) local alignments (HSPs).
Each local alignment is allowed to be a member of only one group.
Use of this option causes the addition of a "Group = #" indicator
in the output for each HSP.
Groups of HSPs tend to be assembled in decreasing order of statistical
significance.
Members of the most significant group thus tend to be reported
with "Group = 1".
See also: links and
topcomboE .
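A sketch of how E, topcomboN and topcomboE could jointly limit the groups reported for one database sequence. It assumes the groups arrive best first, with Group 1 as the most significant. The E=10 default comes from the S entry, and the function name is hypothetical. The ratio test is written as a multiplication so that a best E-value of 0 does not cause a division by zero.

```python
def select_groups(group_evalues, E=10.0, topcomboN=None, topcomboE=None):
    """group_evalues: E-values of topcombo groups, best (Group 1) first.

    Returns the 1-based group numbers that would be reported.
    """
    best = group_evalues[0] if group_evalues else None
    chosen = []
    for number, e in enumerate(group_evalues, start=1):
        if e > E:
            continue
        # Equivalent to e / best > topcomboE when best > 0.
        if topcomboE is not None and e > topcomboE * best:
            continue
        chosen.append(number)
        if topcomboN is not None and len(chosen) == topcomboN:
            break
    return chosen


groups = [1e-20, 1e-18, 1e-5, 50.0]
print(select_groups(groups, topcomboE=1000))  # [1, 2]
```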
|
||||||||||||||||||
ucdb |
force nucleotide sequence databases to be searched in their uncompressed form,
with any-and-all ambiguity codes in place.
This option is only effective in the BLASTN search mode for word lengths ≥ 7.
Users should generally avoid specifying this option themselves,
letting the software decide when to employ this search strategy.
This option can increase sensitivity when ambiguity codes
are present in database sequences,
at the expense of memory and possibly speed.
Searching the uncompressed database is the only available behavior
for word lengths < 7.
This option offers improved sensitivity only when searching databases in XDF format that contain ambiguity codes.
The option is accepted by the software but offers no improvement in sensitivity for databases in the earlier BLAST 1.4 database format.
See also: cdb .
|
||||||||||||||||||
V= <v> |
set the maximum number of one-line descriptions of significant
database sequences to report in the first section of program output to v.
The default limit is 500.
See also: B .
|
||||||||||||||||||
W= <w> |
set the seed word length for the ungapped BLAST algorithm to w.
The default word length for protein-level searches is 3 amino acids;
for BLASTN searches, the default length is 11 nucleotides.
Shorter word lengths may increase sensitivity,
at the expense of increased run time.
In all search modes,
the acceptable range of word lengths is 1 ≤ w ≤ 1024.
See also: T .
|
||||||||||||||||||
warnings |
suppress all WARNING messages.
CAUTION: important advisories may be missed if this option is used; however, if any WARNING situations should arise,
the number SUPPRESSED will be reported at the end of the search.
See also: errors
and
notes .
|
||||||||||||||||||
wink= <wink> |
generate word hits at every winkth residue position along the query,
where the default wink=1 produces neighborhood words at every position.
For best sensitivity, wink should not be adjusted.
Wink settings greater than 1 are best used to find identical or nearly identical sequences more rapidly.
When used in conjunction with the hitdist option
to obtain the highest search speed, care should be taken that desirable alignments are not precluded
by these parameters.
NOTE: When using BLASTN to search compressed nucleotide sequence databases in their compressed form, an increase in speed (and concomitant decrease in sensitivity) will not be observed unless wink is set to a value
greater than the compression ratio, which is usually 4.
CAUTION: Some versions of WU-BLASTN (those prior to [15-Oct-2004]) were associated with a major bug in their handling of the wink parameter.
|
||||||||||||||||||
wordmask= <filter> |
“soft mask” the query sequence using the indicated filter.
A copy of the query sequence is passed through the filter program
and any letters converted by it to ambiguity codes
are skipped during neighborhood word or seed word generation.
Unlike the filter option,
the query sequence itself remains unaltered and available for alignment.
Usage of the wordmask parameter is otherwise identical to that of filter ,
with the same set of filtering methods available for use.
See also: filter ,
lcmask ,
lcfilter and
maskextra .
|
||||||||||||||||||
wstrict |
when searching a nucleotide database sequence
that contains one or more ambiguous residues,
require that every ungapped alignment found during the initial, ungapped phase of a search
actually contain an identical word hit (in the usual case of BLASTN usage)
or neighborhood word hit (in the case of TBLASTN and TBLASTX).
The wstrict option has no effect whatsoever on BLASTX
and has no effect on BLASTP when gapped alignments (the default)
are to be produced.
When ungapped alignments are the desired end product from BLASTP
(i.e., the -nogaps option is specified),
wstrict will prevent the software from exhaustively
searching diagonals that are found to contain HSPs in an effort
to find other HSPs that would not be seeded by neighborhood word hits.
|
||||||||||||||||||
X= <x> |
set the drop-off score for the ungapped BLAST algorithm to x.
Ungapped extension of initial neighborhood word hits or seed word hits
between the query and subject sequences
continues until the cumulative alignment score deteriorates
from the maximum value seen thus far during the extension by a quantity X or more.
The default value for X is the score associated with 10 bits
of significance (2⁻¹⁰ ≈ 10⁻³ probability) for protein-level searches
(all but BLASTN)
and 20 bits of significance (2⁻²⁰ ≈ 10⁻⁶ probability)
for nucleotide-level searches (BLASTN only).
Higher values for X will increase sensitivity at the expense
of execution time, but both the gain in sensitivity and the added time
level off rapidly as X is increased further.
See also: gapX .
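The drop-off rule can be sketched for an extension in one direction from a seed; a full implementation would extend both ways from the word hit. The helper `bits_to_raw` converts a drop-off given in bits to raw score units as bits × ln 2 / λ, which is an assumption about how "the score associated with 10 bits" is obtained. The function names and the +5/−4 scoring function are illustrative.

```python
import math

def bits_to_raw(bits, lam):
    """Raw score corresponding to a number of bits (ignores K)."""
    return bits * math.log(2) / lam

def xdrop_extend_right(query, subject, q, s, score, X):
    """Extend an ungapped alignment to the right from (q, s), 0-based.

    Returns (best_score, best_length). The extension stops once the
    running score falls X or more below the best score seen so far,
    or when either sequence ends.
    """
    running = best = best_len = length = 0
    while q + length < len(query) and s + length < len(subject):
        running += score(query[q + length], subject[s + length])
        length += 1
        if running > best:
            best, best_len = running, length
        elif best - running >= X:
            break
    return best, best_len

def plus5_minus4(x, y):
    return 5 if x == y else -4


query, subject = "ACGTTACGTACG", "ACGTAACGTACG"
print(xdrop_extend_right(query, subject, 0, 0, plus5_minus4, X=4))  # (20, 4)
print(xdrop_extend_right(query, subject, 0, 0, plus5_minus4, X=5))  # (51, 12)
```

A single mismatch after four matches costs 4 points. With X=4 the extension stops there; with X=5 it survives the mismatch and recovers a much higher-scoring alignment. This is the sensitivity gain from larger X described above.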
|
||||||||||||||||||
xmlcompact |
omit newline and white space characters normally reported between
entities in XML documents produced with mformat=7 .
Their purpose is merely to improve human readability of a document
when viewed with an XML-ignorant program,
but these characters often comprise
a substantial fraction (30% is not uncommon) of the bytes in a document and they are completely
extraneous for the purposes of automated parsing and viewing
with XML-aware software.
See also: mformat .
|
||||||||||||||||||
Y= <y> |
set the effective length of the querY sequence (in units of residues)
used in statistical significance calculations to y.
The interpretation of y as being in units of residues
is unaffected by any other options or parameter settings,
including the setting of Z or seqtest .
See also: restest ,
seqtest
and
Z .
|
||||||||||||||||||
Z= <z> |
set the effective size of the database (databaZe) used in statistical significance calculations to z.
Caution: use of the Z parameter fundamentally changes the way database size is measured and used
in statistical calculations in searches of protein sequence databases (but not nucleotide sequence databases).
Users of this parameter are strongly urged to read about the seqtest and restest options.
Unless overridden by the seqtest option,
the unit of measure for z is residues.
If seqtest is also specified,
the unit of measure for z becomes sequences instead.
See also: restest ,
seqtest
and
Y .
|
Last modified: 2023-07-27