BLAST Memory Requirements

Memory Requirements for the Classical Ungapped BLAST Algorithm

Several characteristics of a BLAST search determine its heap memory requirements. For a rote implementation of the "classical" ungapped BLAST algorithm (Altschul et al., 1990), contributors to memory use include:

For example, the minimum storage required (in bytes) for a classical BLASTN search is approximately: 5SQ + C[8S(Q+D) + D] + B, where S=2 when both strands of the query are searched. In this example, B will be no more than (and often much less than) P(4^W). Using one processor or thread (C=1) with both strands searched (S=2), this simplifies to 26Q + 17D + B bytes. If an additional processor or thread is used, the minimum memory requirement increases by 16Q + 17D, for a total of 42Q + 34D + B bytes.
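
The arithmetic above can be checked with a short Python sketch. The function name and the sample values are hypothetical; the symbols are those of the formula, and B defaults to 0 only to keep the example simple.

```python
def blastn_min_bytes(Q, D, S=2, C=1, B=0):
    """Approximate minimum heap bytes: 5SQ + C[8S(Q+D) + D] + B."""
    shared = 5 * S * Q + B              # terms that do not scale with thread count
    per_thread = 8 * S * (Q + D) + D    # term multiplied by C in the formula
    return shared + C * per_thread


# Illustrative values only (not taken from any real search).
Q, D = 1_000, 10_000
one_thread = blastn_min_bytes(Q, D, S=2, C=1)
two_threads = blastn_min_bytes(Q, D, S=2, C=2)

print(one_thread == 26 * Q + 17 * D)                # True
print(two_threads == 42 * Q + 34 * D)               # True
print(two_threads - one_thread == 16 * Q + 17 * D)  # True
```

Calling the same function with S=1 shows the saving from searching a single strand.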

Memory requirements can be greatly reduced by limiting the number of threads employed (C) on multiprocessor or multi-core systems. The default behavior is to use all available processors (one thread per core or HyperThread) in the case of BLASTP, BLASTX, TBLASTN and TBLASTX; and up to 4 threads in the case of BLASTN. This default behavior can be altered in a local file named /etc/sysblast. An example sysblast.sample file is provided in licensed AB-BLAST software distributions. The most efficient use of computing resources will often be obtained by limiting individual BLAST jobs to a single thread, so that the computational overhead of thread creation and memory management is avoided.

When activated for a computer system, Intel HyperThreads typically appear like separate logical processors to application programs like BLAST, and the software may spawn an additional thread of execution for each one. Use of HyperThreads often (not always!) speeds up a search, but with an increase in memory required for each additional thread. HyperThreads are not as efficient as real cores.

The default behavior of the BLAST programs is to search both strands of a nucleotide query sequence or database. Memory use can be minimized by requesting just one strand at a time. Collating results from multiple searches may be impractical, however.

Sufficient real memory should be provided so that the search programs can run without spilling over into virtual memory swap storage, because hitting disk can be disastrous to BLAST performance. AB-BLAST tries to avoid using virtual memory swap by estimating the memory required per thread and only spawning as many threads as can safely be managed within the currently available free physical memory.
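
The idea of capping the thread count by free memory can be illustrated with a minimal Python sketch. This is not AB-BLAST's actual heuristic, which is not described here. It assumes the classical BLASTN formula given earlier (5SQ + C[8S(Q+D) + D] + B), and the function name and the cpu_limit parameter are hypothetical.

```python
def max_threads_in_memory(free_bytes, Q, D, S=2, B=0, cpu_limit=4):
    """Largest thread count C whose estimated heap fits in free_bytes.

    Assumption: memory follows 5SQ + C[8S(Q+D) + D] + B.
    Returns 0 when not even one thread fits.
    """
    shared = 5 * S * Q + B              # needed regardless of thread count
    per_thread = 8 * S * (Q + D) + D    # added for every thread
    if per_thread <= 0:
        raise ValueError("Q, D and S must give a positive per-thread cost")
    fit = (free_bytes - shared) // per_thread
    return max(0, min(cpu_limit, fit))


# Illustrative values only.
print(max_threads_in_memory(400_000, Q=1_000, D=10_000))  # 2
```

A result of 0 signals that the search would spill into swap even with a single thread.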

Database File Caching

Beyond the above requirements for program heap storage, any additional memory available may improve BLAST performance, through caching of database files in what would otherwise be unused memory. When databases are searched repeatedly (e.g., by an automated analysis pipeline), caching of the database files avoids the latency and throughput limitations of disk I/O and potential contention between different processes for the same disk resources.

If sufficient memory is only available to cache files for a subset of databases, file caching will not be effective. Files are usually cached by the operating system in a FIFO (first in/first out) manner, such that files accessed earlier in a job stream will be dropped from the cache to make room for files accessed later. Overall system throughput may improve if the job stream can be structured to search all queries against one cacheable subset of the databases before proceeding to search the next cacheable subset, and so on, until all of the desired databases have been searched. In this manner, analysis pipelines run on memory-limited computers can still benefit from caching.
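
One way to structure such a job stream is sketched below in Python. The function names, the database names and the sizes are hypothetical, and the simple greedy packing shown is only one of several reasonable ways to form cacheable subsets.

```python
def cacheable_batches(db_sizes, cache_bytes):
    """Greedily pack databases (name -> size in bytes) into batches
    whose total size fits within cache_bytes.

    A database larger than cache_bytes ends up in a batch by itself.
    """
    batches, current, used = [], [], 0
    # Largest first, so big databases do not strand small ones.
    for name, size in sorted(db_sizes.items(), key=lambda kv: kv[1], reverse=True):
        if current and used + size > cache_bytes:
            batches.append(current)
            current, used = [], 0
        current.append(name)
        used += size
    if current:
        batches.append(current)
    return batches


def job_order(queries, db_sizes, cache_bytes):
    """Yield (query, database) pairs, finishing every query against one
    batch of databases before moving on to the next batch."""
    for batch in cacheable_batches(db_sizes, cache_bytes):
        for query in queries:
            for db in batch:
                yield query, db


# Hypothetical database sizes (arbitrary units) and a cache budget of 10.
sizes = {"a": 6, "b": 5, "c": 4, "d": 3}
print(cacheable_batches(sizes, 10))  # [['a'], ['b', 'c'], ['d']]
```

Each pair yielded by job_order would correspond to one BLAST job submitted in that order.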

How much additional memory is useful for file caching? Typical BLAST searches involve a sequential search through an entire database. For AB-BLAST databases in XDF format, each search requires that the entirety of the .x[np]s file be read, in addition to the associated .x[np]t file. For any database hits, the associated .x[np]d file will be read to obtain sequence descriptions. Sufficient memory should be available to cache the .x[np]s and .x[np]t files, plus a large portion (if not all) of the .x[np]d file. Due to the FIFO nature of cache management, adding some memory is unlikely to improve performance if there is still not enough to cache the entire .x[np]s and .x[np]t files.
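
A rough cache budget for one XDF database can be estimated from the file sizes, as in the Python sketch below. The function name and the desc_fraction parameter are hypothetical, and the sketch assumes the database files are named by appending the .x[np]s, .x[np]t and .x[np]d extensions to a common path.

```python
import os


def xdf_cache_bytes(db_path, kind="n", desc_fraction=1.0):
    """Bytes needed to cache the whole .x?s and .x?t files of an XDF
    database, plus desc_fraction of its .x?d file.

    kind is 'n' for a nucleotide database or 'p' for a protein one.
    Assumption: files are named <db_path>.x<kind><s|t|d>.
    """
    if kind not in ("n", "p"):
        raise ValueError("kind must be 'n' or 'p'")
    if not 0.0 <= desc_fraction <= 1.0:
        raise ValueError("desc_fraction must be between 0 and 1")

    def size(ext):
        return os.path.getsize(f"{db_path}.x{kind}{ext}")

    return size("s") + size("t") + int(desc_fraction * size("d"))
```

Summing this estimate over the databases in a job stream, and adding the heap requirements of the searches themselves, gives a lower bound on the memory worth provisioning.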

One should be wary of other jobs executing simultaneously with BLAST, whose actions may purge the file cache of BLAST database files. If other jobs besides BLAST are active, additional memory should be provided for them to function within memory, too.

References

Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990). Basic local alignment search tool. J. Mol. Biol. 215:403-10.



Last updated: 2018-12-21