The BLAST programs all provide information in roughly the same format. First comes (A) an introduction to the program; a histogram of scores may then be displayed if one was requested; (B) a series of one-line descriptions of matching database sequences is then presented; (C) the actual sequence alignments are then shown; and finally the parameters and other statistics gathered during the search are presented at the end. Sample BLASTP output from comparing pir|A01243|DXCH against the SWISS-PROT database is presented below.

A. Program Introduction. The introductory output provides the program name (BLASTP in this case), the version number (1.3.11MP in this case), the date the program source code last changed substantially (Oct. 29, 1993), the date the program was built (July 9, 1994), and a description of the database to be searched. These may all be important pieces of information if a bug is suspected or if reproducibility of results is important. The "Searching..." line also shows how far the program progressed in searching the database. A complete database search will yield 50 periods (.). If the program aborts for some reason after displaying fewer than 50 periods, dividing the number of periods by 0.5 (that is, doubling it) will yield the approximate percentage (0-100%) of the database that was searched before the program died.

B. One-line Summaries. The one-line sequence descriptions and summaries of results are useful for identifying biologically interesting database matches and correlating this interest with the statistical significance estimates. Unless otherwise requested, the database sequences are sorted by increasing P-value (probability). Identifiers for the database sequences appear in the first column; then come brief descriptions of each sequence, which may be truncated in order to fit in the available space. The "High Score" column contains the score of the highest-scoring HSP found with each database sequence.
The P(N) column contains the lowest P-value ascribed to any set of HSPs for each database sequence, and the N column displays the number of HSPs in the set which was ascribed the lowest P-value. These P-values are a function of N, which is used with standard Poisson statistics to treat situations where multiple HSPs are found. In part because Poisson statistics are used, the highest-scoring HSP whose score is reported in the "High Score" column is not necessarily a member of the set of HSPs which yields the lowest P-value; the highest-scoring HSP may have been excluded from this set on the basis of consistency rules governing the grouping of HSPs (see the -consistency option to the BLAST programs).

C. Alignments. The alignments (High-scoring Segment Pairs, or HSPs) produced by the BLAST algorithm are ungapped. Several statistics describe each HSP: the raw alignment Score; the raw score converted to bits of information by multiplying by Lambda/ln2 (see the Statistics output); the number of times one might Expect to see such a match (or a better one) by chance alone; the P-value (probability in the range 0-1) of observing such a match; the number and fraction of total residues in the HSP which are identical; and the number and fraction of residues for which the substitution scores have positive values. When Poisson statistics have been used in estimating the Expect and P-values, the P-value is qualified with the word "Poisson" and the N parameter used in the Poisson statistics is provided in parentheses to indicate the number of HSPs in the set. Between the two lines of Query and Subject (database) sequence is a line indicating the specific residues which are identical, as well as those which are non-identical but nevertheless have positive substitution scores in the scoring matrix that was used (BLOSUM62 in this case). Identical residues when paired with each other are not highlighted if their substitution score is non-positive.
Examples of this would be an X juxtaposed with an X in two amino acid sequences, or an N juxtaposed with another N in two nucleotide sequences. Such ambiguous residue-residue pairings are typically uninformative and thus lend no support to the overall alignment being real or random.

BLASTP 1.3.11MP [29-Oct-93]  [Build 13:21:00 Jul  9 1994]

Reference:  Altschul, Stephen F., Warren Gish, Webb Miller, Eugene W. Myers,
and David J. Lipman (1990).  Basic local alignment search tool.  J. Mol. Biol.
215:403-410.

Query=  pir|A01243|DXCH 232 Gene X protein - Chicken (fragment)
        (232 letters)

Database:  SWISS-PROT Release 28.0, March 1994
           36,000 sequences; 12,496,420 total letters.
Searching.................................................done

                                                                    Smallest
                                                                    Poisson
                                                              High Probability
Sequences producing High-scoring Segment Pairs:              Score  P(N)      N

sp|P01013|OVAX_CHICK GENE X PROTEIN (OVALBUMIN-RELATED) (...  1191  7.2e-160  1
sp|P01014|OVAY_CHICK GENE Y PROTEIN (OVALBUMIN-RELATED).       949  6.5e-127  1
sp|P01012|OVAL_CHICK OVALBUMIN (PLAKALBUMIN).                  645  3.1e-85   1
sp|P19104|OVAL_COTJA OVALBUMIN.                                626  1.3e-82   1
sp|P35237|PTI_HUMAN PLACENTAL THROMBIN INHIBITOR.              473  1.2e-61   1
sp|P29508|SCCA_HUMAN SQUAMOUS CELL CARCINOMA ANTIGEN (SCC...   439  8.6e-56   1
sp|P05619|ILEU_HORSE LEUKOCYTE ELASTASE INHIBITOR (LEI).       216  8.6e-52   3
sp|P80229|ILEU_PIG LEUKOCYTE ELASTASE INHIBITOR (LEI) (...     325  2.5e-51   2
sp|P05120|PAI2_HUMAN PLASMINOGEN ACTIVATOR INHIBITOR-2, P...   176  2.9e-45   3

... many more descriptions deleted ...

... alignments with the top 8 database sequences deleted ...

>sp|P05120|PAI2_HUMAN PLASMINOGEN ACTIVATOR INHIBITOR-2, PLACENTAL (PAI-2) (MONOCYTE ARG-
            SERPIN).
            Length = 415

 Score = 176 (80.2 bits), Expect = 1.3e-16, P = 1.3e-16
 Identities = 38/89 (42%), Positives = 50/89 (56%)

Query:     1 QIKDLLVSSSTDLDTTLVLVNAIYFKGMWKTAFNAEDTREMPFHVTKQESKPVQMMCMNN 60
             +I +LL   S D DT +VLVNA+YFKG WKT F  +     PF V   +  PVQMM +
Sbjct:   180 KIPNLLPEGSVDGDTRMVLVNAVYFKGKWKTPFEKKLNGLYPFRVNSAQRTPVQMMYLRE 239

Query:    61 SFNVATLPAEKMKILELPFASGDLSMLVL 89
               N+  +   K +ILELP+A      L+L
Sbjct:   240 KLNIGYIEDLKAQILELPYAGDVSMFLLL 268

 Score = 165 (75.2 bits), Expect = 1.9e-34, Poisson P(2) = 1.9e-34
 Identities = 33/78 (42%), Positives = 47/78 (60%)

Query:   155 ANLTGISSAESLKISQAVHGAFMELSEDGIEMAGSTGVIEDIKHSPESEQFRADHPFLFL 214
             AN +G+S    L +S+  H A ++++E+G E A  TG +   +      QF ADHPFLFL
Sbjct:   338 ANFSGMSERNDLFLSEVFHQAMVDVNEEGTEAAAGTGGVMTGRTGHGGPQFVADHPFLFL 397

Query:   215 IKHNPTNTIVYFGRYWSP 232
             I H  T  I++FGR+ SP
Sbjct:   398 IMHKITKCILFFGRFCSP 415

 Score = 144 (65.6 bits), Expect = 2.9e-45, Poisson P(3) = 2.9e-45
 Identities = 26/62 (41%), Positives = 41/62 (66%)

Query:    90 LPDEVSDLERIEKTINFEKLTEWTNPNTMEKRRVKVYLPQMKIEEKYNLTSVLMALGMTD 149
             + D  + LE +E  I ++KL +WT+ + M +  V+VY+PQ K+EE Y L S+L ++GM D
Sbjct:   272 IADVSTGLELLESEITYDKLNKWTSKDKMAEDEVEVYIPQFKLEEHYELRSILRSMGMED 331

Query:   150 LF 151
              F
Sbjct:   332 AF 333

 Score = 61 (27.8 bits), Expect = 1.6e-15, Poisson P(4) = 1.6e-15
 Identities = 10/17 (58%), Positives = 16/17 (94%)

Query:    81 SGDLSMLVLLPDEVSDL 97
             +GD+SM +LLPDE++D+
Sbjct:   259 AGDVSMFLLLPDEIADV 275

... alignments with the remaining database sequences deleted ...

Parameters:
  E = 10., S = 57 (26.0 bits), E2 = 0.11, S2 = 36
  W = 3, T = 11 (5.0 bits), X = 22 (10.0 bits)
  M = BLOSUM62
  H = 0, V = 500, B = 250
  -gapdecayrate 0.5  (the default)

Statistics:
  Lambda = 0.316 nats/unit score, Lambda/ln2 = 0.455 bits/unit score
  K = 0.132, H = 0.534 bits/position
  Expected/Observed high score = 61 (27.8 bits) / 1191 (542.5 bits)
  # of letters in query:  232
  # of neighborhood words in query:  4988
  # of exact words scoring below T:  0
  Database:  SWISS-PROT Release 28.0, March 1994
  # of letters in database:  12,496,420
  # of word hits against database:  5,251,731
  # of failed hit extensions:  4,184,796
  # of excluded hits:  1,066,623
  # of successful extensions:  312
  # of overlapping HSPs discarded:  52
  # of HSPs reportable:  260
  # of sequences in database:  36,000
  # of database sequences satisfying E:  86
  No. of states in DFA:  561 (55 KB)
  Total size of DFA:  110 KB (128 KB)
  Time to generate neighborhood:  0.02u 0.00s 0.02t  Real: 00:00:00
  No. of processors used:  8
  Time to search database:  27.99u 1.22s 29.21t  Real: 00:00:05
  Total cpu time:  28.12u 1.28s 29.40t  Real: 00:00:05
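
The arithmetic behind several of the reported figures can be checked by hand. The following Python sketch is only an illustration and is not part of the BLAST programs; the function names are hypothetical. It converts a raw score to bits using the Lambda value from the Statistics output, estimates the percentage of the database searched from the number of periods on the "Searching" line, and computes the Identities and Positives percentages. The truncation (rather than rounding) of those percentages is an assumption inferred from the sample output above, where 10/17 is reported as 58%.

```python
import math


def bit_score(raw_score: float, lam: float) -> float:
    """Convert a raw score to bits: raw score * Lambda / ln 2.

    `lam` is the Lambda value (in nats per unit score) printed in the
    Statistics section of the output.
    """
    return raw_score * lam / math.log(2)


def percent_searched(periods: int, full: int = 50) -> float:
    """Approximate percentage of the database searched before an abort.

    A complete search prints 50 periods, so each period stands for 2%
    (equivalently, periods / 0.5).
    """
    if not 0 <= periods <= full:
        raise ValueError("number of periods must be between 0 and 50")
    return periods / (full / 100)


def truncated_percent(count: int, total: int) -> int:
    """Percentage of `count` out of `total`, truncated to a whole number.

    Truncation is an assumption inferred from the sample output.
    """
    return count * 100 // total


if __name__ == "__main__":
    # First HSP of the sample: Score = 176, Lambda = 0.316
    print(f"{bit_score(176, 0.316):.1f} bits")    # 80.2 bits
    # A search that aborted after printing 25 periods
    print(f"{percent_searched(25):.0f}% searched")  # 50% searched
    # Identities = 38/89 in the first HSP
    print(f"{truncated_percent(38, 89)}%")        # 42%
```

Because the printed Lambda is itself rounded to three digits, bit scores recomputed this way may differ slightly from the reported ones for very large raw scores.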