.TH "xnu" 1 "February 18, 1992"
.\" This is the on-line manual page for xnu.
.\"
.\" The operation of the 'man' command can be simulated with the command:
.\"     nroff -man xnu.1 | ul | more
.\"
.UC 4
.SH NAME
xnu \- exclude non-unique regions from protein sequences
.SH SYNOPSIS
.LP
.B xnu
.I fasta-protein-sequences-file
[options]
.br
[\-n search-width] [\-p probability-cut] [\-s score-cut]
.br
[\-60] [\-120] [\-250]
.br
[\-.] [\-x] [\-o] [\-a] [\-d] [\-r] [\-v]
.sp
.br
.SH DESCRIPTION
.LP
.I xnu
reads a file containing protein sequences in FASTA format and searches
each sequence for statistically significant tandem repeats.
.LP
The program is intended to filter individual sequences, or whole databases
of sequences, to eliminate short-period internal repeats, which would
otherwise confound the scoring of sequence similarity searches.
.LP
By default,
.I xnu
scores similarities with the PAM120 matrix, eliminates internal repeats
with a period of 4 or less, and uses an expectation cutoff of 0.01
for elimination.
Repeat sequences are replaced by a space-holding character in the
output sequence.
By default, both of the segments in the alignment that defines an
internal repeat are replaced by the space holder 'X'.
.LP
.SH OPTIONS
.LP
.B \-n 10
Set the width to search for internal repeats.
The value 0 searches for internal repeats of any period.
.LP
.B \-p 0.01
Set the probability cutoff for accepting an internal repeat as significant.
.LP
.B \-s
Set the score cutoff for accepting an internal repeat as significant.
By default, this is calculated from the probability cutoff, the sequence
length, and the scoring matrix, using the standard Dayhoff amino acid
frequencies.
.LP
.B \-60
Use the PAM60 scoring matrix. (xnu only)
.LP
.B \-120
Use the PAM120 scoring matrix. (xnu only)
.LP
.B \-250
Use the PAM250 scoring matrix. (xnu only)
.LP
.B \-.
Print a '.' as the space holder in place of eliminated residues in the output.
.LP
.B \-x
Print an 'X' as the space holder in place of eliminated residues in the output.
.LP
.B \-o
Print the eliminated residues in lowercase instead of replacing them
with a space holder.
.LP
.B \-a
Eliminate only the ascending half of an alignment.
.LP
.B \-d
Eliminate only the descending half of an alignment.
.LP
.B \-r
Reverse the sense of the output: print the repeats and eliminate the
unique portions of the sequence.
.LP
.B \-v
Turn on verbose output (off by default).
Verbose output is sent to stdout.
.SH EXAMPLE
Assuming a file named 'seq' contains the following:
.nf

>Sample sequence with internal repeats
ACDEFGHIKLMNPQRQRQRQRQRQRQRQRQRSTVWY

the command

    xnu seq

prints the sequence with the repeats eliminated:

>Sample sequence with internal repeats
ACDEFGHIKLMNPXXXXXXXXXXXXXXXXXXSTVWY

and the command

    xnu seq \-r

prints only the repetitive part of the sequence:

>Sample sequence with internal repeats
XXXXXXXXXXXXXQRQRQRQRQRQRQRQRQRXXXXX
.fi
.\" Whenever possible, pointers to related commands should be given.
.SH SEE ALSO
.LP
.I fasta, blast, seg
.\" This section can be really useful.
.SH BUGS
Feature: Only a fixed set of scoring matrices (PAM60, PAM120, PAM250)
is supported.
.LP
Feature: The algorithm is a simple search, so it can be slow for long
sequences and large search widths.
.SH REFERENCES
.nf
Jean Michel Claverie & David J. States (1993)
Information Enhancement Methods for Large Scale Sequence Analysis,
Computers and Chemistry 17: 191-201.

Jean Michel Claverie (1993)
Large Scale Sequence Analysis, Chapter 36 in
"Automated DNA sequencing and analysis techniques"
(J.C. Venter, ed), Academic Press.

National Center for Biotechnology Information
National Library of Medicine
8600 Rockville Pike, 38A 8S806
Bethesda, MD 20894
.fi