Diamond
Diamond is a fast sequence similarity search tool for matching nucleotide or protein sequences against protein databases. The key features of Diamond are:
- Pairwise alignment of proteins and translated DNA at 500x to 20,000x the speed of BLAST.
- Frameshift alignments for long read analysis.
- Low resource requirements, making it suitable for standard desktops or laptops.
- Various output formats, including BLAST pairwise, tabular and XML, as well as taxonomic classification.
License
Free to use and open source under GNU AGPLv3.
Available
- Puhti: 2.0.15, 2.1.6, 2.1.10
Usage
To use Diamond, first run the command:
or:
To load a specific version, for example:
After that, you can display the Diamond help with the command:
CSC provides Diamond indexes for the SwissProt (swiss), UniProt (uniprot) and NCBI non-redundant (nr) databases. The location of these databases is defined by the environment variable $DIAMONDDB. For example, a set of nucleotide sequences could be searched against the SwissProt database with the command:
diamond blastx --query nuc.fasta -d $DIAMONDDB/swiss --out diamond_results.txt -p 4 --max-target-seqs 500
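Because Diamond writes BLAST tabular output (--outfmt 6) by default, a results file such as diamond_results.txt from the command above can be post-processed with a few lines of code. The following is a minimal Python sketch, not part of Diamond or the CSC environment. It assumes the default 12 columns (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore), and the helper names parse_hits and best_hits are hypothetical. It keeps the highest-scoring hit for each query.

```python
import csv
import os
from typing import Dict, Iterable, List, NamedTuple

# Assumed column order of Diamond's default tabular output (--outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

RESULTS = "diamond_results.txt"  # output file name used in the example command


class Hit(NamedTuple):
    query: str
    subject: str
    pident: float
    evalue: float
    bitscore: float


def parse_hits(lines: Iterable[str]) -> List[Hit]:
    """Parse tab-separated Diamond hits (hypothetical helper)."""
    hits = []
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row:
            continue  # skip blank lines
        rec = dict(zip(FIELDS, row))
        hits.append(Hit(rec["qseqid"], rec["sseqid"], float(rec["pident"]),
                        float(rec["evalue"]), float(rec["bitscore"])))
    return hits


def best_hits(hits: Iterable[Hit]) -> Dict[str, Hit]:
    """Keep the hit with the highest bitscore per query (first one wins ties)."""
    best: Dict[str, Hit] = {}
    for h in hits:
        cur = best.get(h.query)
        if cur is None or h.bitscore > cur.bitscore:
            best[h.query] = h
    return best


# Runs only when the results file is present in the current directory.
if __name__ == "__main__" and os.path.exists(RESULTS):
    with open(RESULTS, newline="") as fh:
        for query, hit in best_hits(parse_hits(fh)).items():
            print(query, hit.subject, hit.pident, hit.evalue, sep="\t")
```

Diamond usually lists the hits for each query in order of decreasing score, but taking the maximum bitscore explicitly avoids relying on that ordering.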
You can also search against your own protein sequence database. In this case, you must first build a Diamond index for your reference protein set with the command diamond makedb. For example:
The command above creates a Diamond index file (my_ref.dmnd) that can then be used as the database in searches:
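The two steps, building the index and then searching it, can also be scripted. This is a hedged Python sketch: the file names my_ref.fasta, proteins.fasta and my_results.txt and the helper functions are hypothetical, and it assumes protein queries (use blastx instead of blastp for nucleotide queries). The options --in, -d, --query, --out and -p are standard Diamond options; the commands run only if a diamond executable is found on PATH (for example, after loading the module) and the input files exist.

```python
import os
import shutil
import subprocess
from typing import List


def makedb_cmd(fasta: str, db_name: str) -> List[str]:
    """Build the index command; 'diamond makedb -d my_ref' writes my_ref.dmnd."""
    return ["diamond", "makedb", "--in", fasta, "-d", db_name]


def blastp_cmd(query: str, db_name: str, out: str, threads: int = 4) -> List[str]:
    """Build a protein-vs-protein search against the custom index."""
    return ["diamond", "blastp", "--query", query, "-d", db_name,
            "--out", out, "-p", str(threads)]


def run(cmd: List[str]) -> None:
    # check=True raises CalledProcessError if Diamond exits with an error.
    subprocess.run(cmd, check=True)


# Hypothetical input files; nothing is executed unless they and diamond exist.
REF, QUERY = "my_ref.fasta", "proteins.fasta"
if (__name__ == "__main__" and shutil.which("diamond")
        and os.path.exists(REF) and os.path.exists(QUERY)):
    run(makedb_cmd(REF, "my_ref"))
    run(blastp_cmd(QUERY, "my_ref", "my_results.txt"))
```

Building the argument lists separately from running them makes it easy to print or check a command before submitting it.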