The EMBOSS (European Molecular Biology Open Software Suite) package contains over 200 programs for sequence analysis. EMBOSS is designed for classical sequence analysis, where the number of sequences is less than 100 000. Because of that, most of the tools are not efficient for raw NGS datasets that contain millions of sequences (reads). Examples of application areas of EMBOSS tools are given below.
- Sequence alignment
- Hidden Markov models
- Rapid database searching with sequence patterns
- Protein motif identification, including domain analysis
- EST analysis
- Nucleotide sequence pattern analysis, for example to identify CpG islands
- Simple and species-specific repeat identification
- Codon usage analysis for small genomes
- Rapid identification of sequence patterns in large-scale sequence sets
- Presentation tools for publication
- RNA secondary structure prediction
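Before running EMBOSS tools on a dataset, it can help to check that the number of sequences is within the range of about 100 000 mentioned above. The following is a minimal sketch, assuming the input is a FASTA file in which every record starts with a `>` header line. The file name and its contents are hypothetical and are created only to keep the example self-contained.

```shell
# Assumption: FASTA input, one '>' header line per sequence.
# Create a small hypothetical example file.
fasta_file=$(mktemp)
printf '>seq1\nACGT\n>seq2\nGGCC\n>seq3\nTTAA\n' > "$fasta_file"

# Count the header lines to get the number of sequences.
seq_count=$(grep -c '^>' "$fasta_file")

# 100000 is the approximate upper limit stated in the description above.
if [ "$seq_count" -lt 100000 ]; then
    echo "$seq_count sequences: within the range EMBOSS is designed for"
else
    echo "$seq_count sequences: consider tools designed for NGS-scale data"
fi
```

For other formats, such as FASTQ, the counting step would need to be adapted, because `>` header lines are specific to FASTA.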
Version on CSC's Servers
- Puhti: 6.5.7
- Chipster provides a graphical interface to many EMBOSS tools.
To make the EMBOSS programs available in the Puhti super-cluster, give the command:
module load biokit
The biokit module sets up a set of commonly used bioinformatics tools, including EMBOSS. Note, however, that some bioinformatics tools in Puhti have their own separate setup commands.
After loading biokit, you can start any of the EMBOSS programs by typing its name. For example:
wossname
The wossname command is a help tool that you can use to see what EMBOSS commands are available. You can also use it to search EMBOSS tools using key words.
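As a sketch of the two uses described above, the commands below first search for tools by keyword and then list all available programs. This assumes that EMBOSS is already on the command path (in Puhti, after `module load biokit`). The keyword `alignment` is only an illustrative choice, and the `-auto` qualifier is used to suppress the interactive prompt.

```shell
# Illustrative keyword; replace it with your own search term.
keyword="alignment"

# Assumption: EMBOSS is on PATH (in Puhti: module load biokit).
if command -v wossname >/dev/null 2>&1; then
    # Search EMBOSS program descriptions for the keyword.
    wossname -search "$keyword"

    # List all programs alphabetically instead of grouped by function.
    wossname -auto -alphabetic
else
    echo "wossname not found; load the biokit module first" >&2
fi
```

Once a suitable program has been found, running it by name starts it in the same way.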