QIIME (Quantitative Insights Into Microbial Ecology) is a package for comparison and analysis of microbial communities, primarily based on high-throughput amplicon sequencing data (such as SSU rRNA) generated on a variety of platforms, but also supporting analysis of other types of data (such as shotgun metagenomic data). QIIME takes users from their raw sequencing output through initial analyses such as OTU picking, taxonomic assignment, and construction of phylogenetic trees from representative sequences of OTUs, and through downstream statistical analysis, visualization, and production of publication-quality graphics.
In 2017, a completely rewritten version, QIIME 2, was released. Development of the original QIIME has stopped, and only QIIME 2 is currently available in Puhti.
- Puhti: qiime2-2020.6, qiime2-2020.2, qiime2-2019.10
In Puhti, QIIME 2 is used through a bioconda environment:
module load bioconda
conda env list
source activate qiime2-2020.6
source tab-qiime
After that, you can start QIIME 2 with the command:
qiime
See the QIIME 2 home page for more instructions.
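When conda env list shows several QIIME 2 environments, the sketch below picks the newest one by version number. The function name latest_qiime_env is hypothetical, and it assumes the environment names follow the qiime2-YYYY.N pattern listed above.

```shell
#!/bin/bash
# Hypothetical helper: read `conda env list` output on stdin and print the
# newest environment whose name matches qiime2-YYYY.N.
latest_qiime_env() {
    # The first column of `conda env list` output is the environment name.
    awk '{print $1}' |
        grep -E '^qiime2-[0-9]{4}\.[0-9]+$' |
        sort -V |    # version-aware sort: 2019.10 < 2020.2 < 2020.6
        tail -n 1
}

# Usage on Puhti, after `module load bioconda`:
#   source activate "$(conda env list | latest_qiime_env)"
```

Pinning an explicit version, as in the commands above, is still the safer choice for reproducible analyses. This helper is only a convenience for interactive exploration.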
Note that many QIIME tasks involve heavy computation, so they should be run as batch jobs. QIIME needs access to a node-local file system for temporary data. Such local disk space is available on the NVMe nodes of Puhti, so you must include a request for NVMe space in your batch job file.
The easiest way to start using QIIME is to launch an interactive batch job with the sinteractive command.
In the interactive session, you can set up QIIME 2 with the commands:
csc-workspaces
cd /scratch/<project>
export PROJAPPL=/projappl/<project>
module load bioconda
conda env list
source activate qiime2-2020.2
Interactive batch jobs include the local temporary disk space that QIIME requires.
For normal batch jobs, you must reserve NVMe disk space that will be used as the $TMPDIR area.
For example, to reserve 100 GB of local disk space:
#SBATCH --gres=nvme:100
In addition, you must define that the NVMe space ($LOCAL_SCRATCH) is used as the temporary storage area ($TMPDIR):
export TMPDIR="$LOCAL_SCRATCH"
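A missing NVMe reservation is easy to overlook. The sketch below, using the hypothetical function name use_local_scratch, sets TMPDIR only when $LOCAL_SCRATCH points to an existing directory and otherwise stops with an error message. It assumes that $LOCAL_SCRATCH is defined only when NVMe space has been reserved, as described above.

```shell
#!/bin/bash
# Hypothetical helper for batch job scripts: point TMPDIR at the NVMe area
# reserved with --gres=nvme:<size>, and fail early if that area is missing.
use_local_scratch() {
    if [ -z "${LOCAL_SCRATCH:-}" ] || [ ! -d "$LOCAL_SCRATCH" ]; then
        echo "LOCAL_SCRATCH is not set or does not exist; was --gres=nvme requested?" >&2
        return 1
    fi
    export TMPDIR="$LOCAL_SCRATCH"
}

# Usage in a batch script, before any qiime command:
#   use_local_scratch || exit 1
```

Failing at the start of the job avoids spending queue time on a run that cannot write its temporary files to local disk.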
For example, the batch job script below runs the denoising step of the QIIME moving pictures tutorial as a batch job using eight cores.
#!/bin/bash
#SBATCH --job-name=qiime_denoise
#SBATCH --account=<project>
#SBATCH --time=01:00:00
#SBATCH --ntasks=1
#SBATCH --nodes=1
#SBATCH --output=qiime_out_8
#SBATCH --error=qiime_err_8
#SBATCH --cpus-per-task=8
#SBATCH --mem=16G
#SBATCH --partition=small
#SBATCH --gres=nvme:100

# set up qiime
export PROJAPPL=/projappl/<project>
module load bioconda
source activate qiime2-2019.10
# use the reserved NVMe area for temporary files
export TMPDIR="$LOCAL_SCRATCH"

# run task. Don't use srun in submission as it resets TMPDIR
qiime dada2 denoise-single \
  --i-demultiplexed-seqs demux.qza \
  --p-trim-left 0 \
  --p-trunc-len 120 \
  --o-representative-sequences rep-seqs-dada2.qza \
  --o-table table-dada2.qza \
  --o-denoising-stats stats-dada2.qza \
  --p-n-threads "$SLURM_CPUS_PER_TASK"
In the example above:
- <project> must be replaced with your project name. You can use csc-workspaces to check your Puhti projects.
- The maximum running time is set to 1 hour (--time=01:00:00).
- As QIIME 2 uses thread-based parallelization, the job requests one task (--ntasks=1), and all cores must be in the same node (--nodes=1).
- This one task uses eight cores as parallel threads (--cpus-per-task=8) that can use in total up to 16 GB of memory (--mem=16G).
- The number of cores must also be defined in the actual qiime command. That is done with the option --p-n-threads. In this case we use the $SLURM_CPUS_PER_TASK variable, which contains the --cpus-per-task value. We could as well use --p-n-threads 8, but then we would have to remember to change the value whenever the number of reserved CPUs changes.
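The thread count can also be wrapped in a small helper that falls back to a single thread outside Slurm, for example when testing a command on a login node. This is a minimal sketch; the function name qiime_threads and the fallback value of 1 are assumptions, not part of the Puhti setup.

```shell
#!/bin/bash
# Hypothetical helper: number of threads to pass to --p-n-threads.
# Inside a Slurm job this is the --cpus-per-task value; outside Slurm
# SLURM_CPUS_PER_TASK is unset and the helper falls back to 1.
qiime_threads() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

# Usage:
#   qiime dada2 denoise-single ... --p-n-threads "$(qiime_threads)"
```

Like the $SLURM_CPUS_PER_TASK approach in the script above, this keeps the thread count tied to the reservation, so only the #SBATCH line needs changing.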
The job is submitted to the batch job system with the sbatch command. For example, if the batch job file is named qiime_job.sh, the submission command is:
sbatch qiime_job.sh
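To follow the job after submission, it helps to capture its job ID. The sketch below uses the sbatch --parsable option, which prints only the job ID (optionally followed by ";cluster"). The function name submit_job is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: submit a batch job file and print only the job ID.
# sbatch --parsable prints "jobid" or "jobid;cluster"; keep the part before ';'.
submit_job() {
    local out
    out=$(sbatch --parsable "$1") || return 1
    echo "${out%%;*}"
}

# Usage:
#   jobid=$(submit_job qiime_job.sh)
#   squeue -j "$jobid"    # check the job's state in the queue
```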
More information about running batch jobs can be found in the batch job section of the Puhti user guide.