QIIME (Quantitative Insights Into Microbial Ecology) is a package for comparison and analysis of microbial communities, primarily based on high-throughput amplicon sequencing data (such as SSU rRNA) generated on a variety of platforms, but also supporting analysis of other types of data (such as shotgun metagenomic data). QIIME takes users from their raw sequencing output through initial analyses such as OTU picking, taxonomic assignment, and construction of phylogenetic trees from representative sequences of OTUs, and through downstream statistical analysis, visualization, and production of publication-quality graphics.
In 2017, a completely rewritten version of QIIME, called QIIME 2, was released. Development of the original QIIME version has stopped. At the moment, only QIIME 2 is available on Puhti.
Free to use and open source under the BSD 3-Clause License.
- Puhti: qiime2-2022.2, qiime2-2021.2, qiime2-2020.8, qiime2-2020.6, qiime2-2020.2, qiime2-2019.10
On Puhti, QIIME 2 can be taken into use as a bioconda environment:
    export PROJAPPL=/projappl/<project>   # replace <project> with your project name (typically project_some-number)
    module load bioconda
    conda env list
    source activate qiime2-2022.2
    source tab-qiime
After that you can start QIIME 2 with the command:
    qiime
See the QIIME 2 home page for more instructions.
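A quick sanity check after activation can confirm that the qiime command is actually on the PATH before any work is started. The sketch below is only an illustration and not part of the Puhti setup itself: the function name check_qiime is hypothetical, while --version is the standard version flag of the QIIME 2 command line interface.

```shell
# Hypothetical helper (not provided by Puhti or QIIME 2):
# verify that the qiime command is reachable in the current shell.
check_qiime() {
    if command -v qiime >/dev/null 2>&1; then
        # Print the installed version so that it ends up in the job log.
        qiime --version
    else
        echo "qiime not found: load bioconda and activate a qiime2 environment first" >&2
        return 1
    fi
}

# Example use:
#   check_qiime || echo "QIIME 2 is not active in this shell"
```

Because the function returns a non-zero status when qiime is missing, it can also be used at the top of a batch script to stop early instead of failing halfway through an analysis.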
Note that many QIIME 2 tasks involve heavy computing, so they should be executed as batch jobs. QIIME 2 also needs access to a node-local file system for handling temporary data. Such a directory is available on the NVMe nodes of Puhti, so you must include a request for NVMe space in your batch job file.
The easiest way to start using QIIME 2 is to use the sinteractive command to launch an interactive batch job:
    csc-workspaces
    cd /scratch/<project>
    export PROJAPPL=/projappl/<project>
    module load bioconda
    conda env list
    source activate qiime2-2021.2
Interactive batch jobs include the local temporary disk that is mandatory for running QIIME 2.
For normal batch jobs, you must reserve NVMe disk space, which is then used as the $TMPDIR area.
For example, to reserve 100 GB of local disk space:
    #!/bin/bash
    #SBATCH --job-name=qiime_denoise
    #SBATCH --account=<project>
    #SBATCH --time=01:00:00
    #SBATCH --ntasks=1
    #SBATCH --nodes=1
    #SBATCH --output=qiime_out_8
    #SBATCH --error=qiime_err_8
    #SBATCH --cpus-per-task=8
    #SBATCH --mem=16G
    #SBATCH --partition=small
    #SBATCH --gres=nvme:100

    # set up qiime
    export PROJAPPL=/projappl/<project>
    module load bioconda
    source activate qiime2-2022.2
    export TMPDIR="$LOCAL_SCRATCH"

    # run task. Don't use srun in submission as it resets TMPDIR
    qiime dada2 denoise-single \
      --i-demultiplexed-seqs demux.qza \
      --p-trim-left 0 \
      --p-trunc-len 120 \
      --o-representative-sequences rep-seqs-dada2.qza \
      --o-table table-dada2.qza \
      --o-denoising-stats stats-dada2.qza \
      --p-n-threads $SLURM_CPUS_PER_TASK
In the example above, <project> must be replaced with your project name. You can use csc-workspaces to check your Puhti projects. The maximum running time is set to 1 hour (--time=01:00:00). As QIIME 2 uses thread-based parallelization, the job requests one task (--ntasks=1) whose cores all need to be in the same node (--nodes=1). This one task uses eight cores as parallel threads (--cpus-per-task=8) that can use up to 16 GB of memory in total (--mem=16G). Note that the number of cores also needs to be defined in the actual qiime command. That is done with the QIIME 2 option --p-n-threads. In this case we use the $SLURM_CPUS_PER_TASK variable, which contains the cpus-per-task value (we could just as well use --p-n-threads 8, but then we would have to remember to change the value if the number of reserved CPUs is changed).
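The TMPDIR and thread settings described above can also be set defensively, so that a job fails early if the local disk was not reserved. The following is a minimal sketch under stated assumptions: the function name setup_qiime_job_env and the variable QIIME_THREADS are hypothetical, and LOCAL_SCRATCH is assumed to be defined only when NVMe space was requested for the job.

```shell
# Hypothetical helper: point TMPDIR at the node-local disk and choose a thread count.
setup_qiime_job_env() {
    # Assumption: LOCAL_SCRATCH exists only if --gres=nvme:<GB> was requested.
    if [ -z "${LOCAL_SCRATCH:-}" ] || [ ! -d "$LOCAL_SCRATCH" ]; then
        echo "LOCAL_SCRATCH is not available: was --gres=nvme:<GB> requested?" >&2
        return 1
    fi
    export TMPDIR="$LOCAL_SCRATCH"

    # Follow the Slurm reservation; fall back to one thread outside a batch job.
    QIIME_THREADS="${SLURM_CPUS_PER_TASK:-1}"
    export QIIME_THREADS
}

# Example use inside a batch script:
#   setup_qiime_job_env || exit 1
#   qiime dada2 denoise-single ... --p-n-threads "$QIIME_THREADS"
```

Reading the thread count from the Slurm variable keeps the qiime command in step with --cpus-per-task, while the fallback value of 1 only matters when the same script is tried outside the batch system.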
The job is submitted to the batch job system with the sbatch command. For example, if the batch job file is named qiime_job.sh, the submission command is:
    sbatch qiime_job.sh
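When submissions are scripted, it can be handy to capture the job ID for later status queries. The sketch below is one possible approach, not an official Puhti recipe: the function name submit_job is hypothetical, while --parsable is a standard sbatch option that prints only the job ID, optionally followed by a semicolon and the cluster name.

```shell
# Hypothetical helper: submit a batch script and print only the numeric job ID.
submit_job() {
    local script="$1" out jobid
    # --parsable prints "jobid" or "jobid;cluster" instead of a full sentence.
    out=$(sbatch --parsable "$script") || return 1
    # Drop the optional ";cluster" suffix.
    jobid="${out%%;*}"
    echo "$jobid"
}

# Example use:
#   jobid=$(submit_job qiime_job.sh) && squeue -j "$jobid"
```

The squeue call in the usage comment lists the state of the submitted job while it is pending or running.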
Last edited Mon May 9 2022