Minimap2

Minimap2 is a fast general-purpose alignment program to map DNA or long mRNA sequences against a large reference database. It can be used for:

mapping of accurate short reads (preferably longer than 100 bases)
mapping 1kb genomic reads at error rate 15% (e.g. PacBio or Oxford Nanopore genomic reads)
mapping full-length noisy Direct RNA or cDNA reads
mapping and comparing assembly contigs or closely related full chromosomes of hundreds of megabases in length

Minimap2

License

Free to use and open source under MIT License.

Available

Puhti: 2.24, 2.28
Chipster graphical user interface

Usage

On Puhti, Minimap2 can be used as part of the biokit module collection:

module load biokit

The biokit module sets up a set of commonly used bioinformatics tools, including Minimap2. Note however that there are other bioinformatics tools on Puhti that have separate setup commands. Once biokit module is loaded, Minimap2 starts with the command:

minimap2

Without any options, minimap2 takes a reference database and a query sequence file as input and produce approximate mapping, without base-level alignment (i.e. no CIGAR), in the PAF format:

minimap2 ref.fa query.fq > approx-mapping.paf

If you wish to get the output in SAM format, you can use option -a.

For different data types, Minimap2 needs to be tuned for optimal performance and accuracy. With option -x you can use case specific parameter sets, pre-defined and recommended by the Minimap2 developers.

Map long noisy genomic reads (map-pb and map-ont)

PacBio subreads (map-db):

minimap2 -ax map-pb ref.fa pacbio-reads.fq > aln.sam

Oxford Nanopore reads (map-ont):

minimap2 -ax map-ont ref.fa ont-reads.fq > aln.sam

Map long mRNA/cDNA reads (splice)

PacBio Iso-seq/traditional cDNA

minimap2 -ax splice -uf ref.fa iso-seq.fq > aln.sam

Nanopore 2D cDNA-seq

minimap2 -ax splice ref.fa nanopore-cdna.fa > aln.sam

Nanopore Direct RNA-seq

minimap2 -ax splice -uf -k14 ref.fa direct-rna.fq > aln.sam

mapping against SIRV control

minimap2 -ax splice --splice-flank=no SIRV.fa SIRV-seq.fa

Find overlaps between long reads (ava-pb and aca-ont)

PacBio read overlap

minimap2 -x ava-pb reads.fq reads.fq > ovlp.paf

Oxford Nanopore read overlap

minimap2 -x ava-ont reads.fq reads.fq > ovlp.paf

Map short accurate genomic reads (sr)

Note, Minimap2 does not work well with short spliced reads.

single-end alignment

minimap2 -ax sr ref.fa reads-se.fq > aln.sam

paired-end alignment

minimap2 -ax sr ref.fa read1.fq read2.fq > aln.sam

paired-end alignment

minimap2 -ax sr ref.fa reads-interleaved.fq > aln.sam

Full genome/assembly alignment (asm5)

assembly to assembly

minimap2 -ax asm5 ref.fa asm.fa > aln.sam

Example batch script for Puhti

On Puhti, Minimap2 jobs should be run as batch jobs. Below is a sample batch job file for running a Minimap2 paired-end alignment on Puhti.

#!/bin/bash -l
#SBATCH --job-name=minimap2
#SBATCH --output=output_%j.txt
#SBATCH --error=errors_%j.txt
#SBATCH --time=04:00:00
#SBATCH --partition=small
#SBATCH --ntasks=1
#SBATCH --nodes=1
#SBATCH --cpus-per-task=8
#SBATCH --account=<project>
#SBATCH --mem=16000

module load biokit
minimap2 -t $SLURM_CPUS_PER_TASK -ax splice -uf ref.fa iso-seq.fq > aln.sam

In the batch job example above, one task (--ntasks=1) is executed. The Minimap2 job uses 8 cores (--cpus-per-task=8) with a total of 16 GB of memory (--mem=16000). The maximum duration of the job is four hours (--time=04:00:00). All the cores are assigned from one computing node (--nodes=1). In addition to the resource reservations, you have to define the billing project for your batch job. This is done by replacing the <project> with the name of your project. You can use command csc-projects to see what projects you have on Puhti.

You can submit the batch job file to the batch job system with the command:

sbatch batch_job_file.bash

See the Puhti user guide for more information about running batch jobs.

Support

CSC Service Desk

More information

More information about Minimap2 can be found from the Minimap2 home page.