Seqtk

Seqtk is a fast and lightweight tool for processing sequences in FASTA or FASTQ format. It parses both FASTA and FASTQ files, which can optionally be compressed with gzip.

License

Free to use and open source under the MIT License.

Available

  • Puhti: 1.3-r106, 1.4

Usage

Seqtk is included in the biokit module:

module load biokit

Alternatively, Seqtk can be loaded as an independent module:

module load seqtk/<version>

The Seqtk command syntax is:

seqtk <command> <arguments>
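As a minimal sketch of this syntax, the snippet below runs one subcommand, comp, on a tiny FASTA file. The file name demo.fa and its contents are made up for illustration, and the seqtk call is skipped if seqtk is not on the PATH (for example, before the module has been loaded).

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical two-sequence FASTA file used only for this demo.
printf '>seq1\nACGTACGT\n>seq2\nGGCC\n' > demo.fa

# General form: seqtk <command> <arguments>
# Here the command is "comp" and the argument is the input file.
if command -v seqtk >/dev/null 2>&1; then
    seqtk comp demo.fa > demo.comp.txt
else
    echo "seqtk not found; load the module first" >&2
fi
```

The same pattern applies to the other subcommands: only the command name and its arguments change.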

The available Seqtk commands are:

Command    Function
seq        common transformation of FASTA/Q
comp       get the nucleotide composition of FASTA/Q
sample     sub-sample sequences
subseq     extract subsequences from FASTA/Q
fqchk      fastq QC (base/quality summary)
mergepe    interleave two PE FASTA/Q files
trimfq     trim FASTQ using the Phred algorithm
hety       regional heterozygosity
gc         identify high- or low-GC regions
mutfa      point mutate FASTA at specified positions
mergefa    merge two FASTA/Q files
famask     apply an X-coded FASTA to a source FASTA
dropse     drop unpaired from interleaved PE FASTA/Q
rename     rename sequence names
randbase   choose a random base from hets
cutN       cut sequence at long N
listhet    extract the position of each het

Examples

Convert FASTQ to FASTA:

seqtk seq -a in.fq.gz > out.fa
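To try the conversion without real data, the following sketch builds a small gzip-compressed FASTQ file and converts it. The read names and sequences are invented for illustration, and the seqtk step is guarded so the script still runs where seqtk is not installed.

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical two-read FASTQ file (4 lines per read), gzip-compressed.
printf '@read1\nACGT\n+\nIIII\n@read2\nTTGCA\n+\nIIIII\n' | gzip -c > in.fq.gz

if command -v seqtk >/dev/null 2>&1; then
    # -a forces FASTA output, so the quality lines are dropped.
    seqtk seq -a in.fq.gz > out.fa
    # Count the FASTA records; this should match the number of input reads.
    grep -c '^>' out.fa
else
    echo "seqtk not found; skipping the conversion" >&2
fi
```

Comparing the record count before and after is a cheap sanity check that no reads were lost in the conversion.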

Extract the sequences whose names are listed in the file name.lst (one sequence name per line):

seqtk subseq in.fq name.lst > out.fq
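The name-list format can be illustrated with a small, made-up data set. In this sketch the three reads and the two selected names are assumptions chosen for the demo, and seqtk is only called if it is available.

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical FASTQ file with three reads.
printf '@read1\nACGT\n+\nIIII\n@read2\nTTGC\n+\nIIII\n@read3\nGGAA\n+\nIIII\n' > in.fq

# Name list: one sequence name per line, without the leading "@".
printf 'read1\nread3\n' > name.lst

if command -v seqtk >/dev/null 2>&1; then
    seqtk subseq in.fq name.lst > out.fq
    # Show the headers of the reads that were kept.
    grep '^@read' out.fq
else
    echo "seqtk not found; skipping the extraction" >&2
fi
```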

Extract the regions listed in the BED file reg.bed:

seqtk subseq in.fa reg.bed > out.fa
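For the BED variant, a small sketch with invented data may help show what the region file looks like. The sequence name chr1, the sequence itself, and the interval are assumptions for the demo. BED files are tab-separated, with a 0-based start and an exclusive end.

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical reference with a single 10-base sequence.
printf '>chr1\nACGTACGTAC\n' > in.fa

# BED columns: name, start (0-based), end (exclusive), tab-separated.
# The interval 2-6 therefore covers bases 3 to 6 of chr1 (GTAC).
printf 'chr1\t2\t6\n' > reg.bed

if command -v seqtk >/dev/null 2>&1; then
    seqtk subseq in.fa reg.bed > out.fa
    cat out.fa
else
    echo "seqtk not found; skipping the extraction" >&2
fi
```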

More information