Warning!

Puhti and Mahti will be decommissioned after Roihu becomes available. Users should clean up unnecessary files and move any required data by the end of August 2026. See the Roihu data preparation instructions for details.

Puhti scratch is very full: keep only active data there and move or delete everything else. No new Puhti scratch quota will be granted.

Whisper

Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multitasking model that can perform multilingual speech recognition, speech translation, and language identification.

Available

Faster-Whisper-XXL r245.4 is available in Puhti.

License

Faster-Whisper-XXL is licensed under the MIT licence.

Usage

CSC users can install Whisper in their own Python virtual environments in Puhti and Mahti. In addition, Puhti has a pre-installed Faster-Whisper-XXL version of Whisper. This environment can be activated in Puhti with the command:

module load whisper

Sample command:

whisper audio.mp3 --model medium 

Sample command with diarization enabled:

whisper interview.mp4 --model large --language French --threads 4 --diarize pyannote_v3.0 --diarize_threads 4 --num_speakers 2 -o interview_results
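
When several recordings need transcribing, the sample command can be wrapped in a loop. The following is a minimal sketch, not part of the official documentation: the function name `transcribe_dir` and the `WHISPER_CMD` variable are hypothetical helpers introduced for illustration, and the sketch assumes that `-o` names an output directory, as in the diarization sample above.

```shell
#!/bin/bash
# Hypothetical helper: run whisper on every .mp3 file in a directory.
# WHISPER_CMD is an illustrative override (defaults to "whisper") so the
# loop can be dry-run with another command, for example "echo".
transcribe_dir() {
    local input_dir="$1"     # directory containing the audio files
    local output_dir="$2"    # directory passed to whisper with -o (assumed)
    local whisper_cmd="${WHISPER_CMD:-whisper}"
    local count=0

    mkdir -p "$output_dir"
    for audio in "$input_dir"/*.mp3; do
        # Skip the literal pattern when no .mp3 files are present.
        [ -e "$audio" ] || continue
        "$whisper_cmd" "$audio" --model medium -o "$output_dir"
        count=$((count + 1))
    done
    echo "Processed $count file(s)"
}
```

After `module load whisper`, the function could be called as `transcribe_dir recordings results`. Running it first with `WHISPER_CMD=echo` prints the commands without executing them.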

Example batch script

Whisper can utilize GPU computing effectively. The example batch script below reserves one GPU for a Whisper job.

#!/bin/bash
#SBATCH --account=<project>
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --ntasks=1
# Reserve 4 CPU cores to match the --threads 4 option used below
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
#SBATCH --time=1:00:00
#SBATCH --gres=gpu:v100:1

module load whisper
srun whisper interview.mp4 --model large --language French --threads 4 --diarize pyannote_v3.0 --diarize_threads 4 --num_speakers 2 -o interview_results
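
The batch script above passes a fixed thread count to `whisper`. One way to keep that number in line with the CPU reservation is to read it from `SLURM_CPUS_PER_TASK`, which Slurm sets inside a job when `--cpus-per-task` is specified. The sketch below is an assumption-laden illustration: `whisper_threads` is a hypothetical helper name, and the fallback value of 1 is an arbitrary choice for runs outside Slurm.

```shell
#!/bin/bash
# Hypothetical helper: print the number of CPU cores reserved per task,
# falling back to 1 when SLURM_CPUS_PER_TASK is not set
# (for example when the script runs outside a Slurm job).
whisper_threads() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}
```

Inside the batch script, the value could then replace the hard-coded number, for example `srun whisper interview.mp4 --model large --threads "$(whisper_threads)" -o interview_results`.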

More information