Running MaxQuant software on Puhti supercomputer
MaxQuant is a quantitative proteomics software package designed for analyzing large mass-spectrometric data sets. More information about the software can be found here. A high-performance computing environment like Puhti is well suited for running compute-intensive MaxQuant jobs in proteomics research.
MaxQuant is free to use, but each user needs to register and download MaxQuant from the developer site themselves.
This tutorial provides instructions for running MaxQuant software on Puhti.
Configure parameter file
Even if you are going to run the MaxQuant pipeline on Puhti,
you first have to configure the parameters of your MaxQuant
job on your local Windows machine. Then upload the parameter file
(mqpar.xml), the raw data samples (i.e., .raw files) and the sequence
file (i.e., .fasta file) to the Puhti computing environment.
Edit XML configuration file
You have to make some modifications to the parameter file (mqpar.xml), which was created, for example, on a local Windows machine, so that it works in the HPC environment.
These modifications include changing:
- Windows paths to Linux paths for the sample files (tip: search for <filePaths> in the XML file)
- the Windows path to a Linux path for the fasta sequence file (tip: search for <fastaFilePath> in the XML file)
- the number of threads according to the number of samples (tip: search for <numThreads> in the XML file); keep it equal to the number of cores you request with --cpus-per-task in the batch script
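If you prefer to make these edits on Puhti instead of by hand, the sketch below does them with GNU sed. fix_mqpar is a hypothetical helper, not part of MaxQuant or CSC tooling. It assumes that each Windows path sits on one line right after an opening tag (for example <string>C:\data\sample1.raw</string>) and that all .raw and .fasta files were copied directly into one directory on Puhti. It rewrites every drive-letter path it finds in the file, so check the result before submitting.

```shell
#!/bin/bash
# Hypothetical helper, not part of MaxQuant or CSC tooling.
# Usage: fix_mqpar FILE LINUX_DIR THREADS
# Assumes each Windows path follows an opening tag on the same line, e.g.
#   <string>C:\Users\me\data\sample1.raw</string>
# and that all .raw and .fasta files now live directly in LINUX_DIR.
# LINUX_DIR must not contain '#' or '&' (special in the sed expression).
fix_mqpar() {
  local xml="$1" linux_dir="$2" threads="$3"
  cp -- "$xml" "$xml.bak"   # keep an untouched copy of the original
  # 1) ">C:\any\folders\" -> ">LINUX_DIR/", keeping only the file name
  # 2) set the value of <numThreads>
  sed -E -i \
    -e "s#>[A-Za-z]:\\\\([^<]*\\\\)?#>${linux_dir}/#" \
    -e "s#<numThreads>[0-9]+</numThreads>#<numThreads>${threads}</numThreads>#" \
    "$xml"
}
```

For example, fix_mqpar mqpar.xml /scratch/project_xxx/maxquant 6 would point all inputs at that (hypothetical) directory and set six threads; the original file is kept as mqpar.xml.bak.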
Submit as a batch job to Puhti cluster
First, log in to Puhti (see instructions here).
Change to your project directory on Puhti and copy your input files there (tips on how to transfer files).
Your project directory (on scratch) is where your .xml file, .fasta file and raw data files should be located.
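Before submitting, it can be worth checking that the directory really contains everything the job needs. The helper below is a hypothetical sketch, not CSC tooling; it assumes the parameter file is named mqpar.xml and that the raw and sequence files use the lowercase extensions .raw and .fasta.

```shell
#!/bin/bash
# Hypothetical helper: check that DIR (default: current directory) contains
# mqpar.xml, at least one .raw file and at least one .fasta file.
# Prints one line per missing item and returns 1 if anything is missing.
check_inputs() {
  local dir="${1:-.}" missing=0
  [ -f "$dir/mqpar.xml" ] || { echo "missing: mqpar.xml"; missing=1; }
  # compgen -G succeeds only if the glob matches at least one file
  compgen -G "$dir/*.raw" > /dev/null || { echo "missing: *.raw files"; missing=1; }
  compgen -G "$dir/*.fasta" > /dev/null || { echo "missing: *.fasta file"; missing=1; }
  return "$missing"
}
```

Running check_inputs in your project directory prints nothing and returns success when all three kinds of input are present.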
- Set up the MaxQuant environment
MaxQuant also needs the Mono software to run on Linux. CSC provides a module for Mono, and since you download MaxQuant yourself, you can choose which MaxQuant version to run.
module load mono
Download a Linux-compatible version of MaxQuant (e.g., v18.104.22.168) to your scratch directory on Puhti and run the following command to verify that MaxQuant is installed properly:
mono MaxQuant\ 22.214.171.124/bin/MaxQuantCmd.exe --help
Note that the directory name contains a space, so you need to either escape it with a backslash (\) or enclose the path in quotes. For ease of use, you may wish to rename the directory so that it has, e.g., an underscore instead of a space.
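The renaming can be done with a one-line mv, wrapped here as a small hypothetical helper. It assumes you run it from the directory that contains the MaxQuant folder and pass only the folder name (spaces in parent directories are not handled); "MaxQuant 1.2.3" in the usage note is an illustrative name, not a real release.

```shell
#!/bin/bash
# Hypothetical helper: rename a directory so that spaces in its name become
# underscores, then print the new name. Pass the name only, not a full path.
unspace_dir() {
  local old="$1" new="${1// /_}"   # bash substitution: every " " -> "_"
  if [ "$old" != "$new" ]; then
    mv -- "$old" "$new" || return 1
  fi
  printf '%s\n' "$new"
}
```

For example, unspace_dir "MaxQuant 1.2.3" prints MaxQuant_1.2.3, after which the path to MaxQuantCmd.exe needs neither a backslash nor quotes.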
Please note that the MaxQuant version you used to create the .xml parameter file should match the version you run on Puhti, so that the file is read correctly on the cluster. Other, newer versions may work, but matching versions is the safest choice.
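To compare versions quickly, you can print the version stored in the parameter file. This sketch assumes your mqpar.xml records it on a single line in a <maxQuantVersion> element (check your own file); the helper name and the version number in the comment are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the MaxQuant version recorded in a parameter
# file, assuming a line such as <maxQuantVersion>1.2.3.4</maxQuantVersion>.
mqpar_version() {
  sed -n -E 's#.*<maxQuantVersion>([^<]+)</maxQuantVersion>.*#\1#p' "$1" | head -n 1
}
```

Compare the printed value, e.g. from mqpar_version mqpar.xml, with the version of the MaxQuant directory you downloaded before submitting the job.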
- Finally, submit your script
Create a batch script according to the instructions for shared memory jobs
and make sure the script is in the same directory as your parameter
file and other data files.
To get started, you may use the following minimal example script
(saved as, e.g., maxquant.sh):
#!/bin/bash
#SBATCH --job-name=maxquant
#SBATCH --output=output_%j.txt
#SBATCH --error=errors_%j.txt
#SBATCH --account=project_xxx
#SBATCH --time=01:20:00
#SBATCH --ntasks=1
#SBATCH --partition=small
#SBATCH --cpus-per-task=6
#SBATCH --mem=16000

# load maxquant environment
module load mono

# adjust file paths here
mono /path_of_MaxQuant/bin/MaxQuantCmd.exe /path/MaxQuant/mqpar.xml
Then modify the resource allocations depending on the number of samples, and submit your script as below:
sbatch maxquant.sh
Once the MaxQuant job is finished, your output files will be in this same directory.
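Since <numThreads> in mqpar.xml and --cpus-per-task in the batch script should agree, one option is to derive both from the number of samples. The sketch below is a hypothetical helper that assumes one thread per .raw file, capped at an upper limit that defaults to 40 (the number of cores in one Puhti CPU node); it only prints a number and does not submit anything.

```shell
#!/bin/bash
# Hypothetical helper: suggest a thread count of one per .raw file in DIR,
# kept between 1 and CAP (default 40, assumed cores per node).
suggest_threads() {
  local dir="${1:-.}" cap="${2:-40}" n=0 f
  for f in "$dir"/*.raw; do
    if [ -e "$f" ]; then n=$((n + 1)); fi   # skip the literal pattern if nothing matched
  done
  if [ "$n" -lt 1 ]; then n=1; fi
  if [ "$n" -gt "$cap" ]; then n=$cap; fi
  echo "$n"
}
```

You could then set <numThreads> to the printed value and submit with sbatch --cpus-per-task=N maxquant.sh, using the same N; options given on the sbatch command line override the matching #SBATCH lines in the script. Memory (--mem) may need to grow with the number of samples as well.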
You can download example tutorial data for running MaxQuant as below:
Then untar the downloaded archive file as below:
tar -xavf MaxQuant_tutorial.tar.gz
The tutorial archive contains example raw files and the other files needed to test MaxQuant.
Look at the used resources once your job is finished
Once the MaxQuant job is finished, you can check how efficiently it used
computing resources such as memory and CPU.
This helps you choose better resource requests for future jobs.
Use the following commands with your job ID:
seff <jobid>
sacct -l -j <jobid>
sacct -o jobid,jobname,maxrss,maxvmsize,state,elapsed -j <jobid>
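sacct reports MaxRSS with a unit suffix (for example 524288K), while the example script requests memory in megabytes (--mem=16000). The hypothetical helper below converts such a value to whole megabytes so the two are easy to compare. It assumes 1024-based units and the suffixes K, M, G and T; a value without a suffix is treated as bytes.

```shell
#!/bin/bash
# Hypothetical helper: convert a sacct memory value such as "524288K",
# "750M" or "1.5G" to whole megabytes (rounded down), using 1024-based units.
rss_to_mb() {
  awk -v v="$1" 'BEGIN {
    n = v + 0                          # numeric prefix; awk ignores the suffix
    s = toupper(substr(v, length(v)))  # last character: the unit, if any
    if (s == "K") mb = n / 1024
    else if (s == "M") mb = n
    else if (s == "G") mb = n * 1024
    else if (s == "T") mb = n * 1024 * 1024
    else mb = n / (1024 * 1024)        # no suffix: assume bytes
    printf "%d\n", mb
  }'
}
```

For example, rss_to_mb 524288K prints 512; if the peak usage of a job is far below its --mem request, a smaller request (with some headroom) is likely enough for similar jobs.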