Lab #4: Maximum Likelihood

Likelihood analysis using RAxML

Several programs implement the maximum likelihood method for molecular sequences. Older programs such as MolPHY, PAUP and PHYLIP are accurate but usually limited to single-gene datasets of up to a few dozen taxa; newer programs, including FastTree, IQ-Tree, PhyML, and RAxML, are much faster but rely on different heuristics and approximations. We will use RAxML-NG for this lab, a “faster, easier-to-use and more flexible” successor to the popular RAxML program.

An important additional goal of this lab is to learn how to interact properly with the Nova HPC cluster, both in an interactive mode and through the Slurm Workload Manager.

Using Nova in an interactive mode

When we log in to Nova, we interact with the head (login) node. This is fine for simple UNIX commands, but long computational tasks run there would slow the system down dramatically (and bring down the wrath of other users). Instead, we’ll use the salloc command to request access to compute nodes on the cluster. For this exercise, use salloc -p class-long -N 1 -n 2 -t 2:00:00 -A s2023.eeob.563.1 to request 2 cores on 1 node for 2 hours.
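For example, a typical interactive session looks like the sketch below (the partition and account names are the ones given above; adjust them to your own allocation):

```bash
# Request 2 cores on 1 node for 2 hours from the class-long partition
salloc -p class-long -N 1 -n 2 -t 2:00:00 -A s2023.eeob.563.1

# Once the allocation is granted, check which node you landed on
hostname

# ... run short interactive analyses here ...

# Release the allocation when you are done
exit
```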

Preliminaries

  1. raxml-ng is already installed in the shared class directory (how do you check the location of a program? see the sketch after this list);
  2. I also installed the nw_display tool, which allows you to view phylogenetic trees on Nova;
  3. The data for this lab is in the course repository. Make sure to update it with git pull.
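A minimal sketch of these preliminary checks (the repository path is hypothetical; use your own checkout):

```bash
# Print the location of an executable found on your PATH
which raxml-ng
which nw_display

# Update your local copy of the course repository
cd ~/course-repo   # hypothetical path
git pull
```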

RAxML-NG tutorial

Complete the RAxML-NG tutorial by answering the questions in its “Now it’s your turn” section.
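For orientation, a basic RAxML-NG analysis looks something like the sketch below; the alignment file name is hypothetical, and the tutorial explains the options in detail:

```bash
# Sanity-check the alignment and evolutionary model before a full run
raxml-ng --check --msa alignment.fasta --model GTR+G

# ML tree search plus bootstrapping in a single command
raxml-ng --all --msa alignment.fasta --model GTR+G \
         --bs-trees 100 --threads 2 --seed 42

# View the best ML tree (by default named after the input alignment)
nw_display alignment.fasta.raxml.bestTree
```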

Using the Slurm Workload Manager

So far, we have used the Slurm Workload Manager in an interactive mode. For longer jobs, however, the batch mode is preferred. In this case a job script is created and submitted to the queue by issuing: sbatch <job_script_file>.

Slurm Job Script Generator

The easiest way to create a Slurm job script is to use the Slurm job script generator. Choose the class option under “Compute node type”, then set the number of compute nodes, the number of processor cores per node, and the maximum time the job may run. Consult the specifications for the class partition when choosing these numbers. After you are done with the settings, copy the job script from the gray area and paste it into a local file.
Add commands for loading modules and running your programs at the bottom of the script.
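The finished script might look something like this sketch (the partition, account, and file names follow the examples above and may differ for your setup):

```bash
#!/bin/bash
#SBATCH --nodes=1                    # number of compute nodes
#SBATCH --ntasks=2                   # number of processor cores
#SBATCH --time=2:00:00               # maximum run time (hh:mm:ss)
#SBATCH --partition=class-long       # class partition, as above
#SBATCH --account=s2023.eeob.563.1   # course allocation, as above
#SBATCH --job-name=raxml-lab4        # a label for the job

# Commands added at the bottom of the generated script:
# module load <module_name>          # if the program is provided as a module
raxml-ng --all --msa alignment.fasta --model GTR+G \
         --bs-trees 100 --threads 2 --seed 42
```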

Submitting your job

To submit the job script in the file myjob, use the sbatch myjob command.
You may submit several jobs in succession if they use different output files. Jobs will be scheduled into queues* based on the resources requested. Each queue limits the maximum number of simultaneous jobs and the maximum number of processors that one user or class may use.

*In Slurm, queues are called partitions. Only partitions for accelerator nodes need to be specified explicitly when submitting jobs; otherwise, Slurm will place the job in a partition based on the number of nodes and the time requested.

  • To see the list of available partitions, issue: sinfo
  • For more details on partition limits, issue: scontrol show partitions
  • To see the job queue, issue: squeue
  • To cancel job <job_id>, issue: scancel <job_id>
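Putting these pieces together, a typical batch workflow might look like this (assuming the script above was saved as myjob):

```bash
# Submit the script; Slurm prints the job ID it assigns
sbatch myjob

# Watch only your own jobs in the queue
squeue -u $USER

# Cancel a job by its ID if something went wrong
scancel <job_id>
```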


Content created by ISU-MolPhyl faculty at Iowa State University.
Except where otherwise noted, content on this site is licensed under a Creative Commons Attribution 4.0 International License.