Lecture 11: Odyssey!!!

Date: 10/05/2017, Thursday

You are expected to finish 8+1 tiny tasks (eight tasks plus one bonus task). They will help you prepare for the final project!

Related resources:

Task 1: Command line on your laptop

Preparation

Read the Session 4 notes, especially Ryan’s tutorial, if you didn’t come to Monday’s session.

After reading Chapters 1 to 5, you should know at least the following Linux commands:

  • ls
  • pwd
  • mkdir
  • cd
  • mv
  • rm and rm -rf
  • cp and cp -r

If you choose vi/vim as your text editor, read Chapter 6. Then you should know at least the following vim commands:

  • i
  • esc
  • :wq
  • :q!

Find your own tutorial if you choose other text editors.

Writing code in terminal

Task: Use vim or another command-line text editor to create a MATLAB file hello.m with the content “disp('hello world!')”

We use vim as an example.

First, create a text file by

vim hello.m

(If hello.m already exists, then it will just open that file)

Inside vim, type i to enter the Insert Mode.

Then type the code as usual. For example

disp('hello world!')

After writing the content, press Esc to go back to Command Mode.

Finally, type :wq to save and quit vim.

Again, read Chapter 6 for more vim usages!

Tip: You can check the content of hello.m with a graphical editor. On a Mac, open ./ opens the current directory in Finder, where you can open the hello.m you’ve just created. On Odyssey (see Task 2) there’s no graphical editor, so you will use vim to check file contents there as well.
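A quicker way to check a small file from the command line is cat, which prints the whole file to the terminal and also works on Odyssey. The sketch below also uses a here-document to write a small file without opening an editor. This is offered only as an alternative to vim; the task itself asks you to use an editor.

```shell
#!/bin/bash
# Create hello.m without an editor, using a here-document.
# Quoting 'EOF' keeps the shell from interpreting anything in the text.
cat > hello.m <<'EOF'
disp('hello world!')
EOF

# Print the file to check its content.
cat hello.m    # prints: disp('hello world!')
```

Note that cat > hello.m overwrites hello.m if it already exists.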

Running MATLAB interactively in terminal

Windows users can skip to Task 2, because I am not sure whether the following steps work on Windows.

Find the path of the MATLAB executable on your laptop. On a Mac it should look something like

/Applications/MATLAB_R2017a.app/bin/matlab

Running the above command opens the traditional graphical version of MATLAB.

To use only the command line, add three options:

/Applications/MATLAB_R2017a.app/bin/matlab -nojvm -nosplash -nodesktop

Play with this command line version of MATLAB for a while. Type exit to quit.

Set shortcut

If you are tired of typing this long command, you can define a shortcut (an alias):

alias matlab='/Applications/MATLAB_R2017a.app/bin/matlab'

Then you can simply type matlab to launch the program. However, this shortcut disappears when you close the terminal. To make it permanent, add the above line to ~/.bash_profile, a startup file in your home directory that bash reads when a new terminal opens. You can edit it with vim, for example:

vim ~/.bash_profile
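If you would rather not edit the file by hand, the following sketch appends the alias only when it is not already present, so running it twice does not create duplicate lines. The function name add_matlab_alias and the PROFILE override are made up for this example. The MATLAB path is the Mac example above, so adjust it to your installation.

```shell
#!/bin/bash
# Hypothetical helper: add the matlab alias to a startup file exactly once.
# PROFILE defaults to ~/.bash_profile; set it to another file to test safely.
add_matlab_alias() {
  local profile=${PROFILE:-$HOME/.bash_profile}
  local line="alias matlab='/Applications/MATLAB_R2017a.app/bin/matlab'"
  touch "$profile"                        # create the file if it is missing
  # -q: quiet, -x: match the whole line, -F: treat the pattern as plain text
  grep -qxF "$line" "$profile" || printf '%s\n' "$line" >> "$profile"
}
```

Run add_matlab_alias once, then run source ~/.bash_profile (or open a new terminal) so the current shell picks up the alias.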

Running MATLAB scripts in terminal

cd to the directory where you saved hello.m. You can run it by starting MATLAB and then typing the script name at the MATLAB prompt:

matlab -nojvm -nosplash -nodesktop
hello

Or you can use the -r option to run the script right after MATLAB starts:

matlab -nojvm -nosplash -nodesktop -r hello

If you didn’t set up the shortcut, the full command is

/Applications/MATLAB_R2017a.app/bin/matlab -nojvm -nosplash -nodesktop -r hello

(I actually prefer this command line version to the complicated graphic version!)
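With -r alone, MATLAB runs the script and then stays at its prompt until you type exit. A common idiom is to add exit to the -r command so that MATLAB quits by itself, which is handy when you rerun the same script many times. The sketch below wraps this in a shell function. The names runm and MATLAB_BIN are invented for this example, and the default path is the Mac example above.

```shell
#!/bin/bash
# Hypothetical helper: run NAME.m from the current directory in
# command-line MATLAB and quit when the script ends.
# MATLAB_BIN is an assumption: point it at your own MATLAB executable.
MATLAB_BIN=${MATLAB_BIN:-/Applications/MATLAB_R2017a.app/bin/matlab}

runm() {
  local name=${1%.m}                  # accept both "hello" and "hello.m"
  if [ ! -f "$name.m" ]; then
    echo "runm: $name.m not found in $(pwd)" >&2
    return 1
  fi
  # "; exit" makes MATLAB quit after the script instead of waiting at its prompt
  "$MATLAB_BIN" -nojvm -nosplash -nodesktop -r "$name; exit"
}
```

Usage: runm hello. If the script stops with an error, the exit is never reached and MATLAB stays open, so type exit yourself in that case.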

Task 2: Command line on Odyssey

Login

Log in to Odyssey with

ssh am111uXXXX@login.rc.fas.harvard.edu

Check the Odyssey website if you have any trouble.

Tip: You can open multiple terminals and log in to Odyssey from each of them, if one is not enough for you.

Basic navigation

Repeat the basic Linux commands, but this time on Odyssey, not on your laptop.

You should see that the Mac and Linux (Odyssey) commands are almost identical.

File transfer

Use scp

You can transfer files with the built-in scp (secure copy) command. Make sure you run this command on your laptop, not on Odyssey.

From your laptop to Odyssey (first find your Odyssey home directory path by running pwd on Odyssey):

scp local_file_path username@login.rc.fas.harvard.edu:/path_shown_by_pwd_on_Odyssey

Try transferring the hello.m you wrote in Task 1 to Odyssey! You will be asked to enter your password again.

Copying from Odyssey to your laptop just reverses the arguments:

scp username@login.rc.fas.harvard.edu:/file_path_on_odyssey local_file_path

Use scp -r to transfer a directory (similar to cp -r).
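If you leave the path after the colon empty, scp copies into your home directory on the remote machine, which saves you the pwd step. A relative remote path is likewise taken relative to that home directory. The sketch below wraps both directions in two small functions. The names to_odyssey, from_odyssey and ODYSSEY are hypothetical, and you would replace username with your own account name.

```shell
#!/bin/bash
# Hypothetical helpers around scp; run them on your laptop, not on Odyssey.
# Replace "username" with your own Odyssey account name.
ODYSSEY=${ODYSSEY:-username@login.rc.fas.harvard.edu}

# to_odyssey PATH...: copy local files or directories into your Odyssey home.
# The empty path after ":" means the remote home directory.
to_odyssey() {
  scp -r "$@" "$ODYSSEY:"
}

# from_odyssey REMOTE_PATH [LOCAL_DIR]: copy a file or directory from Odyssey
# (a relative REMOTE_PATH is taken relative to your Odyssey home) to LOCAL_DIR,
# which defaults to the current directory.
from_odyssey() {
  scp -r "$ODYSSEY:$1" "${2:-.}"
}
```

For example, to_odyssey hello.m uploads the file, and from_odyssey hello.m ~/Desktop copies it back. Each call asks for your password just like plain scp.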

Use other tools

Use FileZilla if you need to transfer a lot of files!

Task 3: MATLAB on Odyssey

Load MATLAB

Load MATLAB by

module load matlab

(If you get an error, run source new-modules.sh and try again.)

This loads the latest version by default. You can check which version you got with which:

[username]$ which matlab
alias matlab='matlab -singleCompThread'
/n/sw/matlab-R2017a/bin/matlab

Or you can load a specific version

module load matlab/R2017a-fasrc01

Use the RC portal to find available software and the corresponding loading command. Search for MATLAB. How many different versions do you see?

Run MATLAB

After loading MATLAB, you can run it the same way as on your laptop:

matlab -nojvm -nosplash -nodesktop

The 3 options are crucial because there’s no graphical user interface on Odyssey.

Play with it, and type exit to quit.

Run hello.m with matlab -nojvm -nosplash -nodesktop -r hello (from the directory where you copied it in Task 2).

Task 4: Interactive Job on Odyssey

After logging into Odyssey, you are on a login node with very few computational resources. For any serious computing work you need to switch to a compute node. The easiest way is to do this interactively (more about interactive mode):

srun -t 0-0:30 -c 4 -N 1 --pty -p interact /bin/bash

Here we request 30 minutes of computing time (-t 0-0:30) on 4 CPUs (-c 4), on a single computer (-N 1), using interactive mode (--pty and /bin/bash).

Warning: Don’t request too many CPUs! Larger requests make you wait much longer in the queue.

-p interact only means you are requesting CPUs on the interact partition; it does not make the session interactive (that is the job of --pty and /bin/bash). The following command starts an interactive session on the general partition (more about partitions):

srun -t 0-0:30 -c 4 -N 1 --pty -p general /bin/bash

Then repeat what you’ve done in Task 3.
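Since you will type the srun request often, you might wrap it in a shell function in your ~/.bash_profile on Odyssey. The sketch below is only one way to do this. The name interactive and its default values are assumptions, chosen to match the first srun example. It passes the time limit as a plain number, which SLURM's -t option reads as minutes.

```shell
#!/bin/bash
# Hypothetical helper: interactive [MINUTES] [CPUS] [PARTITION]
# Requests an interactive shell on one node; defaults match the example above.
interactive() {
  local minutes=${1:-30} cpus=${2:-4} partition=${3:-interact}
  # -t with a plain number is read by SLURM as minutes
  srun -t "$minutes" -c "$cpus" -N 1 --pty -p "$partition" /bin/bash
}
```

With no arguments, interactive sends the same 30-minute, 4-CPU request as the first srun command. interactive 60 2 general asks for one hour on 2 CPUs in the general partition.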

Task 5: Batch Job on Odyssey

If your job runs for hours or even days, you can submit it as a batch job, so you don’t need to keep your terminal open all the time. You can log out and go away while the job is running.

Create a file called runscript.sh with the following content (you can use vim to create it):

#!/bin/bash
#SBATCH -J Matlabjob1
#SBATCH -p general
#SBATCH -c 1 # single CPU
#SBATCH -t 00:05:00
#SBATCH --mem=400M # memory
#SBATCH -o %j.o # output filename
#SBATCH -e %j.e # error filename

## LOAD SOFTWARE ENV ##
source new-modules.sh
module purge
module load matlab/R2017a-fasrc01

## EXECUTE CODE ##
matlab -nojvm -nodisplay -nosplash -r hello

It essentially puts the srun options you used in Task 4 into #SBATCH lines of a text file, followed by the commands you ran in Task 3.

Make sure runscript.sh is in the same directory as hello.m, then execute

sbatch runscript.sh

Use sacct to check the job status. Once the job is finished, you should find the output files <jobid>.o and <jobid>.e, named by the -o and -e lines (%j stands for the job ID). (more about submitting and monitoring jobs)

Tip: Always test your code in interactive mode before submitting a batch job!
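Submitting and checking can be combined in a small function. This sketch relies on sbatch's --parsable option, which prints only the job ID (optionally followed by ";cluster"), and on sacct's --format option for choosing columns. The function name submit_and_check is made up.

```shell
#!/bin/bash
# Hypothetical helper: submit_and_check [SCRIPT]
# Submits a batch script and shows the status of the new job.
submit_and_check() {
  local script=${1:-runscript.sh} jobid
  # --parsable prints only "jobid" (or "jobid;cluster"), easy to capture
  jobid=$(sbatch --parsable "$script") || return 1
  jobid=${jobid%%;*}                       # drop the cluster name, if any
  echo "submitted job $jobid"
  sacct -j "$jobid" --format=JobID,JobName,State,Elapsed
}
```

Right after submission the job usually shows as PENDING or RUNNING. Run sacct -j <jobid> again later. Once the job is COMPLETED, the -o %j.o line in runscript.sh means the MATLAB output is in <jobid>.o.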

Task 6: Use MATLAB-parallel on your laptop

Make sure you’ve installed the Parallel Computing Toolbox. To start the command-line version for parallel mode, drop the -nojvm option, since parallel pools need the Java virtual machine. (The graphical version works as usual.)

matlab -nosplash -nodesktop

Initialize parallel mode by

parpool('local', 2)
Starting parallel pool (parpool) using the 'local' profile ...
connected to 2 workers.

ans =

 Pool with properties:

            Connected: true
           NumWorkers: 2
              Cluster: local
        AttachedFiles: {}
          IdleTimeout: 30 minutes (30 minutes remaining)
          SpmdEnabled: true

Then run this script several times to make sure you get a speed-up from the parallel for-loop (parfor):

n = 1e9;

X = 0;
tic
for i = 1:n
    X = X + 1;
end
T = toc;
fprintf('serial time: %f; result: %d \n',T,X)

X = 0;
tic
parfor i = 1:n
    X = X + 1;
end
T = toc;
fprintf('parallel time: %f; result: %d \n',T,X)
serial time: 2.724932; result: 1000000000
parallel time: 1.748450; result: 1000000000

Tip: In the command-line version of MATLAB, save the code as parallel_timing.m and then type parallel_timing at the MATLAB prompt.

Finally, shut down the parallel pool:

delete(gcp)

Task 7: Use MATLAB-parallel on Odyssey interactive mode

Repeat what you’ve done in Task 6, but on Odyssey. This might not be as straightforward as you expected!

You need to request enough memory for the parallel toolbox (here 4000 MB per CPU):

srun -t 0-0:30 -c 4 -N 1 --mem-per-cpu 4000 --pty -p interact /bin/bash

The environment variable SLURM_CPUS_PER_TASK tells you how many CPUs you have been allocated:

echo $SLURM_CPUS_PER_TASK
4
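SLURM sets SLURM_CPUS_PER_TASK only when -c (--cpus-per-task) was requested. It is therefore unset on a login node or in a job submitted without -c. In shell scripts you can fall back to a default with ${VAR:-default}. The sketch below assumes 1 CPU as the fallback, and the function name cpus_available is invented. The same caveat applies in MATLAB, where getenv returns an empty string for an unset variable.

```shell
#!/bin/bash
# Hypothetical helper: print the number of CPUs SLURM allocated to this task,
# or 1 when SLURM_CPUS_PER_TASK is unset or empty (e.g. on a login node).
cpus_available() {
  echo "${SLURM_CPUS_PER_TASK:-1}"
}

echo "this shell may use $(cpus_available) CPU(s)"
```

Inside the session requested above with -c 4, this prints 4 CPU(s). On the login node it prints 1.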

For parallel support, you need to call matlab-default instead of matlab to launch the program, as described here.

module load matlab
matlab-default -nosplash -nodesktop

Inside MATLAB, you can again check the number of CPUs by

getenv('SLURM_CPUS_PER_TASK')
ans = '4'

Initialize parallel mode with the following line, which works for any number of allocated CPUs:

parpool('local', str2num(getenv('SLURM_CPUS_PER_TASK')) )

The initialization might take several minutes on Odyssey. Eventually you should see something like

ans =

 Pool with properties:

            Connected: true
           NumWorkers: 4
              Cluster: local
        AttachedFiles: {}
          IdleTimeout: 30 minutes (30 minutes remaining)
          SpmdEnabled: true

Then execute the parallel_timing.m script from Task 6. You should see a speed-up like this:

>> parallel_timing
serial time: 12.228084; result: 1000000000
parallel time: 2.667366; result: 1000000000

Task 8: MATLAB-parallel as batch Job

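Slightly modify the script parallel_timing.m from Task 6 and call it parallel_timing_batch.m. This version starts the parallel pool itself, runs the parfor loop twice, and shuts the pool down at the end: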

parpool('local', str2num(getenv('SLURM_CPUS_PER_TASK')))

n = 1e9;

X = 0;
tic
for i = 1:n
    X = X + 1;
end
T = toc;
fprintf('serial time: %f; result: %d \n',T,X)

X = 0;
tic
parfor i = 1:n
    X = X + 1;
end
T = toc;
fprintf('parallel time: %f; result: %d \n',T,X)

X = 0;
tic
parfor i = 1:n
    X = X + 1;
end
T = toc;
fprintf('parallel time: %f; result: %d \n',T,X)

delete(gcp)

Then modify runscript.sh from Task 5 accordingly:

#!/bin/bash
#SBATCH -J timing
#SBATCH -o timing.out
#SBATCH -e timing.err
#SBATCH -N 1
#SBATCH -c 4
#SBATCH -t 0-00:20
#SBATCH -p general
#SBATCH --mem-per-cpu 8000

source new-modules.sh
module load matlab
srun -n 1 -c 4 matlab-default -nosplash -nodesktop -r parallel_timing_batch

Submit this job. It will take many minutes to finish. Do you get expected speed-up?

In timing.out, you should see something like

ans =

 Pool with properties:

            Connected: true
           NumWorkers: 4
              Cluster: local
        AttachedFiles: {}
          IdleTimeout: 30 minutes (30 minutes remaining)
          SpmdEnabled: true

serial time: 7.635188; result: 1000000000
parallel time: 5.901599; result: 1000000000
parallel time: 3.516169; result: 1000000000
Parallel pool using the 'local' profile is shutting down.
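To turn these timings into speed-up factors (serial time divided by parallel time), a short awk script is enough. The sketch assumes the log lines look exactly like the output above, and the function name speedup is made up.

```shell
#!/bin/bash
# Hypothetical helper: speedup [LOGFILE]
# Prints serial_time / parallel_time for every "parallel time:" line
# in the log (default: timing.out), using the preceding "serial time:" line.
speedup() {
  # Split fields on ":" or ";" so that $2 is the number of seconds.
  awk -F'[:;]' '
    /^serial time:/   { serial = $2 }
    /^parallel time:/ { n++; printf "parfor #%d: %.2fx speed-up\n", n, serial / $2 }
  ' "${1:-timing.out}"
}
```

For the sample output above, speedup timing.out reports 1.29x for the first parfor and 2.17x for the second.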

Explain why the second parfor is faster than the first one.

Tip: Using a batch job for such a small computation is definitely overkill, as queuing and initialization take much longer than the actual computation. You will probably use interactive mode much more often in this class.

Bonus task: make your terminal prettier

Open ~/.bash_profile (for example with vim ~/.bash_profile) and add the following lines:

For Mac

export CLICOLOR=1
export LSCOLORS=ExFxBxDxCxegedabagacad

For Linux (Odyssey)

alias ls="ls --color=auto"

Type source ~/.bash_profile or relaunch the terminal. Notice any difference?
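If you keep one ~/.bash_profile for both your Mac and Odyssey (for example, copying it over with scp), the file can choose the right setting by itself. It checks uname -s, which prints Darwin on macOS and Linux on Odyssey. A sketch combining the two settings above:

```shell
#!/bin/bash
# Choose how to color ls output depending on the operating system.
if [ "$(uname -s)" = "Darwin" ]; then
  # macOS (BSD ls): colors are switched on through environment variables
  export CLICOLOR=1
  export LSCOLORS=ExFxBxDxCxegedabagacad
else
  # Linux (GNU ls): ask ls to color its output when writing to a terminal
  alias ls='ls --color=auto'
fi
```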