Parallel Command Execution
Virtual Cluster Administrators may take advantage of ClusterShell to streamline the parallel execution of tasks. ClusterShell is pre-configured to integrate with the workload management system of the cluster through Slurm Group Bindings, which provide node selection logic based on partitions, node states, jobs, and user associations.
The configured SLURM groups are:
- partitions: @sp or @slurmpart
- state: @st or @slurmstate
- user: @su or @slurmuser
- job: @sj or @slurmjob
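As a quick sanity check, these bindings can be exercised directly. A minimal sketch, assuming the group sources are named as in the list above and that a "compute" partition exists on this cluster:
# list the group names provided by the Slurm partition source
cluset -l -s slurmpart
# resolve one of those groups into a folded node set
cluset -f @sp:compute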
Node and Group Name Operations
The cluset (or nodeset) command may be used for counting nodes, listing, expanding and folding (compacting) node names and groups, and regrouping nodes. cluset provides a consistent logic for manipulating node names and is ideal for shell scripts or command pipelines when those node names are produced as outputs (e.g. sinfo -h) or consumed as inputs (e.g. a bash "for" loop).
Arithmetic operations such as union, intersection, and exclusive or (xor) are described in the package documentation, along with examples of slicing, splitting, and random selection of nodes. Extended patterns and wildcard matching are also supported. The following simple examples demonstrate how this tool may be of advantage during interactive troubleshooting or preliminary investigations:
- Count the number of nodes in state "drng":
  cluset -c @st:drng
- Generate a compact listing of nodes in use by adia:
  cluset -f @su:adia
- Expand a list of all nodes in the "compute-gpu-1" partition:
  cluset -e @sp:compute-gpu-1
- List nodes allocated to a specific job:
  cluset -e @sj:123123
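The set operations mentioned above combine with the same group bindings. A brief sketch; the partition and state group names used here are assumptions about this cluster:
# nodes in the compute partition that are not currently drained (difference)
cluset -f @sp:compute -x @st:drained
# nodes in the compute partition that are allocated to job 123123 (intersection)
cluset -f @sp:compute -i @sj:123123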
cluset Use Case
When troubleshooting a workflow, it is typical to use the Job Server command-line tool to summarize the disposition of jobs and subjobs. It can be useful to run a query that lists jobs that were stopped, e.g.:
# list jobs and parse out the job host(s)
jsc list --status STOPPED --json |
jq -r '.jobHost | split(".") | .[0] | select( . != null)' \
>/tmp/stopped-hosts
More than a single job can fail on a given host, and in those cases, cluset --fold may be used to convert the list with duplicated host entries into a set, in shorthand notation, where each host is represented only once. Expanding this back out into a newline-separated list allows the list to be used for other purposes, such as parsing out the syslog data for each host, which may be useful in performing a more structured analysis.
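For illustration, the fold/expand round trip looks like this interactively (the host names here are hypothetical):
# fold a newline-separated host list (with duplicates) into a compact set
printf 'research-3\nresearch-1\nresearch-3\n' | cluset -f
# -> research-[1,3]
# expand the folded set back into one host per line
cluset -e -S '\n' research-[1,3]
# -> research-1
#    research-3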
Putting this together, the following pipeline determines which hosts experienced preemption, and how many times each host was preempted:
</tmp/stopped-hosts \
cluset -f |
cluset -e -S '\n' |
xargs \
--max-args=1 \
--max-procs=6 \
--replace=% \
awk '$4 == "research-%" { print >> "/tmp/%.messages" }' /var/log/messages
# list the nodes that were preempted
sudo grep -l "Power key pressed" /tmp/*.messages
# see how often this seems to have happened over the period of time sampled in the logs
sudo grep -c "Power key pressed" /tmp/*.messages
Parallel Command Execution and Output Logging
The clush tool provides an environment for running commands in parallel and optionally gathering output after all commands have completed. Output gathering (using the -b option) implicitly requires that the commands to be run are non-interactive in nature. Nodes or groups are selected using the -w option and providing a list of nodes to target using the group syntax described for the cluset command.
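For example, a one-liner along these lines gathers identical output across a partition group (the partition name is an assumption about this cluster):
# run a command on every node in the compute partition and gather identical output
clush -bw @sp:compute uname -r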
There are two primary workers discussed here, as they have the most relevance to the standard Virtual Cluster deployment: exec and ssh (remote, the default). To explicitly set the worker, use the --worker or -R option.
The exec worker is useful in batch processing, system maintenance tasks, and for local resource monitoring, particularly when the tasks can be distributed over multiple processors or cores on the local machine. This cuts down on the overall network communication required to accomplish these operations. The ssh worker is suited for independent tasks that need to run on certain hosts, such as updating a configuration, performing system updates, and gathering system logs or metrics. Those interested in the rsh or pdsh workers are referred to the official package documentation.
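A minimal sketch of selecting the worker explicitly; the node names are hypothetical, and the exec worker is assumed to substitute the target node name for the %host placeholder in the locally run command:
# ssh worker (the default): run the command remotely on each node
clush -bw node[01-04] uptime
# exec worker: run a local command once per selected node
clush -R exec -bw node[01-04] ping -c1 %host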
clush Use Cases
Performing installation and maintenance tasks on running nodes, particularly in the case of static infrastructure (i.e. no dynamic autoscaling), is where clush really shines. Authorized users may perform tasks that require superuser privileges by specifying the -m sudo mode option.
The installation of Quantum ESPRESSO on such a cluster might look something like the following:
# download QE as admin, in a shared home directory
curl -L -O https://github.com/QEF/q-e_schrodinger/releases/download/v7.2-2024-1/qe-bin.tar.gz
# install QE on compute partition
# create the destination directory
clush -m sudo -bw @sp:compute mkdir -p \
/opt/schrodinger/suite/installations/2024-1/qe-bin
# extract installation files
clush -m sudo -bw @sp:compute tar \
--directory=/opt/schrodinger/suite/installations/2024-1/qe-bin \
--no-same-owner --no-same-permissions --extract \
--file /home/admin/qe-bin.tar.gz
In another scenario, users have requested that a large vendor library be made available on the cluster. Due to the considerable amount of data, it makes sense to put this on its own volume. A disk containing the data is attached to the frontend node, and the filesystem is exported to the rest of the cluster:
# create the mountpoint and attach the disk to the frontend
mkdir -p /opt/vendorlibrary
mount -o discard,defaults /dev/sde /opt/vendorlibrary
# add the following to /etc/exports
/opt/vendorlibrary *(rw,no_root_squash)
# export the filesystem
exportfs -a -r
The static filesystem information file (/etc/fstab) may be updated on the compute template(s); however, current cluster users will want access to this data now, and draining the partition(s) in order to use the updated template may not be practical. In this scenario, clush allows an administrator to cleanly create the desired resources without draining the partition:
# create the mountpoint on the desired partition
clush -m sudo -bw @sp:driver mkdir -p /opt/vendorlibrary
# mount the shared volume
clush -m sudo -bw @sp:driver mount -t nfs frontend:/opt/vendorlibrary /opt/vendorlibrary
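To make the mount persistent, the template's /etc/fstab (mentioned above) would carry an entry along these lines; the hostname and mount options shown are assumptions:
# /etc/fstab entry for the shared vendor library volume
frontend:/opt/vendorlibrary  /opt/vendorlibrary  nfs  defaults,_netdev  0  0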