Parallel Command Execution

Virtual Cluster Administrators may take advantage of ClusterShell to streamline the parallel execution of tasks. ClusterShell is pre-configured to integrate with the cluster's workload management system through Slurm group bindings, which provide node selection logic based on partitions, node states, jobs, and user associations.

The configured Slurm groups are:

  • partitions: @sp or @slurmpart
  • state: @st or @slurmstate
  • user: @su or @slurmuser
  • job: @sj or @slurmjob
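
To see what each binding provides interactively, the group sources and their groups can be queried directly; the source names below follow the list above, and the groups returned will depend on the cluster's Slurm configuration:

# list the configured group sources
cluset --groupsources
# list the groups (partitions) provided by the sp source
cluset -l -s sp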

Node and Group Name Operations

The cluset (or nodeset) command may be used for counting nodes, for listing, expanding, and folding (compacting) node names and groups, and for regrouping nodes. cluset provides consistent logic for manipulating node names and is ideal for shell scripts or command pipelines in which node names are produced as output (e.g. by sinfo -h) or consumed as input (e.g. by a bash "for" loop). Arithmetic operations such as union, intersection, and xor (symmetric difference) are described in the package documentation, along with examples of slicing, splitting, and random selection of nodes; a brief sketch of these operations follows the examples below. Extended patterns and wildcard matching are also supported. The following simple examples demonstrate how this tool can help during interactive troubleshooting or preliminary investigation:

  1. Count the number of nodes in state "drng"

    cluset -c @st:drng
    
  2. Generate a compact listing of nodes in use by adia

    cluset -f @su:adia
    
  3. Expand a list of all nodes in the "compute-gpu-1" partition

    cluset -e @sp:compute-gpu-1
    
  4. List nodes allocated to a specific job

    cluset -e @sj:123123
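
As a brief sketch of the arithmetic operations mentioned above (the node names are hypothetical; any of the group expressions from the examples above also work as operands):

# union: nodesets passed as arguments are merged and folded
cluset -f node[1-3] node[3-5]       # -> node[1-5]

# intersection (-i / --intersection)
cluset -f node[1-5] -i node[4-8]    # -> node[4-5]

# xor / symmetric difference (-X / --xor)
cluset -f node[1-5] -X node[4-8]    # -> node[1-3,6-8]

# exclusion (-x / --exclude)
cluset -f node[1-8] -x node[4-5]    # -> node[1-3,6-8]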
    

cluset Use Case

While troubleshooting a workflow, it is typical to use the Job Server command line tool to summarize the disposition of jobs and subjobs. It can be useful, for example, to query for jobs that were stopped:

# list jobs and parse out the job host(s)
jsc list --status STOPPED --json |
    jq -r 'select(.jobHost != null) | .jobHost | split(".") | .[0]' \
    >/tmp/stopped-hosts

More than a single job can fail on a given host. In those cases, cluset --fold can convert a list containing duplicate host entries into a set, in shorthand notation, where each host appears only once. Expanding that set back out into a newline-separated list makes it usable for other purposes, such as pulling the syslog data for each host into its own file, which supports a more structured analysis.
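
Concretely, with a couple of hypothetical duplicated hosts:

# duplicated hosts on stdin fold into a set where each host appears once
printf 'node1\nnode3\nnode1\nnode3\n' | cluset -f     # -> node[1,3]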

In this example, the pipeline below is used to determine which hosts experienced preemption, and how many times each was preempted:

# split the frontend syslog into one message file per stopped host
</tmp/stopped-hosts \
    cluset -f |
    cluset -e -S '\n' |
    xargs \
    --max-procs=6 \
    --replace=% \
    sudo awk '$4 == "research-%" { print >> "/tmp/%.messages" }' /var/log/messages

# list the nodes that were preempted
sudo grep -l "Power key pressed" /tmp/*.messages

# count how often this happened per node over the period sampled in the logs
sudo grep -c "Power key pressed" /tmp/*.messages

Parallel Command Execution and Output Logging

The clush tool provides an environment for running commands in parallel and, optionally, gathering output after all commands have completed. Output gathering (the -b option) implicitly requires that the commands being run are non-interactive. Nodes or groups are selected with the -w option, which accepts an explicit list of target nodes or the group syntax described above for the cluset command.
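
For example, a quick consistency check across a partition (the @sp:compute group from the installation example below is assumed here); with -b, nodes that produce identical output are folded together in the report:

# gather output after completion; identical responses are folded into one block
clush -bw @sp:compute uname -r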

There are two primary workers discussed here, as they are the most relevant to a standard Virtual Cluster deployment: exec and ssh (the remote default). To explicitly set the worker, use the --worker or -R option. The exec worker is useful for batch processing, system maintenance tasks, and local resource monitoring, particularly when the work can be distributed over multiple processors or cores on the local machine; this cuts down on the overall network communication required to accomplish these operations. The ssh worker is suited to independent tasks that need to run on particular hosts, such as updating a configuration, performing system updates, and gathering system logs or metrics. Those interested in the rsh or pdsh workers are referred to the official package documentation.
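
A minimal sketch of the exec worker, assuming a hypothetical node[01-16] range: the command runs locally once per target node, with %host replaced by each node name, so no ssh connections are opened:

# local reachability check, at most six ping processes at a time
clush -R exec -f 6 -bw node[01-16] ping -c 1 -w 1 %host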

clush Use Cases

Performing installation and maintenance tasks on running nodes, particularly on a static infrastructure (i.e. no dynamic autoscaling), is where clush really shines. Authorized users may perform tasks that require superuser privileges by specifying the -m sudo mode option.

The installation of Quantum ESPRESSO on such a cluster might look something like the following:

# download QE as admin, in a shared home directory
curl -L -O https://github.com/QEF/q-e_schrodinger/releases/download/v7.2-2024-1/qe-bin.tar.gz

# install QE on compute partition
# create the destination directory
clush -m sudo -bw @sp:compute mkdir -p \
    /opt/schrodinger/suite/installations/2024-1/qe-bin
# extract installation files
clush -m sudo -bw @sp:compute tar \
    --directory=/opt/schrodinger/suite/installations/2024-1/qe-bin \
    --no-same-owner --no-same-permissions --extract \
    --file /home/admin/qe-bin.tar.gz
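
A spot check (an assumed follow-up, not part of the vendor procedure) can confirm the extraction produced the same content everywhere; with -b, matching nodes fold into a single block of gathered output:

# identical directory sizes across the partition fold into one line
clush -m sudo -bw @sp:compute du -s /opt/schrodinger/suite/installations/2024-1/qe-bin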

In another scenario, users have requested that a large vendor library be made available on the cluster. Due to the considerable amount of data, it makes sense to put it on its own volume. A disk containing the data is attached to the frontend node and exported to the rest of the cluster:

# create the mountpoint and attach the disk to the frontend
mkdir -p /opt/vendorlibrary
mount -o discard,defaults /dev/sde /opt/vendorlibrary

# add the following to /etc/exports
/opt/vendorlibrary *(rw,no_root_squash)

# export the filesystem
exportfs -a -r

The static filesystem information file (/etc/fstab) may be updated on the compute template(s) so that newly provisioned nodes pick up the mount. Current cluster users, however, will want access to this data now, and draining the partition(s) in order to roll out the updated template may not be practical. In this scenario, clush allows an administrator to cleanly create the desired resources without draining the partition:

# create the mountpoint on the desired partition
clush -m sudo -bw @sp:driver mkdir -p /opt/vendorlibrary

# mount the shared volume
clush -m sudo -bw @sp:driver mount -t nfs frontend:/opt/vendorlibrary /opt/vendorlibrary
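
To confirm the result, the gathered-output mode can fold identical mount information from every node (a suggested check, not part of the original procedure):

# verify the NFS mount is present on every node in the partition
clush -bw @sp:driver df -h /opt/vendorlibrary

For the compute template(s) themselves, the corresponding /etc/fstab entry might look like the following; the mount options shown are assumptions and should be adjusted to site policy:

# hypothetical /etc/fstab entry for the compute template(s)
frontend:/opt/vendorlibrary  /opt/vendorlibrary  nfs  defaults,_netdev  0 0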