3. Performance Testing

3.1 nccl-test (single node)

  • Each node runs nccl-test on its own (standalone). Each node's log is returned to that node's log interface in the monitoring dashboard.
  • Replace $manager_ip with the actual IP address.
shell
pdsh -l root -R ssh -w ^hosts.txt "/podsys/scripts/run_nccl_test_single.sh $manager_ip"
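Because the remote command is wrapped in double quotes, the local shell expands $manager_ip before pdsh sends the command. If the variable is unset, the remote script receives no argument. The following is a minimal sketch of a dry-run check, not part of PODsys: the helper name build_pdsh_cmd is hypothetical, and 192.0.2.10 is a placeholder address from the documentation range. It only prints the command line so you can inspect it before running it.

```shell
# Hypothetical helper (not shipped with PODsys): prints the pdsh command
# line for a given script path and manager IP, and refuses an empty IP.
build_pdsh_cmd() {
    script="$1"
    ip="$2"
    if [ -z "$ip" ]; then
        echo "manager_ip is empty; set it before running pdsh" >&2
        return 1
    fi
    printf 'pdsh -l root -R ssh -w ^hosts.txt "%s %s"\n' "$script" "$ip"
}

# Placeholder address; replace with your actual IP address.
manager_ip="192.0.2.10"
build_pdsh_cmd /podsys/scripts/run_nccl_test_single.sh "$manager_ip"
```

The same helper can be pointed at /podsys/scripts/run_nvbandwidth.sh to preview the nvbandwidth command.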

3.2 nvbandwidth (single node)

shell
pdsh -l root -R ssh -w ^hosts.txt "/podsys/scripts/run_nvbandwidth.sh $manager_ip"

3.3 nccl-test (Rack-level)

After completing the rack-level installation, configure passwordless SSH login between the root users of all nodes. Refer to section 4.1.3, "Configure passwordless SSH between root users on all nodes".

  1. From the management node, SSH into one of the compute nodes, for example, node01.
  2. Prepare the hosts.txt file. You may use the hosts.txt generated on the management node.
  3. Configure IMEX using hosts.txt.
shell
pdsh -l root -R ssh -w ^hosts.txt "systemctl restart nvidia-imex.service"
  4. SSH into node01.
  5. Switch to the root user on node01:
shell
sudo su
  • Export the library path
shell
export LD_LIBRARY_PATH=/podsys/build/ompi418/lib:/usr/local/cuda/lib64:/podsys/build/nccl_2.28.9-1+cuda13.0_aarch64/lib
  • Run nccl-test (all_reduce, 72 GPUs)
shell
/podsys/build/ompi418/bin/mpirun --allow-run-as-root -np 72 -N 4 \
-hostfile /podsys/hosts.txt \
-x NCCL_DEBUG=WARN \
-x NCCL_NVLS_ENABLE=1 \
-x NCCL_SHM_DISABLE=1 \
-x UCX_NET_DEVICES=enP5p9s0 \
-x LD_LIBRARY_PATH /podsys/build/nccl-tests/build/all_reduce_perf \
-b 8 -e 32G -f 2 -g 1
  • Run nccl-test (alltoall, 72 GPUs)
shell
/podsys/build/ompi418/bin/mpirun --allow-run-as-root -np 72 -N 4 \
-hostfile /podsys/hosts.txt \
-x NCCL_DEBUG=WARN \
-x NCCL_NVLS_ENABLE=1 \
-x NCCL_SHM_DISABLE=1 \
-x UCX_NET_DEVICES=enP5p9s0 \
-x LD_LIBRARY_PATH /podsys/build/nccl-tests/build/alltoall_perf \
-b 8 -e 32G -f 2 -g 1
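
Both mpirun commands request 72 ranks in total (-np 72) with 4 ranks per node (-N 4), which works out to 72 / 4 = 18 nodes in the hostfile. The sketch below checks that arithmetic against a hostfile before launching. It is an illustration under stated assumptions, not a PODsys tool: the function names count_hosts and check_ranks are hypothetical, and it assumes one host per line, with blank lines and lines starting with # ignored.

```shell
# Hypothetical helper: counts hostfile lines that are neither blank
# nor comments (assumes one host per line).
count_hosts() {
    grep -Ev '^[[:space:]]*(#|$)' "$1" | wc -l | tr -d ' '
}

# Hypothetical helper: verifies that hosts x ranks-per-node equals the
# total rank count passed to mpirun via -np.
check_ranks() {
    hostfile="$1"
    per_node="$2"
    total="$3"
    hosts=$(count_hosts "$hostfile")
    if [ $((hosts * per_node)) -eq "$total" ]; then
        echo "ok: $hosts hosts x $per_node = $total ranks"
    else
        echo "mismatch: $hosts hosts x $per_node != $total ranks" >&2
        return 1
    fi
}

# Example call for the commands above (left commented out so the block
# runs on machines without /podsys/hosts.txt):
# check_ranks /podsys/hosts.txt 4 72
```

If the check reports a mismatch, adjust hosts.txt or the -np value so that the two agree before running the test.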

Copyright © 2025 The PODsys Project. All rights reserved.