
5. System Modifications Explained

Important Notice

When deploying clusters, PODsys makes several modifications to the default system configuration. All of these changes are documented in this section.

Note

Unless specifically stated otherwise, the modifications apply consistently to both compute nodes and management nodes.

5.1 Modifications During System Installation

  • Uninstall the unattended-upgrades package to prevent the system from applying APT updates automatically.

  • Install Docker, NFS, ipmitool, cpupower, NVIDIA driver dependencies, and other required components.

  • Install the InfiniBand (IB) drivers, set up the openibd service to start automatically, and enable the opensmd service on the management node.

  • Disable the nouveau driver and install the NVIDIA drivers.

  • Set up a systemd service, load-nvidia-peermem, that loads the nvidia_peermem kernel module at boot:

txt
[Unit]
Description=Load nvidia_peermem Module
After=network.target

[Service]
ExecStart=/sbin/modprobe nvidia_peermem

[Install]
WantedBy=multi-user.target

  • Set up a systemd service that enables GPU persistence mode at boot by running nvidia-smi -pm 1:
txt
[Unit]
Description=Enable nvidia-persistenced

[Service]
Type=oneshot
ExecStart=/usr/bin/nvidia-smi -pm 1
RemainAfterExit=yes

[Install]
WantedBy=default.target

  • Install CUDA and add its bin directory to the PATH environment variable.
  • Add the CUDA lib64 directory to the LD_LIBRARY_PATH environment variable:
txt
export PATH=$PATH:/usr/local/cuda/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64

  • Add the following entries to /etc/security/limits.conf:
txt
* soft memlock unlimited
* hard memlock unlimited
root soft nofile 65536
root hard nofile 65536
* soft nofile 65536
* hard nofile 65536
* soft stack unlimited
* soft nproc unlimited
* hard stack unlimited
* hard nproc unlimited

Explanation of New Entries in limits.conf

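These entries make the following changes:

  • They remove the limit on locked memory. RDMA memory registration over InfiniBand pins memory against this limit.
  • They raise the soft and hard limits on open file descriptors to 65536.
  • They remove the limits on stack size and on the number of processes per user.

limits.conf does not apply wildcard (*) entries to the root user. For that reason, the nofile limits for root are set explicitly, and the memlock, stack, and nproc entries apply only to non-root users. pam_limits applies the new limits when a new session starts.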
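The installation steps above remove unattended-upgrades and install Docker, ipmitool, cpupower, and the NVIDIA driver. A quick check along the following lines can confirm those package changes. This is a minimal sketch and is not part of PODsys. It assumes an Ubuntu/Debian host where dpkg-query is available, and the command list in EXPECTED_COMMANDS and the helper names are hypothetical.

```python
"""Sketch: check package changes made during a PODsys installation.

Assumptions (not part of PODsys): an Ubuntu/Debian host with dpkg-query,
and the hypothetical command list in EXPECTED_COMMANDS.
"""
import shutil
import subprocess

# Hypothetical list of commands expected on PATH after installation.
EXPECTED_COMMANDS = ["docker", "ipmitool", "cpupower", "nvidia-smi"]


def dpkg_status_is_installed(status: str) -> bool:
    """True if a dpkg status string such as 'install ok installed'
    reports the package as fully installed."""
    words = status.split()
    return len(words) == 3 and words[2] == "installed"


def package_installed(name: str) -> bool:
    """Ask dpkg-query for the package status; unknown packages
    (non-zero exit code) count as not installed."""
    result = subprocess.run(
        ["dpkg-query", "-W", "--showformat=${Status}", name],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0 and dpkg_status_is_installed(result.stdout)


def missing_commands(commands):
    """Return the commands that cannot be found on PATH."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]


if __name__ == "__main__":
    print("unattended-upgrades installed:", package_installed("unattended-upgrades"))
    print("missing commands:", missing_commands(EXPECTED_COMMANDS) or "none")
```

A package that was removed but still has configuration files reports "deinstall ok config-files". The sketch treats that status as not installed.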
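The following can be spot-checked after a reboot:

  • the nouveau driver is disabled
  • the load-nvidia-peermem service has loaded nvidia_peermem
  • the persistence-mode service has enabled persistence mode

The following sketch assumes a Linux host with the NVIDIA driver installed. It reads /proc/modules, where the first field of each line is the module name, and queries nvidia-smi with --query-gpu=persistence_mode. The helper names module_loaded and persistence_flags are illustrative only.

```python
"""Sketch: verify the GPU-related changes (nouveau disabled, nvidia_peermem
loaded, persistence mode on). Assumes Linux with the NVIDIA driver installed.
"""
import subprocess


def module_loaded(proc_modules: str, name: str) -> bool:
    """True if `name` is the first field of any line in /proc/modules text."""
    return any(
        line.split()[0] == name
        for line in proc_modules.splitlines()
        if line.strip()
    )


def persistence_flags(smi_output: str) -> list:
    """Parse `nvidia-smi --query-gpu=persistence_mode --format=csv,noheader`
    output (one 'Enabled' or 'Disabled' per GPU) into booleans."""
    return [
        line.strip() == "Enabled"
        for line in smi_output.splitlines()
        if line.strip()
    ]


if __name__ == "__main__":
    with open("/proc/modules") as f:
        modules = f.read()
    print("nouveau loaded:", module_loaded(modules, "nouveau"))
    print("nvidia_peermem loaded:", module_loaded(modules, "nvidia_peermem"))
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=persistence_mode", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    flags = persistence_flags(out)
    print("persistence mode on all GPUs:", bool(flags) and all(flags))
```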
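The CUDA export lines above have two caveats:

  • When LD_LIBRARY_PATH is unset, $LD_LIBRARY_PATH:/usr/local/cuda/lib64 expands to a value with a leading colon. The dynamic loader treats that empty element as the current directory.
  • Sourcing the lines more than once appends duplicate entries.

The sketch below shows an idempotent append and a check of the current environment. append_path_entry and missing_cuda_entries are hypothetical helpers, and the directories are the ones used above.

```python
"""Sketch: idempotent PATH-style append and a check for the CUDA directories.
The helpers are hypothetical; the directories are the ones exported above.
"""
import os

CUDA_ENTRIES = {
    "PATH": "/usr/local/cuda/bin",
    "LD_LIBRARY_PATH": "/usr/local/cuda/lib64",
}


def append_path_entry(value: str, entry: str) -> str:
    """Append `entry` to a colon-separated list unless already present.
    Empty elements are dropped, so an unset variable yields just `entry`
    instead of a value with a leading ':'."""
    parts = [p for p in value.split(":") if p]
    if entry not in parts:
        parts.append(entry)
    return ":".join(parts)


def missing_cuda_entries(env) -> list:
    """Return the variables whose value lacks the expected CUDA directory."""
    return [
        var
        for var, directory in CUDA_ENTRIES.items()
        if directory not in env.get(var, "").split(":")
    ]


if __name__ == "__main__":
    print("missing CUDA entries:", missing_cuda_entries(os.environ) or "none")
    print("LD_LIBRARY_PATH would become:",
          append_path_entry(os.environ.get("LD_LIBRARY_PATH", ""),
                            CUDA_ENTRIES["LD_LIBRARY_PATH"]))
```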
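Python's standard resource module can check the limits.conf values from a new login shell, as sketched below. EXPECTED mirrors the entries above, and the helper names are hypothetical. There are two caveats:

  • pam_limits applies limits.conf only to new PAM sessions, so services started by systemd do not pick these values up. systemd units use settings such as LimitMEMLOCK= instead.
  • When the sketch runs as root, only the nofile values are expected to match, because wildcard entries are not applied to root.

```python
"""Sketch: compare the current process limits with the limits.conf entries.
Run from a new login shell; EXPECTED mirrors the entries above.
"""
import resource

INF = resource.RLIM_INFINITY

# (soft, hard) values expected for a non-root user after the change.
EXPECTED = {
    "RLIMIT_MEMLOCK": (INF, INF),
    "RLIMIT_NOFILE": (65536, 65536),
    "RLIMIT_STACK": (INF, INF),
    "RLIMIT_NPROC": (INF, INF),
}


def describe(limit: int) -> str:
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if limit == INF else str(limit)


def mismatched_limits() -> dict:
    """Return {name: (soft, hard)} for limits that differ from EXPECTED."""
    mismatches = {}
    for name, expected in EXPECTED.items():
        actual = resource.getrlimit(getattr(resource, name))
        if actual != expected:
            mismatches[name] = actual
    return mismatches


if __name__ == "__main__":
    bad = mismatched_limits()
    for name, (soft, hard) in bad.items():
        print(f"{name}: soft={describe(soft)} hard={describe(hard)}")
    print("all limits match" if not bad else f"{len(bad)} limit(s) differ")
```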

  • Write the PODsys version information to /etc/podsys-release.
  • Append the iommu.passthrough=1 kernel parameter to the GRUB kernel command line:
txt
GRUB_CMDLINE_LINUX="$GRUB_CMDLINE_LINUX iommu.passthrough=1"
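The GRUB change takes effect only after the GRUB configuration is regenerated and the node is rebooted. On Ubuntu, update-grub regenerates the configuration. The sketch below reads the running kernel's command line from /proc/cmdline to confirm the parameter is present. kernel_param is a hypothetical helper. It splits on whitespace, so it does not handle quoted values containing spaces.

```python
"""Sketch: check that iommu.passthrough=1 reached the running kernel."""


def kernel_param(cmdline: str, name: str):
    """Return the value of `name` in a kernel command line: the text after
    '=' for name=value, '' for a bare flag, or None if absent. If the
    parameter appears more than once, the last occurrence is returned."""
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == name:
            value = val if sep else ""
    return value


if __name__ == "__main__":
    with open("/proc/cmdline") as f:
        current = kernel_param(f.read(), "iommu.passthrough")
    print("iommu.passthrough =", current)
    print("passthrough enabled:", current == "1")
```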

5.2 Modifications During the Parallel Service Configuration Process

  • Configure the NFS service to export the shared directory with the following options (exports(5) format):
txt
[share directory]  *(rw,async,insecure,no_root_squash)
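For reference, exports(5) defines the options above as follows:

  • rw allows writes.
  • async lets the server reply before data reaches stable storage. This is faster, but data can be lost if the server crashes.
  • insecure accepts requests from source ports of 1024 and above.
  • no_root_squash lets root on a client act as root on the export.

After the exports file is edited, exportfs -ra re-exports all entries. The sketch below parses a single-client exports line of this form. parse_export and relaxed_options are hypothetical helpers, and /share stands in for the [share directory] placeholder.

```python
"""Sketch: parse a single-client exports(5) line like the one above."""

# Options from the entry above that relax safety or access checks.
RELAXED = {"async", "insecure", "no_root_squash"}


def parse_export(line: str):
    """Split '<path> <client>(<opt>,...)' into (path, client, [options]).
    Handles one client spec and an unquoted path only."""
    path, spec = line.split(None, 1)
    client, _, rest = spec.strip().partition("(")
    options = rest.rstrip(")").split(",") if rest else []
    return path, client, options


def relaxed_options(options) -> list:
    """Return the options that appear in RELAXED, sorted by name."""
    return sorted(set(options) & RELAXED)


if __name__ == "__main__":
    # "/share" is a stand-in for the [share directory] placeholder.
    path, client, opts = parse_export("/share  *(rw,async,insecure,no_root_squash)")
    print(path, client, opts)
    print("relaxed options:", relaxed_options(opts))
```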

Copyright © 2025 The PODsys Project. All rights reserved.