Fading Coder


Deployment and Configuration Steps for DDN Lustre Commercial Edition as HPC Cluster Parallel File System


Storage Array Configuration

The deployment uses 4 Sugon DS800-G30 disk arrays, 4 IO server nodes, and the DDN Lustre commercial distributed file system to build a shared storage pool for the HPC cluster.

OST volumes on each enclosure use RAID 6 with a dedicated hot spare; MDT volumes are deployed on 2 SSDs configured as RAID 1. When creating disk pools, select standard pool mode and set the logical volume stripe size to the maximum of 4M. Each RAID group should use 8-10 physical disks, no RAID group may span enclosures, and each RAID group is assigned 1 dedicated hot spare disk. All four disk arrays are configured to this specification, with OST volumes mapped to all 4 IO nodes and MDT volumes mapped to the first two HA-paired IO nodes.

DDN Lustre Dedicated OS Installation

Install the dedicated OS on all 4 IO nodes via server BMC remote management:

  1. Mount the es-5.0.0-server-centos-r3-x86_64.iso image as virtual media via BMC, boot from the virtual optical drive.
  2. If the default installation disk is not recognized, add the boot parameter install_dev=sda during installation startup.
  3. The default initial OS password is DDNSolutions4U; change it to a temporary password for deployment, then replace it with a complex password after full configuration is complete.

The OS uses unattended installation. After deployment, configure management network IP addresses for all 4 IO nodes before proceeding with file system setup.
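
As a sketch of the management-IP step (the connection name em1 and the addresses below are placeholders, not values from this deployment), the address can be assigned with nmcli on each node:

```shell
# Placeholder connection name and address plan -- substitute the site's actual values
nmcli con mod em1 ipv4.method manual ipv4.addresses 10.10.0.11/24 ipv4.gateway 10.10.0.1
nmcli con up em1
ip -4 addr show dev em1   # verify the address took effect
```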

Pre-deployment Preparation

Update Emulex HBA Card Firmware

Use firmware package elxflashStandalone-linux-12.4.243.16-1.zip and firmware binary lancer_A12.4.243.11.grp to update:

# Unpack the flash utility and stage the firmware binary alongside it
unzip elxflashStandalone-linux-12.4.243.16-1.zip
cp lancer_A12.4.243.11.grp ./elxflashStandalone-linux-12.4.243.16-1/firmware/
cd elxflashStandalone-linux-12.4.243.16-1/lx
chmod +x elxflash.sh
# Run the flash script in unattended upgrade mode
./elxflash.sh /auto /upgrade /quiet

Verify IB Driver Version

The preinstalled OFED driver on the IO nodes is version 4.5; ensure all HPC cluster client nodes use the same OFED 4.5 driver to avoid compatibility issues. Because the deployment uses Y-split HDR 200G to 2x100G optical cables, an HCA card firmware update and IB switch split-mode configuration are both required.
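
A quick way to compare versions between an IO node and a client (assuming the standard MLNX_OFED ofed_info utility is installed on both):

```shell
# Print the short OFED version string; run on an IO node and on each client --
# both outputs must report the same 4.5 release.
ofed_info -s
```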

Update HCA Card Firmware

Use ConnectX-6 firmware package fw-ConnectX6-rel-20_26_1040-MCX653105A-ECA_Ax-UEFI-14.19.14-FlexBoot-3.5.803.bin to update:

# Burn the new firmware image onto the HCA
flint -d /dev/mst/mt4123_pciconf0 -i ./fw-ConnectX6-rel-20_26_1040-MCX653105A-ECA_Ax-UEFI-14.19.14-FlexBoot-3.5.803.bin burn
# Confirm the board PSID after flashing
flint -d /dev/mst/mt4123_pciconf0 query | grep PSID

Configure IB Switch

For switch ports connected to split optical cables, enable port split mode, then start the opensmd service to initialize the IB subnet.
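
A hedged sketch of the subnet-manager side of this step (the switch split-mode syntax varies by switch OS and is not shown here; ibstat is used only to verify the result):

```shell
# Start the subnet manager once the switch ports are split
systemctl enable --now opensmd
# Each split port should come up in Active state at the expected 100 Gb/s rate
ibstat | grep -E 'State|Rate'
```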

Build Custom Kernel Initramfs Image

The lpfc driver source package is elx-lpfc-12.2.299.13-1_rhel7u6.src.rpm; the target kernel version is 3.10.0-957.12.2.el7_lustre.ddn1.x86_64:

# Install the driver source RPM and unpack the driver tarball
rpm -ivh elx-lpfc-12.2.299.13-1_rhel7u6.src.rpm
cd /root/rpmbuild/SOURCES/
tar -xzf lpfcdriver-35-12.2.299.13.tar.gz
cd lpfcdriver-35-12.2.299.13/
# Build the lpfc module (assumes the Lustre kernel headers are installed)
make -j$(nproc)
TARGET_KERNEL_VER=3.10.0-957.12.2.el7_lustre.ddn1.x86_64
# Replace the in-tree compressed module with the freshly built one
cp lpfc.ko /lib/modules/${TARGET_KERNEL_VER}/kernel/drivers/scsi/lpfc/
cd /lib/modules/${TARGET_KERNEL_VER}/kernel/drivers/scsi/lpfc/
mv lpfc.ko.xz lpfc.ko.xz.backup
xz -z lpfc.ko
# Back up the current initramfs, then rebuild it with lpfc included
cd /boot/
cp initramfs-${TARGET_KERNEL_VER}.img initramfs-${TARGET_KERNEL_VER}.img.bak
echo 'add_drivers+=" lpfc "' >> /etc/dracut.conf.d/lpfc.conf
dracut -f /boot/initramfs-${TARGET_KERNEL_VER}.img ${TARGET_KERNEL_VER}
# Distribute the rebuilt image to the other IO nodes, then reboot
for node in io2 io3 io4; do scp /boot/initramfs-${TARGET_KERNEL_VER}.img ${node}:/boot/; done
reboot

Configure Multipath

Copy the pre-customized multipath.conf file to /etc/ on all IO nodes. Adjust the WWID and alias mappings (starting at line 86 of the file) to match the volume names defined when provisioning LUNs on the disk arrays. Sync the modified configuration file across all 4 IO nodes, then run multipath -r to reload the configuration.
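
For reference, a multipaths entry in multipath.conf typically looks like the following (the WWID and alias below are placeholders, not values from this deployment):

```
multipaths {
    multipath {
        wwid  360001ff0b0c2000000000001000a0001   # placeholder WWID reported for the array LUN
        alias ost0                                # placeholder alias matching the volume name on the array
    }
}
```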

Modify Exascaler Configuration

Copy the template /etc/ddn/exascaler.conf to a working directory, then modify the entries containing hostnames (io1, io2, io3, io4) and management IP segments (10.10.x.x) from line 15 onward to match the actual deployment environment; leave the entries before line 15 unchanged. Sync the modified configuration file to all IO nodes.

Add HA Cluster User

Remove the default hacluster user first, then recreate it with fixed UID/GID across all nodes:

# Remove the default hacluster account, then recreate it with a fixed UID/GID (499) on every node
userdel -r --force hacluster
mkdir -p /var/lib/heartbeat/cores
groupadd hacluster -g 499
useradd hacluster -u 499 -g 499 -c "Pacemaker HA User" -d /var/lib/heartbeat/cores/hacluster -s /sbin/nologin

Reboot all IO nodes after completing all pre-installation steps before proceeding with Lustre deployment.

Lustre Cluster Deployment

  1. Run the deployment script on every IO node:
es_install

Follow the interactive prompts; select yes for most configuration steps, and select no for the network restart prompts if the network is already configured correctly.

  2. (Optional) If the corosync service fails to start after the initial deployment, generate a corosync authentication key on the primary node and sync it to the other IO nodes:
corosync-keygen
scp /etc/corosync/authkey io2:/etc/corosync/
scp /etc/corosync/authkey io3:/etc/corosync/
scp /etc/corosync/authkey io4:/etc/corosync/
  3. Initialize the Pacemaker cluster on the two primary MDT nodes (io1 and io3):
config_pacemaker

Verify the cluster status with hastatus; both nodes showing online status indicates successful cluster initialization.

  4. Start the Lustre cluster resources:
# Start all cluster resources
cluster_resource --action start
# Stop all cluster resources (for maintenance)
cluster_resource --action stop

Verify MDT and OST recovery status after startup:

lustre_recovery_status.sh

Run hastatus to confirm all HA resources are in the active state; server-side deployment is complete at this point.

Client Compilation and Installation

All HPC management, login, and compute nodes require the Lustre client. First confirm that the OFED driver version on the client nodes is 4.5, matching the IO nodes.

Lustre client compilation steps: Extract the Lustre source package, enter the source directory and run:

# Paths to the client kernel headers and the OFED source tree
KERNEL_SRC=/usr/src/kernels/3.10.0-957.12.2.el7.x86_64
OFED_SRC=/usr/src/ofa_kernel/default
# Build a client-only Lustre with o2ib (InfiniBand) LNet support
./configure --enable-client --disable-server --with-linux=${KERNEL_SRC} --with-o2ib=${OFED_SRC}
make -j$(nproc)
make rpms

Install the generated lustre-client and lustre-client-modules RPM packages on all client nodes. If compilation errors occur, install the missing dependency packages first and retry.

Client mount command example:

mount -t lustre 12.12.12.22@o2ib:12.12.12.23@o2ib:/pfs /public
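
To make the mount persistent across reboots (this fstab entry reuses the NIDs and mount point from the example above; _netdev delays mounting until the network is up):

```shell
# Add the Lustre mount to fstab so it survives reboots
echo '12.12.12.22@o2ib:12.12.12.23@o2ib:/pfs /public lustre defaults,_netdev 0 0' >> /etc/fstab
mount /public
# Confirm the client sees the MDT and all OSTs
lfs df -h /public
```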

Lustre Reinstallation Procedure

  1. Unmount Lustre file system on all client nodes: umount /public
  2. Stop all Lustre cluster resources on IO nodes: cluster_resource --action stop
  3. Clear existing Pacemaker cluster configuration on all IO nodes: cibadmin -E --force
  4. Identify existing Lustre-related LVM volumes (MDT, OST) with lvdisplay and vgdisplay, then delete them with lvremove and vgremove, taking care not to delete any system volumes.
  5. Rerun the es_install deployment process to reinstall.
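
Step 4 above can be sketched as follows (the names vg_mdt/mdt0 are hypothetical; substitute the volume and group names that lvdisplay actually reports, and never remove system volumes such as the root VG):

```shell
# Inspect existing volumes first and verify the names carefully
lvs
vgs
# Hypothetical Lustre volume names -- replace with the real MDT/OST LVs and VGs
lvremove -y /dev/vg_mdt/mdt0
vgremove -y vg_mdt
```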

Common Troubleshooting

Mount failure with "no such device" error

  • Check if Lustre modules are loaded: lsmod | grep lnet && lsmod | grep lustre
  • If modules are not loaded, load them with modprobe lnet && modprobe lustre
  • If module load fails, verify lnet configuration in /etc/modprobe.d/lustre.conf
  • Confirm the OFED driver version on the client matches the version used to compile the client package; if there is a mismatch, recompile and reinstall the client packages.

Mount failure with MGS/file system name error

  • Verify MGS NID and file system name in the mount command match the server side configuration
  • Confirm Lustre modules are loaded correctly on the client
  • If the issue persists, recompile the Lustre client package and reinstall on the affected nodes.
Tags: DDN Lustre
