Mailing list for future updates and reference: hpc-io-analysis@lists.uni-mainz.de
Investigating the I/O patterns and behaviour of HPC applications is necessary for designing and efficiently utilizing file systems. In this wiki we aim to provide a recipe for using the Darshan profiling tool, and to show how eBPF can be exploited in strace to extract the necessary information efficiently.
Darshan is a lightweight and scalable HPC I/O characterization tool. It is designed to capture the I/O behavior of applications and their patterns of file accesses. The Darshan heatmap models the POSIX, MPI-IO and STDIO modes of I/O; the DXT module, described below, additionally provides full traces of individual read/write operations.
The complete building guide can be found on the official Darshan documentation page, and the latest version can be downloaded here; we recommend using version 3.4.4 or later. Below we include the necessary steps and configure arguments for building the tool.
tar -xvzf darshan-<version-number>.tar.gz
cd darshan-<version-number>/darshan-runtime
./prepare
mkdir -p build && cd build
../configure --with-jobid-env=SLURM_JOB_ID --with-mem-align=8 --enable-mmap-logs --prefix=$DARSHAN_INSTALL_PATH --with-log-path=<darshan log path> CC=mpicc
(alternatively, use --with-log-path-by-env=PWD,SLURM_JOB_ID,SLURM_SUBMIT_DIR in place of --with-log-path)
make -j install
For Cray systems, add --disable-cuserid and use CC=cc in the configure line; an example is shown below. Please refer to the official documentation page for additional configuration options.
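Combining these with the arguments above, the configure line for a Cray system would look like this:
../configure --with-jobid-env=SLURM_JOB_ID --with-mem-align=8 --enable-mmap-logs --prefix=$DARSHAN_INSTALL_PATH --with-log-path=<darshan log path> --disable-cuserid CC=cc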
To make sure that the log directory has the required permissions to store the Darshan logs, the darshan-mk-log-dirs.pl script can be used. It populates the path specified at configure time with subdirectories and sets sticky permissions so that multiple users can write to the same directory. This step can be skipped when the --with-log-path-by-env option is used.
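For example, assuming the install prefix used above, the script can be run once after installation to create the subdirectory hierarchy under the configured log path:
$DARSHAN_INSTALL_PATH/bin/darshan-mk-log-dirs.pl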
Using darshan-runtime, we can instrument the application code with the characterization tool. The instrumentation is done either by compile-time wrapping or by preloading the dynamic library with LD_PRELOAD. To use the dynamic library, set the LD_PRELOAD environment variable to the full path of the Darshan shared library. This is done on the command line or in the job script of the application for mpirun, mpiexec, and srun:
mpiexec -n 4 -x LD_PRELOAD=$DARSHAN_INSTALL_PATH/lib/libdarshan.so ./app
srun -n 4 --export=LD_PRELOAD=$DARSHAN_INSTALL_PATH/lib/libdarshan.so ./app
It is recommended to add ALL to the export list so that the other variables defined in the job script are included as well. A vast number of configuration options can be found on the documentation website, and these can also be set at runtime, as in the following job script:
#!/bin/bash
#SBATCH --nodes=xx
#SBATCH --ntasks-per-node=xx
#SBATCH --time=xxx
#SBATCH --job-name darshan
module load <mpi>
export DARSHAN_CONFIG_PATH=<config file path>
export DARSHAN_DISABLE_SHARED_REDUCTION=1
export DXT_ENABLE_IO_TRACE=1 #optional; DXT traces require a large memory buffer
srun -n 4 --export=ALL,LD_PRELOAD=$DARSHAN_INSTALL_PATH/lib/libdarshan.so ./app
For non-MPI applications, please add export DARSHAN_ENABLE_NONMPI=1 to your environment variables.
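As a minimal sketch, a serial application (here the hypothetical ./serial_app) can then be profiled directly from the shell:
export DARSHAN_ENABLE_NONMPI=1
LD_PRELOAD=$DARSHAN_INSTALL_PATH/lib/libdarshan.so ./serial_app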
Some Darshan settings, such as the maximum record count, can only be set in the configuration file. In the configuration file, specify the record count as well as the maximum amount of memory the Darshan profiler may use:
MAX_RECORDS 65536 POSIX,MPI-IO,STDIO,DXT #adjust and increase based on your modules
MODMEM 1024 #maximum memory in MiB for Darshan modules; increase based on your application
MOD_DISABLE LUSTRE
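As a sketch, assuming the settings above are saved as darshan.conf in the submit directory, point Darshan at the file before launching the job:
export DARSHAN_CONFIG_PATH=$PWD/darshan.conf
srun -n 4 --export=ALL,LD_PRELOAD=$DARSHAN_INSTALL_PATH/lib/libdarshan.so ./app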
The Darshan DXT module provides a complete trace of the MPI-IO and POSIX read/write APIs. To enable DXT tracing, please add export DXT_ENABLE_IO_TRACE=1 to your variables. This parameter is especially relevant when the run involves many small reads and writes. Enabling DXT tracing increases the memory usage of the profiler, so be careful with the amount of memory specified in the config file.
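To inspect the resulting logs, the darshan-util tools (built from the darshan-util directory of the same tarball; assumed here to be installed and on your PATH) can be used. darshan-parser dumps the recorded counters and darshan-dxt-parser dumps the DXT trace:
darshan-parser <darshan log file> > counters.txt
darshan-dxt-parser <darshan log file> > dxt_trace.txt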