MonSTer
An out-of-the-box monitoring framework for HPC systems, adopted by Dell's Omnia project.
MonSTer is an “out-of-the-box” monitoring tool for high-performance computing (HPC) platforms. It collects hardware telemetry from baseboard management controllers (BMCs) through the Redfish API and job information from the Slurm resource manager, stores both in a time-series database, and gives administrators a unified, queryable view of cluster health.
Presented at CLUSTER’20 (IEEE International Conference on Cluster Computing, 2020).
Source: github.com/nsfcac/MonSTer