MonSTer

An out-of-the-box monitoring framework for HPC systems, adopted by Dell's Omnia project.

MonSTer is an “out-of-the-box” monitoring tool for high-performance computing (HPC) platforms. It collects hardware telemetry from baseboard management controllers (BMCs) via the Redfish API and job and node data from the Slurm resource manager. It stores both in a time-series database, giving administrators a unified, queryable view of cluster health.
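To make that data flow concrete, here is a minimal Python sketch of the collect-and-store step. It is not MonSTer's own code. It assumes a Redfish Thermal payload has already been fetched from a node's BMC and that the Slurm job IDs running on that node have already been resolved. It then formats both as InfluxDB line-protocol points, which is an assumed storage format. The function names, the measurement names (`temperature`, `fan`, `node_jobs`) and the sample values are hypothetical. Only the Redfish field names (`Temperatures`, `ReadingCelsius`, `Fans`, `Reading`, `Name`) come from the DMTF Thermal schema.

```python
from typing import Any, Iterable


def escape_tag(value: str) -> str:
    # Line protocol requires commas, equals signs and spaces in tag values to be escaped.
    return value.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def thermal_to_points(node: str, thermal: dict[str, Any], timestamp_ns: int) -> list[str]:
    """Format a Redfish Thermal payload as line-protocol points (hypothetical helper)."""
    points = []
    node_tag = escape_tag(node)
    for sensor in thermal.get("Temperatures", []):
        reading = sensor.get("ReadingCelsius")
        if reading is None:
            continue  # BMCs may list sensors that currently report no value
        points.append(
            f"temperature,node={node_tag},sensor={escape_tag(sensor['Name'])} "
            f"celsius={float(reading)} {timestamp_ns}"
        )
    for fan in thermal.get("Fans", []):
        reading = fan.get("Reading")
        if reading is None:
            continue
        points.append(
            f"fan,node={node_tag},sensor={escape_tag(fan['Name'])} "
            f"reading={float(reading)} {timestamp_ns}"
        )
    return points


def jobs_to_point(node: str, job_ids: Iterable[str], timestamp_ns: int) -> str:
    """Record which Slurm jobs were running on a node at this timestamp (hypothetical helper)."""
    joined = ",".join(job_ids)
    # String field values are double-quoted; escape backslashes and quotes inside them.
    escaped = joined.replace("\\", "\\\\").replace('"', '\\"')
    return f'node_jobs,node={escape_tag(node)} jobs="{escaped}" {timestamp_ns}'


if __name__ == "__main__":
    # Illustrative payload shaped like a Redfish Chassis Thermal resource.
    sample_thermal = {
        "Temperatures": [
            {"Name": "CPU1 Temp", "ReadingCelsius": 41},
            {"Name": "Inlet Temp", "ReadingCelsius": None},
        ],
        "Fans": [{"Name": "System Board Fan1", "Reading": 5400, "ReadingUnits": "RPM"}],
    }
    ts = 1600000000000000000  # fixed nanosecond timestamp for reproducible output
    for line in thermal_to_points("cpu-1", sample_thermal, ts):
        print(line)
    print(jobs_to_point("cpu-1", ["1001", "1002"], ts))
```

In this sketch, a collector would call these helpers for every node on each polling interval and write the resulting lines to the database in one batch. Tagging BMC readings and job IDs with the same `node` value is what would let a later query line up hardware metrics with the jobs that were running at the time.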

Presented at IEEE CLUSTER 2020 (CLUSTER’20).

Source: github.com/nsfcac/MonSTer