Profiling Power Consumption of Jobs with SLURM

Date:

Power, temperature and performance have all become first-order design constraints for High Performance Computing platforms. These three features influence each other, and affect the architecture and resource scheduling designs. In our previous research, we have proposed a methodology for profiling power consumption at job level based on the assumption that power usage is proportional to the CPU usage. However, this methodology lacks accuracy and cannot be applied for the application that mainly stresses the network. On the other hand, the resource and job management system (e.g. SLURM) has knowledge of both the underlying resources and jobs needs, and it is the best candidate for monitoring and controlling power consumption of jobs. In this talk, we will discuss SLURM architecture and a framework proposed by Yiannis et.al. developed upon SLURM, which allows energy accounting per job with power profiling capabilities along with parameters for energy control features based on static frequency scaling of the CPUs.

Download slides here