The IEEE Cluster 2020 Experience, MonSTer Review, and Future Work

Date:

MonSTer is an out-of-the-box monitoring tool for high-performance computing (HPC) systems that has been in development for over a year. After several rounds of iteration and optimization, we have achieved up to 25x performance improvement over the initial implementation, enabling near real-time acquisition and visualization of monitoring data. This work was published at the IEEE Cluster 2020 conference and presented to the community last week. In this talk, I will share my experience of attending the conference and review the work we have done. I would also like to discuss several research directions in the context of “Integrated visualizing, monitoring, and managing HPC systems”.
