Jie Li ☕️

Ph.D. candidate in Computer Science

DISCL @ Texas Tech University

Jie Li is a Ph.D. candidate in Computer Science at Texas Tech University, where he is a member of the Data-Intensive Scalable Computing Laboratory (DISCL) under the guidance of Dr. Yong Chen. Jie’s research interests lie in High-Performance Computing (HPC), including HPC system monitoring, automation, and management, operational data analytics, job scheduling, and system architecture. He is also interested in parallel and distributed computing and computer architecture. Jie received his Master of Science degree in Computer Science from Texas Tech University in 2019. Before that, he earned a bachelor’s degree in architecture.

Experience

Data-Intensive Scalable Computing Laboratory (DISCL), TTU
Research Assistant
September 2019 – Present | Lubbock, TX
  • Research and Publication: Conducted research in High-Performance Computing, computer architecture, and parallel and distributed computing. Authored research papers published in academic conferences and journals.
  • Mentorship and Education: Mentored graduate and undergraduate students in independent research, providing guidance on research topics, project development, and data analysis.
  • Software Development and Collaboration: Developed and maintained research software and tools, writing, testing, and documenting code for multiple projects. Contributed to open-source software projects.
  • Server Administration: Managed two high-end servers (Hugo and Alita) hosted at the High-Performance Computing Center, handling server configuration, maintenance, and software management. Kept both servers available and reliable by troubleshooting issues as they arose.
Lawrence Berkeley National Laboratory (LBNL)
Graduate Student Intern
June 2021 – August 2023 (summers) | Berkeley, CA
  • Data Integration and Analysis: Integrated HPC monitoring data from multiple sources (LDMS, DCGM, Slurm, VictoriaMetrics) to analyze system-wide architectural efficiency, including CPU, GPU, DRAM, and HBM2 utilization, on NERSC’s Cori and Perlmutter systems. Identified key trends and patterns in the data that characterize system performance.
  • Machine Learning Analysis: Conducted statistical analysis of job-level monitoring data. Applied several machine learning models, including SVC, LinearSVC, Decision Trees, and Random Forests, to analyze jobs based on time-series features.
  • Novel Data Processing: Developed an approach that encodes time-series monitoring data as images and trained a Convolutional Neural Network (CNN) to classify job applications with high accuracy.
  • Simulation and System Design: Designed and implemented a discrete event simulator to study resource management and job scheduling in HPC systems, with a specific focus on systems with disaggregated memory resources.
Teaching, Learning and Professional Development Center (TLPDC), TTU
Graduate Student Programmer
August 2018 – August 2019 | Lubbock, TX
  • Website Maintenance and Communication: Maintained and updated the TLPDC web pages. Coordinated with software application providers to meet product requirements.
  • Database Management and Security: Managed the MySQL database and implemented backup strategies to protect against data loss. Resolved database access issues to keep operations running without interruption.

Talks

Advanced Visualization and Data Analysis of HPC Cluster and User Application Behavior
A Project Presentation at SC21.