MAC: Memory access coalescer for 3D-stacked memory

Published in Proceedings of the 48th International Conference on Parallel Processing, 2019

Recommended citation: Wang, Xi, Antonino Tumeo, John D. Leidel, Jie Li, and Yong Chen. "MAC: Memory access coalescer for 3D-stacked memory." In Proceedings of the 48th International Conference on Parallel Processing, pp. 1-10. 2019. https://artlands.github.io/files/wang-icpp-2019.pdf

Emerging data-intensive applications, such as graph analytics and data mining, exhibit irregular memory access patterns. Research has shown that with these memory-bound applications, traditional cache-based processor architectures, which exploit locality and regular patterns to mitigate the memory-wall issue, are inefficient. Meantime, novel 3D-stacked memory devices, such as Hybrid Memory Cube (HMC) and High Bandwidth Memory (HBM), promise significant increases in bandwidth that appear extremely appealing for memory-bound applications. However, conventional memory interfaces designed for cache-based architectures and JEDEC DDR devices fit poorly with the 3D-stacked memory, which leads to significant under-utilization of the promised high bandwidth.

As a response to these issues, in this paper we propose MAC (Memory Access Coalescer), a coalescing unit for the 3D-stacked memory. We discuss the design and implementation of MAC, in the context of a custom designed cache-less architecture targeted at data-intensive, irregular applications. Through a custom simulation infrastructure based on the RISC-V toolchain, we show that MAC achieves a coalescing efficiency of 52.85% on average. It improves the performance of the memory system by 60.73% on average for a large set of irregular workloads.

Download paper here

Recommended citation:

@inproceedings{pham2019mtsad,
  @inproceedings{wang2019mac,
  title={MAC: Memory access coalescer for 3D-stacked memory},
  author={Wang, Xi and Tumeo, Antonino and Leidel, John D and Li, Jie and Chen, Yong},
  booktitle={Proceedings of the 48th International Conference on Parallel Processing},
  pages={1--10},
  year={2019}
}