Optimization of DRAM based PIM Architecture for Energy-Efficient Deep Neural Network Training

Abstract

Deep Neural Network (DNN) training consumes high energy, while DNNs deployed on edge devices demand very high energy efficiency. In this context, Processing-in-Memory (PIM) is an emerging compute paradigm that bridges the memory-computation gap to improve energy efficiency. DRAM is one memory type employed for designing energy-efficient PIM architectures for DNN training. A major issue in DRAM-PIM architectures designed for DNN training is the high number of internal data accesses within a bank, between the memory arrays and the PIM computation units (e.g., 51% more than for inference). In state-of-the-art DRAM-PIM architectures, these internal data accesses consume very high energy compared to the computation units. Hence, reducing the internal data access energy within the DRAM bank is important for further improving the energy efficiency of DRAM-PIM architectures. We present three novel optimizations that together reduce the internal data access energy by up to 81.54%. Our first optimization modifies the bank data access circuit to enable partial accesses of data instead of the conventional fixed-granularity accesses, thereby exploiting the sparsity available during training. The second optimization adds a dedicated low-energy region within the DRAM bank that has a low capacitive load on its global wires and shorter data movement distances. Finally, we propose TinyFloat, a 12-bit high-dynamic-range floating-point format that reduces the total data access energy by 20% compared to IEEE 754 half and single precision.
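
To make the number-format idea concrete, below is a minimal Python sketch of converting between IEEE 754 single precision and a 12-bit floating-point value. The abstract does not specify TinyFloat's bit layout; the 1-bit sign / 6-bit exponent / 5-bit mantissa split, the IEEE-style bias of 31, flush-to-zero underflow, and clamp-on-overflow behavior are all assumptions chosen to illustrate a high-dynamic-range 12-bit format, not the paper's actual encoding.

```python
import struct

# Assumed (hypothetical) layout: 1 sign, 6 exponent, 5 mantissa bits.
EXP_BITS, MAN_BITS = 6, 5
EXP_BIAS = (1 << (EXP_BITS - 1)) - 1  # 31, mirroring IEEE 754-style biasing

def float32_to_tiny(x: float) -> int:
    """Round a float32 value to the assumed 12-bit format (returned as an int)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exp = ((bits >> 23) & 0xFF) - 127   # unbiased float32 exponent
    man = bits & 0x7FFFFF               # 23-bit float32 mantissa
    # Round the mantissa from 23 bits down to 5 (round-half-up, for brevity).
    man = (man + (1 << (23 - MAN_BITS - 1))) >> (23 - MAN_BITS)
    if man >> MAN_BITS:                 # rounding carried out of the mantissa
        man = 0
        exp += 1
    e = exp + EXP_BIAS
    if e <= 0:                          # underflow: flush to signed zero (assumed)
        return sign << (EXP_BITS + MAN_BITS)
    if e > (1 << EXP_BITS) - 1:         # overflow: clamp to largest value (assumed)
        e, man = (1 << EXP_BITS) - 1, (1 << MAN_BITS) - 1
    return (sign << (EXP_BITS + MAN_BITS)) | (e << MAN_BITS) | man

def tiny_to_float(t: int) -> float:
    """Decode the assumed 12-bit format back to a Python float."""
    sign = -1.0 if (t >> (EXP_BITS + MAN_BITS)) & 1 else 1.0
    e = (t >> MAN_BITS) & ((1 << EXP_BITS) - 1)
    man = t & ((1 << MAN_BITS) - 1)
    if e == 0 and man == 0:             # signed zero
        return sign * 0.0
    return sign * (1.0 + man / (1 << MAN_BITS)) * 2.0 ** (e - EXP_BIAS)

# With 6 exponent bits the representable magnitudes span roughly 2**-30 to
# 2**32, a wider range than IEEE 754 half precision (~6e-5 to 65504), at the
# cost of mantissa precision.
print(tiny_to_float(float32_to_tiny(3.14159)))  # -> 3.125 (5-bit mantissa)
```

Under any such 12-bit layout, each value moves 25% fewer bits than half precision and 62.5% fewer than single precision, which is the kind of reduction in per-access data volume that the reported 20% data access energy saving draws on.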

Publication
2022 IEEE International Symposium on Circuits and Systems (ISCAS)