A Weighted Current Summation Based Mixed Signal DRAM-PIM Architecture for Deep Neural Network Inference

Abstract

Processing-in-Memory (PIM) is an emerging approach to bridge the memory-computation gap. One of the major challenges of PIM architectures for Deep Neural Network (DNN) inference is the implementation of area-intensive Multiply-Accumulate (MAC) units in memory technologies, especially for DRAM-based PIMs. The DRAM architecture restricts the integration of DNN computation units near the area-optimized commodity DRAM Sub-Array (SA) or Primary Sense Amplifier (PSA) region, where data parallelism is highest and data movement cost is lowest. In this paper, we present a novel DRAM-based PIM architecture that combines a bit-decomposed MAC operation with a Weighted Current Summation (WCS) technique to implement the MAC unit with minimal additional circuitry in the PSA region by leveraging mixed-signal design. The architecture adopts a two-stage design that employs lightweight current-mirror-based analog units near the SAs in the PSA region, while all other substantial logic is integrated in the bank-peripheral region. Hence, our architecture strikes a balance between data parallelism, data movement energy, and area optimization. For 8-bit CNN inference, our 8 Gb DRAM-PIM device achieves a peak performance of 142.8 GOPS while consuming 756.76 mW of power, which results in an energy efficiency of 188.8 GOPS/W. The area overhead of such an 8 Gb device in a 2y nm DRAM technology is 12.63% compared to a commodity 8 Gb DRAM device.
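
The following is a minimal numerical sketch, not taken from the paper, of how a bit-decomposed MAC can be recombined through binary-weighted summation of per-bit partial sums; this is the role the weighted current summation stage plays in the analog domain, with the shift-and-add recombination handled by bank-peripheral logic. Vector sizes, bit widths, and variable names are assumptions chosen purely for illustration.

```python
# Illustrative sketch (assumptions, not the paper's implementation):
# an 8-bit dot product is rebuilt from per-bit-plane partial sums,
# each weighted by a power of two -- the digital analogue of summing
# binary-weighted currents.
import numpy as np

rng = np.random.default_rng(0)
acts = rng.integers(0, 256, size=16, dtype=np.int64)     # 8-bit activations (assumed)
weights = rng.integers(0, 256, size=16, dtype=np.int64)  # 8-bit weights (assumed)

# Decompose the weights into 8 bit planes (LSB first).
bit_planes = [(weights >> b) & 1 for b in range(8)]

# Each bit plane yields a partial sum: an accumulation that a
# current-summation stage near the sense amplifiers could produce.
partial_sums = [int(np.dot(acts, plane)) for plane in bit_planes]

# Shift-and-add recombination, weighting each partial sum by 2^b,
# reproduces the full-precision MAC result.
mac = sum(ps << b for b, ps in enumerate(partial_sums))

assert mac == int(np.dot(acts, weights))
print(mac)
```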

Publication
IEEE Journal on Emerging and Selected Topics in Circuits and Systems (Volume: 12, Issue: 2, June 2022)