Computational RAM (C-RAM) is random-access memory with processing elements integrated on the same chip. This enables C-RAM to be used as a SIMD computer. It can also make more efficient use of the memory bandwidth available within a memory chip. The general technique of performing computations in memory is called Processing-In-Memory (PIM).
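As a rough illustration of the SIMD model that C-RAM exposes, the C sketch below treats each word of a memory row as belonging to one simple processing element, all of which apply the same broadcast operation to their local data; the names and the chosen operation are illustrative and not taken from any particular C-RAM design.

    #include <stddef.h>
    #include <stdint.h>

    /* Conceptual sketch only: each "processing element" (PE) owns one word of
     * the row and applies the same broadcast operation to it locally, so no
     * data has to cross a CPU-memory bus.  The operation (AND with a mask)
     * and all names are illustrative. */
    static void cram_broadcast_and(uint32_t *row, size_t num_words, uint32_t mask)
    {
        for (size_t pe = 0; pe < num_words; pe++)   /* one loop iteration per PE */
            row[pe] &= mask;                        /* PE updates its local word */
    }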
The most influential implementations of computational RAM came from the Berkeley IRAM Project. Vector IRAM (V-IRAM) combines DRAM with a vector processor integrated on the same chip.[1]
Reconfigurable Architecture DRAM (RADram) is DRAM with reconfigurable computing (FPGA) logic elements integrated on the same chip.[2] SimpleScalar simulations show that RADram (in a system with a conventional processor) can give orders of magnitude better performance on some problems than traditional DRAM (in a system with the same processor).
Some embarrassingly parallel computational problems are already limited by the von Neumann bottleneck between the CPU and the DRAM. Some researchers expect that, for the same total cost, a machine built from computational RAM will run orders of magnitude faster than a traditional general-purpose computer on these kinds of problems.[3]
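A hypothetical example of such a bandwidth-limited, embarrassingly parallel workload is a streaming loop with very low arithmetic intensity: the processor performs only one multiply per eight bytes moved over the memory bus, so the DRAM interface, not the ALU, sets the speed.

    #include <stddef.h>

    /* One multiply per element, but every element must be read from and
     * written back to DRAM.  On a conventional system the off-chip bus is
     * the bottleneck; with processing elements inside the DRAM the transfer
     * over that bus disappears. */
    static void scale_in_place(float *data, size_t n, float factor)
    {
        for (size_t i = 0; i < n; i++)
            data[i] *= factor;
    }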
As of 2011, the "DRAM process" (few layers, optimized for high capacitance) and the "CPU process" (optimized for high frequency, with typically twice as many BEOL layers as DRAM; since each additional layer reduces yield and increases manufacturing cost, such chips are relatively expensive per square millimeter compared to DRAM) are distinct enough that there are three broad approaches to computational RAM.
Some CPUs designed to be built on a DRAM process technology (rather than a "CPU" or "logic" process technology specifically optimized for CPUs) include the Berkeley IRAM Project, TOMI Technology[4][5] and the AT&T DSP1.
Because a memory bus to off-chip memory has many times the capacitance of an on-chip memory bus, a system with separate DRAM and CPU chips can have several times the energy consumption of an IRAM system with the same computer performance.[1]
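The scaling behind this claim can be sketched with the usual first-order model of dynamic switching energy; the capacitance and voltage figures below are illustrative round numbers, not measurements from the cited work.

    % First-order dynamic energy of driving one bus line per bit transferred:
    % alpha = switching activity, C_line = line capacitance, V = signalling voltage.
    \[
      E_{\text{bit}} \approx \alpha\, C_{\text{line}} V^{2}
    \]
    % Illustrative comparison at V = 1.2 V, alpha = 0.5:
    %   off-chip line, C_line ~ 20 pF :  E_bit ~ 0.5 * 20 pF * (1.2 V)^2 ~ 14 pJ
    %   on-chip line,  C_line ~ 0.2 pF:  E_bit ~ 0.5 * 0.2 pF * (1.2 V)^2 ~ 0.14 pJ
    % A ~100x difference in line capacitance translates directly into a ~100x
    % difference in the energy spent moving each bit.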
Because computational DRAM is expected to run hotter than traditional DRAM, and increased chip temperatures result in faster charge leakage from the DRAM storage cells, computational DRAM is expected to require more frequent DRAM refresh.[2]
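A common rule-of-thumb model (used here only to illustrate why the refresh rate must rise with temperature, not a figure from the cited work) is that cell retention time roughly halves for every ten-degree rise, since leakage grows approximately exponentially with temperature; standard DDR devices, for example, are typically refreshed twice as often when operated above 85 °C.

    \[
      t_{\text{ret}}(T) \;\approx\; t_{\text{ret}}(T_0)\, 2^{-(T - T_0)/\Delta T},
      \qquad \Delta T \approx 10\,^{\circ}\mathrm{C}
    \]
    % Under this model, a retention budget of 64 ms at 45 C shrinks to about
    % 16 ms at 65 C, so a computational DRAM running 20 C hotter would need to
    % refresh roughly four times as often.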
Processor-in-/near-memory (PINM) refers to a computer processor (CPU) tightly coupled to memory, generally on the same silicon chip.
The chief goal of merging the processing and memory components in this way is to reduce memory latency and increase bandwidth. Reducing the distance that data needs to be moved also reduces the power requirements of a system.[6] Much of the complexity (and hence power consumption) in current processors stems from strategies for avoiding memory stalls.
In the 1980s, a tiny CPU that executed FORTH was fabricated into a DRAM chip to improve PUSH and POP. FORTH is a stack-oriented programming language, and this improved its efficiency.
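The C sketch below (structure and names are illustrative, not from the actual chip) shows why PUSH and POP dominate such a design: every FORTH primitive reads or writes the top of the data stack, so each one is a memory operation, and keeping the stack together with the logic that updates it on the DRAM chip removes an off-chip round trip per primitive.

    #include <stdint.h>

    /* Illustrative data stack of a tiny FORTH-like machine; every primitive
     * touches the top of stack in memory. */
    typedef struct {
        int32_t cells[256];
        int     top;                 /* index of the next free slot */
    } data_stack;

    static void    push(data_stack *s, int32_t v) { s->cells[s->top++] = v; }    /* memory write */
    static int32_t pop (data_stack *s)            { return s->cells[--s->top]; } /* memory read  */

    /* The FORTH word "+" is two POPs and one PUSH: three memory operations
     * for a single add, which is why accelerating PUSH/POP inside the DRAM
     * pays off for a stack-oriented language. */
    static void forth_add(data_stack *s)
    {
        int32_t b = pop(s), a = pop(s);
        push(s, a + b);
    }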
The transputer, made in the early 1980s, also had large on-chip memory, making it essentially a processor-in-memory.
Notable PIM projects include the Berkeley IRAM project at the University of California, Berkeley[7] and the University of Notre Dame PIM[8] effort.
DRAM-based near-memory and in-memory designs can be categorized into four groups: