DOI: 10.1109/HCS55958.2022.9895532 · Corpus ID: 252551808
Meteor Lake and Arrow Lake Intel Next-Gen 3D Client Architecture Platform with Foveros
@inproceedings{Gomes2022MeteorLA,
  title     = {Meteor Lake and Arrow Lake Intel Next-Gen 3D Client Architecture Platform with Foveros},
  author    = {Wilfred Gomes and Slade Morgan and Boyd Phelps and Tim Wilson and Erik G. Hallnor},
  booktitle = {2022 IEEE Hot Chips 34 Symposium (HCS)},
  year      = {2022},
  pages     = {1--40},
  url       = {https://api.semanticscholar.org/CorpusID:252551808}
}
- Published in IEEE Hot Chips Symposium, 21 August 2022
- Computer Science, Engineering
23 Citations
Towards Real-Time LLM Inference on Heterogeneous Edge Platforms
- Rakshith Jayanth, Neelesh Gupta, Souvik Kundu, Deepak A. Mathaikutty, Viktor K. Prasanna
- 2024
Computer Science, Engineering
2024 IEEE 31st International Conference on High…
This work addresses the lack of methodologies for maximizing inference performance on heterogeneous edge platforms by proposing techniques for distributing model inference across the platform's compute units.
HAL: Hardware-assisted Load Balancing for Energy-efficient SNIC-Host Cooperative Computing
- Jinghan Huang, Jiaqi Lou, Nam Sung Kim
- 2024
Engineering, Computer Science
HAL, consisting of a hardware-based load balancer and an intelligent load balancing policy implemented inside the SNIC, can improve the system-wide energy efficiency and throughput of the server running these functions by 31% and 10%, respectively, without notably increasing the tail latency.
EIGEN: Enabling Efficient 3DIC Interconnect with Heterogeneous Dual-Layer Network-on-Active-Interposer
- Siyao Jia, Bo Jiao, Haozhe Zhu, Chixiao Chen, Qi Liu, Ming Liu
- 2025
Engineering, Computer Science
This work proposes a heterogeneous dual-layer interconnect architecture, EIGEN, for chiplet-interposer systems, along with a reinforcement learning (RL)-based routing framework, which can provide efficient and flexible communication for chiplet-based 3DICs.
Benchmarking Edge AI Platforms for High-Performance ML Inference
- Rakshith Jayanth, Neelesh Gupta, Viktor K. Prasanna
- 2024
Computer Science
A comprehensive study comparing the latency and throughput of various linear algebra and neural network inference tasks across CPU-only, CPU/GPU, and CPU/NPU integrated solutions finds that the Neural Processing Unit (NPU) excels at matrix-vector multiplication and some neural network tasks, while GPU-based inference performs best at large dimensions and batch sizes.
Performance-Energy Characterization of ML Inference on Heterogeneous Edge AI Platforms
- Palash Kohli, Rakshith Jayanth, Neelesh Gupta, Haoyang Fan, Viktor K. Prasanna
- 2025
Computer Science, Engineering
This study benchmarks a state-of-the-art heterogeneous platform consisting of a multicore CPU, integrated GPU and NPU, on its performance on various ML workloads and observes that hybrid NPU-GPU implementation of LLM inference is superior to CPU-only inference in terms of energy consumption and end-to-end (E2E) latency.
37.4 SHINSAI: A 586mm2 Reusable Active TSV Interposer with Programmable Interconnect Fabric and 512Mb 3D Underdeck Memory
- Bo Jiao, Haozhe Zhu, Ming Liu
- 2025
Engineering, Materials Science
A reusable active TSV-interposer, known as SHINSAI, in 28nm-CMOS/TSV-middle/micro-bump (μbump) technology is presented, enabling the transformation of traditional multicore monolithic systems into compact two-layer stacking architectures.
Architecting Selective Refresh based Multi-Retention Cache for Heterogeneous System (ARMOUR)
- Sukarn Agarwal, Shounak Chakraborty, Magnus Själander
- 2023
Engineering, Computer Science
ARMOUR is a mechanism for efficient management of memory accesses to a multi-retention LLC, in which cache blocks are allocated to the high- (CPU) or low- (GPU) retention zone based on the initial requester, and blocks that are about to expire are either refreshed or written back.
Taming Offload Overheads in a Massively Parallel Open-Source RISC-V MPSoC: Analysis and Optimization
- Luca Colagrande, Luca Benini
- 2025
Computer Science, Engineering
This work presents a detailed, cycle-accurate quantitative analysis of the offload overheads on Occamy, an open-source massively parallel RISC-V based heterogeneous MPSoC, and proposes a quantitative model to estimate the runtime of selected applications accounting for the offload overheads.
Hello SME! Generating Fast Matrix Multiplication Kernels Using the Scalable Matrix Extension
- Stefan Remke, Alexander Breuer
- 2024
Computer Science, Engineering
SC24-W: Workshops of the International Conference…
This paper presents an in-depth study of SME on the M4 and designs a just-in-time code generator for SME-based small matrix multiplications that outperforms the vendor-optimized BLAS implementation in almost all tested configurations.
Into the Third Dimension: Architecture Exploration Tools for 3D Reconfigurable Acceleration Devices
- Andrew Boutros, Fatemehsadat Mahmoudi, Amin Mohaghegh, Stephen More, Vaughn Betz
- 2023
Engineering, Computer Science
The RAD-Gen framework is extended by integrating an upgraded version of the COFFE automatic transistor sizing tool, which supports 7 nm FinFETs with a more accurate, metal-aware area model for newer process technologies, and new tools are implemented in RAD-Gen for modeling the inter-die connections and power distribution networks of 3D architectures.
One Reference
Intel 4 CMOS Technology Featuring Advanced FinFET Transistors optimized for High Density and High-Performance Computing
A new advanced CMOS FinFET technology, Intel 4, is introduced that extends Moore’s law by offering 2X area scaling of the high performance logic library and greater than 20% performance gain at…