Meteor Lake and Arrow Lake Intel Next-Gen 3D Client Architecture Platform with Foveros

@article{Gomes2022MeteorLA,
  title={Meteor Lake and Arrow Lake Intel Next-Gen 3D Client Architecture Platform with Foveros},
  author={Wilfred Gomes and Slade Morgan and Boyd Phelps and Tim Wilson and Erik G. Hallnor},
  journal={2022 IEEE Hot Chips 34 Symposium (HCS)},
  year={2022},
  pages={1-40},
  url={https://api.semanticscholar.org/CorpusID:252551808}
}

23 Citations

Towards Real-Time LLM Inference on Heterogeneous Edge Platforms

This work addresses the lack of methodologies for maximizing performance on heterogeneous edge platforms by proposing techniques for distributing model inference across the platform's compute units.
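As a generic illustration of the idea (not the paper's actual algorithm), one simple way to distribute inference is to assign contiguous blocks of model layers to each compute unit in proportion to its measured throughput; the device names and throughput numbers below are hypothetical:

```python
def partition_layers(num_layers, device_tput):
    """Split `num_layers` contiguous layers across devices in proportion
    to their relative throughput. A toy sketch, not the paper's method."""
    total = sum(device_tput.values())
    items = list(device_tput.items())
    plan, start = {}, 0
    for idx, (dev, tput) in enumerate(items):
        if idx == len(items) - 1:
            count = num_layers - start  # last device takes the remainder
        else:
            count = round(num_layers * tput / total)
        plan[dev] = (start, start + count)  # half-open layer range
        start += count
    return plan

# Hypothetical relative throughputs for a CPU/GPU/NPU edge SoC.
plan = partition_layers(32, {"cpu": 1.0, "gpu": 4.0, "npu": 3.0})
print(plan)  # {'cpu': (0, 4), 'gpu': (4, 20), 'npu': (20, 32)}
```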

HAL: Hardware-assisted Load Balancing for Energy-efficient SNIC-Host Cooperative Computing

HAL, consisting of a hardware-based load balancer and an intelligent load balancing policy implemented inside the SNIC, can improve the system-wide energy efficiency and throughput of the server running these functions by 31% and 10%, respectively, without notably increasing the tail latency.
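HAL's load balancer is implemented in SNIC hardware; as a purely software sketch of the underlying policy idea, a dispatcher can route each request to whichever target currently has the most remaining headroom. The class, capacities, and costs below are invented for illustration and do not reflect HAL's design:

```python
class LoadBalancer:
    """Toy least-headroom dispatch policy between a SmartNIC and host cores.

    A software illustration only; HAL's actual balancer is a hardware
    mechanism with its own policy.
    """

    def __init__(self, snic_capacity, host_capacity):
        self.load = {"snic": 0, "host": 0}
        self.capacity = {"snic": snic_capacity, "host": host_capacity}

    def dispatch(self, cost):
        # Pick the target with the most remaining headroom (capacity - load).
        target = max(self.capacity, key=lambda t: self.capacity[t] - self.load[t])
        self.load[target] += cost
        return target

    def complete(self, target, cost):
        # Release capacity when a request finishes.
        self.load[target] -= cost


lb = LoadBalancer(snic_capacity=10, host_capacity=20)
print(lb.dispatch(5), lb.dispatch(8), lb.dispatch(1))  # host host snic
```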

EIGEN: Enabling Efficient 3DIC Interconnect with Heterogeneous Dual-Layer Network-on-Active-Interposer

This work proposes a heterogeneous dual-layer interconnect architecture, EIGEN, for chiplet-interposer systems, along with a reinforcement learning (RL)-based routing framework, which can provide efficient and flexible communication for chiplet-based 3DICs.

Benchmarking Edge AI Platforms for High-Performance ML Inference

A comprehensive study comparing the latency and throughput of various linear algebra and neural network inference tasks across CPU-only, CPU/GPU, and CPU/NPU integrated solutions finds that the Neural Processing Unit (NPU) excels at matrix-vector multiplication and some neural network tasks, while GPU-based inference performs best at large dimensions and batch sizes.
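A minimal CPU-only harness in the spirit of such benchmarks, timing a memory-bound matrix-vector product against a compute-bound matrix-matrix product (the sizes are arbitrary; on a real edge platform the same wrapper would time NPU/GPU runtime calls instead):

```python
import time
import numpy as np

def bench(fn, *args, reps=20):
    """Median wall-clock latency of fn(*args) over `reps` runs."""
    times = []
    for _ in range(reps):
        t0 = time.perf_counter()
        fn(*args)
        times.append(time.perf_counter() - t0)
    return sorted(times)[len(times) // 2]

A = np.random.rand(1024, 1024).astype(np.float32)
x = np.random.rand(1024).astype(np.float32)
B = np.random.rand(1024, 1024).astype(np.float32)

lat_mv = bench(np.matmul, A, x)  # matrix-vector: memory-bound
lat_mm = bench(np.matmul, A, B)  # matrix-matrix: compute-bound
print(f"matvec {lat_mv * 1e6:.1f} us, matmul {lat_mm * 1e3:.2f} ms")
```

The matrix-matrix case performs roughly a thousand times more arithmetic per byte loaded, which is why accelerators with large batch sizes favor it while matrix-vector work stresses memory bandwidth.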

Performance-Energy Characterization of ML Inference on Heterogeneous Edge AI Platforms

This study benchmarks a state-of-the-art heterogeneous platform consisting of a multicore CPU, an integrated GPU, and an NPU on various ML workloads, and observes that hybrid NPU-GPU implementation of LLM inference is superior to CPU-only inference in terms of energy consumption and end-to-end (E2E) latency.

37.4 SHINSAI: A 586mm2 Reusable Active TSV Interposer with Programmable Interconnect Fabric and 512Mb 3D Underdeck Memory

A reusable active TSV-interposer, known as SHINSAI, in 28nm-CMOS/TSV-middle/micro-bump (μbump) technology is presented, enabling the transformation of traditional multicore monolithic systems into compact two-layer stacking architectures.

Architecting Selective Refresh based Multi-Retention Cache for Heterogeneous System (ARMOUR)

ARMOUR is a mechanism for efficient management of memory accesses to a multi-retention LLC: based on the initial requester, cache blocks are allocated in the high-retention (CPU) or low-retention (GPU) zone, and blocks that are about to expire are either refreshed or written back.

Taming Offload Overheads in a Massively Parallel Open-Source RISC-V MPSoC: Analysis and Optimization

This work presents a detailed, cycle-accurate quantitative analysis of the offload overheads on Occamy, an open-source massively parallel RISC-V based heterogeneous MPSoC, and proposes a quantitative model to estimate the runtime of selected applications accounting for the offload overheads.
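The shape of such a runtime model can be sketched as a fixed offload overhead plus a parallelizable compute phase; the cycle counts below are illustrative, not Occamy's measured values:

```python
def estimate_runtime(work_cycles, cores, offload_cycles):
    """Toy analytical model: total runtime is a fixed offload overhead
    plus the compute phase divided across cores. Illustrative only."""
    return offload_cycles + work_cycles / cores

# Scaling from 8 to 64 cores shrinks the compute phase 8x, but the fixed
# offload cost caps the overall speedup (Amdahl-style saturation).
t_8 = estimate_runtime(1_000_000, 8, 50_000)
t_64 = estimate_runtime(1_000_000, 64, 50_000)
print(f"speedup: {t_8 / t_64:.2f}x with 8x the cores")
```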

Hello SME! Generating Fast Matrix Multiplication Kernels Using the Scalable Matrix Extension

This paper presents an in-depth study of SME on M4, and designs a just-in-time code generator for SME-based small matrix multiplications that outperform the vendor-optimized BLAS implementation in almost all tested configurations.
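The paper's generator emits SME instructions; as a toy analogue of the same just-in-time specialization idea, one can generate a fully unrolled kernel for a fixed small matrix shape at runtime:

```python
def gen_matmul(m, n, k):
    """Generate a fully unrolled matmul kernel for fixed (m, n, k).

    A toy analogue of JIT-specializing small matrix multiplications;
    the actual work emits SME machine code, not Python.
    """
    lines = ["def kernel(A, B, C):"]
    for i in range(m):
        for j in range(n):
            # Unroll the dot product for this output element completely.
            terms = " + ".join(f"A[{i}][{p}] * B[{p}][{j}]" for p in range(k))
            lines.append(f"    C[{i}][{j}] = {terms}")
    ns = {}
    exec("\n".join(lines), ns)
    return ns["kernel"]

# Specialize for a 2x2x2 shape and run it.
matmul_2x2 = gen_matmul(2, 2, 2)
A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = [[0, 0], [0, 0]]
matmul_2x2(A, B, C)
print(C)  # [[19, 22], [43, 50]]
```

Specialization pays off for small shapes because the loop bounds and indexing become compile-time constants, which is the same reason a JIT emitting shape-specific SME tiles can beat a general-purpose BLAS call.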

Into the Third Dimension: Architecture Exploration Tools for 3D Reconfigurable Acceleration Devices

The RAD-Gen framework is extended by integrating an upgraded version of the COFFE automatic transistor sizing tool, which supports 7 nm FinFETs with a more accurate, metal-aware area model for newer process technologies, and by implementing new RAD-Gen tools for modeling the inter-die connections and power distribution networks of 3D architectures.

One Reference

Intel 4 CMOS Technology Featuring Advanced FinFET Transistors optimized for High Density and High-Performance Computing

    B. Sell, S. An, N. Young
    Engineering, Computer Science
    2022 IEEE Symposium on VLSI Technology and…
  • 2022
A new advanced CMOS FinFET technology, Intel 4, is introduced that extends Moore's law by offering 2X area scaling of the high-performance logic library and greater than 20% performance gain at …
