Meteor Lake and Arrow Lake Intel Next-Gen 3D Client Architecture Platform with Foveros

@article{Gomes2022MeteorLA,
  title={Meteor Lake and Arrow Lake Intel Next-Gen 3D Client Architecture Platform with Foveros},
  author={Wilfred Gomes and Slade Morgan and Boyd Phelps and Tim Wilson and Erik G. Hallnor},
  journal={2022 IEEE Hot Chips 34 Symposium (HCS)},
  year={2022},
  pages={1-40},
  url={https://api.semanticscholar.org/CorpusID:252551808}
}

23 Citations

Towards Real-Time LLM Inference on Heterogeneous Edge Platforms

This work addresses the lack of methodologies for maximizing performance on heterogeneous edge platforms by proposing techniques for distributing model inference across the platform's compute units.
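As a generic illustration of the idea (not the paper's actual algorithm), one simple way to distribute inference is to assign contiguous blocks of model layers to each compute unit in proportion to its measured throughput; the device names and throughput numbers below are hypothetical:

```python
def partition_layers(num_layers, device_tput):
    """Split `num_layers` contiguous layers across devices in proportion
    to their relative throughput. A toy sketch, not the paper's method."""
    total = sum(device_tput.values())
    items = list(device_tput.items())
    plan, start = {}, 0
    for idx, (dev, tput) in enumerate(items):
        if idx == len(items) - 1:
            count = num_layers - start  # last device takes the remainder
        else:
            count = round(num_layers * tput / total)
        plan[dev] = (start, start + count)  # half-open layer range
        start += count
    return plan

# Hypothetical relative throughputs for a CPU/GPU/NPU edge SoC.
plan = partition_layers(32, {"cpu": 1.0, "gpu": 4.0, "npu": 3.0})
print(plan)  # {'cpu': (0, 4), 'gpu': (4, 20), 'npu': (20, 32)}
```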

HAL: Hardware-assisted Load Balancing for Energy-efficient SNIC-Host Cooperative Computing

HAL, consisting of a hardware-based load balancer and an intelligent load balancing policy implemented inside the SNIC, can improve the system-wide energy efficiency and throughput of the server running these functions by 31% and 10%, respectively, without notably increasing the tail latency.
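HAL's load balancer is implemented in SNIC hardware; as a purely software sketch of the underlying policy idea, a dispatcher can route each request to whichever target currently has the most remaining headroom. The class, capacities, and costs below are invented for illustration and do not reflect HAL's design:

```python
class LoadBalancer:
    """Toy least-headroom dispatch policy between a SmartNIC and host cores.

    A software illustration only; HAL's actual balancer is a hardware
    mechanism with its own policy.
    """

    def __init__(self, snic_capacity, host_capacity):
        self.load = {"snic": 0, "host": 0}
        self.capacity = {"snic": snic_capacity, "host": host_capacity}

    def dispatch(self, cost):
        # Pick the target with the most remaining headroom (capacity - load).
        target = max(self.capacity, key=lambda t: self.capacity[t] - self.load[t])
        self.load[target] += cost
        return target

    def complete(self, target, cost):
        # Release capacity when a request finishes.
        self.load[target] -= cost


lb = LoadBalancer(snic_capacity=10, host_capacity=20)
print(lb.dispatch(5), lb.dispatch(8), lb.dispatch(1))  # host host snic
```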

EIGEN: Enabling Efficient 3DIC Interconnect with Heterogeneous Dual-Layer Network-on-Active-Interposer

This work proposes a heterogeneous dual-layer interconnect architecture, EIGEN, for chiplet-interposer systems, along with a reinforcement learning (RL)-based routing framework, which can provide efficient and flexible communication for chiplet-based 3DICs.

Benchmarking Edge AI Platforms for High-Performance ML Inference

A comprehensive study comparing the latency and throughput of various linear algebra and neural network inference tasks across CPU-only, CPU/GPU, and CPU/NPU integrated solutions finds that the Neural Processing Unit (NPU) excels at matrix-vector multiplication and some neural network tasks, while GPU-based inference performs best at large dimensions and batch sizes.
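A minimal CPU-only harness in the spirit of such benchmarks, timing a memory-bound matrix-vector product against a compute-bound matrix-matrix product (the sizes are arbitrary; on a real edge platform the same wrapper would time NPU/GPU runtime calls instead):

```python
import time
import numpy as np

def bench(fn, *args, reps=20):
    """Median wall-clock latency of fn(*args) over `reps` runs."""
    times = []
    for _ in range(reps):
        t0 = time.perf_counter()
        fn(*args)
        times.append(time.perf_counter() - t0)
    return sorted(times)[len(times) // 2]

A = np.random.rand(1024, 1024).astype(np.float32)
x = np.random.rand(1024).astype(np.float32)
B = np.random.rand(1024, 1024).astype(np.float32)

lat_mv = bench(np.matmul, A, x)  # matrix-vector: memory-bound
lat_mm = bench(np.matmul, A, B)  # matrix-matrix: compute-bound
print(f"matvec {lat_mv * 1e6:.1f} us, matmul {lat_mm * 1e3:.2f} ms")
```

The matrix-matrix case performs roughly a thousand times more arithmetic per byte loaded, which is why accelerators with large batch sizes favor it while matrix-vector work stresses memory bandwidth.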

Performance-Energy Characterization of ML Inference on Heterogeneous Edge AI Platforms

This study benchmarks a state-of-the-art heterogeneous platform consisting of a multicore CPU, an integrated GPU, and an NPU on various ML workloads, and observes that hybrid NPU-GPU implementation of LLM inference is superior to CPU-only inference in terms of energy consumption and end-to-end (E2E) latency.

37.4 SHINSAI: A 586mm2 Reusable Active TSV Interposer with Programmable Interconnect Fabric and 512Mb 3D Underdeck Memory

A reusable active TSV-interposer, known as SHINSAI, in 28nm-CMOS/TSV-middle/micro-bump (μbump) technology is presented, enabling the transformation of traditional multicore monolithic systems into compact two-layer stacking architectures.

Architecting Selective Refresh based Multi-Retention Cache for Heterogeneous System (ARMOUR)

ARMOUR is a mechanism for efficient management of memory accesses to a multi-retention LLC: based on the initial requester, cache blocks are allocated in the high-retention (CPU) or low-retention (GPU) zone, and blocks that are about to expire are either refreshed or written back.

Taming Offload Overheads in a Massively Parallel Open-Source RISC-V MPSoC: Analysis and Optimization

This work presents a detailed, cycle-accurate quantitative analysis of the offload overheads on Occamy, an open-source massively parallel RISC-V based heterogeneous MPSoC, and proposes a quantitative model to estimate the runtime of selected applications accounting for the offload overheads.
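The shape of such a runtime model can be sketched as a fixed offload overhead plus a parallelizable compute phase; the cycle counts below are illustrative, not Occamy's measured values:

```python
def estimate_runtime(work_cycles, cores, offload_cycles):
    """Toy analytical model: total runtime is a fixed offload overhead
    plus the compute phase divided across cores. Illustrative only."""
    return offload_cycles + work_cycles / cores

# Scaling from 8 to 64 cores shrinks the compute phase 8x, but the fixed
# offload cost caps the overall speedup (Amdahl-style saturation).
t_8 = estimate_runtime(1_000_000, 8, 50_000)
t_64 = estimate_runtime(1_000_000, 64, 50_000)
print(f"speedup: {t_8 / t_64:.2f}x with 8x the cores")
```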

Hello SME! Generating Fast Matrix Multiplication Kernels Using the Scalable Matrix Extension

This paper presents an in-depth study of SME on M4, and designs a just-in-time code generator for SME-based small matrix multiplications that outperform the vendor-optimized BLAS implementation in almost all tested configurations.
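The paper's generator emits SME instructions; as a toy analogue of the same just-in-time specialization idea, one can generate a fully unrolled kernel for a fixed small matrix shape at runtime:

```python
def gen_matmul(m, n, k):
    """Generate a fully unrolled matmul kernel for fixed (m, n, k).

    A toy analogue of JIT-specializing small matrix multiplications;
    the actual work emits SME machine code, not Python.
    """
    lines = ["def kernel(A, B, C):"]
    for i in range(m):
        for j in range(n):
            # Unroll the dot product for this output element completely.
            terms = " + ".join(f"A[{i}][{p}] * B[{p}][{j}]" for p in range(k))
            lines.append(f"    C[{i}][{j}] = {terms}")
    ns = {}
    exec("\n".join(lines), ns)
    return ns["kernel"]

# Specialize for a 2x2x2 shape and run it.
matmul_2x2 = gen_matmul(2, 2, 2)
A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = [[0, 0], [0, 0]]
matmul_2x2(A, B, C)
print(C)  # [[19, 22], [43, 50]]
```

Specialization pays off for small shapes because the loop bounds and indexing become compile-time constants, which is the same reason a JIT emitting shape-specific SME tiles can beat a general-purpose BLAS call.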

Into the Third Dimension: Architecture Exploration Tools for 3D Reconfigurable Acceleration Devices

The RAD-Gen framework is extended by integrating an upgraded version of the COFFE automatic transistor sizing tool, which supports 7 nm FinFETs with a more accurate, metal-aware area model for newer process technologies, and by implementing new RAD-Gen tools for modeling the inter-die connections and power distribution networks of 3D architectures.

One Reference

Intel 4 CMOS Technology Featuring Advanced FinFET Transistors optimized for High Density and High-Performance Computing

    B. Sell, S. An, N. Young
    Engineering, Computer Science
    2022 IEEE Symposium on VLSI Technology and…
  • 2022
A new advanced CMOS FinFET technology, Intel 4, is introduced that extends Moore's law by offering 2X area scaling of the high-performance logic library and greater than 20% performance gain at …
