ARM big.LITTLE

From Wikipedia, the free encyclopedia

(Redirected from DynamIQ)

Heterogeneous computing architecture

Cortex A57/A53 MPCore big.LITTLE CPU chip

ARM big.LITTLE is a heterogeneous computing architecture developed by Arm Holdings, coupling relatively battery-saving and slower processor cores (LITTLE) with relatively more powerful and power-hungry ones (big). The intention is to create a multi-core processor that can adjust better to dynamic computing needs and use less power than clock scaling alone. ARM's marketing material promises up to 75% savings in power usage for some activities.^[1] Most commonly, ARM big.LITTLE architectures are used to create a multi-processor system-on-chip (MPSoC).

In October 2011, big.LITTLE was announced along with the Cortex-A7, which was designed to be architecturally compatible with the Cortex-A15.^[2] In October 2012, ARM announced the Cortex-A53 and Cortex-A57 (ARMv8-A) cores, which are also intercompatible to allow their use in a big.LITTLE chip.^[3] ARM later announced the Cortex-A12 at Computex 2013, followed by the Cortex-A17 in February 2014. Both the Cortex-A12 and the Cortex-A17 can also be paired in a big.LITTLE configuration with the Cortex-A7.^[4]^[5]

The problem that big.LITTLE solves

For a given library of CMOS logic, active power increases as the logic switches more times per second, while leakage increases with the number of transistors. CPUs designed to run fast are therefore built differently from CPUs designed to save power. When a very fast out-of-order CPU is idling at very low speeds, a CPU with much less leakage (fewer transistors) could do the same work. Such a CPU might, for example, use a smaller memory cache (fewer transistors) or a simpler microarchitecture that omits out-of-order execution. big.LITTLE is a way to optimize for both cases, power and speed, in the same system.
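
The trade-off described above can be illustrated with the common first-order CMOS approximation, in which dynamic power is proportional to switched capacitance, the square of the supply voltage, and the clock frequency, while leakage is treated as a constant that grows with transistor count. The following is a minimal sketch only: the `Core` class, the two example cores, and every number in it are illustrative assumptions, not figures for any real ARM core, and it assumes the light workload fits on either core at the chosen frequency.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Core:
    """Toy core description; all values are illustrative assumptions."""
    name: str
    switched_capacitance_nf: float  # effective capacitance toggled per cycle
    leakage_mw: float               # static power, grows with transistor count


def dynamic_power_mw(core: Core, volts: float, freq_mhz: float) -> float:
    # First-order CMOS approximation: P_dyn = C * V^2 * f.
    # nF * V^2 * MHz works out to milliwatts (1e-9 * 1e6 = 1e-3).
    return core.switched_capacitance_nf * volts ** 2 * freq_mhz


def total_power_mw(core: Core, volts: float, freq_mhz: float) -> float:
    # Total power is the switching (active) part plus the leakage part.
    return dynamic_power_mw(core, volts, freq_mhz) + core.leakage_mw


# Hypothetical cores: the big one has more transistors, hence more
# capacitance to switch and more leakage.
BIG = Core("big", switched_capacitance_nf=1.0, leakage_mw=150.0)
LITTLE = Core("LITTLE", switched_capacitance_nf=0.3, leakage_mw=20.0)

if __name__ == "__main__":
    for core in (BIG, LITTLE):
        print(core.name, round(total_power_mw(core, volts=0.9, freq_mhz=400), 1), "mW")
```

Under these assumed numbers the LITTLE core draws less total power at the same low operating point, mainly because its leakage term is much smaller.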

In practice, a big.LITTLE system can be surprisingly inflexible. One issue is the number and types of power and clock domains that the IC provides. These may not match the standard power management features offered by an operating system. Another is that the CPUs no longer have equivalent abilities, and matching the right software task to the right CPU becomes more difficult. Most of these problems are being solved by making the electronics and software more flexible.

Run-state migration

There are three ways^[6] for the different processor cores to be arranged in a big.LITTLE design, depending on the scheduler implemented in the kernel.^[7]

Clustered switching

The clustered model approach is the first and simplest implementation, arranging the processor into identically sized clusters of "big" or "LITTLE" cores. The operating system scheduler can only see one cluster at a time; when the load on the whole processor changes between low and high, the system transitions to the other cluster. All relevant data are then passed through the common L2 cache, the active core cluster is powered off and the other one is activated. A Cache Coherent Interconnect (CCI) is used. This model has been implemented in the Samsung Exynos 5 Octa (5410).^[8]
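
The whole-processor switching decision described above might be sketched as follows. This is a toy model, not the behaviour of any shipping SoC: the `ClusterSwitcher` class is hypothetical, and the two thresholds (which add hysteresis so the system does not bounce between clusters around a single cut-off) are assumptions that the text does not specify.

```python
class ClusterSwitcher:
    """Toy model of clustered switching: only one cluster is active at a time."""

    def __init__(self, up_threshold: float = 0.8, down_threshold: float = 0.3):
        # Hypothetical thresholds on whole-processor load in [0, 1].
        self.up_threshold = up_threshold
        self.down_threshold = down_threshold
        self.active = "LITTLE"

    def update(self, total_load: float) -> str:
        """Pick the active cluster for the given whole-processor load."""
        if self.active == "LITTLE" and total_load > self.up_threshold:
            self.active = "big"      # high load: hand over to the big cluster
        elif self.active == "big" and total_load < self.down_threshold:
            self.active = "LITTLE"   # low load: fall back to the LITTLE cluster
        return self.active


if __name__ == "__main__":
    switcher = ClusterSwitcher()
    for load in (0.2, 0.9, 0.5, 0.1):
        print(load, switcher.update(load))
```

Because the decision is made for the processor as a whole, a single demanding thread moves every core to the big cluster, which is the coarseness the later models address.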

In-kernel switcher (CPU migration)

CPU migration via the in-kernel switcher (IKS) involves pairing up a 'big' core with a 'LITTLE' core, with possibly many identical pairs in one chip. Each pair operates as one so-called virtual core, and only one real core is (fully) powered up and running at a time. The 'big' core is used when the demand is high and the 'LITTLE' core is employed when demand is low. When demand on the virtual core changes (between high and low), the incoming core is powered up, running state is transferred, the outgoing core is shut down, and processing continues on the new core. Switching is done via the cpufreq framework. A complete big.LITTLE IKS implementation was added in Linux 3.11. big.LITTLE IKS is an improvement of cluster migration (§ Clustered switching), the main difference being that each pair is visible to the scheduler.
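
The hand-over sequence for one pair (power up the incoming core, transfer running state, shut down the outgoing core, continue) can be sketched as below. This is only an illustration of the ordering: `PhysicalCore` and `VirtualCore` are hypothetical names, and a plain dictionary stands in for registers and other running state. It is not the Linux switcher code.

```python
class PhysicalCore:
    def __init__(self, kind: str):
        self.kind = kind
        self.powered = False
        self.state = None  # stands in for registers and other running state


class VirtualCore:
    """One big/LITTLE pair presented to the scheduler as a single core."""

    def __init__(self):
        self.cores = {"big": PhysicalCore("big"), "LITTLE": PhysicalCore("LITTLE")}
        self.active = self.cores["LITTLE"]
        self.active.powered = True
        self.active.state = {"pc": 0}

    def switch_to(self, kind: str) -> None:
        incoming = self.cores[kind]
        outgoing = self.active
        if incoming is outgoing:
            return                        # already running on the requested core
        incoming.powered = True           # 1. power up the incoming core
        incoming.state = outgoing.state   # 2. transfer the running state
        outgoing.state = None
        outgoing.powered = False          # 3. shut down the outgoing core
        self.active = incoming            # 4. processing continues on the new core


if __name__ == "__main__":
    vcore = VirtualCore()
    vcore.active.state["pc"] = 42
    vcore.switch_to("big")
    print(vcore.active.kind, vcore.active.state)
```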

A more complex arrangement involves a non-symmetric grouping of 'big' and 'LITTLE' cores. A single chip could have one or two 'big' cores and many more 'LITTLE' cores, or vice versa. Nvidia created something similar to this with the low-power 'companion core' in their Tegra 3 System-on-Chip.

Heterogeneous multi-processing (global task scheduling)

big.LITTLE heterogeneous multi-processing

The most powerful use model of big.LITTLE architecture is heterogeneous multi-processing (HMP), which enables the use of all physical cores at the same time. Threads with high priority or computational intensity can in this case be allocated to the "big" cores, while threads with lower priority or less computational intensity, such as background tasks, can be performed by the "LITTLE" cores.^[9]

This model has been implemented in the Samsung Exynos starting with the Exynos 5 Octa series (5420, 5422, 5430),^[10]^[11] and Apple A series processors starting with the Apple A11.^[12]
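
As a rough illustration of the HMP idea, the sketch below sends demanding threads to big cores and light threads to LITTLE cores, with every core usable at once. The function name, the single load threshold, and the round-robin spreading are assumptions made for the example; real schedulers track load over time and use considerably more elaborate heuristics.

```python
from itertools import cycle


def place_threads(threads, big_cores, little_cores, threshold=0.6):
    """Assign (name, load) threads to cores; load is in [0, 1].

    Threads at or above the hypothetical threshold go to big cores, the
    rest to LITTLE cores, spread round-robin within each group.
    Both core lists are assumed to be non-empty.
    """
    placement = {core: [] for core in (*big_cores, *little_cores)}
    next_big, next_little = cycle(big_cores), cycle(little_cores)
    for name, load in threads:
        core = next(next_big) if load >= threshold else next(next_little)
        placement[core].append(name)
    return placement


if __name__ == "__main__":
    threads = [("game", 0.9), ("sync", 0.1), ("render", 0.8), ("mail", 0.2)]
    print(place_threads(threads, ["big0", "big1"], ["little0", "little1"]))
```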

Scheduling

The paired arrangement allows for switching to be done transparently to the operating system using the existing dynamic voltage and frequency scaling (DVFS) facility. The existing DVFS support in the kernel (e.g. cpufreq in Linux) will simply see a list of frequencies/voltages and will switch between them as it sees fit, just like it does on the existing hardware. However, the low-end slots will activate the 'LITTLE' core and the high-end slots will activate the 'big' core. This is the early solution provided by Linux's "deadline" CPU scheduler (not to be confused with the I/O scheduler with the same name) since 2012.^[13]
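
The single list of operating points seen by the DVFS layer can be pictured as a merged table in which each slot also records which physical core backs it. The sketch below uses made-up frequencies and hypothetical helper names, and it assumes that every big-core slot is faster than every LITTLE-core slot so that the merged table stays in ascending order; real tables are SoC-specific.

```python
# Hypothetical operating points in kHz; real tables are SoC-specific.
LITTLE_FREQS = [200_000, 400_000, 600_000]
BIG_FREQS = [800_000, 1_200_000, 1_600_000]


def build_virtual_table(little_freqs, big_freqs):
    """Merge both cores' operating points into one list of (kHz, core) slots.

    Assumes every big slot is faster than every LITTLE slot, so the
    result is in ascending order.
    """
    table = [(freq, "LITTLE") for freq in sorted(little_freqs)]
    table += [(freq, "big") for freq in sorted(big_freqs)]
    return table


def select_slot(table, requested_khz):
    """Return the lowest slot that satisfies the request, else the top slot."""
    for freq, core in table:
        if freq >= requested_khz:
            return freq, core
    return table[-1]


if __name__ == "__main__":
    table = build_virtual_table(LITTLE_FREQS, BIG_FREQS)
    for request in (300_000, 1_000_000):
        print(request, "->", select_slot(table, request))
```

The governor only ever asks for a frequency; which core ends up running is a side effect of the slot it lands on.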

Alternatively, all the cores may be exposed to the kernel scheduler, which will decide where each process/thread is executed. This will be required for the non-paired arrangement but could possibly also be used on the paired cores. It poses unique problems for the kernel scheduler, which, at least with modern commodity hardware, has been able to assume all cores in an SMP system are equal rather than heterogeneous. A 2019 addition to Linux 5.0 called Energy Aware Scheduling is an example of a scheduler that considers cores differently.^[14]^[15]
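
To show what it can mean for a scheduler to treat cores differently, the sketch below picks, among the CPUs with enough spare capacity for a task, the one with the lowest estimated energy cost. It is a deliberately simplified illustration and not the algorithm used by Linux's Energy Aware Scheduling; the `Cpu` fields, the per-unit energy figures, and the capacity numbers are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Cpu:
    name: str
    capacity: int           # maximum work the CPU can take on (big > LITTLE)
    util: int               # capacity already in use
    energy_per_unit: float  # hypothetical energy cost of one unit of work


def pick_cpu(task_util, cpus):
    """Return the cheapest CPU that can fit the task, or None if none can."""
    candidates = [cpu for cpu in cpus if cpu.util + task_util <= cpu.capacity]
    if not candidates:
        return None
    return min(candidates, key=lambda cpu: cpu.energy_per_unit * task_util)


if __name__ == "__main__":
    cpus = [Cpu("little0", 400, 100, 1.0), Cpu("big0", 1024, 200, 2.5)]
    print(pick_cpu(200, cpus).name)  # small task fits on the cheaper core
    print(pick_cpu(500, cpus).name)  # larger task only fits on the big core
```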

Advantages of global task scheduling

Finer-grained control of workloads that are migrated between cores. Because the scheduler is directly migrating tasks between cores, kernel overhead is reduced and power savings can be correspondingly increased.
Implementation in the scheduler also makes switching decisions faster than in the cpufreq framework implemented in IKS.
The ability to easily support non-symmetrical clusters (e.g. with 2 Cortex-A15 cores and 4 Cortex-A7 cores).
The ability to use all cores simultaneously to provide improved peak performance throughput of the SoC compared to IKS.

Successor

In May 2017, ARM announced DynamIQ as the successor to big.LITTLE.^[16] DynamIQ is expected to allow for more flexibility and scalability when designing multi-core processors. In contrast to big.LITTLE, it increases the maximum number of cores in a cluster to 8 for Armv8.2 CPUs, 12 for Armv9 and 14 for Armv9.2,^[17] and allows for varying core designs within a single cluster and up to 32 total clusters. The technology also offers more fine-grained per-core voltage control and faster L2 cache speeds. However, DynamIQ is incompatible with previous ARM designs and is initially only supported by the Cortex-A75 and Cortex-A55 CPU cores and their successors.
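
The cluster limits quoted above can be written down as a small configuration check. This is only a sketch that encodes the numbers from the paragraph; the function name, the argument shapes, and the message wording are hypothetical and do not correspond to any ARM tool.

```python
# Limits as quoted in the text above; all names here are hypothetical.
MAX_CORES_PER_CLUSTER = {"Armv8.2": 8, "Armv9": 12, "Armv9.2": 14}
MAX_CLUSTERS = 32


def check_design(arch, clusters):
    """Return a list of limit violations for a proposed design.

    clusters is a list of clusters, each a list of core names; a single
    cluster may mix different core designs.
    """
    problems = []
    limit = MAX_CORES_PER_CLUSTER[arch]
    if len(clusters) > MAX_CLUSTERS:
        problems.append(f"{len(clusters)} clusters exceeds {MAX_CLUSTERS}")
    for index, cores in enumerate(clusters):
        if len(cores) > limit:
            problems.append(f"cluster {index}: {len(cores)} cores exceeds {limit}")
    return problems


if __name__ == "__main__":
    mixed = [["Cortex-A75"] * 4 + ["Cortex-A55"] * 4]
    print(check_design("Armv8.2", mixed))                  # within the limits
    print(check_design("Armv8.2", [["Cortex-A55"] * 9]))   # one core too many
```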