Hyper-threading

From Wikipedia, the free encyclopedia

Proprietary simultaneous multithreading implementation by Intel

In this high-level depiction of HTT, instructions are fetched from RAM (differently colored boxes represent the instructions of four different processes), decoded and reordered by the front end (white boxes represent pipeline bubbles), and passed to the execution core capable of executing instructions from two different programs during the same clock cycle.^[1]^[2]^[3]

Hyper-threading (officially called Hyper-Threading Technology or HT Technology and abbreviated as HTT or HT) is Intel's proprietary simultaneous multithreading (SMT) implementation used to improve parallelization of computations (doing multiple tasks at once) performed on x86 microprocessors. It was introduced on Xeon server processors in February 2002 and on Pentium 4 desktop processors in November 2002.^[4] Since then, Intel has included this technology in Itanium, Atom, and Core series CPUs, among others.^[5]

For each processor core that is physically present, the operating system addresses two virtual (logical) cores and shares the workload between them when possible. The main function of hyper-threading is to increase the number of independent instructions in the pipeline; it takes advantage of superscalar architecture, in which multiple instructions operate on separate data in parallel. With HTT, one physical core appears as two processors to the operating system, allowing concurrent scheduling of two processes per core. In addition, two or more processes can use the same resources: if resources for one process are not available, then another process can continue if its resources are available.

In addition to requiring simultaneous multithreading support in the operating system, hyper-threading can be properly utilized only with an operating system specifically optimized for it.^[6]

Overview

A 3 GHz model of the Intel Pentium 4 processor that incorporates Hyper-Threading Technology^[7]

Hyper-Threading Technology is a form of simultaneous multithreading technology introduced by Intel, while the concept behind the technology has been patented by Sun Microsystems. Architecturally, a processor with Hyper-Threading Technology consists of two logical processors per core, each of which has its own processor architectural state. Each logical processor can be individually halted, interrupted or directed to execute a specified thread, independently from the other logical processor sharing the same physical core.^[8]

Unlike a traditional dual-processor configuration that uses two separate physical processors, the logical processors in a hyper-threaded core share the execution resources. These resources include the execution engine, caches, and system bus interface; the sharing of resources allows two logical processors to work with each other more efficiently, and allows a logical processor to borrow resources from a stalled logical core (assuming both logical cores are associated with the same physical core). A processor stalls when it must wait for data it has requested, in order to finish processing the present thread. The degree of benefit seen when using a hyper-threaded, or multi-core, processor depends on the needs of the software, and how well it and the operating system are written to manage the processor efficiently.^[8]

Hyper-threading works by duplicating certain sections of the processor (those that store the architectural state) but not duplicating the main execution resources. This allows a hyper-threading processor to appear as the usual "physical" processor plus an extra "logical" processor to the host operating system (HTT-unaware operating systems see two "physical" processors), allowing the operating system to schedule two threads or processes simultaneously and appropriately. When execution resources in a hyper-threaded processor are not in use by the current task, and especially when the processor is stalled, those execution resources can be used to execute another scheduled task. (The processor may stall due to a cache miss, branch misprediction, or data dependency.)^[9]
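The effect of filling stall cycles can be illustrated with a toy model. The sketch below is not a description of any real Intel pipeline: it assumes a single shared execution unit, and it represents each thread as a hypothetical trace string in which "X" is a cycle that needs the execution unit and "." is a stall cycle (for example, waiting on a cache miss). The function names and the trace encoding are invented for this illustration.

```python
def serial_cycles(traces):
    """Cycles needed when the threads run one after another on the core."""
    return sum(len(trace) for trace in traces)


def smt_cycles(traces):
    """Cycles needed when threads share one execution unit (toy SMT model).

    Per cycle, every stalled thread makes progress on its stall, and at
    most one thread that wants the execution unit is allowed to use it.
    Priority rotates between threads each cycle.
    """
    pos = [0] * len(traces)  # next operation index for each thread
    cycles = 0
    turn = 0                 # which thread gets first claim this cycle
    n = len(traces)
    while any(p < len(t) for p, t in zip(pos, traces)):
        cycles += 1
        unit_free = True
        for k in range(n):
            i = (turn + k) % n
            if pos[i] >= len(traces[i]):
                continue                 # thread already finished
            if traces[i][pos[i]] == ".":
                pos[i] += 1              # stall cycles overlap freely
            elif unit_free:
                pos[i] += 1              # this thread executes
                unit_free = False
        turn = (turn + 1) % n
    return cycles


stall_heavy = ["X.X.", ".X.X"]   # stalls of one thread line up with work of the other
compute_bound = ["XXXX", "XXXX"]  # both threads always want the execution unit

print(serial_cycles(stall_heavy), smt_cycles(stall_heavy))      # 8 4
print(serial_cycles(compute_bound), smt_cycles(compute_bound))  # 8 8
```

In this model the stall-heavy pair finishes in half the cycles because one thread's stalls are covered by the other's work, while the compute-bound pair gains nothing, which is consistent with the application-dependent results reported for real hardware.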

This technology is transparent to operating systems and programs. The minimum that is required to take advantage of hyper-threading is symmetric multiprocessing (SMP) support in the operating system, since the logical processors appear no different to the operating system than physical processors.
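Because logical processors look like ordinary processors, software that wants to know which ones share a physical core has to ask the platform. As one illustration (Linux-specific, and not something this article's sources describe), Linux exposes a per-CPU text file, /sys/devices/system/cpu/cpuN/topology/thread_siblings_list, that lists the logical CPUs sharing a core in a form such as "0,4" or "0-1". The sketch below parses strings in that form and groups logical CPUs by core; the helper names and the sample data are hypothetical.

```python
from collections import defaultdict


def parse_cpu_list(text):
    """Parse a CPU list such as "0,4" or "0-1,8-9" into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def group_by_physical_core(siblings_by_cpu):
    """Group logical CPUs that report the same sibling set (same physical core)."""
    cores = defaultdict(list)
    for cpu, text in sorted(siblings_by_cpu.items()):
        cores[tuple(parse_cpu_list(text))].append(cpu)
    return dict(cores)


# Hypothetical sample: what the sibling files might contain on a machine
# with two physical cores and two logical CPUs per core.
sample = {0: "0,2", 1: "1,3", 2: "0,2", 3: "1,3"}
cores = group_by_physical_core(sample)
print(len(sample), "logical CPUs on", len(cores), "physical cores")
# 4 logical CPUs on 2 physical cores
```

On a real Linux system the dictionary would be filled by reading the sibling file of each CPU; the sketch takes the strings as input so that it runs anywhere.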

It is possible to optimize operating system behavior on multi-processor, hyper-threading capable systems. For example, consider an SMP system with two physical processors that are both hyper-threaded (for a total of four logical processors). If the operating system's thread scheduler is unaware of hyper-threading, it will treat all four logical processors the same. If only two threads are eligible to run, it might choose to schedule those threads on the two logical processors that happen to belong to the same physical processor. That processor would be extremely busy, and would share execution resources, while the other processor would remain idle, leading to poorer performance than if the threads were scheduled on different physical processors. This problem can be avoided by improving the scheduler to treat logical processors differently from physical processors, which is, in a sense, a limited form of the scheduler changes required for NUMA systems.
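The two-processor example above can be sketched as a placement decision. This is a simplified illustration, not the algorithm of any particular operating system: core_of is a hypothetical mapping from logical processor number to physical processor number, and busy is the set of logical processors already running a thread.

```python
def naive_pick(core_of, busy):
    """HT-unaware choice: take the lowest-numbered idle logical processor."""
    return min(cpu for cpu in core_of if cpu not in busy)


def ht_aware_pick(core_of, busy):
    """HT-aware choice: prefer an idle logical processor whose physical
    processor has the fewest busy logical processors."""
    def busy_siblings(cpu):
        return sum(1 for other in busy if core_of[other] == core_of[cpu])

    idle = [cpu for cpu in core_of if cpu not in busy]
    return min(idle, key=lambda cpu: (busy_siblings(cpu), cpu))


# Two physical processors (0 and 1), each exposing two logical processors.
core_of = {0: 0, 1: 0, 2: 1, 3: 1}
busy = {0}  # the first thread already runs on logical processor 0

print(naive_pick(core_of, busy))     # 1 (same physical processor as the first thread)
print(ht_aware_pick(core_of, busy))  # 2 (the idle physical processor)
```

Both functions raise ValueError when every logical processor is busy; a real scheduler would queue the thread instead.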

History

The first published paper describing what is now known as hyper-threading in a general purpose computer was written by Edward S. Davidson and Leonard E. Shar in 1973.^[10]

Denelcor, Inc. introduced multi-threading with the Heterogeneous Element Processor (HEP) in 1982. The HEP pipeline could not hold multiple instructions from the same process. Only one instruction from a given process was allowed to be present in the pipeline at any point in time. Should an instruction from a given process block the pipe, instructions from other processes would continue after the pipeline drained.

The US patent for the technology behind hyper-threading was granted to Kenneth Okin at Sun Microsystems in November 1994. At that time, CMOS process technology was not advanced enough to allow for a cost-effective implementation.^[11]

Intel implemented hyper-threading on an x86 architecture processor in 2002 with the Foster MP-based Xeon. It was also included on the 3.06 GHz Northwood-based Pentium 4 in the same year, and remained a feature of every Pentium 4 HT, Pentium 4 Extreme Edition and Pentium Extreme Edition processor thereafter. The Intel Core and Core 2 processor lines (2006) that succeeded the Pentium 4 model line did not use hyper-threading, because the Core microarchitecture was a descendant of the older P6 microarchitecture. The P6 microarchitecture was used in earlier iterations of Pentium processors, namely the Pentium Pro, Pentium II and Pentium III (plus their Celeron and Xeon derivatives at the time). Windows 2000 SP3 and Windows XP SP1 added support for hyper-threading.

Intel released the Nehalem microarchitecture (Core i7) in November 2008, in which hyper-threading made a return. The first generation Nehalem processors contained four physical cores and effectively scaled to eight threads. Since then, both two- and six-core models have been released, scaling to four and twelve threads respectively.^[12] Earlier Intel Atom cores were in-order processors, sometimes with hyper-threading ability, for low power mobile PCs and low-price desktop PCs.^[13] The Itanium 9300 launched with eight threads per processor (two threads per core) through enhanced hyper-threading technology. The next model, the Itanium 9500 (Poulson), features a 12-wide issue architecture, with eight CPU cores with support for eight more virtual cores via hyper-threading.^[14] The Intel Xeon 5500 server chips also utilize two-way hyper-threading.^[15]^[16]

Performance claims

According to Intel, the first hyper-threading implementation used only 5% more die area than the comparable non-hyperthreaded processor, but the performance was 15–30% better.^[17]^[18] Intel claims up to a 30% performance improvement compared with an otherwise identical, non-simultaneous multithreading Pentium 4. Tom's Hardware states: "In some cases a P4 running at 3.0 GHz with HT on can even beat a P4 running at 3.6 GHz with HT turned off."^[19] Intel also claims significant performance improvements with a hyper-threading-enabled Pentium 4 processor in some artificial-intelligence algorithms.
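The figures quoted above can be put side by side with a little arithmetic. The sketch below rests on a simplifying assumption that is not made by the sources: that performance without hyper-threading scales linearly with clock frequency. Under that assumption, a 3.0 GHz part matching a 3.6 GHz part implies a throughput gain of at least 20% in those cases. The function names are invented for this illustration.

```python
def implied_gain(clock_ht_on_ghz, clock_ht_off_ghz):
    """Minimum throughput gain needed for the slower-clocked HT part to match
    the faster-clocked non-HT part (assumes linear scaling with clock)."""
    return clock_ht_off_ghz / clock_ht_on_ghz - 1.0


def gain_per_area(perf_gain, area_increase):
    """Ratio of relative performance to relative die area."""
    return (1.0 + perf_gain) / (1.0 + area_increase)


# Tom's Hardware comparison: 3.0 GHz with HT versus 3.6 GHz without.
print(f"{implied_gain(3.0, 3.6):.0%}")  # 20%

# Intel's claim: 15-30% more performance for 5% more die area.
print(f"{gain_per_area(0.15, 0.05):.2f} to {gain_per_area(0.30, 0.05):.2f}")
# 1.10 to 1.24
```

The 20% figure falls inside Intel's claimed 15–30% range, so under this assumption the two statements are at least consistent with each other.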

Overall the performance history of hyper-threading was a mixed one in the beginning. As one commentary on high-performance computing from November 2002 notes:^[20]

Hyper-Threading can improve the performance of some MPI applications, but not all. Depending on the cluster configuration and, most importantly, the nature of the application running on the cluster, performance gains can vary or even be negative. The next step is to use performance tools to understand what areas contribute to performance gains and what areas contribute to performance degradation.

As a result, performance improvements are very application-dependent;^[21] however, when running two programs that require full attention of the processor, it can actually seem like one or both of the programs slows down slightly when Hyper-Threading Technology is turned on.^[22] This is due to the replay system of the Pentium 4 tying up valuable execution resources, equalizing the processor resources between the two programs, which adds a varying amount of execution time. The Pentium 4 "Prescott" and the Xeon "Nocona" processors received a replay queue that reduces execution time needed for the replay system and completely overcomes the performance penalty.^[23]

According to a November 2009 analysis by Intel, hyper-threading increases overall latency when running the threads does not yield significant overall throughput gains, and those gains vary^[21] by application. In other words, hyper-threading can significantly increase overall processing latency, with the negative effect shrinking as more simultaneous threads make effective use of the additional hardware resources that hyper-threading makes available.^[24] A similar performance analysis is available for the effects of hyper-threading when used to handle tasks related to managing network traffic, such as processing interrupt requests generated by network interface controllers (NICs).^[25] Another paper claims no performance improvements when hyper-threading is used for interrupt handling.^[26]

Drawbacks

When the first HT processors were released, many operating systems were not optimized for hyper-threading technology (e.g. Windows 2000 and Linux older than 2.4).^[27]

In 2006, hyper-threading was criticised for energy inefficiency.^[28] For example, ARM (a specialized, low-power, CPU design company), stated that simultaneous multithreading can use up to 46% more power than ordinary dual-core designs. Furthermore, they claimed that SMT increases cache thrashing by 42%, whereas dual core results in a 37% decrease.^[29]

In 2010, ARM said it might include simultaneous multithreading in its future chips;^[30] however, this was rejected in favor of their 2012 64-bit design.^[31] ARM produced SMT cores in 2018.^[32]

In 2013, Intel dropped SMT in favor of out-of-order execution for its Silvermont processor cores, as they found this gave better performance with better power efficiency than a lower number of cores with SMT.^[33]

In 2017, it was revealed that Intel's Skylake and Kaby Lake processors had a bug in their implementation of hyper-threading that could cause data loss.^[34] Microcode updates were later released to address the issue.^[35]

In 2019, with Coffee Lake, Intel temporarily moved away from including hyper-threading in mainstream Core i7 desktop processors except for highest-end Core i9 parts or Pentium Gold CPUs.^[36] It also began to recommend disabling hyper-threading, as new CPU vulnerability attacks were revealed which could be mitigated by disabling HT.^[37]

Security

In May 2005, Colin Percival demonstrated that a malicious thread on a Pentium 4 can use a timing-based side-channel attack to monitor the memory access patterns of another thread with which it shares a cache, allowing the theft of cryptographic information. This is not actually a timing attack, as the malicious thread measures the time of only its own execution. Potential solutions to this include the processor changing its cache eviction strategy or the operating system preventing the simultaneous execution, on the same physical core, of threads with different privileges.^[38] In 2018 the OpenBSD operating system disabled hyper-threading "in order to avoid data potentially leaking from applications to other software" caused by the Foreshadow/L1TF vulnerabilities.^[39]^[40] In 2019 a set of vulnerabilities led to security experts recommending the disabling of hyper-threading on all devices.^[41]
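The operating-system mitigation mentioned above, keeping threads with different privileges off the same physical core at the same time, can be sketched as a packing rule. This is only an illustration of the idea and not the policy of any real kernel: each thread carries a hypothetical "domain" label standing in for its privilege level, and threads are grouped so that a physical core only ever hosts threads from one domain.

```python
from collections import defaultdict


def assign_to_cores(threads, threads_per_core=2):
    """Pack threads onto physical cores without mixing domains on a core.

    threads: list of (name, domain) pairs.
    Returns a list of cores, each a list of thread names.
    """
    by_domain = defaultdict(list)
    for name, domain in threads:
        by_domain[domain].append(name)

    cores = []
    for names in by_domain.values():
        # Fill cores with threads of one domain only; a leftover thread
        # gets a core to itself and its sibling slot stays empty.
        for i in range(0, len(names), threads_per_core):
            cores.append(names[i:i + threads_per_core])
    return cores


threads = [("a", "user1"), ("b", "kernel"), ("c", "user1"), ("d", "user2")]
print(assign_to_cores(threads))
# [['a', 'c'], ['b'], ['d']]
```

The cost of the rule is visible in the output: four threads occupy three physical cores instead of two, because two sibling slots are deliberately left idle.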

References

[1] Stokes, Jon (3 October 2002). "Introduction to Multithreading, Superthreading and Hyperthreading". Ars Technica. pp. 2–3. Retrieved 30 September 2015.
[2] Deborah T. Marr; Frank Binns; David L. Hill; Glenn Hinton; David A. Koufaty; J. Alan Miller; Michael Upton (12 December 2006). "Hyper-Threading Technology Architecture and Microarchitecture" (PDF). cs.sfu.ca. Archived from the original (PDF) on 23 September 2015. Retrieved 30 September 2015.
[3] Anand Lal Shimpi (5 October 2012). "The Haswell Front End – Intel's Haswell Architecture Analyzed". AnandTech. Archived from the original on 7 October 2012. Retrieved 30 September 2015.
[4] "Intel Pentium 4 3.06GHz CPU with Hyper-Threading Technology: Killing Two Birds with a Stone." X-bit labs. Archived from the original on 31 May 2014. Retrieved 4 June 2014.
[5] "Intel® Hyper-Threading Technology (Intel® HT Technology)". Intel. Retrieved 24 October 2021.
[6] Intel Required Components Interchangeability List for the Intel Pentium 4 Processor with HT Technology, includes list of Operating Systems that include optimizations for Hyper-Threading Technology; they are Windows XP Professional 64, Windows XP MCE, Windows XP Home, Windows XP Professional, some versions of Linux such as COSIX Linux 4.0, RedHat Linux 9 (Professional and Personal versions), RedFlag Linux Desktop 4.0 and SuSe Linux 8.2 (Professional and Personal versions).
[7] "Intel Processor Spec Finder: SL6WK".
[8] Thomadakis, Michael E. (17 March 2011). "The Architecture of the Nehalem Processor and Nehalem-EP SMP Platforms" (PDF). Texas A&M University. p. 23. Archived from the original (PDF) on 11 August 2014. Retrieved 21 March 2014.
[9] Hennessy, John L.; Patterson, David A. (7 December 2017). Computer Architecture: A Quantitative Approach. Asanović, Krste; Bakos, Jason D.; Colwell, Robert P.; Bhattacharjee, Abhishek; Conte, Thomas M. (Sixth ed.). Cambridge, MA. ISBN 978-0128119051. OCLC 983459758.
[10] "A multiminiprocessor system implemented through pipelining", by Leonard Shar and Edward Davidson, IEEE Computer, Feb. 1974, pp. 42-51, vol. 7. https://www.computer.org/csdl/magazine/co/1974/02/4251/13rRUyoyhIt
[11] Okin, Kenneth (1 November 1994), United States Patent: 5361337 - Method and apparatus for rapidly switching processes in a computer system, archived from the original on 21 September 2015, retrieved 24 May 2016.
[12] "Extreme Gaming with the Intel® Core™ i7 Processor Extreme Edition". www.intel.com. Archived from the original on 1 December 2008.
[13] "Intel® Atom™ Processor Microarchitecture". Intel.com. 18 March 2011. Retrieved 5 April 2011.
[14] "Intel Discloses New Itanium Poulson Features". Tomshardware.com. 24 August 2011. Retrieved 2 July 2017.
[15] "Server Processor Index Page". Intel.com. 18 March 2011. Retrieved 5 April 2011.
[16] "Intel Xeon Processor 5500 Series". Intel.com. Retrieved 5 April 2011.
[17] "Hyper-Threading Technology" (PDF). Intel Technology Journal. 06 (1). 14 February 2012. ISSN 1535-766X. Archived from the original (PDF) on 19 October 2012.
[18] "How to Determine the Effectiveness of Hyper-Threading Technology with an Application". software.intel.com. 28 April 2011. Archived from the original on 2 February 2010.
[19] "Summary: In Some Cases The P4 3.0HT Can Even Beat The 3.6 GHz Version : Single CPU in Dual Operation: P4 3.06 GHz with Hyper-Threading Technology". Tomshardware.com. 14 November 2002. Retrieved 5 April 2011.
[20] Tau Leng; Rizwan Ali; Jenwei Hsieh; Christopher Stanton (November 2002). "A Study of Hyper-Threading in High-Performance Computing Clusters" (PDF). Dell. p. 4. Retrieved 12 November 2012.
[21] Joel Hruska (24 July 2012). "Maximized performance: Comparing the effects of Hyper-Threading, software updates". extremetech.com. Retrieved 2 March 2015.
[22] "CPU Performance Evaluation - Benchmark - Pentium 4 2.8 and 3.0". users.telenet.be. Archived from the original on 24 February 2021. Retrieved 12 April 2011.
[23] "Replay: Unknown Features of the NetBurst Core. Page 15". Replay: Unknown Features of the NetBurst Core. Xbitlabs. Archived from the original on 14 May 2011. Retrieved 24 April 2011.
[24] Valles, Antonio (20 November 2009). "Performance Insights to Intel Hyper-Threading Technology". Intel. Archived from the original on 17 February 2015. Retrieved 26 February 2015.
[25] "Network Tuning and Performance". calomel.org. 12 November 2013. Retrieved 26 February 2015.
[26] "Linux kernel documentation: Scaling in the Linux Networking Stack". kernel.org. 1 December 2014. Retrieved 2 March 2015. "Per-cpu load can be observed using the mpstat utility, but note that on processors with hyperthreading (HT), each hyperthread is represented as a separate CPU. For interrupt handling, HT has shown no benefit in initial tests, so limit the number of queues to the number of CPU cores in the system."
[27] "Hyper-Threading Technology – Operating systems that include optimizations for Hyper-Threading Technology". Intel.com. 19 September 2011. Retrieved 29 February 2012.
[28] Sustainable Practices: Concepts, Methodologies, Tools and Applications. Information Resources Management Association. December 2013. p. 666. ISBN 9781466648524.
[29] "ARM is no fan of HyperThreading". theinquirer.net. 2 August 2006. Archived from the original on 6 September 2009. Retrieved 29 February 2012.
[30] Jermoluk, Tom (13 October 2010). "About MIPS and MIPS | TOP500 Supercomputing Sites". Top500.org. Archived from the original on 13 June 2011. Retrieved 5 April 2011.
[31] "ARM launches first 64bit processor core for servers and smartphones". Tech Design Forum. 30 October 2012.
[32] "Arm launches first SMT-capable Cortex core | bit-tech.net". bit-tech.net. Retrieved 2 December 2023.
[33] Rik Myslewski (8 May 2013). "Deep inside Intel's first viable mobile processor: Silvermont". The Register. Retrieved 13 January 2014.
[34] Chirgwin, Richard (25 June 2017). "Intel's Skylake and Kaby Lake CPUs have nasty hyper-threading bug". The Register. Retrieved 4 July 2017.
[35] "Skylake, Kaby Lake Chips Have a Crash Bug with Hyperthreading Enabled". Ars Technica. 26 June 2017. Retrieved 25 November 2017.
[36] Cutress, Ian (23 April 2019). "Intel 9th Gen Core Processors: All the Desktop and Mobile 45W CPUs Announced". AnandTech. Archived from the original on 23 April 2019.
[37] Armasu, Lucian (14 May 2019). "Intel's New Spectre-Like Flaw Affects Chips Made Since 2008". Tom's Hardware. Archived from the original on 4 August 2019.
[38] Percival, Colin (14 May 2005). "Cache Missing for Fun and Profit" (PDF). Daemonology.net. Retrieved 14 June 2016.
[39] "OpenBSD disables Intel's hyper-threading over CPU data leak fears". Retrieved 24 August 2018.
[40] "'Disable SMT/Hyperthreading in all Intel BIOSes' - MARC". marc.info. Retrieved 24 August 2018.
[41] Greenberg, Andy (14 May 2019). "Meltdown Redux: Intel Flaw Lets Hackers Siphon Secrets from Millions of PCs". WIRED. Retrieved 14 May 2019.

External links

Intel Demonstrates Breakthrough Processor Design, a press release from August 2001
Intel – high level overview of Hyper-threading
Hyper-threading on MSDN Magazine
introductory article from Ars Technica
US Patent Number 4,847,755
Merom, Conroe, Woodcrest lose HyperThreading
ZDnet: Hyperthreading hurts server performance, say developers
ARM is no fan of HyperThreading - Outlines problems of SMT solutions
The Impact of Hyper-Threading on Processor Resource Utilization in Production Applications
