AltiVec

AltiVec is a single-precision floating point and integer SIMD instruction set designed and owned by Apple, IBM, and Freescale Semiconductor (formerly Motorola's Semiconductor Products Sector), collectively the AIM alliance. It is implemented on versions of the PowerPC processor architecture, including Motorola's G4, IBM's G5 and POWER6 processors, and P.A. Semi's PWRficient PA6T. AltiVec is a trademark owned solely by Freescale, so the system is also referred to as Velocity Engine by Apple and VMX (Vector Multimedia Extension) by IBM and P.A. Semi.

While AltiVec refers to an instruction set, the implementations in CPUs produced by IBM and Motorola are separate in terms of logic design. To date, no IBM core has included an AltiVec logic design licensed from Motorola or vice versa.

AltiVec is a standard part of the Power ISA v.2.03^[1] specification. It was never formally a part of the PowerPC architecture until this specification, although it used PowerPC instruction formats and syntax and occupied the opcode space expressly allocated for such purposes.

Comparison to x86-64 SSE

Both VMX/AltiVec and SSE feature 128-bit vector registers that can represent sixteen 8-bit signed or unsigned chars, eight 16-bit signed or unsigned shorts, four 32-bit ints or four 32-bit floating-point variables. Both provide cache-control instructions intended to minimize cache pollution when working on streams of data.

They also exhibit important differences. Unlike SSE2, VMX/AltiVec supports a special RGB "pixel" data type, but it does not operate on 64-bit double-precision floats, and there is no way to move data directly between scalar and vector registers. In keeping with the "load/store" model of the PowerPC's RISC design, the vector registers, like the scalar registers, can only be loaded from and stored to memory. However, VMX/AltiVec provides a much more complete set of "horizontal" operations that work across all the elements of a vector; the allowable combinations of data type and operations are much more complete. Thirty-two 128-bit vector registers are provided, compared to eight for SSE and SSE2 (extended to 16 in x86-64), and most VMX/AltiVec instructions take three register operands compared to only two register/register or register/memory operands on IA-32.

VMX/AltiVec is also unique in its support for a flexible vector permute instruction, in which each byte of a resulting vector value can be taken from any byte of either of two other vectors, parametrized by yet another vector. This allows for sophisticated manipulations in a single instruction.
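
The byte selection performed by the permute can be illustrated with a scalar model in portable C. This is only a sketch of the semantics, not the hardware implementation: the function name model_vec_perm is hypothetical, and big-endian byte numbering is assumed, so that selector values 0 to 15 pick from the first source and 16 to 31 from the second.

```c
#include <stdint.h>

/* Scalar model of the AltiVec permute (vperm / vec_perm).
 * model_vec_perm is a hypothetical name used only for illustration.
 * Assumes big-endian byte numbering: selectors 0-15 read from a,
 * selectors 16-31 read from b. out must not alias the inputs. */
void model_vec_perm(const uint8_t a[16], const uint8_t b[16],
                    const uint8_t ctl[16], uint8_t out[16])
{
    for (int i = 0; i < 16; i++) {
        /* Only the low five bits of each control byte are used. */
        uint8_t sel = ctl[i] & 0x1F;
        out[i] = (sel < 16) ? a[sel] : b[sel - 16];
    }
}
```

With a control vector of {15, 14, ..., 0} the model reverses the bytes of the first source; with {16, 16, ..., 16} it broadcasts the first byte of the second source to every position.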

Recent versions of the GNU Compiler Collection (GCC), IBM VisualAge compiler and other compilers provide intrinsics to access VMX/AltiVec instructions directly from C and C++ programs. As of version 4, GCC also includes auto-vectorization capabilities that attempt to intelligently create VMX/AltiVec accelerated binaries without the need for the programmer to use intrinsics directly. The "vector" type keyword is introduced to permit the declaration of native vector types, e.g., "vector unsigned char foo;" declares a 128-bit vector variable named "foo" containing sixteen 8-bit unsigned chars. The full complement of arithmetic and binary operators is defined on vector types so that the normal C expression language can be used to manipulate vector variables. There are also overloaded intrinsic functions such as "vec_add" that emit the appropriate opcode based on the type of the elements within the vector, and very strong type checking is enforced. In contrast, the Intel-defined data types for IA-32 SIMD registers declare only the size of the vector register (128 or 64 bits) and, in the case of a 128-bit register, whether it contains integers or floating-point values. The programmer must select the appropriate intrinsic for the data types in use, e.g., "_mm_add_epi16(x,y)" for adding two vectors containing eight 16-bit integers.
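
The contrast between the two intrinsic styles can be seen side by side in a small sketch. The wrapper name add_u16x8 is hypothetical, and the choice of preprocessor guards (__ALTIVEC__, __SSE2__) and the scalar fallback are assumptions made so the same file builds on either architecture. On AltiVec the element type is carried by the vector type and vec_add picks the opcode; on SSE2 the register type is an untyped __m128i and the "epi16" suffix names the element width.

```c
#include <stdint.h>
#include <string.h>

#if defined(__ALTIVEC__)
#include <altivec.h>
#elif defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Adds eight 16-bit lanes with wraparound.
 * add_u16x8 is a hypothetical helper name, not part of either API. */
void add_u16x8(const uint16_t a[8], const uint16_t b[8], uint16_t out[8])
{
#if defined(__ALTIVEC__)
    /* The element type is part of the vector type; vec_add is overloaded. */
    __vector unsigned short va, vb, vr;
    memcpy(&va, a, sizeof va);
    memcpy(&vb, b, sizeof vb);
    vr = vec_add(va, vb);
    memcpy(out, &vr, sizeof vr);
#elif defined(__SSE2__)
    /* __m128i says only "128 bits of integers"; the intrinsic name
     * selects the 16-bit lane width. */
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi16(va, vb));
#else
    /* Portable fallback with the same modulo-2^16 behaviour. */
    for (int i = 0; i < 8; i++)
        out[i] = (uint16_t)(a[i] + b[i]);
#endif
}
```

Copying through memcpy sidesteps alignment requirements in this sketch; real code would more likely use the aligned load and store intrinsics of each instruction set.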

Development history

The Power Vector Media Extension (VMX) was developed between 1996 and 1998 by a collaborative project between Apple, IBM, and Motorola. Apple was the primary customer for VMX until it announced its switch to Intel-made, x86-based CPUs on June 6, 2005. Apple used it to accelerate multimedia applications such as QuickTime, iTunes and key parts of Mac OS X, including the Quartz graphics compositor. Other companies such as Adobe used AltiVec to optimize their image-processing programs such as Adobe Photoshop. Motorola was the first to supply AltiVec-enabled processors, starting with their G4 line. AltiVec was also used in some embedded systems for high-performance digital signal processing.

IBM consistently left VMX out of their earlier POWER microprocessors, which were intended for server applications where it was not very useful. The POWER6 microprocessor, introduced in 2007, implements AltiVec. The last desktop microprocessor from IBM, the PowerPC 970 (dubbed the "G5" by Apple), also implemented AltiVec with hardware similar to that of the PowerPC 7400.

AltiVec is a brand name trademarked by Freescale (previously Motorola) for the standard "Category: Vector" part of the Power ISA v.2.03^[1] specification. This category is also known as VMX (used by IBM) and "Velocity Engine" (a brand name previously used by Apple).

The Cell Broadband Engine, used in (amongst other things) the PlayStation 3, also supports Power Vector Media Extension (VMX) in its PPU, with the SPU ISA being enhanced but architecturally similar.

Freescale is bringing an enhanced version of AltiVec to e6500-based QorIQ processors.

VMX128

IBM enhanced VMX for use in Xenon (Xbox 360) and called this enhancement VMX128. The enhancements comprise new routines targeted at gaming (accelerating 3D graphics and game physics)^[2] and a total of 128 registers. VMX128 is not entirely compatible with VMX/AltiVec, as a number of integer operations were removed to make space for the larger register file and additional application-specific operations.^[3]^[4]

VSX (Vector Scalar Extension)

Power ISA v2.06 introduced VSX vector-scalar instructions^[5] which extend SIMD processing for the Power ISA to support up to 64 registers, with support for regular floating point, decimal floating point and vector execution. POWER7 is the first Power ISA processor to implement Power ISA v2.06.

New instructions were introduced by IBM under the Vector Media Extension category for integer operations as part of the VSX extension in Power ISA 2.07.

New integer vector instructions were introduced by IBM following the VMX encodings as part of the VSX extension in Power ISA v3.0. These were introduced with POWER9 processors.^[6]

Issues

In C++, the standard way of accessing AltiVec support is mutually exclusive with the use of the Standard Template Library vector<> class template due to the treatment of "vector" as a reserved word when the compiler does not implement the context-sensitive keyword version of vector. However, it may be possible to combine them using compiler-specific workarounds; for instance, in GCC one may do #undef vector to remove the vector keyword, and then use the GCC-specific __vector keyword in its place.

AltiVec prior to Power ISA 2.06 with VSX cannot load from memory using a type's natural alignment. For example, the code below requires special handling for POWER6 and below when the effective address is not 16-byte aligned. The special handling adds three additional instructions to a load operation when VSX is not available.

    #include <altivec.h>

    typedef __vector unsigned char uint8x16_p;
    typedef __vector unsigned int  uint32x4_p;
    ...
    int main(int argc, char* argv[])
    {
        /* Natural alignment of vals is 4; and not 16 as required */
        unsigned int vals[4] = { 1, 2, 3, 4 };
        uint32x4_p vec;

    #if defined(__VSX__) || defined(_ARCH_PWR8)
        vec = vec_xl(0, vals);
    #else
        /* View vals as bytes so the loads return uint8x16_p values */
        const unsigned char* p = (const unsigned char*)vals;
        const uint8x16_p perm = vec_lvsl(0, p);
        const uint8x16_p low  = vec_ld(0, p);
        const uint8x16_p high = vec_ld(15, p);
        vec = (uint32x4_p)vec_perm(low, high, perm);
    #endif
    }
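
The fallback path above can be followed with a scalar model that runs on any machine. This is a sketch under stated assumptions rather than a description of the hardware: memory is modelled as a byte array indexed by offset, the function names (model_ld, model_lvsl, model_unaligned_load) are hypothetical, and big-endian byte numbering is assumed. It shows why the second load uses offset 15 rather than 16: when the address is already aligned, both loads hit the same 16-byte block and nothing beyond it is touched.

```c
#include <stdint.h>
#include <string.h>

/* Model of vec_ld: the low four bits of the address are ignored,
 * so the load always returns a whole aligned 16-byte block. */
static void model_ld(const uint8_t *mem, size_t addr, uint8_t out[16])
{
    memcpy(out, mem + (addr & ~(size_t)15), 16);
}

/* Model of vec_lvsl: produces sh, sh+1, ..., sh+15 with sh = addr mod 16. */
static void model_lvsl(size_t addr, uint8_t out[16])
{
    for (int i = 0; i < 16; i++)
        out[i] = (uint8_t)((addr & 15) + i);
}

/* Two aligned loads plus a permute yield the 16 bytes starting at addr.
 * All names here are hypothetical and used only for illustration. */
void model_unaligned_load(const uint8_t *mem, size_t addr, uint8_t out[16])
{
    uint8_t perm[16], low[16], high[16];

    model_lvsl(addr, perm);
    model_ld(mem, addr, low);        /* block containing the first byte */
    model_ld(mem, addr + 15, high);  /* block containing the last byte  */

    /* Permute: selectors 0-15 read from low, 16-31 read from high. */
    for (int i = 0; i < 16; i++)
        out[i] = (perm[i] < 16) ? low[perm[i]] : high[perm[i] - 16];
}
```

For an address with offset 5 inside its block, the permute vector is {5, 6, ..., 20}, so the result takes the last eleven bytes of the first block followed by the first five bytes of the next one.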

AltiVec prior to Power ISA 2.06 with VMX lacks 64-bit integer support. Developers who wish to operate on 64-bit data must build routines from 32-bit components. For example, below are examples of 64-bit add and subtract in C using a vector with four 32-bit words on a big-endian machine. The permutes move the carry and borrow bits from columns 1 and 3 to columns 0 and 2, as in schoolbook arithmetic. A little-endian machine would need a different mask.

    #include <altivec.h>

    typedef __vector unsigned char uint8x16_p;
    typedef __vector unsigned int  uint32x4_p;
    ...
    /* Performs a+b as if the vector held two 64-bit double words */
    uint32x4_p add64(const uint32x4_p a, const uint32x4_p b)
    {
        const uint8x16_p cmask = {4,5,6,7, 16,16,16,16, 12,13,14,15, 16,16,16,16};
        const uint32x4_p zero = {0, 0, 0, 0};

        /* Carry out of each 32-bit column, moved one column to the left */
        uint32x4_p cy = vec_addc(a, b);
        cy = vec_perm(cy, zero, cmask);
        return vec_add(vec_add(a, b), cy);
    }

    /* Performs a-b as if the vector held two 64-bit double words */
    uint32x4_p sub64(const uint32x4_p a, const uint32x4_p b)
    {
        const uint8x16_p bmask = {4,5,6,7, 16,16,16,16, 12,13,14,15, 16,16,16,16};
        const uint32x4_p amask = {1, 1, 1, 1};
        const uint32x4_p zero = {0, 0, 0, 0};

        /* vec_subc yields 1 when there is no borrow, so invert it */
        uint32x4_p bw = vec_subc(a, b);
        bw = vec_andc(amask, bw);
        bw = vec_perm(bw, zero, bmask);
        return vec_sub(vec_sub(a, b), bw);
    }
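
The carry and borrow handling in add64 and sub64 can be checked against a portable scalar model. The sketch below is an illustration only: the names model_add64 and model_sub64 are hypothetical, each array holds the four 32-bit columns of one vector, and big-endian element order is assumed, so columns 0 and 2 are the high words and columns 1 and 3 the low words.

```c
#include <stdint.h>

/* Scalar model of add64: v[0]:v[1] and v[2]:v[3] are the high:low
 * halves of two 64-bit values (big-endian element order assumed).
 * model_add64 and model_sub64 are hypothetical names. */
void model_add64(const uint32_t a[4], const uint32_t b[4], uint32_t out[4])
{
    uint32_t sum[4], cy[4];
    for (int i = 0; i < 4; i++) {
        sum[i] = a[i] + b[i];      /* vec_add: modulo 2^32 per column  */
        cy[i]  = sum[i] < a[i];    /* vec_addc: carry out, 0 or 1      */
    }
    /* vec_perm with cmask: low-word carries move to the high-word
     * columns, and the low-word columns receive zero. */
    uint32_t moved[4] = { cy[1], 0, cy[3], 0 };
    for (int i = 0; i < 4; i++)
        out[i] = sum[i] + moved[i];
}

/* Scalar model of sub64 with the same layout. */
void model_sub64(const uint32_t a[4], const uint32_t b[4], uint32_t out[4])
{
    uint32_t diff[4], bw[4];
    for (int i = 0; i < 4; i++) {
        diff[i] = a[i] - b[i];     /* vec_sub: modulo 2^32 per column  */
        bw[i]   = a[i] < b[i];     /* inverted vec_subc: borrow, 0 or 1 */
    }
    uint32_t moved[4] = { bw[1], 0, bw[3], 0 };
    for (int i = 0; i < 4; i++)
        out[i] = diff[i] - moved[i];
}
```

For instance, adding {1, 0xFFFFFFFF, 0, 0x80000000} and {0, 1, 0, 0x80000000} produces a carry out of columns 1 and 3, which the model adds into columns 0 and 2 to give {2, 0, 1, 0}.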

Power ISA 2.07, used in POWER8, finally provided native 64-bit doubleword operations. A developer working with POWER8 needs only to perform the following.

    #include <altivec.h>

    typedef __vector unsigned long long uint64x2_p;
    ...
    /* Performs a+b using native vector 64-bit double words */
    uint64x2_p add64(const uint64x2_p a, const uint64x2_p b)
    {
        return vec_add(a, b);
    }

    /* Performs a-b using native vector 64-bit double words */
    uint64x2_p sub64(const uint64x2_p a, const uint64x2_p b)
    {
        return vec_sub(a, b);
    }