'''[[Main Page|Home]] * [[Hardware]] * [[x86]] * AVX'''

'''Advanced Vector Extensions''' (AVX) is a 256-bit extension to the [[x86]] and [[x86-64]] [[SSE]], [[SSE2]], [[SSE3]], [[SSSE3]], and [[SSE4]] [[SIMD and SWAR Techniques|SIMD]] instruction sets, announced by [[Intel]] in March 2008 and first released in January 2011 with Intel's [https://en.wikipedia.org/wiki/Sandy_Bridge_%28microarchitecture%29 Sandy Bridge] architecture. Beginning with the [https://en.wikipedia.org/wiki/Bulldozer_%28microarchitecture%29 Bulldozer microarchitecture], AVX is also available on [[AMD]] processors <ref>[https://support.amd.com/TechDocs/26568.pdf AMD64 Architecture Programmer’s Manual, Volume 4: 128-Bit and 256-Bit Media Instructions] (pdf)</ref>, along with AMD's own [[XOP]] extension (Bulldozer only).

AVX supports 256-bit wide SIMD registers (YMM0-YMM7 in operating modes that are 32-bit or less, YMM0-YMM15 in 64-bit mode), each holding floating-point [[Array|vectors]] of either eight [[Float|floats]] or four [[Double|doubles]]. The lower 128 bits of the YMM registers are aliased to the respective 128-bit XMM registers. AVX employs a new instruction encoding scheme using the [https://en.wikipedia.org/wiki/VEX_prefix VEX prefix], which allows a three-operand SIMD instruction format in which the destination register is distinct from the two source operands.
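As an illustration, a minimal sketch in C intrinsics (the function name is only illustrative; assumes a compiler with AVX enabled, e.g. -mavx for GCC/Clang) processes eight floats with one 256-bit instruction:

<pre>
#include <immintrin.h>

// Add two arrays of 8 floats with a single 256-bit AVX addition (vaddps).
void add8(const float *a, const float *b, float *out) {
   __m256 va = _mm256_loadu_ps(a);     // load 8 floats into a ymm register
   __m256 vb = _mm256_loadu_ps(b);
   __m256 vc = _mm256_add_ps(va, vb);  // three operands: destination distinct from both sources
   _mm256_storeu_ps(out, vc);          // store 8 results
}
</pre>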

=Advantages of AVX=
AVX introduces expanded 256-bit versions of floating-point instructions, which are typically not useful for chess programming. While it does not yet widen the integer instructions to 256 bits, AVX does provide VEX-encoded versions of the existing 128-bit SSE instructions, for instance the bitwise logical AND:
{| class="wikitable"
|-
! Set
! Instruction
! Operation
|-
! SSE2
| '''pand''' xmm1, xmm2/m128
| xmm1 := xmm1 & xmm2
|-
! AVX
| '''vpand''' xmm1, xmm2, xmm3/m128
| xmm1 := xmm2 & xmm3
|}

Though AVX does not yet support 256-bit integer operations, there are still benefits to using it. Three-operand support can be used to eliminate many "move" instructions, which otherwise can take up significant execution resources.
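A sketch of that effect in C (function name illustrative; exact code generation depends on compiler and register allocation): when both sources must stay live, the destructive two-operand SSE form typically needs an extra register copy, whereas the VEX-encoded form writes straight to a third register:

<pre>
#include <emmintrin.h>

// Compiled for SSE2 this usually becomes  movdqa xmm3, xmm1 / pand xmm3, xmm2
// (a copy is needed because pand overwrites its first operand);
// compiled with -mavx it can become a single  vpand xmm3, xmm1, xmm2.
__m128i and_keep_sources(__m128i a, __m128i b) {
   return _mm_and_si128(a, b);   // a and b remain available to the caller
}
</pre>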

Additionally, when using xmm registers numbered 8 and higher, the AVX encoding of an SSE instruction is often one byte smaller, due to the more compact nature of the VEX encoding scheme. Finally, the ymm registers offer double the register space: even if the top halves aren't used for computation, they might be suitable as temporary storage space, avoiding the use of a scratch buffer or the stack.
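A hedged sketch of the temporary-storage idea (helper names are hypothetical; vinsertf128/vextractf128 are available with plain AVX):

<pre>
#include <immintrin.h>

// Park a 128-bit temporary in the otherwise unused upper half of a ymm
// register instead of spilling it to the stack.
static inline __m256i stash_high(__m256i storage, __m128i tmp) {
   return _mm256_insertf128_si256(storage, tmp, 1);  // write bits 255..128
}

static inline __m128i restore_high(__m256i storage) {
   return _mm256_extractf128_si256(storage, 1);      // read bits 255..128 back
}
</pre>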

While AVX can do 32-byte loads and stores, no CPU (as of Sandy Bridge) actually has a 32-byte load or store unit; such loads and stores are done simply by doing two separate 16-byte memory operations internally. Thus, AVX is no faster for memory operations (yet).

=AVX on non-Intel CPUs=
AMD's [https://en.wikipedia.org/wiki/Bulldozer_%28microarchitecture%29 Bulldozer] does not benefit from three-operand support in the same way that Intel's AVX-supporting Sandy Bridge does, because Bulldozer has a "move elimination" feature that resolves SIMD move instructions outside of the main execution pipeline. On Bulldozer, three-operand support can still reduce code size and dispatch bottlenecks, but it usually does not help performance much.

Additionally, Bulldozer only has a 128-bit floating-point execution unit, so 256-bit floating point operations are no faster than 128-bit ones, and sometimes actually slower. Nevertheless, some functions might still benefit from the extra register space.

=Mixing AVX and SSE=
Besides 3-operand support, the primary difference between the AVX and SSE encodings of an SSE instruction is that the AVX version clears the unused portion of the ymm register (the top 128 bits), while the SSE version does not modify it. Intel strongly advises against mixing SSE 128-bit instructions and AVX 256-bit instructions, as this "mode-switching" can cost upwards of 70 clock cycles. However, mixing SSE 128-bit and AVX 128-bit is okay, as is mixing AVX 128-bit and AVX 256-bit.

In order to safely switch modes, Intel recommends using '''vzeroupper''' after using 256-bit AVX instructions and before using 128-bit SSE instructions, if the two are being used in the same program.
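A minimal sketch of that recommendation in C intrinsics (function name illustrative): _mm256_zeroupper() emits '''vzeroupper''' at the end of a 256-bit AVX section, before any legacy SSE code runs:

<pre>
#include <immintrin.h>

// Square 8 floats with 256-bit AVX, then clear the upper ymm halves so that
// subsequently executed SSE-encoded code does not pay the transition penalty.
void avx_then_sse_safe(float *dst, const float *src) {
   __m256 v = _mm256_loadu_ps(src);
   _mm256_storeu_ps(dst, _mm256_mul_ps(v, v));
   _mm256_zeroupper();   // vzeroupper: zero bits 255..128 of all ymm registers
}
</pre>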

=AVX2=
''see main article [[AVX2]]''

=See also=
* [[AltiVec]]
* [[AVX2]]
* [[AVX-512]]
* [[BMI1]]
* [[BMI2]]
* [[MMX]]
* [[SIMD and SWAR Techniques]]
* [[SSE]]
* [[SSE2]]
* [[SSE3]]
* [[SSSE3]]
* [[SSE4]]

=Manuals=
* [https://computing.llnl.gov/tutorials/linux_clusters/Intro_to_Intel_AVX.pdf Introduction to Intel® Advanced Vector Extensions] by [http://clomont.com/ Chris Lomont]
* [https://support.amd.com/TechDocs/26568.pdf AMD64 Architecture Programmer’s Manual, Volume 4: 128-Bit and 256-Bit Media Instructions] (pdf)

=External Links=
* [https://en.wikipedia.org/wiki/Advanced_Vector_Extensions Advanced Vector Extensions from Wikipedia]
* [https://en.wikipedia.org/wiki/VEX_prefix VEX prefix From Wikipedia]
* [https://software.intel.com/en-us/articles/intel-software-development-emulator/ Intel Software Development Emulator], which can be used to experiment with AVX and AVX2 on a CPU that doesn't support them.
* [https://software.intel.com/sites/landingpage/IntrinsicsGuide/ Intel Intrinsics Guide]
* [https://software.intel.com/en-us/articles/using-avx-without-writing-avx-code Using AVX Without Writing AVX Code]

=References=
<references />

'''[[x86|Up one Level]]'''
