SSE4
SSE4 is an ambiguous name for a set of almost disjoint x86 instruction set extensions by Intel and AMD: SSE4.1 and SSE4.2, both by Intel, and SSE4a by AMD.
Intel
SSE4.1
Intel introduced SSE4.1, with 47 new instructions, in 2007 with the Penryn-based Core 2 processors of the Core microarchitecture.
Mnemonic | Description | Return Type | C-Intrinsic | Parameters
---|---|---|---|---
pcmpeqq | packed compare equal qword | __m128i | _mm_cmpeq_epi64 | (__m128i a, __m128i b)
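A minimal sketch of pcmpeqq in C, comparing two pairs of 64-bit bitboards at once. It assumes GCC or Clang on x86-64; the target attribute enables SSE4.1 code generation for this one function only, and the helper name cmpeq_u64x2 is hypothetical. The SSE2 intrinsic _mm_movemask_pd gathers one bit per 64-bit lane.

```c
#include <stdint.h>
#include <smmintrin.h>  /* SSE4.1: _mm_cmpeq_epi64 */

/* Hypothetical helper: returns a 2-bit mask, bit i set when a[i] == b[i].
   Assumes GCC or Clang on x86-64; target("sse4.1") applies to this function only. */
__attribute__((target("sse4.1")))
int cmpeq_u64x2(const uint64_t a[2], const uint64_t b[2])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);  /* unaligned 128-bit loads */
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    /* pcmpeqq: each 64-bit lane becomes all ones if equal, else all zeros */
    __m128i eq = _mm_cmpeq_epi64(va, vb);
    /* movmskpd (SSE2): collect the sign bit of each 64-bit lane */
    return _mm_movemask_pd(_mm_castsi128_pd(eq));
}
```

For example, with a = {1, ~0} and b = {1, 0}, only the low lanes match and the result is 1.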
See Vulnerable on distant Checks with SSE4.
SSE4.2
SSE4.2 was introduced in 2008 with the Nehalem-based Core i7, adding 7 new instructions.
STTNI
SSE4.2 includes four String and Text New Instructions (STTNI), working on 128-bit XMM registers as well as general purpose registers and flags, to perform character searches and comparisons on two operands of 16 bytes at a time, e.g. PCMPESTRI (Packed Compare Explicit Length Strings, Return Index) [1].
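A minimal sketch of PCMPESTRI via _mm_cmpestri, assuming GCC or Clang on x86-64: it scans up to 16 bytes of a FEN string for the first rank separator or space, an "equal any" search. The helper find_any_of16 is hypothetical. Both inputs are copied into zero-padded 16-byte buffers so the loads never read past the strings, while the explicit lengths tell the instruction which bytes are valid.

```c
#include <string.h>
#include <nmmintrin.h>  /* SSE4.2: _mm_cmpestri, _SIDD_* flags */

/* Hypothetical helper: index (0..15) of the first byte of text[0..len-1]
   equal to any byte of set[0..setlen-1], or 16 if none matches.
   Assumes GCC or Clang on x86-64; lengths are clamped to 16. */
__attribute__((target("sse4.2")))
int find_any_of16(const char *text, int len, const char *set, int setlen)
{
    char tbuf[16] = {0}, sbuf[16] = {0};
    if (len > 16) len = 16;
    if (setlen > 16) setlen = 16;
    memcpy(tbuf, text, (size_t)len);   /* keep the 16-byte loads in bounds */
    memcpy(sbuf, set, (size_t)setlen);
    __m128i vt = _mm_loadu_si128((const __m128i *)tbuf);
    __m128i vs = _mm_loadu_si128((const __m128i *)sbuf);
    /* pcmpestri: operand 1 is the character set, operand 2 the text;
       returns the least significant matching index, or 16 */
    return _mm_cmpestri(vs, setlen, vt, len,
                        _SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_ANY |
                        _SIDD_LEAST_SIGNIFICANT);
}
```

For example, find_any_of16("rnbqkbnr/pppppppp", 16, "/ ", 2) returns 8, the position of the first '/'.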
ATAI
POPCNT and CRC32, working on general purpose registers, were dubbed Application-Targeted Accelerator Instructions (ATAI) as a subset of SSE4.2 [2] [3], but should be considered a separate instruction set with respect to SSE4 compiler optimizations.
Mnemonic | Description | Return Type | C-Intrinsic | Parameters
---|---|---|---|---
popcnt | Population Count | __int64 | _mm_popcnt_u64 | (unsigned __int64 a)
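A sketch of both ATAI instructions, assuming GCC or Clang on x86-64. In line with the remark above, POPCNT is enabled through its own "popcnt" target rather than "sse4.2". The helpers popcount_bb and crc32c_u64s are hypothetical; note that the CRC32 instruction computes CRC-32C (Castagnoli polynomial), not the zlib CRC-32.

```c
#include <stdint.h>
#include <x86intrin.h>  /* GCC/Clang umbrella header: _mm_popcnt_u64, _mm_crc32_u64 */

/* Hypothetical helper: number of set bits (e.g. pieces) on a bitboard.
   Compiled for the separate "popcnt" target, not "sse4.2". */
__attribute__((target("popcnt")))
int popcount_bb(uint64_t bb)
{
    return (int)_mm_popcnt_u64(bb);
}

/* Hypothetical helper: CRC-32C over n 64-bit words, e.g. as a cheap hash. */
__attribute__((target("sse4.2")))
uint32_t crc32c_u64s(const uint64_t *w, int n)
{
    uint64_t crc = 0xFFFFFFFFu;              /* conventional initial value */
    for (int i = 0; i < n; i++)
        crc = _mm_crc32_u64(crc, w[i]);      /* folds 8 bytes, little-endian */
    return (uint32_t)crc ^ 0xFFFFFFFFu;      /* conventional final inversion */
}
```

For example, popcount_bb(0xFFFF) returns 16, the white pieces of the initial position in little-endian rank-file mapping.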
AMD SSE4a
SSE4a was introduced by AMD with the K10 (Barcelona) microarchitecture.
SIMD
Two pairs of new SIMD instructions work on XMM registers: the combined mask-shift instructions EXTRQ/INSERTQ and the scalar streaming store instructions MOVNTSD/MOVNTSS. These instructions are not available in Intel's SSE4.
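A sketch of EXTRQ through _mm_extract_si64, extracting one rank of a bitboard, assuming GCC or Clang on x86-64. Since SSE4a exists only on AMD processors, the hypothetical rank_bits helper checks __builtin_cpu_supports at run time and otherwise falls back to a plain shift and mask. The EXTRQ control operand holds the field length in bits 5:0 and the start index in bits 13:8.

```c
#include <stdint.h>
#include <ammintrin.h>  /* SSE4a: _mm_extract_si64 (EXTRQ) */

/* EXTRQ path: extract 8 bits starting at bit 8 * rank. */
__attribute__((target("sse4a")))
static uint64_t rank_bits_extrq(uint64_t bb, int rank)
{
    __m128i x   = _mm_cvtsi64_si128((long long)bb);
    __m128i ctl = _mm_cvtsi64_si128(((long long)(8 * rank) << 8) | 8);
    return (uint64_t)_mm_cvtsi128_si64(_mm_extract_si64(x, ctl));
}

/* Hypothetical helper: the 8 bits of one rank (0..7) of a bitboard.
   Uses EXTRQ only if the CPU reports SSE4a, else a shift and mask. */
uint64_t rank_bits(uint64_t bb, int rank)
{
    if (__builtin_cpu_supports("sse4a"))
        return rank_bits_extrq(bb, rank);
    return (bb >> (8 * rank)) & 0xFF;
}
```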
Advanced Bit Manipulation
The two Advanced Bit Manipulation (ABM) instructions, LZCNT and POPCNT, work on general purpose registers. Leading Zero Count was not part of Intel's Application-Targeted Accelerator Instructions of SSE4.2, but was later incorporated by Intel along with BMI.
Mnemonic | Description | Return Type | C-Intrinsic | Parameters
---|---|---|---|---
lzcnt | Leading Zero Count | unsigned __int64 | __lzcnt64 | (unsigned __int64 a)
popcnt | Population Count | unsigned __int64 | __popcnt64 | (unsigned __int64 a)
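A sketch of LZCNT used as a bitscan reverse, assuming GCC or Clang on x86-64 and a CPU with LZCNT (AMD since K10, Intel since Haswell); on older CPUs the same encoding silently executes as BSR. It uses the GCC/Clang intrinsic _lzcnt_u64 rather than the MSVC __lzcnt64 listed in the table; the helper names are hypothetical.

```c
#include <stdint.h>
#include <x86intrin.h>  /* GCC/Clang umbrella header: _lzcnt_u64 */

/* Hypothetical helper: leading zero count.
   Unlike BSR, LZCNT is defined for 0 and returns 64. */
__attribute__((target("lzcnt")))
int leading_zeros(uint64_t bb)
{
    return (int)_lzcnt_u64(bb);
}

/* Hypothetical helper: square index (0..63) of the most significant
   set bit; bb must be non-empty. */
int ms1b_index(uint64_t bb)
{
    return 63 - leading_zeros(bb);
}
```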
See also
- AltiVec
- AVX
- BMI
- MMX
- SIMD and SWAR Techniques
- SSE
- SSE2
- SSE3
- SSSE3
- SSE5
- TBM
- Vulnerable on distant Checks with SSE4
- XOP
Manuals
- Intel® SSE4 Programming Reference (pdf)
- Software Optimization Guide for AMD Family 10h and 12h Processors (pdf)
Forum Posts
- using Popcount and Prefetch with SSE4 hardware support by Engin Üstün, CCC, May 19, 2012 » Population Count, Memory
External Links
- SSE4 from Wikipedia
- MSDN - Streaming SIMD Extensions 4 Instructions
- MSDN - SSE4A and Advanced Bit Manipulation Intrinsics
- SSEPlus Project Documentation
- Agner's CPU blog by Agner Fog
- Intel Intrinsics Guide
References
- [1] PCMPESTRI: Packed Compare Explicit Length Strings, Return Index
- [2] MSDN - Streaming SIMD Extensions 4 Instructions, 2.3 SSE4.2 INSTRUCTION SET, 2.3.3. Application-Targeted Accelerator Instructions
- [3] Application Targeted Accelerators Intrinsics