SSE4 is an ambiguous name for a set of largely disjoint x86 instruction set extensions: SSE4.1 and SSE4.2, both by Intel, and SSE4a by AMD.



Intel introduced SSE4.1, with 47 new instructions, in 2007 with the Penryn generation of Core 2 processors, based on the Core microarchitecture.

Mnemonic   Description                  C-Intrinsic
pcmpeqq    packed compare equal qword   __m128i _mm_cmpeq_epi64 (__m128i a, __m128i b)
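As a minimal sketch of how the packed qword compare might be used, the following hypothetical helper `equal_qword_mask` compares two pairs of 64-bit values and condenses the result into a 2-bit mask. The helper name, the mask convention, and the GCC/Clang `target` attribute (used so that no extra compiler flag is needed) are assumptions made for illustration; an x86 CPU with SSE4.1 is assumed at run time.

```c
#include <stdint.h>
#include <smmintrin.h>  /* SSE4.1 intrinsics */

/* Hypothetical helper: returns a 2-bit mask in which bit i is set
   if the 64-bit lane i of a equals the 64-bit lane i of b. */
__attribute__((target("sse4.1")))
int equal_qword_mask(const uint64_t a[2], const uint64_t b[2])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    /* pcmpeqq: each 64-bit lane becomes all ones if equal, else all zeros */
    __m128i eq = _mm_cmpeq_epi64(va, vb);
    /* collect the sign bit of each 64-bit lane into bits 0 and 1 */
    return _mm_movemask_pd(_mm_castsi128_pd(eq));
}
```

With `a = {1, 2}` and `b = {1, 3}` only the low lane matches, so the helper returns 1.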

See Vulnerable on distant Checks with SSE4.


SSE4.2 was introduced in 2008 with the Nehalem-based Core i7 and added 7 new instructions.


SSE4.2 includes five String and Text New Instructions (STTNI), which work on 128-bit XMM SIMD registers as well as general purpose registers and flags to perform character searches and comparisons on two operands of 16 bytes at a time, e.g. PCMPESTRI (Packed Compare Explicit Length Strings, Return Index) [1].
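To illustrate a character search on 16 bytes at a time, here is a hedged sketch built on the `_mm_cmpestri` intrinsic in its "equal any" mode. The helper name `find_first_of` and its interface are hypothetical. Each chunk is copied into a padded 16-byte buffer so that the sketch never reads past the end of the input; a tuned implementation would likely avoid that copy. The GCC/Clang `target` attribute and an SSE4.2-capable x86 CPU are assumed.

```c
#include <stddef.h>
#include <string.h>
#include <nmmintrin.h>  /* SSE4.2 intrinsics */

/* Hypothetical helper: index of the first byte in buf[0..len) that equals
   any of the first set_len bytes of set, or len if there is none.
   Precondition: 0 <= set_len <= 16. */
__attribute__((target("sse4.2")))
size_t find_first_of(const char *buf, size_t len, const char *set, int set_len)
{
    char padded_set[16] = {0};
    memcpy(padded_set, set, (size_t)set_len);
    const __m128i vset = _mm_loadu_si128((const __m128i *)padded_set);

    for (size_t pos = 0; pos < len; pos += 16) {
        size_t chunk_len = (len - pos < 16) ? len - pos : 16;
        char chunk[16] = {0};
        memcpy(chunk, buf + pos, chunk_len);   /* safe copy of a partial tail */
        __m128i vchunk = _mm_loadu_si128((const __m128i *)chunk);

        /* pcmpestri: explicit lengths, unsigned bytes, match any set byte,
           return the lowest matching index in the chunk (16 if no match) */
        int idx = _mm_cmpestri(vset, set_len, vchunk, (int)chunk_len,
                               _SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_ANY |
                               _SIDD_LEAST_SIGNIFICANT);
        if (idx != 16)
            return pos + (size_t)idx;
    }
    return len;
}
```

For example, `find_first_of("hello, world", 12, ",!", 2)` returns 5, the position of the comma. Because the lengths are passed explicitly, the zero padding is never treated as part of either operand.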


POPCNT and CRC32, which work on general purpose registers, were dubbed Application-Targeted Accelerator Instructions (ATAI) as a subset of SSE4.2 [2] [3], but should be considered a disjoint instruction set as far as SSE4 compiler optimizations are concerned.

Mnemonic   Description        C-Intrinsic
popcnt     Population Count   __int64 _mm_popcnt_u64 (unsigned __int64 a)
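The two general purpose register instructions can be sketched as follows. The function names `count_set_bits` and `crc32c` are hypothetical, and the GCC/Clang `target` attribute is an assumption used to enable each feature per function. The population count is enabled with its own `popcnt` target rather than `sse4.2`, since POPCNT has a separate CPUID feature bit. The crc32 instruction uses the Castagnoli polynomial (CRC-32C); the initial value and final inversion shown are the usual convention, not something the instruction enforces.

```c
#include <stddef.h>
#include <stdint.h>
#include <nmmintrin.h>  /* SSE4.2 header, also provides the POPCNT intrinsics */

/* Number of set bits in x. The 64-bit form exists in x86-64 mode only. */
__attribute__((target("popcnt")))
int count_set_bits(uint64_t x)
{
    return (int)_mm_popcnt_u64(x);
}

/* CRC-32C of a byte buffer, accumulated one byte per crc32 instruction. */
__attribute__((target("sse4.2")))
uint32_t crc32c(const void *data, size_t len)
{
    const unsigned char *p = (const unsigned char *)data;
    uint32_t crc = 0xFFFFFFFFu;              /* conventional initial value */
    for (size_t i = 0; i < len; ++i)
        crc = _mm_crc32_u8(crc, p[i]);
    return crc ^ 0xFFFFFFFFu;                /* conventional final inversion */
}
```

The standard CRC-32C check value applies here: `crc32c("123456789", 9)` yields `0xE3069283`. Wider variants such as `_mm_crc32_u64` can process eight bytes per step where throughput matters.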


SSE4a was introduced by AMD with the K10 (Barcelona) microarchitecture.


AMD added two kinds of new SIMD instructions working on XMM registers: combined mask-shift instructions (EXTRQ/INSERTQ) and scalar streaming store instructions (MOVNTSD/MOVNTSS). These instructions are not available in Intel's SSE4.

Advanced Bit Manipulation

The two Advanced Bit Manipulation instructions work on general purpose registers. Leading Zero Count was not available in Intel's Application-Targeted Accelerator Instructions of SSE4.2, but was later incorporated with BMI.

Mnemonic   Description          C-Intrinsic
lzcnt      Leading Zero Count   unsigned __int64 __lzcnt64 (unsigned __int64 a)
popcnt     Population Count     unsigned __int64 __popcnt64 (unsigned __int64 a)
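A short sketch of Leading Zero Count, assuming a GCC or Clang toolchain, where the intrinsic is also available under the name `_lzcnt_u64` via `<immintrin.h>`. The helper names `leading_zeros` and `msb_index` are hypothetical. The code assumes a CPU that actually implements LZCNT; on processors without it, the same encoding executes as BSR and gives different results.

```c
#include <stdint.h>
#include <immintrin.h>  /* provides _lzcnt_u64 on GCC and Clang */

/* Number of leading zero bits in x. Unlike bsr, lzcnt is defined
   for x == 0, where it yields the operand size (64). */
__attribute__((target("lzcnt")))
int leading_zeros(uint64_t x)
{
    return (int)_lzcnt_u64(x);
}

/* Index of the most significant set bit, or -1 if x is zero. */
__attribute__((target("lzcnt")))
int msb_index(uint64_t x)
{
    return 63 - leading_zeros(x);
}
```

Because the zero case is well defined, `msb_index` needs no special branch for an empty operand.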

