SSE4

4,871 bytes added, 13:13, 9 August 2018
'''[[Main Page|Home]] * [[Hardware]] * [[x86]] * SSE4'''

'''SSE4''' is an ambiguous name for a set of almost disjoint x86 instruction set extensions by [[Intel]] and [[AMD]]: [https://en.wikipedia.org/wiki/SSE4#SSE4.1 SSE4.1] and [https://en.wikipedia.org/wiki/SSE4#SSE4.2 SSE4.2], both by Intel, and [https://en.wikipedia.org/wiki/SSE4#SSE4a SSE4a] by AMD.

=Intel=
==SSE4.1==
Intel introduced SSE4.1, comprising 47 new instructions, in 2007 with the [https://en.wikipedia.org/wiki/Penryn_%28microarchitecture%29#Penryn Penryn] [https://en.wikipedia.org/wiki/Intel_Core_2 Core 2] brand of the [https://en.wikipedia.org/wiki/Core_%28microarchitecture%29 Core microarchitecture].
{| class="wikitable"
|-
! Mnemonic
! Description
! Return Type
! C-Intrinsic
! Parameters
|-
| pcmpeqq
| packed compare equal [[Quad Word|qword]]
| __m128i
| [http://msdn.microsoft.com/en-us/library/bb513998.aspx _mm_cmpeq_epi64]
| (__m128i a, __m128i b)
|}
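
As a minimal sketch of how pcmpeqq might be used from C, the hypothetical helper <code>equal_qword_mask</code> below (the name and the macro <code>TARGET_SSE41</code> are assumptions made for illustration, not part of any library) compares two pairs of 64-bit words, for instance two pairs of bitboards, and condenses the result into a 2-bit mask. The <code>target</code> attribute is a GCC/Clang convenience so that no <code>-msse4.1</code> switch is needed; other compilers may require a different setup.

```c
#include <stdint.h>
#include <smmintrin.h>  /* SSE4.1 intrinsics */

/* Assumption: GCC/Clang. Enables SSE4.1 for the annotated function only. */
#if defined(__GNUC__)
#define TARGET_SSE41 __attribute__((target("sse4.1")))
#else
#define TARGET_SSE41
#endif

/* Hypothetical helper: bit i of the result is set if a[i] == b[i]. */
TARGET_SSE41
static inline int equal_qword_mask(const uint64_t a[2], const uint64_t b[2])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    /* pcmpeqq: each 64-bit lane becomes all ones if equal, else all zeros */
    __m128i eq = _mm_cmpeq_epi64(va, vb);
    /* collect the sign bit of each 64-bit lane into bits 0 and 1 */
    return _mm_movemask_pd(_mm_castsi128_pd(eq));
}
```

A result of 3 means both qwords are equal, 0 means both differ.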

''see [[King Pattern#SSE4|Vulnerable on distant Checks with SSE4]].''
==SSE4.2==
[https://en.wikipedia.org/wiki/SSE4#SSE4.2 SSE4.2] was introduced in 2008 with the [https://en.wikipedia.org/wiki/Nehalem_%28microarchitecture%29 Nehalem-based] [https://en.wikipedia.org/wiki/Intel_Core_i7 Core i7] and added 7 new instructions.

===STTNI===
SSE4.2 includes five ''String and Text New Instructions'' (STTNI), which work on 128-bit XMM SIMD registers as well as general purpose registers and flags to perform character searches and comparisons on two operands of 16 bytes at a time, for instance PCMPESTRI (Packed Compare Explicit Length Strings, Return Index) <ref>[http://www.felixcloutier.com/x86/PCMPESTRI.html PCMPESTRI — Packed Compare Explicit Length Strings, Return Index]</ref>.
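
To illustrate a character search on 16 bytes at a time, here is a minimal sketch built on the <code>_mm_cmpestri</code> intrinsic in its "equal any" mode. The helper <code>find_first_of16</code> and the macro <code>TARGET_SSE42</code> are hypothetical names chosen for this example, and the <code>target</code> attribute assumes GCC or Clang.

```c
#include <string.h>
#include <nmmintrin.h>  /* SSE4.2 intrinsics */

/* Assumption: GCC/Clang. Enables SSE4.2 for the annotated function only. */
#if defined(__GNUC__)
#define TARGET_SSE42 __attribute__((target("sse4.2")))
#else
#define TARGET_SSE42
#endif

/* Hypothetical helper: index of the first byte in text[0..text_len) that
   equals any byte of set[0..set_len), or 16 if there is none.
   Both lengths must be in the range 0..16. */
TARGET_SSE42
static inline int find_first_of16(const char *set, int set_len,
                                  const char *text, int text_len)
{
    char sbuf[16] = {0};
    char tbuf[16] = {0};
    /* copy into padded buffers so the 16-byte loads never read out of bounds */
    memcpy(sbuf, set, (size_t)set_len);
    memcpy(tbuf, text, (size_t)text_len);
    __m128i vs = _mm_loadu_si128((const __m128i *)sbuf);
    __m128i vt = _mm_loadu_si128((const __m128i *)tbuf);
    /* pcmpestri: explicit lengths, unsigned bytes, lowest matching index */
    return _mm_cmpestri(vs, set_len, vt, text_len,
                        _SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_ANY |
                        _SIDD_LEAST_SIGNIFICANT);
}
```

Longer strings would be processed in a loop over 16-byte chunks, which this sketch leaves out.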

===ATAI===
Popcnt and crc32, working on general purpose registers, were dubbed ''Application-Targeted Accelerator Instructions'' (ATAI) as a subset of SSE4.2 <ref>[http://msdn.microsoft.com/en-us/library/bb892950.aspx MSDN - Streaming SIMD Extensions 4 Instructions], 2.3 SSE4.2 INSTRUCTION SET, 2.3.3. Application-Targeted Accelerator Instructions</ref> <ref>[https://software.intel.com/en-us/node/524195 Application Targeted Accelerators Intrinsics]</ref>, but should be considered a disjoint instruction set as far as SSE4 compiler optimizations are concerned.
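
Because the extensions are reported separately, a program may want to test each feature bit on its own at run time. The following is a minimal sketch, assuming GCC or Clang and their <code>&lt;cpuid.h&gt;</code> header; the helper <code>cpu_has_leaf1_ecx</code> and the <code>CPUID1_ECX_*</code> macro names are made up for this example, while the bit positions are those of CPUID leaf 1, register ECX.

```c
#include <cpuid.h>  /* Assumption: GCC/Clang; MSVC offers __cpuid in <intrin.h> */

/* Feature bits in ECX of CPUID leaf 1 (macro names are hypothetical) */
#define CPUID1_ECX_SSE41  (1u << 19)
#define CPUID1_ECX_SSE42  (1u << 20)
#define CPUID1_ECX_POPCNT (1u << 23)

/* Hypothetical helper: returns 1 if all bits of mask are set in ECX of
   CPUID leaf 1, else 0 (also 0 if the leaf is not supported). */
static inline int cpu_has_leaf1_ecx(unsigned mask)
{
    unsigned eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (ecx & mask) == mask;
}
```

A caller might select a popcnt-based routine only if <code>cpu_has_leaf1_ecx(CPUID1_ECX_POPCNT)</code> returns 1, independently of the SSE4.1 and SSE4.2 bits.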

{| class="wikitable"
|-
! Mnemonic
! Description
! Return Type
! C-Intrinsic
! Parameters
|-
| popcnt
| [[Population Count]]
| __int64
| [http://msdn.microsoft.com/en-us/library/bb531475.aspx _mm_popcnt_u64]
| (unsigned __int64 a)
|}
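
A minimal sketch of the intrinsic in use, for example to count the pieces on a bitboard. The wrapper name <code>popcount64</code> and the macro <code>TARGET_POPCNT</code> are assumptions for illustration; the <code>target</code> attribute assumes GCC or Clang, and <code>_mm_popcnt_u64</code> is only available when compiling for x86-64.

```c
#include <stdint.h>
#include <nmmintrin.h>  /* declares _mm_popcnt_u64 */

/* Assumption: GCC/Clang. Enables popcnt for the annotated function only. */
#if defined(__GNUC__)
#define TARGET_POPCNT __attribute__((target("popcnt")))
#else
#define TARGET_POPCNT
#endif

/* Hypothetical wrapper: number of one-bits in a 64-bit word. */
TARGET_POPCNT
static inline int popcount64(uint64_t x)
{
    return (int)_mm_popcnt_u64(x);
}
```

For instance, <code>popcount64(0x00FF00000000FF00)</code>, the initial pawn bitboard of both sides, yields 16.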
<span id="SSE4a"></span>
=AMD SSE4a=
[https://en.wikipedia.org/wiki/SSE4#SSE4a SSE4a] was introduced by AMD with the [https://en.wikipedia.org/wiki/AMD_K10 K10] (Barcelona) microarchitecture.

==SIMD==
Two new groups of SIMD instructions working on XMM registers were added: combined mask-shift instructions (EXTRQ/INSERTQ) and scalar streaming store instructions (MOVNTSD/MOVNTSS). These instructions are not available in Intel's SSE4.
<span id="ABM"></span>
==Advanced Bit Manipulation==
The two Advanced Bit Manipulation (ABM) instructions, lzcnt and popcnt, work on general purpose registers. [[BitScan#LeadingZeroCount|Leading Zero Count]] was not available in Intel's Application-Targeted Accelerator Instructions of [[SSE4#SSE4.2|SSE4.2]], but was later incorporated with [[BMI1#LZCNT|BMI]].

{| class="wikitable"
|-
! Mnemonic
! Description
! Return Type
! C-Intrinsic
! Parameters
|-
| lzcnt
| [[BitScan#LeadingZeroCount|Leading Zero Count]]
| unsigned __int64
| [http://msdn.microsoft.com/en-us/library/bb384809.aspx __lzcnt64]
| (unsigned __int64 a)
|-
| popcnt
| [[Population Count]]
| unsigned __int64
| [http://msdn.microsoft.com/en-us/library/bb385231.aspx __popcnt64]
| (unsigned __int64 a)
|}
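
The following minimal sketch shows how lzcnt might serve as a reverse [[BitScan]]. It uses the Intel-style intrinsic name <code>_lzcnt_u64</code> from <code>&lt;immintrin.h&gt;</code> rather than the MSVC name from the table, which is an assumption about the toolchain (GCC or Clang on x86-64); the helper names and the macro <code>TARGET_LZCNT</code> are hypothetical. On processors without lzcnt the same opcode is executed as bsr and gives different results, so the feature should be checked before this code is used.

```c
#include <stdint.h>
#include <immintrin.h>  /* declares _lzcnt_u64 */

/* Assumption: GCC/Clang. Enables lzcnt for the annotated functions only. */
#if defined(__GNUC__)
#define TARGET_LZCNT __attribute__((target("lzcnt")))
#else
#define TARGET_LZCNT
#endif

/* Hypothetical wrapper: number of leading zero bits; defined as 64 for x == 0. */
TARGET_LZCNT
static inline int leading_zeros64(uint64_t x)
{
    return (int)_lzcnt_u64(x);
}

/* Hypothetical helper: index (0..63) of the most significant one-bit.
   x must be nonzero. */
TARGET_LZCNT
static inline int bitscan_reverse(uint64_t x)
{
    return 63 - leading_zeros64(x);
}
```

Unlike bsr, lzcnt has a defined result for a zero input, which is why <code>leading_zeros64(0)</code> can be relied on to return 64.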

=See also=
* [[AltiVec]]
* [[AVX]]
* [[BMI1|BMI]]
* [[MMX]]
* [[SIMD and SWAR Techniques]]
* [[SSE]]
* [[SSE2]]
* [[SSE3]]
* [[SSSE3]]
* [[SSE5]]
* [[TBM]]
* [[King Pattern#SSE4|Vulnerable on distant Checks with SSE4]]
* [[XOP]]

=Manuals=
* [http://www.info.univ-angers.fr/~richer/ens/l3info/ao/intel_sse4.pdf Intel® SSE4 Programming Reference] (pdf)
* [https://support.amd.com/techdocs/40546.pdf Software Optimization Guide for AMD Family 10h and 12h Processors] (pdf)

=Forum Posts=
* [http://www.talkchess.com/forum/viewtopic.php?t=43771 using Popcount and Prefetch with SSE4 hardware support] by [[Engin Üstün]], [[CCC]], May 19, 2012 » [[Population Count]], [[Memory]]

=External Links=
* [https://en.wikipedia.org/wiki/SSE4 SSE4 from Wikipedia]
* [http://msdn.microsoft.com/en-us/library/bb892950.aspx MSDN - Streaming SIMD Extensions 4 Instructions]
* [http://msdn.microsoft.com/en-us/library/bb892945.aspx MSDN - SSE4A and Advanced Bit Manipulation Intrinsics]
* [http://sseplus.sourceforge.net/index.html SSEPlus Project Documentation]
* [http://www.agner.org/optimize/blog/ Agner's CPU blog] by [http://www.agner.org/ Agner Fog]
* [http://software.intel.com/sites/landingpage/IntrinsicsGuide/ Intel Intrinsics Guide]

=References=
<references />

'''[[x86|Up one Level]]'''
