General Setwise Operations

1,248 bytes removed, 20:38, 3 May 2018
<pre>
vpand ymm0, ymm1, ymm2 ; AVX2 ymm0 = ymm1 & ymm2
</pre>
[[SSE2]]-intrinsic [http://msdn2.microsoft.com/en-us/library/6d1txsa8%28VS.80%29.aspx _mm_and_si128]. [[AVX2]]-intrinsic [https://software.intel.com/en-us/node/695037 _mm256_and_si256].
[[AVX-512]] has [[AVX-512#VPTERNLOG|VPTERNLOG]]
<pre>
vpor ymm0, ymm1, ymm2 ; AVX2 ymm0 = ymm1 | ymm2
</pre>
[[SSE2]]-intrinsic [http://msdn2.microsoft.com/en-us/library/ew8ty0db%28VS.80%29.aspx _mm_or_si128]. [[AVX2]]-intrinsic [https://software.intel.com/en-us/node/523912 _mm256_or_si256].
[[AVX-512]] has [[AVX-512#VPTERNLOG|VPTERNLOG]]
'''x86-mnemonics'''
Available as a general purpose instruction.
<pre> not rax ; rax = ~rax</pre>
[[AVX-512]] has [[AVX-512#VPTERNLOG|VPTERNLOG]]
<pre>
vpandn ymm0, ymm1, ymm2 ; AVX2 ymm0 = ~ymm1 & ymm2
</pre>
[[SSE2]]-intrinsic [http://msdn2.microsoft.com/en-us/library/1beaceh8%28VS.80%29.aspx _mm_andnot_si128]. [[AVX2]]-intrinsic [https://software.intel.com/en-us/node/523911 _mm256_andnot_si256].
[[AVX-512]] has [[AVX-512#VPTERNLOG|VPTERNLOG]]
<pre>
vpxor ymm0, ymm1, ymm2 ; AVX2 ymm0 = ymm1 ^ ymm2
</pre>
[[SSE2]]-intrinsic [http://msdn2.microsoft.com/en-us/library/fzt08www%28VS.80%29.aspx _mm_xor_si128]. [[AVX2]]-intrinsic [https://software.intel.com/en-us/node/683570 _mm256_xor_si256].
[[AVX-512]] has [[AVX-512#VPTERNLOG|VPTERNLOG]]
<pre>
psrlq xmm0, xmm1 ; SSE2 xmm0 >>= xmm1
psllq xmm0, xmm1 ; SSE2 xmm0 <<= xmm1
vpshlq xmm0, xmm1, xmm2 ; XOP xmm0 = xmm1 >>/<< xmm2 ; Individual, generalized shifts
vpshlb xmm0, xmm1, xmm2 ; XOP xmm0 = xmm1 >>/<< xmm2 ; Individual, generalized shifts of 16 bytes
vpsrlvq ymm0, ymm1, ymm2 ; AVX2 ymm0 = ymm1 >> ymm2 ; Individual shifts
vpsllvq ymm0, ymm1, ymm2 ; AVX2 ymm0 = ymm1 << ymm2 ; Individual shifts
</pre>
[[SSE2]]-intrinsics with variable register or constant immediate shift amounts, working on vectors of two bitboards:
* [http://msdn2.microsoft.com/en-us/library/yf6cf9k8%28VS.80%29.aspx _mm_srl_epi64]
* [http://msdn2.microsoft.com/en-us/library/btdyeyt1%28VS.80%29.aspx _mm_srli_epi64]
* [http://msdn2.microsoft.com/en-us/library/6ta9dffd%28VS.80%29.aspx _mm_sll_epi64]
* [http://msdn2.microsoft.com/en-us/library/da6131h7%28VS.80%29.aspx _mm_slli_epi64]
[[XOP]] has individual, generalized shifts for each of two bitboards and also byte-wise shifts:
* [http://msdn.microsoft.com/en-us/library/gg466456 _mm_shl_epi64]
* [http://msdn.microsoft.com/en-us/library/gg466458 _mm_shl_epi8]
[[AVX2]] has [[AVX2#IndividualShifts|individual shifts]] for each of four bitboards:
* [https://software.intel.com/en-us/node/695097 _mm256_sllv_epi64]
* [https://software.intel.com/en-us/node/695103 _mm256_srlv_epi64]
<span id="OneStepOnly"></span>
==One Step Only==
Most processors have rotate instructions, but they are not supported by standard programming languages like [[C]] or [[Java]]. Some compilers provide [http://msdn2.microsoft.com/en-us/library/5cc576c4.aspx intrinsic], processor-specific functions.
<pre>
U64 rotateLeft (U64 x, int s) {return _rotl64(x, s);}
</pre>
'''x86-Instructions'''
[[x86]] processors provide a bit-test instruction family (bt, bts, btr, btc) with 32- and 64-bit operands. They may be used implicitly by compiler optimization, or explicitly via inline assembler or compiler intrinsics. Take care that they are applied to local variables (likely registers) rather than memory references:
* [http://msdn2.microsoft.com/en-us/library/h65k4tze%28VS.80%29.aspx _bittest64]
* [http://msdn2.microsoft.com/en-us/library/z56sc6y4%28VS.80%29.aspx _bittestandset64]
* [http://msdn2.microsoft.com/en-us/library/zbdxdb11%28VS.80%29.aspx _bittestandcomplement64]
* [http://msdn2.microsoft.com/en-us/library/hd0hzyf8%28VS.80%29.aspx _bittestandreset64]
<span id="UpdateByMove"></span>
==Update by Move==
'''x86-mnemonics'''
<pre>
neg rax ; rax = -rax
</pre>
'''Increment of Complement'''
The two's complement is defined as the value we need to add to the original value to get 2^64, which is an "overflowed" zero, since all 64-bit values are implicitly modulo 2^64. Thus, the two's complement is the '''ones' complement plus one''':
<pre> -x == ~x + 1</pre>
That fulfills the condition x + (-x) == 2^bitsize (here 2^64), which overflows to zero.
'''Complement of Decrement'''
Replacing x by x - 1 in the increment-of-complement formula leaves another definition: two's complement, or negation, is also the ones' complement of the decrement:
<pre> -x == ~(x - 1)</pre>
Thus, we can reduce subtraction by addition and ones' complement:
<pre>
x - y == ~(~x + y)
</pre>
 
'''Bitwise Copy/Invert'''
The two's complement may also be defined by a bitwise copy-loop from right (LSB) to left (MSB): copy all bits up to and including the lowest one bit, then invert all the remaining higher bits.
<pre>
blsi rax, rbx ; BMI1 rax = rbx & -rbx
</pre>
[[BMI1]]-intrinsic [http://www.felixcloutier.com/x86/BLSI.html _blsi_u32/64].
[[AMD|AMD's]] [[x86-64]] expansion [[TBM]] further has an [[TBM#BLSIC|Isolate Lowest Set Bit and Complement]] instruction, which applies [[General Setwise Operations#DeMorganslaws|De Morgan's law]] to get the complement of the LS1B:
<pre>
blsr rax, rbx ; BMI1 rax = rbx & (rbx - 1)
</pre>
[[BMI1]]-intrinsic [http://www.felixcloutier.com/x86/BLSR.html _blsr_u32/64].
<span id="LS1BSeparation"></span>
===Separation===
<pre>
tzmsk rax, rbx ; TBM: rax = ~rbx & (rbx - 1)
</pre>
[[BMI1]]-intrinsic [https://software.intel.com/en-us/node/514041 _blsmsk_u32/64].
===Smearing===
