Changes

Jump to: navigation, search

BMI2

14,118 bytes added, 12:43, 18 May 2018
Created page with "'''Home * Hardware * x86-64 * BMI2''' '''BMI2''',<br/> an x86-64 expansion of bit-manipulation instructions by Intel..."
'''[[Main Page|Home]] * [[Hardware]] * [[x86-64]] * BMI2'''

'''BMI2''',<br/>
an x86-64 expansion of [[Bit-Twiddling#BitManipulation|bit-manipulation]] instructions by [[Intel]]. Like [[BMI1]], BMI2 employs [https://en.wikipedia.org/wiki/VEX_prefix VEX prefix] encoding to support three-operand syntax with non-destructive source operands on 32- or 64-bit general-purpose registers. Along with [[AVX2]], BMI2 was expected to be part of [[Intel|Intel's]] [https://en.wikipedia.org/wiki/Haswell_%28microarchitecture%29 Haswell] architecture planned for 2013, but was not yet available in one of the first tested Haswell generations of mid 2013 as reported by Andreas Stiller from the German c't magazine <ref>[http://de.linkedin.com/pub/andreas-stiller/a/381/aa9 Andreas Stiller] ('''2013'''). ''[http://www.heise.de/ct/inhalt/2013/14/114/ Der Rechenkünstler]''. [http://www.heise.de/ct/ c't Magazin für Computertechnik] 14/2013, p. 114-119 (German)</ref>. BMI2 requires bit 8 set in EBX of [https://en.wikipedia.org/wiki/CPUID CPUID] with EAX=07H, ECX=0H <ref>[http://software.intel.com/en-us/articles/how-to-detect-new-instruction-support-in-the-4th-generation-intel-core-processor-family How to detect New Instruction support in the 4th generation Intel® Core™ processor family | Intel® Developer Zone] by [http://software.intel.com/de-de/user/76418 Max Locktyukhin], August 05, 2013</ref>. In 2017, BMI2 was further incorporated in [[AMD|AMD's]] [https://en.wikipedia.org/wiki/Zen_(microarchitecture) Zen-architecture] but so far with a slow implementation of critical instructions such as [[#PDEP|PDEP]] and [[#PEXT|PEXT]] <ref>[http://rybkaforum.net/cgi-bin/rybkaforum/topic_show.pl?tid=32016 Ryzen Fritz Chess Benchmarks ?] by ralunger, [[Computer Chess Forums|Rybka Forum]], March 03, 2017</ref> <ref>[https://www.reddit.com/r/Amd/comments/60i6er/ryzen_and_bmi2_strange_behavior_and_high_latencies/ Ryzen and BMI2: Strange behavior and high latencies] by DonnieTinyHands, [https://en.wikipedia.org/wiki/Reddit Reddit], March 20, 2017 » [[AMD]], [[#PEXT|BMI2 PEXT]]</ref> .

=Instructions=
Beside the instructions explained in more detail below, there is MULX, Unsigned Multiply, and RORX, SARX/SHLX/SHRX, that is rotate and shifts without affecting processor flags. PEXT and PDEP <ref>[http://www.informatik.uni-trier.de/~ley/pers/hd/l/Lee:Ruby_B= Ruby Lee], [http://www.informatik.uni-trier.de/~ley/pers/hd/s/Shi:Zhijie_Jerry.html Zhijie Shi], [http://www.informatik.uni-trier.de/~ley/pers/hd/y/Yang:Xiao.html Xiao Yang] ('''2001'''). ''Efficient Permutation Instructions for Fast Software Cryptography''. Micro, IEEE, Vol. 21, No. 6, [http://palms.ee.princeton.edu/PALMSopen/lee01efficient.pdf pdf]</ref> allow for an alternative of [[Kindergarten Bitboards|kindergarten]]- or [[Magic Bitboards|magic bitboards]], and clearly makes maintaining [[Rotated Bitboards|rotated bitboards]] obsolete <ref>[http://www.talkchess.com/forum/viewtopic.php?t=40333 Haswell New Instructions] by [[Zach Wegner]], [[CCC]], September 09, 2011</ref>.
<span id="BZHI"></span>
==BZHI==
Zero High Bits Starting with Specified Bit Position <ref>[http://software.intel.com/sites/products/documentation/hpc/composerxe/en-us/2011Update/cpp/lin/intref_cls/common/intref_avx2_bzhi_u.htm _bzhi_u32/u64]</ref>.
<pre>
dest ::= src & ((1 << index)-1);

unsigned __int64 _bzhi_u64(unsigned __int64 src, unsigned __int32 index);
</pre>
<span id="PDEP"></span>
==PDEP==
Parallel Bits Deposit <ref>[http://software.intel.com/sites/products/documentation/hpc/composerxe/en-us/2011Update/cpp/lin/intref_cls/common/intref_avx2_pdep_u.htm _pdep_u32/u64]</ref>. May be used to map [[First Rank Attacks|first rank attacks]] to any rank, file, or diagonal on the chess board.

===Intrinsic Prototype===
<pre>
unsigned __int64 pdep_u64(unsigned __int64 src, unsigned __int64 mask);
</pre>
===Sample===
<pre>
SRC1 ┌───┬───┬───┬───┬───┐ ┌───┬───┬───┬───┬───┬───┬───┬───┐
│S63│S62│S61│S60│S59│....│ S7│ S6│ S5│ S4│ S3│ S2│ S1│ S0│
└───┴───┴───┴───┴───┘ └───┴───┴───┴───┴───┴───┴───┴───┘

SRC2 ┌───┬───┬───┬───┬───┐ ┌───┬───┬───┬───┬───┬───┬───┬───┐
(mask) │ 0 │ 0 │ 0 │ 1 │ 0 │0...│ 1 │ 0 │ 1 │ 0 │ 0 │ 1 │ 0 │ 0 │ (f.i. 4 bits set)
└───┴───┴───┴───┴───┘ └───┴───┴───┴───┴───┴───┴───┴───┘

DEST ┌───┬───┬───┬───┬───┐ ┌───┬───┬───┬───┬───┬───┬───┬───┐
│ 0 │ 0 │ 0 │ S3│ 0 │0...│ S2│ 0 │ S1│ 0 │ 0 │ S0│ 0 │ 0 │
└───┴───┴───┴───┴───┘ └───┴───┴───┴───┴───┴───┴───┴───┘
</pre>
===Serial Implementation===
in [[C]] <ref>[http://www.talkchess.com/forum/viewtopic.php?t=48220&start=1 Re: PEXT Bitboards] by [[Ronald de Man]], [[CCC]], June 07, 2013</ref>:
<pre>
U64 _pdep_u64(U64 val, U64 mask) {
U64 res = 0;
for (U64 bb = 1; mask; bb += bb) {
if (val & bb)
res |= mask & -mask;
mask &= mask - 1;
}
return res;
}
</pre>
<span id="PEXT"></span>
==PEXT==
Parallel Bits Extract <ref> [http://software.intel.com/sites/products/documentation/hpc/composerxe/en-us/2011Update/cpp/lin/intref_cls/common/intref_avx2_pext_u.htm _pext_u32/u64]</ref>. Great to get the [[First Rank Attacks#TheOuterSquares|inner six bit]] occupancy of any rank, file, or diagonal as index of six consecutive lower bit, or the up to 12 bits for a [[#PEXTBitboards|magic attack lookup]].

===Intrinsic Prototype===
<pre>
unsigned __int64 _pext_u64(unsigned __int64 src, unsigned __int64 mask);
</pre>
===Sample===
<pre>
SRC1 ┌───┬───┬───┬───┬───┐ ┌───┬───┬───┬───┬───┬───┬───┬───┐
│S63│S62│S61│S60│S59│....│ S7│ S6│ S5│ S4│ S3│ S2│ S1│ S0│
└───┴───┴───┴───┴───┘ └───┴───┴───┴───┴───┴───┴───┴───┘

SRC2 ┌───┬───┬───┬───┬───┐ ┌───┬───┬───┬───┬───┬───┬───┬───┐
(mask) │ 0 │ 0 │ 0 │ 1 │ 0 │0...│ 1 │ 0 │ 1 │ 0 │ 0 │ 1 │ 0 │ 0 │ (f.i. 4 bits set)
└───┴───┴───┴───┴───┘ └───┴───┴───┴───┴───┴───┴───┴───┘

DEST ┌───┬───┬───┬───┬───┐ ┌───┬───┬───┬───┬───┬───┬───┬───┐
│ 0 │ 0 │ 0 │ 0 │ 0 │0...│ 0 │ 0 │ 0 │ 0 │S60│ S7│ S5│ S2│
└───┴───┴───┴───┴───┘ └───┴───┴───┴───┴───┴───┴───┴───┘
</pre>
===Serial Implementation===
in [[C]] <ref>[http://www.talkchess.com/forum/viewtopic.php?t=48220&start=2 Re: PEXT Bitboards] by [[Gerd Isenberg]], [[CCC]], June 07, 2013</ref>, quite similar to [[#PDEP|PDEP]] in traversing the mask, and only two expressions, "mask & -mask" and "bb" swapped:
<pre>
U64 _pext_u64(U64 val, U64 mask) {
U64 res = 0;
for (U64 bb = 1; mask; bb += bb) {
if ( val & mask & -mask )
res |= bb;
mask &= mask - 1;
}
return res;
}
</pre>
<span id="PEXTBitboards"></span>
=Applications=
==PEXT Bitboards==
Fancy '''PEXT Bitboards''' as replacement for [[Magic Bitboards#Fancy|Fancy Magic Bitboards]]. The relevant up to four ray occupancies are mapped to a dense index range by using the [[#PEXT|PEXT]] instruction:
<pre>
U64 arrAttacks [...]; // ~840 KByte all rook and bishop attacks
U64 arrRookMask [64]; // 10..12 relevant occupancy bits per rook square
U64 arrBishopMask[64]; // 5.. 9 relevant occupancy bits per bishop square
U32 arrRookBase [64]; // arrAttacks base offset per rook square
U32 arrBishopBase[64]; // arrAttacks base offset per bishop square

U64 rookAttack(U64 occ, enumSquare sq) {
return arrAttacks[arrRookBase[sq] + _pext_u64(occ, arrRookMask[sq])];
}

U64 bishopAttack(U64 occ, enumSquare sq) {
return arrAttacks[arrBishopBase[sq] + _pext_u64(occ, arrBishopMask[sq])];
}
</pre>
<span id="PDEPBitboards"></span>
==PEXT/PDEP Bitboards==
An idea for denser sliding attack tables is also using [[#PDEP|PDEP]], dividing the table size of the [[#PEXTBitboards|PEXT Bitboards]] by four but some trailing computation. Looking up [[Word|16-bit words]] instead of bitboards - to get the scattered bits of the up to four ray attacks per square in an extracted form mapped to a 16-bit word, which then becomes the attack bitboard by performing a deposit instruction with an appropriate second mask per square. The technique was introduced by [[Zach Wegner]] in [[CCC]] <ref>[http://www.talkchess.com/forum/viewtopic.php?t=40333 Haswell New Instructions] by [[Zach Wegner]], [[CCC]], September 09, 2011</ref> <ref>[http://www.talkchess.com/forum/viewtopic.php?t=40333&start=12 Re: Haswell New Instructions] by [[Gerd Isenberg]], [[CCC]], September 12, 2011</ref> and implemented by [[Ronald de Man]] with a table of slightly more than 210 KiB <ref>[http://www.talkchess.com/forum/viewtopic.php?t=49611&start=1 Re: 152k rook and bishop attacks using PEXT and PDEP] by [[Ronald de Man]], [[CCC]], October 06, 2013</ref> <ref>[https://github.com/syzygy1/tb/blob/master/src/bmi2.h tb/src/bmi2.h at master · syzygy1/tb · GitHub] by [[Ronald de Man]]</ref>. Ronald's code was tested by [[Jean-Francois Romang]] using [[Stockfish]] as testbed, giving promising results <ref>[http://www.talkchess.com/forum/viewtopic.php?t=51879&start=2 Re: Stockfish haswell optimized build] by [[Jean-Francois Romang]], [[CCC]], April 06, 2014</ref> <ref>[https://github.com/jromang/Stockfish/blob/bmi2/src/bitboard.h#L255 Stockfish/src/bitboard.h at bmi2 · jromang/Stockfish · GitHub] by [[Jean-Francois Romang]]</ref>.

==Syzygy Generator==
[[Ronald de Man|Ronald de Man's]] generator for [[Syzygy Bases]] can take profit of [[#PDEP|PDEP]] and [[#PEXT|PEXT]] instructions, or to use their serial implementations to further reduce the size of his already compact [[Endgame Tablebases|endgame tablebases]] <ref>[http://www.talkchess.com/forum/viewtopic.php?t=48220&start=1 Re: PEXT Bitboards] by [[Ronald de Man]], [[CCC]], June 07, 2013</ref>.

=See also=
* [[AVX]]
* [[AVX2]]
* [[BitScan]]
* [[Bit-Twiddling]]
* [[BMI1]]
* [[General Setwise Operations]]
* [[TBM]]

=Manuals=
* [http://software.intel.com/file/36945 Intel AVX and AVX2 Programming Reference] (pdf)

=Forum Posts=
* [http://www.talkchess.com/forum/viewtopic.php?t=40333 Haswell New Instructions] by [[Zach Wegner]], [[CCC]], September 09, 2011
* [http://www.randombit.net/bitbashing/2012/06/22/haswell_bit_permutations.html Bit manipulations using BMI2] by [http://www.randombit.net/ Jack Lloyd], [http://www.randombit.net/ / :: bitbashing], June 22, 2012
* [http://www.talkchess.com/forum/viewtopic.php?t=48220 PEXT Bitboards] by [[Gerd Isenberg]], [[CCC]], June 07, 2013
* [http://www.talkchess.com/forum/viewtopic.php?t=49611 152k rook and bishop attacks using PEXT and PDEP] by [[Lasse Hansen]], [[CCC]], October 06, 2013
* [http://www.talkchess.com/forum/viewtopic.php?t=51879 Stockfish haswell optimized build] by [[Jean-Francois Romang]], [[CCC]], April 06, 2014 » [[Stockfish]]
* [http://www.talkchess.com/forum/viewtopic.php?t=58910 smaller tables for PEXT-style attack getters] by [[Wylie Garvin]], [[CCC]], January 13, 2016
* [http://www.talkchess.com/forum/viewtopic.php?t=63978 BMI2 intrinsics in gcc] by [[Álvaro Begué]], [[CCC]], May 14, 2017
* [http://rybkaforum.net/cgi-bin/rybkaforum/topic_show.pl?tid=32016 Ryzen Fritz Chess Benchmarks ?] by ralunger, [[Computer Chess Forums|Rybka Forum]], March 03, 2017
* [https://www.reddit.com/r/Amd/comments/60i6er/ryzen_and_bmi2_strange_behavior_and_high_latencies/ Ryzen and BMI2: Strange behavior and high latencies] by DonnieTinyHands, [https://en.wikipedia.org/wiki/Reddit Reddit], March 20, 2017 » [[AMD]], [[#PEXT|BMI2 PEXT]]
* [http://www.talkchess.com/forum/viewtopic.php?t=65191 BMI2 PEXT idea for attacks generation] by [[Mark Lefler]], [[CCC]], September 16, 2017
* [http://www.talkchess.com/forum/viewtopic.php?t=66737&start=4 Re: Komodo 11.3] by [[Mark Lefler]], [[CCC]], March 04, 2018 » [[AMD]], [[#PEXT|BMI2 PEXT]], [[Komodo#11|Komodo 11.3]]

=External Links=
* [https://en.wikipedia.org/wiki/Bit_Manipulation_Instruction_Sets Bit Manipulation Instruction Sets from Wikipedia]
* [http://semipublic.comp-arch.net/wiki/Compress_Bits_Under_Mask Compress Bits Under Mask - CompArch]
* [http://software.intel.com/en-us/tags/36092 BMI2 | Intel® Developer Zone]
* [http://www.randombit.net/bitbashing/2012/06/22/haswell_bit_permutations.html Bit manipulations using BMI2 — bitbashing], June 22, 2012
* [http://software.intel.com/en-us/articles/how-to-detect-new-instruction-support-in-the-4th-generation-intel-core-processor-family How to detect New Instruction support in the 4th generation Intel® Core™ processor family | Intel® Developer Zone] by [http://software.intel.com/de-de/user/76418 Max Locktyukhin], August 05, 2013
* [http://users.atw.hu/instlatx64/GenuineIntel00306C3_Haswell_InstLatX64.txt Haswell Instructions Latency]
* [http://software.intel.com/sites/landingpage/IntrinsicsGuide/ Intel Intrinsics Guide]

=References=
<references />

'''[[x86-64|Up one Level]]'''

Navigation menu