Difference between revisions of "BMI2"
GerdIsenberg (talk | contribs) (Created page with "'''Home * Hardware * x86-64 * BMI2''' '''BMI2''',<br/> an x86-64 expansion of bit-manipulation instructions by Intel...") |
GerdIsenberg (talk | contribs) |
||
Line 123: | Line 123: | ||
=Forum Posts= | =Forum Posts= | ||
+ | ==2011 ...== | ||
* [http://www.talkchess.com/forum/viewtopic.php?t=40333 Haswell New Instructions] by [[Zach Wegner]], [[CCC]], September 09, 2011 | * [http://www.talkchess.com/forum/viewtopic.php?t=40333 Haswell New Instructions] by [[Zach Wegner]], [[CCC]], September 09, 2011 | ||
* [http://www.randombit.net/bitbashing/2012/06/22/haswell_bit_permutations.html Bit manipulations using BMI2] by [http://www.randombit.net/ Jack Lloyd], [http://www.randombit.net/ / :: bitbashing], June 22, 2012 | * [http://www.randombit.net/bitbashing/2012/06/22/haswell_bit_permutations.html Bit manipulations using BMI2] by [http://www.randombit.net/ Jack Lloyd], [http://www.randombit.net/ / :: bitbashing], June 22, 2012 | ||
* [http://www.talkchess.com/forum/viewtopic.php?t=48220 PEXT Bitboards] by [[Gerd Isenberg]], [[CCC]], June 07, 2013 | * [http://www.talkchess.com/forum/viewtopic.php?t=48220 PEXT Bitboards] by [[Gerd Isenberg]], [[CCC]], June 07, 2013 | ||
* [http://www.talkchess.com/forum/viewtopic.php?t=49611 152k rook and bishop attacks using PEXT and PDEP] by [[Lasse Hansen]], [[CCC]], October 06, 2013 | * [http://www.talkchess.com/forum/viewtopic.php?t=49611 152k rook and bishop attacks using PEXT and PDEP] by [[Lasse Hansen]], [[CCC]], October 06, 2013 | ||
− | * [http://www.talkchess.com/forum/viewtopic.php?t=51879 Stockfish haswell optimized build] by [[Jean-Francois Romang]], [[CCC]], April 06, 2014 » [[Stockfish]] | + | * [http://www.talkchess.com/forum/viewtopic.php?t=51879 Stockfish haswell optimized build] by [[Jean-Francois Romang]], [[CCC]], April 06, 2014 » [[Stockfish]] |
+ | ==2015 ...== | ||
* [http://www.talkchess.com/forum/viewtopic.php?t=58910 smaller tables for PEXT-style attack getters] by [[Wylie Garvin]], [[CCC]], January 13, 2016 | * [http://www.talkchess.com/forum/viewtopic.php?t=58910 smaller tables for PEXT-style attack getters] by [[Wylie Garvin]], [[CCC]], January 13, 2016 | ||
* [http://www.talkchess.com/forum/viewtopic.php?t=63978 BMI2 intrinsics in gcc] by [[Álvaro Begué]], [[CCC]], May 14, 2017 | * [http://www.talkchess.com/forum/viewtopic.php?t=63978 BMI2 intrinsics in gcc] by [[Álvaro Begué]], [[CCC]], May 14, 2017 | ||
Line 134: | Line 136: | ||
* [http://www.talkchess.com/forum/viewtopic.php?t=65191 BMI2 PEXT idea for attacks generation] by [[Mark Lefler]], [[CCC]], September 16, 2017 | * [http://www.talkchess.com/forum/viewtopic.php?t=65191 BMI2 PEXT idea for attacks generation] by [[Mark Lefler]], [[CCC]], September 16, 2017 | ||
* [http://www.talkchess.com/forum/viewtopic.php?t=66737&start=4 Re: Komodo 11.3] by [[Mark Lefler]], [[CCC]], March 04, 2018 » [[AMD]], [[#PEXT|BMI2 PEXT]], [[Komodo#11|Komodo 11.3]] | * [http://www.talkchess.com/forum/viewtopic.php?t=66737&start=4 Re: Komodo 11.3] by [[Mark Lefler]], [[CCC]], March 04, 2018 » [[AMD]], [[#PEXT|BMI2 PEXT]], [[Komodo#11|Komodo 11.3]] | ||
+ | * [http://www.talkchess.com/forum3/viewtopic.php?f=7&t=67432 Ryzen 2 and BMI2?] by [[Steve Maughan]], [[CCC]], May 13, 2018 » [[AMD]] | ||
=External Links= | =External Links= |
Revision as of 19:44, 18 May 2018
Home * Hardware * x86-64 * BMI2
BMI2,
an x86-64 expansion of bit-manipulation instructions by Intel. Like BMI1, BMI2 employs VEX prefix encoding to support three-operand syntax with non-destructive source operands on 32- or 64-bit general-purpose registers. Along with AVX2, BMI2 was expected to be part of Intel's Haswell architecture planned for 2013, but was not yet available in one of the first tested Haswell generations of mid 2013 as reported by Andreas Stiller from the German c't magazine [1]. BMI2 requires bit 8 set in EBX of CPUID with EAX=07H, ECX=0H [2]. In 2017, BMI2 was further incorporated in AMD's Zen-architecture but so far with a slow implementation of critical instructions such as PDEP and PEXT [3] [4] .
Contents
Instructions
Beside the instructions explained in more detail below, there is MULX, Unsigned Multiply, and RORX, SARX/SHLX/SHRX, that is rotate and shifts without affecting processor flags. PEXT and PDEP [5] allow for an alternative of kindergarten- or magic bitboards, and clearly makes maintaining rotated bitboards obsolete [6].
BZHI
Zero High Bits Starting with Specified Bit Position [7].
dest ::= src & ((1 << index)-1); unsigned __int64 _bzhi_u64(unsigned __int64 src, unsigned __int32 index);
PDEP
Parallel Bits Deposit [8]. May be used to map first rank attacks to any rank, file, or diagonal on the chess board.
Intrinsic Prototype
unsigned __int64 pdep_u64(unsigned __int64 src, unsigned __int64 mask);
Sample
SRC1 ┌───┬───┬───┬───┬───┐ ┌───┬───┬───┬───┬───┬───┬───┬───┐ │S63│S62│S61│S60│S59│....│ S7│ S6│ S5│ S4│ S3│ S2│ S1│ S0│ └───┴───┴───┴───┴───┘ └───┴───┴───┴───┴───┴───┴───┴───┘ SRC2 ┌───┬───┬───┬───┬───┐ ┌───┬───┬───┬───┬───┬───┬───┬───┐ (mask) │ 0 │ 0 │ 0 │ 1 │ 0 │0...│ 1 │ 0 │ 1 │ 0 │ 0 │ 1 │ 0 │ 0 │ (f.i. 4 bits set) └───┴───┴───┴───┴───┘ └───┴───┴───┴───┴───┴───┴───┴───┘ DEST ┌───┬───┬───┬───┬───┐ ┌───┬───┬───┬───┬───┬───┬───┬───┐ │ 0 │ 0 │ 0 │ S3│ 0 │0...│ S2│ 0 │ S1│ 0 │ 0 │ S0│ 0 │ 0 │ └───┴───┴───┴───┴───┘ └───┴───┴───┴───┴───┴───┴───┴───┘
Serial Implementation
U64 _pdep_u64(U64 val, U64 mask) { U64 res = 0; for (U64 bb = 1; mask; bb += bb) { if (val & bb) res |= mask & -mask; mask &= mask - 1; } return res; }
PEXT
Parallel Bits Extract [10]. Great to get the inner six bit occupancy of any rank, file, or diagonal as index of six consecutive lower bit, or the up to 12 bits for a magic attack lookup.
Intrinsic Prototype
unsigned __int64 _pext_u64(unsigned __int64 src, unsigned __int64 mask);
Sample
SRC1 ┌───┬───┬───┬───┬───┐ ┌───┬───┬───┬───┬───┬───┬───┬───┐ │S63│S62│S61│S60│S59│....│ S7│ S6│ S5│ S4│ S3│ S2│ S1│ S0│ └───┴───┴───┴───┴───┘ └───┴───┴───┴───┴───┴───┴───┴───┘ SRC2 ┌───┬───┬───┬───┬───┐ ┌───┬───┬───┬───┬───┬───┬───┬───┐ (mask) │ 0 │ 0 │ 0 │ 1 │ 0 │0...│ 1 │ 0 │ 1 │ 0 │ 0 │ 1 │ 0 │ 0 │ (f.i. 4 bits set) └───┴───┴───┴───┴───┘ └───┴───┴───┴───┴───┴───┴───┴───┘ DEST ┌───┬───┬───┬───┬───┐ ┌───┬───┬───┬───┬───┬───┬───┬───┐ │ 0 │ 0 │ 0 │ 0 │ 0 │0...│ 0 │ 0 │ 0 │ 0 │S60│ S7│ S5│ S2│ └───┴───┴───┴───┴───┘ └───┴───┴───┴───┴───┴───┴───┴───┘
Serial Implementation
in C [11], quite similar to PDEP in traversing the mask, and only two expressions, "mask & -mask" and "bb" swapped:
U64 _pext_u64(U64 val, U64 mask) { U64 res = 0; for (U64 bb = 1; mask; bb += bb) { if ( val & mask & -mask ) res |= bb; mask &= mask - 1; } return res; }
Applications
PEXT Bitboards
Fancy PEXT Bitboards as replacement for Fancy Magic Bitboards. The relevant up to four ray occupancies are mapped to a dense index range by using the PEXT instruction:
U64 arrAttacks [...]; // ~840 KByte all rook and bishop attacks U64 arrRookMask [64]; // 10..12 relevant occupancy bits per rook square U64 arrBishopMask[64]; // 5.. 9 relevant occupancy bits per bishop square U32 arrRookBase [64]; // arrAttacks base offset per rook square U32 arrBishopBase[64]; // arrAttacks base offset per bishop square U64 rookAttack(U64 occ, enumSquare sq) { return arrAttacks[arrRookBase[sq] + _pext_u64(occ, arrRookMask[sq])]; } U64 bishopAttack(U64 occ, enumSquare sq) { return arrAttacks[arrBishopBase[sq] + _pext_u64(occ, arrBishopMask[sq])]; }
PEXT/PDEP Bitboards
An idea for denser sliding attack tables is also using PDEP, dividing the table size of the PEXT Bitboards by four but some trailing computation. Looking up 16-bit words instead of bitboards - to get the scattered bits of the up to four ray attacks per square in an extracted form mapped to a 16-bit word, which then becomes the attack bitboard by performing a deposit instruction with an appropriate second mask per square. The technique was introduced by Zach Wegner in CCC [12] [13] and implemented by Ronald de Man with a table of slightly more than 210 KiB [14] [15]. Ronald's code was tested by Jean-Francois Romang using Stockfish as testbed, giving promising results [16] [17].
Syzygy Generator
Ronald de Man's generator for Syzygy Bases can take profit of PDEP and PEXT instructions, or to use their serial implementations to further reduce the size of his already compact endgame tablebases [18].
See also
Manuals
Forum Posts
2011 ...
- Haswell New Instructions by Zach Wegner, CCC, September 09, 2011
- Bit manipulations using BMI2 by Jack Lloyd, / :: bitbashing, June 22, 2012
- PEXT Bitboards by Gerd Isenberg, CCC, June 07, 2013
- 152k rook and bishop attacks using PEXT and PDEP by Lasse Hansen, CCC, October 06, 2013
- Stockfish haswell optimized build by Jean-Francois Romang, CCC, April 06, 2014 » Stockfish
2015 ...
- smaller tables for PEXT-style attack getters by Wylie Garvin, CCC, January 13, 2016
- BMI2 intrinsics in gcc by Álvaro Begué, CCC, May 14, 2017
- Ryzen Fritz Chess Benchmarks ? by ralunger, Rybka Forum, March 03, 2017
- Ryzen and BMI2: Strange behavior and high latencies by DonnieTinyHands, Reddit, March 20, 2017 » AMD, BMI2 PEXT
- BMI2 PEXT idea for attacks generation by Mark Lefler, CCC, September 16, 2017
- Re: Komodo 11.3 by Mark Lefler, CCC, March 04, 2018 » AMD, BMI2 PEXT, Komodo 11.3
- Ryzen 2 and BMI2? by Steve Maughan, CCC, May 13, 2018 » AMD
External Links
- Bit Manipulation Instruction Sets from Wikipedia
- Compress Bits Under Mask - CompArch
- BMI2 | Intel® Developer Zone
- Bit manipulations using BMI2 — bitbashing, June 22, 2012
- How to detect New Instruction support in the 4th generation Intel® Core™ processor family | Intel® Developer Zone by Max Locktyukhin, August 05, 2013
- Haswell Instructions Latency
- Intel Intrinsics Guide
References
- ↑ Andreas Stiller (2013). Der Rechenkünstler. c't Magazin für Computertechnik 14/2013, p. 114-119 (German)
- ↑ How to detect New Instruction support in the 4th generation Intel® Core™ processor family | Intel® Developer Zone by Max Locktyukhin, August 05, 2013
- ↑ Ryzen Fritz Chess Benchmarks ? by ralunger, Rybka Forum, March 03, 2017
- ↑ Ryzen and BMI2: Strange behavior and high latencies by DonnieTinyHands, Reddit, March 20, 2017 » AMD, BMI2 PEXT
- ↑ Ruby Lee, Zhijie Shi, Xiao Yang (2001). Efficient Permutation Instructions for Fast Software Cryptography. Micro, IEEE, Vol. 21, No. 6, pdf
- ↑ Haswell New Instructions by Zach Wegner, CCC, September 09, 2011
- ↑ _bzhi_u32/u64
- ↑ _pdep_u32/u64
- ↑ Re: PEXT Bitboards by Ronald de Man, CCC, June 07, 2013
- ↑ _pext_u32/u64
- ↑ Re: PEXT Bitboards by Gerd Isenberg, CCC, June 07, 2013
- ↑ Haswell New Instructions by Zach Wegner, CCC, September 09, 2011
- ↑ Re: Haswell New Instructions by Gerd Isenberg, CCC, September 12, 2011
- ↑ Re: 152k rook and bishop attacks using PEXT and PDEP by Ronald de Man, CCC, October 06, 2013
- ↑ tb/src/bmi2.h at master · syzygy1/tb · GitHub by Ronald de Man
- ↑ Re: Stockfish haswell optimized build by Jean-Francois Romang, CCC, April 06, 2014
- ↑ Stockfish/src/bitboard.h at bmi2 · jromang/Stockfish · GitHub by Jean-Francois Romang
- ↑ Re: PEXT Bitboards by Ronald de Man, CCC, June 07, 2013