Changes

Newer edit →

Bitboard Serialization

12,076 bytes added, 16:26, 7 May 2018

'''[[Main Page|Home]] * [[Board Representation]] * [[Bitboards]] * Bitboard Serialization'''

[[FILE:EschersEncounter.jpg|border|right|thumb|276px|link=http://www.mcescher.com/Gallery/back-bmp/LW331.jpg|[[Arts#Escher|M. C. Escher]], Encounter, 1944 <ref>[http://www.mcescher.com/Gallery/gallery-back.htm Picture gallery "Back in Holland 1941 - 1954"] from [http://www.mcescher.com/ The Official M.C. Escher Website]</ref> ]]

'''Bitboard Serialization''' refers to the transformation of a bitboard with up to 64 one-bits set into a list of up to 64 bit-indices aka [[Squares|square indices]] of a [[8x8 Board|8x8 board]] - for instance to process [[Target Square|move-target]] sets for [[Move Generation|move generation]]. This is done in two phases, isolating none-empty [https://en.wikipedia.org/wiki/Subset subsets] and then transforming those more versatile subsets into lists, either bit by bit, by applying a [https://en.wikipedia.org/wiki/Bisection bisection] scheme, where finally [[Word|words]] or [[Byte|bytes]] may act as index of a pre-calculated database, or by [[Hash Table#PerfectHashing|perfect hashing]] of square lists by subsets with a limited maximum popularity, for instance move-target sets of a [[King|king]] or [[Knight|knight]] even with [[Hash Table#MinimalPerfectHashing|minimal perfect hashing]].

=Isolating Subsets=
The process of isolating subsets is performed by [[General Setwise Operations#Intersection|intersection]], for single populated subsets with the [[General Setwise Operations#TheTwosComplement|two's complement]].

==Single Bits==
This obvious loop approach is similar to [[Population Count#BrianKernighansway|Brian Kernighan's way]] to count the number of one-bits by consecutively isolating and clearing the LS1B:
<pre>
while ( x ) {
U64 ls1b = x & -x; // isolate LS1B
...
x &= x-1; // reset LS1B
}
</pre>
or with likely the same generated assembly:
<pre>
if ( x ) do {
U64 ls1b = x & -x; // isolate LS1B
...
} while ( x &= x-1 );
</pre>
Of course we may also reset the LS1B by
<pre>
x ^= ls1b; // reset LS1B
</pre>
but todays processors like to gain more parallelism to calculate independent expressions.

==Multiple Bits==
Isolating none-empty subsets with possibly multiple one-bits can be applied by [https://en.wikipedia.org/wiki/Divide_and_conquer_algorithm divide and conquer], that is to divide the bitboard or [[Quad Word|quad word]] [[Recursion|recursively]] into smaller items, two [[Double Word|double words]], consisting of two [[Word|words]] and those again of two [[Byte|bytes]]. Word and byte-wide isolated subsets may then act as an index of a pre-calculated lookup table to convert those subsets to adequate data-structures, most likely lists with up to sixteen or eight elements.

=Converting Sets to Lists=

==Square Index Serialization==
For most applications LS1B-isolation alone is not appropriate, but the conversion from the exponential bitboard centric world to the scalar square centric world, also called [[BitScan|bit-scanning]].

===Scanning Forward===
<pre>
if ( x ) do {
int idx = bitScanForward(x); // square index from 0..63
*list++ = foo(idx, ...);
} while (x &= x-1); // reset LS1B
</pre>
Per definition bitScanForward reveals the index of LS1B.

===Scanning Reverse===
If - for some reason - we like to traverse the sets in reverse or unknown order anyway, we can not (or don't want to) rely on the independent LS1B reset.
<pre>
if ( x ) do {
int idx = bitScanReverse(x); // square index from 0..63
*list++ = foo(idx, ...);
} while (x ^= powOf2[idx]) ; // or 1ULL << idx -> reset found bit
</pre>

===Scanning with Reset===
A win of abstraction is to use a combined [[BitScan#BitscanwithReset|bitscan with reset]] found bit routine. This is fine. But probably harder for compilers to generate optimal code in the if-do-while-sense, where reset last bit already sets the zero-flag. If you don't care on such micro-optimizations, this is the preferred control structure.
<pre>
while ( x ) {
int idx = bitScanAndReset(&x); // square index from 0..63
*list++ = foo(idx, ...);
}
</pre>
One may even don't care about the order.

===Intrinsic Version===
If bitscan is able to properly handle empty sets - leaving an value outside the 0..63 range (like leading or trailing zero count), we may think about to skip the leading while condition and to break on bitscan(x) > 63 for instance. That was not recommend - since the reset leaves the condition en-passant, and the computational cost of an additional bitscan or zero count was higher. If you like to play the optimization game, it might be fine for [[x86-64]] [https://en.wikipedia.org/wiki/Intel_Core_2 Core 2 duo] thought - using [[x86-64#gpinstructions|bitscan]] and [[x86-64#gpinstructions|bittestandreset]] intrinsics or wrappers - if kept all in registers of course.
<pre>
while (_BitScanForward64(&idx, x)) { // or reverse
*list++ = foo(idx, ...);
_bittestandreset64(&x, idx);
}
</pre>
The loop is intended to look like this in [[x86-64]] [[Assembly|assembly]]:
<pre>
; input rdx - move target set
; ecx - move from aspects
; rdi - pointer to movelist
std ; set direction flag
bsf rax, rdx ; scan first to-bit
jz over ; jump if no more moves
loop:
btr rdx, rax ; reset found bit
or eax, ecx ; combine to- with from-square
stosw ; store 16-bit move *rdi++ = move
bsf rax, rdx ; scan next to-bit
jnz loop ; jump if more moves
over:
</pre>

===Black or White===
With bitboard serialization one minor problem is the relative order in [[Move Generation|move generation]] considering [[Side to move|side to move]]. Bsf scans the board in a1..h1, a2..h2, a8..h8 order, assuming [[Square Mapping Considerations#LittleEndianRankFileMapping|little-endian rank-file mapping]], which might be the desired order for an attacking black player. Traversing white pieces and target squares the same way may result in asymmetries and different search behavior of [[Color Flipping|color flipped positions]]. Despite other features considered in [[Move Ordering|move ordering]], the initial order in generation has more or less influence. Therefore, it is desired to traverse the "white" bitboards with priority for the black back-rank as well <ref>[http://www.talkchess.com/forum/viewtopic.php?topic_view=threads&p=289344&t=29611&sid=74848128f12e45f8883a87c3e6729f75 Re: Alternatives to History Heuristics] by [[Robert Hyatt]], [[CCC]], September 02, 2009</ref> . This might be done by bitscan reverse, which covers the rank symmetry, but also mirrors the files. Another alternative is to traverse a [[Flipping Mirroring and Rotating#FlipVertically|vertically flipped]] "white" bitboard, which can be done outside the do-while loop by a "conditional" [[x86-64]] [[x86-64#gpinstructions|byte swap]], and requires one further register and xor per loop cycle, which might be combined with other stuff, f.i. the [[Origin Square|from square]] of a move:
<pre>
if ( x ) {
U64 m = (U64)color - 1; // e.g. -1 if white, 0 for black
int o = (int)m & 56;
x = x ^ ((x ^ flipVertical(x)) & m); // conditional flip
do {
int idx = bitScanForward(x) ^ o; // square index from 0..63
*list++ = foo(idx , ...);
} while (x &= x-1); // reset LS1B
}
</pre>

===STL Iterator===
[[Rein Halbersma]] has written a prototype of a [[Generic Programming|generic]] [[Cpp|C++11]] bitset [[Cpp#Template|template]] that can be used to traverse a set in [https://en.wikipedia.org/wiki/Iterator#C.2B.2B STL iterator] style, hiding bitscan and reset <ref>[http://liveworkspace.org/code/41EaZl$203 LiveWorkSpace(IDE online): C++-3.2 (clang++): 41EaZl]</ref> <ref>[http://www.talkchess.com/forum/viewtopic.php?t=47414&start=4 Re: C vs ASM] by [[Rein Halbersma]], [[CCC]], March 05, 2013</ref> ...
<pre>
typedef bit_set<int64_t, 1> bitset;

void testLoop(int* p, const bitset & x) {
for (auto it = x.begin(); it != x.end(); ++it)
*p++ = *it;
}
/* or std::copy */
void testCopy(int* p, const bitset & x) {
std::copy(x.begin(), x.end(), p);
}
</pre>
... which yields in following [[X86-64]] [[Assembly|assembly]], almost identical for the iterator loop and std::copy <ref>[http://gcc.godbolt.org/ GCC Explorer] with g++ 4.7 compiler, options -std=c++11 -O3 -march=k8-sse3 -fverbose-asm </ref>:
<pre>
; testLoop(int*, bit_set<long, 1ul>):
test rsi, rsi
jne .L13
.L7:
ret
.L13:
bsf rax, rsi
mov DWORD PTR [rdi], eax
lea rax, [rsi-1]
add rdi, 4
and rsi, rax
jne .L13
jmp .L7
</pre>

==Hashing Multiple Bits==
Similar to the idea to hash occupancies in [[Kindergarten Bitboards|kindergarten bitboards]] and [[Magic Bitboards|magic bitboards]], one may hash certain move-target subsets of one piece in one run, to lookup tables with pre-calculated moves-lists <ref>[https://www.stmintz.com/ccc/index.php?id=487844 Subject: sliding move generation idea with bitboards] by [[Gerd Isenberg]], [[CCC]], February 19, 2006</ref> . Sounds like doing up to eight bitscans in parallel. The idea is to muliply-shift-lookup a move-target bitboard, to do an almost branch-less [[Move Generation|move-generation]] with pre-calculated moves inside [[Move List|move-lists]]. For instance king- and knight-moves <ref>[http://www.open-aurec.com/wbforum/viewtopic.php?f=4&t=6099 Magic Knight- and King-Move Generation] by [[Gerd Isenberg]], [[Computer Chess Forums|Winboard Forum]], Januar 11, 2007</ref> as well as moves of sliding pieces per line:
<pre>
moveTarget = KingAttacks[sq] & ~(ownPieces | attackedSquares);
idx = (moveTarget * kingMagic[sq]) >> kingShift[sq]);
movelists = kingMoveLists[sq];
movelist = movelists[idx];
moveCpy(target, movelist, movelist->n);
</pre>
A king in a corner may have up to three moves. Thus there are 2^3 == 8 possible move-lists.
For instance for a king on a1 (number of moves: vector of moves):
<pre>
0:{empty}
1:{a1-b1}
1:{a1-a2}
1:{a1-b2}
2:{a1-b1,a1-a2}
2:{a1-b1,a1-b2}
2:{a1-a2,a1-b2}
3:{a1-b1,a1-a2,a1-b2}
</pre>
Kings on edges have 5 potential target squares, thus there are 32 possible move-lists. All other kings have 8 all the 8 neighbors with up to 256 move-lists. Similar move-list enumeration is possible with knights and others. All possible move-target subsets of kings and knights for all 64 from-squares are [[Hash Table#MinimalPerfectHashing|perfectly minimal hashtable]] with a magic factor of four one-bits set. 10016 possible king-move-lists and 5520 knight-move-lists. To reduce memory one may offset the sets to a "normalized" source square per king, knight and sliding piece line, implying some vector arithmetic in the board centric world considering the offset.

The less populated move-target subsets are, the less efficient this hashing technique. This might become a problem since bitboard move-generation is essentially about subsets of moves with certain properties, like most importantly fast winning captures at [[Node Types#CUT|Cut-nodes]].

=See also=
* [[Move Generation]]
: [[Belle#Hardware|Belle | Hardware Move Generation]]
* [[Move Ordering]]
* [[Pieces versus Directions]]
* [[Table-driven Move Generation]]
* [[Traversing Subsets of a Set]]

=Forum Posts=
* [https://www.stmintz.com/ccc/index.php?id=487844 Subject: sliding move generation idea with bitboards] by [[Gerd Isenberg]], [[CCC]], February 19, 2006
* [http://www.open-aurec.com/wbforum/viewtopic.php?f=4&t=6099 Magic Knight- and King-Move Generation] by [[Gerd Isenberg]], [[Computer Chess Forums|Winboard Forum]], Januar 11, 2007
* [http://www.talkchess.com/forum/viewtopic.php?t=47414&start=4 Re: C vs ASM] by [[Rein Halbersma]], [[CCC]], March 05, 2013
* [http://www.talkchess.com/forum/viewtopic.php?t=54704 Symmetric move generation using bitboards] by [[Lasse Hansen]], [[CCC]], December 20, 2014 » [[BitScan]]

=External Links=
* [https://en.wikipedia.org/wiki/Serialization Serialization from Wikipedia]

=References=
<references />

'''[[Bitboards|Up one Level]]'''

GerdIsenberg

Bureaucrats, Administrators

25,161

edits

Changes

Bitboard Serialization

Navigation menu

Personal tools

Namespaces

Variants

Views

More

Search

Navigation

Tools