Changes

Jump to: navigation, search

BitScan

59,349 bytes added, 08:55, 4 May 2018
Created page with "'''Home * Board Representation * Bitboards * BitScan''' FILE:EschersEye.jpg|border|right|thumb|187px|link=http://www.mcescher.com/Gallery/back-bmp/LW3..."
'''[[Main Page|Home]] * [[Board Representation]] * [[Bitboards]] * BitScan'''

[[FILE:EschersEye.jpg|border|right|thumb|187px|link=http://www.mcescher.com/Gallery/back-bmp/LW344.jpg|[[Arts#Escher|M. C. Escher]] - Eye, 1946 <ref>[http://www.mcescher.com/Gallery/gallery-back.htm Picture gallery "Back in Holland 1941 - 1954"] from [http://www.mcescher.com/ The Official M.C. Escher Website]</ref> ]]

'''BitScan''',<br/>
a function that determines the bit-index of the least significant 1 [[Bit|bit]] ([[General Setwise Operations#TheLeastSignificantOneBitLS1B|LS1B]]) or the most significant 1 [[Bit|bit]] ([[General Setwise Operations#TheMostSignificantOneBitMS1B|MS1B]]) in an integer such as [[Bitboards|bitboards]]. If exactly one bit is set in an unsigned integer, representing a numerical value of a [https://en.wikipedia.org/wiki/Power_of_two power of two], this is equivalent to a [https://en.wikipedia.org/wiki/Binary_logarithm base-2 logarithm]. Many implementations have been devised since the advent of bitboards, as described on this page, and some [[BitScan#EngineSamples|implementation samples]] of concrete [[Open Source Engines|open source engines]] listed for didactic purpose.

=Hardware vs. Software=
For recent [[x86-64]] architectures like [https://en.wikipedia.org/wiki/Intel_Core_2 Core 2 duo] and [https://en.wikipedia.org/wiki/AMD_K10 K10], one should use the [[BitScan#bsfbsr|Processor Instructions for Bitscans]] via intrinsics or [[Assembly#InlineAssembly|inline assembly]], see [[BitScan#x86Timing|x86-64 timing]]. [https://en.wikipedia.org/wiki/Pentium_4 P4] and [https://en.wikipedia.org/wiki/Athlon_64 K8] have rather slow bitscan-instructions. K8 uses so called ''vector path instructions'' <ref>[http://www.chip-architect.com/news/2003_09_21_Detailed_Architecture_of_AMDs_64bit_Core.html#1.3 Chip Architect: Detailed Architecture of AMD's Opteron - 1.3 A third class of Instructions] by [http://www.chip-architect.com/ Hans de Vries]</ref> with 9 or 11 cycles latency, even blocking other processor resources. For these processors, specially K8 with already fast multiplication, the [[BitScan#DeBruijnMultiplation|De Bruijn Multiplication]] (64-bit mode) or [[Matt Taylor|Matt Taylor's]] [[BitScan#MattTaylorsFoldingtrick|Folded 32-bit Multiplication]] (32-bit mode) might be the right choice. Other routines mentioned might be advantageous on certain architectures, specially with slow integer multiplications.

=Non Empty Sets=
Bitscan is most often used in [[Bitboard Serialization| serializing bitboards]], and is therefor - due to a leading while-condition - not called with empty sets. Until stated otherwise, most mentioned bitscan-routines in [[C]]/[[Cpp|C++]] have the same prototype and assume none empty sets as actual parameter.
<span id="Bitscanforward"></span>
=Bitscan forward=
A bitscan '''forward''' is used to find the index of the '''least''' significant 1 bit ([[General Setwise Operations#TheLeastSignificantOneBitLS1B|LS1B]]).
<span id="TrailingZeroCount"></span>
==Trailing Zero Count==
Bitscan forward is identical with a '''Trailing Zero Count''' for none empty sets, possibly available as machine instruction on some architectures, for instance the [[x86-64]] bit-manipulation expansion set [[BMI1]].
<span id="DeBruijnMultiplation"></span>
==De Bruijn Multiplication==
The '''De Bruijn''' bitscan was devised in 1997, according to [[Donald Knuth]] <ref>[[Donald Knuth]] ('''2009'''). ''[http://www-cs-faculty.stanford.edu/%7Eknuth/taocp.html The Art of Computer Programming], Volume 4, Fascicle 1: Bitwise tricks & techniques'', as [http://www-cs-faculty.stanford.edu/%7Eknuth/fasc1a.ps.gz Pre-Fascicle 1a postscript], p 10</ref> by [[Mathematician#MartinLaeuter|Martin Läuter]], and independently by [[Charles Leiserson]], [[Harald Prokop]] and [[Keith H. Randall]] a few month later <ref>[[Charles Leiserson|Charles E. Leiserson]], [[Harald Prokop]] and [[Keith H. Randall]] ('''1998'''). ''Using de Bruijn Sequences to Index a 1 in a Computer Word'', [http://supertech.csail.mit.edu/papers/debruijn.pdf pdf]</ref> <ref>[https://www.stmintz.com/ccc/index.php?id=212586 "Using de Bruijn Sequences to Index a 1 in a Computer Word"] discussion in [[CCC]], February 08, 2002</ref> , to determine the [[General Setwise Operations#TheLeastSignificantOneBitLS1B|LS1B]] index by [[Hash Table#MinimalPerfectHashing|minimal perfect hashing]]. [[De Bruijn sequence|De Bruijn sequences]] were named after the Dutch mathematician [[Nicolaas de Bruijn]]. Interestingly sequences with the binary alphabet were already investigated by the French mathematician '''Camille Flye Sainte-Marie''' in 1894, but later "forgotten" and re-investigated and generalized by De Bruijn and [[Mathematician#Ehrenfest|Tanja van Ardenne-Ehrenfest]] half a century later <ref>[[Nicolaas de Bruijn|N. G. de Bruijn]] ('''1975'''). ''Acknowledgement of priority to C. Flye Sainte-Marie on the counting of circular arrangements of 2n zeros and ones that show each n-letter word exactly once''. Technical Report, Technische Hogeschool Eindhoven, available as [http://alexandria.tue.nl/repository/books/252901.pdf pdf reprint]</ref> .

A 64-bit De Bruijn sequence contains 64-overlapped unique 6-bit sequences, thus a circle of 64 bits, where five leading zeros overlap five hidden "trailing" zeros. There are 2<span style="vertical-align: super;
font-size: 80%;">26</span> = 67108864 odd sequences with 6 leading binary zeros and 2<span style="vertical-align: super;
font-size: 80%;">26</span> even sequences with 5 leading binary zeros, which may be calculated from the odd ones by shifting left one.

===With isolated LS1B===
A multiplication with a power of two value (the [[General Setwise Operations#LS1BIsolation|isolated LS1B]]) acts like a left shift by it's exponent. Thus, if we multiply a 64-bit De Bruijn sequence with the isolated LS1B, we get a unique six bit subsequence inside the most significant bits. To obtain the bit-index we need to extract these upper six bits by shifting right the product, to lookup an [[Array|array]].

<pre>
const int index64[64] = {
0, 1, 48, 2, 57, 49, 28, 3,
61, 58, 50, 42, 38, 29, 17, 4,
62, 55, 59, 36, 53, 51, 43, 22,
45, 39, 33, 30, 24, 18, 12, 5,
63, 47, 56, 27, 60, 41, 37, 16,
54, 35, 52, 21, 44, 32, 23, 11,
46, 26, 40, 15, 34, 20, 31, 10,
25, 14, 19, 9, 13, 8, 7, 6
};

/**
* bitScanForward
* @author Martin Läuter (1997)
* Charles E. Leiserson
* Harald Prokop
* Keith H. Randall
* "Using de Bruijn Sequences to Index a 1 in a Computer Word"
* @param bb bitboard to scan
* @precondition bb != 0
* @return index (0..63) of least significant one bit
*/
int bitScanForward(U64 bb) {
const U64 debruijn64 = C64(0x03f79d71b4cb0a89);
assert (bb != 0);
return index64[((bb & -bb) * debruijn64) >> 58];
}
</pre>

See also how to [[De Bruijn Sequence Generator|Generate your "private" De Bruijn Bitscan Routine]].
<span id="KimWalisch"></span>
===With separated LS1B===
Instead of the classical [[General Setwise Operations#LS1BIsolation|LS1B isolation]], [[Kim Walisch]] proposed the faster [[General Setwise Operations#ExclusiveOr|xor]] with the ones' decrement. The separation [[General Setwise Operations#LS1BSeparation|bb ^ (bb-1)]] contains all bits set including and below the [[General Setwise Operations#TheLeastSignificantOneBitLS1B|LS1B]]. The 2<span style="vertical-align: super;
font-size: 80%;">22</span> (4,194,304) upper De Bruijn sequences of the 2<span style="vertical-align: super;
font-size: 80%;">26</span> available leave unique 6-bit indices. Using LS1B separation takes advantage of the x86 lea instruction, which saves the move instruction and unlike negate, has no data dependency on the flag register. Kim reported a 10 to 15 percent faster execution (compilers: g++-4.7 -O2, clang++-3.1 -O2, x86_64) than the traditional 64-bit De Bruijn bitscan on [[Intel]] [https://en.wikipedia.org/wiki/Nehalem_%28microarchitecture%29 Nehalem] and [https://en.wikipedia.org/wiki/Sandy_Bridge_%28microarchitecture%29 Sandy Bridge] CPUs.

<pre>
const int index64[64] = {
0, 47, 1, 56, 48, 27, 2, 60,
57, 49, 41, 37, 28, 16, 3, 61,
54, 58, 35, 52, 50, 42, 21, 44,
38, 32, 29, 23, 17, 11, 4, 62,
46, 55, 26, 59, 40, 36, 15, 53,
34, 51, 20, 43, 31, 22, 10, 45,
25, 39, 14, 33, 19, 30, 9, 24,
13, 18, 8, 12, 7, 6, 5, 63
};

/**
* bitScanForward
* @author Kim Walisch (2012)
* @param bb bitboard to scan
* @precondition bb != 0
* @return index (0..63) of least significant one bit
*/
int bitScanForward(U64 bb) {
const U64 debruijn64 = C64(0x03f79d71b4cb0a89);
assert (bb != 0);
return index64[((bb ^ (bb-1)) * debruijn64) >> 58];
}
</pre>
<span id="MattTaylorsFoldingtrick"></span>
==Matt Taylor's Folding trick==
A 32-bit friendly implementation to find the the bit-index of [[General Setwise Operations#TheLeastSignificantOneBitLS1B|LS1B]] by [[Matt Taylor]] <ref>[http://groups.google.com/group/comp.lang.asm.x86/browse_frm/thread/de9546cd019bd72b/f46209f47d2a7ddb?lnk=gst&q=Matt+Taylor+magic#f46209f47d2a7ddb Bit magic] by [[Matt Taylor]], [http://groups.google.com/group/comp.lang.asm.x86/topics comp.lang.asm.x86], June 26, 2003</ref>. The [[General Setwise Operations#ExclusiveOr|xor]] with the ones' decrement, [[General Setwise Operations#LS1BSeparation|bb ^ (bb-1)]] contains all bits set including and below the [[General Setwise Operations#TheLeastSignificantOneBitLS1B|LS1B]]. The 32-bit xor-difference of both halves yields either the complement of the upper half, or the lower half otherwise. Some samples:
{| class="wikitable"
|-
! ls1b
! bb ^ (bb-1)
! folded
|-
| style="text-align:center;" | 63
| <code>0xffffffffffffffff</code>
| <code>0x00000000</code>
|-
| style="text-align:center;" | 62
| <code>0x7fffffffffffffff</code>
| <code>0x80000000</code>
|-
| style="text-align:center;" | 59
| <code>0x0fffffffffffffff</code>
| <code>0xf0000000</code>
|-
| style="text-align:center;" | 32
| <code>0x00000001ffffffff</code>
| <code>0xfffffffe</code>
|-
| style="text-align:center;" | 31
| <code>0x00000000ffffffff</code>
| <code>0xffffffff</code>
|-
| style="text-align:center;" | 30
| <code>0x000000007fffffff</code>
| <code>0x7fffffff</code>
|-
| style="text-align:center;" | 0
| <code>0x0000000000000001</code>
| <code>0x00000001</code>
|}

Even if this folded "LS1B" contains multiple consecutive one-bits, the multiplication is De Bruijn like. There are only two magic 32-bit constants with the combined property of 32- and 64-bit De Bruijn sequences to apply this [[Hash Table#MinimalPerfectHashing|minimal perfect hashing]]:

<pre>
const int lsb_64_table[64] =
{
63, 30, 3, 32, 59, 14, 11, 33,
60, 24, 50, 9, 55, 19, 21, 34,
61, 29, 2, 53, 51, 23, 41, 18,
56, 28, 1, 43, 46, 27, 0, 35,
62, 31, 58, 4, 5, 49, 54, 6,
15, 52, 12, 40, 7, 42, 45, 16,
25, 57, 48, 13, 10, 39, 8, 44,
20, 47, 38, 22, 17, 37, 36, 26
};

/**
* bitScanForward
* @author Matt Taylor (2003)
* @param bb bitboard to scan
* @precondition bb != 0
* @return index (0..63) of least significant one bit
*/
int bitScanForward(U64 bb) {
unsigned int folded;
assert (bb != 0);
bb ^= bb - 1;
folded = (int) bb ^ (bb >> 32);
return lsb_64_table[folded * 0x78291ACF >> 26];
}
</pre>
A slightly modified version may take one [[x86]]-register less in 32-bit mode, but calculates bb-1 twice:
<pre>
int bitScanForwardM(BitBoard bb) {
unsigned int folded;
assert (bb != 0);
folded = (int)((bb ^ (bb-1)) >> 32);
folded ^= (int)( bb ^ (bb-1)); // lea
return lsb_64_table[folded * 0x78291ACF >> 26];
}
</pre>
with this VC6 generated [[x86]] [[Assembly|assembly]] to compare:
<pre>
bitScanForward PROC NEAR bitScanForwardM PROC NEAR
mov ecx, DWORD PTR _bb$[esp-4] mov eax, DWORD PTR _bb$[esp-4]
mov eax, DWORD PTR _bb$[esp] mov ecx, eax
mov edx, ecx add ecx, -1
push esi mov ecx, DWORD PTR _bb$[esp]
add edx, -1 mov edx, ecx
mov esi, eax adc edx, -1
adc esi, -1 xor edx, ecx
xor ecx, edx lea ecx, DWORD PTR [eax-1]
xor eax, esi xor edx, ecx
pop esi
xor eax, ecx xor edx, eax
imul eax, 78291acfH imul edx, 78291acfH
shr eax, 26 shr edx, 26
mov eax, DWORD PTR _lsb_64_table[eax*4] mov eax, DWORD PTR _lsb_64_table[edx*4]
ret 0 ret 0
bitScanForward ENDP bitScanForward ENDP
</pre>
<span id="WalterFaxonsmagicBitscan"></span>
==Walter Faxon's magic Bitscan==
[[Walter Faxon|Walter Faxon's]] 32-bit friendly magic bitscan <ref>[https://www.stmintz.com/ccc/index.php?id=265635 Another hacky method for bitboard bit extraction] by [[Walter Faxon]], [[CCC]], November 17, 2002</ref> uses a fast none minimal [[Hash Table#PerfectHashing|perfect hashing]] function:
<pre>
const char LSB_64_table[154] =
{
#define __ 0
22,__,__,__,30,__,__,38,18,__, 16,15,17,__,46, 9,19, 8, 7,10,
0, 63, 1,56,55,57, 2,11,__,58, __,__,20,__, 3,__,__,59,__,__,
__,__,__,12,__,__,__,__,__,__, 4,__,__,60,__,__,__,__,__,__,
__,__,__,__,21,__,__,__,29,__, __,37,__,__,__,13,__,__,45,__,
__,__, 5,__,__,61,__,__,__,53, __,__,__,__,__,__,__,__,__,__,
28,__,__,36,__,__,__,__,__,__, 44,__,__,__,__,__,27,__,__,35,
__,52,__,__,26,__,43,34,25,23, 24,33,31,32,42,39,40,51,41,14,
__,49,47,48,__,50, 6,__,__,62, __,__,__,54
#undef __
};

/**
* bitScanForward
* @author Walter Faxon, slightly modified
* @param bb bitboard to scan
* @precondition bb != 0
* @return index (0..63) of least significant one bit
*/
int bitScanForward(U64 bb)
{
unsigned int t32;
assert(bb);
bb ^= bb - 1;
t32 = (int)bb ^ (int)(bb >> 32);
t32 ^= 0x01C5FC81;
t32 += t32 >> 16;
t32 -= (t32 >> 8) + 51;
return LSB_64_table [t32 & 255]; // 0..63
}
</pre>
A slightly modified version may take one [[x86]]-register less in 32-bit mode, but calculates bb-1 twice:
<pre>
int bitScanForward(U64 bb)
{
int t32 = 0x01C5FC81;
assert(bb);
t32 ^= (int)((bb ^ (bb-1)) >> 32);
t32 ^= (int)( bb ^ (bb-1)); // lea
t32 += t32 >> 16;
t32 -=(t32 >> 8) + 51;
return LSB_64_table [t32 & 255];
}
</pre>
The initial [[General Setwise Operations#LS1BSeparation|LS1B separation]] by bb ^ (bb-1) and folding is equivalent to [[BitScan#MattTaylorsFoldingtrick|Matt's]],
{| class="wikitable"
|-
! ls1b
! bb ^ (bb-1)
! folded
|-
| style="text-align:center;" | 63
| <code>0xffffffffffffffff</code>
| <code>0x00000000</code>
|-
| style="text-align:center;" | 62
| <code>0x7fffffffffffffff</code>
| <code>0x80000000</code>
|-
| style="text-align:center;" | 59
| <code>0x0fffffffffffffff</code>
| <code>0xf0000000</code>
|-
| style="text-align:center;" | 32
| <code>0x00000001ffffffff</code>
| <code>0xfffffffe</code>
|-
| style="text-align:center;" | 31
| <code>0x00000000ffffffff</code>
| <code>0xffffffff</code>
|-
| style="text-align:center;" | 30
| <code>0x000000007fffffff</code>
| <code>0x7fffffff</code>
|-
| style="text-align:center;" | 0
| <code>0x0000000000000001</code>
| <code>0x00000001</code>
|}
while Walter originally resets the LS1B, yielding in a cyclic index wrap:
{| class="wikitable"
|-
! LS1B
! (bb & (bb-1)) ^ (bb-1)
! folded
|-
| style="text-align:center;" | 0
| <code>0x0000000000000000</code>
| <code>0x00000000</code>
|-
| style="text-align:center;" | 63
| <code>0x7fffffffffffffff</code>
| <code>0x80000000</code>
|-
| style="text-align:center;" | 60
| <code>0x0fffffffffffffff</code>
| <code>0xf0000000</code>
|-
| style="text-align:center;" | 33
| <code>0x00000001ffffffff</code>
| <code>0xfffffffe</code>
|-
| style="text-align:center;" | 32
| <code>0x00000000ffffffff</code>
| <code>0xffffffff</code>
|-
| style="text-align:center;" | 31
| <code>0x000000007fffffff</code>
| <code>0x7fffffff</code>
|-
| style="text-align:center;" | 1
| <code>0x0000000000000001</code>
| <code>0x00000001</code>
|}
<span id="BitscanByModulo"></span>
==Bitscan by Modulo==
Another idea is to apply a [[General Setwise Operations#Modulo|modulo]] (remainder of a division) operation of the isolated [[General Setwise Operations#TheLeastSignificantOneBitLS1B|LS1B]] by the prime number 67 <ref>[http://groups.google.com/group/rec.games.chess.computer/browse_frm/thread/d5dbf08c66e83517# bitboard 2^i mod 67 is unique] by [[Stefan Plenkner]], [[Computer Chess Forums|rgcc]], August 6, 1996</ref> <ref>[[Pablo San Segundo]], [[Ramón Galán]] ('''2005'''). ''[http://www.actapress.com/Abstract.aspx?paperId=18953 Bitboards: A New Approach]''. [http://www.informatik.uni-trier.de/%7Eley/db/conf/aia/aia2005.html#SegundoG05 AIA 2005]</ref> . The remainder 0..66 can be used to [[Hash Table#PerfectHashing|perfectly hash]] the bit-index table. Three gaps are 0, 17, and 34, so the mod 67 can make a branchless trailing zero count:
{| class="wikitable"
|-
! Bit-Index
! Bitboard
! mod 67
|-
| style="text-align:center;" | -
| <code>0x0000000000000000</code>
| style="text-align:right;" | 0
|-
| style="text-align:center;" | 0
| <code>0x0000000000000001</code>
| style="text-align:right;" | 1
|-
| style="text-align:center;" | 1
| <code>0x0000000000000002</code>
| style="text-align:right;" | 2
|-
| style="text-align:center;" | 2
| <code>0x0000000000000004</code>
| style="text-align:right;" | 4
|-
| style="text-align:center;" | 3
| <code>0x0000000000000008</code>
| style="text-align:right;" | 8
|-
| style="text-align:center;" | 4
| <code>0x0000000000000010</code>
| style="text-align:right;" | 16
|-
| style="text-align:center;" | 5
| <code>0x0000000000000020</code>
| style="text-align:right;" | 32
|-
| style="text-align:center;" | 6
| <code>0x0000000000000040</code>
| style="text-align:right;" | 64
|-
| style="text-align:center;" | 7
| <code>0x0000000000000080</code>
| style="text-align:right;" | 61
|-
| style="text-align:center;" | 8
| <code>0x0000000000000100</code>
| style="text-align:right;" | 55
|-
| style="text-align:center;" | 9
| <code>0x0000000000000200</code>
| style="text-align:right;" | 43
|-
| style="text-align:center;" | 10
| <code>0x0000000000000400</code>
| style="text-align:right;" | 19
|-
| style="text-align:center;" | 11
| <code>0x0000000000000800</code>
| style="text-align:right;" | 38
|-
| style="text-align:center;" | 12
| <code>0x0000000000001000</code>
| style="text-align:right;" | 9
|-
| style="text-align:center;" | 13
| <code>0x0000000000002000</code>
| style="text-align:right;" | 18
|-
| style="text-align:center;" | 14
| <code>0x0000000000004000</code>
| style="text-align:right;" | 36
|-
| style="text-align:center;" | 15
| <code>0x0000000000008000</code>
| style="text-align:right;" | 5
|-
| style="text-align:center;" | 16
| <code>0x0000000000010000</code>
| style="text-align:right;" | 10
|-
| style="text-align:center;" | 17
| <code>0x0000000000020000</code>
| style="text-align:right;" | 20
|-
| style="text-align:center;" | 18
| <code>0x0000000000040000</code>
| style="text-align:right;" | 40
|-
| style="text-align:center;" | 19
| <code>0x0000000000080000</code>
| style="text-align:right;" | 13
|-
| style="text-align:center;" | 20
| <code>0x0000000000100000</code>
| style="text-align:right;" | 26
|-
| style="text-align:center;" | 21
| <code>0x0000000000200000</code>
| style="text-align:right;" | 52
|-
| style="text-align:center;" | 22
| <code>0x0000000000400000</code>
| style="text-align:right;" | 37
|-
| style="text-align:center;" | 23
| <code>0x0000000000800000</code>
| style="text-align:right;" | 7
|-
| style="text-align:center;" | 24
| <code>0x0000000001000000</code>
| style="text-align:right;" | 14
|-
| style="text-align:center;" | 25
| <code>0x0000000002000000</code>
| style="text-align:right;" | 28
|-
| style="text-align:center;" | 26
| <code>0x0000000004000000</code>
| style="text-align:right;" | 56
|-
| style="text-align:center;" | 27
| <code>0x0000000008000000</code>
| style="text-align:right;" | 45
|-
| style="text-align:center;" | 28
| <code>0x0000000010000000</code>
| style="text-align:right;" | 23
|-
| style="text-align:center;" | 29
| <code>0x0000000020000000</code>
| style="text-align:right;" | 46
|-
| style="text-align:center;" | 30
| <code>0x0000000040000000</code>
| style="text-align:right;" | 25
|-
| style="text-align:center;" | 31
| <code>0x0000000080000000</code>
| style="text-align:right;" | 50
|-
| style="text-align:center;" | 32
| <code>0x0000000100000000</code>
| style="text-align:right;" | 33
|-
| style="text-align:center;" | 33
| <code>0x0000000200000000</code>
| style="text-align:right;" | 66
|-
| style="text-align:center;" | 34
| <code>0x0000000400000000</code>
| style="text-align:right;" | 65
|-
| style="text-align:center;" | 35
| <code>0x0000000800000000</code>
| style="text-align:right;" | 63
|-
| style="text-align:center;" | 36
| <code>0x0000001000000000</code>
| style="text-align:right;" | 59
|-
| style="text-align:center;" | 37
| <code>0x0000002000000000</code>
| style="text-align:right;" | 51
|-
| style="text-align:center;" | 38
| <code>0x0000004000000000</code>
| style="text-align:right;" | 35
|-
| style="text-align:center;" | 39
| <code>0x0000008000000000</code>
| style="text-align:right;" | 3
|-
| style="text-align:center;" | 40
| <code>0x0000010000000000</code>
| style="text-align:right;" | 6
|-
| style="text-align:center;" | 41
| <code>0x0000020000000000</code>
| style="text-align:right;" | 12
|-
| style="text-align:center;" | 42
| <code>0x0000040000000000</code>
| style="text-align:right;" | 24
|-
| style="text-align:center;" | 43
| <code>0x0000080000000000</code>
| style="text-align:right;" | 48
|-
| style="text-align:center;" | 44
| <code>0x0000100000000000</code>
| style="text-align:right;" | 29
|-
| style="text-align:center;" | 45
| <code>0x0000200000000000</code>
| style="text-align:right;" | 58
|-
| style="text-align:center;" | 46
| <code>0x0000400000000000</code>
| style="text-align:right;" | 49
|-
| style="text-align:center;" | 47
| <code>0x0000800000000000</code>
| style="text-align:right;" | 31
|-
| style="text-align:center;" | 48
| <code>0x0001000000000000</code>
| style="text-align:right;" | 62
|-
| style="text-align:center;" | 49
| <code>0x0002000000000000</code>
| style="text-align:right;" | 57
|-
| style="text-align:center;" | 50
| <code>0x0004000000000000</code>
| style="text-align:right;" | 47
|-
| style="text-align:center;" | 51
| <code>0x0008000000000000</code>
| style="text-align:right;" | 27
|-
| style="text-align:center;" | 52
| <code>0x0010000000000000</code>
| style="text-align:right;" | 54
|-
| style="text-align:center;" | 53
| <code>0x0020000000000000</code>
| style="text-align:right;" | 41
|-
| style="text-align:center;" | 54
| <code>0x0040000000000000</code>
| style="text-align:right;" | 15
|-
| style="text-align:center;" | 55
| <code>0x0080000000000000</code>
| style="text-align:right;" | 30
|-
| style="text-align:center;" | 56
| <code>0x0100000000000000</code>
| style="text-align:right;" | 60
|-
| style="text-align:center;" | 57
| <code>0x0200000000000000</code>
| style="text-align:right;" | 53
|-
| style="text-align:center;" | 58
| <code>0x0400000000000000</code>
| style="text-align:right;" | 39
|-
| style="text-align:center;" | 59
| <code>0x0800000000000000</code>
| style="text-align:right;" | 11
|-
| style="text-align:center;" | 60
| <code>0x1000000000000000</code>
| style="text-align:right;" | 22
|-
| style="text-align:center;" | 61
| <code>0x2000000000000000</code>
| style="text-align:right;" | 44
|-
| style="text-align:center;" | 62
| <code>0x4000000000000000</code>
| style="text-align:right;" | 21
|-
| style="text-align:center;" | 63
| <code>0x8000000000000000</code>
| style="text-align:right;" | 42
|}

<pre>
/**
* trailingZeroCount
* @param bb bitboard to scan
* @return index (0..63) of least significant one bit, 64 if bb is zero
*/
int trailingZeroCount(U64 bb) {
static const int lookup67[67+1] = {
64, 0, 1, 39, 2, 15, 40, 23,
3, 12, 16, 59, 41, 19, 24, 54,
4, -1, 13, 10, 17, 62, 60, 28,
42, 30, 20, 51, 25, 44, 55, 47,
5, 32, -1, 38, 14, 22, 11, 58,
18, 53, 63, 9, 61, 27, 29, 50,
43, 46, 31, 37, 21, 57, 52, 8,
26, 49, 45, 36, 56, 7, 48, 35,
6, 34, 33, -1 };
return lookup67[(bb & -bb) % 67];
}
</pre>
Since div/mod is an expensive instruction, a [[General Setwise Operations#Modulo|modulo by a constant]] is likely replaced by reciprocal fixed point multiplication to get the quotient and a second multiplication and difference to get the remainder. Compared with De Bruijn multiplication it is still too slow.

==Divide and Conquer==
This is a broad group of bitscans that test in succession, like the trailing zero count based on [[Reinhard Scharnagl|Reinhard Scharnagl's]] proposal <ref>[http://www.open-aurec.com/wbforum/viewtopic.php?f=4&t=3141 Best BitBoard LSB funktion?] by [[Reinhard Scharnagl]], [[Computer Chess Forums|Winboard Programming Forum]], July 20, 2005</ref> :
<pre>
/**
* trailingZeroCount
* like bitScanForward for none empty sets
* @author Reinhard Scharnagl
* @param bb bitboard to scan
* @return index (0..64)
*/
unsigned char lsbRS[256] = {
8, 0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0,
4, 0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0,
5, 0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0,
4, 0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0,
6, 0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0,
4, 0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0,
5, 0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0,
4, 0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0,
7, 0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0,
4, 0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0,
5, 0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0,
4, 0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0,
6, 0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0,
4, 0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0,
5, 0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0,
4, 0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0
};

int trailingZeroCount(U64 b) {
unsigned buf;
int acc = 0;

if ((buf = (unsigned)b) == 0) {
buf = (unsigned)(b >> 32);
acc = 32;
}
if ((unsigned short)buf == 0) {
buf >>= 16;
acc += 16;
}
if ((unsigned char)buf == 0) {
buf >>= 8;
acc += 8;
}
return acc + lsbRS[buf & 0xff];
}
</pre>
What about direct calculation? On [[x86]] this is a chain of test, set and lea instructions:
<pre>
/**
* bitScanForward
* @author Gerd Isenberg
* @param bb bitboard to scan
* @precondition bb != 0
* @return index (0..63) of least significant one bit
*/
int bitScanForward(U64 bb) {
unsigned int lsb;
assert (bb != 0);
bb &= -bb; // LS1B-Isolation
lsb = (unsigned)bb
| (unsigned)(bb>>32);
return (((((((((((unsigned)(bb>>32) !=0) * 2)
+ ((lsb & 0xffff0000) !=0)) * 2)
+ ((lsb & 0xff00ff00) !=0)) * 2)
+ ((lsb & 0xf0f0f0f0) !=0)) * 2)
+ ((lsb & 0xcccccccc) !=0)) * 2)
+ ((lsb & 0xaaaaaaaa) !=0);
}
</pre>
<span id="DoubleConversionofLS1B"></span>
==Double conversion of LS1B==
Assuming 64-bit [[Double|doubles]] and [[little-endian]] structure (''not portable''). We convert the isolated [[General Setwise Operations#TheLeastSignificantOneBitLS1B|LS1B]] to a double and interprete the exponent:
<pre>
/**
* bitScanForward
* @author Gerd Isenberg
* @param bb bitboard to scan
* @return index (0..63) of least significant one bit
* -1023 if passing zero
*/
int bitScanForward(U64 bb)
{
union {
double d;
struct {
unsigned int mantissal : 32;
unsigned int mantissah : 20;
unsigned int exponent : 11;
unsigned int sign : 1;
};
} ud;
ud.d = (double)(bb & -bb); // isolated LS1B to double
return ud.exponent - 1023;
}
</pre>

==Index of LS1B by Popcount==
If we have a fast [[Population Count|population-count]] instruction, we can count the trailing zeros of [[General Setwise Operations#TheLeastSignificantOneBitLS1B|LS1B]] after subtracting one:
<pre>
// precondition bb != 0
int bitScanForward(U64 bb) {
assert (bb != 0);
return popCount( (bb & -bb) - 1 );
}
</pre>
<span id="Bitscanreverse"></span>
=Bitscan reverse=
A bitscan '''reverse''' is used to find the index of the '''most''' significant 1 bit ([[General Setwise Operations#TheMostSignificantOneBitMS1B|MS1B]]). For non empty sets it is equivalent to [https://en.wikipedia.org/wiki/Floor_and_ceiling_functions floor] of the [https://en.wikipedia.org/wiki/Binary_logarithm base-2 logarithm]. MS1B isolalation or separation is more expensive than LS1B isolalation or separation, due to the LS1B related [[General Setwise Operations#TheTwosComplement|Two's complement]] tricks are not applicable. However, beside Divide and Conquer and Double conversion, Bitscan reverse with MS1B separation is mentioned.

==Divide and Conquer==
As introduced by [[Eugene Nalimov]] in 2000, for an [[Itanium|IA-64]] version of [[Crafty]] <ref>[https://www.stmintz.com/ccc/index.php?id=124712 Re: Will the Itanium have a BSF or BSR instruction?] by [[Eugene Nalimov]], [[CCC]], August 16, 2000</ref> <ref>[http://www.talkchess.com/forum/viewtopic.php?t=38777 ms1bTable array in Eugene Nalimovs bitScanReverse] by [[Stef Luijten]], [[CCC]], April 17, 2011</ref>
<pre>
/**
* bitScanReverse
* @author Eugene Nalimov
* @param bb bitboard to scan
* @return index (0..63) of most significant one bit
*/
int bitScanReverse(U64 bb)
{
int result = 0;
if (bb > 0xFFFFFFFF) {
bb >>= 32;
result = 32;
}
if (bb > 0xFFFF) {
bb >>= 16;
result += 16;
}
if (bb > 0xFF) {
bb >>= 8;
result += 8;
}
return result + ms1bTable[bb];
}
</pre>
<span id="FrankZappa"></span>
==Tribute to Frank Zappa==
A branchless and little bit obfuscated version of the devide and conquer bitScanReverse with in-register-lookup <ref>[https://www.stmintz.com/ccc/index.php?id=472455 just another reverse bitscan] by [[Gerd Isenberg]], [[CCC]], December 22, 2005</ref> - as tribute to [[Videos#FrankZappa|Frank Zappa]] with identifiers from [https://en.wikipedia.org/wiki/Freak_Out! Freak Out!] (1966), [https://en.wikipedia.org/wiki/Hot_Rats Hot Rats] (1969), [https://en.wikipedia.org/wiki/Waka/Jawaka Waka/Jawaka] (1972), [https://en.wikipedia.org/wiki/Sofa_%28Frank_Zappa_song%29 Sofa] (1975), [https://en.wikipedia.org/wiki/One_Size_Fits_All_%28Frank_Zappa_album%29 One Size Fits All] (1975), [https://en.wikipedia.org/wiki/Sheik_Yerbouti Sheik Yerbouti] (1979), and [https://en.wikipedia.org/wiki/Jazz_from_Hell Jazz from Hell] (1986):
<pre>
typedef unsigned __int64 OneSizeFits;
typedef unsigned int HotRats;
const HotRats s = 0;
const HotRats heik = 457;
const HotRats y = 1;
const HotRats e = 2;
const HotRats r = 3;
const HotRats b = 4;
const HotRats o = 5;
const HotRats u = 8;
const HotRats t = 16;
const HotRats i = 32;
const HotRats ka = (1<< 4)-1;
const HotRats waka = (1<< 8)-1;
const HotRats jawaka = (1<<16)-1;
const HotRats jazzFromHell = 0-(16*3*heik);

HotRats freakOut(OneSizeFits all) {
HotRats so,fa;
fa = (HotRats)(all >> i);
so = (fa!=s) << o;
fa ^= (HotRats) all & (fa!=s)-y;
so ^= (jawaka < fa) << b;
fa >>= (jawaka < fa) << b;
so ^= ( waka - fa) >> t & u;
fa >>= ( waka - fa) >> t & u;
so ^= ( ka - fa) >> u & b;
fa >>= ( ka - fa) >> u & b;
so ^= jazzFromHell >> e*fa & r;
return so;
}
</pre>
<span id="FillDeBruijn"></span>
==De Bruijn Multiplication==
While the [[BitScan#FrankZappa|tribute]] to [[Videos#FrankZappa|Frank Zappa]] is quite 32-bit friendly <ref>[https://www.stmintz.com/ccc/index.php?id=472762 final version - homage to FZ] by [[Gerd Isenberg]], [[CCC]], December 23, 2005</ref>, [[Kim Walisch]] suggested to use the [[Parallel Prefix Algorithms|parallel prefix fill]] for a [[General Setwise Operations#TheMostSignificantOneBitMS1B|MS1B]] separation with the same [[De Bruijn sequence|De Bruijn]] multiplication and lookup as in his [[BitScan#KimWalisch|bitScanForward]] routine with [[General Setwise Operations#LS1BSeparation|separated LS1B]], with less instructions in 64-bit mode. A log base 2 method was already devised by Eric Cole on January 8, 2006, and shaved off rounded up to one less than the next power of 2 by Mark Dickinson <ref>[https://ep2012.europython.eu/conference/p/mark-dickinson EuroPython 2012: Florence, July 2–8 | Mark Dickinson]</ref> on December 10, 2009, as published in Sean Eron Anderson's ''Bit Twiddling Hacks'' for 32-bit integers <ref>[http://graphics.stanford.edu/~seander/bithacks.html#IntegerLogDeBruijn Find the log base 2 of an N-bit integer in O(lg(N)) operations with multiply and lookup] from [http://graphics.stanford.edu/%7Eseander/bithacks.html Bit Twiddling Hacks] by [http://graphics.stanford.edu/%7Eseander/ Sean Eron Anderson]</ref>.
<pre>
const int index64[64] = {
0, 47, 1, 56, 48, 27, 2, 60,
57, 49, 41, 37, 28, 16, 3, 61,
54, 58, 35, 52, 50, 42, 21, 44,
38, 32, 29, 23, 17, 11, 4, 62,
46, 55, 26, 59, 40, 36, 15, 53,
34, 51, 20, 43, 31, 22, 10, 45,
25, 39, 14, 33, 19, 30, 9, 24,
13, 18, 8, 12, 7, 6, 5, 63
};

/**
* bitScanReverse
* @authors Kim Walisch, Mark Dickinson
* @param bb bitboard to scan
* @precondition bb != 0
* @return index (0..63) of most significant one bit
*/
int bitScanReverse(U64 bb) {
const U64 debruijn64 = C64(0x03f79d71b4cb0a89);
assert (bb != 0);
bb |= bb >> 1;
bb |= bb >> 2;
bb |= bb >> 4;
bb |= bb >> 8;
bb |= bb >> 16;
bb |= bb >> 32;
return index64[(bb * debruijn64) >> 58];
}
</pre>
<span id="DoubleConversionBSR"></span>
==Double conversion==
Assuming 64-bit [[Double|doubles]] and [[little-endian]] structure (''not portable''!). Conversion to a double, interpreting the exponent. To avoid possible rounding errors, some lower bits may be cleared.
<pre>
/**
* bitScanReverse
* @author Gerd Isenberg
* @param bb bitboard to scan
* @return index (0..63) of most significant one bit
* -1023 if passing zero
*/
int bitScanReverse(U64 bb)
{
union {
double d;
struct {
unsigned int mantissal : 32;
unsigned int mantissah : 20;
unsigned int exponent : 11;
unsigned int sign : 1;
};
} ud;
ud.d = (double)(bb & ~(bb >> 32)); // avoid rounding error
return ud.exponent - 1023;
}
</pre>
<span id="LeadingZeroCount"></span>
==Leading Zero Count==
Some processors have a fast leading zero count instruction. The [[Motorola]] [[68020]] has a ''bit field find first one'' instruction (BFFFO), which actually performs an up to 32-bit ''Leading Zero Count'' <ref>[http://www-scm.tees.ac.uk/users/a.clements/BF/BF.htm 68020 Bit Field Instructions]</ref> . [[x86-64]] [[AMD]] [https://en.wikipedia.org/wiki/AMD_K10 K10] has ''lzcnt'' as part of the [[SSE4#SSE4a|SSE4a]] extension <ref>[https://en.wikipedia.org/wiki/SSE4#SSE4a SSE4a from Wikipedia]</ref> <ref>[http://msdn.microsoft.com/en-us/library/bb384809.aspx __lzcnt16, __lzcnt, __lzcnt64] Visual C++ Language Reference</ref> , [[BMI1]] has ''lzcnt'' as well, while [[AVX-512#VPLZCNT|AVX-512CD]] even features leading zero count on vectors of eight bitbaords.

One can replace bitScanReverse of non empty sets by leadingZeroCount xor 63. Like trailing zero count, it returns 64 for empty sets, and might therefor save the leading condition in some applications.

<span id="BitscanversusZeroCount"></span>
=Bitscan versus Zero Count=
While the presented bitscan routines are suited to work only on none empty sets and return a value-range from 0 to 63 as bit-index, leading or trailing zero-count instructions or routines leave 64 for empty sets. Zero-counting has a immanent property of dealing correctly with empty sets - while it likely takes a conditional branch to implement this semantic in bit-scanning.

<pre>
int trailingZeroCount(U64 bb) {
if ( bb )
return bitScanForward(bb);
return 64;
}

int leadingZeroCount(U64 bb) {
if ( bb )
return bitScanReverse(bb) ^ 63;
return 64;
}
</pre>

<span id="BitscanwithReset"></span>
=Bitscan with Reset=
While [[Bitboard Serialization|traversing sets]], one may combine bitscanning with reset found bit. That implies passing the bitboard per reference or pointer, and tends to confuse compilers to keep all inside registers inside a typical serialization loop <ref>[https://www.stmintz.com/ccc/index.php?id=283655 Bitscan] by [[Matt Taylor]], [[CCC]], February 11, 2003</ref> .
<pre>
int bitScanForwardWithReset(U64 &bb) { // also called dropForward
int idx = bitScanForward(bb);
bb &= bb - 1; // reset bit outside
return idx;
}
</pre>
<span id="GeneralizedBitscan"></span>
=Generalized Bitscan=
This generalized bitscan uses a boolean parameter to scan reverse or forward. It relies on bitScanReverse, but conditionally masks the [[General Setwise Operations#TheLeastSignificantOneBitLS1B|LS1B]] in case of scanning forward. It might be used in the [[Classical Approach|classical approach]] to get positive or negative ray directions with one generalized routine.
<pre>
/**
* generalized bitScan
* @author Gerd Isenberg
* @param bb bitboard to scan
* @precondition bb != 0
* @param reverse, true bitScanReverse, false bitScanForward
* @return index (0..63) of least/most significant one bit
*/
int bitScan(U64 bb, bool reverse) {
U64 rMask;
assert (bb != 0);
rMask = -(U64)reverse;
bb &= -bb | rMask;
return bitScanReverse(bb);
}
</pre>
<span id="bsfbsr"></span>
=Processor Instructions for Bitscans=
==x86==
[[x86-64]] processors have [[x86-64#gpinstructions|bitscan instructions]] and can be accessed with compilers today through either [[Assembly#InlineAssembly|inline assembly]] or compiler intrinsics. For the Microsoft/Intel C compiler, the intrinsics can be accessed by including and using the instructions ''_BitScanForward64'' <ref>[https://docs.microsoft.com/en-us/cpp/intrinsics/bitscanforward-bitscanforward64 _BitScanForward, _BitScanForward64] Visual C++ Language Reference</ref> , ''_BitScanReverse64'' <ref>[https://docs.microsoft.com/en-us/cpp/intrinsics/bitscanreverse-bitscanreverse64 _BitScanReverse, _BitScanReverse64] Visual C++ Language Reference</ref> or _lzcnt64 <ref>[https://msdn.microsoft.com/en-us/library/bb384809.aspx __lzcnt16, __lzcnt, __lzcnt64] Visual C++ Language Reference</ref> .
<pre>
unsigned char_BitScanForward64(unsigned long * Index, unsigned __int64 Mask);
unsigned char _BitScanReverse64(unsigned long * Index, unsigned __int64 Mask);
unsigned __int64 __lzcnt64(unsigned __int64 value); // AMD K10 only see CPUID
</pre>

[[Linux]] provides library functions <ref>[http://linux.die.net/man/3/ Section 3: library functions - Linux man pages]</ref> , find first bit set (ffsll) in a word leaves an index of 1..64, and zero of no bit is set <ref>[http://linux.die.net/man/3/ffsll ffsll(3): find first bit set in word - Linux man page]</ref> . [[Free Software Foundation#GCC|GCC]] 4.4.5 further has the Built-in Function ''_builtin_ffsll'' for finding the least significant one bit, ''_builtin_ctzll'' for trailing, and ''_builtin_clzll'' for leading zero count <ref>[http://gcc.gnu.org/onlinedocs/gcc-4.4.5/gcc/Other-Builtins.html#Other-Builtins Other Builtins - Using the GNU Compiler Collection (GCC)]</ref> :
<pre>
/* Returns one plus the index of the least significant 1-bit of x, or if x is zero, returns zero */
int __builtin_ffsll (unsigned long long);

/* Returns the number of trailing 0-bits in x, starting at the least significant bit position.
If x is 0, the result is undefined */
int __builtin_ctzll (unsigned long long);

/* Returns the number of leading 0-bits in x, starting at the most significant bit position.
If x is 0, the result is undefined */
int __builtin_clzll (unsigned long long);
</pre>

===Emulating Intrinsics===
For the GNU C compiler, the intrinsics can be emulated with [[Assembly#InlineAssembly|inline assembly]] <ref>[https://www.stmintz.com/ccc/index.php?id=388787 Re: Nalimov: bsf/bsr intrinsics implementation still not optimal] by [[Eugene Nalimov]], [[CCC]], September 23, 2004</ref> .
<pre>
//These processor instructions work only for 64-bit processors
#ifdef _MSC_VER
#include <intrin.h>
#ifdef _WIN64
#pragma intrinsic(_BitScanForward64)
#pragma intrinsic(_BitScanReverse64)
#define USING_INTRINSICS
#endif
#elif defined(__GNUC__) && defined(__LP64__)
static INLINE unsigned char _BitScanForward64(unsigned long* Index, U64 Mask)
{
U64 Ret;
__asm__
(
"bsfq %[Mask], %[Ret]"
:[Ret] "=r" (Ret)
:[Mask] "mr" (Mask)
);
*Index = (unsigned long)Ret;
return Mask?1:0;
}
static INLINE unsigned char _BitScanReverse64(unsigned long* Index, U64 Mask)
{
U64 Ret;
__asm__
(
"bsrq %[Mask], %[Ret]"
:[Ret] "=r" (Ret)
:[Mask] "mr" (Mask)
);
*Index = (unsigned long)Ret;
return Mask?1:0;
}
#define USING_INTRINSICS
#endif
</pre>

===Intrinsics versus asm===
Alternatively, rather than to emulate the intrinsics one might use the standard prototype, by using intrinsics or [[Assembly#InlineAssembly|inline assembly]] for [[Free Software Foundation#GCC|GCC]] <ref>[http://www.jjj.de/fxt/fxtbook.pdf Matters Computational - ideas, algorithms, source code] (pdf) Ideas and Source Code by [[Mathematician#Arndt|Jörg Arndt]]</ref> :
<pre>
#ifdef USE_X86INTRINSICS
#include <intrin.h>
#pragma intrinsic(_BitScanForward64)
#pragma intrinsic(_BitScanReverse64)

/**
* bitScanForward
* @param bb bitboard to scan
* @precondition bb != 0
* @return index (0..63) of least significant one bit
*/
int bitScanForward(U64 x) {
unsigned long index;
assert (x != 0);
_BitScanForward64(&index, x);
return (int) index;
}

/**
* bitScanReverse
* @param bb bitboard to scan
* @precondition bb != 0
* @return index (0..63) of most significant one bit
*/
int bitScanReverse(U64 x) {
unsigned long index;
assert (x != 0);
_BitScanReverse64(&index, x);
return (int) index;
}
#else

/**
* bitScanForward
* @param bb bitboard to scan
* @precondition bb != 0
* @return index (0..63) of least significant one bit
*/
int bitScanForward(U64 x) {
assert (x != 0);
asm ("bsfq %0, %0" : "=r" (x) : "0" (x));
return (int) x;
}

/**
* bitScanReverse
* @param bb bitboard to scan
* @precondition bb != 0
* @return index (0..63) of most significant one bit
*/
int bitScanReverse(U64 x) {
assert (x != 0);
asm ("bsrl %0, %0" : "=r" (x) : "0" (x));
return (int) x;
}
#endif
</pre>
<span id="x86Timing"></span>
===Bsf/Bsr x86-64 Timings===
The instruction latency and reciprocal throughput <ref>[http://www.agner.org/optimize/instruction_tables.pdf Instruction tables, Lists of instruction latencies, throughputs and microoperation breakdowns for Intel and AMD CPU's] (pdf) by [http://www.agner.org/ Agner Fog]</ref> heavily differs between various [[x86-64]] architectures:

{| class="wikitable"
|-
! Architecture Stepping
! Instruction(s)
! Latency / Cycles
! Reciprocal Throughput
|-
| colspan="4" | [[AMD]]
|-
| rowspan="2" | [https://en.wikipedia.org/wiki/Athlon_64 K8] <ref>[http://support.amd.com/us/Processor_TechDocs/25112.PDF Software Optimization Guide for AMD64 Processors]</ref>
| style="text-align:center;" | BSF reg16/32/64, mreg16/32/64
| style="text-align:right;" | Vector Path 8/8/9
|
| style="text-align:right;" | 8/8/9
|-
| style="text-align:center;" | BSR reg16/32/64, mreg16/32/64
| style="text-align:right;" | Vector Path 11
| style="text-align:right;" | 11
|-
| rowspan="3" | [https://en.wikipedia.org/wiki/AMD_K10 K10] <ref>[http://support.amd.com/us/Processor_TechDocs/40546.pdf Software Optimization Guide for AMD Family 10h and 12h Processors]</ref>
| style="text-align:center;" | BSF reg, reg
| style="text-align:right;" | Vector Path 4
| style="text-align:right;" | 4
|-
| style="text-align:center;" | BSR reg, reg
| style="text-align:right;" | Vector Path 4
| style="text-align:right;" | 4
|-
| style="text-align:center;" | [[SSE4#ABM|LZCNT]] reg, reg
| style="text-align:right;" | Direct Path single 2
| style="text-align:right;" | 1
|-
| colspan="4" | [[Intel]] <ref>[http://www.intel.com/design/processor/manuals/248966.pdf Intel 64 and IA32 Architectures Optimization Reference Manual]</ref>
|-
| [https://en.wikipedia.org/wiki/Intel_Atom ATOM]
| style="text-align:center;" | BSF/BSR
| style="text-align:right;" | 16
| style="text-align:right;" | 15
|-
| [https://en.wikipedia.org/wiki/NetBurst_%28microarchitecture%29 NetBurst] 0F_3H
| style="text-align:center;" | BSF/BSR
| style="text-align:right;" | 16
| style="text-align:right;" | 4
|-
| NetBurst 0F_2H
| style="text-align:center;" | BSF/BSR
| style="text-align:right;" | 8
| style="text-align:right;" | 2
|-
| [https://en.wikipedia.org/wiki/Core_%28microarchitecture%29 Core] 06_0EH
| style="text-align:center;" | BSF/BSR
| style="text-align:right;" | 2
| style="text-align:right;" | 1
|-
| 65 nm Intel Core 06_0FH
| style="text-align:center;" | BSF/BSR
| style="text-align:right;" | 2
| style="text-align:right;" | 1
|-
| Enhanced Intel Core 06_17H
| style="text-align:center;" | BSF/BSR
| style="text-align:right;" | 1
| style="text-align:right;" | 1
|-
| Enhanced Intel Core 06_1DH
| style="text-align:center;" | BSF/BSR
| style="text-align:right;" | 1
| style="text-align:right;" | 1
|-
| [https://en.wikipedia.org/wiki/Nehalem_%28microarchitecture%29 Nehalem] 06_1AH
| style="text-align:center;" | BSF/BSR
| style="text-align:right;" | 3
| style="text-align:right;" | 1
|-
| rowspan="3" |[https://en.wikipedia.org/wiki/Sandy_Bridge Sandy Bridge]<br/>
[https://en.wikipedia.org/wiki/Ivy_Bridge_%28microarchitecture%29 Ivy Bridge]<br/>
[https://en.wikipedia.org/wiki/Haswell_%28microarchitecture%29 Haswell] <ref>[http://users.atw.hu/instlatx64/GenuineIntel00306C3_Haswell_InstLatX64.txt Haswell Instructions Latency]</ref>
| style="text-align:center;" | BSF/BSR
| style="text-align:right;" | 3
| style="text-align:right;" | 1
|-
| style="text-align:center;" | [[BMI1#LZCNT|LZCNT]]
| style="text-align:right;" | 3
| style="text-align:right;" | 1
|-
| style="text-align:center;" | [[BMI1#LZCNT|TZCNT]]
| style="text-align:right;" | 3
| style="text-align:right;" | 1
|}

===Bsf/Bsr behavior with zero source===
[[Intel]] and [[AMD]] specify different behavior. In praxis there seems no difference so far. However, as long as Intel docs explicitly state content undefined, it is recommend to don't rely on a pre-initialized content of that target register, if the source is zero.
* Intel : If the content of the source operand is 0, the content of the destination operand is undefined. <ref>[http://www.intel.com/Assets/PDF/manual/253666.pdf Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 2A: Instruction Set Reference, A-M] (pdf) BSF—Bit Scan Forward 3-87</ref>
* AMD: If the second operand contains 0, the instruction sets ZF to 1 and does not change the contents of the destination register. <ref>[http://support.amd.com/us/Processor_TechDocs/24594_APM_v3.pdf AMD64 Architecture Programmer’s Manual Volume 3: General-Purpose and System Instructions] (pdf) Bit Scan Forward pg. 111</ref>

==ARM==
[[ARM]] has ''CLZ'' (Count Leading Zeros) instruction for 32-bit integers. ARM instruction is available in [http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0100i/index.html ARMv5] and above, [https://en.wikipedia.org/wiki/ARM_architecture#Thumb 32-bit Thumb instruction] is available in [http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0100i/index.html ARMv6T2] and [http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0100i/index.html ARMv7] <ref>[http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0379a/Cihjgjed.html ARM Information Center > General data processing instructions > CLZ]</ref> , the C-intrinsic is called ''_builtin_clz'' <ref>[http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0376a/CJAEJGJD.html ARM Information Center > Instruction intrinsics > __builtin_clz]</ref> <ref>[http://developer.apple.com/mac/library/documentation/DeveloperTools/gcc-4.0.1/gcc/Other-Builtins.html Other Builtins - Using the GNU Compiler Collection (GCC)]</ref> <ref>[http://www.talkchess.com/forum/viewtopic.php?t=31228 Bit Scan (equivalent to ASM instructions bsr and bsf)] by [[Pascal Georges]], [[CCC]], December 24, 2009</ref> .
<span id="EngineSamples"></span>
=Engine Samples=
* [[Amundsen#BitScan|BitScan in Amundsen]]
* [[Chess 0.5#BitScan|BitScan in Chess 0.5]]
* [[CookieCat#BitScan|BitScan in CookieCat]]
* [[Crafty#BitScan|BitScan in Crafty]] (23.5)
* [[Gibbon#BitScan|BitScan in Gibbon]]
* [[Gk#BitScan|BitScan in Gk]]
* [[HeavyChess#BitScan|BitScan in HeavyChess]]
* [[Kurt#BitScan|BitScan in Kurt]]
* [[Murka#BitScan|BitScan in Murka]]
* [[Prophet#BitScan|BitScan in Prophet]]
* [[RedQueen#BitScan|BitScan in RedQueen]]
* [[Spector#BitScan|BitScan in Spector]]
* [[Tucano#BitScan|BitScan in Tucano]]

=See also=
* [[Bitboard Serialization]]
* [[Pablo San Segundo#BITSCAN|BITSCAN]], a [[Cpp#Libraries|C++ library]] for bitstrings by [[Pablo San Segundo]]
* [[Bit-Twiddling]]
* [[De Bruijn Sequence Generator]]
* [[Java-Bitscan]]
* [[Population Count]]

=Publications=
* [[Alan Turing]] ('''1949'''). ''[http://www.computerhistory.org/chess/full_record.php?iid=doc-431614f45472f Alan Turing's Manual for the Ferranti Mk. I]''. transcribed in 2000 by [http://www.panix.com/%7Erst/ Robert Thau], [http://archive.computerhistory.org/projects/chess/related_materials/text/2-1.Ferranti_Mark_1_manual.Turing-Alan/2-1.Ferranti_Mark_1_manual.Turing-Alan.1951.UNIVERSITY_OF_MANCHESTER.062303005.pdf pdf] from [[The Computer History Museum]], 9.4 The position of the most significant digit » [[Ferranti Mark 1]]
* [[Charles Leiserson|Charles E. Leiserson]], [[Harald Prokop]] and [[Keith H. Randall]] ('''1998'''). ''Using de Bruijn Sequences to Index a 1 in a Computer Word'', [http://supertech.csail.mit.edu/papers/debruijn.pdf pdf]
* [[Pablo San Segundo]], [[Ramón Galán]] ('''2005'''). ''[http://www.actapress.com/Abstract.aspx?paperId=18953 Bitboards: A New Approach]''. [http://www.informatik.uni-trier.de/%7Eley/db/conf/aia/aia2005.html#SegundoG05 AIA 2005]
* [[Donald Knuth]] ('''2009'''). ''[http://www-cs-faculty.stanford.edu/%7Eknuth/taocp.html The Art of Computer Programming], Volume 4, Fascicle 1: Bitwise tricks & techniques'', as [http://www-cs-faculty.stanford.edu/%7Eknuth/fasc1a.ps.gz Pre-Fascicle 1a postscript], p. 10
* [http://de.linkedin.com/pub/andreas-stiller/a/381/aa9 Andreas Stiller] ('''2013'''). ''[http://www.heise.de/ct/inhalt/2013/07/186/ Spezialkommando - Bits setzen, abfragen, scannen und mehr]''. [http://www.heise.de/ct/ c't Magazin für Computertechnik] 7/2013, p. 186 (German)

=Forum Posts=
==1996 ...==
* [http://groups.google.com/group/rec.games.chess.computer/browse_frm/thread/bcb03df7630d6274 Bitboards: speeding up FirstOne] by Laurent Desnogues, [[Computer Chess Forums|rgcc]], April 10, 1996 » [[Othello]]
* [http://groups.google.com/group/rec.games.chess.computer/browse_frm/thread/d5dbf08c66e83517# bitboard 2^i mod 67 is unique] by [[Stefan Plenkner]], [[Computer Chess Forums|rgcc]], August 6, 1996
* [http://groups.google.com/group/rec.games.chess.computer/browse_frm/thread/9658009e315021fe# bitboard 2^i mod 67 is unique] by [[Stefan Plenkner]], [[Computer Chess Forums|rgcc]], August 7, 1996
* [http://groups.google.com/group/rec.games.chess.computer/browse_frm/thread/871851f83e2c429f# bitboard 2^i mod 67 is unique] by [[Joël Rivat]], [[Computer Chess Forums|rgcc]], September 2, 1996
* [https://www.stmintz.com/ccc/index.php?id=20057 Question to Bob: Crafty , Alpha and FindBit()] by [[Guido Schimmels]], [[CCC]], June 05, 1998
* [https://www.stmintz.com/ccc/index.php?id=39619 To Nalimov and other programmers about BSF/BSR in VC] by [[Dezhi Zhao]], [[CCC]], January 16, 1999
==2000 ...==
* [https://groups.google.com/group/comp.lang.asm.x86/browse_frm/thread/37efb792689be6b8 Re: TASM 5.0 versus BSF] by [[Frans Morsch]], [https://groups.google.com/group/comp.lang.asm.x86/topics comp.lang.asm.x86], March 28, 2000
* [https://www.stmintz.com/ccc/index.php?id=124638 Will the Itanium have a BSF or BSR instruction?] by Larry Griffiths, [[CCC]], August 15, 2000
: [https://www.stmintz.com/ccc/index.php?id=124712 Re: Will the Itanium have a BSF or BSR instruction?] by [[Eugene Nalimov]], [[CCC]], August 16, 2000
* [https://www.stmintz.com/ccc/index.php?id=134077 Binary question] by [[Severi Salminen]], [[CCC]], October 19, 2000
* [https://www.stmintz.com/ccc/index.php?id=175203 Bitboards and Piece Lists] by [[Dann Corbit]], [[CCC]], June 14, 2001
* [https://www.stmintz.com/ccc/index.php?id=207080 FirstBit() in assembler] by [[David Rasmussen]], [[CCC]], January 13, 2002
* [https://www.stmintz.com/ccc/index.php?id=211203 Reply from Intel about BSF/BSR] by [[Severi Salminen]], [[CCC]], January 31, 2002
* [https://www.stmintz.com/ccc/index.php?id=212586 "Using de Bruijn Sequences to Index a 1 in a Computer Word"] by Oliver Roese, [[CCC]], February 08, 2002
* [https://www.stmintz.com/ccc/index.php?id=265635 Another hacky method for bitboard bit extraction] by [[Walter Faxon]], [[CCC]], November 17, 2002
* [https://www.stmintz.com/ccc/index.php?id=268065 Modulo verus BitScan and MMX-PopCount] by [[Gerd Isenberg]], [[CCC]], November 29, 2002
* [https://www.stmintz.com/ccc/index.php?id=268305 Fast 3DNow! BitScan] by [[Gerd Isenberg]], [[CCC]], December 01, 2002
* [https://www.stmintz.com/ccc/index.php?id=275160 Bitscan Conclusions] by [[Matt Taylor]], [[CCC]], January 05, 2003
* [https://www.stmintz.com/ccc/index.php?id=283655 Bitscan] by [[Matt Taylor]], [[CCC]], February 11, 2003
* [https://www.stmintz.com/ccc/index.php?id=291062 FirstOne for Linux] by [[Sune Fischer]], [[CCC]], March 29, 2003
* [http://groups.google.com/group/comp.lang.asm.x86/browse_frm/thread/de9546cd019bd72b/f46209f47d2a7ddb?lnk=gst&q=Matt+Taylor+magic#f46209f47d2a7ddb Bit magic] by [[Matt Taylor]], [http://groups.google.com/group/comp.lang.asm.x86/topics comp.lang.asm.x86], June 26, 2003
* [https://www.stmintz.com/ccc/index.php?id=339225 Re: De Bruijn Sequence Generator] by [[Dieter Bürssner]], [[CCC]], December 30, 2003 » [[De Bruijn Sequence Generator]]
* [https://www.stmintz.com/ccc/index.php?id=348097 Determining location of LSB/MSB] by [[Jan Renze Steenhuisen|Renze Steenhuisen]], [[CCC]], February 09, 2004
* [https://www.stmintz.com/ccc/index.php?id=388668 Nalimov: bsf/bsr intrinsics implementation still not optimal] by [[Dezhi Zhao]], [[CCC]], September 22, 2004
: [https://www.stmintz.com/ccc/index.php?id=388787 Re: Nalimov: bsf/bsr intrinsics implementation still not optimal] by [[Eugene Nalimov]], [[CCC]], September 23, 2004
==2005 ...==
* [https://www.stmintz.com/ccc/index.php?id=425020 A data point for PowerPC bitboard program authors] by [[Steven Edwards]], [[CCC]], May 09, 2005 » [[PowerPC]]
* [http://www.open-aurec.com/wbforum/viewtopic.php?f=4&t=3141 Best BitBoard LSB funktion?] by [[Reinhard Scharnagl]], [[Computer Chess Forums|Winboard Programming Forum]], July 20, 2005
* [http://www.talkchess.com/forum/viewtopic.php?t=14151 Fastest bitboard compress routine when you can't use ASM] by mambofish, [[CCC]], May 31, 2007
* [http://talkchess.com/forum/viewtopic.php?t=29333 Bit twiddling question, part 2: arbitrary bitscan order] by [[Zach Wegner]], [[CCC]], August 11, 2009
* [http://talkchess.com/forum/viewtopic.php?t=29482 32 bit versions for bitscan64] by [[Michael Hoffmann]], [[CCC]], August 21, 2009
* [http://www.talkchess.com/forum/viewtopic.php?t=30342 64-bit intrinsic performance] by [[Nathan Thom]], [[CCC]], October 27, 2009
* [http://www.talkchess.com/forum/viewtopic.php?t=31228 Bit Scan (equivalent to ASM instructions bsr and bsf)] by [[Pascal Georges]], [[CCC]], December 24, 2009
==2010 ...==
* [http://talkchess.com/forum/viewtopic.php?t=32046 bitScanReverse32] by [[Luca Hemmerich]], [[CCC]], January 25, 2010
* [http://www.talkchess.com/forum/viewtopic.php?t=39268 Introduction and (hopefully) contribution - bitboard methods] by [[Alcides Schulz]], [[CCC]], June 03, 2011 » [[Population Count]]
* [http://www.talkchess.com/forum/viewtopic.php?t=45188 Leading Zero Count Question] by [[Matthew R. Brades]], [[CCC]], September 16, 2012
* [http://www.talkchess.com/forum/viewtopic.php?t=46040 Optimizing bitboards for ARM] by [[Martin Sedlak]], [[CCC]], November 17, 2012
* [http://www.talkchess.com/forum/viewtopic.php?t=54704 Symmetric move generation using bitboards] by [[Lasse Hansen]], [[CCC]], December 20, 2014
* [http://www.talkchess.com/forum/viewtopic.php?t=54798 Stockfish 32-bit and hardware instructions on MSVC++] by [[Syed Fahad]], [[CCC]], December 30, 2014 » [[Stockfish]], [[BitScan]], [[Population Count]]
==2015 ...==
* [http://www.talkchess.com/forum/viewtopic.php?t=57400 Fun with De Bruijn] by [[Henk van den Belt]], [[CCC]], August 27, 2015
* [http://www.talkchess.com/forum/viewtopic.php?t=58230&start=3 Re: Linux Version of Maverick 1.5] by [[Michael Dvorkin]], [[CCC]], November 12, 2015 » [[Mac OS|OS X]], [[Maverick]]
* [http://www.talkchess.com/forum/viewtopic.php?t=61559 syzygy users (and Ronald)] by [[Robert Hyatt]], [[CCC]], September 29, 2016 » [[Population Count]]

=External Links=
* [https://en.wikipedia.org/wiki/Find_first_set Find first set from Wikipedia]
* [http://aggregate.org/MAGIC The Aggregate Magic Algorithms] by [[Hank Dietz]]
* [http://graphics.stanford.edu/%7Eseander/bithacks.html Bit Twiddling Hacks] by [http://graphics.stanford.edu/%7Eseander/ Sean Eron Anderson]
* [http://caladan.nanosoft.ca/c4/software/bitsort.php An Efficient Bit-Reversal Sorting Algorithm for the Fast Fourier Transform] by [http://caladan.nanosoft.ca/index.php Jennifer Elaan], January 16, 2005
* [http://www.freepatentsonline.com/6172623.html Efficient bit scan mechanism - United States Patent 6172623] from [http://www.freepatentsonline.com/ FreePatentsOnline.com]
* [[Videos#FrankZappa|Frank Zappa]] & [https://en.wikipedia.org/wiki/The_Mothers_of_Invention the Mothers] - [https://en.wikipedia.org/wiki/King_Kong King Kong] BBC Studio Recording 1968, [https://en.wikipedia.org/wiki/YouTube YouTube] Video
: {{#evu:https://www.youtube.com/watch?v=kk8mPKROUS4|alignment=left|valignment=top}}

=References=
<references />

'''[[Bitboards|Up one Level]]'''

Navigation menu