Changes

Newer edit →

Population Count

39,939 bytes added, 08:17, 4 May 2018

Created page with "'''Home * Board Representation * Bitboards * Population Count''' FILE:Hamming.jpg|border|right|thumb|Visualization of [[Population Count#HammingDistan..."

'''[[Main Page|Home]] * [[Board Representation]] * [[Bitboards]] * Population Count'''

[[FILE:Hamming.jpg|border|right|thumb|Visualization of [[Population Count#HammingDistance|Hamming distance]] <ref>Color of each pixel is [https://en.wikipedia.org/wiki/Hamming_distance Hamming distance] between the binary representations of its x and y coordinates, modulo 16, in the 16-color system, by Josiedraus, June 8, 2007, [https://en.wikipedia.org/wiki/Richard_Hamming Richard Hamming from Wikipedia]</ref> ]]

'''Population count''', 
an operation to determine the [https://en.wikipedia.org/wiki/Cardinality cardinality] of a bitboard, also called [https://en.wikipedia.org/wiki/Hamming_weight Hamming weight] or sideways sum <ref>[https://en.wikipedia.org/wiki/Cryptography Cryptography] is also a significant application of the /R function symbol, which counts the number of one bits in a word; Turing refers to this as the "sideways adder" in his quick-reference summary. from [[Alan Turing]] ('''1949'''). ''[http://www.computerhistory.org/chess/full_record.php?iid=doc-431614f45472f Alan Turing's Manual for the Ferranti Mk. I]''. transcribed in 2000 by [http://www.panix.com/~rst/ Robert Thau], [http://archive.computerhistory.org/projects/chess/related_materials/text/2-1.Ferranti_Mark_1_manual.Turing-Alan/2-1.Ferranti_Mark_1_manual.Turing-Alan.1951.UNIVERSITY_OF_MANCHESTER.062303005.pdf pdf] from [[The Computer History Museum]], 9.4 The position of the most significant digit » [[Ferranti Mark 1]]</ref>. How many one bits exists in a 64-bit computer word? In computer chess, population count is used to [[Evaluation|evaluate]] the [[Mobility|mobility]] of pieces from their [[Attacks|attack sets]], as [[CDC Cyber#Mobility|already applied]] in [[Chess (Program)|Chess 4.6]] on the [[CDC 6600]] and [[CDC Cyber]].

Recent [[x86-64]] processors (since [[AMD]] [https://en.wikipedia.org/wiki/AMD_K10 K10] with [[SSE4#SSE4a|SSE4a]], [[Intel]] [https://en.wikipedia.org/wiki/Nehalem_%28microarchitecture%29 Nehalem] with [[SSE4#SSE4.2|SSE4.2]]) provide a [[x86-64#gpinstructions|64-bit popcount instruction]] <ref>[http://software.intel.com/en-us/articles/extending-the-worlds-most-popular-processor-architecture Extending the World’s Most Popular Processor Architecture] by R.M. Ramanathan, [[Intel]], covers SSE4 and popcnt</ref>, available via [[Cpp|C++]] compiler intrinsic <ref>[https://msdn.microsoft.com/en-us/library/bb531475(v=vs.120).aspx _mm_popcnt_u64]</ref> <ref>[https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html __builtin_popcountll] [[Free Software Foundation#GCC|GCC]] Intrinsic</ref> or [[Assembly#InlineAssembly|inline assembly]] <ref>[http://www.talkchess.com/forum/viewtopic.php?t=32227 Stockfish POPCNT support with gcc] by [[Marco Costalba]], [[CCC]], January 31, 2010</ref>. Despite different Intrinsic prototypes (_mm_popcnt_u64 vs. popcnt64), Intel and AMD popcnt instructions are binary compatible, have same encoding (F3 [REX] 0F B8 /r), and both require bit 23 set in RCX of the [https://en.wikipedia.org/wiki/CPUID CPUID] function 0000_0001h. Code samples are in [[C]] / [[Cpp|C++]], ''see [[Bitboards#DefiningBitboards|Defining Bitboards]]''

=Recurrence Relation=
The [[Recursion|recursive]] [https://en.wikipedia.org/wiki/Recurrence_relation recurrence relation] of population counts can be transformed to iteration as well, but lacks an arithmetical sum-formula:
popcnt(0) = 0
popcnt(n) = popcnt(n ÷ 2) + (n mod 2)
However, it is helpful to initialize a [[Population Count#Lookup|lookup table]], i.e. for bytes:
<pre>
unsigned char popCountOfByte256[];

void initpopCountOfByte256()
{
popCountOfByte256[0] = 0;
for (int i = 1; i < 256; i++)
popCountOfByte256[i] = popCountOfByte256[i / 2] + (i & 1);
}
</pre>
=Empty or Single?=
Often one has to deal with sparsely populated or even empty bitboards. To determine whether a bitboard is empty or a single populated power of two value, one may use simple boolean statements rather than a complete population count.


==Empty Bitboards==
To test a bitboard is empty, one compares it with zero, or use the logical not operator:
<pre>
if ( x == 0 ) -> bitboard is empty
if ( !x ) -> bitboard is empty
</pre>
The inverse condition (not empty) is of course
<pre>
if ( x != 0 ) -> bitboard is not empty
if ( x ) -> bitboard is not empty
</pre>


==Single Populated Bitboards==
If the bitboard is not empty, we can [[General Setwise Operations#LS1BSeparation|extract]] the [[General Setwise Operations#TheLeastSignificantOneBitLS1B|LS1B]] to look whether it is equal with the bitboard. Alternatively and faster, we can [[General Setwise Operations#LS1BReset|reset the LS1B]] to look whether the bitboard becomes empty.
<pre>
if ( x != 0 && (x & (x-1)) == 0 ) -> population count is one, power of two value
</pre>
One can skip the leading x != 0 condition to test popcount <= 1:
<pre>
if ( (x & (x-1)) == 0 ) -> population count is less or equal than one
</pre>
Again the inverse relation tests, whether a bitboard has more than one bit set:
<pre>
if ( x & (x-1) ) -> population count is greater than one
</pre>
An alternative approach to determine single populated sets, aka power of two values is based on [[General Setwise Operations#LS1BSeparation|Inclusive LS1B separation]] divided by two equals the ones' decrement <ref>[http://www.jjj.de/fxt/fxtbook.pdf Matters Computational - ideas, algorithms, source code] (pdf) Ideas and Source Code by [[Mathematician#Arndt|Jörg Arndt]], 1.7 Functions related to the base-2 logarithm, function one_bit_q(), pp 18</ref>:
<pre>
if ( ((x ^ (x-1)) >> 1) == (x-1) ) -> population count is one, power of two value
</pre>

=Loop-Approaches=

==Too Slow==
Brute force adding all 64-bits
<pre>
int popCount (U64 x) {
int count = 0;
for (int i = 0; i < 64; i++, x >>= 1)
count += (int)x & 1;
return count;
}
</pre>
Of course, this is a slow algorithm, which might be improved by testing x not empty rather than i < 64. But unrolled in [[Parallel Prefix Algorithms|parallel prefix]] manner it already reminds on [[Population Count#SWARPopcount|this one]].


==Brian Kernighan's way==
Consecutively [[General Setwise Operations#LS1BReset|reset LS1B]] in a loop body and counting loop cycles until the bitset becomes empty. [https://en.wikipedia.org/wiki/Brian_Kernighan Brian Kernighan] <ref>[http://www-graphics.stanford.edu/~seander/bithacks.html#CountBitsSetKernighan Counting bits set, Brian Kernighan's way] from [http://graphics.stanford.edu/%7Eseander/bithacks.html Bit Twiddling Hacks] by [http://graphics.stanford.edu/%7Eseander/ Sean Eron Anderson]</ref> mentioned the trick in his and [https://en.wikipedia.org/wiki/Dennis_Ritchie Ritchie's] book [https://en.wikipedia.org/wiki/The_C_Programming_Language_%28book%29 The C Programming_Language], 2nd Edition 1988, exercise 2-9. However, the method was first published in 1960 by [[Mathematician#PWegner|Peter Wegner]] <ref>[[Mathematician#PWegner|Peter Wegner]] ('''1960'''). ''A technique for counting ones in a binary computer''. [[ACM#Communications|Communications of the ACM]], [http://www.informatik.uni-trier.de/~ley/db/journals/cacm/cacm3.html#Wegner60 Volume 3, 1960]</ref>, discovered independently by [[Mathematician#DHLehmer|Derrick Henry Lehmer]], published in 1964 <ref>[[Mathematician#EFBeckenbach|Edwin Ford Beckenbach]] (editor) ('''1964'''). ''[http://onlinelibrary.wiley.com/doi/10.1002/bimj.19660080310/abstract Applied combinatorial mathematics]'', [https://en.wikipedia.org/wiki/John_Wiley_%26_Sons John Wiley]</ref>:
<pre>
int popCount (U64 x) {
int count = 0;
while (x) {
count++;
x &= x - 1; // reset LS1B
}
return count;
}
</pre>
Despite branches - this is still one of the fastest approaches for sparsely populated bitboards. Of course the more bits that are set, the longer it takes.


=Lookup=
Of course we can not use the whole bitboard as index to a lookup table - since it's size would be 18,446,744,073,709,551,616 bytes! If it is about the population count of a byte, we can simply construct a table lookup with 256 elements. For a bitboard that takes eight byte lookups we can add together:
<pre>
unsigned char popCountOfByte256[];

void initpopCountOfByte256()
{
popCountOfByte256[0] = 0;
for (int i = 1; i < 256; i++)
popCountOfByte256[i] = popCountOfByte256[i / 2] + (i & 1);
}

int popCount (U64 x) {
return popCountOfByte256[ x & 0xff] +
popCountOfByte256[(x >> 8) & 0xff] +
popCountOfByte256[(x >> 16) & 0xff] +
popCountOfByte256[(x >> 24) & 0xff] +
popCountOfByte256[(x >> 32) & 0xff] +
popCountOfByte256[(x >> 40) & 0xff] +
popCountOfByte256[(x >> 48) & 0xff] +
popCountOfByte256[ x >> 56];
}
</pre>
Looks quite expensive - one may use four 16-bit word-lookups with a pre-calculated 64KByte table though, but that pollutes the memory caches quite a bit. One can also treat the bitboard as [[Array|array]] of bytes or words in memory, since endian issues don't care here - that safes all the shifts and 'ands', but has to read byte for byte from memory.
<pre>
int popCount (U64 x) {
unsigned char * p = (unsigned char *) &x;
return popCountOfByte256[p[0]] +
popCountOfByte256[p[1]] +
popCountOfByte256[p[2]] +
popCountOfByte256[p[3]] +
popCountOfByte256[p[4]] +
popCountOfByte256[p[5]] +
popCountOfByte256[p[6]] +
popCountOfByte256[p[7]];
}
</pre>

=SWAR-Popcount=
The [https://en.wikipedia.org/wiki/Divide_and_conquer_algorithm divide and conquer] [[SIMD and SWAR Techniques|SWAR-approach]] deals with counting bits of duos, to aggregate the duo-counts to [[Nibble|nibbles]] and [[Byte|bytes]] inside one 64-bit register in parallel, to finally sum all bytes together. According to [[Donald Knuth]] <ref>[[Donald Knuth]] ('''2009'''). ''[http://www-cs-faculty.stanford.edu/%7Eknuth/taocp.html The Art of Computer Programming], Volume 4, Fascicle 1: Bitwise tricks & techniques'', as [http://www-cs-faculty.stanford.edu/%7Eknuth/fasc1a.ps.gz Pre-Fascicle 1a postscript], Sideways addition, p 11</ref>, a parallel population count routine was already introduced in 1957 due to [[Mathematician#DBGillies|Donald B. Gillies]] and [https://en.wikipedia.org/wiki/J._C._P._Miller Jeffrey C. P. Miller] in the first textbook on programming, second edition: ''The Preparation of Programs for an Electronic Digital Computer'', by [[Mathematician#MVWilkes|Maurice Wilkes]], [[Mathematician#DJWheeler|David Wheeler]] and [https://en.wikipedia.org/wiki/Stanley_Gill Stanley Gill], pages 191–193 <ref>[[Mathematician#MVWilkes|Maurice Wilkes]], [[Mathematician#DJWheeler|David Wheeler]], [https://en.wikipedia.org/wiki/Stanley_Gill Stanley Gill] ('''1951'''). ''The Preparation of Programs for an Electronic Digital Computer''. Addison-Wesley Press; 1st edition, [http://www.amazon.com/preparation-programs-electronic-digital-computer/dp/B0007DWTT0 amazon.com]; 2nd edition 1957, [http://www.amazon.com/preparation-programs-electronic-Addison-Wesley-mathematics/dp/B0006AV1QQ amazon.com]</ref> <ref>[https://en.wikipedia.org/wiki/Electronic_Delay_Storage_Automatic_Calculator Electronic Delay Storage Automatic Calculator from Wikipedia]</ref>.

==Counting Duo-Bits==
A bit-duo (two neighboring bits) can be interpreted with bit 0 = a, and bit 1 = b as
duo := 2b + a
The duo population is
popcnt(duo) := b + a
which can be archived by
(2b + a) - (2b + a) ÷ 2
or
(2b + a) - b
The bit-duo has up to four states with population count from zero to two as demonstrated in following table with binary digits:
{| class="wikitable"
|-
! x
! x div 2
! popcnt(x)
|-
| style="text-align:center;" | 00
| style="text-align:center;" | 00
! style="text-align:center;" | 00
|-
| style="text-align:center;" | 01
| style="text-align:center;" | 00
! style="text-align:center;" | 01
|-
| style="text-align:center;" | 10
| style="text-align:center;" | 01
! style="text-align:center;" | 01
|-
| style="text-align:center;" | 11
| style="text-align:center;" | 01
! style="text-align:center;" | 10
|}
Only the lower bit is needed from x div 2 - and one don't has to worry about borrows from neighboring duos. SWAR-wise, one needs to clear all "even" bits of the div 2 subtrahend to perform a 64-bit subtraction of all 32 duos:
<pre>
x = x - ((x >> 1) & 0x5555555555555555);
</pre>
Note that the popcount-result of the bit-duos still takes two bits.

==Counting Nibble-Bits==
The next step is to add the duo-counts to populations of four neighboring bits, the 16 nibble-counts, which may range from zero to four. SWAR-wise it is done by masking odd and even duo-counts to add them together:
<pre>
x = (x & 0x3333333333333333) + ((x >> 2) & 0x3333333333333333);
</pre>
Note that the popcount-result of the nibbles takes only three bits, since 100B is the maximum population (of the nibble 1111B).

==Byte-Counts==
You already got the idea? Now it is about to get the byte-populations from two nibble-populations. Maximum byte-population of 1000B only takes four bits, so it is safe to mask all those four bits of the sum, rather than to mask the summands:
<pre>
x = (x + (x >> 4)) & 0x0f0f0f0f0f0f0f0f;
</pre>

==Adding the Byte-Counts==
===Parallel Prefix Adds===
We may continue with mask-less [[Parallel Prefix Algorithms|parallel prefix]] SWAR-adds for byte-counts, word-counts and finally double-word-counts, to mask the least significant 8 (or 7) bits for final result in the 0..64 range:
<pre>
x += (x >> 8);
x += (x >> 16);
x += (x >> 32);
return x & 255;
</pre>

===Multiplication===
With todays fast 64-bit multiplication one can multiply the vector of 8-byte-counts with 0x0101010101010101 to get the final result in the most significant byte, which is then shifted right:
<pre>
x = (x * 0x0101010101010101) >> 56;
</pre>

===Casting out===
Interestingly, there is another approach to add the bytes together. As demonstrated with decimal digits (base 10) and [https://en.wikipedia.org/wiki/Casting_out_nines Casting out nines] <ref>[http://www.billthelizard.com/2009/06/casting-out-nines.html Casting Out Nines] by [http://www.billthelizard.com/ Bill the Lizard], June 13, 2009</ref>, casting out by [[General Setwise Operations#Modulo|modulo]] base minus one is equivalent to taking the [https://en.wikipedia.org/wiki/Digit_sum digit sum], which might be applied here with low range 0..8 "base 256" digits:
<pre>
x = x % 255;
</pre>
However, since division and modulo are usually slow instructions and modulo by constant is likely replaced by reciprocal multiplication anyway by the compiler, the multiplication by 0x0101010101010101 aka the 2-[https://en.wikipedia.org/wiki/P-adic_number adic] fraction -1/255 is the preferred method.

==The PopCount routine==
===The Constants===
Putting all together, the various SWAR-Masks and factors as defined by [[Donald Knuth]] as 2-[https://en.wikipedia.org/wiki/P-adic_number adic] fractions <ref>[[Donald Knuth]] ('''2009'''). ''[http://www-cs-faculty.stanford.edu/%7Eknuth/taocp.html The Art of Computer Programming], Volume 4, Fascicle 1: Bitwise tricks & techniques'', as [http://www-cs-faculty.stanford.edu/%7Eknuth/fasc1a.ps.gz Pre-Fascicle 1a postscript], p 9</ref>:
<pre>
const U64 k1 = C64(0x5555555555555555); /* -1/3 */
const U64 k2 = C64(0x3333333333333333); /* -1/5 */
const U64 k4 = C64(0x0f0f0f0f0f0f0f0f); /* -1/17 */
const U64 kf = C64(0x0101010101010101); /* -1/255 */
</pre>
represented as bitboards:
<pre>
k1 -1/3 k2 -1/5 k4 -1/17 kf -1/255
0x5555555555555555 0x3333333333333333 0x0f0f0f0f0f0f0f0f 0x0101010101010101
1 . 1 . 1 . 1 . 1 1 . . 1 1 . . 1 1 1 1 . . . . 1 . . . . . . .
1 . 1 . 1 . 1 . 1 1 . . 1 1 . . 1 1 1 1 . . . . 1 . . . . . . .
1 . 1 . 1 . 1 . 1 1 . . 1 1 . . 1 1 1 1 . . . . 1 . . . . . . .
1 . 1 . 1 . 1 . 1 1 . . 1 1 . . 1 1 1 1 . . . . 1 . . . . . . .
1 . 1 . 1 . 1 . 1 1 . . 1 1 . . 1 1 1 1 . . . . 1 . . . . . . .
1 . 1 . 1 . 1 . 1 1 . . 1 1 . . 1 1 1 1 . . . . 1 . . . . . . .
1 . 1 . 1 . 1 . 1 1 . . 1 1 . . 1 1 1 1 . . . . 1 . . . . . . .
1 . 1 . 1 . 1 . 1 1 . . 1 1 . . 1 1 1 1 . . . . 1 . . . . . . .
</pre>
===popCount===
This is how the complete routine looks in [[C]]:
<pre>
int popCount (U64 x) {
x = x - ((x >> 1) & k1); /* put count of each 2 bits into those 2 bits */
x = (x & k2) + ((x >> 2) & k2); /* put count of each 4 bits into those 4 bits */
x = (x + (x >> 4)) & k4 ; /* put count of each 8 bits into those 8 bits */
x = (x * kf) >> 56; /* returns 8 most significant bits of x + (x<<8) + (x<<16) + (x<<24) + ... */
return (int) x;
}
</pre>

'''Advantage''': no branches, no memory lookups, constant runtime - independent from population
'''Drawback''': dependency chain, not much parallel speedup

''For likely sparsely populated bitboards, the loop-wise [[Population Count#BrianKernighansway|Brian Kernighan's way]] might be the faster one.''

==HAKMEM 169==
A similar technique was proposed by [[Bill Gosper]] et al. from [[Massachusetts Institute of Technology]], as published 1972 in [https://en.wikipedia.org/wiki/HAKMEM Memo 239 (HAKMEM)] <ref>[http://home.pipeline.com/~hbaker1/hakmem/hacks.html#item169 HAKMEM - ITEM 169 To count the ones in a PDP-6/10 word] (in order of one-ups-manship: [[Bill Gosper|Gosper]], Mann, Lenard, [Root and Mann])</ref> <ref>[http://www.cl.cam.ac.uk/~am21/hakmemc.html HAKMEMC -- HAKMEM Programming hacks in C] by [http://www.cl.cam.ac.uk/~am21/ Alan Mycroft]</ref>, to add bit-trio- rather than duo populations consecutively, and the 32 bit version relies on casting out 63. Note that the constants in the code below have [https://en.wikipedia.org/wiki/Octal octal] (base-8) digits, originally written in [[Assembly#HAKMEM169|Assembly]] for the [[PDP-6]] <ref>[[Bill Gosper#HAKMEM169|HAKMEM 169]] for [[PDP-6]]/[[PDP-10]] 36-bit words</ref>. An expanded 64-bit version, casting out 511 or 4095, is slightly less efficient than the binary SWAR version above.
<pre>
int hakmem169_32(unsigned int x) {
x = x - ((x >> 1) & 033333333333)
- ((x >> 2) & 011111111111);
x = (x + (x >> 3)) & 030707070707 ;
return x % 63; /* casting out 63 */
}
</pre>

=Miscellaneous=

==Cardinality of Multiple Sets==
If we like to count [[Array|arrays]] of sets, we can reduce 2^N-1 popcounts to N popcounts, by applying the odd-major-trick. For three sets to count we safe one, with five additional cheap instructions.
<pre>
odd = (x ^ y) ^ z;
major = ((x ^ y ) & z) | (x & y);

popCount(x) + popCount(y) + popCount(z) == 2*popCount(major) + popCount(odd)
</pre>
The combined popCount3 likely gains more parallel speedup, since there are two independent chains to calculate. Possible Application is to pass the union of both bishops (since usually bishops cover disjoint sets due to different square colors) plus the up to two knight move-target sets.
<pre>
// return popCount(x) + popCount(y) + popCount(z)
int popCount3 (U64 x, U64 y, U64 z) {
U64 maj = ((x ^ y ) & z) | (x & y);
U64 odd = ((x ^ y ) ^ z);
maj = maj - ((maj >> 1) & k1 );
odd = odd - ((odd >> 1) & k1 );
maj = (maj & k2) + ((maj >> 2) & k2);
odd = (odd & k2) + ((odd >> 2) & k2);
maj = (maj + (maj >> 4)) & k4;
odd = (odd + (odd >> 4)) & k4;
odd = ((maj + maj + odd) * kf ) >> 56;
return (int) odd;
}
</pre>
===Odd and Major 7-15===
Of course - reducing seven popcount to three, or even 15 popcounts to four sounds even more promising.
For N = 2^n - 1 it takes N - n odd-major pairs. Thus one add-major pair - five instructions - per saved popCount.

That is 7 - 3 = 4 pairs:
<pre>
one1,two1 := oddMaj(x1,x2,x3)
one2,two2 := oddMaj(x4,x5,x6)
ones,two3 := oddMaj(x7,one1,one2)
twos,four := oddMaj(two1,two2,two3)
</pre>
Or 15 - 4 = 11 pairs:
<pre>
one1,two1 := oddMaj(x1,x2,x3)
one2,two2 := oddMaj(x4,x5,x6)
one3,two3 := oddMaj(x7,x8,x9)
one4,two4 := oddMaj(x10,x11,x12)
one5,two5 := oddMaj(x13,x14,x15)
one6,two6 := oddMaj(one1,one2,one3)
ones,two7 := oddMaj(one4,one5,one6)
two8,four1 := oddMaj(two1,two2,two3)
two9,four2 := oddMaj(two4,two5,two6)
twos,four3 := oddMaj(two7,two8,two9)
four,eight := oddMaj(four1,four2,four3)
</pre>


===Odd and Major Digit Counts===
Odd-Major is probably also useful to determine digit count sets of attacks or other stuff:
<pre>
U64 odd(U64 x, U64 y, U64 z) {return x^y^z;}
U64 maj(U64 x, U64 y, U64 z) {return ((x^y)&z)|(x&y);}

void attackCounts(U64 t[3], const U64 s[7]) {
one1 = odd(s[0], s[1], s[2]);
two1 = maj(s[0], s[1], s[2]);
one2 = odd(s[3], s[4], s[5]);
two2 = maj(s[3], s[4], s[5]);
t[0] = odd(s[6], one1, one2);
two3 = maj(s[6], one1, one2);
t[1] = odd(two1, two2, two3);
t[2] = maj(two1, two2, two3);
}
</pre>
with following semantics:
<pre>
exactly7attacks := t[2] & t[1] & t[0]
exactly6attacks := t[2] & t[1] & ~t[0]
exactly5attacks := t[2] & ~t[1] & t[0]
exactly4attacks := t[2] & ~t[1] & ~t[0]
exactly3attacks := ~t[2] & t[1] & t[0]
exactly2attacks := ~t[2] & t[1] & ~t[0]
exactly1attack := ~t[2] & ~t[1] & t[0]
noAttack := ~t[2] & ~t[1] & ~t[0]

atLeast4attacks := t[2]
atLeast2attacks := atLeast4attacks | t[1]
atLeast1attack := atLeast2attacks | t[0]
noAttack := ~atLeast1attack
exactly1attack := atLeast1attack ^ atLeast2attacks
</pre>

==Popcount as log2 of LS1B==
Assuming an architecture has a fast popcount-instruction (but no bitscan). One can isolate the LS1B, decrement it and count the remaining trailing "ones" to perform the logarithm dualis:
<pre>
log2(LS1B) = popCount( LS1B - 1 );
bitIndexOfLS1B(x) = popCount( (x & -x) - 1 );
</pre>

For instance, LS1B is 2^44, decrementing leaves a below LSB1 mask with exactly 44 bits set:
<pre>
0x0000100000000000 0x00000FFFFFFFFFFF
. . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . .
. . . . 1 . . . 1 1 1 1 . . . .
. . . . . . . . 1 1 1 1 1 1 1 1
. . . . . . . . 1 1 1 1 1 1 1 1
. . . . . . . . 1 1 1 1 1 1 1 1
. . . . . . . . 1 1 1 1 1 1 1 1
. . . . . . . . 1 1 1 1 1 1 1 1
</pre>

==Hamming Distance==
The [https://en.wikipedia.org/wiki/Hamming_distance hamming distance] of two words is defined as the number of corresponding [[General Setwise Operations#ExclusiveOr|different bits]].
<pre>
int hammingDistance (U64 a, U64 b) {return popcnt( a ^ b);}
</pre>
Hamming distance greater than one or two is an important property of codes to detect or even correct one-bit errors.

The minimum and average hamming distance over all [[Zobrist Hashing|Zobrist keys]] was considered as "quality"-measure of the keys. However, as long the minimum hamming distance is greater zero, [https://en.wikipedia.org/wiki/Linear_independence linear independence] (that is a small subset of all keys doesn't xor to zero), is much more important than hamming distance <ref>[https://www.stmintz.com/ccc/index.php?id=200622 Re: About random numbers and hashing] by [[Sven Reichard]], [[CCC]], December 05, 2001</ref>. Maximizing the minimal hamming distance leads to very poor Zobrist keys <ref>[http://www.talkchess.com/forum/viewtopic.php?t=26152 Zobrist key random numbers] by [[Robert Hyatt]] from [[CCC]], January 21, 2009</ref>.

==Weighted PopCount==
For a [[SIMD and SWAR Techniques|SIMD-wise]] kind of weighted population count, see the [[SSE2#SSE2dotproduct|SSE2 dot-product]].

==Pre-calculated Mobility==
Similar to [[Sliding Piece Attacks#AttacksbyOccupancyLookup|Attacks by Occupancy Lookup]] to determine attack sets of sliding pieces, we may use pre-calculated population count or even center-weighted population count as a rough estimate on piece [[Mobility|mobility]] <ref>[http://www.open-aurec.com/wbforum/viewtopic.php?f=4&t=6823 Magic and precomputation] by [[Onno Garms]] from [[Computer Chess Forums|Winboard Programming Forum]], September 23, 2007</ref>. It may not consider subsets of let say safe target squares.

==Piece Attacks Count==
As pointed out by [[Marco Costalba]] <ref>[http://www.talkchess.com/forum/viewtopic.php?t=27820 fast mobility count through hashing] by [[Marco Costalba]] from [[CCC]], May 09, 2009</ref> <ref>[http://www.talkchess.com/forum/viewtopic.php?t=27965 Piece attacks count] by [[Marco Costalba]] from [[CCC]], May 18, 2009</ref>, specialized routines to count the population ([[Mobility]]) of attack sets of [[King|king]], [[Knight|knight]] and line-wise sub-sets of sliding pieces can be done more efficiently than the general [[Population Count#SWARPopcount|SWAR-Popcount]]. This is similar to [[Flipping Mirroring and Rotating]] the whole bitboard versus [[Flipping Mirroring and Rotating#RankFileAndDiagonal|Rank, File and Diagonal]], and is based on mapping the up to eight scattered occupied bits to one byte, to perform a single [[Population Count#Lookup|byte lookup]]. For various mapping techniques, see:
* [[Bitboard Serialization#HashingMultipleBits|Hashing Multiple Bits]] from [[Bitboard Serialization]]
* [[Flipping Mirroring and Rotating#RankFileAndDiagonal|Rank, File and Diagonal]] from [[Flipping Mirroring and Rotating]]
* [[Occupancy of any Line]]

=Popcount in Hardware=
* [[Ferranti Mark 1#Instructions|Ferranti Mark 1]]
* [[CDC 6600#PopulationCount|CDC 6600]]
* [[CDC Cyber#PopulationCount|CDC Cyber]]
* [[SSE4#SSE4.2|SSE4.2]], [[Intel]] [[x86]], [[x86-64]]
* [[SSE4#SSE4a|SSE4a]], [[AMD]] x86, x86-64

=See also=
* [[Assembly#HAKMEM169|Assembly Popcounts]]
* [[Bit-Twiddling]]
* [[General Setwise Operations#GreaterOne|Greater One Sets]] from [[General Setwise Operations]]
* [[Kim Walisch#PopCount|libpopcnt]] by [[Kim Walisch]]
* [[MMX#MMXPopcount|MMX Popcount]]
* [[CDC Cyber#Mobility|Mobility]] in [[Chess (Program)|Chess 4.6]] on the [[CDC Cyber]]
* [[SIMD and SWAR Techniques]]
* [[SSE2#SSE2popcount|SSE2 Population Count]]
* [[SSSE3#PopCount|SSSE3 Population Count]]

=Publications=
==1949 ...==
* [[Alan Turing]] ('''1949'''). ''[http://www.computerhistory.org/chess/full_record.php?iid=doc-431614f45472f Alan Turing's Manual for the Ferranti Mk. I]''. transcribed in 2000 by [http://www.panix.com/~rst/ Robert Thau], [http://archive.computerhistory.org/projects/chess/related_materials/text/2-1.Ferranti_Mark_1_manual.Turing-Alan/2-1.Ferranti_Mark_1_manual.Turing-Alan.1951.UNIVERSITY_OF_MANCHESTER.062303005.pdf pdf] from [[The Computer History Museum]], 9.4 The position of the most significant digit » [[Ferranti Mark 1]]
* [[Mathematician#MVWilkes|Maurice Wilkes]], [[Mathematician#DJWheeler|David Wheeler]], [https://en.wikipedia.org/wiki/Stanley_Gill Stanley Gill] ('''1957'''). ''The Preparation of Programs for an Electronic Digital Computer''. Addison-Wesley Press; 2nd edition, [http://www.amazon.com/preparation-programs-electronic-Addison-Wesley-mathematics/dp/B0006AV1QQ amazon.com], [[Mathematician#DBGillies|Donald B. Gillies]] and [https://en.wikipedia.org/wiki/J._C._P._Miller Jeffrey C. P. Miller] on [[Population Count#SWARPopcount|SWAR-Popcount]], pages 191–193
* [[Mathematician#PWegner|Peter Wegner]] ('''1960'''). ''A technique for counting ones in a binary computer''. [[ACM#Communications|Communications of the ACM]], [http://www.informatik.uni-trier.de/~ley/db/journals/cacm/cacm3.html#Wegner60 Volume 3, 1960]
* [[Mathematician#DAWagner|David A. Wagner]], [[Steven M. Bellovin]] ('''1994'''). ''[http://academiccommons.columbia.edu/catalog/ac:127097 A Programmable Plaintext Recognizer]''. <ref>[http://cryptome.org/jya/sadd.htm Sideways Add / Population Count] by [http://www.couperus.org/ Jitze Couperus] and [[Steven M. Bellovin|Steve Bellovin]] et al., [https://en.wikipedia.org/wiki/C2Net cryptography@c2.net], January 28, 1999</ref>
==2000 ...==
* [http://www.cs.gwu.edu/people/faculty/82 Simon Y. Berkovich], Gennadi M. Lapir, Marilyn Mack ('''2000'''). ''[http://portal.acm.org/citation.cfm?id=362453 A Bit-Counting Algorithm Using the Frequency Division Principle]''. Software, Practice and Experience Vol. 30, No. 14, 2000, pp. 1531-1540
* [http://www.informatik.uni-trier.de/~ley/pers/hd/e/El=Qawasmeh:Eyas.html Eyas El-Qawasmeh] ('''2001'''). ''[http://130.203.133.150/viewdoc/summary;jsessionid=A8ADC0228B8051C3ED8DF344DF5573CD?doi=10.1.1.127.8246 Beating the Popcount].'' International Journal of Information Technology, Singapore, Vol. 9. No. 1
* [[Henry S. Warren, Jr.]] ('''2002'''). ''[http://hackersdelight.org/ Hacker's Delight]''. [https://en.wikipedia.org/wiki/Addison%E2%80%93Wesley Addison-Wesley Professional]
* [http://www.informatik.uni-trier.de/~ley/pers/hd/e/El=Qawasmeh:Eyas.html Eyas El-Qawasmeh], [http://www.yu.edu.jo/fmd/detailsEN.aspx?myID=18327 Wafa'a Al-Qarqaz] ('''2006'''). ''Reducing Lookup Table Size used for Bit-Counting Algorithm''. Computer Science Dept. [https://en.wikipedia.org/wiki/Jordan_University_of_Science_and_Technology Jordan University of Science and Technology], [http://www.uop.edu.jo/csit2006/vol3%20pdf/pg524.pdf pdf]
* [[Henry S. Warren, Jr.]] ('''2007'''). ''The Quest for an Accelerared Population Count''. in [http://www.oreillynet.com/pub/au/36 Andy Oram] & [[Greg Wilson]] (eds.) ('''2007'''). '' Beautiful code: Leading Programmers Explain How They Think''. [https://en.wikipedia.org/wiki/O%27Reilly_Media O'Reilly], [http://www.amazon.com/Beautiful-Code-Leading-Programmers-Practice/dp/0596510047 amazon.com]
* [[Donald Knuth]] ('''2009'''). ''[http://www-cs-faculty.stanford.edu/%7Eknuth/taocp.html The Art of Computer Programming], Volume 4, Fascicle 1: Bitwise tricks & techniques'', as [http://www-cs-faculty.stanford.edu/%7Eknuth/fasc1a.ps.gz Pre-Fascicle 1a postscript], Sideways addition, pp 11
==2010 ...==
* [[Henry S. Warren, Jr.]] ('''2012'''). ''[http://www.informit.com/store/hackers-delight-9780321842688 Hacker's Delight, 2nd Edition]''. [https://en.wikipedia.org/wiki/Addison%E2%80%93Wesley Addison-Wesley Professional], More coverage of population count and counting leading zeros, Array population count
* [http://de.linkedin.com/pub/andreas-stiller/a/381/aa9 Andreas Stiller] ('''2013'''). ''[http://www.heise.de/artikel-archiv/ct/2013/05/180_Spezialkommando Spezialkommando - Intrinsic __popcnt() zählt die Einsen]''. [http://www.heise.de/ct/ c't Magazin für Computertechnik] 5/2013, p. 180 (German)
* [[Wojciech Muła]], [http://dblp.uni-trier.de/pers/hd/k/Kurz:Nathan Nathan Kurz], [https://github.com/lemire Daniel Lemire] ('''2016'''). ''Faster Population Counts Using AVX2 Instructions''. [https://arxiv.org/abs/1611.07612 arXiv:1611.07612] <ref>[https://github.com/WojciechMula/sse-popcount/blob/master/popcnt-avx512-harley-seal.cpp sse-popcount/popcnt-avx512-harley-seal.cpp at master · WojciechMula/sse-popcount · GitHub]</ref> » [[AVX2]], [[AVX-512]]

=Postings=
==1998 ...==
* [https://www.stmintz.com/ccc/index.php?id=25091 Bean counters Part1] by [[Peter Fendrich]], [[CCC]], August 19, 1998
* [https://www.stmintz.com/ccc/index.php?id=25095 Bean counters Part2] by [[Peter Fendrich]], [[CCC]], August 19, 1998
* [https://www.stmintz.com/ccc/index.php?id=38188 Countbits() Function] by [[Roberto Waldteufel]], [[CCC]], January 03, 1999
* [http://cryptome.org/jya/sadd.htm Sideways Add / Population Count] by [http://www.couperus.org/ Jitze Couperus], [[Steven M. Bellovin|Steve Bellovin]] and [http://de.linkedin.com/in/horns Axel H. Horns], [https://en.wikipedia.org/wiki/C2Net cryptography@c2.net], January 28, 1999 » [[CDC 6600]] <ref>[[Mathematician#DAWagner|David A. Wagner]], [[Steven M. Bellovin]] ('''1994'''). ''[http://academiccommons.columbia.edu/catalog/ac:127097 A Programmable Plaintext Recognizer]''.</ref>
==2000 ...==
* [https://www.stmintz.com/ccc/index.php?id=106644 fast bit counting] by Flemming Rodler, [[CCC]], April 19, 2000
* [https://www.stmintz.com/ccc/index.php?id=106791 Bit counting revisited] by Flemming Rodler, [[CCC]], April 19, 2000
* [https://www.stmintz.com/ccc/index.php?id=106960 PowerPC BitCounting Functions Speed] by [[William Bryant]], [[CCC]], April 20, 2000 » [[PowerPC]]
* [http://groups.google.com/group/comp.lang.c/browse_frm/thread/56af26ce0b48cbcd Counting the number of bits in a 32-bit word] by [[Mathematician#GMarsaglia|George Marsaglia]], [http://groups.google.com/group/comp.lang.c/topics comp.lang.c], December 7, 2000
* [https://www.stmintz.com/ccc/index.php?id=281989 Re: Chezzz 1.0.1 - problem solved - for David Rasmussen] by [[David Rasmussen]], [[CCC]], February 05, 2003 <ref>[http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/22007.pdf AMD Athlon Processor x86 Code Optimization Guide] (pdf) Efficient 64-Bit Population Count Using MMX™ Instructions Page 184</ref>
* [https://www.stmintz.com/ccc/index.php?id=293853 Counting bits] by [[Andreas Herrmann]], [[CCC]], April 17, 2003
* [https://www.stmintz.com/ccc/index.php?id=313807 Hamming distance and lower hash table indexing] by [[Tom Likens]], [[CCC]], September 02, 2003
* [https://www.stmintz.com/ccc/index.php?id=353997 PopCount optimization] by [[Anastasios Milikas|milix]], [[CCC]], March 11, 2004
==2005 ...==
* [https://groups.google.com/group/comp.lang.asm.x86/browse_frm/thread/958e93d748e48121 Population count in SSE2, again] by James Van Buskirk, [https://groups.google.com/group/comp.lang.asm.x86/topics comp.lang.asm.x86], April 12, 2008
* [http://www.talkchess.com/forum/viewtopic.php?t=26542 core2 popcnt] by [[Frank Phillips]], [[CCC]], February 13, 2009
* [http://www.talkchess.com/forum/viewtopic.php?topic_view=threads&p=267994&t=27965 Piece attacks count] by [[Marco Costalba]], [[CCC]], May 18, 2009 » [[Attack and Defend Maps]]
* [http://talkchess.com/forum/viewtopic.php?t=29269 Bit twiddlement question: greater of two popcounts] by [[Zach Wegner]], [[CCC]], August 06, 2009
==2010 ...==
* [http://www.talkchess.com/forum/viewtopic.php?t=32227 Stockfish POPCNT support with gcc] by [[Marco Costalba]], [[CCC]], January 31, 2010
* [https://groups.google.com/group/comp.lang.asm.x86/browse_frm/thread/6c8b3f09d890af1b Yet another handmade POPCNT] by hopcode, [https://groups.google.com/group/comp.lang.asm.x86/topics comp.lang.asm.x86], January 05, 2011
* [http://www.talkchess.com/forum/viewtopic.php?t=38521 A brief history of the popcnt instruction] by [[Steven Edwards]], [[CCC]], March 22, 2011
* [http://www.talkchess.com/forum/viewtopic.php?t=39268 Introduction and (hopefully) contribution - bitboard methods] by [[Alcides Schulz]], [[CCC]], June 03, 2011 » [[BitScan]]
* [http://www.talkchess.com/forum/viewtopic.php?t=43771 using Popcount and Prefetch with SSE4 hardware support] by [[Engin Üstün]], [[CCC]], May 19, 2012 » [[Memory]], [[SSE4]]
* [http://macechess.blogspot.de/2013/04/64-bits-for-64-squares.html 64 bits for 64 squares ?] by [[Thomas Petzke]], [http://macechess.blogspot.de/ mACE Chess], April 28, 2013
* [http://www.talkchess.com/forum/viewtopic.php?t=54798 Stockfish 32-bit and hardware instructions on MSVC++] by [[Syed Fahad]], [[CCC]], December 30, 2014 » [[Stockfish]], [[BitScan]], [[Population Count]]
==2015 ...==
* [http://www.talkchess.com/forum/viewtopic.php?t=58230&start=3 Re: Linux Version of Maverick 1.5] by [[Michael Dvorkin]], [[CCC]], November 12, 2015 » [[Mac OS|OS X]], [[Maverick]]
* [http://www.talkchess.com/forum/viewtopic.php?t=61559 syzygy users (and Ronald)] by [[Robert Hyatt]], [[CCC]], September 29, 2016 » [[BitScan]]

=External Links=
* [https://en.wikipedia.org/wiki/Hamming_weight Hamming weight from Wikipedia]
* [http://semipublic.comp-arch.net/wiki/Population_count_%28POPCNT%29 Population count (POPCNT) - CompArch]
* [http://www.crazyontap.com/topic.php?TopicId=43382 Crazy On Tap - Secret Opcodes] <ref>[https://en.wikipedia.org/wiki/National_Security_Agency National Security Agency from Wikipedia]</ref>
* [http://kent-vandervelden.blogspot.com/2009/10/counting-bits-population-count-and.html Blender: POPCNT for counting bits]
* [http://home.pipeline.com/~hbaker1/hakmem/hacks.html#item169 HAKMEM - ITEM 169 To count the ones in a PDP-6/10 word] (in order of one-ups-manship: [[Bill Gosper|Gosper]], Mann, Lenard, [Root and Mann]) <ref>Michael Beeler, [[Bill Gosper]] and [https://en.wikipedia.org/wiki/Richard_Schroeppel Rich Schroeppel] ('''1972'''). ''[http://home.pipeline.com/~hbaker1/hakmem/hakmem.html HAKMEM]'', Memo 239, Artificial Intelligence Laboratory, [[Massachusetts Institute of Technology]], made Web-available by [http://home.pipeline.com/~hbaker1/ Henry Baker]</ref>
: [http://www.cl.cam.ac.uk/~am21/hakmemc.html HAKMEMC -- HAKMEM Programming hacks in C] by [http://www.cl.cam.ac.uk/~am21/ Alan Mycroft]
* [http://www.hackersdelight.org/hdcodetxt/pop.c.txt popcount] [[C]] samples from [[Henry S. Warren, Jr.]] ('''2002, 2012'''). ''[[Henry S. Warren, Jr.#HackersDeligh|Hacker's Delight]]''. [https://en.wikipedia.org/wiki/Addison%E2%80%93Wesley Addison-Wesley]
* [http://aggregate.org/MAGIC/#Population%20Count%20%28Ones%20Count The Aggregate Magic Algorithms -Population Count (Ones Count)] by [[Hank Dietz]]
* [http://www-graphics.stanford.edu/~seander/bithacks.html#CountBitsSetNaive Counting bits set] from [http://graphics.stanford.edu/%7Eseander/bithacks.html Bit Twiddling Hacks] by [http://graphics.stanford.edu/%7Eseander/ Sean Eron Anderson]
* [http://www.necessaryandsufficient.net/2009/04/optimising-bit-counting-using-iterative-data-driven-development/ Optimising Bit Counting using Iterative, Data-Driven Development] from [http://www.necessaryandsufficient.net/ Necessary and Sufficient] by [http://www.necessaryandsufficient.net/author/admin/page/2/ Damien Wintour]
* [http://bits.stephan-brumme.com/countBits.html Count bits set in parallel a.k.a. Population Count] from [http://bits.stephan-brumme.com/ the bit twiddler] by [http://www.stephan-brumme.com/ Stephan Brumme]
* [http://www.strchr.com/crc32_popcnt Benchmarking CRC32 and PopCnt instructions - strchr.com] by Peter Kankowski
* [http://0x80.pl/articles/sse-popcount.html SSSE3: fast popcount] by [[Wojciech Muła]], May 24, 2008 » [[SSSE3]]
* [http://0x80.pl/articles/faster-popcount-for-large-data.html Speeding up bit-parallel population count] by [[Wojciech Muła]], April 13, 2015
* [http://0x80.pl/articles/xop-popcnt.html Population count using XOP instructions] by [[Wojciech Muła]], December 16, 2016 » [[XOP]]
* [https://github.com/WojciechMula/sse-popcount GitHub - WojciechMula/sse-popcount: SIMD (SSE) population count] by [[Wojciech Muła]]
* [https://github.com/kimwalisch/libpopcnt GitHub - kimwalisch/libpopcnt: Fast C/C++ bit population count library] » [[Kim Walisch#PopCount|libpopcnt]] by [[Kim Walisch]]
* [https://en.wikipedia.org/wiki/Census Census from Wikipedia]
* [[Videos#JohnAbercrombie|John Abercrombie]] 4tet - One, one, one + Spring song, [https://de.wikipedia.org/wiki/Subway_(Musikclub) Subway], [https://en.wikipedia.org/wiki/Cologne Cologne], April 12, 1999, [https://en.wikipedia.org/wiki/3sat 3sat] broadcast <ref>[https://de.wikipedia.org/wiki/Ali_Haurand Ali Haurand from Wikipedia.de] (German)</ref>, [https://en.wikipedia.org/wiki/YouTube YouTube] Video
: [[Videos#JohnAbercrombie|John Abercrombie]], [https://en.wikipedia.org/wiki/Bobo_Stenson Bobo Stenson], [[Videos#LarsDanielsson|Lars Danielsson]], [[Videos#JonChristensen|Jon Christensen]]
: {{#evu:https://www.youtube.com/watch?v=pv5e6ZsFIBw|alignment=left|valignment=top}}

=References=
<references />

'''[[Bitboards|Up one Level]]'''

GerdIsenberg

Bureaucrats, Administrators

25,161

edits

Changes

Population Count

Navigation menu

Personal tools

Namespaces

Variants

Views

More

Search

Navigation

Tools