Home * Board Representation * Bitboards * Sliding Piece Attacks * Kogge-Stone Algorithm

4-bit Kogge-Stone adder ^[1]

The Kogge-Stone Algorithm for set-wise sliding piece attack generation was first introduced by Steffan Westcott in CCC ^[2] . It is a parallel prefix approach of a occluded dumb7 flood-fill, propagating sliding piece attacks like carries of a Kogge-Stone hardware adder ^[3] in software. One needs to pass sliding pieces as generator set "g" and the set of empty squares as propagator set "p". For appropriate attacks we need to shift the occluded fill one step further, considering wraps.

Code samples and bitboard diagrams rely on Little endian file and rank mapping.

Parallel Prefix

The routine fillUpOccluded() smears the set bits of bitboard g upwards, but only along set bits of p; a reset bit in p is enough to halt a smear. Other routines have a similar effect.

U64 fillUpOccluded(U64 g, U64 p) {
   g |= p & (g <<  8);
   p &=     (p <<  8);
   g |= p & (g << 16);
   p &=     (p << 16);
   g |= p & (g << 32);
   return g;
}

The method chosen in FillUpOccluded() is based on a Kogge-Stone parallel prefix network, because it can be implemented very easily in software. The diagram below (trust me, it really _is_ supposed to look like that) is an illustration of how it works. The corresponding lines of program code are shown on the right. The inputs are fed into the network at the top, pass along the connecting lines, are combined by the # operator at various points, and the outputs appear at the bottom.

x1 x2 x3 x4 x5 x6 x7 x8         Input : g, p
|  |  |  |  |  |  |  |
V  V  V  V  V  V  V  V
|  |  |  |  |  |  |  |
|  |  |  |  |  |  |  |
|\ |\ |\ |\ |\ |\ |\ |
| \| \| \| \| \| \| \|
|  #  #  #  #  #  #  #          g |= p & (g <<  8);
|  |  |  |  |  |  |  |          p &=     (p <<  8);
|\ |\ |\ |\ |\ |\ |  |
| \: \: \: \: \: \:  |
|  \  \  \  \  \  \  |
|  :\ :\ :\ :\ :\ :\ |
|  | \| \| \| \| \| \|
|  |  #  #  #  #  #  #          g |= p & (g << 16);
|  |  |  |  |  |  |  |          p &=     (p << 16);
|\ |\ |\ |\ |  |  |  |
| \: \: \: \:  |  |  |
|  \  \  \  \  |  |  |
|  :\ :\ :\ :\ |  |  |
|  | \: \: \: \:  |  |
|  |  \  \  \  \  |  |
|  |  :\ :\ :\ :\ |  |
|  |  | \: \: \: \:  |
|  |  |  \  \  \  \  |
|  |  |  ;\ ;\ :\ :\ |
|  |  |  | \| \| \| \|
|  |  |  |  #  #  #  #          g |= p & (g << 32);
|  |  |  |  |  |  |  |
|  |  |  |  |  |  |  |
V  V  V  V  V  V  V  V
q1 q2 q3 q4 q5 q6 q7 q8         Output : g

Direction-wise Fill

We further rely on the compass rose to identify ray-directions.

  noWe         nort         noEa
          +7    +8    +9
              \  |  /
  west    -1 <-  0 -> +1    east
              /  |  \
          -9    -8    -7
  soWe         sout         soEa

Assuming little-endian file mapping. Big-endian file mapping has to swap A and H and east and west. As a reminder - shifting bitboards - the base of further stuff.

Fill on an empty Board

We already used the south- and north- parallel prefix fill-routines while calculating pawn spans. We need one additional direction step for ray-attacks on the otherwise empty board to exclude sliders. Convenient to initialize ray-attacks arrays of single sliders. We may conduct those routines from the general occluded fills by passing the universal propagator set. The vertical fills already look familiar.

U64 soutFill(U64 gen) {
   gen |= (gen >>  8);
   gen |= (gen >> 16);
   gen |= (gen >> 32);
   return gen;
}

U64 nortFill(U64 gen) {
   gen |= (gen <<  8);
   gen |= (gen << 16);
   gen |= (gen << 32);
   return gen;
}

Using explicit propagators as compile time constants to manage the A-, H-file-wraps.

U64 eastFill(U64 gen) {
   const U64 pr0 = notAFile;
   const U64 pr1 = pr0 & (pr0 << 1);
   const U64 pr2 = pr1 & (pr1 << 2);
   gen |= pr0 & (gen  << 1);
   gen |= pr1 & (gen  << 2);
   gen |= pr2 & (gen  << 4);
   return gen;
}

U64 noEaFill(U64 gen) {
   const U64 pr0 = notAFile;
   const U64 pr1 = pr0 & (pr0 <<  9);
   const U64 pr2 = pr1 & (pr1 << 18);
   gen |= pr0 & (gen <<  9);
   gen |= pr1 & (gen << 18);
   gen |= pr2 & (gen << 36);
   return gen;
}

U64 soEaFill(U64 gen) {
   const U64 pr0 = notAFile;
   const U64 pr1 = pr0 & (pr0 >>  7);
   const U64 pr2 = pr1 & (pr1 >> 14);
   gen |= pr0 & (gen >>  7);
   gen |= pr1 & (gen >> 14);
   gen |= pr2 & (gen >> 28);
   return gen;
}

U64 westFill(U64 gen) {
   const U64 pr0 = notHFile;
   const U64 pr1 = pr0 & (pr0 >> 1);
   const U64 pr2 = pr1 & (pr1 >> 2);
   gen |= pr0 & (gen >> 1);
   gen |= pr1 & (gen >> 2);
   gen |= pr2 & (gen >> 4);
   return gen;
}

U64 soWeFill(U64 gen) {
   const U64 pr0 = notHFile;
   const U64 pr1 = pr0 & (pr0 >>  9);
   const U64 pr2 = pr1 & (pr1 >> 18);
   gen |= pr0 & (gen >>  9);
   gen |= pr1 & (gen >> 18);
   gen |= pr2 & (gen >> 36);
   return gen;
}

U64 noWeFill(U64 gen) {
   const U64 pr0 = notHFile;
   const U64 pr1 = pr0 & (pr0 <<  7);
   const U64 pr2 = pr1 & (pr1 << 14);
   gen |= pr0 & (gen <<  7);
   gen |= pr1 & (gen << 14);
   gen |= pr2 & (gen << 28);
   return gen;
}

Occluded Fill

Occluded fills include sliders, but exclude blockers ^[4].

U64 soutOccl(U64 gen, U64 pro) {
   gen |= pro & (gen >>  8);
   pro &=       (pro >>  8);
   gen |= pro & (gen >> 16);
   pro &=       (pro >> 16);
   gen |= pro & (gen >> 32);
   return gen;
}

U64 nortOccl(U64 gen, U64 pro) {
   gen |= pro & (gen <<  8);
   pro &=       (pro <<  8);
   gen |= pro & (gen << 16);
   pro &=       (pro << 16);
   gen |= pro & (gen << 32);
   return gen;
}

U64 eastOccl(U64 gen, U64 pro) {
   pro &= notAFile;
   gen |= pro & (gen << 1);
   pro &=       (pro << 1);
   gen |= pro & (gen << 2);
   pro &=       (pro << 2);
   gen |= pro & (gen << 4);
   return gen;
}

U64 noEaOccl(U64 gen, U64 pro) {
   pro &= notAFile;
   gen |= pro & (gen <<  9);
   pro &=       (pro <<  9);
   gen |= pro & (gen << 18);
   pro &=       (pro << 18);
   gen |= pro & (gen << 36);
   return gen;
}

U64 soEaOccl(U64 gen, U64 pro) {
   pro &= notAFile;
   gen |= pro & (gen >>  7);
   pro &=       (pro >>  7);
   gen |= pro & (gen >> 14);
   pro &=       (pro >> 14);
   gen |= pro & (gen >> 28);
   return gen;
}

U64 westOccl(U64 gen, U64 pro) {
   pro &= notHFile;
   gen |= pro & (gen >> 1);
   pro &=       (pro >> 1);
   gen |= pro & (gen >> 2);
   pro &=       (pro >> 2);
   gen |= pro & (gen >> 4);
   return gen;
}

U64 soWeOccl(U64 gen, U64 pro) {
   pro &= notHFile;
   gen |= pro & (gen >>  9);
   pro &=       (pro >>  9);
   gen |= pro & (gen >> 18);
   pro &=       (pro >> 18);
   gen |= pro & (gen >> 36);
   return gen;
}

U64 noWeOccl(U64 gen, U64 pro) {
   pro &= notHFile;
   gen |= pro & (gen <<  7);
   pro &=       (pro <<  7);
   gen |= pro & (gen << 14);
   pro &=       (pro << 14);
   gen |= pro & (gen << 28);
   return gen;
}

Ray-wise Attacks

Ray-wise attacks need the occluded fills shift one further, considering wraps.

U64 soutAttacks (U64 rooks,   U64 empty) {return soutOne(soutOccl(rooks,   empty));}
U64 nortAttacks (U64 rooks,   U64 empty) {return nortOne(nortOccl(rooks,   empty));}
U64 eastAttacks (U64 rooks,   U64 empty) {return eastOne(eastOccl(rooks,   empty));}
U64 noEaAttacks (U64 bishops, U64 empty) {return noEaOne(noEaOccl(bishops, empty));}
U64 soEaAttacks (U64 bishops, U64 empty) {return soEaOne(soEaOccl(bishops, empty));}
U64 westAttacks (U64 rooks,   U64 empty) {return westOne(westOccl(rooks,   empty));}
U64 soWeAttacks (U64 bishops, U64 empty) {return soWeOne(soWeOccl(bishops, empty));}
U64 noWeAttacks (U64 bishops, U64 empty) {return noWeOne(noWeOccl(bishops, empty));}

Generalized Rays

Since rotate 64 works like a generalized shift with positive or negative shift amount, it might be applied to get pawn-attacks for both sides - or a Kogge-Stone fill with a direction parameter and small lookups for shift amount and wrap ands, instead of multiple code for eight directions. Of course generalized shift will be a bit slower due to lookups and using cl as the shift amount register.

U64 occludedFill (U64 gen, U64 pro, int dir8)
{
   int r = shift[dir8]; // {+-1,7,8,9}
   pro &= avoidWrap[dir8];
   gen |= pro & rotateLeft(gen, r);
   pro &=       rotateLeft(pro, r);
   gen |= pro & rotateLeft(gen, 2*r);
   pro &=       rotateLeft(pro, 2*r);
   gen |= pro & rotateLeft(gen, 4*r);
   return gen;
}

U64 shiftOne (U64 b, int dir8)
{
   int r = shift[dir8]; // {+-1,7,8,9}
   return rotateLeft(b, r) & avoidWrap[dir8];
}

U64 slidingAttacks (U64 sliders, U64 empty, int dir8)
{
   U64 fill = occludedFill(slider, empty, dir8)
   return shiftOne(fill, dir8);
}

// positve left, negative right shifts
int shift[8] = {9, 1,-7,-8,-9,-1, 7, 8};

U64 avoidWrap[8] =
{
   0xfefefefefefefe00,
   0xfefefefefefefefe,
   0x00fefefefefefefe,
   0x00ffffffffffffff,
   0x007f7f7f7f7f7f7f,
   0x7f7f7f7f7f7f7f7f,
   0x7f7f7f7f7f7f7f00,
   0xffffffffffffff00,
};

Publications

Peter M. Kogge, Harold S. Stone (1972). A Parallel Algorithm for the Efficient Solution of a General Class of Recurrence Equations. Technical Report 25, Stanford University
Peter M. Kogge (1973). Parallel Solution of Recurrence Problems. Ph.D. thesis, Stanford University, advisor Harold S. Stone, pdf
Peter M. Kogge, Harold S. Stone (1973). A Parallel Algorithm for the Efficient Solution of a General Class of Recurrence Equations. IEEE Transactions on Computers, Vol. C-22, No. 8

Forum Posts

Re: flood fill attack bitboards by Steffan Westcott, CCC, September 15, 2002
Kogge Stone Algorithm mistake on chessprogramming Wiki by Christopher Conkie, CCC, June 29, 2008
Re: Hyperbola Quiesscene: hardly any improvement by Karlo Bala Jr., CCC, January 14, 2009 » Hyperbola Quintessence
Kogge Stone, Vector Based by Srdja Matovic, CCC, January 22, 2013 » GPU ^[5] ^[6]
Fast perft on GPU (upto 20 Billion nps w/o hashing) by Ankan Banerjee, CCC, June 22, 2013 » Perft, GPU

External Links

Kogge-Stone adder from Wikipedia
Kogge-Stone algorithm from Terra Help and Information - Theories by Peter Fendrich
Gong - Shapeshifter ^[7], 1992, YouTube Video

References

↑ Kogge-Stone adder from Wikipedia
↑ Re: flood fill attack bitboards by Steffan Westcott, CCC, September 15, 2002
↑ Hardware algorithms for arithmetic modules from the ARITH research group, Aoki lab., Tohoku University
↑ Kogge Stone Algorithm mistake on chessprogramming Wiki by Christopher Conkie, CCC, June 29, 2008
↑ Parallel Thread Execution from Wikipedia
↑ NVIDIA Compute PTX: Parallel Thread Execution, ISA Version 1.4, March 31, 2009, pdf
↑ Shapeshifting from Wikipedia

Up one Level

[1] Kogge-Stone adder from Wikipedia

[2] Re: flood fill attack bitboards by Steffan Westcott, CCC, September 15, 2002

[3] Hardware algorithms for arithmetic modules from the ARITH research group, Aoki lab., Tohoku University

[4] Kogge Stone Algorithm mistake on chessprogramming Wiki by Christopher Conkie, CCC, June 29, 2008

[5] Parallel Thread Execution from Wikipedia

[6] NVIDIA Compute PTX: Parallel Thread Execution, ISA Version 1.4, March 31, 2009, pdf

[7] Shapeshifting from Wikipedia

[1]

[2]

[3]

[4]

[5]

[6]

[7]

Kogge-Stone Algorithm

Contents

Parallel Prefix

Direction-wise Fill

Fill on an empty Board

Occluded Fill

Ray-wise Attacks

Generalized Rays

See also

Publications

Forum Posts

External Links

References

Navigation menu

Personal tools

Namespaces

Variants

Views

More

Search

Navigation

Tools