Changes

Jump to: navigation, search

Parallel Prefix Algorithms

553 bytes added, 00:13, 12 April 2021
no edit summary
'''[[Main Page|Home]] * [[Programming]] * [[Algorithms]] * Parallel Prefix Algorithms'''
[[FILE:parallelprefixsum.png|border|right|thumb|link=http://www.toves.org/books/distalg/|Parallel Prefix Sum <ref>[http://ozark.hendrix.edu/~burch/ [Mathematician#CBurch|Carl Burch]] ('''2009'''). ''[http://www.toves.org/books/distalg/ Introduction to parallel & distributed algorithms]''. On-line Book</ref> ]]
'''Parallel prefix algorithms''' compute all [https://en.wikipedia.org/wiki/Prefix_sum prefixes] of a input sequence in [https://en.wikipedia.org/wiki/Time_complexity#Logarithmic_time logarithmic time], and are topic of various [[SIMD and SWAR Techniques|SIMD and SWAR techniques]] applied to [[Bitboards|bitboards]] <ref>[[#Seealso|See also]]</ref> . This page provides some basics on simple parallel prefix problems, like parity words and [https://en.wikipedia.org/wiki/Gray_code Gray code] with some interesting properties, followed by some theoretical background on more complex parallel prefix problems, like [[Kogge-Stone Algorithm|Kogge-Stone]] by [[Steffan Westcott]], and a comparison of three similar Kogge-Stone routines for different purpose.
==Add==
[[FILE:300px-4_bit_Kogge_Stone_Adder_Example_new4 bit Kogge Stone Adder Example new.png|none|border|text-bottomright|link=https://en.wikipedia.org/wiki/Kogge-Stone_adderthumb|4-bit Kogge-Stone adder]]
The software approach of a [https://en.wikipedia.org/wiki/Kogge-Stone_adder Kogge-Stone] [[Combinatorial Logic#Adder|hardware adder]] <ref>[http://www.aoki.ecei.tohoku.ac.jp/arith/mg/algorithm.html#fsa_pfx Hardware algorithms for arithmetic modules] from the ARITH research group, Aoki lab., [https://en.wikipedia.org/wiki/Tohoku_University Tohoku University]</ref> , requires the initial carry bits as generator, thus the [[General Setwise Operations#Intersection|intersection]] of both summands, while the propagator is the modulo 2 sum, the [[General Setwise Operations#ExclusiveOr|exclusive or]] of both summands. To avoid inter byte overflows, the propagator has the least significant bits of each byte cleared. However the final carries need one further masked shift and the modulo two sum with the initial propagator.
<pre>
// SIMD bytewise add a + b
U64 koggeStoneByteAdd(U64 a, U64 b) {
const U64 aFile = C64(0x0101010101010101);
U64 gen, pro;
gen = a & b;
// SIMD bytewise sub a - b
U64 koggeStoneByteSub(U64 a, U64 b) {
const U64 aFile = C64(0x0101010101010101);
U64 gen, pro;
gen = a & ~b; // based on -b = ~b + 1
<pre>
U64 eastAttacks(U64 occ, U64 rooks) {
const U64 aFile = C64(0x0101010101010101);
U64 gen, pro;
pro = ~occ & ~aFile;
}
</pre>
The mentioned routines also demonstrate, that this east attacking direction may cheaper implemented with [[Fill by Subtraction]] based on [[SIMD and SWAR Techniques#SWAR|SWAR-wise techniques]] and [[Subtracting a Rook from a blocking Blocking Piece|subtracting a rook from a blocking piece]]:
<pre>
U64 byteAdd(U64 a, U64 b) {return ((a & ~hFile) + (b & ~hFile)) ^ ((a ^ b) & hFile);}
=Publications=
* [http://www.cse.nd.edu/%7Ekogge/ [Mathematician#PMKogge|Peter M. Kogge]], [[Mathematician#HSStone|Harold S. Stone]] ('''19731972'''). ''[https://ntrl.ntis.gov/NTRL/dashboard/searchResults/titleDetail/PB212080.xhtml A Parallel Algorithm for the Efficient Solution of a General Class of Recurrence Equations]''. IEEE Transactions on Computers, 1973, C-22, 783-791Technical Report 25, [http://bwrc.eecs.berkeley.edu/Classes/ee225c/2000%20225c/Papers/KoggeStone.pdf pdf[Stanford University]]* [http://www[Mathematician#PMKogge|Peter M.cs.washington.edu/homes/ladner/ Richard E. LadnerKogge], [http://cs-www.cs.yale.edu/homes/fischer/ Michael J. Fischer] ('''19801973'''). ''Parallel Prefix ComputationSolution of Recurrence Problems''. Ph.D. thesis, [http[Stanford University]], advisor [[Mathematician#HSStone|Harold S. Stone]], [https://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=5EBA05F3F708ABC617EE34DD08B65485?doi=10.1.1.10684.62472724&rep=rep1&type=pdf pdf] via * [[Mathematician#PMKogge|Peter M. Kogge]], [[httpMathematician#HSStone|Harold S. Stone]] ('''1973'''). ''[https://citeseerxieeexplore.istieee.psu.eduorg/viewdocdocument/summary?doi=105009159 A Parallel Algorithm for the Efficient Solution of a General Class of Recurrence Equations]''.1[[IEEE#TOC|IEEE Transactions on Computers]], Vol.1.106C-22, No.6247 CiteSeerX]8* [[Clyde KruskalMathematician#RELadner|Richard E. Ladner]], [http[Mathematician#MJFischer|Michael J. Fischer]] ('''1980'''). ''[https://necsidl.eduacm.org/doi/facultyabs/rudolph10.1145/322217.322232?download=true Parallel Prefix Computation]''. [[ACM#Journal|Journal of the ACM]], Vol.html 27, No. 4* [[Clyde Kruskal]], [[Mathematician#LRudolph|Larry Rudolph]], [http://cs.illinois.edu/people/faculty/marc-snir [Mathematician#MSnir|Marc Snir] ] ('''1985'''). ''The power of parallel prefix''. [https://enwww.wikipediacomputer.org/wikicsdl/IEEE_Transactions_on_Computers journal/tc/1985/10/06312202/13rRUy0HYQl The power of parallel prefix]''. [[IEEE#TOC|IEEE Transactions on Computers]], Vol. C-34, No. 10
* [[Peter Sanders]], [http://www.traff-industries.de/ Jesper Larsson Träff] ('''2006'''). ''Parallel Prefix (Scan) Algorithms for MPI''. in EuroPVM/MPI 2006, LNCS, [http://algo2.iti.uni-karlsruhe.de/sanders/papers/scan.pdf pdf]
* [http://ozark.hendrix.edu/~burch/ [Mathematician#CBurch|Carl Burch]] ('''2009'''). ''[http://www.toves.org/books/distalg/ Introduction to parallel & distributed algorithms]''. On-line Book
=Forum Posts=
* [https://www.stmintz.com/ccc/index.php?id=252289 flood fill attack bitboards] by [[Steffan Westcott]] from [[Computer Chess Forums|CCC]], September 15, 2002
* [http://www.talkchess.com/forum/viewtopic.php?start=0&t=25979&start=10 Re: Hyperbola Quiesscene: hardly any improvement] by [[Karlo Balla|Karlo Bala Jr.]], [[CCC]], January 14, 2009 » [[Hyperbola Quintessence]]
=External Links=
* [http://everything2.com/title/parallel+prefix+algorithm parallel prefix algorithm] from [http://everything2.com/ Everything2.com]
* [http://charm.cs.uiuc.edu/tutorial/ParallelPrefix.htm Charm++ Tutorial - Parallel Prefix Program]
* [[Videos#JoeZawinul:Category:Joe Zawinul|Joe Zawinul]], [[Videos#TrilokGurtu:Category:Trilok Gurtu|Trilok Gurtu]] - Orient Express Part1, [https://en.wikipedia.org/wiki/YouTube YouTube] Video
: {{#evu:https://www.youtube.com/watch?v=922LumI2ilo|alignment=left|valignment=top}}
'''[[Algorithms|Up one Level]]'''
[[Category:Trilok Gurtu]]
[[Category:Joe Zawinul]]

Navigation menu