Changes

Jump to: navigation, search

SIMD and SWAR Techniques

1,513 bytes added, 19:59, 18 November 2020
no edit summary
[[x86]], [[x86-64]], as well as [[PowerPC#G4|PowerPC]] and [https://en.wikipedia.org/wiki/Power_Architecture#Power_ISA_v.2.03 Power ISA v.2.03] processors provide '''Single Instructions''' on '''Multiple Data''' (SIMD), namely on [[Array|vectors]] of [[Float|floats]], [[Double|doubles]] or various integers, [[Byte|bytes]], [[Word|words]], [[Double Word|double words]] or [[Quad Word|quad words]], available through assembly and compiler intrinsics. SIMD-applications related to computer chess cover [[Bitboards|bitboard]] computations and [[Fill Algorithms|fill-algorithms]] like [[Dumb7Fill]] and [[Kogge-Stone Algorithm]], as well as [[Evaluation|evaluation]] related stuff, like this [[SSE2#SSE2dotproduct|SSE2 dot-product]] of 64 bits by a vector of 64 bytes.
'''SWAR''' as acronym for SIMD Within A Register was coined by [[Hank Dietz]] and Randy '''Randell J. Fisher ''' <ref>[http://www.aggregate.org/SWAR/ The Aggregate: SWAR, SIMD Within A Register] by [[Hank Dietz]]</ref> . It is a processing model which applies SIMD parallel processing across sections of a CPU register, often vectors of smaller than byte-entities are processed in [[Parallel Prefix Algorithms|parallel prefix]] manner.
=SIMD Instruction Sets=
* [[AltiVec]] on [[PowerPC#G4|PowerPC G4]], [[PowerPC#G5|PowerPC G5]]
* [[ARM NEON]]
* [[ARM Helium]]
* [[AVX]] by [[Intel]]
* [[AVX2]] by [[Intel]]
}
</pre>
=See also=
* [[GPU]]
* [[NNUE]]
* [[Parallel Prefix Algorithms]]
 
=Publications=
==1987 ...==
* [[Alan H. Bond]] ('''1987'''). ''Broadcasting Arrays - A Highly Parallel Computer Architecture Suitable For Easy Fabrication''. [http://www.exso.com/bc.pdf pdf]
* [[Mathematician#GEBlelloch|Guy E. Blelloch]] ('''1990'''). ''[https://dl.acm.org/citation.cfm?id=91254 Vector Models for Data-Parallel Computing]''. [https://en.wikipedia.org/wiki/MIT_Press MIT Press], [https://www.cs.cmu.edu/~guyb/papers/Ble90.pdf pdf]
* [https://dblp.uni-trier.de/pers/f/Fisher:Randall_J=.html Randell J. Fisher], [[Hank Dietz]] ('''1998'''). ''[https://link.springer.com/chapter/10.1007/3-540-48319-5_19 Compiling for SIMD Within a Register]''. [https://dblp.uni-trier.de/db/conf/lcpc/lcpc1998.html LCPC 1998], [https://link.springer.com/chapter/10.1007/3-540-48319-5_19 pdf]
* [https://www.linkedin.com/in/tom-thompson-500bb7b Tom Thompson] ('''1999'''). ''[http://www.mactech.com/articles/mactech/Vol.15/15.07/AltiVecRevealed/index.html AltiVec Revealed]''. [http://www.mactech.com/ MacTech], Vol. 15, No. 7
==2000 ...==* [https://dblp.uni-trier.de/pers/f/Fisher:Randall_J=.html Randell J. Fisher] ('''2003'''). ''[https://docs.lib.purdue.edu/dissertations/AAI3108343/ General-Purpose SIMD Within A Register: Parallel Processing on Consumer Microprocessors]''. Ph.D. thesis, [https://en.wikipedia.org/wiki/Purdue_University Purdue University], advisor [[Hank Dietz]], [http://aggregate.org/SWAR/Dis/dissertation.pdf pdf]* [[Daisuke Takahashi]] ('''2007'''). ''[httphttps://wwwlink.springerlinkspringer.com/contentchapter/10.1007/6481607675376w42978-3-540-75755-9_135/ An Implementation of Parallel 1-D FFT Using SSE3 Instructions on Dual-Core Processors]''. Proc. Workshop on State-of-the-Art in Scientific and Parallel Computing, [https://en.wikipedia.org/wiki/Lecture_Notes_in_Computer_Science Lecture Notes in Computer Science], No. 4699, [https://en.wikipedia.org/wiki/Springer_Science%2BBusiness_Media Springer]
* [[Daisuke Takahashi]] ('''2008'''). ''Implementation and Evaluation of Parallel FFT Using SIMD Instructions on Multi-Core Processors''. Proc. 2007 International Workshop on Innovative Architecture for Future Generation High-Performance Processors and Systems
* [https://www.researchgate.net/profile/Nicolas_Fritz2 Nicolas Fritz] ('''2009'''). ''SIMD Code Generation in Data-Parallel Programming''. Ph.D. thesis, [https://en.wikipedia.org/wiki/Saarland_University Saarland University], [http://scidok.sulb.uni-saarland.de/volltexte/2009/2563/pdf/Dissertation_9229_Frit_Nico_2009.pdf?q=ibms-cell-processor pdf]
==2010 ...==
* [https://www.rrze.fau.de/wir-ueber-uns/organigramm/mitarbeiter/index.shtml/georg-hager.shtml Georg Hager] <ref>[https://blogs.fau.de/hager/ Georg Hager's Blog | Random thoughts on High Performance Computing]</ref>, [http://dblp.uni-trier.de/pers/hd/t/Treibig:Jan Jan Treibig], [http://dblp.uni-trier.de/pers/hd/w/Wellein:Gerhard Gerhard Wellein] ('''2013'''). ''The Practitioner's Cookbook for Good Parallel Performance on Multi- and Many-Core Systems''. [https://de.wikipedia.org/wiki/Regionales_Rechenzentrum_Erlangen RRZE], [http://sc13.supercomputing.org/ SC13], [https://blogs.fau.de/hager/files/2013/11/sc13_tutorial_134.pdf slides as pdf]
* [https://scholar.google.com/citations?user=4Ab_NBkAAAAJ&hl=en Kaixi Hou], [[Hao Wang]], [http://dblp.uni-trier.de/pers/hd/f/Feng:Wu=chun Wu-chun Feng] ('''2015'''). ''ASPaS: A Framework for Automatic SIMDIZation of Parallel Sorting on x86-based Many-core Processors''. [http://dblp.uni-trier.de/db/conf/ics/ics2015.html#HouWF15 ICS2015],
=Forum Posts=
==1999==* [https://www.stmintz.com/ccc/index.php?id=71754 G4 & AltiVec] by [[Will Singleton]], [[CCC]], October 04, 1999» [[AltiVec]], [[PowerPC #G4|PowerPC G4]]==2000 ...==
* [http://www.talkchess.com/forum/viewtopic.php?t=23860 Superlinear interpolator: a nice novelity ?] by [[Marco Costalba]], [[CCC]], September 20, 2008 » [[Tapered Eval]]
* [http://www.talkchess.com/forum/viewtopic.php?p=301746#301746 Re: talk about IPP's evaluation] by [[Richard Vida]], [[CCC]], November 07, 2009 » [[Ippolit]], [[Tapered Eval]]
==2010 ...==
* [http://www.talkchess.com/forum/viewtopic.php?t=38523 My experience with Linux/GCC] by [[Richard Vida]], [[CCC]], March 23, 2011 » [[C]], [[Linux]], [[Tapered Eval]]
* [http://www.talkchess.com/forum/viewtopic.php?t=39916&start=1 Re: Utilizing Architecture Specific Functions from a HL Language] by [[Wylie Garvin]], [[CCC]], July 31, 2011
* [http://www.talkchess.com/forum/viewtopic.php?t=42054 two values in one integer] by [[Pierre Bokma]], [[CCC]], January 18, 2012
* [http://www.talkchess.com/forum/viewtopic.php?t=59820 Pigeon now using opportunistic SIMD] by [[Stuart Riffle]], [[CCC]], April 11, 2016 » [[Pigeon]]
* [http://www.talkchess.com/forum/viewtopic.php?t=61850 couple of questions about stockfish code ?] by [[Mahmoud Uthman]], [[CCC]], October 26, 2016 » [[Stockfish]], [[Tapered Eval]]
==2020 ...==
* [http://www.talkchess.com/forum3/viewtopic.php?f=7&t=73126 SIMD methods in TT probing and replacement] by [[Harm Geert Muller]], [[CCC]], February 20, 2020 » [[Transposition Table]]
* [http://www.talkchess.com/forum3/viewtopic.php?f=2&t=75862 CPU Vector Unit, the new jam for NNs...] by [[Srdja Matovic]], [[CCC]], November 18, 2020 » [[NNUE]]
=External Links=
* [https://en.wikipedia.org/wiki/SWAR SWAR from Wikipedia]
* [http://www.aggregate.org/SWAR/ The Aggregate: SWAR, SIMD Within A Register] by [[Hank Dietz]]
* ==[[http://andythomason.com/lecture_notes/agp/agp_simd_programming.html Advanced game programing | Session 4 - Math libraries and SIMDx86]] from [http://andythomason.com/lecture_notes/ Game programming lecture notes] by [[Andy Thomasonx86-64]]==x86==
* [https://en.wikipedia.org/wiki/MMX_%28instruction_set%29 MMX from Wikipedia]
* [https://en.wikipedia.org/wiki/3DNow 3DNow! from Wikipedia]
* [https://en.wikipedia.org/wiki/SSSE3 SSSE3 from Wikipedia]
* [https://en.wikipedia.org/wiki/SSE4 SSE4 from Wikipedia]
* [httphttps://de.wikipedia.org/wiki/SSE4a SSE4a from Wikipedia.de]
* [https://en.wikipedia.org/wiki/SSE5 SSE5 from Wikipedia]
* [https://en.wikipedia.org/wiki/XOP_instruction_set XOP instruction set from Wikipedia]
* [http://developer.amd.com/cpu/Libraries/sseplus/Pages/default.aspx SSEPlus Project] from [http://developer.amd.com/pages/default.aspx AMD Developer Central]
* [http://sseplus.sourceforge.net/index.html SSEPlus Project Documentation]
==Other SIMD== * [httphttps://wwwdeveloper.arm.com/productsarchitectures/multimediainstruction-sets/simd-isas/neon/index.html ARM NEON TechnologySIMD ISAs | Neon – Arm Developer]
* [https://en.wikipedia.org/wiki/ARM_architecture#Advanced_SIMD_.28NEON.29 ARM NEON Technology from Wikipedia]
* [https://developer.arm.com/architectures/instruction-sets/simd-isas/helium SIMD ISAs | Arm Helium technology – Arm Developer]
* [https://en.wikipedia.org/wiki/AltiVec AltiVec from Wikipedia]
* [http://developer.apple.com/hardwaredrivers/ve/sse.html Hardware - SSE Performance Programming] from [http://developer.apple.com/index.html Apple Developer]
* [http://developer.apple.com/hardwaredrivers/ve/instruction_crossref.html Apple Instruction Cross-Reference] from [http://developer.apple.com/index.html Apple Developer]
==Misc==
* [https://en.wikipedia.org/wiki/Explicitly_parallel_instruction_computing Explicitly parallel instruction computing (EPIC) from Wikipedia]
=References=
<references />
 
'''[[Programming|Up one Level]]'''

Navigation menu