Changes

SIMD and SWAR Techniques

1,513 bytes added, 19:59, 18 November 2020

no edit summary

[[x86]], [[x86-64]], as well as [[PowerPC#G4|PowerPC]] and [https://en.wikipedia.org/wiki/Power_Architecture#Power_ISA_v.2.03 Power ISA v.2.03] processors provide '''Single Instructions''' on '''Multiple Data''' (SIMD), namely on [[Array|vectors]] of [[Float|floats]], [[Double|doubles]] or various integers, [[Byte|bytes]], [[Word|words]], [[Double Word|double words]] or [[Quad Word|quad words]], available through assembly and compiler intrinsics. SIMD-applications related to computer chess cover [[Bitboards|bitboard]] computations and [[Fill Algorithms|fill-algorithms]] like [[Dumb7Fill]] and [[Kogge-Stone Algorithm]], as well as [[Evaluation|evaluation]] related stuff, like this [[SSE2#SSE2dotproduct|SSE2 dot-product]] of 64 bits by a vector of 64 bytes.

'''SWAR''' as acronym for SIMD Within A Register was coined by [[Hank Dietz]] and ~~Randy~~ '''Randell J. Fisher ''' <ref>[http://www.aggregate.org/SWAR/ The Aggregate: SWAR, SIMD Within A Register] by [[Hank Dietz]]</ref> . It is a processing model which applies SIMD parallel processing across sections of a CPU register, often vectors of smaller than byte-entities are processed in [[Parallel Prefix Algorithms|parallel prefix]] manner.

=SIMD Instruction Sets=

* [[AltiVec]] on [[PowerPC#G4|PowerPC G4]], [[PowerPC#G5|PowerPC G5]]

* [[ARM NEON]]

* [[ARM Helium]]

* [[AVX]] by [[Intel]]

* [[AVX2]] by [[Intel]]

}

</pre>

=See also=

* [[GPU]]

* [[NNUE]]

* [[Parallel Prefix Algorithms]]

=Publications=

==1987 ...==

* [[Alan H. Bond]] ('''1987'''). ''Broadcasting Arrays - A Highly Parallel Computer Architecture Suitable For Easy Fabrication''. [http://www.exso.com/bc.pdf pdf]

* [[Mathematician#GEBlelloch|Guy E. Blelloch]] ('''1990'''). ''[https://dl.acm.org/citation.cfm?id=91254 Vector Models for Data-Parallel Computing]''. [https://en.wikipedia.org/wiki/MIT_Press MIT Press], [https://www.cs.cmu.edu/~guyb/papers/Ble90.pdf pdf]

* [https://dblp.uni-trier.de/pers/f/Fisher:Randall_J=.html Randell J. Fisher], [[Hank Dietz]] ('''1998'''). ''[https://link.springer.com/chapter/10.1007/3-540-48319-5_19 Compiling for SIMD Within a Register]''. [https://dblp.uni-trier.de/db/conf/lcpc/lcpc1998.html LCPC 1998], [https://link.springer.com/chapter/10.1007/3-540-48319-5_19 pdf]

* [https://www.linkedin.com/in/tom-thompson-500bb7b Tom Thompson] ('''1999'''). ''[http://www.mactech.com/articles/mactech/Vol.15/15.07/AltiVecRevealed/index.html AltiVec Revealed]''. [http://www.mactech.com/ MacTech], Vol. 15, No. 7

==2000 ...==* [https://dblp.uni-trier.de/pers/f/Fisher:Randall_J=.html Randell J. Fisher] ('''2003'''). ''[https://docs.lib.purdue.edu/dissertations/AAI3108343/ General-Purpose SIMD Within A Register: Parallel Processing on Consumer Microprocessors]''. Ph.D. thesis, [https://en.wikipedia.org/wiki/Purdue_University Purdue University], advisor [[Hank Dietz]], [http://aggregate.org/SWAR/Dis/dissertation.pdf pdf]* [[Daisuke Takahashi]] ('''2007'''). ''[~~http~~https://~~www~~link.~~springerlink~~springer.com/~~content~~chapter/10.1007/~~6481607675376w42~~978-3-540-75755-9_135/ An Implementation of Parallel 1-D FFT Using SSE3 Instructions on Dual-Core Processors]''. Proc. Workshop on State-of-the-Art in Scientific and Parallel Computing, [https://en.wikipedia.org/wiki/Lecture_Notes_in_Computer_Science Lecture Notes in Computer Science], No. 4699, [https://en.wikipedia.org/wiki/Springer_Science%2BBusiness_Media Springer]

* [[Daisuke Takahashi]] ('''2008'''). ''Implementation and Evaluation of Parallel FFT Using SIMD Instructions on Multi-Core Processors''. Proc. 2007 International Workshop on Innovative Architecture for Future Generation High-Performance Processors and Systems

* [https://www.researchgate.net/profile/Nicolas_Fritz2 Nicolas Fritz] ('''2009'''). ''SIMD Code Generation in Data-Parallel Programming''. Ph.D. thesis, [https://en.wikipedia.org/wiki/Saarland_University Saarland University], [http://scidok.sulb.uni-saarland.de/volltexte/2009/2563/pdf/Dissertation_9229_Frit_Nico_2009.pdf?q=ibms-cell-processor pdf]

==2010 ...==

* [https://www.rrze.fau.de/wir-ueber-uns/organigramm/mitarbeiter/index.shtml/georg-hager.shtml Georg Hager] <ref>[https://blogs.fau.de/hager/ Georg Hager's Blog | Random thoughts on High Performance Computing]</ref>, [http://dblp.uni-trier.de/pers/hd/t/Treibig:Jan Jan Treibig], [http://dblp.uni-trier.de/pers/hd/w/Wellein:Gerhard Gerhard Wellein] ('''2013'''). ''The Practitioner's Cookbook for Good Parallel Performance on Multi- and Many-Core Systems''. [https://de.wikipedia.org/wiki/Regionales_Rechenzentrum_Erlangen RRZE], [http://sc13.supercomputing.org/ SC13], [https://blogs.fau.de/hager/files/2013/11/sc13_tutorial_134.pdf slides as pdf]

* [https://scholar.google.com/citations?user=4Ab_NBkAAAAJ&hl=en Kaixi Hou], [[Hao Wang]], [http://dblp.uni-trier.de/pers/hd/f/Feng:Wu=chun Wu-chun Feng] ('''2015'''). ''ASPaS: A Framework for Automatic SIMDIZation of Parallel Sorting on x86-based Many-core Processors''. [http://dblp.uni-trier.de/db/conf/ics/ics2015.html#HouWF15 ICS2015],

=Forum Posts=

==1999==* [https://www.stmintz.com/ccc/index.php?id=71754 G4 & AltiVec] by [[Will Singleton]], [[CCC]], October 04, 1999» [[AltiVec]], [[PowerPC #G4|PowerPC G4]]==2000 ...==

* [http://www.talkchess.com/forum/viewtopic.php?t=23860 Superlinear interpolator: a nice novelity ?] by [[Marco Costalba]], [[CCC]], September 20, 2008 » [[Tapered Eval]]

* [http://www.talkchess.com/forum/viewtopic.php?p=301746#301746 Re: talk about IPP's evaluation] by [[Richard Vida]], [[CCC]], November 07, 2009 » [[Ippolit]], [[Tapered Eval]]

==2010 ...==

* [http://www.talkchess.com/forum/viewtopic.php?t=38523 My experience with Linux/GCC] by [[Richard Vida]], [[CCC]], March 23, 2011 » [[C]], [[Linux]], [[Tapered Eval]]

* [http://www.talkchess.com/forum/viewtopic.php?t=39916&start=1 Re: Utilizing Architecture Specific Functions from a HL Language] by [[Wylie Garvin]], [[CCC]], July 31, 2011

* [http://www.talkchess.com/forum/viewtopic.php?t=42054 two values in one integer] by [[Pierre Bokma]], [[CCC]], January 18, 2012

* [http://www.talkchess.com/forum/viewtopic.php?t=59820 Pigeon now using opportunistic SIMD] by [[Stuart Riffle]], [[CCC]], April 11, 2016 » [[Pigeon]]

* [http://www.talkchess.com/forum/viewtopic.php?t=61850 couple of questions about stockfish code ?] by [[Mahmoud Uthman]], [[CCC]], October 26, 2016 » [[Stockfish]], [[Tapered Eval]]

==2020 ...==

* [http://www.talkchess.com/forum3/viewtopic.php?f=7&t=73126 SIMD methods in TT probing and replacement] by [[Harm Geert Muller]], [[CCC]], February 20, 2020 » [[Transposition Table]]

* [http://www.talkchess.com/forum3/viewtopic.php?f=2&t=75862 CPU Vector Unit, the new jam for NNs...] by [[Srdja Matovic]], [[CCC]], November 18, 2020 » [[NNUE]]

=External Links=

* [https://en.wikipedia.org/wiki/SWAR SWAR from Wikipedia]

* [http://www.aggregate.org/SWAR/ The Aggregate: SWAR, SIMD Within A Register] by [[Hank Dietz]]

* ==[[~~http://andythomason.com/lecture_notes/agp/agp_simd_programming.html Advanced game programing | Session 4 - Math libraries and SIMD~~x86]] ~~from [http:~~/~~/andythomason.com/lecture_notes/ Game programming lecture notes] by~~ [[~~Andy Thomason~~x86-64]]~~==x86~~==

* [https://en.wikipedia.org/wiki/MMX_%28instruction_set%29 MMX from Wikipedia]

* [https://en.wikipedia.org/wiki/3DNow 3DNow! from Wikipedia]

* [https://en.wikipedia.org/wiki/SSSE3 SSSE3 from Wikipedia]

* [https://en.wikipedia.org/wiki/SSE4 SSE4 from Wikipedia]

* [~~http~~https://de.wikipedia.org/wiki/SSE4a SSE4a from Wikipedia.de]

* [https://en.wikipedia.org/wiki/SSE5 SSE5 from Wikipedia]

* [https://en.wikipedia.org/wiki/XOP_instruction_set XOP instruction set from Wikipedia]

* [http://developer.amd.com/cpu/Libraries/sseplus/Pages/default.aspx SSEPlus Project] from [http://developer.amd.com/pages/default.aspx AMD Developer Central]

* [http://sseplus.sourceforge.net/index.html SSEPlus Project Documentation]

==Other ~~SIMD~~== * [~~http~~https://~~www~~developer.arm.com/~~products~~architectures/~~multimedia~~instruction-sets/simd-isas/neon~~/index.html ARM NEON Technology~~SIMD ISAs | Neon – Arm Developer]

* [https://en.wikipedia.org/wiki/ARM_architecture#Advanced_SIMD_.28NEON.29 ARM NEON Technology from Wikipedia]

* [https://developer.arm.com/architectures/instruction-sets/simd-isas/helium SIMD ISAs | Arm Helium technology – Arm Developer]

* [https://en.wikipedia.org/wiki/AltiVec AltiVec from Wikipedia]

* [http://developer.apple.com/hardwaredrivers/ve/sse.html Hardware - SSE Performance Programming] from [http://developer.apple.com/index.html Apple Developer]

* [http://developer.apple.com/hardwaredrivers/ve/instruction_crossref.html Apple Instruction Cross-Reference] from [http://developer.apple.com/index.html Apple Developer]

==Misc==

* [https://en.wikipedia.org/wiki/Explicitly_parallel_instruction_computing Explicitly parallel instruction computing (EPIC) from Wikipedia]

=References=

'''[[Programming|Up one Level]]'''

GerdIsenberg

Bureaucrats, Administrators

25,161

edits

Changes

SIMD and SWAR Techniques

Navigation menu

Personal tools

Namespaces

Variants

Views

More

Search

Navigation

Tools