Changes


SIMD and SWAR Techniques

794 bytes added, 22:44, 17 May 2023


→‎SIMD Instruction Sets: fix in ref

[[x86]], [[x86-64]], as well as [[PowerPC#G4|PowerPC]] and [https://en.wikipedia.org/wiki/Power_Architecture#Power_ISA_v.2.03 Power ISA v.2.03] processors provide '''Single Instruction, Multiple Data''' (SIMD) operations on [[Array|vectors]] of [[Float|floats]], [[Double|doubles]] or integers of various widths ([[Byte|bytes]], [[Word|words]], [[Double Word|double words]] or [[Quad Word|quad words]]), available through assembly and compiler intrinsics. SIMD applications in computer chess include [[Bitboards|bitboard]] computations and [[Fill Algorithms|fill algorithms]] like [[Dumb7Fill]] and the [[Kogge-Stone Algorithm]], as well as [[Evaluation|evaluation]] tasks such as the [[SSE2#SSE2dotproduct|SSE2 dot-product]] of 64 bits by a vector of 64 bytes.

'''SWAR''', an acronym for SIMD Within A Register, was coined by [[Hank Dietz]] and Randell J. Fisher <ref>[http://www.aggregate.org/SWAR/ The Aggregate: SWAR, SIMD Within A Register] by [[Hank Dietz]]</ref>. It is a processing model that applies SIMD parallel processing across sections of a CPU register. Often, vectors of entities smaller than a byte are processed in [[Parallel Prefix Algorithms|parallel prefix]] manner.
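As an illustration of the parallel prefix style on sub-byte fields, here is a minimal sketch of the well-known SWAR population count in C. The function name <code>swar_popcount64</code> and the constant names are chosen for this example only and are not taken from the cited source.

```c
#include <assert.h>   /* only needed for the checks in the usage note */
#include <stdint.h>

/* Counts the set bits of a 64-bit word by summing adjacent fields
 * in parallel: 1-bit fields into 2-bit sums, 2-bit into 4-bit,
 * 4-bit into 8-bit, then one multiply adds up all eight bytes. */
int swar_popcount64(uint64_t x)
{
    const uint64_t k1 = 0x5555555555555555ULL; /* every 2nd bit   */
    const uint64_t k2 = 0x3333333333333333ULL; /* every 2nd pair  */
    const uint64_t k4 = 0x0F0F0F0F0F0F0F0FULL; /* low nibbles     */
    const uint64_t kf = 0x0101010101010101ULL; /* one per byte    */

    x = x - ((x >> 1) & k1);          /* 32 two-bit counts, 0..2   */
    x = (x & k2) + ((x >> 2) & k2);   /* 16 four-bit counts, 0..4  */
    x = (x + (x >> 4)) & k4;          /* 8 byte counts, 0..8       */
    return (int)((x * kf) >> 56);     /* sum of bytes in top byte  */
}
```

A quick sanity check is <code>assert(swar_popcount64(0x8000000000000001ULL) == 2);</code>. On hardware with a native population count instruction, a compiler builtin is usually preferable; the sketch only shows how several small fields can be processed inside one general purpose register.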

=SIMD Instruction Sets=

* [[SSE2]], [[SSE3]], [[SSSE3]] and [[SSE4]] on [[x86]] and [[x86-64]]

* [[SSE5]] by [[AMD]] (proposed but not implemented, replaced by [[XOP]] <ref>[https://en.wikipedia.org/wiki/SSE5 SSE5 from Wikipedia]</ref>)

* [[AltiVec]] on [[PowerPC#G4|PowerPC G4]], [[PowerPC#G5|PowerPC G5]] resp. VMX since [[POWER|POWER6]]

* [https://en.wikipedia.org/wiki/AltiVec#VSX_(Vector_Scalar_Extension) VSX] since [[POWER|POWER7]]

* [[Helium]] by [[ARM]]

* [[NEON]] by [[ARM]]

* [[SVE]] <ref>[https://en.wikipedia.org/wiki/AArch64#Scalable_Vector_Extension_(SVE) SVE from Wikipedia]</ref> and [[SVE2]] <ref>[https://en.wikipedia.org/wiki/SVE SVE2 from Wikipedia]</ref> by [[ARM]]

* [[AVX]] by [[Intel]]

* [[AVX2]] by [[Intel]]

* [[AVX-512]] by [[Intel]]

* [[XOP]] by [[AMD]]

* [[VIS]] <ref>[https://en.wikipedia.org/wiki/Visual_Instruction_Set VIS from Wikipedia]</ref> since [[SPARC]] v9

* [[RISC-V]] vector-set extension <ref>[https://en.wikipedia.org/wiki/RISC-V#Vector_set RISC-V vector-set from Wikipedia]</ref>

=SWAR Arithmetic=

To apply addition and subtraction to vectors of bit-aggregates or [https://en.wikipedia.org/wiki/Bit_field bit-field structures] within a general purpose register, one has to take care that carries and borrows don't propagate across field boundaries. Thus the need to mask off all most significant bits (H) and add in two steps: one 'add' with the MSBs clear, and one add modulo 2, aka '[[General Setwise Operations#ExclusiveOr|xor]]', for the MSBs themselves. For bytewise (rankwise) math inside a 64-bit register, H is 0x8080808080808080 and L is 0x0101010101010101.
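The two-step scheme can be sketched in C for the bytewise case as follows. The function names <code>swar_add_bytes</code> and <code>swar_sub_bytes</code> are hypothetical, chosen for this example; the constant H is the mask named in the text. Each of the eight bytes is added or subtracted modulo 256, independently of its neighbours.

```c
#include <assert.h>   /* only needed for the checks in the usage note */
#include <stdint.h>

#define SWAR_H 0x8080808080808080ULL  /* MSB of every byte */

/* Bytewise z = x + y: add the low 7 bits of each byte with the MSBs
 * cleared, so no carry can leave a byte, then add the MSBs modulo 2
 * with xor. */
uint64_t swar_add_bytes(uint64_t x, uint64_t y)
{
    return ((x & ~SWAR_H) + (y & ~SWAR_H)) ^ ((x ^ y) & SWAR_H);
}

/* Bytewise z = x - y: set the MSB of every minuend byte so a borrow
 * is absorbed inside that byte, subtract the low 7 bits, then fix up
 * the MSBs with xor. */
uint64_t swar_sub_bytes(uint64_t x, uint64_t y)
{
    return ((x | SWAR_H) - (y & ~SWAR_H)) ^ ((x ^ ~y) & SWAR_H);
}
```

For instance, <code>swar_add_bytes(0x01FF, 0x0101)</code> gives 0x0200: the low byte wraps from 0xFF to 0x00 without carrying into the second byte, which independently becomes 0x02. Likewise <code>swar_sub_bytes(0x0000, 0x0001)</code> gives 0x00FF, with no borrow taken from the byte above.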

=See also=

* [[GPU]]

* [[NNUE]]

* [[Parallel Prefix Algorithms]]

* [http://www.talkchess.com/forum/viewtopic.php?t=39916&start=1 Re: Utilizing Architecture Specific Functions from a HL Language] by [[Wylie Garvin]], [[CCC]], July 31, 2011

* [http://www.talkchess.com/forum/viewtopic.php?t=42054 two values in one integer] by [[Pierre Bokma]], [[CCC]], January 18, 2012

* [http://www.talkchess.com/forum/viewtopic.php?t=59820 Pigeon now using opportunistic SIMD] by [[Stuart Riffle]], [[CCC]], April 11, 2016 » [[Pigeon]]

* [http://www.talkchess.com/forum/viewtopic.php?t=61850 couple of questions about stockfish code ?] by [[Mahmoud Uthman]], [[CCC]], October 26, 2016 » [[Stockfish]], [[Tapered Eval]]

==2020 ...==

* [http://www.talkchess.com/forum3/viewtopic.php?f=7&t=73126 SIMD methods in TT probing and replacement] by [[Harm Geert Muller]], [[CCC]], February 20, 2020 » [[Transposition Table]]

* [http://www.talkchess.com/forum3/viewtopic.php?f=2&t=75862 CPU Vector Unit, the new jam for NNs...] by [[Srdja Matovic]], [[CCC]], November 18, 2020 » [[NNUE]]

=External Links=

* [https://en.wikipedia.org/wiki/SWAR SWAR from Wikipedia]

* [http://www.aggregate.org/SWAR/ The Aggregate: SWAR, SIMD Within A Register] by [[Hank Dietz]]

==[[x86]]/[[x86-64]]==

* [https://en.wikipedia.org/wiki/MMX_%28instruction_set%29 MMX from Wikipedia]

* [https://en.wikipedia.org/wiki/3DNow 3DNow! from Wikipedia]

* [http://sseplus.sourceforge.net/index.html SSEPlus Project Documentation]

==Other==

* [https://developer.arm.com/architectures/instruction-sets/simd-isas/neon SIMD ISAs | Neon – Arm Developer]

* [https://en.wikipedia.org/wiki/ARM_architecture#Advanced_SIMD_.28NEON.29 ARM NEON Technology from Wikipedia]

* [https://developer.arm.com/architectures/instruction-sets/simd-isas/helium SIMD ISAs | Arm Helium technology – Arm Developer]

* [https://en.wikipedia.org/wiki/AltiVec AltiVec from Wikipedia]

* [http://developer.apple.com/hardwaredrivers/ve/sse.html Hardware - SSE Performance Programming] from [http://developer.apple.com/index.html Apple Developer]

* [http://developer.apple.com/hardwaredrivers/ve/instruction_crossref.html Apple Instruction Cross-Reference] from [http://developer.apple.com/index.html Apple Developer]

==Misc==

* [https://en.wikipedia.org/wiki/Explicitly_parallel_instruction_computing Explicitly parallel instruction computing (EPIC) from Wikipedia]

Smatovic

422

edits
