
'''[[Main Page|Home]] * [[Hardware]] * [[x86]] * SSE2'''

'''SSE2''' (Streaming SIMD Extensions 2) and the further [[x86]] and [[x86-64]] streaming [[SIMD and SWAR Techniques|SIMD]] extensions, like [[SSE3]], [[SSSE3]], [[SSE4]] and AMD's announced [[SSE5]], as major enhancements of [[SSE]], provide an instruction set on 128-bit registers, namely on [[Array|vectors]] of four [[Float|floats]] or two [[Double|doubles]], and, since SSE2, also on vectors of 16 [[Byte|bytes]], eight [[Word|words]], four [[Double Word|double words]] or two [[Quad Word|quad words]] <ref>[https://support.amd.com/TechDocs/26568.pdf AMD64 Architecture Programmer’s Manual Volume 4: 128-Bit and 256-Bit Media Instructions] (pdf), has detailed explanations on all SSE2 128-Bit Media Instructions</ref>. In 64-bit mode there are 16 xmm registers available, xmm0..xmm15, in 32-bit mode only eight, xmm0..xmm7. SSE is explicitly available through [[C]] compiler intrinsics <ref>[http://msdn.microsoft.com/en-us/library/84t4h8ys%28v=VS.100%29.aspx Integer Intrinsics Using Streaming SIMD Extensions 2] Visual C++ Developer Center - Run-Time Library Reference</ref> or (inline) [[Assembly|assembly]]. Some compilers implicitly use SSE float and double instructions for floating point data types, others even provide automatic SSE2 vectorization while processing [[Array|arrays]] of various integer types. SSE and SSE2 intrinsic functions are available in [https://en.wikipedia.org/wiki/Visual_C%2B%2B Visual C] <ref>[http://msdn.microsoft.com/en-us/library/x8zs5twb%28v=VS.100%29.aspx Instruction Reference] Visual C++ Developer Center - Run-Time Library Reference</ref> or [https://en.wikipedia.org/wiki/Intel_C%2B%2B_Compiler Intel-C] <ref>[https://software.intel.com/sites/landingpage/IntrinsicsGuide/# Intel Intrinsics Guide]</ref>.
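
For illustration, a minimal intrinsic snippet (a sketch, not from the original page) ORing two pairs of bitboards with one 128-bit instruction:
<pre>
#include <emmintrin.h>  // SSE2 intrinsics

typedef unsigned long long U64;

// bitwise or of two pairs of bitboards, a[0..1] |= b[0..1]
void or2(U64 a[2], const U64 b[2]) {
   __m128i x = _mm_loadu_si128((const __m128i*) a);  // load a[0], a[1]
   __m128i y = _mm_loadu_si128((const __m128i*) b);  // load b[0], b[1]
   x = _mm_or_si128(x, y);                           // or of all 128 bits
   _mm_storeu_si128((__m128i*) a, x);                // store result back
}
</pre>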

=Integer Instructions=
{{SSE2 Instructions}}

=Applications=
Some sample SSE2 [[Bitboards|bitboard]] routines, tested with Visual C++ 2005:
<span id="OneStepOnlySSE2"></span>
==One Step Only==
Since 128-bit xmm registers may be treated as a vector of 16 bytes, shifting techniques such as [[General Setwise Operations#OneStepOnly| one step]] in all eight directions can be done more efficiently with respect to wraps from the a- to the h-file or vice versa. It is recommended to write an own SSE2 wrapper class with overloaded operators in C++ to encapsulate a vector of two bitboards.
<pre>
 northwest    north   northeast
  noWe         nort         noEa
          +7    +8    +9
              \  |  /
  west    -1 <-  0 -> +1    east
              /  |  \
          -9    -8    -7
  soWe         sout         soEa
 southwest    south   southeast
</pre>
Vertical steps as usual with 64-bit shifts by one rank, that is eight bits, each:
<pre>
__m128i nortOne(__m128i b) {
   b = _mm_slli_epi64 (b, 8);   // shift both bitboards one rank up
   return b;
}

__m128i soutOne(__m128i b) {
   b = _mm_srli_epi64 (b, 8);   // shift both bitboards one rank down
   return b;
}
</pre>
Unfortunately there is no byte-wise shift in the SSE2 instruction set (nor in MMX), but byte-wise parallel add avoids the wrap masks, which would otherwise need to be loaded from memory or computed. Applying a wrap mask explicitly takes two instructions.
<pre>
__m128i butNotA(__m128i b) {
   b = _mm_srli_epi64 (b, 1);   // shift west, a-file bits wrap to bit 7 of the lower byte
   b = _mm_add_epi8   (b, b);   // byte-wise shift east, dropping the wrapped bits
   return b;                    // b & ~A-file
}

__m128i butNotH(__m128i b) {
   b = _mm_add_epi8   (b, b);   // byte-wise shift east, dropping the h-file bits
   b = _mm_srli_epi64 (b, 1);   // shift back west
   return b;                    // b & ~H-file
}
</pre>
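For comparison, a conventional one step east with an explicit wrap mask (a sketch, not part of the original routines; the aligned notAFile constant is an assumption) takes the shift plus the mask instruction, besides loading the mask:
<pre>
static const U64 XMM_ALIGN notAFile[2] = {
   C64(0xfefefefefefefefe), C64(0xfefefefefefefefe)
};

__m128i eastOneMasked(__m128i b) {
   b = _mm_slli_epi64 (b, 1);                           // shift east with wraps to the a-file
   b = _mm_and_si128  (b, *(const __m128i*) notAFile);  // clear the wrapped a-file bits
   return b;
}
</pre>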
This is how the east directions are computed based on parallel byte-wise add, with either one or two SSE2 instructions:
<pre>
__m128i eastOne(__m128i b) {
   b = _mm_add_epi8   (b, b);   // byte-wise shift east, no wrap to the a-file
   return b;
}

__m128i noEaOne (__m128i b) {
   b = _mm_add_epi8   (b, b);   // east
   b = _mm_slli_epi64 (b, 8);   // north
   return b;
}

__m128i soEaOne (__m128i b) {
   b = _mm_add_epi8   (b, b);   // east
   b = _mm_srli_epi64 (b, 8);   // south
   return b;
}
</pre>
West directions need a leading 'but not A-file' and take three instructions each:
<pre>
__m128i westOne(__m128i b) {
   b = _mm_srli_epi64 (b, 1);   // butNotA ...
   b = _mm_add_epi8   (b, b);   // ... b & ~A-file
   b = _mm_srli_epi64 (b, 1);   // west
   return b;
}

__m128i soWeOne (__m128i b) {
   b = _mm_srli_epi64 (b, 1);   // butNotA ...
   b = _mm_add_epi8   (b, b);   // ... b & ~A-file
   b = _mm_srli_epi64 (b, 9);   // south west
   return b;
}

__m128i noWeOne (__m128i b) {
   b = _mm_srli_epi64 (b, 1);   // butNotA ...
   b = _mm_add_epi8   (b, b);   // ... b & ~A-file
   b = _mm_slli_epi64 (b, 7);   // north west
   return b;
}
</pre>
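The one-step routines compose nicely; a sketch (not from the original) of a king attack getter for two kings, one bitboard per 64-bit half of the xmm register:
<pre>
// sketch: attacks of two kings, one per 64-bit lane
__m128i kingAttacks(__m128i kings) {
   __m128i attacks  = _mm_or_si128 (eastOne(kings), westOne(kings));
   __m128i laterals = _mm_or_si128 (attacks, kings);
   attacks = _mm_or_si128 (attacks, nortOne(laterals));
   attacks = _mm_or_si128 (attacks, soutOne(laterals));
   return attacks;
}
</pre>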
<span id="EastAttacks"></span>
==East Attacks==
SIMD-wise [[Fill by Subtraction]] with byte- or rank-wise arithmetic takes only a few instructions with SSE2:
<pre>
__m128i eastAttacks (__m128i occ, __m128i rooks) {
   __m128i tmp;
   occ = _mm_or_si128 (occ, rooks);  // make rooks member of occupied
   tmp = _mm_xor_si128(occ, rooks);  // occ - rooks
   tmp = _mm_sub_epi8 (tmp, rooks);  // occ - 2*rooks
   return _mm_xor_si128(occ, tmp);   // occ ^ (occ - 2*rooks)
}
</pre>
<span id="SSE2dotproduct"></span>
==SSE2 dot product==
The [https://en.wikipedia.org/wiki/Dot_product dot product] <ref>[http://mathworld.wolfram.com/DotProduct.html Dot Product - from Wolfram MathWorld]</ref> of a vector of [[Bit|bits]] by a weight vector of [[Byte|bytes]] might be used in determining [[Mobility|mobility]] for [[Evaluation|evaluation]] purposes. The vector of bits is a bitboard of all squares attacked by one (or multiple) piece(s), while the weight vector considers the "importance" of [[Squares|squares]], like center control, or even [[King Safety#SquareControl|square controls]] near the opponent [[King|king]], e.g. by providing 64 weight vectors for each king square.

bb&#183;weights = <span style="font-size:180%;">&#8721;</span><span style="vertical-align: sub;">i=1..64</span> bb<span style="vertical-align: sub;">i</span> weights<span style="vertical-align: sub;">i</span> = bb<span style="vertical-align: sub;">1</span> weights<span style="vertical-align: sub;">1</span> + ... + bb<span style="vertical-align: sub;">64</span> weights<span style="vertical-align: sub;">64</span>

The 64-bit times 64-byte dot product implements a kind of weighted [[Population Count|population count]], similar to the following loop approach, but completely unrolled and [[Avoiding Branches|branchless]]:
<pre>
int dotProduct(U64 bb, BYTE weights[])
{
   U64 bit = 1;
   int accu = 0;
   for (int sq = 0; sq < 64; sq++, bit += bit) {
      if ( bb & bit) accu += weights[sq];
      // accu += weights[sq] & -(( bb & bit) == bit); // branchless 1
      // accu += weights[sq] & -(( ~bb & bit) == 0);  // branchless 2
   }
   return accu;
}
</pre>
The SSE2 routine expands a bitboard, as a vector of 64 bits, into 64 bytes inside four 128-bit xmm registers, and performs the multiplication with the byte vector by bitwise 'and', before it finally adds all bytes together. The bitboard is therefore replicated eight times with a sequence of seven unpack and interleave instructions, so that the expanded bytes keep the same order as the bits of the bitboard, before they are masked and compared with a vector of bytes with one appropriate bit set each.

The dot product is designed for unsigned weights in the 0..63 range, so that the vertical byte-wise adds of the four weights cannot overflow. Nevertheless, three ''PADDUSB - packed add unsigned byte with saturation'' instructions ([http://msdn.microsoft.com/en-us/library/9hahyddy%28VS.80%29.aspx _mm_adds_epu8]) are used to limit the maximum sum of the four bytes to 255, to make the routine more "robust" for cases with average weights greater than 63. The horizontal add of both [[Quad Word|quad words]] of the 128-bit xmm register is performed by the ''PSADBW - packed sum of absolute differences of bytes into a word'' instruction ([http://msdn.microsoft.com/en-us/library/b0yshs6s.aspx _mm_sad_epu8]) with zero, while the final add of the two resulting [[Word|word]] sums in the high and low quad word of the xmm register is done with general purpose registers.
<pre>
#include <emmintrin.h>
#define XMM_ALIGN __declspec(align(16))

/* for average weights < 64 */
int dotProduct64(U64 bb, BYTE weights[] /* XMM_ALIGN */)
{
   static const U64 XMM_ALIGN sbitmask[2] = {
      C64(0x8040201008040201),
      C64(0x8040201008040201)
   };
   __m128i x0, x1, x2, x3, bm;
   __m128i* pW = (__m128i*) weights;
   bm = _mm_load_si128     ( (__m128i*) sbitmask);
   x0 = _mm_cvtsi64x_si128 (bb);      // 0000000000000000:8040201008040201
   // extend bits to bytes
   x0 = _mm_unpacklo_epi8  (x0, x0);  // 8080404020201010:0808040402020101
   x2 = _mm_unpackhi_epi16 (x0, x0);  // 8080808040404040:2020202010101010
   x0 = _mm_unpacklo_epi16 (x0, x0);  // 0808080804040404:0202020201010101
   x1 = _mm_unpackhi_epi32 (x0, x0);  // 0808080808080808:0404040404040404
   x0 = _mm_unpacklo_epi32 (x0, x0);  // 0202020202020202:0101010101010101
   x3 = _mm_unpackhi_epi32 (x2, x2);  // 8080808080808080:4040404040404040
   x2 = _mm_unpacklo_epi32 (x2, x2);  // 2020202020202020:1010101010101010
   x0 = _mm_and_si128  (x0, bm);
   x1 = _mm_and_si128  (x1, bm);
   x2 = _mm_and_si128  (x2, bm);
   x3 = _mm_and_si128  (x3, bm);
   x0 = _mm_cmpeq_epi8 (x0, bm);
   x1 = _mm_cmpeq_epi8 (x1, bm);
   x2 = _mm_cmpeq_epi8 (x2, bm);
   x3 = _mm_cmpeq_epi8 (x3, bm);
   // multiply by "and" with -1 or 0
   x0 = _mm_and_si128  (x0, pW[0]);
   x1 = _mm_and_si128  (x1, pW[1]);
   x2 = _mm_and_si128  (x2, pW[2]);
   x3 = _mm_and_si128  (x3, pW[3]);
   // add all bytes (with saturation)
   x0 = _mm_adds_epu8  (x0, x1);
   x0 = _mm_adds_epu8  (x0, x2);
   x0 = _mm_adds_epu8  (x0, x3);
   x0 = _mm_sad_epu8   (x0, _mm_setzero_si128 ());
   return _mm_cvtsi128_si32 (x0)
        + _mm_extract_epi16 (x0, 4);
}
</pre>
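A hypothetical use in mobility evaluation (the attack set and the aligned weight table are assumptions, not part of the original):
<pre>
// assumed: a 16-byte aligned table of 64 small weights, e.g. favoring central squares
BYTE XMM_ALIGN centerWeights[64];

int knightMobilityScore(U64 knightAttackSet) {
   return dotProduct64(knightAttackSet, centerWeights);
}
</pre>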
==Rotated Dot Product==
A little bit cheaper is to expand the bitboard to a vector of 90 degree rotated {0,255} bytes, which requires a rotated weight vector as well <ref>[https://www.stmintz.com/ccc/index.php?id=377546 SSE2 bit[64] * byte[64] dot product] by [[Gerd Isenberg]], [[Computer Chess Forums]], July 17, 2004</ref>.
<pre>
/* for average weights < 64 */
int dotProduct64(U64 bb, BYTE weightsRot90[] /* XMM_ALIGN */)
{
   static const U64 CACHE_ALIGN masks[8] = {
      C64(0x0101010101010101), C64(0x0202020202020202),
      C64(0x0404040404040404), C64(0x0808080808080808),
      C64(0x1010101010101010), C64(0x2020202020202020),
      C64(0x4040404040404040), C64(0x8080808080808080),
   };
   __m128i x0, x1, x2, x3, zr; U32 cnt;
   __m128i * pM = (__m128i*) masks;
   __m128i * pW = (__m128i*) weightsRot90;
   x0 = _mm_cvtsi64x_si128 (bb);
   x0 = _mm_unpacklo_epi64 (x0, x0);
   zr = _mm_setzero_si128  ();
   x3 = _mm_andnot_si128   (x0, pM[3]);
   x2 = _mm_andnot_si128   (x0, pM[2]);
   x1 = _mm_andnot_si128   (x0, pM[1]);
   x0 = _mm_andnot_si128   (x0, pM[0]);
   x3 = _mm_cmpeq_epi8     (x3, zr);
   x2 = _mm_cmpeq_epi8     (x2, zr);
   x1 = _mm_cmpeq_epi8     (x1, zr);
   x0 = _mm_cmpeq_epi8     (x0, zr);
   // multiply by "and" with -1 or 0
   x3 = _mm_and_si128      (x3, pW[3]);
   x2 = _mm_and_si128      (x2, pW[2]);
   x1 = _mm_and_si128      (x1, pW[1]);
   x0 = _mm_and_si128      (x0, pW[0]);
   // add all bytes (with saturation)
   x3 = _mm_adds_epu8      (x3, x2);
   x0 = _mm_adds_epu8      (x0, x1);
   x0 = _mm_adds_epu8      (x0, x3);
   x0 = _mm_sad_epu8       (x0, zr);
   return _mm_cvtsi128_si32 (x0)
        + _mm_extract_epi16 (x0, 4);
}
</pre>
<span id="SSE2popcount"></span>
==SSE2 Population Count==
The following proposal of a [[Population Count#SWARPopcount|SWAR-Popcount]] combined with a dot product might be quite competitive on recent [[x86-64]] processors, with a throughput of up to three SIMD instructions per cycle <ref>[http://www.intel.com/design/processor/manuals/248966.pdf Intel 64 and IA32 Architectures Optimization Reference Manual] (pdf) Appendix C Instruction Latencies</ref> <ref>[https://support.amd.com/techdocs/40546.pdf Software Optimization Guide for AMD Family 10h and 12h Processors] (pdf) Appendix C Instruction Latencies</ref>. It determines the cardinalities of eight bitboards, multiplies each with a corresponding weight, a signed 16-bit [[Word|word]], to finally add all together as integer. However, [[Wojciech Muła|Wojciech Muła's]] [[SSSE3#PopCount|SSSE3 PopCnt]] would save some more cycles, even more with doubled or fourfold register widths using [[AVX2]] or [[AVX-512]].
<pre>
/**
 * popCountWeight8
 * @author Gerd Isenberg
 * @param bb vector of eight bitboards
 *        weight vector of eight short weights
 * @return sum(0,7) popcnt(bb[i]) * weight[i]
 */
int popCountWeight8(const U64 bb[8], const short weight[8]) {
   static const U64 XMM_ALIGN masks[6] = {
      C64(0x5555555555555555), C64(0x5555555555555555),
      C64(0x3333333333333333), C64(0x3333333333333333),
      C64(0x0f0f0f0f0f0f0f0f), C64(0x0f0f0f0f0f0f0f0f)
   };
   const __m128i* pM = (const __m128i*) masks;
   const __m128i* pb = (const __m128i*) bb;
   __m128i v = pb[0], w = pb[1], x = pb[2], y = pb[3];
   v = _mm_sub_epi8(v, _mm_and_si128(_mm_srli_epi64(v, 1), pM[0]));
   w = _mm_sub_epi8(w, _mm_and_si128(_mm_srli_epi64(w, 1), pM[0]));
   x = _mm_sub_epi8(x, _mm_and_si128(_mm_srli_epi64(x, 1), pM[0]));
   y = _mm_sub_epi8(y, _mm_and_si128(_mm_srli_epi64(y, 1), pM[0]));

   v = _mm_add_epi8(_mm_and_si128(v, pM[1]), _mm_and_si128(_mm_srli_epi64(v, 2), pM[1]));
   w = _mm_add_epi8(_mm_and_si128(w, pM[1]), _mm_and_si128(_mm_srli_epi64(w, 2), pM[1]));
   x = _mm_add_epi8(_mm_and_si128(x, pM[1]), _mm_and_si128(_mm_srli_epi64(x, 2), pM[1]));
   y = _mm_add_epi8(_mm_and_si128(y, pM[1]), _mm_and_si128(_mm_srli_epi64(y, 2), pM[1]));

   v = _mm_and_si128(_mm_add_epi8 (v, _mm_srli_epi64(v, 4)), pM[2]);
   w = _mm_and_si128(_mm_add_epi8 (w, _mm_srli_epi64(w, 4)), pM[2]);
   x = _mm_and_si128(_mm_add_epi8 (x, _mm_srli_epi64(x, 4)), pM[2]);
   y = _mm_and_si128(_mm_add_epi8 (y, _mm_srli_epi64(y, 4)), pM[2]);

   __m128i z = _mm_setzero_si128();
   v = _mm_packs_epi16(_mm_sad_epu8(v, z), _mm_sad_epu8(w, z));
   x = _mm_packs_epi16(_mm_sad_epu8(x, z), _mm_sad_epu8(y, z));
   const __m128i* pW = (const __m128i*) weight;
   v = _mm_madd_epi16 (_mm_packs_epi16(v, x), pW[0]);
   v = _mm_add_epi32  (v, _mm_srli_si128(v, 4));
   v = _mm_add_epi32  (v, _mm_srli_si128(v, 8));
   return _mm_cvtsi128_si32(v);
}
</pre>
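A hypothetical call (the attack sets and weights below are only an example, not from the original) weighting the attack counts of eight piece sets:
<pre>
// assumed: eight attack bitboards, e.g. white/black knights, bishops, rooks, queens
U64   XMM_ALIGN attackSets[8];
short XMM_ALIGN weights[8] = { 3, -3, 3, -3, 4, -4, 6, -6 };
// ... fill attackSets with per-piece attack bitboards ...
int attackBalance = popCountWeight8(attackSets, weights);
</pre>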
<span id="SSE2WrapperinCpp"></span>
=SSE2-Wrapper in C++=
[[Cpp|C++]] quite efficiently allows to wrap all the intrinsics and to write classes with appropriate operators overloaded. The proposal here overloads the + and - operators for byte- or rank-wise arithmetic, while shifts work on 64-bit entities, as often used in the mentioned SSE2 routines or in [[Kogge-Stone Algorithm|Kogge-Stone]] fill stuff. A base class for the memory layout and two derivations allow to implement routines either with SSE2 or with general purpose instructions - or with any other available SIMD architecture like [[AltiVec]]. For instance a [[Quad-Bitboards|quad-bitboard]] attack-getter:
<pre>
// T is either XMM or GPR
template <class T> inline
void eastAttacks(QBB& t, const QBB& s, U64 occ) {
   T* pt = (T*)&t;
   T r0(s.bb[0]);
   T r1(s.bb[2]);
   T o0(occ, occ);
   T o1 = o0 | r1;
   o0 = o0 | r0;
   pt[0] = o0 ^ ((o0 ^ r0) - r0);
   pt[1] = o1 ^ ((o1 ^ r1) - r1);
}
</pre>
A proposal for a class skeleton:
<pre>
class DBB
{
   friend class XMM;
   friend class GPR;
public:
   DBB(){}
   ... more constructors
public:
   union
   {
      __m128i x;  // this intrinsic type is wrapped here
      U64 b[2];
   };
};

// intrinsic sse2 xmm wrapper
class XMM : public DBB
{
public:
   XMM(){}
   XMM(U64 b)    {x = _mm_cvtsi64x_si128(b);}
   XMM(__m128i a){x = a;}
   XMM(U64 high, U64 low) ...
   ... more constructors

   XMM& operator>>=(int sh) {x = _mm_srli_epi64(x, sh); return *this;}
   XMM& operator<<=(int sh) {x = _mm_slli_epi64(x, sh); return *this;}
   XMM& operator&=(const XMM &a) {x = _mm_and_si128(x, a.x); return *this;}
   XMM& operator|=(const XMM &a) {x = _mm_or_si128 (x, a.x); return *this;}
   XMM& operator^=(const XMM &a) {x = _mm_xor_si128(x, a.x); return *this;}

   // byte- or rankwise arithmetic
   XMM& operator+=(const XMM &a) {x = _mm_add_epi8(x, a.x); return *this;}
   XMM& operator-=(const XMM &a) {x = _mm_sub_epi8(x, a.x); return *this;}

   friend XMM operator>>(const XMM &a, int sh) {return XMM(_mm_srli_epi64(a.x, sh));}
   friend XMM operator<<(const XMM &a, int sh) {return XMM(_mm_slli_epi64(a.x, sh));}
   friend XMM operator& (const XMM &a, const XMM &b) {return XMM(_mm_and_si128(a.x, b.x));}
   friend XMM operator| (const XMM &a, const XMM &b) {return XMM(_mm_or_si128 (a.x, b.x));}
   friend XMM operator^ (const XMM &a, const XMM &b) {return XMM(_mm_xor_si128(a.x, b.x));}
   friend XMM operator+ (const XMM &a, const XMM &b) {return XMM(_mm_add_epi8 (a.x, b.x));}
   friend XMM operator- (const XMM &a, const XMM &b) {return XMM(_mm_sub_epi8 (a.x, b.x));}
   ...
};

// pure C wrapper
class GPR : public DBB
{
   ...
};
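The GPR derivation is left open in the original proposal; a rough sketch (an assumption, implementing only the operators used by the eastAttacks template above) emulates the byte-wise arithmetic with [[SIMD and SWAR Techniques|SWAR]] tricks on the two U64 members:
<pre>
// sketch only: general purpose register fallback with SWAR byte arithmetic
class GPR : public DBB
{
public:
   GPR(){}
   GPR(U64 lo)           {b[0] = lo;  b[1] = 0;}
   GPR(U64 high, U64 low){b[0] = low; b[1] = high;}

   GPR& operator|=(const GPR &a) {b[0] |= a.b[0]; b[1] |= a.b[1]; return *this;}
   GPR& operator^=(const GPR &a) {b[0] ^= a.b[0]; b[1] ^= a.b[1]; return *this;}
   GPR& operator-=(const GPR &a) {b[0] = subBytes(b[0], a.b[0]);
                                  b[1] = subBytes(b[1], a.b[1]); return *this;}

   friend GPR operator| (const GPR &a, const GPR &b) {GPR r(a); r |= b; return r;}
   friend GPR operator^ (const GPR &a, const GPR &b) {GPR r(a); r ^= b; return r;}
   friend GPR operator- (const GPR &a, const GPR &b) {GPR r(a); r -= b; return r;}

private:
   static U64 subBytes(U64 a, U64 b) {  // per byte (a - b), no borrow into the next byte
      const U64 H = C64(0x8080808080808080);
      return ((a | H) - (b & ~H)) ^ ((a ^ ~b) & H);
   }
};
</pre>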

=See also=
* [[AltiVec]]
* [[AVX]]
* [[AVX2]]
* [[AVX-512]]
* [[DirGolem]]
* [[MMX]]
* [[SIMD and SWAR Techniques]]
* [[SIMD techniques]] for [[Sliding Piece Attacks]] with [[Bitboards]]
* [[Quad-Bitboards#SSE2Conversions|SSE2 Conversions]] of [[Quad-Bitboards]]
* [[SSE]]
* [[SSE3]]
* [[SSSE3]]
* [[SSE4]]
* [[SSE5]]
* [[XOP]]

=Manuals=
* [https://support.amd.com/TechDocs/26568.pdf AMD64 Architecture Programmer’s Manual Volume 4: 128-Bit and 256-Bit Media Instructions] (pdf)
* [https://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-software-developer-vol-2a-manual.html Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 2A: Instruction Set Reference, A-L]
* [https://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-software-developer-vol-2b-manual.html Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 2B: Instruction Set Reference, M-U]

=Forum Posts=
* [https://www.stmintz.com/ccc/index.php?id=343790 A SIMD idea, eg. Piece/Gain of a capture target] by [[Gerd Isenberg]], [[CCC]], January 21, 2004 » [[Move Ordering]]
* [https://www.stmintz.com/ccc/index.php?id=377546 SSE2 bit[64] * byte[64] dot product] by [[Gerd Isenberg]], [[CCC]], July 17, 2004
* [https://groups.google.com/group/comp.lang.asm.x86/browse_frm/thread/11095ec26e3ed536 SSE2-Sort within a register] by [[Gerd Isenberg]], [https://groups.google.com/group/comp.lang.asm.x86/topics comp.lang.asm.x86], January 08, 2005
* [https://www.stmintz.com/ccc/index.php?id=405396 planning a SSE-optimized chess engine] by [[Aart Bik]], [[CCC]], January 12, 2005
* [https://www.stmintz.com/ccc/index.php?id=418648 On SSE2-Intrinsics] by [[Gerd Isenberg]], [[CCC]], March 28, 2005
* [http://www.talkchess.com/forum/viewtopic.php?t=30471 Problem with functions not inlining] by [[Gregory Strong]], [[CCC]], November 04, 2009

=External Links=
* [https://en.wikipedia.org/wiki/Streaming_SIMD_Extensions SSE from Wikipedia]
* [https://en.wikipedia.org/wiki/SSE2 SSE2 from Wikipedia]
* [http://stackoverflow.com/questions/661338/sse-sse2-and-sse3-for-gnu-c SSE SSE2 and SSE3 for GNU C++], [https://en.wikipedia.org/wiki/Stack_Overflow Stack Overflow]
* [http://stackoverflow.com/questions/7646018/sse-instructions-single-memory-access Concurrency - SSE instructions: single memory access], [https://en.wikipedia.org/wiki/Stack_Overflow Stack Overflow]
* [http://www.agner.org/optimize/#manuals Agner Fog's manuals]
: [http://www.agner.org/optimize/calling_conventions.pdf Calling conventions for different C++ compilers and operating systems] (pdf) by [http://www.agner.org/ Agner Fog]
* [http://www.agner.org/optimize/blog/ Agner`s CPU blog] by [http://www.agner.org/ Agner Fog]
* [http://developer.amd.com/cpu/Libraries/sseplus/Pages/default.aspx SSEPlus Project] from [http://developer.amd.com/pages/default.aspx AMD Developer Central]
* [http://sseplus.sourceforge.net/index.html SSEPlus Project Documentation]
* [http://software.intel.com/sites/landingpage/IntrinsicsGuide/ Intel Intrinsics Guide]
* [[:Category:Kraan|Kraan]] <ref>[http://www.kraan.de/ Kraan homepage] (German)</ref> - [https://en.wikipedia.org/wiki/Full_Throttle Vollgas] [https://en.wikipedia.org/wiki/Ahoy_%28greeting%29 Ahoi] - Live April 25, 2009, [http://www.raetsche.de/index.php Rätschenmühle], [https://en.wikipedia.org/wiki/Geislingen_an_der_Steige Geislingen], [https://en.wikipedia.org/wiki/YouTube YouTube] Video
: [https://en.wikipedia.org/wiki/Hellmut_Hattler Hellmut Hattler], [https://en.wikipedia.org/wiki/Peter_Wolbrandt Peter Wolbrandt], [http://www1.sticks.de/magazine/0309/fride.htm Jan Fride Wolbrandt]
: {{#evu:https://www.youtube.com/watch?v=A11bAJOOqF8|alignment=left|valignment=top}}

=References=
<references />

'''[[x86|Up one Level]]'''
[[Category:Kraan]]
