Quad-core AMD Opteron processor ^[1]

x86-64 or x64,
a 64-bit extension of x86, designed by AMD as the Hammer or K8 architecture and introduced with the Athlon 64 and Opteron CPUs. It has been cloned by Intel under the name EM64T and later Intel 64. Besides the 64-bit general purpose extensions, x86-64 supports the MMX and x87 instruction sets as well as the 128-bit SSE and SSE2 instruction sets. Depending on the processor, as reported by the CPUID instruction, further SIMD Streaming Extensions are available, such as SSE3, SSSE3 (initially Intel only), SSE4 (Core2, K10), AVX, AVX2 and AVX-512, as well as AMD's 3DNow!, Enhanced 3DNow! and XOP.
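Since the optional extensions have to be queried with CPUID at run time, a minimal sketch of such a check may be helpful. It assumes GCC or Clang, whose `<cpuid.h>` header provides `__get_cpuid`; MSVC would use `__cpuid` from `<intrin.h>` instead. The struct `cpu_features` and the function `query_cpu_features` are hypothetical names chosen for this illustration, and only a few flags of CPUID leaf 1 are decoded.

```c
#include <cpuid.h>   /* GCC/Clang helper header (assumption: not MSVC) */

/* A few feature flags reported by CPUID leaf 1. */
typedef struct {
    int sse2, sse3, ssse3, sse41, sse42, popcnt, avx;
} cpu_features;

/* Hypothetical helper: returns 1 on success, 0 if leaf 1 is unavailable. */
static int query_cpu_features(cpu_features *f)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;

    f->sse2   = (edx >> 26) & 1;
    f->sse3   = (ecx >>  0) & 1;
    f->ssse3  = (ecx >>  9) & 1;
    f->sse41  = (ecx >> 19) & 1;
    f->sse42  = (ecx >> 20) & 1;
    f->popcnt = (ecx >> 23) & 1;
    f->avx    = (ecx >> 28) & 1;  /* CPU flag only; OS support is not checked here */
    return 1;
}
```

On any x86-64 processor the `sse2` flag is expected to be set, since SSE2 belongs to the base architecture. The other flags depend on the processor generation.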

Register File

x86-64 doubles the number of x86 general purpose and XMM registers from eight to 16 each.

General Purpose

The 16 general purpose registers may be treated as 64-bit Quad Word (bitboard), 32-bit Double Word, 16-bit Word and low Byte. The high Byte (bits 8 to 15) is only accessible for the four legacy registers RAX, RBX, RCX and RDX ^[2]:

64	32	16	8 high	8 low	Purpose
RAX	EAX	AX	AH	AL	GP, Accumulator
RBX	EBX	BX	BH	BL	GP, Base Register
RCX	ECX	CX	CH	CL	GP, Counter, variable shift, rotate via CL
RDX	EDX	DX	DH	DL	GP, high Accumulator mul/div
RSI	ESI	SI	-	SIL	GP, Source Index
RDI	EDI	DI	-	DIL	GP, Destination Index
RSP	ESP	SP	-	SPL	Stack Pointer
RBP	EBP	BP	-	BPL	GP, Base Pointer
R8	R8D	R8W	-	R8B	GP
R..	R..D	R..W	-	R..B	GP
R15	R15D	R15W	-	R15B	GP
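The sub-register names in the table are views of the same 64 bits. As a rough illustration in C, the following sketch extracts the corresponding parts of a 64-bit value by truncation and shifting. The function names are hypothetical, and the code only mirrors the naming scheme; it does not access the registers themselves.

```c
#include <stdint.h>

/* Views of a 64-bit value that mirror the RAX / EAX / AX / AH / AL naming. */
static uint32_t low_dword(uint64_t r) { return (uint32_t)r; }        /* like EAX: bits 0..31 */
static uint16_t low_word(uint64_t r)  { return (uint16_t)r; }        /* like AX:  bits 0..15 */
static uint8_t  low_byte(uint64_t r)  { return (uint8_t)r; }         /* like AL:  bits 0..7  */
static uint8_t  high_byte(uint64_t r) { return (uint8_t)(r >> 8); }  /* like AH:  bits 8..15 */
```

For the value `0x1122334455667788`, these helpers yield `0x55667788`, `0x7788`, `0x88` and `0x77` respectively.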

MMX

Eight 64-bit MMX-Registers: MM0 - MM7. Treated as Double, Quad Word, or as vector of two Floats or Double Words, four Words or eight Bytes.

SSE/SSE*

Sixteen 128-bit XMM-Registers: XMM0 - XMM15. Treated as vector of two Doubles or Quad Words, as vector of four Floats or Double Words, and as vector of eight Words or 16 Bytes.
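The different integer interpretations of one XMM register can be sketched in C with a union. This is an illustration only, assuming a compiler that provides the SSE2 header `<emmintrin.h>`; the type `xmm_view` and the function `view_of` are hypothetical names, not part of any library.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* The same 128 bits seen as quad words, double words, words or bytes. */
typedef union {
    uint64_t q[2];
    uint32_t d[4];
    uint16_t w[8];
    uint8_t  b[16];
} xmm_view;

/* Copy the register content to memory so the lanes can be inspected. */
static xmm_view view_of(__m128i x)
{
    xmm_view v;
    _mm_storeu_si128((__m128i *)&v, x);  /* unaligned store (movdqu) */
    return v;
}
```

Since x86-64 is little-endian, `b[0]` holds the least significant byte of `q[0]`, and `b[15]` the most significant byte of `q[1]`.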

AVX, AVX2/XOP

Intel Sandy Bridge and AMD Bulldozer: Sixteen 256-bit YMM-Registers: YMM0 - YMM15 (the XMM registers are their lower halves). Treated as vector of four Doubles or Quad Words, as vector of eight Floats or Double Words, and as vector of 16 Words or 32 Bytes.

AVX-512

Intel Xeon Phi (2015): 32 512-bit ZMM-Registers: ZMM0 - ZMM31, plus eight vector mask registers (k0 - k7).

Instructions

Instructions useful for bitboard applications are by default not supported by high-level programming languages. They are available through (inline) Assembly or the compiler intrinsics of various C-Compilers ^[3].

General Purpose

x86-64 Instructions, C-Intrinsic reference from x64 (amd64) Intrinsics List | Microsoft Docs

Mnemonic	Description	C-Intrinsic	Remark
bsf	bit scan forward	_BitScanForward64
bsr	bit scan reverse	_BitScanReverse64
bswap	byte swap	_byteswap_uint64
bt	bit test	_bittest64
btc	bit test and complement	_bittestandcomplement64
btr	bit test and reset	_bittestandreset64
bts	bit test and set	_bittestandset64
cpuid	CPU identification	__cpuid	cpuid
imul	signed multiplication	__mulh, _mul128
lzcnt	leading zero count	__lzcnt16, __lzcnt, __lzcnt64	cpuid, SSE4a
mul	unsigned multiplication	__umulh, _umul128
popcnt	population count	__popcnt16, __popcnt, __popcnt64	cpuid, SSE4.2, SSE4a
rdtsc	read time-stamp counter	__rdtsc
rol, ror	rotate left, right	_rotl, _rotl64, _rotr, _rotr64
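The intrinsics in the table above are those of the Microsoft compiler. As a sketch of how a bitboard program might wrap some of them portably, the following code falls back to the GCC/Clang builtins `__builtin_ctzll`, `__builtin_clzll`, `__builtin_popcountll` and `__builtin_bswap64` when `_MSC_VER` is not defined. The wrapper names are hypothetical. Whether a builtin compiles to a single instruction such as popcnt depends on the compiler's target options.

```c
#include <stdint.h>
#if defined(_MSC_VER)
#include <intrin.h>
#include <stdlib.h>
#endif

/* Index of the least significant one bit (bsf); bb must be non-zero. */
static int bitscan_forward(uint64_t bb)
{
#if defined(_MSC_VER)
    unsigned long idx;
    _BitScanForward64(&idx, bb);
    return (int)idx;
#else
    return __builtin_ctzll(bb);
#endif
}

/* Index of the most significant one bit (bsr); bb must be non-zero. */
static int bitscan_reverse(uint64_t bb)
{
#if defined(_MSC_VER)
    unsigned long idx;
    _BitScanReverse64(&idx, bb);
    return (int)idx;
#else
    return 63 - __builtin_clzll(bb);
#endif
}

/* Number of one bits (popcnt). */
static int popcount64(uint64_t bb)
{
#if defined(_MSC_VER)
    return (int)__popcnt64(bb);
#else
    return __builtin_popcountll(bb);
#endif
}

/* Reverse the byte order (bswap), e.g. to flip a bitboard vertically. */
static uint64_t byteswap64(uint64_t bb)
{
#if defined(_MSC_VER)
    return _byteswap_uint64(bb);
#else
    return __builtin_bswap64(bb);
#endif
}

/* Upper 64 bits of the unsigned 128-bit product (mul). */
static uint64_t mulhi_u64(uint64_t a, uint64_t b)
{
#if defined(_MSC_VER)
    return __umulh(a, b);
#else
    return (uint64_t)(((unsigned __int128)a * b) >> 64);
#endif
}
```

Both bit scans are undefined for an empty bitboard, so callers are expected to test for zero first.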

Bit-Manipulation

SSE2

x86 and x86-64 - SSE2 Instructions, C-Intrinsic reference from Intel Intrinsics Guide

Mnemonic	Description	C-Intrinsic
bitwise logical		return		parameter
pand	packed and, r := a & b	__m128i	_mm_and_si128	(__m128i a, __m128i b)
pandn	packed and not, r := ~a & b	__m128i	_mm_andnot_si128	(__m128i a, __m128i b)
por	packed or, r := a | b	__m128i	_mm_or_si128	(__m128i a, __m128i b)
pxor	packed xor, r := a ^ b	__m128i	_mm_xor_si128	(__m128i a, __m128i b)
quad word shifts		return		parameter
psrlq	packed shift right logical quad	__m128i	_mm_srl_epi64	(__m128i a, __m128i cnt)
psrlq	immediate	__m128i	_mm_srli_epi64	(__m128i a, int cnt)
psllq	packed shift left logical quad	__m128i	_mm_sll_epi64	(__m128i a, __m128i cnt)
psllq	immediate	__m128i	_mm_slli_epi64	(__m128i a, int cnt)
arithmetical		return		parameter
paddb	packed add bytes	__m128i	_mm_add_epi8	(__m128i a, __m128i b)
psubb	packed subtract bytes	__m128i	_mm_sub_epi8	(__m128i a, __m128i b)
psadbw	packed sum of absolute differences of bytes into a word	__m128i	_mm_sad_epu8	(__m128i a, __m128i b)
pmaxsw	packed maximum signed words	__m128i	_mm_max_epi16	(__m128i a, __m128i b)
pmaxub	packed maximum unsigned bytes	__m128i	_mm_max_epu8	(__m128i a, __m128i b)
pminsw	packed minimum signed words	__m128i	_mm_min_epi16	(__m128i a, __m128i b)
pminub	packed minimum unsigned bytes	__m128i	_mm_min_epu8	(__m128i a, __m128i b)
pcmpeqb	packed compare equal bytes	__m128i	_mm_cmpeq_epi8	(__m128i a, __m128i b)
pmullw	packed multiply low signed (unsigned) word	__m128i	_mm_mullo_epi16	(__m128i a, __m128i b)
pmulhw	packed multiply high signed word	__m128i	_mm_mulhi_epi16	(__m128i a, __m128i b)
pmulhuw	packed multiply high unsigned word	__m128i	_mm_mulhi_epu16	(__m128i a, __m128i b)
pmaddwd	packed multiply words and add doublewords	__m128i	_mm_madd_epi16	(__m128i a, __m128i b)
unpack, shuffle		return		parameter
punpcklbw	unpack and interleave low bytes `hHgGfFeE:dDcCbBaA := xxxxxxxx:HGFEDCBA # xxxxxxxx:hgfedcba`	__m128i	_mm_unpacklo_epi8	(__m128i A, __m128i a)
punpckhbw	unpack and interleave high bytes `hHgGfFeE:dDcCbBaA := HGFEDCBA:xxxxxxxx # hgfedcba:xxxxxxxx`	__m128i	_mm_unpackhi_epi8	(__m128i A, __m128i a)
punpcklwd	unpack and interleave low words `dDcC:bBaA := xxxx:DCBA # xxxx:dcba`	__m128i	_mm_unpacklo_epi16	(__m128i A, __m128i a)
punpckhwd	unpack and interleave high words `dDcC:bBaA := DCBA:xxxx # dcba:xxxx`	__m128i	_mm_unpackhi_epi16	(__m128i A, __m128i a)
punpckldq	unpack and interleave low doublewords `bB:aA := xx:BA # xx:ba`	__m128i	_mm_unpacklo_epi32	(__m128i A, __m128i a)
punpckhdq	unpack and interleave high doublewords `bB:aA := BA:xx # ba:xx`	__m128i	_mm_unpackhi_epi32	(__m128i A, __m128i a)
punpcklqdq	unpack and interleave low quadwords `a:A := x:A # x:a`	__m128i	_mm_unpacklo_epi64	(__m128i A, __m128i a)
punpckhqdq	unpack and interleave high quadwords `a:A := A:x # a:x`	__m128i	_mm_unpackhi_epi64	(__m128i A, __m128i a)
pshuflw	packed shuffle low words	__m128i	_mm_shufflelo_epi16	(__m128i a, int imm)
pshufhw	packed shuffle high words	__m128i	_mm_shufflehi_epi16	(__m128i a, int imm)
pshufd	packed shuffle doublewords	__m128i	_mm_shuffle_epi32	(__m128i a, int imm)
load, store, moves		return		parameter
movdqa	move aligned double quadword xmm := *p	__m128i	_mm_load_si128	(__m128i const *p)
movdqu	move unaligned double quadword xmm := *p	__m128i	_mm_loadu_si128	(__m128i const *p)
movdqa	move aligned double quadword *p := xmm	void	_mm_store_si128	(__m128i *p, __m128i a)
movdqu	move unaligned double quadword *p := xmm	void	_mm_storeu_si128	(__m128i *p, __m128i a)
movq	move quadword, xmm := gp64	__m128i	_mm_cvtsi64_si128	(__int64 a)
movq	move quadword, gp64 := xmm	__int64	_mm_cvtsi128_si64	(__m128i a)
movd	move doubleword or quadword, xmm := gp64	__m128i	_mm_cvtsi64x_si128	(__int64 value)
movd	move doubleword, xmm := gp32	__m128i	_mm_cvtsi32_si128	(int a)
movd	move doubleword, gp32 := xmm	int	_mm_cvtsi128_si32	(__m128i a)
pextrw	extract packed word, gp16 := xmm[i]	int	_mm_extract_epi16	(__m128i a, int imm)
pinsrw	packed insert word, xmm[i] := gp16	__m128i	_mm_insert_epi16	(__m128i a, int b, int imm)
pmovmskb	packed move mask byte, gp32 := 16 sign-bits(xmm)	int	_mm_movemask_epi8	(__m128i a)
cache support		return		parameter
prefetch	prefetch data into cache	void	_mm_prefetch	(char const *p, int i)
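To show how a few of the SSE2 intrinsics listed above combine, here is a small sketch that processes two bitboards in one XMM register. It assumes a 64-bit target with `<emmintrin.h>`; all function names are hypothetical and the operations are chosen only for illustration.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Pack two 64-bit bitboards into one XMM register (lo in bits 0..63). */
static __m128i pack2(uint64_t lo, uint64_t hi)
{
    return _mm_set_epi64x((long long)hi, (long long)lo);
}

/* Extract the low quad word (movq gp64 := xmm). */
static uint64_t low_qword(__m128i x)
{
    return (uint64_t)_mm_cvtsi128_si64(x);
}

/* Extract the high quad word: punpckhqdq moves it down, then movq. */
static uint64_t high_qword(__m128i x)
{
    return (uint64_t)_mm_cvtsi128_si64(_mm_unpackhi_epi64(x, x));
}

/* Shift both bitboards left by eight bits at once (psllq immediate). */
static __m128i shift_left8_both(__m128i x)
{
    return _mm_slli_epi64(x, 8);
}

/* r := ~mask & x (pandn); note that the first operand is the inverted one. */
static __m128i clear_bits(__m128i x, __m128i mask)
{
    return _mm_andnot_si128(mask, x);
}

/* 16-bit mask with bit i set if byte i of x is zero (pcmpeqb + pmovmskb). */
static int zero_byte_mask(__m128i x)
{
    return _mm_movemask_epi8(_mm_cmpeq_epi8(x, _mm_setzero_si128()));
}
```

With an 8x8 board mapping where a rank occupies one byte, `shift_left8_both` moves both sets one rank, and `zero_byte_mask` reports the empty ranks of both bitboards in a single integer.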

Software

Operating Systems

Development

Assembly

C-Compiler

Publications

Georg Hager ^[5], Jan Treibig, Gerhard Wellein (2013). The Practitioner's Cookbook for Good Parallel Performance on Multi- and Many-Core Systems. RRZE, SC13, slides as pdf
S. Ali Mirsoleimani, Aske Plaat, Jaap van den Herik, Jos Vermaseren (2014). Performance analysis of a 240 thread tournament level MCTS Go program on the Intel Xeon Phi. CoRR abs/1409.4297 » Go
S. Ali Mirsoleimani, Aske Plaat, Jaap van den Herik, Jos Vermaseren (2015). Scaling Monte Carlo Tree Search on Intel Xeon Phi. CoRR abs/1507.04383 » Hex, MCTS, Parallel Search

Manuals

Agner Fog

AMD

AMD Tech Docs

Instructions

Optimization Guides

Intel

Instructions

Optimization Guides

Intel® 64 and IA-32 Architectures Optimization Reference Manual

Forum Posts

2003 ...

IA-64 vs OOOE (attn Taylor, Hyatt) by Tom Kerrigan, CCC, February 11, 2003 » Itanium
Opteron NUMA/SMP question by Matthew Hull, CCC, February 09, 2005 » NUMA, SMP
core2 popcnt by Frank Phillips, CCC, February 13, 2009 » Population Count

2010 ...

Ivy Bridge vs Sandy Bridge for computer chess by Larry Kaufman, CCC, September 15, 2012
What is your take on AMD's new processor? by Tano-Urayoan Russi Roman, CCC, October 24, 2012
Intel i3 L2 cache by Harm Geert Muller, CCC, January 28, 2014 » Memory ^[6]
Core Port Saturation by Natale Galioto, CCC, April 14, 2014

2015 ...

syzygy users (and Ronald) by Robert Hyatt, CCC, September 29, 2016 » BitScan, Population Count
New AMD processors by Ingo Althöfer, The Computer-go Archives, March 03, 2017
Ryzen and BMI2: Strange behavior and high latencies by DonnieTinyHands, Reddit, March 20, 2017 » AMD, BMI2
Is anyone here already using a Ryzen 1800X processor ? by Aloisio Ponti, CCC, March 26, 2017 » AMD
Intel CPU performance-loss by security-patch?!? by Stefan Pohl, CCC, January 03, 2018
Re: Komodo 11.3 by Mark Lefler, CCC, March 04, 2018 » AMD, BMI2 PEXT, Komodo 11.3
Some x64 assembler for the curious by Michael Sherwin, CCC, March 22, 2019 » Assembly
Ryzen problems - AGAIN! by noobpwnftw, CCC, October 22, 2019

2020 ...

Intel AMX with TMUL on Xeon Sapphire Rapids (2021?) by Srdja Matovic, CCC, July 05, 2020 » AMX
Can somebody compare the AMD Ryzen processors to the intel processors by George Pichard, CCC, March 24, 2021

External Links

x86-64 from Wikipedia
x86-64 calling conventions from Wikipedia
x86 Addressing modes from Wikipedia
X32 ABI from Wikipedia ^[7]
Stack frame layout on x86-64 from Eli Bendersky's website, September 06, 2011 » Stack
Introduction to x64 Assembly by Chris Lomont, March 2012

AMD

List of AMD CPU microarchitectures from Wikipedia
AMD K8 from Wikipedia
- Athlon 64
- Athlon 64 FX
- Opteron
- Athlon 64 X2 dual-core
- Turion 64 X2 dual-core
Inside AMD's Hammer: the 64-bit architecture behind the Opteron and Athlon 64 by Jon Stokes, ars technica, February 01, 2005
Understanding the detailed Architecture of AMD's 64 bit Core by Hans de Vries, September 21, 2003
AMD K8 from 7-Zip LZMA Benchmark
AMD K9 from Wikipedia
AMD 10h from Wikipedia
AMD K10 (Phenom) from 7-Zip LZMA Benchmark
Phenom triple-core, quad-core
Bobcat (microarchitecture) from Wikipedia
Bulldozer (microarchitecture) from Wikipedia
Piledriver (microarchitecture) from Wikipedia
Steamroller (microarchitecture) from Wikipedia
Excavator (microarchitecture) from Wikipedia
Zen (microarchitecture) from Wikipedia
Zen (first generation microarchitecture) from Wikipedia
Zen+ from Wikipedia
Zen 2 from Wikipedia
Zen 3 from Wikipedia
Zen 4 from Wikipedia

Intel

Instruction Sets

AVX-512 from Wikipedia » AVX-512

Security Vulnerability

References

[1] Die shot of AMD Opteron quad-core processor, Wikimedia Commons
[2] Introduction to x64 Assembly | Intel® Software
[3] Intel(R) C++ Compiler User and Reference Guides covers Intrinsics
[4] Advanced Matrix Extension (AMX) - x86 - WikiChip
[5] Georg Hager's Blog | Random thoughts on High Performance Computing
[6] Intel Nehalem Core i3
[7] Application binary interface from Wikipedia
