MMX




MMX is a SIMD (Single instruction, multiple data) instruction set of x86 processors, announced by Intel in 1996 and introduced with the Pentium MMX in January 1997. In 1998, AMD extended Intel's MMX with 3DNow!, which mostly adds packed single-precision floating-point operations. MMX instructions are available through assembly language, inline assembly and C compiler intrinsics, which use the __m64 intrinsic data type.

=Register File=
MMX uses eight 64-bit registers, MM0 through MM7, each treated as a vector of eight bytes, four words, two double words or one quad word. The eight registers are aliased onto the existing x87 FPU stack registers, and are therefore implicitly saved and restored during context switches by existing operating systems, which already save the x87 state. The drawback is that it is somewhat difficult to mix x87 floating point and MMX code in the same application, since the original emms instruction, required to leave the MMX state before x87 instructions are used again, was relatively slow (AMD later added the faster femms with 3DNow!).
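A minimal sketch with the MMX intrinsics, assuming GCC or Clang on x86: the same 64-bit register contents added as bytes (PADDB), words (PADDW) or double words (PADDD) differ only in where carries are cut off, and _mm_empty emits EMMS so that x87 code may run again. The helpers to_m64, from_m64 and add_lanes are hypothetical names used for illustration only.

```c
#include <mmintrin.h>  /* MMX intrinsics: __m64, _mm_add_pi8/16/32, _mm_empty */
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: copy a 64-bit integer into and out of an __m64
   value with memcpy, avoiding conversion intrinsics that exist only in
   64-bit mode. */
static __m64 to_m64(uint64_t x) { __m64 m; memcpy(&m, &x, sizeof m); return m; }
static uint64_t from_m64(__m64 m) { uint64_t x; memcpy(&x, &m, sizeof x); return x; }

/* Hypothetical helper: add a and b lane-wise with 8-, 16- or 32-bit lanes.
   Carries propagate only inside a lane; each lane wraps around on overflow. */
static uint64_t add_lanes(uint64_t a, uint64_t b, int bits) {
    __m64 x = to_m64(a), y = to_m64(b), r;
    switch (bits) {
    case 8:  r = _mm_add_pi8 (x, y); break;  /* PADDB: eight bytes     */
    case 16: r = _mm_add_pi16(x, y); break;  /* PADDW: four words      */
    default: r = _mm_add_pi32(x, y); break;  /* PADDD: two double words */
    }
    uint64_t result = from_m64(r);
    _mm_empty();  /* EMMS: leave MMX state before any x87 code runs */
    return result;
}
```

For a = 0xFFFF and b = 1, add_lanes returns 0xFF00 with 8-bit lanes (the carry out of the low byte is dropped), 0 with 16-bit lanes and 0x10000 with 32-bit lanes.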

=MMX and 64-bit Windows=
Since 64-bit Windows applications mainly use SSE for floating point arithmetic, there was some early confusion about whether the MMX/x87 registers are safe to use, given context switching. A quote from Agner Fog's calling conventions manual:

6.1 Can floating point registers be used in 64-bit Windows?

There has been widespread confusion about whether 64-bit Windows allows the use of the floating point registers ST(0)-ST(7) and the MM0 - MM7 registers that are aliased upon these. One early technical document found at Microsoft's website says "x87/MMX registers are unavailable to Native Windows64 applications" (Rich Brunner: Technical Details Of Microsoft® Windows® For The AMD64 Platform, Dec. 2003). An AMD document says: "64-bit Microsoft Windows does not strongly support MMX and 3Dnow! instruction sets in the 64-bit native mode" (Porting and Optimizing Multimedia Codecs for AMD64 architecture on Microsoft® Windows®, July 21, 2004). A document in Microsoft's MSDN says: "A caller must also handle the following issues when calling a callee: [...] Legacy Floating-Point Support: The MMX and floating-point stack registers (MM0-MM7/ST0-ST7) are volatile. That is, these legacy floating-point stack registers do not have their state preserved across context switches" (MSDN: Kernel-Mode Driver Architecture: Windows DDK: Other Calling Convention Process Issues. Preliminary, June 14, 2004; February 18, 2005).

This description is nonsense because it confuses saving registers across function calls and saving registers across context switches. Some versions of the Microsoft assembler ml64 (e.g. v. 8.00.40310) give the following message when attempts are made to use floating point registers in 64 bit mode: "error A2222: x87 and MMX instructions disallowed; legacy FP state not saved in Win64". However, a public discussion forum quotes the following answers from Microsoft engineers regarding this issue: "From: Program Manager in Visual C++ Group, Sent: Thursday, May 26, 2005 10:38 AM. It does preserve the state. It's the DDK page that has stale information, which I've requested it to be changed. Let them know that the OS does preserve state of x87 and MMX registers on context switches." and "From: Software Engineer in Windows Kernel Group, Sent: Thursday, May 26, 2005 11:06 AM. For user threads the state of legacy floating point is preserved at context switch. But it is not true for kernel threads. Kernel mode drivers can not use legacy floating point instructions."

The issue has finally been resolved with the long overdue publication of a more detailed ABI for x64 Windows in the form of a document entitled "x64 Software Conventions", well hidden in the bin directory (not the help directory) of some compiler packages. This document says: "The MMX and floating-point stack registers (MM0-MM7/ST0-ST7) are preserved across context switches. There is no explicit calling convention for these registers. The use of these registers is strictly prohibited in kernel mode code." The same text has later appeared at the Microsoft website.

=Applications=
Almost the same bitboard applications as mentioned in the SSE2 application samples are possible with MMX, though with one scalar bitboard per register rather than a vector of two.

East Fill
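For instance, East Attacks based on SIMD-wise Fill by Subtraction, where the byte-wise subtraction keeps borrows inside each rank:

```c
#include <mmintrin.h>  /* MMX intrinsics, __m64 */

/* caller must issue _mm_empty() before any subsequent x87 code */
__m64 eastAttacks (__m64 occ, __m64 rooks) {
   __m64 tmp;
   occ = _mm_or_si64 (occ, rooks);  // make rooks member of occupied
   tmp = _mm_xor_si64(occ, rooks);  // occ - rooks
   tmp = _mm_sub_pi8 (tmp, rooks);  // occ - 2*rooks, per byte (rank)
   return _mm_xor_si64(occ, tmp);   // occ ^ (occ - 2*rooks)
}
```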
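Where MMX is not available, the same Fill by Subtraction can be sketched with SWAR arithmetic in a general-purpose 64-bit register. The helper psub8 (a hypothetical name) emulates the byte-wise PSUBB / _mm_sub_pi8 with a standard SWAR trick: setting the high bit of every byte of the minuend and clearing it in the subtrahend stops borrows at byte boundaries, and the final XOR restores the correct high bits. Like the MMX routine, this assumes one rank per byte with bit 0 = a1 (little-endian rank-file mapping), so east means towards higher bits within a byte.

```c
#include <stdint.h>

/* Hypothetical helper: byte-wise subtraction x - y (what PSUBB does),
   emulated in a general-purpose register. Each byte wraps modulo 256
   and no borrow crosses into the neighbouring byte. */
static uint64_t psub8(uint64_t x, uint64_t y) {
    const uint64_t H = 0x8080808080808080ULL;  /* high bit of every byte */
    return ((x | H) - (y & ~H)) ^ ((x ^ ~y) & H);
}

/* East attacks by Fill by Subtraction: occ ^ (occ - 2*rooks), per rank.
   Assumes bit 0 = a1 ... bit 7 = h1, one byte per rank. */
static uint64_t eastAttacksSWAR(uint64_t occ, uint64_t rooks) {
    occ |= rooks;                              /* make rooks member of occupied */
    uint64_t tmp = psub8(occ ^ rooks, rooks);  /* (occ - rooks) - rooks per byte */
    return occ ^ tmp;
}
```

For a rook on b1 with a blocker on e1, occ = 0x12 and rooks = 0x02: occ - 2*rooks = 0x0E, and 0x12 ^ 0x0E = 0x1C, i.e. the squares c1, d1 and e1. A rook on h1 yields no attacks, since the borrow of 0x00 - 0x80 stays inside the first rank instead of wrapping into the second.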

MMX Popcount
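AMD's proposed Efficient 64-Bit Population Count using MMX, 3DNow! and inline assembly. Note that PSADBW is not an original MMX instruction but one of the AMD extensions to the MMX instruction set (part of SSE on Intel processors), and FEMMS is a 3DNow! instruction: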


```c
#include "amd3d.h"

__declspec (naked) unsigned int __stdcall popcount64 (unsigned __int64 v) {
  static const __int64 C55 = 0x5555555555555555;
  static const __int64 C33 = 0x3333333333333333;
  static const __int64 C0F = 0x0F0F0F0F0F0F0F0F;
  __asm {
    MOVD      MM0, [ESP+4] ;v_low
    PUNPCKLDQ MM0, [ESP+8] ;v
    MOVQ      MM1, MM0     ;v
    PSRLD     MM0, 1       ;v >> 1
    PAND      MM0, [C55]   ;(v >> 1) & 0x55555555
    PSUBD     MM1, MM0     ;w = v - ((v >> 1) & 0x55555555)
    MOVQ      MM0, MM1     ;w
    PSRLD     MM1, 2       ;w >> 2
    PAND      MM0, [C33]   ;w & 0x33333333
    PAND      MM1, [C33]   ;(w >> 2) & 0x33333333
    PADDD     MM0, MM1     ;x = (w & 0x33333333) + ((w >> 2) & 0x33333333)
    MOVQ      MM1, MM0     ;x
    PSRLD     MM0, 4       ;x >> 4
    PADDD     MM0, MM1     ;x + (x >> 4)
    PAND      MM0, [C0F]   ;y = (x + (x >> 4) & 0x0F0F0F0F)
    PXOR      MM1, MM1     ;0
    PSADBW    MM0, MM1     ;sum across all 8 bytes
    MOVD      EAX, MM0     ;result in EAX per calling convention
    FEMMS                  ;clear MMX state
    RET 8                  ;pop 8-byte argument off
  }
}
```
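The same bit-twiddling sequence can be written with C compiler intrinsics instead of naked inline assembly, which lets GCC or Clang compile it in 32-bit as well as 64-bit mode. This is a sketch, not AMD's code: the function name popcount64_mmx is hypothetical, _mm_sad_pu8 (PSADBW) requires SSE or AMD's MMX extensions, and _mm_empty issues EMMS rather than the 3DNow! FEMMS. MSVC, for instance, does not provide the __m64 intrinsics for x64 targets.

```c
#include <mmintrin.h>   /* MMX intrinsics */
#include <xmmintrin.h>  /* _mm_sad_pu8 (PSADBW) */
#include <stdint.h>

/* Hypothetical name: MMX/intrinsics version of the routine above. */
static unsigned popcount64_mmx(uint64_t v) {
    /* PUNPCKLDQ equivalent: high dword in lane 1, low dword in lane 0 */
    __m64 x = _mm_set_pi32((int)(uint32_t)(v >> 32), (int)(uint32_t)v);
    const __m64 c55 = _mm_set1_pi32(0x55555555);
    const __m64 c33 = _mm_set1_pi32(0x33333333);
    const __m64 c0f = _mm_set1_pi32(0x0F0F0F0F);

    /* w = v - ((v >> 1) & 0x55...): bit counts per 2-bit field */
    x = _mm_sub_pi32(x, _mm_and_si64(_mm_srli_pi32(x, 1), c55));
    /* (w & 0x33...) + ((w >> 2) & 0x33...): counts per nibble */
    x = _mm_add_pi32(_mm_and_si64(x, c33),
                     _mm_and_si64(_mm_srli_pi32(x, 2), c33));
    /* (x + (x >> 4)) & 0x0F...: counts per byte */
    x = _mm_and_si64(_mm_add_pi32(x, _mm_srli_pi32(x, 4)), c0f);
    /* PSADBW against zero sums the eight byte counts into the low word */
    x = _mm_sad_pu8(x, _mm_setzero_si64());

    unsigned result = (unsigned)_mm_cvtsi64_si32(x);
    _mm_empty();  /* EMMS: clear MMX state before any x87 code */
    return result;
}
```

For example, popcount64_mmx(0x0123456789ABCDEF) counts 32 set bits, and an all-ones argument gives 64.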

=See also=
 * AltiVec
 * SIMD and SWAR Techniques
 * SSE2

=Manuals=

Intel

 * Intel Architecture Software Developer’s Manual, Volume 1: Basic Architecture

AMD

 * AMD Athlon Processor x86 Code Optimization Guide (pdf)
 * 3DNow! Technology Manual (pdf)
 * AMD Extensions to the 3DNow! and MMX Instruction Sets Manual (pdf)
 * AMD64 Architecture Volume 5: 64-Bit Media and x87 Floating-Point Instructions (pdf)

=Forum Posts=
 * Using mmx instructions by Frans Morsch, comp.lang.asm.x86, February 03, 2000
 * Re: Atomic write of 64 bits by Frans Morsch, comp.lang.asm.x86, September 25, 2000
 * Re: Chezzz 1.0.1 - problem solved - for David Rasmussen by David Rasmussen, CCC, February 05, 2003 » Population Count, Chezzz

=External Links=
 * MMX (instruction set) from Wikipedia
 * 3DNow!
 * Intel Intrinsics Guide
