MMX is a SIMD (Single instruction, multiple data) instruction set of x86 processors, introduced in 1997 with Intel's Pentium MMX. In 1998, AMD enhanced Intel's MMX with the 3DNow! extension, mostly related to the float data type. MMX instructions are available through assembly language, inline assembly, and C compiler intrinsics along with the __m64 intrinsic data type ^[2] .

Register File

MMX uses eight 64-bit registers, MM0 through MM7, each treated as a vector of eight bytes, four words, two double words, or one quad word. The eight registers are aliased onto the existing x87 FPU stack registers and are therefore implicitly saved and restored during context switches by existing operating systems. The drawback is that it is somewhat difficult to mix x87 floating point and MMX data in the same application, since the original emms instruction that switches the register file was relatively slow.
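The four ways of viewing one 64-bit register can be sketched in portable C without any MMX instructions. This is only an illustration: the union type mmx_view and the helper add_words are hypothetical names introduced here, and the loop merely imitates the lane-wise wrap-around of a packed word add such as PADDW.

```c
#include <stdint.h>

/* Hypothetical helper type (not part of any MMX header):
   one 64-bit, MMX-sized value seen in four different ways. */
typedef union {
    uint8_t  b[8];  /* eight packed bytes      */
    uint16_t w[4];  /* four packed words       */
    uint32_t d[2];  /* two packed double words */
    uint64_t q;     /* one quad word           */
} mmx_view;

/* Packed word add in the spirit of PADDW: four independent
   16-bit lanes, each wrapping around on overflow without
   carrying into its neighbour. */
mmx_view add_words(mmx_view x, mmx_view y) {
    mmx_view r;
    for (int i = 0; i < 4; ++i)
        r.w[i] = (uint16_t)(x.w[i] + y.w[i]);
    return r;
}
```

Adding 0x0001000100010001 to 0x0001FFFF00020003 this way gives 0x0002000000030004: the lane holding 0xFFFF wraps to 0 and the carry is discarded rather than propagated into the next word.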

MMX and 64-bit Windows

Since 64-bit Windows applications generally use SSE for floating point arithmetic, there was some early confusion about whether the MMX/x87 registers are safe to use with respect to context switching. Quote from Agner Fog's Calling conventions manual: ^[3]

6.1 Can floating point registers be used in 64-bit Windows?

There has been widespread confusion about whether 64-bit Windows allows the use of the floating point registers ST(0)-ST(7) and the MM0 - MM7 registers that are aliased upon these. One early technical document found at Microsoft's website says "x87/MMX registers are unavailable to Native Windows64 applications" (Rich Brunner: Technical Details Of Microsoft® Windows® For The AMD64 Platform, Dec. 2003). An AMD document says: "64-bit Microsoft Windows does not strongly support MMX and 3Dnow! instruction sets in the 64-bit native mode" (Porting and Optimizing Multimedia Codecs for AMD64 architecture on Microsoft® Windows®, July 21, 2004). A document in Microsoft's MSDN says: "A caller must also handle the following issues when calling a callee: [...] Legacy Floating-Point Support: The MMX and floating-point stack registers (MM0-MM7/ST0-ST7) are volatile. That is, these legacy floating-point stack registers do not have their state preserved across context switches" (MSDN: Kernel-Mode Driver Architecture: Windows DDK: Other Calling Convention Process Issues. Preliminary, June 14, 2004; February 18, 2005).

This description is nonsense because it confuses saving registers across function calls and saving registers across context switches. Some versions of the Microsoft assembler ml64 (e.g. v. 8.00.40310) gives the following message when attempts are made to use floating point registers in 64 bit mode: "error A2222: x87 and MMX instructions disallowed; legacy  FP state not saved in Win64". However, a public discussion forum quotes the following answers from Microsoft engineers regarding this issue: "From: Program Manager in Visual C++ Group, Sent: Thursday, May 26, 2005 10:38 AM. It does preserve the state. It's the DDK page that has stale information, which I've requested it to be changed. Let them know that the OS does preserve state of x87 and MMX registers on context switches." and "From: Software Engineer in Windows Kernel Group, Sent: Thursday, May 26, 2005 11:06 AM. For user threads the state of legacy floating point is preserved at context switch. But it is not true for kernel threads. Kernel mode drivers can not use legacy floating point instructions."

The issue has finally been resolved with the long overdue publication of a more detailed ABI for x64 Windows in the form of a document entitled "x64 Software Conventions", well hidden in the bin directory (not the help directory) of some compiler packages. This document says: "The MMX and floating-point stack registers (MM0-MM7/ST0-ST7) are preserved across context switches. There is no explicit calling convention for these registers. The use of these registers is strictly prohibited in kernel mode code." The same text has later appeared at the Microsoft website ^[4].

Applications

Almost the same bitboard applications as mentioned in the SSE2 application samples are possible with MMX, albeit with scalar bitboards rather than a vector of two.

East Fill

For instance, East Attacks based on SIMD-wise Fill by Subtraction:

__m64 eastAttacks (__m64 occ, __m64 rooks) {
   __m64 tmp;
   occ  = _mm_or_si64 (occ, rooks);  //  make rooks member of occupied
   tmp  = _mm_xor_si64(occ, rooks);  // occ - rooks
   tmp  = _mm_sub_pi8 (tmp, rooks);  // occ - 2*rooks
   return _mm_xor_si64(occ, tmp);    // occ ^ (occ - 2*rooks)
}
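To make the bytewise subtraction easier to follow, the same steps can be sketched in portable C. This is an illustrative stand-in rather than a replacement for the MMX routine: sub_bytes and east_attacks_scalar are hypothetical names, the loop imitates what _mm_sub_pi8 does per byte, and the bit numbering is assumed to be a1 = bit 0 with each rank occupying one byte and east pointing towards the higher bits of that byte.

```c
#include <stdint.h>

/* Bytewise subtraction imitating _mm_sub_pi8: each of the eight
   bytes (ranks) wraps on its own, so a borrow never crosses
   into the neighbouring rank. */
uint64_t sub_bytes(uint64_t a, uint64_t b) {
    uint64_t r = 0;
    for (int i = 0; i < 8; ++i) {
        uint8_t x = (uint8_t)(a >> (8 * i));
        uint8_t y = (uint8_t)(b >> (8 * i));
        r |= (uint64_t)(uint8_t)(x - y) << (8 * i);
    }
    return r;
}

/* Same sequence as the MMX version, on a plain 64-bit integer. */
uint64_t east_attacks_scalar(uint64_t occ, uint64_t rooks) {
    occ |= rooks;                   /* make rooks member of occupied   */
    uint64_t tmp = occ ^ rooks;     /* occ - rooks (rooks is a subset) */
    tmp = sub_bytes(tmp, rooks);    /* occ - 2*rooks, rank by rank     */
    return occ ^ tmp;               /* occ ^ (occ - 2*rooks)           */
}
```

Under the assumed mapping, a rook on a1 with an otherwise empty board yields 0xFE (b1 through h1), and adding a blocker on e1 yields 0x1E (b1 through e1, blocker included). With rooks on a1 and a2 and a blocker on c2 the result is 0x06FE, which shows that the borrow on the first rank does not leak into the second.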

MMX Popcount

AMD's proposed Efficient 64-Bit Population Count using MMX, 3DNow! and inline assembly ^[5] :

#include "amd3d.h"

__declspec (naked) unsigned int __stdcall popcount64 (unsigned __int64 v)
{
   static const __int64 C55 = 0x5555555555555555;
   static const __int64 C33 = 0x3333333333333333;
   static const __int64 C0F = 0x0F0F0F0F0F0F0F0F;
   __asm {
      MOVD      MM0, [ESP+4] ;v_low
      PUNPCKLDQ MM0, [ESP+8] ;v
      MOVQ      MM1, MM0     ;v
      PSRLD     MM0, 1       ;v >> 1
      PAND      MM0, [C55]   ;(v >> 1) & 0x55555555
      PSUBD     MM1, MM0     ;w = v - ((v >> 1) & 0x55555555)
      MOVQ      MM0, MM1     ;w
      PSRLD     MM1, 2       ;w >> 2
      PAND      MM0, [C33]   ;w & 0x33333333
      PAND      MM1, [C33]   ;(w >> 2) & 0x33333333
      PADDD     MM0, MM1     ;x = (w & 0x33333333) + ((w >> 2) & 0x33333333)
      MOVQ      MM1, MM0     ;x
      PSRLD     MM0, 4       ;x >> 4
      PADDD     MM0, MM1     ;x + (x >> 4)
      PAND      MM0, [C0F]   ;y = (x + (x >> 4)) & 0x0F0F0F0F
      PXOR      MM1, MM1     ;0
      PSADBW    MM0, MM1     ;sum across all 8 bytes
      MOVD      EAX, MM0     ;result in EAX per calling convention
      FEMMS ;clear MMX state
      RET 8 ;pop 8-byte argument off
   }
}
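For readers without a 32-bit MSVC toolchain, the arithmetic of the routine above can be followed in a portable C sketch. It is not AMD's code: popcount64_swar is a hypothetical name, the three masks are taken from the listing, and the final loop stands in for the horizontal byte sum that PSADBW against zero performs.

```c
#include <stdint.h>

/* Scalar sketch of the same bit-counting steps as the MMX listing. */
unsigned popcount64_swar(uint64_t v) {
    const uint64_t C55 = 0x5555555555555555ULL;
    const uint64_t C33 = 0x3333333333333333ULL;
    const uint64_t C0F = 0x0F0F0F0F0F0F0F0FULL;

    uint64_t w = v - ((v >> 1) & C55);          /* 2-bit field counts */
    uint64_t x = (w & C33) + ((w >> 2) & C33);  /* 4-bit field counts */
    uint64_t y = (x + (x >> 4)) & C0F;          /* per-byte counts    */

    /* Stand-in for PSADBW with a zero operand: add up the 8 bytes. */
    unsigned sum = 0;
    for (int i = 0; i < 8; ++i)
        sum += (unsigned)((y >> (8 * i)) & 0xFF);
    return sum;
}
```

The sketch shifts the full 64-bit value where the listing shifts two 32-bit halves with PSRLD. The results agree because each mask clears exactly the bits that would otherwise cross the 32-bit boundary. For example, popcount64_swar(0x0123456789ABCDEFULL) returns 32.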

Manuals

Intel

Intel Architecture Software Developer’s Manual, Volume 1: Basic Architecture

Forum Posts

Using mmx instructions by Frans Morsch, comp.lang.asm.x86, February 03, 2000
Re: Atomic write of 64 bits by Frans Morsch, comp.lang.asm.x86, September 25, 2000
Re: Chezzz 1.0.1 - problem solved - for David Rasmussen by David Rasmussen, CCC, February 05, 2003 » Population Count, Chezzz

References

[1] Intel P5 (microarchitecture) from Wikipedia, Source: Sergei Frolov, Soviet Calculators Collection, September 2007
[2] MMX Technology Intrinsic Groups
[3] Calling conventions for different C++ compilers and operating systems (pdf) by Agner Fog
[4] Legacy Floating-Point Support (C++) from MSDN Library
[5] AMD Athlon Processor x86 Code Optimization Guide (pdf) Efficient 64-Bit Population Count Using MMX™ Instructions Page 184
