NUMA


NUMA (Non-Uniform Memory Access) is a multiprocessing memory design in which main memory is partitioned between processors. In contrast to SMP, where all processors compete for access to a centralized shared memory bus, which makes it difficult to scale much beyond 8 to 12 CPUs, NUMA splits main memory into so-called nodes. Each node has its own memory bus serving a subset of the processors, and nodes are linked by a high-speed interconnect, either directly (1-hop distance) or indirectly through another node (2-hop distance). Despite the high-speed interconnect, memory access time varies considerably between faster local memory and the remote memory of other nodes. Keeping processor caches coherent adds significant overhead to NUMA systems. This is addressed by ccNUMA (cache-coherent NUMA), a term now used mostly synonymously with current NUMA implementations.
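
To make the local versus remote distinction concrete, the following is a minimal Python sketch that assumes a Linux system exposing its node topology under `/sys/devices/system/node`. The helper names (`parse_distance_row`, `read_distance_row`, `remote_nodes_nearest_first`) are hypothetical and chosen for illustration only. The values in the `distance` file are relative costs as reported by the firmware, with 10 conventionally denoting local access; they are not measured latencies.

```python
def parse_distance_row(text):
    """Parse one node's distance row, e.g. "10 16 22 16", into a list of ints."""
    return [int(field) for field in text.split()]


def read_distance_row(node):
    """Read the distance row of `node` from Linux sysfs.

    Returns an empty list when the file is missing (non-Linux system,
    kernel without NUMA support, or no such node).
    """
    path = f"/sys/devices/system/node/node{node}/distance"
    try:
        with open(path) as f:
            return parse_distance_row(f.read())
    except OSError:
        return []


def remote_nodes_nearest_first(row, node):
    """Return all nodes other than `node`, ordered by increasing distance."""
    others = [n for n in range(len(row)) if n != node]
    return sorted(others, key=lambda n: row[n])
```

For a made-up four-node row `[10, 16, 22, 16]` seen from node 0, `remote_nodes_nearest_first` returns `[1, 3, 2]`: nodes 1 and 3 would be direct neighbours, while node 2 is further away.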

=x86=
AMD implemented NUMA with its Opteron processor in 2003, using HyperTransport. Intel announced NUMA compatibility for their x86 servers in late 2007 with Nehalem CPUs using QuickPath Interconnect.

=Considerations=
Scheduling threads across the nodes and cores of a system is a complicated topic, because threads may access independent or shared data. ccNUMA-aware operating systems and software have several considerations to weigh, such as keeping data local by virtue of first touch, where a memory page is placed on the node of the thread that first writes to it. NUMA and processor affinity APIs help application programmers bind threads or processes to NUMA nodes or allocate memory from a particular node.
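
The combination of affinity and first touch can be sketched as follows. This is a minimal Python illustration, not how any particular engine does it. It assumes Linux, where `os.sched_setaffinity` is available and each node lists its CPUs in `/sys/devices/system/node/nodeN/cpulist`. The helper names (`parse_cpulist`, `node_cpus`, `bind_to_node`, `allocate_local`) and the 4096-byte page size are assumptions made for the example. Whether the touched pages really end up on the local node depends on the kernel's memory policy, which defaults to local allocation on Linux.

```python
import os


def parse_cpulist(text):
    """Parse a Linux cpulist string such as "0-3,8-11" into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty list, e.g. a memory-only node
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def node_cpus(node):
    """Return the CPUs of a NUMA node from sysfs, or an empty set if unavailable."""
    path = f"/sys/devices/system/node/node{node}/cpulist"
    try:
        with open(path) as f:
            return parse_cpulist(f.read())
    except OSError:
        return set()


def bind_to_node(node):
    """Restrict the caller to the CPUs of `node`. Returns True on success."""
    if not hasattr(os, "sched_setaffinity"):
        return False  # platform without this affinity API
    cpus = node_cpus(node) & os.sched_getaffinity(0)
    if not cpus:
        return False  # unknown node, or none of its CPUs are allowed
    os.sched_setaffinity(0, cpus)  # 0 means the caller
    return True


def allocate_local(size, page=4096):
    """Allocate a buffer and write one byte per page from the current thread,
    so that under a first-touch policy the pages are placed on its node."""
    buf = bytearray(size)
    for offset in range(0, size, page):
        buf[offset] = 0
    return buf
```

A worker would call `bind_to_node(n)` first and only then `allocate_local(...)` for its private data, so that the thread which touches the pages is already running on the intended node. Data shared by all threads, such as a transposition table, does not fit this pattern as neatly and is one reason the scheduling question is hard.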

=See also=
 * Optimization
 * Parallel Search
 * SMP
 * Thread

=Selected Publications=

1998 ...

 * Ante Grbić, Stephen Brown, Steve Caranci, Robin Grindley, Mitchell Gusat, Guy Lemieux, K. Loveless, Naraig Manjikian, Sinisa Srbljic, Michael Stumm, Zvonko Vranesic, Zeljko Zilic (1998). Design and Implementation of the NUMAchine Multiprocessor. DAC 1998, pdf

2000 ...

 * Robin Grindley, Tarek Abdelrahman, Stephen Brown, Steve Caranci, D. DeVries, Benjamin Gamsa, Ante Grbić, Mitchell Gusat, R. Ho, Orran Krieger, Guy Lemieux, K. Loveless, Naraig Manjikian, P. McHardy, Sinisa Srbljic, Michael Stumm, Zvonko Vranesic, Zeljko Zilic (2000). The NUMAchine Multiprocessor. ICPP 2000, pdf
 * Andi Kleen (2004). A NUMA API for Linux. SUSE Labs, pdf
 * Ulrich Drepper (2007). What Every Programmer Should Know About Memory. pdf, also hosted by LWN.net
   * Memory part 1
   * Memory part 2: CPU caches
   * Memory part 3: Virtual Memory
   * Memory part 4: NUMA support
   * Memory part 5: What programmers can do

2010 ...

 * Nakul Manchanda, Karan Anand (2010). Non-Uniform Memory Access (NUMA). New York University, pdf
 * Stefan Lankes, Thomas Roehl, Christian Terboven, Thomas Bemmerl (2012). Node-Based Memory Management for Scalable NUMA Architectures. RWTH Aachen, ROSS 2012, slides as pdf
 * Georg Hager, Jan Treibig, Gerhard Wellein (2013). The Practitioner's Cookbook for Good Parallel Performance on Multi- and Many-Core Systems. RRZE, SC13, slides as pdf
 * Rik van Riel, Vinod Chegu (2014). Automatic NUMA Balancing. Red Hat Summit 2014, slides as pdf, video lecture by Rik van Riel

=Forum Posts=

2000 ...

 * DTS NUMA by Vincent Diepeveen, CCC, September 03, 2002 » Dynamic Tree Splitting
 * What's the difference between NUMA, SMP and MPI for chess? by Joachim Rang, CCC, April 15, 2004 » SMP
 * Opteron NUMA/SMP question by Matthew Hull, CCC, February 09, 2005

2010 ...

 * optimizing performance on dual Xeon systems (NUMA) by Jon Dart, CCC, February 28, 2013
 * Smp concepts by Michael Hoffmann, CCC, June 01, 2014 » SMP

2015 ...

 * NUMA-awareness by Louis Zulli, CCC, February 25, 2015
 * thread affinity by Martin Sedlak, CCC, July 03, 2015 » Thread
 * Re: thread affinity by Robert Hyatt, CCC, July 03, 2015
 * Actual speedups from YBWC and ABDADA on 8+ core machines? by Tom Kerrigan, CCC, July 10, 2015 » Young Brothers Wait Concept, ABDADA
 * NUMA 101 by Robert Hyatt, CCC, January 07, 2016 » Crafty
 * NUMA in a YBWC implementation by Edsel Apostol, CCC, July 20, 2016 » Young Brothers Wait Concept
 * lets get the ball moving down the field on numa awareness by Mohammed Li, FishCooking, August 30, 2016
 * search thread memory allocation (NUMA) by Ronald de Man, FishCooking, September 06, 2016
 * What do you do with NUMA? by Matthew Lai, CCC, September 19, 2016
 * NUMA test compilation by Joachim Müller, FishCooking, November 05, 2016 » Stockfish
 * What Linux compatible Numa aware engines are available? by Dann Corbit, CCC, March 29, 2017 » Linux
 * Ethereal 10.88 NUMA by Norman Schmidt, CCC, August 24, 2018 » Ethereal
 * Some NUMA data for Stockfish-dev and Cfish-dev by Louis Zulli, CCC, June 17, 2019 » Stockfish, CFish

=External Links=
 * Non-Uniform Memory Access (NUMA) from Wikipedia
 * NUMA Frequently Asked Questions
 * Multiprocessing - OSDev Wiki
 * ccNUMA machines in Aad J. van der Steen, Jack Dongarra (2004). Overview of Recent Supercomputers.

Linux

 * numa(7) - Linux manual page
 * A NUMA API for Linux (pdf, April 2015)

Windows

 * Allocating Memory from a NUMA Node, MSDN
 * NUMA Support (Windows), MSDN
 * Processor Groups (Windows), MSDN

x86

 * Optimizing Applications for NUMA | Intel® Developer Zone
 * Performance Guidelines for AMD Athlon™ 64 and AMD Opteron™ ccNUMA Multiprocessor Systems (pdf)
