NUMA
NUMA (Non-Uniform Memory Access) is
a multiprocessing memory design in which the main memory is partitioned between processors. In contrast to SMP, where all processors compete for access to a centralized shared memory bus, which makes it difficult to scale well beyond 8 to 12 CPUs [2], NUMA splits the main memory into so-called nodes. Each node has a separate memory bus serving a subset of the processors, and nodes are linked by a high-speed interconnect, either directly (1-hop distance) or indirectly via an intermediate node (2-hop distance). Despite the high-speed interconnect, NUMA memory access time varies considerably between faster local memory and the remote memory of other nodes. Maintaining coherence between processor caches adds significant overhead to NUMA systems. This is addressed by cache-coherent NUMA (ccNUMA), a term now used mostly synonymously with current NUMA implementations [3].
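The 1-hop and 2-hop distances mentioned above can be illustrated with a small sketch. The four-node ring topology in `LINKS` and the helper `hop_distance` are assumptions made up for this example and do not describe any particular machine; the code simply counts interconnect hops with a breadth-first search.

```python
from collections import deque

# Hypothetical interconnect: four NUMA nodes linked in a ring (0-1-2-3-0).
LINKS = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}


def hop_distance(links, src, dst):
    """Return the minimum number of interconnect hops from src to dst."""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == dst:
            return hops
        for neighbour in links[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    raise ValueError("nodes are not connected")


if __name__ == "__main__":
    # Distance 0 means local memory; anything above is remote.
    for dst in sorted(LINKS):
        print(f"node 0 -> node {dst}: {hop_distance(LINKS, 0, dst)} hop(s)")
```

In this assumed ring, node 0 reaches nodes 1 and 3 directly, while node 2 is only reachable through one of them, so its memory would be the most expensive for node 0 to access.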
x86
AMD implemented NUMA with its Opteron processor in 2003, using HyperTransport. Intel announced NUMA compatibility for its x86 servers in late 2007 with Nehalem CPUs, using QuickPath Interconnect [4].
Considerations
Scheduling threads across the nodes and cores of a system is a complicated topic, because threads may access independent data, shared data, or both. ccNUMA-aware operating systems and software have to take several considerations into account, such as keeping data local by virtue of first touch [5] [6]. NUMA and processor affinity APIs help application programmers to bind threads or processes to NUMA nodes, or to allocate memory from a certain node.
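As a rough illustration of binding a process to a node, the following Linux-only sketch reads the CPU list of a node from sysfs and restricts the calling process to those CPUs with `os.sched_setaffinity`. The helper names `parse_cpulist`, `node_cpus` and `pin_to_node` are hypothetical, and the sketch assumes the kernel exposes `/sys/devices/system/node/nodeN/cpulist`. It only sets CPU affinity and does not set a memory policy; under the usual local allocation default, memory first touched after pinning will tend to come from the same node, but that is not guaranteed here.

```python
import os

# Linux sysfs file that lists the CPUs belonging to one NUMA node.
NODE_CPULIST = "/sys/devices/system/node/node{node}/cpulist"


def parse_cpulist(text):
    """Parse a kernel CPU list such as '0-3,8-11' into a set of CPU numbers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        first, _, last = part.partition("-")
        # A single CPU such as '5' has no '-', so the range is just that CPU.
        cpus.update(range(int(first), int(last or first) + 1))
    return cpus


def node_cpus(node):
    """Return the set of CPUs that sysfs reports for the given node."""
    with open(NODE_CPULIST.format(node=node)) as f:
        return parse_cpulist(f.read())


def pin_to_node(node):
    """Restrict the calling process (pid 0 = self) to the CPUs of one node."""
    cpus = node_cpus(node)
    os.sched_setaffinity(0, cpus)
    return cpus


if __name__ == "__main__":
    try:
        print("pinned to CPUs", sorted(pin_to_node(0)))
    except (OSError, AttributeError, ValueError) as exc:
        # Not Linux, no NUMA sysfs entries, or the CPUs are not permitted.
        print("could not pin to node 0:", exc)
```

A search thread would typically be pinned before it allocates and initializes its own data structures, so that the first touch happens on the intended node.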
Selected Publications
- Andi Kleen (2004). An NUMA API for Linux. SUSE Labs, pdf
- Ulrich Drepper (2007). What Every Programmer Should Know About Memory. pdf, also hosted by LWN.net
- Nakul Manchanda, Karan Anand (2010). Non-Uniform Memory Access (NUMA). New York University, pdf
- Stefan Lankes, Thomas Roehl, Christian Terboven, Thomas Bemmerl (2012). Node-Based Memory Management for Scalable NUMA Architectures. RWTH Aachen, ROSS 2012, slides as pdf
- Georg Hager [7], Jan Treibig, Gerhard Wellein (2013). The Practitioner's Cookbook for Good Parallel Performance on Multi- and Many-Core Systems. RRZE, SC13, slides as pdf
- Rik van Riel, Vinod Chegu (2014). Automatic NUMA Balancing. Red Hat Summit 2014, slides as pdf, video lecture by Rik van Riel
Forum Posts
2000 ...
- DTS NUMA by Vincent Diepeveen, CCC, September 03, 2002 » Dynamic Tree Splitting
- What's the difference between NUMA, SMP and MPI for chess? by Joachim Rang, CCC, April 15, 2004 » SMP
- Opteron NUMA/SMP question by Matthew Hull, CCC, February 09, 2005
2010 ...
- optimizing performance on dual Xeon systems (NUMA) by Jon Dart, CCC, February 28, 2013
- Smp concepts by Michael Hoffmann, CCC, June 01, 2014 » SMP
2015 ...
- NUMA-awareness by Louis Zulli, CCC, February 25, 2015
- thread affinity by Martin Sedlak, CCC, July 03, 2015 » Thread
- Re: thread affinity by Robert Hyatt, CCC, July 03, 2015
- Actual speedups from YBWC and ABDADA on 8+ core machines? by Tom Kerrigan, CCC, July 10, 2015 » Young Brothers Wait Concept, ABDADA
- NUMA 101 by Robert Hyatt, CCC, January 07, 2016 » Crafty
- NUMA in a YBWC implementation by Edsel Apostol, CCC, July 20, 2016 » Young Brothers Wait Concept
- lets get the ball moving down the field on numa awareness by Mohammed Li, FishCooking, August 30, 2016
- search thread memory allocation (NUMA) by Ronald de Man, FishCooking, September 06, 2016
- What do you do with NUMA? by Matthew Lai, CCC, September 19, 2016
- NUMA test compilation by Joachim Müller, FishCooking, November 05, 2016 » Stockfish
- What Linux compatible Numa aware engines are available? by Dann Corbit, CCC, March 29, 2017 » Linux
External Links
- Non-Uniform Memory Access (NUMA) from Wikipedia
- NUMA Frequently Asked Questions
- Multiprocessing - OSDev Wiki
- ccNUMA machines in Aad J. van der Steen, Jack J. Dongarra (2004). Overview of Recent Supercomputers.
Linux
- numa(7) - Linux manual page
- A NUMA API for Linux (pdf, April 2015)
Windows
- Allocating Memory from a NUMA Node, MSDN
- NUMA Support (Windows), MSDN
- Processor Groups (Windows), MSDN
References
- [1] One possible architecture of a NUMA system. Originally created in Visio 2010, cleaned up with Inkscape, by Moop2000, October 4, 2010, Wikimedia Commons
- [2] NUMA Frequently Asked Questions - 9. Why should I use NUMA? What are the benefits of NUMA?
- [3] NUMA Frequently Asked Questions - 4. What is the difference between NUMA and ccNUMA?
- [4] Non-Uniform Memory Access (NUMA) from Wikipedia
- [5] Performance Guidelines for AMD Athlon™ 64 and AMD Opteron™ ccNUMA Multiprocessor Systems (pdf) - 3.2.1 Keeping Data Local by Virtue of first Touch, p. 22
- [6] Re: thread affinity by Robert Hyatt, CCC, July 03, 2015
- [7] Georg Hager's Blog | Random thoughts on High Performance Computing