NUMA
NUMA (Non-Uniform Memory Access),
a multiprocessing memory design in which main memory is partitioned between processors. In SMP, all processors compete for access to a centralized shared memory bus, which makes it difficult to scale well beyond 8 to 12 CPUs [2]. NUMA instead splits main memory into so-called nodes, each with a separate memory bus serving a subset of the processors, and links the nodes by high-speed interconnects, either directly (1-hop distance) or indirectly through another node (2-hop distance). Despite the fast interconnect, NUMA memory access time varies considerably between faster local memory and the slower remote memory of other nodes. Maintaining coherence of the processor caches adds significant overhead to NUMA systems; this is addressed by ccNUMA (cache-coherent NUMA), a term now mostly used synonymously with current NUMA implementations [3].
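To make the 1-hop and 2-hop terminology concrete, the following C sketch computes the minimum number of interconnect hops between nodes with the Floyd-Warshall algorithm. The 4-node "square" topology, where each node is linked to two neighbours and diagonal pairs must route through an intermediate node, is a hypothetical example rather than a description of any particular machine; `link_matrix` and `hop_distances` are illustrative names.

```c
#include <assert.h>

#define NODES 4
#define NO_PATH 1000 /* larger than any real hop count */

/* Hypothetical 4-node "square": each node is linked to two neighbours,
   the diagonal pairs (0,2) and (1,3) have no direct link. */
static const int link_matrix[NODES][NODES] = {
    {0, 1, 0, 1},
    {1, 0, 1, 0},
    {0, 1, 0, 1},
    {1, 0, 1, 0},
};

/* hops[i][j] = minimum number of interconnect hops from node i to the
   memory of node j (0 = local memory), computed with Floyd-Warshall. */
void hop_distances(const int links[NODES][NODES], int hops[NODES][NODES]) {
    for (int i = 0; i < NODES; i++)
        for (int j = 0; j < NODES; j++) {
            assert(links[i][j] == links[j][i]); /* links are bidirectional */
            hops[i][j] = (i == j) ? 0 : (links[i][j] ? 1 : NO_PATH);
        }
    /* Relax every pair through every possible intermediate node k. */
    for (int k = 0; k < NODES; k++)
        for (int i = 0; i < NODES; i++)
            for (int j = 0; j < NODES; j++)
                if (hops[i][k] + hops[k][j] < hops[i][j])
                    hops[i][j] = hops[i][k] + hops[k][j];
}
```

Calling `hop_distances(link_matrix, hops)` yields 0 on the diagonal (local memory), 1 for directly linked neighbours, and 2 for the diagonal pairs (0,2) and (1,3). Real access latency also depends on the interconnect and its load; on Linux, the relative node distances reported by the firmware can be inspected with `numactl --hardware`.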
x86
AMD implemented NUMA with its Opteron processor in 2003, using HyperTransport. Intel announced NUMA compatibility for their x86 servers in late 2007 with Nehalem CPUs using QuickPath Interconnect [4].
Considerations
Scheduling threads across the nodes and cores of a system is a complicated topic, since threads access both independent (thread-private) and shared data. ccNUMA-aware operating systems and software have to take several considerations into account, such as keeping data local by virtue of first touch, where a memory page is physically allocated on the node of the processor that first writes to it [5] [6]. NUMA and processor affinity APIs help application programmers bind threads or processes to NUMA nodes, or allocate memory from a specific node.
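The first-touch idea can be sketched in C for Linux with glibc, assuming the default local allocation policy: a large array is allocated but not written, and each worker thread, pinned to a CPU with `pthread_setaffinity_np`, initializes only its own slice, so those pages tend to be placed on that thread's node. The names `alloc_first_touch`, `init_slice` and `struct slice`, as well as the simple round-robin CPU choice, are hypothetical; a real engine would pin its threads according to the actual node topology.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdlib.h>

#define MAX_THREADS 64

/* Work description for one initializing thread (hypothetical helper type). */
struct slice {
    double *data;      /* shared array: allocated, not yet written */
    size_t begin, end; /* half-open index range owned by this thread */
    int cpu;           /* CPU to pin to, or -1 to stay unpinned */
};

static void *init_slice(void *arg) {
    struct slice *s = arg;
    if (s->cpu >= 0) {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(s->cpu, &set);
        /* Best effort: if pinning fails, the thread simply runs unpinned. */
        pthread_setaffinity_np(pthread_self(), sizeof set, &set);
    }
    /* First touch: under the default Linux policy, a page gets physical
       memory on the node of the CPU that first writes to it. */
    for (size_t i = s->begin; i < s->end; i++)
        s->data[i] = 0.0;
    return NULL;
}

/* Allocate n doubles and let nthreads pinned threads initialize disjoint
   slices, so each slice tends to land on its initializing thread's node. */
double *alloc_first_touch(size_t n, int nthreads) {
    if (nthreads <= 0 || nthreads > MAX_THREADS)
        return NULL;
    double *data = malloc(n * sizeof *data);
    if (data == NULL)
        return NULL;

    /* Collect the CPUs this process is allowed to run on. */
    int cpus[CPU_SETSIZE];
    int ncpus = 0;
    cpu_set_t allowed;
    CPU_ZERO(&allowed);
    if (sched_getaffinity(0, sizeof allowed, &allowed) == 0)
        for (int c = 0; c < CPU_SETSIZE; c++)
            if (CPU_ISSET(c, &allowed))
                cpus[ncpus++] = c;

    pthread_t tid[MAX_THREADS];
    struct slice sl[MAX_THREADS];
    int started[MAX_THREADS];
    for (int t = 0; t < nthreads; t++) {
        sl[t].data = data;
        sl[t].begin = n * (size_t)t / (size_t)nthreads;
        sl[t].end = n * (size_t)(t + 1) / (size_t)nthreads;
        sl[t].cpu = ncpus > 0 ? cpus[t % ncpus] : -1; /* round robin */
        started[t] = pthread_create(&tid[t], NULL, init_slice, &sl[t]) == 0;
        if (!started[t]) {
            sl[t].cpu = -1;     /* never pin the calling thread */
            init_slice(&sl[t]); /* fall back to initializing it here */
        }
    }
    assert(sl[nthreads - 1].end == n); /* slices cover the whole array */
    for (int t = 0; t < nthreads; t++)
        if (started[t])
            pthread_join(tid[t], NULL);
    return data;
}
```

Locality only pays off if each search thread later works on the same slice with the same pinning. Placement happens per page, so pages straddling slice boundaries go to whichever thread writes them first, and small `malloc` requests may reuse pages that were already touched. For explicit placement instead of relying on first touch, libnuma provides `numa_alloc_onnode` on Linux and Windows provides `VirtualAllocExNuma`.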
See also
Selected Publications
1998 ...
- Ante Grbić, Stephen Brown, Steve Caranci, Robin Grindley, Mitchell Gusat, Guy Lemieux, K. Loveless, Naraig Manjikian, Sinisa Srbljic, Michael Stumm, Zvonko Vranesic, Zeljko Zilic (1998). Design and Implementation of the NUMAchine Multiprocessor. DAC 1998, pdf [7]
2000 ...
- Robin Grindley, Tarek Abdelrahman, Stephen Brown, Steve Caranci, D. DeVries, Benjamin Gamsa, Ante Grbić, Mitchell Gusat, R. Ho, Orran Krieger, Guy Lemieux, K. Loveless, Naraig Manjikian, P. McHardy, Sinisa Srbljic, Michael Stumm, Zvonko Vranesic, Zeljko Zilic (2000). The NUMAchine Multiprocessor. ICPP 2000, pdf
- Andi Kleen (2004). A NUMA API for Linux. SUSE Labs, pdf
- Ulrich Drepper (2007). What Every Programmer Should Know About Memory. pdf, also hosted by LWN.net
2010 ...
- Nakul Manchanda, Karan Anand (2010). Non-Uniform Memory Access (NUMA). New York University, pdf
- Stefan Lankes, Thomas Roehl, Christian Terboven, Thomas Bemmerl (2012). Node-Based Memory Management for Scalable NUMA Architectures. RWTH Aachen, ROSS 2012, slides as pdf
- Georg Hager [8], Jan Treibig, Gerhard Wellein (2013). The Practitioner's Cookbook for Good Parallel Performance on Multi- and Many-Core Systems. RRZE, SC13, slides as pdf
- Rik van Riel, Vinod Chegu (2014). Automatic NUMA Balancing. Red Hat Summit 2014, slides as pdf, video lecture by Rik van Riel
- Irina Calciu, Siddhartha Sen, Mahesh Balakrishnan, Marcos K. Aguilera (2017). Black-box Concurrent Data Structures for NUMA Architectures. ACM SIGPLAN Notices, Vol. 52, No. 4, pdf
- Irina Calciu, Siddhartha Sen, Mahesh Balakrishnan, Marcos K. Aguilera (2017). How to implement any concurrent data structure for modern servers. ACM SIGOPS, Vol. 51, No. 1
- Irina Calciu, Siddhartha Sen, Mahesh Balakrishnan, Marcos K. Aguilera (2018). How to implement any concurrent data structure. Communications of the ACM, Vol. 61, No. 12
Forum Posts
2000 ...
- DTS NUMA by Vincent Diepeveen, CCC, September 03, 2002 » Dynamic Tree Splitting
- What's the difference between NUMA, SMP and MPI for chess? by Joachim Rang, CCC, April 15, 2004 » SMP
- Opteron NUMA/SMP question by Matthew Hull, CCC, February 09, 2005
2010 ...
- optimizing performance on dual Xeon systems (NUMA) by Jon Dart, CCC, February 28, 2013
- Smp concepts by Michael Hoffmann, CCC, June 01, 2014 » SMP
2015 ...
- NUMA-awareness by Louis Zulli, CCC, February 25, 2015
- thread affinity by Martin Sedlak, CCC, July 03, 2015 » Thread
- Re: thread affinity by Robert Hyatt, CCC, July 03, 2015
- Actual speedups from YBWC and ABDADA on 8+ core machines? by Tom Kerrigan, CCC, July 10, 2015 » Young Brothers Wait Concept, ABDADA
- NUMA 101 by Robert Hyatt, CCC, January 07, 2016 » Crafty
- NUMA in a YBWC implementation by Edsel Apostol, CCC, July 20, 2016 » Young Brothers Wait Concept
- lets get the ball moving down the field on numa awareness by Mohammed Li, FishCooking, August 30, 2016
- search thread memory allocation (NUMA) by Ronald de Man, FishCooking, September 06, 2016
- What do you do with NUMA? by Matthew Lai, CCC, September 19, 2016
- NUMA test compilation by Joachim Müller, FishCooking, November 05, 2016 » Stockfish
- What Linux compatible Numa aware engines are available? by Dann Corbit, CCC, March 29, 2017 » Linux
- Ethereal 10.88 NUMA by Norman Schmidt, CCC, August 24, 2018 » Ethereal
- Some NUMA data for Stockfish-dev and Cfish-dev by Louis Zulli, CCC, June 17, 2019 » Stockfish, CFish
- NUMA by lucasart, CCC, December 30, 2019
External Links
- Non-Uniform Memory Access (NUMA) from Wikipedia
- NUMA Frequently Asked Questions
- Multiprocessing - OSDev Wiki
- ccNUMA machines in Aad J. van der Steen, Jack Dongarra (2004). Overview of Recent Supercomputers.
Linux
- numa(7) - Linux manual page
- A NUMA API for Linux (pdf, April 2015)
Windows
- Allocating Memory from a NUMA Node, MSDN
- NUMA Support (Windows), MSDN
- Processor Groups (Windows), MSDN
x86
- Optimizing Applications for NUMA | Intel® Developer Zone
- Performance Guidelines for AMD Athlon™ 64 and AMD Opteron™ ccNUMA Multiprocessor Systems (pdf)
Misc
- The Who - Magic Bus, Live at Leeds (1970), YouTube Video
References
- [1] One possible architecture of a NUMA system. Originally created in Visio 2010, cleaned up with Inkscape, by Moop2000, October 4, 2010, Wikimedia Commons
- [2] NUMA Frequently Asked Questions - 9. Why should I use NUMA? What are the benefits of NUMA?
- [3] NUMA Frequently Asked Questions - 4. What is the difference between NUMA and ccNUMA?
- [4] Non-Uniform Memory Access (NUMA) from Wikipedia
- [5] Performance Guidelines for AMD Athlon™ 64 and AMD Opteron™ ccNUMA Multiprocessor Systems (pdf) - 3.2.1 Keeping Data Local by Virtue of First Touch, p. 22
- [6] Re: thread affinity by Robert Hyatt, CCC, July 03, 2015
- [7] Documentation on the NUMAchine Multiprocessor
- [8] Georg Hager's Blog | Random thoughts on High Performance Computing