NUMA

From Chessprogramming wiki

Possible NUMA system ^[1]

NUMA (Non-Uniform Memory Access),
a multiprocessing memory design where the main memory is partitioned between processors. In contrast to SMP, where all processors compete for access to a centralized shared memory bus, which makes it difficult to scale well beyond 8 to 12 CPUs ^[2], NUMA splits the main memory into so-called nodes, with separate memory buses for subsets of processors and high-speed interconnections between nodes, either directly at so-called 1-hop distance or indirectly at 2-hop distance. Despite the high-speed interconnection, NUMA memory access time varies considerably between faster local memory and the remote memory of other nodes. Maintaining cache coherence of processor caches adds significant overhead to NUMA systems; this is addressed by ccNUMA (cache-coherent NUMA), a term mostly used synonymously with current NUMA implementations ^[3].

Contents

1 x86
2 Considerations
3 See also
4 Selected Publications
5 Forum Posts
6 External Links
7 References

x86

AMD implemented NUMA with its Opteron processor in 2003, using HyperTransport. Intel announced NUMA compatibility for their x86 servers in late 2007 with Nehalem CPUs using QuickPath Interconnect ^[4].

Considerations

Scheduling threads across the nodes and cores of a system is a complicated topic, because threads may work on independent data or on data shared with threads running on other nodes. ccNUMA-aware operating systems and software therefore take several considerations into account, such as keeping data local by virtue of first touch ^[5] ^[6]: a memory page is typically placed on the node of the thread that first writes to it. NUMA and processor affinity APIs help application programmers bind threads or processes to NUMA nodes or allocate memory from a particular node.
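The first-touch idea can be illustrated with a minimal sketch for Linux with glibc. It is not taken from any engine or from the cited guidelines. The names `touch_job`, `touch_worker` and `alloc_first_touch` are hypothetical, and the mapping "worker t runs on CPU t" is a simplifying assumption; a real program would query the node topology first. Each worker pins itself with `pthread_setaffinity_np` and then writes its own slice of a shared array, so that under the default local allocation policy the pages of that slice are likely to end up on the worker's node.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdlib.h>

/* Hypothetical job description: one contiguous slice per worker thread. */
typedef struct {
    int     cpu;    /* CPU to pin to (assumption: worker t uses CPU t) */
    double *slice;  /* first element of this worker's slice */
    size_t  count;  /* number of elements in the slice */
} touch_job;

static void *touch_worker(void *arg)
{
    touch_job *job = arg;

    /* Best-effort pinning; a failure (e.g. CPU not available) is ignored. */
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(job->cpu, &set);
    (void)pthread_setaffinity_np(pthread_self(), sizeof set, &set);

    /* The first write to each page of the slice happens on this thread. */
    for (size_t i = 0; i < job->count; ++i)
        job->slice[i] = 0.0;
    return NULL;
}

/* Allocates n doubles and lets nthreads workers first-touch disjoint slices.
   Returns NULL on failure; the caller releases the array with free(). */
double *alloc_first_touch(size_t n, int nthreads)
{
    assert(nthreads > 0);

    /* The allocating thread deliberately does not initialize the array. */
    double    *data = malloc(n * sizeof *data);
    pthread_t *tid  = malloc((size_t)nthreads * sizeof *tid);
    touch_job *job  = malloc((size_t)nthreads * sizeof *job);
    if (!data || !tid || !job) {
        free(data); free(tid); free(job);
        return NULL;
    }

    size_t chunk = n / (size_t)nthreads;
    int started = 0;
    for (int t = 0; t < nthreads; ++t) {
        size_t offset = (size_t)t * chunk;
        job[t].cpu   = t;
        job[t].slice = data + offset;
        /* The last worker also takes the remainder of the division. */
        job[t].count = (t == nthreads - 1) ? n - offset : chunk;
        if (pthread_create(&tid[t], NULL, touch_worker, &job[t]) != 0)
            break;
        ++started;
    }
    for (int t = 0; t < started; ++t)
        pthread_join(tid[t], NULL);

    free(tid);
    free(job);
    if (started != nthreads) {
        free(data);
        return NULL;
    }
    return data;
}
```

A caller would use `double *d = alloc_first_touch(n, threads);` and later let the same pinned workers process the same slices. The slices here are not page-aligned, so pages on a slice boundary may land on either neighbour's node. Whether first touch helps at all depends on the operating system's memory policy and on the workload.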

See also

Selected Publications

1998 ...

Ante Grbić, Stephen Brown, Steve Caranci, Robin Grindley, Mitchell Gusat, Guy Lemieux, K. Loveless, Naraig Manjikian, Sinisa Srbljic, Michael Stumm, Zvonko Vranesic, Zeljko Zilic (1998). Design and Implementation of the NUMAchine Multiprocessor. DAC 1998, pdf ^[7]

2000 ...

Robin Grindley, Tarek Abdelrahman, Stephen Brown, Steve Caranci, D. DeVries, Benjamin Gamsa, Ante Grbić, Mitchell Gusat, R. Ho, Orran Krieger, Guy Lemieux, K. Loveless, Naraig Manjikian, P. McHardy, Sinisa Srbljic, Michael Stumm, Zvonko Vranesic, Zeljko Zilic (2000). The NUMAchine Multiprocessor. ICPP 2000, pdf
Andi Kleen (2004). An NUMA API for Linux. SUSE Labs, pdf
Ulrich Drepper (2007). What Every Programmer Should Know About Memory. pdf, also hosted by LWN.net

Memory part 2: CPU caches

Memory part 3: Virtual Memory

Memory part 4: NUMA support

Memory part 5: What programmers can do

2010 ...

Nakul Manchanda, Karan Anand (2010). Non-Uniform Memory Access (NUMA). New York University, pdf
Stefan Lankes, Thomas Roehl, Christian Terboven, Thomas Bemmerl (2012). Node-Based Memory Management for Scalable NUMA Architectures. RWTH Aachen, ROSS 2012, slides as pdf
Georg Hager ^[8], Jan Treibig, Gerhard Wellein (2013). The Practitioner's Cookbook for Good Parallel Performance on Multi- and Many-Core Systems. RRZE, SC13, slides as pdf
Rik van Riel, Vinod Chegu (2014). Automatic NUMA Balancing. Red Hat Summit 2014, slides as pdf, video lecture by Rik van Riel
Irina Calciu, Siddhartha Sen, Mahesh Balakrishnan, Marcos K. Aguilera (2017). Black-box Concurrent Data Structures for NUMA Architectures. ACM SIGPLAN Notices, Vol. 52, No. 4, pdf
Irina Calciu, Siddhartha Sen, Mahesh Balakrishnan, Marcos K. Aguilera (2017). How to implement any concurrent data structure for modern servers. ACM SIGOPS, Vol. 51, No. 1
Irina Calciu, Siddhartha Sen, Mahesh Balakrishnan, Marcos K. Aguilera (2018). How to implement any concurrent data structure. Communications of the ACM, Vol. 61, No. 12

Forum Posts

2000 ...

DTS NUMA by Vincent Diepeveen, CCC, September 03, 2002 » Dynamic Tree Splitting
What's the difference between NUMA, SMP and MPI for chess? by Joachim Rang, CCC, April 15, 2004 » SMP
Opteron NUMA/SMP question by Matthew Hull, CCC, February 09, 2005

2010 ...

optimizing performance on dual Xeon systems (NUMA) by Jon Dart, CCC, February 28, 2013
Smp concepts by Michael Hoffmann, CCC, June 01, 2014 » SMP

2015 ...

NUMA-awareness by Louis Zulli, CCC, February 25, 2015
thread affinity by Martin Sedlak, CCC, July 03, 2015 » Thread

Re: thread affinity by Robert Hyatt, CCC, July 03, 2015

Actual speedups from YBWC and ABDADA on 8+ core machines? by Tom Kerrigan, CCC, July 10, 2015 » Young Brothers Wait Concept, ABDADA
NUMA 101 by Robert Hyatt, CCC, January 07, 2016 » Crafty
NUMA in a YBWC implementation by Edsel Apostol, CCC, July 20, 2016 » Young Brothers Wait Concept
lets get the ball moving down the field on numa awareness by Mohammed Li, FishCooking, August 30, 2016
search thread memory allocation (NUMA) by Ronald de Man, FishCooking, September 06, 2016
What do you do with NUMA? by Matthew Lai, CCC, September 19, 2016
NUMA test compilation by Joachim Müller, FishCooking, November 05, 2016 » Stockfish
What Linux compatible Numa aware engines are available? by Dann Corbit, CCC, March 29, 2017 » Linux
Ethereal 10.88 NUMA by Norman Schmidt, CCC, August 24, 2018 » Ethereal
Some NUMA data for Stockfish-dev and Cfish-dev by Louis Zulli, CCC, June 17, 2019 » Stockfish, CFish
NUMA by lucasart, CCC, December 30, 2019

External Links

Linux

numa(7) - Linux manual page
A NUMA API for Linux (pdf, April 2015)

Windows

x86

Misc

The Who - Magic Bus, Live at Leeds (1970), YouTube Video

References

[1] One possible architecture of a NUMA system. Originally created in Visio 2010, cleaned up with Inkscape, by Moop2000, October 4, 2010, Wikimedia Commons
[2] NUMA Frequently Asked Questions - 9. Why should I use NUMA? What are the benefits of NUMA?
[3] NUMA Frequently Asked Questions - 4. What is the difference between NUMA and ccNUMA?
[4] Non-Uniform Memory Access (NUMA) from Wikipedia
[5] Performance Guidelines for AMD Athlon™ 64 and AMD Opteron™ ccNUMA Multiprocessor Systems (pdf) - 3.2.1 Keeping Data Local by Virtue of first Touch, p. 22
[6] Re: thread affinity by Robert Hyatt, CCC, July 03, 2015
[7] Documentation on the NUMAchine Multiprocessor
[8] Georg Hager's Blog | Random thoughts on High Performance Computing
