From Chessprogramming wiki
Revision as of 23:52, 8 December 2020

Home * Hardware * Memory * NUMA

Possible NUMA system [1]

NUMA (Non-Uniform Memory Access)
is a multiprocessing memory design in which main memory is partitioned between processors. As opposed to SMP, where all processors compete for access to one centralized shared memory bus, which makes it difficult to scale well beyond 8 to 12 CPUs [2], NUMA splits main memory into so-called nodes with separate memory buses for subsets of processors, connected by a high-speed interconnect between nodes, reachable either directly in so-called 1-hop distance, or indirectly in 2-hop distance. Despite the high-speed interconnect, NUMA memory access time varies considerably between faster local memory and the remote memory of other nodes. Maintaining cache coherence across processor caches adds significant overhead to NUMA systems, addressed by ccNUMA, a term mostly used synonymously with current NUMA implementations [3].

x86

AMD implemented NUMA with its Opteron processor in 2003, using HyperTransport. Intel announced NUMA compatibility for their x86 servers in late 2007 with Nehalem CPUs using QuickPath Interconnect [4].

Considerations

Scheduling threads across the nodes and cores of a system is a complicated topic, due to the differing access patterns of independent and shared data. There are several considerations for ccNUMA-aware operating systems and software, such as keeping data local by virtue of first touch [5] [6]. NUMA and processor affinity APIs help application programmers bind threads or processes to NUMA nodes, or allocate memory from a certain node.

See also

Selected Publications

1998 ...

2000 ...

Memory part 1
Memory part 2: CPU caches
Memory part 3: Virtual Memory
Memory part 4: NUMA support
Memory part 5: What programmers can do

2010 ...

Georg Hager, Jan Treibig, Gerhard Wellein (2013). The Practitioner's Cookbook for Good Parallel Performance on Multi- and Many-Core Systems. RRZE, SC13
Rik van Riel, Vinod Chegu (2014). Automatic NUMA Balancing. Red Hat Summit 2014
Irina Calciu, Siddhartha Sen, Mahesh Balakrishnan, Marcos K. Aguilera (2017). Black-box Concurrent Data Structures for NUMA Architectures. ACM SIGPLAN Notices, Vol. 52, No. 4
Irina Calciu, Siddhartha Sen, Mahesh Balakrishnan, Marcos K. Aguilera (2017). How to implement any concurrent data structure for modern servers. ACM SIGOPS, Vol. 51, No. 1

Forum Posts

2000 ...

2010 ...

2015 ...

Re: thread affinity by Robert Hyatt, CCC, July 03, 2015

External Links

Linux

Windows

x86

Misc

References

Up one Level