Dynamic Tree Splitting

From Chessprogramming wiki
Revision as of 10:50, 17 August 2018 by GerdIsenberg (talk | contribs)
Jump to: navigation, search

Home * Search * Parallel Search * Dynamic Tree Splitting

Splitted tree [1]

Dynamic Tree Splitting, (DTS)
a peer-to-peer model of parallelism which divides the search tree among several processors on a shared memory parallel machine, devised by Robert Hyatt [2] , implemented in Cray Blitz and tested on a 16 processor Cray C916/1024 with 1024 mebiwords of memory (8 gibibytes).DTS overcomes one problem of principle variation splitting (PVS) and enhanced principle variation splitting (EPVS) [3] with bushy trees, that processors assigned to a node become and stay idle while other processors are still busy on the same node. In DTS, busy processors publish their state of the tree in shared memory for idle processors to examine, to let them decide which (if any) of the busy processors to join, and to inform a busy processor of the chosen expceted all-node or pv-node [4] , dubbed split point. The busy processor with that subtree in progress in turn, then copies the complete tree state with current board position, bounds, move list, repetition list and so forth to a shared memory area, becoming a so called "split block". Owner and helper processors can now extract moves from this shared data to search in parallel [5] .

Performance Test

Rather than to use a group of unrelated chess problems, the CB team took a segment of a real chess game from the ACM 1993 M-Chess Pro vs. Cray Blitz, a sharp Vienna game aka delayed kings gambit accepted after CB was out of book after 1. e4 e5 2. Nc3 Nc6 3. f4 exf4 4. Nf3 g5 5. d4 g4 6. Bc4 gxf3 7. o-o d5 8. exd5 Bg4 9. Qd2, 24 positions in total. The results include two sets of numbers for each test, where a test used either one, two, four, eight or sixteen processors. The numbers are the clock time required for the search (wall clock time) and the number of nodes searched which shows how much extra work is added by using additional processors [6] .

DTS Speedups

DTS Cray Blitz speedups on Cray C916/1024 over 24 test positions [7]

#proc 1 2 4 8 16
pos sec nodes speedup sec nodes speedup sec nodes speedup sec nodes speedup sec nodes speedup
1 2,830 87,735,974 1 1,415 89,052,012 2.0 832 105,025,123 3.4 435 109,467,495 6.5 311 155,514,410 9.1
2 2,849 88,954,757 1 1,424 90,289,077 2.0 791 100,568,301 3.6 438 110,988,161 6.5 274 137,965,406 10.4
3 3,274 101,302,792 1 1,637 102,822,332 2.0 884 111,433,074 3.7 467 117,366,515 7.0 239 119,271,093 13.7
4 2,308 71,726,853 1 1,154 72,802,754 2.0 591 74,853,409 3.9 349 88,137,085 6.6 208 104,230,094 11.1
5 1,584 49,386,616 1 792 50,127,414 2.0 440 55,834,316 3.6 243 61,619,298 6.5 178 89,506,306 8.9
6 4,294 133,238,718 1 2,147 135,237,296 2.0 1,160 146,562,594 3.7 670 168,838,428 6.4 452 226,225,307 9.5
7 1,888 58,593,747 1 993 62,602,792 1.9 524 66,243,490 3.6 273 68,868,878 6.9 187 93,575,946 10.1
8 7,275 225,906,282 1 3,637 229,294,872 2.0 1,966 248,496,917 3.7 1,039 261,728,552 7.0 680 340,548,431 10.7
9 3,940 122,264,617 1 1,970 124,098,584 2.0 1,094 138,226,951 3.6 635 159,930,005 6.2 398 199,204,874 9.9
10 2,431 75,301,353 1 1,215 76,430,872 2.0 639 80,651,716 3.8 333 83,656,702 7.3 187 93,431,597 13.0
11 3,062 95,321,494 1 1,531 96,751,315 2.0 827 104,853,646 3.7 425 107,369,070 7.2 247 123,994,812 12.4
12 2,518 79,975,416 1 1,325 85,447,418 1.9 662 85,657,884 3.8 364 94,000,085 6.9 219 112,174,209 11.5
13 2,131 66,100,160 1 1,121 70,622,802 1.9 560 70,796,754 3.8 313 78,834,155 6.8 192 96,053,649 11.1
14 1,871 58,099,574 1 935 58,971,066 2.0 534 67,561,507 3.5 296 74,791,668 6.3 191 95,627,150 9.8
15 2,648 84,143,340 1 1,324 85,405,488 2.0 715 92,557,676 3.7 378 97,486,065 7.0 243 124,516,703 10.9
16 2,347 75,738,094 1 1,235 80,920,173 1.9 601 79,039,499 3.9 321 84,141,904 7.3 182 94,701,972 12.9
17 4,884 154,901,225 1 2,872 184,970,278 1.7 1,878 242,480,013 2.6 1,085 279,166,418 4.5 814 416,426,105 6.0
18 646 20,266,629 1 358 22,856,254 1.8 222 28,443,165 2.9 124 31,608,146 5.2 84 42,454,639 7.7
19 2,983 93,858,903 1 1,491 95,266,785 2.0 785 100,527,830 3.8 426 108,742,238 7.0 226 114,692,731 13.2
20 7,473 231,206,390 1 3,736 234,674,482 2.0 1,916 241,284,621 3.9 1,083 271,751,263 6.9 530 264,493,531 14.1
21 3,626 112,457,464 1 1,813 114,144,324 2.0 906 114,425,474 4.0 489 123,247,294 7.4 237 118,558,091 15.3
22 2,560 81,302,340 1 1,347 86,865,131 1.9 691 89,432,576 3.7 412 106,348,704 6.2 264 135,196,568 9.7
23 2,039 63,598,940 1 1,019 64,552,923 2.0 536 68,117,815 3.8 323 81,871,010 6.3 206 103,621,303 9.9
24 2,563 80,413,971 1 1,281 81,620,179 2.0 657 83,919,196 3.9 337 85,810,169 7.6 178 90,074,814 14.4
avg 1 2.0 3.7 6.6 11.1

Despite Robert Hyatt mentioned DTS really will only scale as far as a shared memory architecture scales, and any delays (such as those in some architectures that use a hierarchical memory organization with varying delays depending on how far the memory is from the requesting processor) will certainly have an adverse effect on DTS, the published speedups were heavily criticized by Vincent Diepeveen in 2002 [8] .

Comparison

Average Cray Blitz speedups on Cray C916/1024 over 24 test positions

#Procs
Algorithm
1 2 4 8 16
PVS 1.0 1.8 3.0 4.1 4.6
EPVS 1.0 1.9 3.4 5.4 6.0
DTS 1.0 2.0 3.7 6.6 11.1

Recursive DTS

While searching various subtrees with different ply levels from the root was quite straighforward to implement in Cray Blitz' iterative search written in Fortran, a recursive Negamax would have made it much more difficult, since the call stack is inaccessible to the search code and moving the tree state around would have been more difficult [9] . A recursive DTS-like algorithm was proposed by Peter Österlund in 2013 [10] , as implemented in Texel.

See also

Publications

Forum Posts

DTS NUMA by Vincent Diepeveen, CCC, September 03, 2002 » NUMA

External Links

References

Up one level