Avoiding Branches
Home * Programming * Optimization * Avoiding Branches
Miss-predicted branches causes huge penalties on todays super pipelined processors. While processors become smarter to predict branches with several heuristics, branches on random data should be avoided. The techniques shown here often use arithmetical shift right (by bit-width - 1, that is 31 for 32-bit double words as integers) to determine a mask of sign-bits, either all bits set (-1) or all bits clear 0. x86 compiler may emit an cdq (Convert Double to Quad) instruction, which sign extends a 32 bit register to two 32 bit registers. Since arithmetical shift right is not strictly specified in C, it might be not portable through all compilers and architectures. Note that in C, a comparison or a boolean expression with the result {false, true} might be treated as numerical value {0, 1}.
Contents
Abs, Max, Min
It is recommend to use functions provided by the programming language. In C or C++ one should use appropriate compiler intrinsics and/or template functions provided by the C Runtime Library or Standard Template Library.
The tricks shown here, might be useful if compiler don't support those functions or don't generate the intended branchless assembly and the input is quite random, so that the branch prediction heuristics will fail often.
Absolute value of an Integer
Abs as C intrinsic [1] is likely implemented based on following code snippet ...
int abs(int a) { int s = a >> 31; // cdq, signed shift, -1 if negative, else 0 a ^= s; // ones' complement if negative a -= s; // plus one if negative -> two's complement if negative return a; }
... by compilers, on x86 a sequence three instructions: {cdq, xor, sub} or {cdq, add, xor}.
Maximum of two Integers
By CRT
Microsoft Visual C++ Run-Time Library provides a _max macro [2].
By Sign-Mask
Following trick only works for a reduced integer range of effectively one bit less, which is most often no problem for 32-bit integers in chess programs, like scores and that like: INT_MIN <= a - b <= INT_MAX: If a is greater b, a - 0 is returned, otherwise a - (a - b) == +b
int max(int a, int b) { int diff = a - b; int dsgn = diff >> 31; return a - (diff & dsgn); }
Minimum of two Integers
By CRT
Microsoft Visual C++ Run-Time Library provides a _min macro [3].
By Sign-Mask
Following trick only works for a reduced integer range of effectively one bit less, which is most often no problem for 32-bit integers in chess programs, like scores and that like: INT_MIN <= a - b <= INT_MAX: If a is greater b, b + 0 is returned, otherwise b + (a - b) == +a
int min(int a, int b) { int diff = a - b; int dsgn = diff >> 31; return b + (diff & dsgn); }
Conditional Expressions
Conditional Assignment
A conditional assignment in C or C++ may be implemented by compilers as x86 conditional move (cmovCC) instruction.
x = ( a > b ) ? C : D;
otherwise it might be reformulated with conditional increment:
x = D; if ( a > b ) x += C - D;
Conditional Increment
If a > b is hard to predict,
if ( a > b ) x += C;
it might be reformulated branch-less in C, which likely emits a x86 setCC instruction:
x += -( a > b ) & C; // with any boolean expression
With a reduced value range and INT_MIN <= b - a <= INT_MAX, greater and less relations might be implemented using a sign mask:
x += (( b - a ) >> 31) & C;
Conditional Write
During list generation, while conditionally writing data to an array with post-incrementing a pointer or index, one may try to avoid the conditional branch by storing always and to increment the pointer by the condition, which is either 0 or 1 [4] [5].
if (a > b) *ptr++ = value;
might be rewritten by
*ptr = value; ptr += (a > b);
Indirect Branch
Robert Hyatt on x86 Branch predictor, Branch target predictor, and Indirect branch in CCC [6]:
There are two parts to predicting a branch on x86. 1. Is the branch taken (for a call it is always "yes")? 2. Where is the branch going?
(2) is more interesting because when you fetch and then predict the branch, you don't have a clue where it is going since the register being used might not yet be ready for access. The solution is a "branch target buffer" which simply predicts the branch AND where it is going, based on the last time it was encountered. You can do a conditional jump to an indirect address and predict the jump correctly and miss the address (entire thing is then predicted wrong) or you can predict the address correctly and miss the jump (again, entire thing is wrong), or you can miss both. Only when you get both right do you have any success.
Your code always jumps to the same place, whether you use the explicit jump address, or the indirect address through a register. When you get into a call where the address changes, performance will drop. Your code really is not testing that at all...
See also
Forum Posts
2000 ...
- branch misprediction by Eric Oldre, CCC, July 14, 2004
- Re: Fruit 2.0 Toga : Recapture extension by Gerd Isenberg, CCC, March 19, 2005
2010 ...
- Re: Function pointers hurt performance? by Robert Hyatt, CCC, September 22, 2011
- Branch-poor looping by Harm Geert Muller, CCC, September 02, 2015
- Mispredicted branch VS cache miss by Natale Galioto, CCC, September 09, 2015
- Tipical cache and branch misses for a chess engine by Nicu Ionita, CCC, September 14, 2016 » Memory, Profiling
- Misprediction-poor looping by Harm Geert Muller, CCC, December 09, 2019
2020 ...
- A Neat Trick by Michael Sherwin, CCC, April 11, 2020
External Links
- Compute the integer absolute value (abs) without branching by Sean Eron Anderson
- Compute the minimum (min) or maximum (max) of two integers without branching by Sean Eron Anderson
- Programming Optimization by Paul Hsieh
- Avoiding the Cost of Branch Misprediction - Intel® Software Network by Rajiv Kapoor, February 20, 2009
- Branch (computer science) from Wikipedia
- Branch table
- Indirect branch
- Conditional (programming)
- Branch predictor
- Branch target predictor
- Defunkt - Avoid The Funk, Live at Drom, April 14, 2010, YouTube Video
- lineup: Joe Bowie, Ronny Drayton, Bill Bickford, Kim Clarke, John Mulkerin, Kenny Martin
References
- ↑ abs, labs, llabs, _abs64 Visual C++ Developer Center - Run-Time Library Reference
- ↑ _max Visual C++ Developer Center - Run-Time Library Reference
- ↑ _min Visual C++ Developer Center - Run-Time Library Reference
- ↑ Software Optimization Guide for AMD Family 10h and 12h Processors (pdf) see pp. 102 on Conditional Write
- ↑ Write-combining from Wikipedia
- ↑ Re: Function pointers hurt performance? by Robert Hyatt, CCC, September 22, 2011