Avoiding Branches

Mispredicted branches cause huge penalties on today's superpipelined processors. While processors have become smarter at predicting branches with several heuristics, branches on random data should be avoided. The techniques shown here often use an arithmetic shift right (by bit-width - 1, that is 31 for 32-bit integers) to determine a mask of sign bits, either all bits set (-1) or all bits clear (0). An x86 compiler may emit a cdq (Convert Doubleword to Quadword) instruction, which sign-extends eax into the register pair edx:eax. Since the result of an arithmetic shift right of a negative value is implementation-defined in C, the technique might not be portable across all compilers and architectures. Note that in C, a comparison or a boolean expression with the result {false, true} can be treated as the numerical value {0, 1}.
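The two ways of obtaining a sign mask mentioned above can be sketched as follows. The function names are hypothetical, and the shift variant assumes a compiler that shifts negative values arithmetically, as common x86 compilers do.

```c
#include <stdint.h>

/* Sign mask by arithmetic shift right: -1 (all bits set) if x is negative,
   else 0. Assumption: the compiler implements >> on negative values as an
   arithmetic shift, which C leaves implementation-defined. */
static inline int32_t sign_mask_shift(int32_t x) {
    return x >> 31;
}

/* Alternative without shifting a negative value: the comparison yields
   0 or 1, and negating it gives 0 or -1. */
static inline int32_t sign_mask_cmp(int32_t x) {
    return -(int32_t)(x < 0);
}
```

Both functions should agree for every input on such a compiler; the comparison form does not rely on the shift behaviour.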

Abs, Max, Min

It is recommended to use the functions provided by the programming language. In C or C++ one should use the appropriate compiler intrinsics and/or template functions provided by the C Runtime Library or the Standard Template Library.

The tricks shown here might be useful if the compiler doesn't support those functions or doesn't generate the intended branchless assembly, and the input is random enough that the branch prediction heuristics will often fail.

Absolute value of an Integer

Abs as a C intrinsic ^[1] is likely implemented based on the following code snippet ...

int abs(int a) {
   int s = a >> 31; // cdq, signed shift, -1 if negative, else 0
   a ^= s;  // ones' complement if negative
   a -= s;  // plus one if negative -> two's complement if negative
   return a;
}

... by compilers, on x86 as a sequence of three instructions: {cdq, xor, sub} or {cdq, add, xor}.
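The second instruction sequence, {cdq, add, xor}, corresponds to adding the sign mask first and complementing afterwards. A minimal sketch, with a hypothetical function name and again assuming an arithmetic shift right:

```c
#include <stdint.h>

/* Absolute value as (a + s) ^ s, where s is the sign mask of a.
   For negative a this computes ~(a - 1), which equals -a in two's complement.
   Like abs(), the result is undefined for INT32_MIN. */
static inline int32_t abs_add_xor(int32_t a) {
    int32_t s = a >> 31;   /* assumption: arithmetic shift, -1 if negative, else 0 */
    return (a + s) ^ s;    /* subtract one and complement if negative */
}
```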

Maximum of two Integers

By CRT

The Microsoft Visual C++ Run-Time Library provides a __max macro ^[2].

By Sign-Mask

The following trick only works for a reduced integer range of effectively one bit less, which is rarely a problem for 32-bit integers in chess programs, such as scores and the like: INT_MIN <= a - b <= INT_MAX. If a is greater than b, a - 0 is returned, otherwise a - (a - b) == +b.

int max(int a, int b) {
  int diff = a - b;
  int dsgn = diff >> 31;
  return a - (diff & dsgn);
}
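If the reduced range is a concern, one possible variant, not part of the original trick, computes the difference in 64 bits so that it cannot overflow. The name max_wide is hypothetical, and the sketch still assumes an arithmetic shift right:

```c
#include <stdint.h>

/* Maximum of two 32-bit integers using a 64-bit difference,
   which removes the restriction INT_MIN <= a - b <= INT_MAX. */
static inline int32_t max_wide(int32_t a, int32_t b) {
    int64_t diff = (int64_t)a - b;        /* always fits into 64 bits */
    int64_t dsgn = diff >> 63;            /* assumption: arithmetic shift, -1 if a < b, else 0 */
    return (int32_t)(a - (diff & dsgn));  /* a - 0, or a - (a - b) == b */
}
```

On 32-bit targets the wider arithmetic costs additional instructions, so the narrow version may remain preferable where the score range is known to be small.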

Minimum of two Integers

By CRT

The Microsoft Visual C++ Run-Time Library provides a __min macro ^[3].

By Sign-Mask

The following trick only works for a reduced integer range of effectively one bit less, which is rarely a problem for 32-bit integers in chess programs, such as scores and the like: INT_MIN <= a - b <= INT_MAX. If a is greater than b, b + 0 is returned, otherwise b + (a - b) == +a.

int min(int a, int b) {
  int diff = a - b;
  int dsgn = diff >> 31;
  return b + (diff & dsgn);
}
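A different formulation, along the lines of the one on Sean Eron Anderson's page listed under External Links, builds the mask from a comparison instead of a difference. It therefore needs neither the reduced range nor a shift of a negative value. The function name is hypothetical:

```c
#include <stdint.h>

/* Minimum of two integers using a comparison mask.
   -(a < b) is -1 if a < b, else 0, so the xor either turns b into a
   or leaves b unchanged. */
static inline int32_t min_cmp(int32_t a, int32_t b) {
    int32_t mask = -(int32_t)(a < b);
    return b ^ ((a ^ b) & mask);
}
```

Whether the compiler emits a setCC sequence, a cmovCC, or a branch for this source is not guaranteed and is best checked in the generated assembly.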

Conditional Expressions

Conditional Assignment

A conditional assignment in C or C++ may be implemented by compilers as an x86 conditional move (cmovCC) instruction.

x = ( a > b ) ? C : D;

Otherwise it might be reformulated with a conditional increment:

x = D;
if ( a > b ) x += C - D;
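The selection between C and D can also be written with a mask derived from the comparison, which avoids the subtraction C - D and its possible overflow. This is a sketch with a hypothetical function name, where c and d stand for the values C and D above:

```c
#include <stdint.h>

/* Returns c if a > b, else d, without a conditional statement in the source. */
static inline int32_t select_gt(int32_t a, int32_t b, int32_t c, int32_t d) {
    int32_t mask = -(int32_t)(a > b);   /* -1 if a > b, else 0 */
    return (c & mask) | (d & ~mask);    /* keep c under the mask, d under its complement */
}
```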

Conditional Increment

If a > b is hard to predict,

if ( a > b ) x += C;

it might be reformulated branchlessly in C, which likely emits an x86 setCC instruction:

x += -( a > b ) & C; // with any boolean expression

With a reduced value range, INT_MIN <= b - a <= INT_MAX, the greater-than and less-than relations might be implemented using a sign mask:

x += (( b - a ) >> 31) & C;
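As an illustration of the conditional increment inside a loop, the following sketch sums all array elements above a limit. The function name and the choice of a sum as the accumulated quantity are assumptions made for the example:

```c
#include <stddef.h>
#include <stdint.h>

/* Sums all elements greater than limit. The loop body contains no
   data-dependent conditional statement; the comparison result is
   turned into a mask instead. */
static inline int64_t sum_greater(const int32_t *v, size_t n, int32_t limit) {
    int64_t sum = 0;
    for (size_t i = 0; i < n; ++i) {
        int32_t mask = -(int32_t)(v[i] > limit);  /* -1 if v[i] > limit, else 0 */
        sum += v[i] & mask;                       /* adds v[i] or 0 */
    }
    return sum;
}
```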

Conditional Write

During list generation, while conditionally writing data to an array with a post-incremented pointer or index, one may try to avoid the conditional branch by always storing and incrementing the pointer by the condition, which is either 0 or 1 ^[4] ^[5]. This requires that the slot at the current write position is always writable, so the array needs room for one extra element.

if (a > b)
  *ptr++ = value;

might be rewritten as

  *ptr = value;
  ptr += (a > b);
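Applied to a whole loop, the unconditional store looks roughly like the sketch below. The function name and the filter criterion are hypothetical; the important point is that the output buffer must be large enough for the stores that are later overwritten or discarded.

```c
#include <stddef.h>
#include <stdint.h>

/* Copies all elements greater than limit to out and returns how many were kept.
   out must have room for n elements, because every element is stored first
   and only then kept or overwritten by the next store. */
static inline size_t filter_greater(const int32_t *in, size_t n,
                                    int32_t limit, int32_t *out) {
    int32_t *ptr = out;
    for (size_t i = 0; i < n; ++i) {
        *ptr = in[i];               /* store always */
        ptr += (in[i] > limit);     /* advance by 0 or 1 */
    }
    return (size_t)(ptr - out);
}
```

Only the first elements of out, up to the returned count, are meaningful afterwards; the slot behind them may hold a discarded value.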

Indirect Branch

Robert Hyatt on x86 Branch predictor, Branch target predictor, and Indirect branch in CCC ^[6]:

There are two parts to predicting a branch on x86. 1. Is the branch taken (for a call it is always "yes")? 2. Where is the branch going?

(2) is more interesting because when you fetch and then predict the branch, you don't have a clue where it is going since the register being used might not yet be ready for access. The solution is a "branch target buffer" which simply predicts the branch AND where it is going, based on the last time it was encountered. You can do a conditional jump to an indirect address and predict the jump correctly and miss the address (entire thing is then predicted wrong) or you can predict the address correctly and miss the jump (again, entire thing is wrong), or you can miss both. Only when you get both right do you have any success.

Your code always jumps to the same place, whether you use the explicit jump address, or the indirect address through a register. When you get into a call where the address changes, performance will drop. Your code really is not testing that at all...
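The distinction drawn in the quote can be illustrated with a small function-pointer table. All names are hypothetical, and the sketch makes no timing claims; it only shows one call site whose target never changes and one whose target depends on the data. An optimizing compiler may inline or devirtualize the fixed case entirely.

```c
#include <stddef.h>

typedef int (*op_fn)(int);

static int op_inc(int x)    { return x + 1; }
static int op_double(int x) { return x * 2; }

static const op_fn op_table[2] = { op_inc, op_double };

/* Indirect call whose target is the same on every iteration:
   the last-seen target is always the right prediction. */
static inline int apply_fixed(int x, size_t n) {
    op_fn f = op_table[0];
    for (size_t i = 0; i < n; ++i)
        x = f(x);
    return x;
}

/* Indirect call whose target depends on selector[i]:
   each change of target may be predicted wrongly. */
static inline int apply_varying(int x, const unsigned char *selector, size_t n) {
    for (size_t i = 0; i < n; ++i)
        x = op_table[selector[i] & 1](x);
    return x;
}
```

A benchmark that only exercises the first pattern would say little about the cost of the second, which is the objection raised in the quote.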

Forum Posts

2000 ...

branch misprediction by Eric Oldre, CCC, July 14, 2004
Re: Fruit 2.0 Toga : Recapture extension by Gerd Isenberg, CCC, March 19, 2005

2010 ...

Re: Function pointers hurt performance? by Robert Hyatt, CCC, September 22, 2011
Branch-poor looping by Harm Geert Muller, CCC, September 02, 2015
Mispredicted branch VS cache miss by Natale Galioto, CCC, September 09, 2015
Tipical cache and branch misses for a chess engine by Nicu Ionita, CCC, September 14, 2016 » Memory, Profiling
Misprediction-poor looping by Harm Geert Muller, CCC, December 09, 2019

2020 ...

A Neat Trick by Michael Sherwin, CCC, April 11, 2020

External Links

Compute the integer absolute value (abs) without branching by Sean Eron Anderson
Compute the minimum (min) or maximum (max) of two integers without branching by Sean Eron Anderson
Programming Optimization by Paul Hsieh
Avoiding the Cost of Branch Misprediction - Intel® Software Network by Rajiv Kapoor, February 20, 2009
Branch (computer science) from Wikipedia
Branch table
Indirect branch
Conditional (programming)
Branch predictor
Branch target predictor
Defunkt - Avoid The Funk, Live at Drom, April 14, 2010, YouTube Video

lineup: Joe Bowie, Ronny Drayton, Bill Bickford, Kim Clarke, John Mulkerin, Kenny Martin

References

[1] abs, labs, llabs, _abs64, Visual C++ Developer Center - Run-Time Library Reference
[2] __max, Visual C++ Developer Center - Run-Time Library Reference
[3] __min, Visual C++ Developer Center - Run-Time Library Reference
[4] Software Optimization Guide for AMD Family 10h and 12h Processors (pdf), see pp. 102 on Conditional Write
[5] Write-combining from Wikipedia
[6] Re: Function pointers hurt performance? by Robert Hyatt, CCC, September 22, 2011

