# Avoiding Branches

Home * Programming * Optimization * Avoiding Branches

Miss-predicted branches causes huge penalties on todays super pipelined processors. While processors become smarter to predict branches with several heuristics, branches on random data should be avoided. The techniques shown here often use arithmetical shift right (by bit-width - 1, that is 31 for 32-bit double words as integers) to determine a mask of sign-bits, either all bits set (-1) or all bits clear 0. x86 compiler may emit an cdq (Convert Double to Quad) instruction, which sign extends a 32 bit register to two 32 bit registers. Since arithmetical shift right is not strictly specified in C, it might be not portable through all compilers and architectures. Note that in C, a comparison or a boolean expression with the result {false, true} might be treated as numerical value {0, 1}.

# Abs, Max, Min

It is recommend to use functions provided by the programming language. In C or C++ one should use appropriate compiler intrinsics and/or template functions provided by the C Runtime Library or Standard Template Library.

The tricks shown here, might be useful if compiler don't support those functions or don't generate the intended branchless assembly and the input is quite random, so that the branch prediction heuristics will fail often.

## Absolute value of an Integer

Abs as C intrinsic  is likely implemented based on following code snippet ...

```int abs(int a) {
int s = a >> 31; // cdq, signed shift, -1 if negative, else 0
a ^= s;  // ones' complement if negative
a -= s;  // plus one if negative -> two's complement if negative
return a;
}
```

... by compilers, on x86 a sequence three instructions: {cdq, xor, sub} or {cdq, add, xor}.

## Maximum of two Integers

### By CRT

Microsoft Visual C++ Run-Time Library provides a _max macro .

Following trick only works for a reduced integer range of effectively one bit less, which is most often no problem for 32-bit integers in chess programs, like scores and that like: INT_MIN <= a - b <= INT_MAX: If a is greater b, a - 0 is returned, otherwise a - (a - b) == +b

```int max(int a, int b) {
int diff = a - b;
int dsgn = diff >> 31;
return a - (diff & dsgn);
}
```

## Minimum of two Integers

### By CRT

Microsoft Visual C++ Run-Time Library provides a _min macro .

Following trick only works for a reduced integer range of effectively one bit less, which is most often no problem for 32-bit integers in chess programs, like scores and that like: INT_MIN <= a - b <= INT_MAX: If a is greater b, b + 0 is returned, otherwise b + (a - b) == +a

```int min(int a, int b) {
int diff = a - b;
int dsgn = diff >> 31;
return b + (diff & dsgn);
}
```

# Conditional Expressions

## Conditional Assignment

A conditional assignment in C or C++ may be implemented by compilers as x86 conditional move (cmovCC) instruction.

```x = ( a > b ) ? C : D;
```

otherwise it might be reformulated with conditional increment:

```x = D;
if ( a > b ) x += C - D;
```

## Conditional Increment

If a > b is hard to predict,

```if ( a > b ) x += C;
```

it might be reformulated branch-less in C, which likely emits a x86 setCC instruction:

```x += -( a > b ) & C; // with any boolean expression
```

With a reduced value range and INT_MIN <= b - a <= INT_MAX, greater and less relations might be implemented using a sign mask:

```x += (( b - a ) >> 31) & C;
```

## Conditional Write

During list generation, while conditionally writing data to an array with post-incrementing a pointer or index, one may try to avoid the conditional branch by storing always and to increment the pointer by the condition, which is either 0 or 1  .

```if (a > b)
*ptr++ = value;
```

might be rewritten by

```  *ptr = value;
ptr += (a > b);
```

# Indirect Branch

```There are two parts to predicting a branch on x86. 1. Is the branch taken (for a call it is always "yes")? 2. Where is the branch going?
```
```(2) is more interesting because when you fetch and then predict the branch, you don't have a clue where it is going since the register being used might not yet be ready for access. The solution is a "branch target buffer" which simply predicts the branch AND where it is going, based on the last time it was encountered. You can do a conditional jump to an indirect address and predict the jump correctly and miss the address (entire thing is then predicted wrong) or you can predict the address correctly and miss the jump (again, entire thing is wrong), or you can miss both. Only when you get both right do you have any success.
```
```Your code always jumps to the same place, whether you use the explicit jump address, or the indirect address through a register. When you get into a call where the address changes, performance will drop. Your code really is not testing that at all...
```