Changes

Jump to: navigation, search

Avoiding Branches

10,211 bytes added, 10:31, 7 December 2018
Created page with "'''Home * Programming * Optimization * Avoiding Branches''' Miss-predicted [https://en.wikipedia.org/wiki/Branch_%28computer_science%29 branches] causes..."
'''[[Main Page|Home]] * [[Programming]] * [[Optimization]] * Avoiding Branches'''

Miss-predicted [https://en.wikipedia.org/wiki/Branch_%28computer_science%29 branches] causes huge penalties on todays super [https://en.wikipedia.org/wiki/Instruction_pipeline pipelined] processors. While processors become smarter to predict branches with several heuristics, branches on random data should be avoided. The techniques shown here often use arithmetical shift right (by <span style="background-color: rgb(214, 214, 214)">bit-width - 1</span>, that is 31 for 32-bit [[Double Word|double words]] as integers) to determine a mask of sign-bits, either all bits set (-1) or all bits clear 0. [[x86]] compiler may emit an cdq (Convert Double to Quad) instruction, which sign extends a 32 bit register to two 32 bit registers. Since arithmetical shift right is not strictly specified in [[C]], it might be not portable through all compilers and architectures. Note that in C, a comparison or a boolean expression with the result <span style="background-color: rgb(214, 214, 214)">{false, true}</span> might be treated as numerical value <span style="background-color: rgb(214, 214, 214)">{0, 1}</span>.

=Abs, Max, Min=
It is recommend to use functions provided by the [[Languages|programming language]]. In [[C]] or [[Cpp|C++]] one should use appropriate compiler intrinsics and/or template functions provided by the [[C Runtime Library]] or [[Standard Template Library]].

The tricks shown here, might be useful if compiler don't support those functions or don't generate the intended branchless assembly and the input is quite random, so that the branch prediction heuristics will fail often.
<span id="Abs"></span>
==Absolute value of an Integer==
Abs as [[C]] intrinsic <ref>[https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/abs-labs-llabs-abs64?view=vs-2017 abs, labs, llabs, _abs64] Visual C++ Developer Center - Run-Time Library Reference</ref> is likely implemented based on following code snippet ...
<pre>
int abs(int a) {
int s = a >> 31; // cdq, signed shift, -1 if negative, else 0
a ^= s; // ones' complement if negative
a -= s; // plus one if negative -> two's complement if negative
return a;
}
</pre>
... by compilers, on [[x86]] a sequence three instructions: {cdq, xor, sub} or {cdq, add, xor}.

<span id="Max"></span>
==Maximum of two Integers==
===By CRT===
[https://en.wikipedia.org/wiki/Microsoft_Visual_C%2B%2B Microsoft Visual C++] Run-Time Library provides a _max macro <ref>[https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/max?view=vs-2017 _max] Visual C++ Developer Center - Run-Time Library Reference</ref>.

===By Sign-Mask===
Following trick only works for a reduced integer range of effectively one bit less, which is most often no problem for 32-bit integers in chess programs, like scores and that like: <span style="background-color: rgb(214, 214, 214)">INT_MIN <= a - b <= INT_MAX</span>:
If a is greater b, <span style="background-color: rgb(214, 214, 214)">a - 0</span> is returned, otherwise <span style="background-color: rgb(214, 214, 214)">a - (a - b) == +b</span>
<pre>
int max(int a, int b) {
int diff = a - b;
int dsgn = diff >> 31;
return a - (diff & dsgn);
}
</pre>

<span id="Min"></span>
==Minimum of two Integers==
===By CRT===
[https://en.wikipedia.org/wiki/Microsoft_Visual_C%2B%2B Microsoft Visual C++] Run-Time Library provides a _min macro <ref>[https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/min?view=vs-2017 _min] Visual C++ Developer Center - Run-Time Library Reference</ref>.

===By Sign-Mask===
Following trick only works for a reduced integer range of effectively one bit less, which is most often no problem for 32-bit integers in chess programs, like scores and that like: <span style="background-color: rgb(214, 214, 214)">INT_MIN <= a - b <= INT_MAX</span>:
If a is greater b, <span style="background-color: rgb(214, 214, 214)">b + 0</span> is returned, otherwise <span style="background-color: rgb(214, 214, 214)">b + (a - b) == +a</span>
<pre>
int min(int a, int b) {
int diff = a - b;
int dsgn = diff >> 31;
return b + (diff & dsgn);
}
</pre>

=Conditional Expressions=
==Conditional Assignment==
A conditional assignment in [[C]] or [[Cpp|C++]] may be implemented by compilers as [[x86]] conditional move (cmovCC) instruction.
<pre>
x = ( a > b ) ? C : D;
</pre>
otherwise it might be reformulated with conditional increment:
<pre>
x = D;
if ( a > b ) x += C - D;
</pre>

==Conditional Increment==
If <span style="background-color: rgb(214, 214, 214)">a > b</span> is hard to predict,
<pre>
if ( a > b ) x += C;
</pre>
it might be reformulated branch-less in [[C]], which likely emits a [[x86]] setCC instruction:
<pre>
x += -( a > b ) & C; // with any boolean expression
</pre>
With a reduced value range and <span style="background-color: rgb(214, 214, 214)">INT_MIN <= b - a <= INT_MAX</span>, greater and less relations might be implemented using a sign mask:
<pre>
x += (( b - a ) >> 31) & C;
</pre>
<span id="ConditionalWrite"></span>
==Conditional Write==
During list generation, while conditionally writing data to an [[array]] with post-incrementing a pointer or index, one may try to avoid the conditional branch by storing always and to increment the pointer by the condition, which is either 0 or 1 <ref>[https://support.amd.com/techdocs/40546.pdf Software Optimization Guide for AMD Family 10h and 12h Processors] (pdf) see pp. 102 on Conditional Write</ref> <ref>[https://en.wikipedia.org/wiki/Write-combining Write-combining from Wikipedia]</ref>.
<pre>
if (a > b)
*ptr++ = value;
</pre>
might be rewritten by
<pre>
*ptr = value;
ptr += (a > b);
</pre>

=Indirect Branch=
[[Robert Hyatt]] on [[x86]] [https://en.wikipedia.org/wiki/Branch_predictor Branch predictor], [https://en.wikipedia.org/wiki/Branch_target_predictor Branch target predictor], and [https://en.wikipedia.org/wiki/Indirect_branch Indirect branch] in [[CCC]] <ref>[http://www.talkchess.com/forum/viewtopic.php?p=425259#425259 Re: Function pointers hurt performance?] by [[Robert Hyatt]], [[CCC]], September 22, 2011</ref>:
There are two parts to predicting a branch on [[x86]]. 1. Is the branch taken (for a call it is always "yes")? 2. Where is the branch going?

(2) is more interesting because when you fetch and then predict the branch, you don't have a clue where it is going since the register being used might not yet be ready for access. The solution is a "branch target buffer" which simply predicts the branch AND where it is going, based on the last time it was encountered. You can do a conditional jump to an indirect address and predict the jump correctly and miss the address (entire thing is then predicted wrong) or you can predict the address correctly and miss the jump (again, entire thing is wrong), or you can miss both. Only when you get both right do you have any success.

Your code always jumps to the same place, whether you use the explicit jump address, or the indirect address through a register. When you get into a call where the address changes, performance will drop. Your code really is not testing that at all...

=See also=
* [[Bit-Twiddling]]
* [[DirGolem]]
* [[Profiling]]
* [[Table-driven Move Generation]]

=Forum Posts=
* [https://www.stmintz.com/ccc/index.php?id=376742 branch misprediction] by [[Eric Oldre]], [[CCC]], July 14, 2004
* [https://www.stmintz.com/ccc/index.php?id=417440 Re: Fruit 2.0 Toga : Recapture extension] by [[Gerd Isenberg]], [[CCC]], March 19, 2005
* [http://www.talkchess.com/forum/viewtopic.php?p=425259#425259 Re: Function pointers hurt performance?] by [[Robert Hyatt]], [[CCC]], September 22, 2011
* [http://www.talkchess.com/forum/viewtopic.php?t=57477 Branch-poor looping] by [[Harm Geert Muller]], [[CCC]], September 02, 2015
* [http://www.talkchess.com/forum/viewtopic.php?t=57577 Mispredicted branch VS cache miss] by [[Natale Galioto]], [[CCC]], September 09, 2015
* [http://www.talkchess.com/forum/viewtopic.php?t=61423 Tipical cache and branch misses for a chess engine] by [[Nicu Ionita]], [[CCC]], September 14, 2016 » [[Memory]], [[Profiling]]

=External Links=
* [http://graphics.stanford.edu/%7Eseander/bithacks.html#IntegerAbs Compute the integer absolute value (abs) without branching] by [http://graphics.stanford.edu/%7Eseander/ Sean Eron Anderson]
* [http://graphics.stanford.edu/%7Eseander/bithacks.html#IntegerMinOrMax Compute the minimum (min) or maximum (max) of two integers without branching] by [http://graphics.stanford.edu/%7Eseander/ Sean Eron Anderson]
* [http://www.azillionmonkeys.com/qed/optimize.html Programming Optimization] by [[Paul Hsieh]]
* [https://software.intel.com/en-us/articles/avoiding-the-cost-of-branch-misprediction/ Avoiding the Cost of Branch Misprediction - Intel® Software Network] by [https://patents.justia.com/inventor/rajiv-kapoor Rajiv Kapoor], February 20, 2009
* [https://en.wikipedia.org/wiki/Branch_%28computer_science%29 Branch (computer science) from Wikipedia]
* [https://en.wikipedia.org/wiki/Branch_table Branch table]
* [https://en.wikipedia.org/wiki/Indirect_branch Indirect branch]
* [https://en.wikipedia.org/wiki/Conditional_statement Conditional (programming)]
* [https://en.wikipedia.org/wiki/Branch_predictor Branch predictor]
* [https://en.wikipedia.org/wiki/Branch_target_predictor Branch target predictor]
* [[:Category:Defunkt|Defunkt]] - [http://www.allmusic.com/album/avoid-the-funk-a-defunkt-anthology-mw0000197423 Avoid The Funk], Live at [http://www.dromnyc.com/ Drom], April 14, 2010, [https://en.wikipedia.org/wiki/YouTube YouTube] Video
: lineup: [[:Category:Joseph Bowie|Joe Bowie]], [https://en.wikipedia.org/wiki/Ronny_Drayton Ronny Drayton], [https://de.wikipedia.org/wiki/Bill_Bickford Bill Bickford], [[:Category:Kim Clarke|Kim Clarke]], [http://johnmulkerin.com/John_Mulkerin/Main.html John Mulkerin], [http://meinlcymbals.com/artists/Artist/show/kenny-martin-488 Kenny Martin]
: {{#evu:https://www.youtube.com/watch?v=NmPUCNyCtpU|alignment=left|valignment=top}}

=References=
<references />
'''[[Optimization|Up one Level]]'''
[[Category:Defunkt]]
[[Category:Joseph Bowie]]
[[Category:Kim Clarke]]

Navigation menu