B*


Moonlit Beauties [1]

B* is a best-first search algorithm that finds the least-cost path from the root node to any goal node out of one or more possible goals. B* was first published by Hans Berliner in 1977 [2] and is related to the A* search algorithm.

Underlying Idea

There are things that must be found and things that must be known

The value of a position, as calculated by the evaluation function, can’t be trusted unless it is confirmed by some kind of search. That is the “raison d’être” of the quiescence search. If one replaces the quiescence search by a full search of depth one, one gets a better value for the position. The deeper the search, the more reliable the evaluation. Turning this argument around, one should search further as long as one can’t trust the backed-up evaluation. This is the underlying idea of B*.

Multiple Values

The first approach to this idea is to use three values to describe a position: the classical static evaluation, plus a lower and an upper bound on what can be achieved, that is, a pessimistic and an optimistic value of the position. The closer these bounds converge on the static evaluation, the better. The bounds are not independent: the pessimistic value for one player is the optimistic value for the opponent.
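The dependency between the two bounds can be made concrete with a small sketch. It assumes a negamax convention (every value is stored from the point of view of the side to move at that node); the `Node` class and the `back_up` helper are hypothetical names chosen for illustration, not taken from any B* implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # All three values are seen from the side to move at this node.
    pessimistic: float
    static: float
    optimistic: float
    children: list = field(default_factory=list)


def back_up(node):
    """Recompute a parent's three values from its children (negamax)."""
    if not node.children:
        return
    for child in node.children:
        back_up(child)
    # The worst the opponent can get after a move is the best we can get.
    node.optimistic = max(-c.pessimistic for c in node.children)
    # The best the opponent can get after a move is the worst we can get.
    node.pessimistic = max(-c.optimistic for c in node.children)
    node.static = max(-c.static for c in node.children)


def uncertainty(node):
    """Width of the bound interval; a narrow interval means a trusted value."""
    return node.optimistic - node.pessimistic


root = Node(0, 0, 0, children=[Node(-3, -1, 2), Node(0, 1, 4)])
back_up(root)
print(root.pessimistic, root.static, root.optimistic, uncertainty(root))
```

After backing up, the root's interval is derived entirely from the children's intervals with the signs and the roles of the two bounds swapped.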

Once the "risk factor" is separated from the static evaluation, one can use the evaluation to drive the search. This first trial of B* was not successful, because the "risk factor" was not well measured by a static function. The next step was to use a dedicated search to obtain more reliable values.

Threats

A further idea is to let threats drive the search. One can measure an upper bound for a deep search with a null move search: you play, your opponent passes, and you play again, at which point a normal search (alpha-beta or PVS) is done. This is the threat detection used by Andrew James Palay in PB* [3]. The two values that drive the search are:

1. the evaluation issued from a standard search
2. and an optimistic value from a null move search
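As a rough illustration of these two values, the sketch below computes both for a move in a toy game. Everything in it is an assumption made for the example: the game (players alternately take a token and add it to their score), the position layout `(remaining, mine, theirs)` seen from the side to move, and the plain negamax that stands in for alpha-beta or PVS.

```python
def moves(pos):
    remaining, _, _ = pos
    return sorted(set(remaining))


def play(pos, token):
    """Take a token; the returned position is seen from the other side."""
    remaining, mine, theirs = pos
    rest = list(remaining)
    rest.remove(token)
    return (tuple(rest), theirs, mine + token)


def pass_turn(pos):
    """Null move: only the side to move changes."""
    remaining, mine, theirs = pos
    return (remaining, theirs, mine)


def negamax(pos, depth):
    """Stand-in for the 'normal search'; value for the side to move."""
    remaining, mine, theirs = pos
    if depth == 0 or not remaining:
        return mine - theirs
    return max(-negamax(play(pos, t), depth - 1) for t in moves(pos))


def real_value(pos, move, depth):
    """Value of `move` when the opponent replies normally."""
    return -negamax(play(pos, move), depth)


def optimistic_value(pos, move, depth):
    """Value of `move` when the opponent passes and we play again."""
    return negamax(pass_turn(play(pos, move)), depth)


start = ((5, 3, 2), 0, 0)
print(real_value(start, 5, 1), optimistic_value(start, 5, 1))
```

In this toy game a pass never helps the side that passes, so the optimistic value is never below the real value, which is the property the threat search relies on.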

Probability based Search

The next major step in the development of PB* is to drive the search not with the raw optimistic value but with the probability of reaching a target value. If the target value is chosen as the value that would force a change in the best move at the root, then the probability of reaching the target matches the probability that this move is better than the one currently selected at the root.
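A minimal sketch of this idea follows. It assumes that the true value of a move is spread uniformly between its real and its optimistic value, which is only one possible model and not necessarily the distribution used in PB*. The function names, the `margin` parameter and the example moves and scores are hypothetical.

```python
def probability_of_reaching(target, real, optimistic):
    """Chance that a move is worth at least `target` (uniform model, assumed)."""
    if target <= real:
        return 1.0
    if target >= optimistic:
        return 0.0
    return (optimistic - target) / (optimistic - real)


def root_probabilities(root_moves, margin=1):
    """root_moves maps a move name to (real, optimistic).

    The target is the smallest value that would change the best move:
    the best real value plus a small margin.
    """
    best = max(root_moves, key=lambda m: root_moves[m][0])
    target = root_moves[best][0] + margin
    probs = {
        m: probability_of_reaching(target, real, opt)
        for m, (real, opt) in root_moves.items()
        if m != best
    }
    return best, target, probs


best, target, probs = root_probabilities(
    {"Nf3": (30, 60), "Bxh7": (-50, 250), "a3": (10, 25)}
)
print(best, target, probs)
```

Here the sacrifice has a poor real value but a wide optimistic bound, so it keeps a high probability of overtaking the current best move, while the quiet move whose optimistic value is already below the target has none.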

Once a leaf node is evaluated, its value and probability are propagated to the root node. As the search evolves, the target value changes and the probabilities evolve with it. When there is no hope of reaching the target, the search can be terminated. We can now define two logical modes for selecting a move: the move with the best real value, and the move with the best probability of reaching the target value (optimism). These two modes must be split between the two players, one side selecting by optimism while the other selects by real value; otherwise the search makes no sense (you reply to a sac with another sac, and then another sac...).
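The split between the two modes can be sketched as a descent through the tree held in memory. This is an illustration under stated assumptions, not the PB* selection rule itself: the `Node` layout is hypothetical, both `real` and `prob` are stored from the root player's point of view, and only the case where the root player is the optimistic side is shown.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    real: float  # backed-up real value, root player's point of view
    prob: float  # probability that the root player reaches the target here
    children: list = field(default_factory=list)


def select_leaf(root):
    """Walk down to the leaf that should be expanded next."""
    node, ply, path = root, 0, [root.name]
    while node.children:
        if ply % 2 == 0:
            # Root player to move: optimistic mode, best probability.
            node = max(node.children, key=lambda c: c.prob)
        else:
            # Opponent to move: realistic mode, best real value for him.
            node = min(node.children, key=lambda c: c.real)
        ply += 1
        path.append(node.name)
    return node, path


tree = Node("root", 30, 1.0, [
    Node("quiet", 30, 0.1),
    Node("sac", -50, 0.7, [
        Node("accept", -50, 0.7),
        Node("decline", 20, 0.9),
    ]),
])
leaf, path = select_leaf(tree)
print(path)
```

The root player follows the promising sacrifice, but the opponent answers with his best real reply rather than with a speculative line of his own.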

Two Step Search

Since this alternating choice must be taken into account for both players, we are forced to implement a two-phase search. In the first phase, the player's optimism is put to use (select phase). In the second phase, the opponent exerts his optimism (verify phase). In the select phase, the player tries to raise the value of moves that are not in the principal variation (PV). In the verify phase, the opponent tries to pull the value of the principal variation down, below that of the second-best move.
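One way to picture the control flow between the two phases is the sketch below. It is a simplification under several assumptions: each root move carries a `(pessimistic, real, optimistic)` triple from the root player's point of view, probabilities follow a uniform model between the real value and the relevant bound, and `min_prob` is a hypothetical cut-off below which a phase is considered hopeless.

```python
def prob_reach(target, real, bound):
    """Chance of moving from `real` up to `target`, uniform up to `bound`."""
    if target <= real:
        return 1.0
    if target >= bound:
        return 0.0
    return (bound - target) / (bound - real)


def next_phase(moves, min_prob=0.05):
    """moves maps a move name to (pessimistic, real, optimistic)."""
    pv = max(moves, key=lambda m: moves[m][1])
    pv_pess, pv_real, _ = moves[pv]
    others = [v for m, v in moves.items() if m != pv]
    if not others:
        return pv, "done"
    # Select: can the player still raise another move above the PV?
    if any(prob_reach(pv_real, real, opt) > min_prob for _, real, opt in others):
        return pv, "select"
    # Verify: can the opponent still pull the PV below the second best?
    # Negating the values turns "pull down" into "raise" for the opponent.
    second = max(real for _, real, _ in others)
    if prob_reach(-second, -pv_real, -pv_pess) > min_prob:
        return pv, "verify"
    return pv, "done"


print(next_phase({"a": (20, 30, 40), "b": (-60, -50, 250)}))
print(next_phase({"a": (0, 30, 40), "b": (5, 10, 25)}))
print(next_phase({"a": (20, 30, 40), "b": (5, 10, 25)}))
```

The three calls show a position that still needs the select phase, one where only the verify phase remains, and one where the search can stop.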

Comparison: B* versus PVS

Principal variation search (PVS) can also be seen as having two steps. In the first, you try to establish the exact value of your principal variation by searching with an open window. In the second, you just try to demonstrate that nothing is better by searching with a closed window. In B* these steps are in a sense inverted: the first phase (select) tries to demonstrate that nothing is better than the PV, and the second phase (verify) tries to establish the value of the principal variation.

Rethinking PB*

At a high level of abstraction, B* can be seen as a search algorithm in which one part of the search tree is kept in memory while another part is volatile and searched on the fly. Tree values are propagated the “normal” way, that is, following minimax. The selectivity is applied to the tree in memory.

PB* is a minimax on top of standard alpha-beta or PVS. The rest of the algorithm is just the selectivity added to minimax to choose the nodes that deserve a deeper search. If we disregard the selectivity and explore the tree systematically, we get a brute-force algorithm.

Quality of Evaluation

As PB* is an algorithm driven by values, the question arises of how good PB*'s evaluation needs to be. The two values used in PB* serve different purposes. The real value determines which move is considered better.

The optimistic value defines the probability that a move is better, and drives the search. The better the optimistic value, the more selective the search. PB* relies largely on the fact that we can assess, with some reliability, the maximum and minimum value that a position can reach. The criterion for the quality of the optimistic value is the percentage of evaluations (real values) that fall inside the range between the pessimistic and the optimistic value. As the evaluation is done by a PVS search, the deeper the PVS search, the more readily the criterion is met with tighter bounds, and the better the selectivity. A balance must be struck between the cost of each expansion (depth of the PVS evaluation) and the number of expansions needed (selectivity). If we add a bonus in proportion to the risk taken by each side (just to meet the criterion), results on tactical test sets improve. But this reduces the selectivity of the search, as it encourages the search of sacs.

In a sense, we separate the dynamic factors of the position from the more long-term (trustworthy) part of the evaluation. The dynamic factors can be explored with a deeper PVS search, or with a bonus/penalty on the optimistic evaluation for certain tactical weaknesses (Berliner). The bonus approach performs well on tactical test sets, but the deeper search gives better results overall. Endgame positions require a more in-depth search. A minimal number of expansions is also required, which is why a fixed-node budget for each PVS probe search gives better overall performance.

Performance Considerations

Suppose there is a PB* program with performance similar to that of a PVS program. For PB*, the total cost of the PVS probe searches is:

```
R * Moves * Expands * PVS(ProbeDepth)
```

with

```
Moves   = average number of moves in a position
R       = ratio (average probes per move, between 1 and 2), ~1.5
Expands = number of expansions
```

For a standard PVS:

```
(BranchFactor ^ PlyEquiv) * PVS(ProbeDepth)
```

The comparison becomes:

```
R * Moves * Expands <=> BranchFactor ^ PlyEquiv
```

A large branching factor favours PB*; conversely, PB* is weaker in endgame positions (low branching factor). PVS improves by reducing the branching factor, PB* by reducing the number of expansions needed (selectivity). Berliner reported that PB*, with 225 node expansions on average, achieves performance similar to alpha-beta with a branching factor of 4.5.
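The comparison can be turned into a quick calculation by solving `R * Moves * Expands = BranchFactor ^ PlyEquiv` for `PlyEquiv`. The sketch below plugs in the figures quoted above (225 expansions, branching factor 4.5, R of about 1.5); the value of 35 moves per position is an assumption made for the example, and the function names are hypothetical.

```python
import math


def pbstar_probes(moves, expands, ratio=1.5):
    """Number of PVS probe searches done by PB*: R * Moves * Expands."""
    return ratio * moves * expands


def equivalent_plies(probes, branch_factor):
    """Extra PVS depth that costs the same number of probe-sized searches."""
    return math.log(probes) / math.log(branch_factor)


probes = pbstar_probes(moves=35, expands=225)
plies = equivalent_plies(probes, branch_factor=4.5)
print(f"{probes:.1f} probes, about {plies:.2f} plies of PVS")
```

Under these assumptions PB* spends roughly as much as a PVS that searches a little over six plies beyond the probe depth. With a higher branching factor the same budget buys PVS fewer plies, which is why a large branching factor favours PB*.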

Implementations

B* and later PB* were implemented in HiTech. B*HiTech played at ACM 1993. C source code from his first experiments with B* Probability Search was posted by Antonio Torrecillas in CCC [4], and the algorithm was further implemented in his experimental open source engine Rocinante, written in C++ [5].