Nicolò Cesa-Bianchi


Nicolò Cesa-Bianchi [1]

Nicolò Cesa-Bianchi,
an Italian computer scientist and professor at the Department of Computer Science of the University of Milan. His research spans machine learning and computational learning theory, including reinforcement learning, game-theoretic learning, statistical learning theory, prediction with expert advice, and bandit problems. Along with Gábor Lugosi, he authored Prediction, Learning, and Games in 2006 [2].

Bandit Problems

In probability theory, the multi-armed bandit problem models the tradeoff between exploitation of the slot machine with the highest expected payoff and exploration to gain more information about the expected payoffs of the other machines. This tradeoff between exploration and exploitation is also a central topic in reinforcement learning [3]. The gambler decides at time steps t = 1, 2, ... which of the finitely many available arms to pull; each arm produces a reward in a stochastic manner, and the goal is to maximize the reward accumulated over time. In 2002, along with Peter Auer and Paul Fischer, Nicolò Cesa-Bianchi introduced the UCB1 (Upper Confidence Bound) bandit algorithm [4], which Levente Kocsis and Csaba Szepesvári applied to Monte-Carlo Tree Search in 2006 as the UCT selection algorithm [5].
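
The UCB1 index adds an exploration bonus to each arm's empirical mean: after every arm has been tried once, the arm maximizing mean_j + sqrt(2 ln n / n_j) is pulled, where n is the total number of pulls and n_j the number of pulls of arm j. The following is a minimal sketch (not the authors' reference code), assuming rewards in [0, 1]; the names Ucb1, select and update are chosen here purely for illustration:

#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Minimal UCB1 sketch: keep per-arm reward sums and pull counts,
// and select the arm with the largest upper confidence index.
struct Ucb1 {
    std::vector<double> sum;    // cumulative reward per arm
    std::vector<int>    count;  // pulls per arm
    int total = 0;              // total pulls so far

    explicit Ucb1(std::size_t arms) : sum(arms, 0.0), count(arms, 0) {}

    // Select the arm to pull next according to the UCB1 index.
    std::size_t select() const {
        std::size_t best = 0;
        double bestValue = -std::numeric_limits<double>::infinity();
        for (std::size_t j = 0; j < count.size(); ++j) {
            if (count[j] == 0) return j;  // play each arm once first
            double mean  = sum[j] / count[j];
            double bonus = std::sqrt(2.0 * std::log((double)total) / count[j]);
            if (mean + bonus > bestValue) { bestValue = mean + bonus; best = j; }
        }
        return best;
    }

    // Feed back the observed reward (assumed to lie in [0, 1]).
    void update(std::size_t arm, double reward) {
        sum[arm] += reward;
        ++count[arm];
        ++total;
    }
};

In UCT the same index is evaluated at every tree node during selection, with a child's visit count playing the role of n_j and the parent's visit count the role of n.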

Selected Publications

[6] [7]

1990 ...

2000 ...

2010 ...

External Links

References
