Dynamic resource allocation as an estimation problem

An agent sequentially chooses actions from a set of possible options. Each action yields a stochastic reward whose distribution is unknown. What dynamic allocation rule should he follow so as to maximize his cumulative reward?

The study of "bandit problems" (the name refers to the paradigmatic situation of a gambler facing a row of slot machines and deciding which one to play in order to maximize his winnings), dating back to the 1930s, was originally motivated by medical trials. It has recently attracted renewed interest because of computer-driven applications, from computer experiments to recommender systems and Big Data.

One possible solution, called UCB, relies on upper confidence bounds on the expected rewards associated with the actions: at each step, the agent plays the action whose bound is currently highest. In this presentation, I shall explain why sharp confidence bounds are required in order to obtain efficient allocation rules, and I shall present the statistical challenges involved in their construction.
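For concreteness, here is a minimal sketch of such an index rule, using the classical UCB1 bonus (empirical mean plus an exploration term of order sqrt(2 log t / n)); the Bernoulli reward setting and all names are illustrative assumptions, not part of the abstract.

```python
import math
import random

def ucb1(pull, n_actions, horizon):
    """UCB-style allocation: try every action once, then always play the
    action maximizing (empirical mean + exploration bonus)."""
    counts = [0] * n_actions          # times each action has been chosen
    sums = [0.0] * n_actions          # cumulative reward of each action
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= n_actions:            # initialization: play each action once
            a = t - 1
        else:                         # upper confidence bound index
            a = max(range(n_actions),
                    key=lambda i: sums[i] / counts[i]
                                  + math.sqrt(2 * math.log(t) / counts[i]))
        r = pull(a)                   # stochastic reward, distribution unknown
        counts[a] += 1
        sums[a] += r
        total += r
    return total

# Example: three Bernoulli "slot machines" with unknown means
means = [0.3, 0.5, 0.7]
print(ucb1(lambda a: float(random.random() < means[a]), 3, 10000))
```

The exploration bonus shrinks as an action is played more often, so the rule automatically balances exploiting the empirically best action against gathering information about the others; sharper confidence bounds translate directly into smaller bonuses and less wasted exploration.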