The focus is on two extreme cases in which the analysis of regret is particularly simple and elegant: independent and identically distributed payoffs and adversarial payoffs.Expand

This monograph presents the main complexity theorems in convex optimization and their corresponding algorithms and provides a gentle introduction to structural optimization with FISTA, saddle-point mirror prox, Nemirovski's alternative to Nesterov's smoothing, and a concise description of interior point methods.Expand

Model-free reinforcement learning (RL) algorithms, such as Q-learning, directly parameterize and update value functions or policies without explicitly modeling the environment. They are typically… Expand

The efficiency of MSDA against state-of-the-art methods for two problems: least-squares regression and classification by logistic regression is verified.Expand

It is demonstrated through extensive experimentation that this method consistently outperforms all existing provably $\ell-2$-robust classifiers by a significant margin on ImageNet and CIFAR-10, establishing the state-of-the-art for provable $\ell_ 2$-defenses.Expand

It is proved that the UCB procedure for identifying the arm with the largest mean in a multi-armed bandit game in the fixed confidence setting using a small number of total samples is optimal up to constants and also shows through simulations that it provides superior performance with respect to the state-of-the-art.Expand

We consider a generalization of stochastic bandits where the set of arms, X, is allowed to be a generic measurable space and the mean-payoff function is "locally Lipschitz" with respect to a… Expand

It is able to prove that the separable metric spaces are exactly the metric spaces on which these regrets can be minimized with respect to the family of all probability distributions with continuous mean-payoff functions.Expand

This paper examines the bandit problem under the weaker assumption that the distributions have moments of order 1 + ε, and derives matching lower bounds that show that the best achievable regret deteriorates when ε <; 1.Expand

This work addresses online linear optimization problems when the possible actions of the decision maker are represented by binary vectors and shows that the standard exponentially weighted average forecaster is a provably suboptimal strategy.Expand