Optimal, "no regret," online learning algorithm solve the seemingly impossible "experts problem," with non-asymptotic worst-case guarantees. One textbook trick in theoretical computer science casts the linear programming approximate feasibility problem as an instance of the experts problem. Theoretician are then satisfied by the fact that optimisation reduces to feasibility \cite{kun}.
Recognising that the conversion of linear programming to the experts problem is a surrogate relaxation (rather than a Lagrangian relaxation) lets us directly reduce approximate linear optimisation to online learning algorithms. We can then exploit the structure of the surrogate relaxation for a decomposition reformulation (with cloned variables) to derive theoretically stronger bounds than Lagrangian decomposition, without destroying the subproblems' structure, and while preserving the non-asymptotic convergence bounds of no-regret learning.
The experts problem
We are interested in the adversarial (worst-case) variant of the experts problem. The problem is to come up with strategies for a two-player game against an omniscient adversary. Each game has a fixed set of "experts," or moves. At each turn, we must commit to a mixed strategy, i.e., assign a probability to playing each expert or move. The omniscient adversary then reveals an arbitrary payoff function, and our actual payoff for this turn is the expected payoff of the mixed strategy we picked. We continue this game for a number of turns, potentially forever.
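To make the protocol concrete, here is a minimal sketch of the game loop in Python; the interface and names (play_game, strategy, adversary) are ours, purely for illustration. The point is the information flow: the strategy only sees payoffs from past turns, while the adversary may choose this turn's payoffs after seeing our mixed strategy.

```python
import numpy as np

def play_game(strategy, adversary, k, T):
    """Run T turns of the experts game (illustrative interface, names ours).

    strategy(cumulative_payoffs) -> probability vector over the k experts
    adversary(mixed)             -> this turn's payoff vector, revealed only
                                    after we commit to the mixed strategy
    """
    cumulative = np.zeros(k)                    # per-expert payoff accumulated so far
    total_expected_payoff = 0.0
    for _ in range(T):
        mixed = strategy(cumulative)            # our mixed strategy for this turn
        payoffs = np.asarray(adversary(mixed))  # adversary then reveals the payoffs
        total_expected_payoff += float(mixed @ payoffs)
        cumulative += payoffs
    return cumulative, total_expected_payoff
```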
Our "regret," for a given game, is the difference between our actual (expected) payoff and the best payoff achieved by any single expert (pure strategy), a posteriori. Optimal, "no regret" online learning algorithms guarantee that this regret is in \(\mathcal{O}\left(\sqrt{T\ \ln k}\right)\), for \(T\) turns and \(k\) experts (once \(T\) is large enough); corresponding lower bounds show that the fully adversarial regret must be in \(\Omega\left(\sqrt{T}\right)\).
This sounds like an almost impossibly powerful guarantee, at first. The key is that the regret is compared to a static pure strategy. In an iterated rock-paper-scissors game, that's equivalent to coming up with a strategy that's never much worse than always playing rock, or always playing paper, or always playing scissors… not a hard feat. An adversarial no-regret strategy can't assume anything about the future; it must look at the cumulative payoff for turns we have already played, and simply give more weight to moves or experts that are doing well so far. If the resulting mixed strategy is heavily penalised, the current best experts were also penalised, so the incremental regret shouldn't be too bad.
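One standard way to instantiate this "give more weight to the winners" idea is to weight each expert proportionally to the exponential of its cumulative payoff. The sketch below (names ours, assuming the play_game interface above) shows that rule; the step size \(\eta\) is a free parameter whose choice is discussed in the next section.

```python
import numpy as np

def exponential_weights_strategy(eta):
    """Weight each expert proportionally to exp(eta * cumulative payoff):
    experts that have done well so far get more probability mass.
    eta > 0 is a step size; how to pick it is discussed below."""
    def strategy(cumulative_payoffs):
        # Shift by the max before exponentiating for numerical stability;
        # the shift cancels out after normalisation.
        logits = eta * (cumulative_payoffs - cumulative_payoffs.max())
        weights = np.exp(logits)
        return weights / weights.sum()
    return strategy
```

Plugged into the play_game sketch above as strategy = exponential_weights_strategy(eta), this is essentially the (exponential) multiplicative weight update scheme of the next section.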
Algorithms and bounds for the experts problem
Classic algorithms for the experts problem include the multiplicative weight update (MWU) algorithm \cite{kale} and mirror descent \cite{orecchia2016}. They must know the number of iterations ahead of time, as well as an upper bound on a measure of "hardness" for the payoffs defined by the adversary, in order to determine a fixed step size. For the MWU algorithm, hardness is the range of the payoffs. If we assume the payoffs are in \([-1, 1]\), setting the step size \(\eta = \sqrt{\frac{\ln k}{T}}\) guarantees regret at most