a straightforward knapsack problem. We may solve this subproblem however we want: with an exact method, a continuous relaxation (e.g., the linear multiple choice knapsack problem, if \(X\) includes GUB constraints), or any other hybrid that guarantees an optimal or super-optimal solution. Again, we can also stop filling the knapsack when we reach a super-optimal objective value. If the knapsack problem is infeasible, the original problem is also infeasible; if all the clone variables agree with the knapsack subproblem, we have a feasible optimal solution to the original problem.
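As a minimal sketch of that knapsack step (the function and its interface are my own illustration, assuming a single-capacity 0-1 knapsack), the continuous relaxation can be filled greedily by value density, stopping early once a known super-optimal target is reached:

```python
def greedy_fractional_knapsack(values, weights, capacity, target=None):
    """Solve the continuous 0-1 knapsack greedily by value density.

    Returns (objective, solution); at most one entry of `solution`
    is fractional at optimum.  If `target` is given, filling stops as
    soon as the accumulated value reaches it: a super-optimal early
    exit, as described in the text.
    """
    order = sorted(range(len(values)),
                   key=lambda i: values[i] / weights[i],
                   reverse=True)
    x = [0.0] * len(values)
    total, remaining = 0.0, capacity
    for i in order:
        if target is not None and total >= target:
            break  # already super-optimal: no need to keep filling
        take = min(1.0, remaining / weights[i])
        x[i] = take
        total += take * values[i]
        remaining -= take * weights[i]
        if remaining <= 0:
            break
    return total, x
```

The greedy order by value density is exactly why the continuous relaxation leaves at most one variable fractional, a fact used again below.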
This surrogate decomposition method dominates Lagrangian decomposition in theory, without requiring anything new from the subproblem: just like Lagrangian decomposition, each subproblem need only be equipped with an optimisation oracle. We also need a knapsack solver, but that's a well-understood problem. Crucially, the surrogate decomposition's weight-setting (pricing) problem can be solved with AdaHedge, or any other no-regret online learning algorithm, with an iteration count that scales with \(\frac{\ln 2k}{\varepsilon^2}\), where \(k\) is the number of cloned variables. The number of clones is in \(\mathcal{O}(m n)\), so this compares favourably with subgradient methods' \(\frac{k}{\varepsilon^2}\) (or \(\frac{k}{\varepsilon}\) if subproblems can be smoothed).
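To make the weight-setting step concrete, here is a hypothetical sketch of the exponential-weights update, using plain Hedge with a fixed learning rate \(\eta\) rather than AdaHedge's adaptive one (the function name and interface are my assumptions):

```python
import math

def hedge_weights(violation_history, eta=0.5):
    """Plain Hedge (exponential weights) over k linking constraints.

    Each round's violation vector has entries in [-1, 1]; the learner
    wants to concentrate weight on constraints that keep being
    violated, so each violation enters as a negated loss.  AdaHedge
    would tune `eta` adaptively instead of fixing it.
    """
    k = len(violation_history[0])
    cumulative = [0.0] * k
    for violations in violation_history:
        for i, v in enumerate(violations):
            cumulative[i] += -v  # loss = negative violation
    # weights proportional to exp(-eta * cumulative loss)
    scores = [math.exp(-eta * c) for c in cumulative]
    z = sum(scores)
    return [s / z for s in scores]
```

A constraint that is violated at every iteration accumulates weight exponentially, which is what drives the \(\frac{\ln 2k}{\varepsilon^2}\) convergence rate.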
The \(\varepsilon\) for no-regret surrogate decomposition measures the worst-case infeasibility of all linking constraints, while the \(\varepsilon\) for subgradient methods measures suboptimality in the bound value (with weak guarantees on the feasibility of the associated fractional primal solution \cite{frangioni2005} \cite{anbil2000}). In practice, I think the no-regret decomposition's measure is more useful: we have a solution that's feasible for the convexified subproblem, within some measurement error for each block of constraints. If the result is wildly super-optimal, maybe the instance is badly conditioned and should be formulated differently.
In the common case where \(X \subseteq [0,1]^n\) (and similarly for the clones \(Y, Z, \ldots\)), the range of losses is \([-1, 1]\), which is pretty much ideal for online learning. When relaxing discrete optimisation problems, the subproblems will usually be solved to integrality, and, even if it's impractical to solve the knapsack problem to integrality, its continuous relaxation will have at most one fractional value at optimum. The end result is that we actually expect the losses to cover \([-1, 1]\) at each iteration. This means it probably doesn't make sense to look into complex online learning algorithms that can adapt to the range of losses; adaptiveness to path length (variation in each expert's loss, i.e., constraint violation, across consecutive iterations) might be worth exploring.
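As a sketch of where those \([-1, 1]\) losses come from (my own illustration, assuming 0-1 clones linked by equalities), each equality \(x_i = y_i\) splits into two experts, one per inequality direction, which is also where the \(2k\) in the regret bound comes from:

```python
def clone_losses(x, y):
    """Losses for the online learner from clone disagreements.

    Each linking equality x_i = y_i yields two experts, one per
    inequality direction.  With x, y in [0, 1]^k the signed
    disagreements land in [-1, 1] by construction.
    """
    losses = []
    for xi, yi in zip(x, y):
        d = yi - xi          # signed disagreement in [-1, 1]
        losses.append(d)     # expert for the constraint x_i >= y_i
        losses.append(-d)    # expert for the constraint x_i <= y_i
    return losses
```

When both subproblems return integral solutions, each disagreement is exactly \(-1\), \(0\), or \(1\), so the losses really do span the whole range at each iteration.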
Another annoying thing in the practice of Lagrangian decomposition is that we must always solve the subproblems to near-optimality in order to obtain valid subgradients. However, the pricing problem will often yield sequences of similar Lagrange multipliers, so, when solving subproblems as MIPs, we end up exploring very similar branch-and-bound trees. That's what led to the reformulation in my dissertation. With surrogate decomposition, we only have to generate a solution that's known to be optimal or super-optimal, and feasible for the relaxed problem. We can heuristically assemble solutions by looking back at the history of each component, solve the resulting knapsack subproblem, and see if the result is super-optimal; if so, we've generated a valid super-optimal solution without solving each component to optimality!
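That solution-recycling idea can be sketched as follows (the helper names `history`, `evaluate`, and `target` are hypothetical, assuming a maximisation problem where super-optimal means scoring at or above the best known bound):

```python
from itertools import product

def recycle_solutions(history, evaluate, target):
    """Heuristically look for a usable solution among past iterates.

    `history` holds, per component, the solutions seen so far;
    `evaluate` maps an assembled combination to (objective, feasible);
    `target` is the best known bound.  Any assembled solution that is
    feasible for the relaxation and scores at or above `target` is a
    valid super-optimal point, obtained without a fresh MIP solve.
    """
    for combo in product(*history):      # all ways to mix past pieces
        objective, feasible = evaluate(combo)
        if feasible and objective >= target:
            return combo                  # valid super-optimal solution
    return None                           # fall back to exact solves
```

In practice one would only scan a few recent iterates per component rather than the full product, but the acceptance test is the point: super-optimality plus relaxed feasibility is all the certificate we need.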
For large enough instances, surrogate relaxation with no-regret master problems should eventually outperform Lagrangian decomposition, not only in terms of theoretical bound value, but also with respect to the number of iterations. Moreover, the surrogate relaxation will either detect infeasibility, or yield a super-optimal and approximately feasible solution in the intersection of the convex hulls of all the subproblems' feasible spaces. Infeasibility detection and fractional solutions are both useful when the relaxation is embedded in a larger search algorithm, or guides primal heuristics. Finally, the subproblems are trivially solvable in parallel (and even the knapsack can be parallelised), so we should be able to exploit GPUs and distributed computing to speed up the surrogate relaxation subproblems.

What's different this time?

Lagrangian decomposition has been around since the 70s, and surrogate relaxation even earlier than that. Yet surrogate relaxation, let alone the decomposition approach in particular, hasn't made it past a niche role, e.g., in multidimensional knapsack solvers. What makes the approach described here better?
The main difficulty in applying surrogate relaxation is the search for dual multipliers. The surrogate dual function isn't convex, only quasi-convex. While its level sets do form convex sets, the dual function has plateaux, and not just around the optimum: it's not hard to see how a marginal change in the importance of a few constraints might fail to change the optimal solution. In his doctoral dissertation \cite{karwan1976surrogate}, Karwan constructs an example where the subgradient is not an ascent direction, even though an ascent direction does exist.
In reaction to this failure of the subgradient, subgradient-like methods have to look for strictly infeasible solutions, and slowly dial down the degree of infeasibility. Not only do these simple algorithms inherit the unpredictability of subgradient methods, but they now have another parameter that must be tuned in an outer loop.
Alternatively, it is possible to adapt the cutting plane method \cite{Kelley_Jr__1960} to this setting. Rather than finding the set of multipliers that result in the maximin objective value for all relaxed solutions we have so far, we search for the set of multipliers that result in the maximin violation for all surrogate relaxed solutions. Again, we have the same instability issues as the usual method, which can be solved with various trust region tricks. However, the cutting plane approach is still slow (requires solving an LP, if not a QP, at each iteration), and needs to use more and more memory in order to guarantee convergence.
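The maximin-violation master problem mentioned above can be written as a small LP. Here is a hypothetical sketch with SciPy (the input `V`, one stored violation vector per row, and the simplex constraint on the multipliers are my assumptions about the setup):

```python
import numpy as np
from scipy.optimize import linprog

def maximin_violation_multipliers(V):
    """Cutting-plane master: pick simplex multipliers w maximising the
    minimum violation w . v over the stored violation vectors v.

    Decision variables are [w_1, ..., w_k, t]; we maximise t subject
    to t <= w . v_s for every stored solution s, w >= 0, sum(w) = 1.
    """
    V = np.asarray(V, dtype=float)    # one row per stored solution
    n, k = V.shape
    c = np.zeros(k + 1)
    c[-1] = -1.0                      # linprog minimises, so minimise -t
    # t - w . v_s <= 0 for each stored solution s
    A_ub = np.hstack([-V, np.ones((n, 1))])
    b_ub = np.zeros(n)
    A_eq = np.hstack([np.ones((1, k)), np.zeros((1, 1))])
    b_eq = np.array([1.0])
    bounds = [(0, 1)] * k + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds)
    return res.x[:k], res.x[-1]       # multipliers, maximin violation
```

Each iteration appends a row to `V`, which is exactly the growing-memory issue the text complains about, and the LP itself is the per-iteration cost.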
It sounds like, except in the 2-dimensional case, where specialised dual ascent methods do well, neither subgradient-style nor cutting-plane methods scale well. That's not surprising: Lagrangian relaxation methods have the same issue with their pricing problems.
The no-regret approach is different. It directly handles the quasi-convex dual, does not need to store more than \(\mathcal{O}(1)\) solution and violation vectors, and offers strict convergence guarantees. We might have a practical approach here.
As for the decomposition approach… decomposition introduces a lot more dual multipliers than relaxation. The pricing problem is already hard enough for surrogate relaxation; there's no reason to think it'll do better on decomposition. With no-regret pricing algorithms, we can scale to a large number of dual multipliers, and it makes sense to introduce a ton of multipliers to both improve the relaxation's strength and simplify the (separable!) subproblem.