SGD with Variance Reduction beyond Empirical Risk Minimization

Published in the International Conference on Monte Carlo Methods and Applications, 2017

M. Achab, A. Guilloux, S. Gaïffas and E. Bacry

We introduce a doubly stochastic proximal gradient algorithm for optimizing a finite average of smooth convex functions whose gradients depend on numerically expensive expectations. The effectiveness of SGD-like algorithms relies on the assumption that computing the gradient of a single subfunction is cheap compared to computing the gradient of the full objective. This holds in the Empirical Risk Minimization (ERM) setting, but can fail when each subfunction depends on a whole sequence of examples. Our main motivation is accelerating the optimization of the regularized Cox partial likelihood (the core model in survival analysis), but other settings can be considered as well.
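To make this concrete, here is one way to write the Cox instance (the notation below is introduced here for illustration and is not fixed by the abstract). The objective is a penalized finite average,

$$
f(\theta) \;=\; \frac{1}{n}\sum_{i=1}^{n} f_i(\theta) + g(\theta),
\qquad
f_i(\theta) \;=\; \log\!\Big(\sum_{j \in R_i} e^{\theta^\top x_j}\Big) - \theta^\top x_i,
$$

where $R_i$ is the risk set of example $i$, so that each subgradient is itself an expectation over $R_i$:

$$
\nabla f_i(\theta) \;=\; \mathbb{E}_{J \sim \pi_i(\theta)}\big[x_J\big] - x_i,
\qquad
\pi_i(\theta)(j) \;=\; \frac{e^{\theta^\top x_j}}{\sum_{k \in R_i} e^{\theta^\top x_k}}.
$$

Since $R_i$ may contain up to $n$ examples, a single stochastic gradient can be as expensive as a full gradient in ERM, which is exactly what breaks the usual SGD cost argument.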

The proposed algorithm is doubly stochastic in the sense that gradient steps are done using stochastic gradient descent (SGD) with variance reduction, while the inner expectations are approximated by a Markov chain Monte Carlo (MCMC) algorithm. We derive conditions on the number of MCMC iterations guaranteeing convergence, and obtain a linear rate of convergence under strong convexity and a sublinear rate without this assumption. We show on several survival analysis datasets that our algorithm improves on the state-of-the-art solver for the regularized Cox partial likelihood.
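The sketch below illustrates this kind of doubly stochastic scheme on a toy Cox-like objective in Python. It is a minimal illustration under our own assumptions, not the paper's actual algorithm or tuning: all names (`exact_grad_i`, `mcmc_grad_i`, `prox_ridge`), the step size, and the MCMC schedule are illustrative choices; a Metropolis-Hastings sampler on the risk set stands in for the MCMC approximation of the inner expectation, and the growing per-epoch MCMC budget is one plausible instantiation of the kind of accuracy condition the abstract mentions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy Cox-like data: subfunction i has gradient
#   grad_i(theta) = E_{J ~ pi_i(theta)}[x_J] - x_i,
# where pi_i(theta)(j) is proportional to exp(theta @ x[j]) over the
# "risk set" R_i = {i, ..., n-1}.  (All names here are illustrative.)
n, d = 200, 5
X = rng.normal(size=(n, d))
lam = 0.1    # ridge penalty strength
step = 0.05  # fixed step size

def exact_grad_i(theta, i):
    """Exact gradient of subfunction i: cost O(n d), the bottleneck."""
    risk = X[i:]
    w = np.exp(risk @ theta)
    w /= w.sum()
    return w @ risk - X[i]

def mcmc_grad_i(theta, i, n_iter):
    """Metropolis-Hastings estimate of E_{J ~ pi_i(theta)}[x_J] - x_i."""
    j = rng.integers(i, n)           # start uniformly in the risk set
    total = np.zeros(d)
    for _ in range(n_iter):
        j_prop = rng.integers(i, n)  # symmetric uniform proposal on R_i
        if np.log(rng.uniform()) < (X[j_prop] - X[j]) @ theta:
            j = j_prop               # accept with prob min(1, ratio)
        total += X[j]
    return total / n_iter - X[i]

def prox_ridge(theta):
    """Proximal operator of g(theta) = (lam/2)||theta||^2 with step `step`."""
    return theta / (1.0 + step * lam)

def full_grad(theta):
    return np.mean([exact_grad_i(theta, i) for i in range(n)], axis=0)

# SVRG-style loop: outer snapshots plus variance-reduced inner steps,
# each subgradient replaced by an MCMC estimate.  For simplicity the
# snapshot gradient is computed exactly here; in the paper's setting
# it would itself rely on approximated expectations.
theta = np.zeros(d)
for epoch in range(10):
    snapshot = theta.copy()
    mu = full_grad(snapshot)
    n_mcmc = 50 * (epoch + 1)  # grow the MCMC accuracy across epochs
    for _ in range(n):
        i = rng.integers(n)
        g = (mcmc_grad_i(theta, i, n_mcmc)
             - mcmc_grad_i(snapshot, i, n_mcmc) + mu)
        theta = prox_ridge(theta - step * g)
    print(epoch, np.linalg.norm(full_grad(theta) + lam * theta))
```

The printed quantity is the norm of the full penalized gradient, which should shrink across epochs; increasing the MCMC budget over time keeps the bias of the inner estimates from dominating the variance-reduced gradient steps.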
