1. What is the main limitation of the mirror prox algorithm?
The main limitation of the mirror prox algorithm is that it requires a full spectral decomposition of a symmetric matrix at every iteration. This is computationally expensive and can make the method impractical for large-scale problems. Alternative approaches, such as stochastic smoothing, have been developed to address this limitation and to minimize the maximum eigenvalue of a symmetric matrix over a convex set more efficiently.
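The per-iteration cost comes from steps like the one below. This is a minimal sketch under an assumed setup that the text does not spell out. It writes $\lambda_{\max}(A) = \max\{\langle Y, A\rangle : Y \succeq 0,\ \operatorname{tr} Y = 1\}$ and lets the maximizing player use the von Neumann entropy as its mirror map. Each mirror step is then a normalized matrix exponential, computed here with a full eigendecomposition (`numpy.linalg.eigh`, $O(n^3)$). Mirror prox takes two such steps per iteration (an extrapolation step and an update step), so it pays this cost twice. The function names are hypothetical.

```python
import numpy as np

def matrix_softmax(S):
    """Return exp(S) / tr(exp(S)) for a symmetric matrix S."""
    w, V = np.linalg.eigh(S)           # full spectral decomposition: O(n^3) per call
    p = np.exp(w - w.max())            # shift for stability; cancels after normalization
    p /= p.sum()
    return (V * p) @ V.T               # V diag(p) V^T

def entropic_mirror_step(S, G, eta):
    """Entropic mirror (ascent) step on the spectraplex {Y >= 0, tr Y = 1}.

    S stores log Y up to a multiple of the identity, so the step from Y is
    Y_new = exp(log Y + eta * G) / tr(...), i.e. matrix_softmax(S + eta * G).
    """
    S_new = S + eta * G
    return S_new, matrix_softmax(S_new)

# Usage: for a fixed matrix A, the gradient of <Y, A> with respect to Y is A itself
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 1.0]])        # eigenvalues 3, 1, 1
S = np.zeros_like(A)                   # corresponds to Y_0 = I / 3
for _ in range(200):
    S, Y = entropic_mirror_step(S, A, eta=0.5)
print(np.trace(Y @ A))                 # approaches lambda_max(A) = 3
```

Storing the log-domain matrix `S` avoids taking a matrix logarithm of `Y`. Every call still performs one full eigendecomposition, which is the cost that smoothing-based alternatives try to avoid.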
2. How does regularization help in optimization problems?
When little is known about the regularity parameters of a problem, a regularization term is added to the objective, and techniques from complementary composite stochastic optimization are used to handle it. An oblivious step-size scheme lets the algorithm converge without prior knowledge of the Lipschitz or smoothness constants. Taking $\mu$ small makes the regularized problem arbitrarily close to the original one: the gap between the minimum value of the composite problem and the minimum value of $F$ is smaller than $D_0^2/T$. The approach extends to relative-scale precision targets, which yields significant computational savings.
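The closeness claim can be checked numerically on a toy problem. This sketch assumes that the regularized objective has the form $\Psi(x) = F(x) + \mu\|x - x_0\|^2$ and sets $\mu = 1/T$. The text gives neither the exact regularizer nor the rule for $\mu$, so both are assumptions. The toy objective $F(x) = (x-3)^2$ and the helper names are hypothetical. The script compares the minimum-value gap $\Psi^\star - F^\star$ with $\mu D_0^2$, which equals $D_0^2/T$ under this choice of $\mu$.

```python
def ternary_min(f, lo, hi, iters=200):
    """Minimize a convex function of one variable on [lo, hi] by ternary search."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2   # minimizer lies in [lo, m2]
        else:
            lo = m1   # minimizer lies in [m1, hi]
    x = (lo + hi) / 2
    return x, f(x)

x0, x_star = 0.0, 3.0
D0 = abs(x0 - x_star)            # distance from the starting point to the minimizer of F
F_star = 0.0                     # minimum value of the toy objective

def F(x):
    return (x - x_star) ** 2     # toy objective (hypothetical)

def psi(x, mu):
    return F(x) + mu * (x - x0) ** 2   # assumed form of the regularized objective

results = []
for T in (10, 100, 1000):
    mu = 1.0 / T                 # regularization weight shrinking with the horizon
    _, psi_star = ternary_min(lambda x: psi(x, mu), -10.0, 10.0)
    results.append((T, psi_star - F_star, mu * D0 ** 2))
    print(f"T={T:5d}  gap={psi_star - F_star:.6f}  bound mu*D0^2={mu * D0 ** 2:.6f}")
```

For this quadratic the gap has the closed form $9\mu/(1+\mu)$, which stays below $\mu D_0^2 = 9\mu$ for every $T$.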
3. What are the key differences between non-smooth and smooth cases in optimization?
In the non-smooth case, the stochastic oracle must have an $M$-bounded second moment, i.e., $\mathbb{E}_\xi\|G(X,\xi)\|^2 \le M^2$, and optimal step-size tuning requires prior knowledge of $D_0$ and $M$. In the smooth case, the objective $F$ must be $L$-smooth and the stochastic noise must have finite variance $\sigma^2$; under these assumptions, the accelerated framework of [14] achieves optimal convergence. The relative-scale setting additionally imposes a quadratic lower bound on $F$ and a relative constraint on the stochastic oracle. Adaptive methods such as D-Adaptation and Adagrad handle the unknown parameters in the non-smooth and smooth cases, respectively. The two cases achieve different convergence rates. Over- or under-estimating the parameters can make non-adaptive algorithms slow and inefficient, which is why adaptive methods matter here.
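The tuning issue in the non-smooth case can be shown on a toy problem. This sketch compares three step sizes for the stochastic subgradient method. The first is the classical choice $\eta = D_0/(M\sqrt{T})$, which needs both $D_0$ and $M$. The second is the same rule with $M$ overestimated 100-fold. The third is AdaGrad-norm, $\eta_t = D_0/\sqrt{\sum_{s\le t}\|g_s\|^2}$, which replaces $M$ with observed gradient norms. AdaGrad-norm is used here only as a simple adaptive stand-in. It is not D-Adaptation, which also estimates $D_0$, and not the scheme from the source. The problem $F(x) = |x-3|$ with Gaussian gradient noise is hypothetical.

```python
import math
import random

X_STAR = 3.0  # minimizer of the toy objective (hypothetical problem)

def F(x):
    # Toy non-smooth objective F(x) = |x - 3|, so F* = 0
    return abs(x - X_STAR)

def noisy_subgrad(x, rng):
    # Subgradient of |x - 3| plus N(0, 1) noise: E[G^2] = 2, i.e. M = sqrt(2)
    return (1.0 if x > X_STAR else -1.0) + rng.gauss(0.0, 1.0)

def sgd_average(step, x0=0.0, T=10_000, seed=0):
    """Stochastic subgradient method returning the average of the iterates x_1..x_T.

    step(g_sq_sum) gives the step size from the running sum of squared stochastic gradients.
    """
    rng = random.Random(seed)
    x, g_sq_sum, x_sum = x0, 0.0, 0.0
    for _ in range(T):
        x_sum += x
        g = noisy_subgrad(x, rng)
        g_sq_sum += g * g
        x -= step(g_sq_sum) * g
    return x_sum / T

T, D0, M = 10_000, 3.0, math.sqrt(2.0)
rules = {
    # Classical non-smooth tuning: needs D0 and M in advance
    "tuned": lambda s: D0 / (M * math.sqrt(T)),
    # Same rule with M overestimated 100x: steps are far too small
    "mis-tuned": lambda s: D0 / (100 * M * math.sqrt(T)),
    # AdaGrad-norm: uses observed gradients instead of M (a distance scale is still supplied)
    "adagrad-norm": lambda s: D0 / math.sqrt(max(s, 1e-12)),
}
results = {name: F(sgd_average(rule, T=T)) for name, rule in rules.items()}
for name, gap in results.items():
    print(f"{name:>12}: F(x_avg) - F* = {gap:.4f}")
```

With the overestimated $M$, the iterates cover only part of the distance $D_0$ within $T$ steps. The adaptive rule needs no value of $M$.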
4. What is the relationship between F and Ψ?
Lemma 2.1 quantifies the relationship between $F$ and $\Psi$: if $\mathbb{E}[\Psi(X_T)] - \Psi^\star \le \varepsilon_T$, then $\mathbb{E}[F(X_T)] - F^\star \le \mu D_0^2 + \varepsilon_T$. This gives a rough relation between the two objectives: when $\mu$ is small, $\Psi$ and $F$ are arbitrarily close. Lemma 2.2 further shows that if $F$ has a quadratic lower bound, then $\Psi$ is bounded by $(1 + \mu/\Gamma)F$, so choosing $\mu/\Gamma$ small makes $F$ and $\Psi$ close in relative scale. The proofs of both lemmas are given in supplementary material 5.
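The following short derivation shows where bounds of this shape can come from. It is a sketch under an assumed regularizer, not the paper's proof. Take $\Psi(x) = F(x) + \mu\|x - x_0\|^2$, let $x^\star$ be a minimizer of $F$, and set $D_0 = \|x_0 - x^\star\|$. The regularizer is nonnegative, so $F \le \Psi$ everywhere. Evaluating $\Psi$ at $x^\star$ bounds $\Psi^\star$:

$$
\begin{aligned}
F(x) &\le \Psi(x) \ \text{ for all } x, \qquad \Psi^\star \le \Psi(x^\star) = F^\star + \mu D_0^2,\\
F(X_T) - F^\star &\le \Psi(X_T) - \Psi^\star + (\Psi^\star - F^\star) \le \Psi(X_T) - \Psi^\star + \mu D_0^2,\\
\mathbb{E}[F(X_T)] - F^\star &\le \varepsilon_T + \mu D_0^2 .
\end{aligned}
$$

For the relative-scale bound, assume that the quadratic lower bound has the form $F(x) \ge \Gamma\|x - x_0\|^2$ with $\Gamma > 0$. This form is also an assumption. Then

$$
\mu\|x - x_0\|^2 \le \frac{\mu}{\Gamma}F(x) \;\Longrightarrow\; \Psi(x) \le \Big(1 + \frac{\mu}{\Gamma}\Big)F(x).
$$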