Research Topics

Many real-world problems can be formulated as an optimization problem of the form, \[ \min_{z \in \mathcal{C}} f(z), \tag{1} \] which consists of an objective function \(f\) to be minimized over a feasible set \(\mathcal{C} \subseteq \mathcal{Z}\) in some parameter space.
For example, classical supervised learning problems take the form \[ \min_{h \in \mathcal{H}} \mathbb{E}_{(x,y) \sim \mathcal{D}} [\ell(h(x), y)] \tag{2} \] where the loss function \( \ell: \mathcal{Y} \times \mathcal{Y} \to \mathbb{R} \) measures the discrepancy between the predicted and actual outcomes, and the goal is to find a hypothesis \(h: \mathcal{X} \to \mathcal{Y}\) from a hypothesis class \(\mathcal{H}\) that minimizes the expected loss over the data distribution \(\mathcal{D}\).
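In practice, the population expectation in \((2)\) is typically replaced by an empirical average over samples. A minimal sketch of this empirical risk minimization, assuming (purely for illustration) a linear hypothesis class \(h(x) = w^\top x\) and the squared loss, solved by plain gradient descent:

```python
import numpy as np

def empirical_risk_minimization(X, y, lr=0.1, steps=500):
    """Approximate (2) by minimizing the empirical average of the
    squared loss l(h(x), y) = (h(x) - y)^2 over linear hypotheses
    h(x) = w^T x, via plain gradient descent."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        residual = X @ w - y             # h(x_i) - y_i for each sample
        grad = 2.0 * X.T @ residual / n  # gradient of the empirical risk
        w -= lr * grad
    return w

# Usage: data generated from a known linear model, so the minimizer is known.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true
w_hat = empirical_risk_minimization(X, y)
```

Since the empirical risk here is a smooth convex quadratic, gradient descent recovers `w_true` up to numerical precision; nothing about the sketch is specific to this choice beyond the closed-form gradient.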

Optimization

Optimization theory is all about designing algorithms that efficiently find a (near-)optimal solution to problems of the form \((1)\), based on (i) what we know about the objective function and feasible set, and (ii) what kind of access we have to them. For example, the following cases cover a wide range of (but not all) scenarios:

  1. If \(f\) is convex (and smooth), what algorithms can efficiently find a global minimum?
  2. If \(f\) is nonconvex (and/or nonsmooth), what algorithms can find a local minimum or a stationary point?
  3. What kind of access do we have to \(f\): exact or noisy gradients, second-order information, or something else?
  4. What can we do if the space \(\mathcal{Z}\) or constraint \(\mathcal{C}\) is complicated and/or hard to deal with?
  5. What if we have an objective different from \((1)\), i.e., something beyond function minimization?

Cases 1 and 2 are the most classical settings in optimization theory; Case 1 is famously known as convex optimization (provided, of course, that \(\mathcal{C}\) is also convex). There is a rich and beautiful literature on convex optimization, but active research is still ongoing in many directions, one of which is building solvers and software that can efficiently handle large-scale convex problems in practice. Case 2 also covers a wide range of problems in modern ML/AI applications, many of which are inherently nonconvex (e.g., training deep neural networks).

For Case 3, the literature on stochastic optimization is quite well developed when we have access to noisy gradients; some studies consider the harder setting of gradient oracles with heavy-tailed noise, and others consider more realistic models of information retrieval, as in online learning or reinforcement learning. Yet others ask whether good performance can be retained with either more powerful oracles (e.g., second-order derivatives) or weaker ones (e.g., zeroth-order information). Case 4 includes settings such as optimization over discrete structures (e.g., graphs, sets), problems with special constraints (e.g., Riemannian manifolds), and even optimization in infinite-dimensional spaces (e.g., Wasserstein gradient flows). Case 5 subsumes minimax and multi-objective optimization in game theory, problems involving monotone operators (e.g., variational inequalities, monotone inclusion problems), and many more.
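Several of these cases can be illustrated in one short sketch: projected gradient descent handles a simple constraint set by projecting each iterate back onto \(\mathcal{C}\), and swapping the exact gradient for a noisy one gives a basic stochastic-oracle variant (Case 3). The objective \(f(z) = \tfrac{1}{2}\|z - z^\star\|^2\) and the Euclidean unit ball as \(\mathcal{C}\) below are chosen purely for illustration:

```python
import numpy as np

def project_unit_ball(z):
    """Euclidean projection onto C = {z : ||z|| <= 1}."""
    norm = np.linalg.norm(z)
    return z if norm <= 1.0 else z / norm

def projected_gradient_descent(grad_f, z0, lr=0.1, steps=200, noise=0.0, rng=None):
    """Minimize f over the unit ball: take a (possibly noisy) gradient
    step, then project back onto the feasible set."""
    rng = rng or np.random.default_rng(0)
    z = project_unit_ball(z0)
    for _ in range(steps):
        g = grad_f(z)
        if noise > 0.0:  # stochastic oracle: exact gradient plus Gaussian noise
            g = g + noise * rng.normal(size=z.shape)
        z = project_unit_ball(z - lr * g)
    return z

# f(z) = 0.5 * ||z - z_star||^2 with an infeasible unconstrained minimizer,
# so the constrained optimum is the projection of z_star onto the ball.
z_star = np.array([2.0, 0.0])
grad_f = lambda z: z - z_star
z_hat = projected_gradient_descent(grad_f, z0=np.zeros(2))
```

Here the iterates converge to \((1, 0)\), the projection of \(z^\star\) onto the ball; the same loop with `noise > 0` (and a decaying step size) is the textbook projected stochastic gradient method.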

Statistics

Under construction...