Kantorovich’s Inequality

2 minute read


Kantorovich’s inequality is an inequality used in convergence analysis of the method of steepest descent. Specifically, we concern ourselves with the optimization problem

\[\begin{align*} \mathbf{x}^* = argmin_{\mathbf{x}} f(\mathbf{x}) := \frac{1}{2} \langle A\mathbf{x}{,} \mathbf{x} \rangle - \langle \mathbf{b}{,} \mathbf{x} \rangle \end{align*}\]

By considering a gradient-based method, we can iterate at each point by considering a direction to move and an amount by which we move. In the case of the method of steepest descent, we select the direction to be in the opposite direction of the gradient of \(f(\mathbf{x})\). That is, for step size \(\alpha_n\), we have

\[\begin{align*} \mathbf{x}_{n+1} & = \mathbf{x}_{n} - \alpha_n \nabla f(\mathbf{x}_n) \\ & = \mathbf{x}_{n} - \alpha_n \left(A\mathbf{x}_n - \mathbf{b}\right) \end{align*}\]

Because we seek \(\alpha_n\) to be the largest step we can take in the direction of \(-\nabla f(\mathbf{x}_n)\), we find

\[\begin{align*} \max_{\alpha_n} f(\mathbf{x}_{n} + \alpha_n \nabla f(\mathbf{x}_n)) - f(\mathbf{x}_n)\\ \max_{\alpha_n} \alpha_n \langle \nabla f(\mathbf{x}_n){,} A\mathbf{x}_n - \mathbf{b}\rangle + \frac{1}{2} \alpha_n^2 \langle \nabla f(\mathbf{x}_n){,} A \nabla f(\mathbf{x}_n) \rangle & \implies \\ \alpha_n = - \frac{\langle \nabla f(\mathbf{x}_n){,} \nabla f(\mathbf{x}_n)\rangle}{|\nabla f(\mathbf{x}_n)|^2_A} = -\frac{|\nabla f(\mathbf{x}_n)|^2}{|\nabla f(\mathbf{x}_n)|^2_A} \end{align*}\]

Finally, we seek to understand the convergence properties which is where our story started. Specifically, if we define an error function \(E(\mathbf{x}_n) = f(\mathbf{x}_n) - f(\mathbf{x}^*)\), we can try to understand

\[\begin{align*} \frac{E(\mathbf{x}_n) - E(\mathbf{x}_{n+1})}{E(\mathbf{x}_n)} \end{align*}\]

Doing the algebra leads us to the punchline,

\[\begin{align*} \frac{E(\mathbf{x}_{n+1})}{E(\mathbf{x}_n)} = \left(1 - \frac{|\nabla f(\mathbf{x}_n|^4}{|\nabla f(\mathbf{x}_n)|^2_{A^{-1}}\cdot|\nabla f(\mathbf{x}_n)|^2_{A}} \right) \overbrace{\leq}^\text{by Kantorovich's inequality} \left(\frac{\lambda_1 - \lambda_N}{\lambda_1 + \lambda_N}\right)^2 \end{align*}\]

Kantorovich’s inequality can be proven many ways, but in this I’ll prove it using a probabilistic framing. Specifically, we seek to show that given \(0 < \lambda_1 < \dots < \lambda_N\) and \({\xi_i \geq 0 \text{ s.t. } \sum_i \xi_i = 1}\),

\[\begin{align*} \frac{1}{\sum_i \xi_i \lambda_i \sum_i \frac{\xi_i}{\lambda_i}} \geq \frac{4\lambda_1 \lambda_N}{(\lambda_1 + \lambda_N)^2} \end{align*}\]

We can first observe that \(\xi_i\) resembles a probability distribution and we can therefore define

\[\begin{align*} \mathbb{E}\left[X\right] & = \sum_i \xi_i \lambda_i \\ \mathbb{E}\left[Y\right] & = \mathbb{E}\left[X^{-1}\right] = \sum_i \frac{\xi_i}{\lambda_i} \end{align*}\]

We have for two random variables

\[\begin{align*} \operatorname{Cov}(X, Y) = \mathbb{E}\left[XY\right] - \mathbb{E}\left[X\right]\mathbb{E}\left[Y\right] \end{align*}\]

By Cauchy-Schwarz,

\[\begin{align*} |\operatorname{Cov}(X, Y)| \leq \sqrt{\operatorname{Var}(X)\operatorname{Var}(Y)} \end{align*}\]

By Popoviciu for bounded random variable $X$ with $a = \min X$, $b = \max X$,

\[\begin{align*} \operatorname{Var}(X) \leq \frac{(b-a)^2}{4} \end{align*}\]


\[\begin{align*} |\operatorname{Cov}(X, Y)| \leq \sqrt{\frac{(\lambda_N-\lambda_1)^2}{4}\frac{(\frac{1}{\lambda_1} - \frac{1}{\lambda_N})^2}{4}} = \frac{(\lambda_1 - \lambda_N)^2}{4\lambda_1 \lambda_N} \end{align*}\]

Because the random variables have an inverse relationship, we take the negative root; i.e., \(\operatorname{Cov}(X, Y) = -\frac{(\lambda_1 - \lambda_N)^2}{4\lambda_1 \lambda_N}\). Additionally, \(\mathbb{E}[XX^{-1}] = 1\). Thus,

\[\begin{align*} \mathbb{E}[X]\mathbb{E}[Y] \leq 1 + \frac{(\lambda_1 - \lambda_N)^2}{4\lambda_1 \lambda_N} = \frac{(\lambda_1 + \lambda_N)^2}{4\lambda_1 \lambda_N} \end{align*}\]


\[\begin{align*} \frac{1}{\mathbb{E}[X]\mathbb{E}[Y]} = \frac{1}{\sum_i \lambda_i \xi_i \cdot \sum_i \frac{\xi_i}{\lambda_i}} \geq \frac{4\lambda_1 \lambda_N}{(\lambda_1 + \lambda_N)^2} \end{align*}\]