Differential geometry can be seen as a generalization of calculus on Riemannian manifolds. Objects in calculus such as the gradient, Jacobian, and Hessian on $\mathbb{R}^n$ all have natural analogues on manifolds.

In this article, we will focus on the most popular optimization method there is, especially in machine learning: the gradient descent method. We will begin with a review of the optimization problem of a real-valued function on $\mathbb{R}^n$, and then generalize the gradient descent algorithm to Riemannian manifolds.

## Optimization problem and the gradient descent

Let $f: \mathbb{R}^n \to \mathbb{R}$ be a real-valued function. The optimization problem is

$$ \min_{x \in \mathbb{R}^n} f(x) . $$

That is, we would like to find a point $x^* \in \mathbb{R}^n$ at which $f$ attains its minimum value.

One of the most popular numerical methods for solving this problem is the gradient descent method. Its algorithm is as follows.

**Algorithm 1 (Euclidean gradient descent).**

- Pick an arbitrary $x_0 \in \mathbb{R}^n$, pick a step size $\alpha > 0$, and let $t := 0$
- While the stopping criterion is not satisfied:
    - Compute the gradient of $f$ at $x_t$, i.e. $h_t := \nabla f(x_t)$
    - Move in the direction of $-h_t$, i.e. $x_{t+1} := x_t - \alpha h_t$
    - Let $t := t + 1$
- Return $x_t$
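As a sanity check, Algorithm 1 can be sketched in a few lines of NumPy. The quadratic objective, step size, and iteration budget below are hypothetical choices made for illustration; only the update rule is Algorithm 1 itself.

```python
import numpy as np

# Minimize f(x) = 0.5 x^T A x - b^T x, whose unique minimizer solves A x = b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])

def grad_f(x):
    return A @ x - b  # gradient of the quadratic objective

x = np.zeros(2)           # arbitrary starting point x_0
alpha = 0.1               # step size
for _ in range(1000):     # stopping criterion: gradient norm or iteration budget
    h = grad_f(x)         # h_t := grad f(x_t)
    if np.linalg.norm(h) < 1e-10:
        break
    x = x - alpha * h     # x_{t+1} := x_t - alpha h_t

print(np.allclose(x, np.linalg.solve(A, b), atol=1e-6))  # converged to the minimizer
```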

The justification of the gradient descent method is the fact that the negative of the gradient is the direction in which $f$ decreases fastest, as the following proposition makes precise.

**Proposition 1.** *Let $f: \mathbb{R}^n \to \mathbb{R}$ be differentiable at $x$ with $\nabla f(x) \neq 0$. Among all unit vectors $v \in \mathbb{R}^n$, the function $f$ increases fastest at $x$ in the direction $v = \nabla f(x) / \lVert \nabla f(x) \rVert$.*

*Proof.* First, note that, by our assumption, the directional derivative of $f$ at $x$ in the direction of a unit vector $v$ is

$$ D_v f(x) = \langle \nabla f(x), v \rangle = \lVert \nabla f(x) \rVert \lVert v \rVert \cos \theta , $$

where $\theta$ is the angle between $\nabla f(x)$ and $v$.

Thus, since the length of $v$ is one, $D_v f(x)$ is maximized when $\cos \theta = 1$, i.e. when $v$ points in the same direction as $\nabla f(x)$, and minimized when $v$ points in the opposite direction. $\blacksquare$

## Gradient descent on Riemannian manifolds

**Remark.** _We shall use the Einstein summation convention: repeated indices above and below are implied to be summed, e.g. $v = v^i \partial_i$ means $v = \sum_i v^i \partial_i$._

We now want to break out of the confines of Euclidean space. We would like to generalize the gradient descent algorithm to a function defined on a Riemannian manifold. Based on Algorithm 1, there are at least two parts of the algorithm that we need to adapt, namely, (i) the gradient of $f$, and (ii) the way we move between points, i.e. the update $x_{t+1} := x_t - \alpha h_t$.

Suppose $(\mathcal{M}, g)$ is a Riemannian manifold and $f: \mathcal{M} \to \mathbb{R}$ is a smooth real-valued function on it. The optimization problem becomes

$$ \min_{p \in \mathcal{M}} f(p) . $$

Although it seems innocent enough (we only replaced $\mathbb{R}^n$ with $\mathcal{M}$), this change forces us to rethink both of the ingredients above.

First, we shall discuss the gradient of $f$ on $\mathcal{M}$. Recall that the differential $df$ of $f$ is a one-form: at each $p \in \mathcal{M}$ it acts on a tangent vector $v \in T_p\mathcal{M}$ by $df_p(v) = vf$.

Then, the gradient of $f$ is the vector field $\operatorname{grad} f$ obtained by raising an index of $df$, i.e. the unique vector field satisfying

$$ \langle \operatorname{grad} f, X \rangle_g = df(X) \qquad \text{for every vector field } X , $$

and in coordinates, it has the expression

$$ \operatorname{grad} f = g^{ij} \, \partial_i f \, \partial_j . $$

At any $p \in \mathcal{M}$ and for any $v \in T_p\mathcal{M}$, we therefore have

$$ \langle \operatorname{grad} f|_p, v \rangle_{g_p} = df_p(v) = vf . $$

That is, pointwise, the inner product of the gradient and any tangent vector is the action of that tangent vector as a derivation on $f$.

**Proposition 2.** *Let $f: \mathcal{M} \to \mathbb{R}$ be smooth and let $p \in \mathcal{M}$ with $\operatorname{grad} f|_p \neq 0$. Among all unit vectors $v \in T_p\mathcal{M}$, the function $f$ increases fastest at $p$ in the direction $v = \operatorname{grad} f|_p / \lVert \operatorname{grad} f|_p \rVert_{g_p}$.*

*Proof.* We simply note that, by the definition of the inner product induced by $g$,

$$ vf = \langle \operatorname{grad} f|_p, v \rangle_{g_p} = \lVert \operatorname{grad} f|_p \rVert_{g_p} \lVert v \rVert_{g_p} \cos \theta , $$

where $\theta$ is the angle between $\operatorname{grad} f|_p$ and $v$. The rest of the argument is identical to the proof of Proposition 1. $\blacksquare$

Proposition 2 therefore provides us with a justification for simply substituting the Euclidean gradient with the Riemannian gradient in Algorithm 1.

To make this concrete, we do the computation in coordinates. In coordinates, we can represent the metric $g$ at $p$ by the matrix $G := (g_{ij}(p))$, and the differential $df_p$ by the vector of partial derivatives, which is exactly the Euclidean gradient $\nabla f$.

Given the matrix representation $G$, the coordinate expression $\operatorname{grad} f = g^{ij} \, \partial_i f \, \partial_j$ says that the Riemannian gradient is computed as

$$ \operatorname{grad} f = G^{-1} \nabla f . $$

**Example 1. (Euclidean gradient in coordinates).** Notice that in the Euclidean case, $\mathcal{M} = \mathbb{R}^n$ with $g_{ij} = \delta_{ij}$, so $G$ is the identity matrix and $\operatorname{grad} f = \nabla f$: the Riemannian gradient reduces to the usual Euclidean gradient.
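In code, computing $\operatorname{grad} f = G^{-1} \nabla f$ is a single linear solve. A minimal sketch, using the polar-coordinate metric $G = \operatorname{diag}(1, r^2)$ on the plane as an illustrative test case (the function name and test values are assumptions for the example):

```python
import numpy as np

def riemannian_grad(G, euclidean_grad):
    # grad f = G^{-1} * (Euclidean gradient); solve G h = grad f
    # rather than forming the inverse explicitly.
    return np.linalg.solve(G, euclidean_grad)

r = 2.0
G_polar = np.diag([1.0, r**2])   # metric of R^2 in polar coordinates (r, phi)
df = np.array([0.5, 1.0])        # partial derivatives of some f at this point

print(riemannian_grad(G_polar, df))    # the phi-component is rescaled by 1/r^2
print(riemannian_grad(np.eye(2), df))  # with G = I, it equals the Euclidean gradient
```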

The second modification to Algorithm 1 that we need to find the analogue of is the way we move between points on $\mathcal{M}$. In $\mathbb{R}^n$ we simply take the straight-line step $x_{t+1} := x_t - \alpha h_t$, but on a curved manifold a straight line is in general not even well-defined.

On Riemannian manifolds, we move between points by means of curves. There exists a special kind of curve, called a *geodesic*, which generalizes the straight line of Euclidean space: a curve $\gamma$ with zero acceleration, i.e. whose coordinate functions satisfy the second-order ODE system

$$ \ddot{\gamma}^k + \Gamma^k_{ij} \, \dot{\gamma}^i \dot{\gamma}^j = 0 , $$

where the $\Gamma^k_{ij}$ are the Christoffel symbols of the metric.

For any $p \in \mathcal{M}$ and any $v \in T_p\mathcal{M}$ sufficiently close to zero, there exists a unique geodesic $\gamma_v: [0, 1] \to \mathcal{M}$ with $\gamma_v(0) = p$ and $\dot{\gamma}_v(0) = v$,

and thus we can define a map

$$ \operatorname{Exp}_p : T_p\mathcal{M} \to \mathcal{M}, \qquad \operatorname{Exp}_p(v) := \gamma_v(1) , $$

called the exponential map. The exponential map is the generalization of "moving straight in the direction $v$" to curved spaces.

**Example 2. (Exponential map on a sphere).** Let $\mathcal{M} = S^2$, the unit sphere embedded in $\mathbb{R}^3$, equipped with the metric induced by the Euclidean metric of $\mathbb{R}^3$.

Let $p \in S^2$ and let $v \in T_p S^2$ be nonzero. The curve

$$ \gamma(t) = \cos(t \lVert v \rVert) \, p + \sin(t \lVert v \rVert) \, \frac{v}{\lVert v \rVert} $$

is a geodesic, as its image is the great circle formed by the intersection of $S^2$ with the plane spanned by $p$ and $v$. The exponential map on the sphere is therefore

$$ \operatorname{Exp}_p(v) = \gamma(1) = \cos(\lVert v \rVert) \, p + \sin(\lVert v \rVert) \, \frac{v}{\lVert v \rVert} . $$
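The closed form above translates directly into code. A short sketch (the function name `exp_sphere` and the test vectors are illustrative choices):

```python
import numpy as np

def exp_sphere(p, v):
    """Exponential map on the unit sphere: Exp_p(v) = cos(|v|) p + sin(|v|) v/|v|."""
    norm_v = np.linalg.norm(v)
    if norm_v == 0.0:
        return p                    # Exp_p(0) = p
    return np.cos(norm_v) * p + np.sin(norm_v) * (v / norm_v)

p = np.array([0.0, 0.0, 1.0])        # north pole
v = np.array([np.pi / 2, 0.0, 0.0])  # tangent vector (orthogonal to p)

q = exp_sphere(p, v)
print(np.allclose(q, [1.0, 0.0, 0.0]))     # quarter great circle: lands on the equator
print(np.isclose(np.linalg.norm(q), 1.0))  # the result stays on the sphere
```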

Given the exponential map, our modification of Algorithm 1 is complete, which we show in Algorithm 2. The changes relative to Algorithm 1 are the Riemannian gradient and the exponential-map update.

**Algorithm 2 (Riemannian gradient descent).**

- Pick an arbitrary $p_0 \in \mathcal{M}$, pick a step size $\alpha > 0$, and let $t := 0$
- While the stopping criterion is not satisfied:
    - Compute the Riemannian gradient of $f$ at $p_t$, i.e. $h_t := \operatorname{grad} f|_{p_t}$
    - Move in the direction $-h_t$, i.e. $p_{t+1} := \operatorname{Exp}_{p_t}(-\alpha h_t)$
    - Let $t := t + 1$
- Return $p_t$

## Approximating the exponential map

In general, the exponential map is difficult to compute: to obtain a geodesic, we have to solve a system of second-order ODEs. Therefore, for computational reasons, we would like to approximate the exponential map with a cheaper alternative.

Let $p \in \mathcal{M}$. A smooth map $R_p: T_p\mathcal{M} \to \mathcal{M}$ is called a **retraction** map if it satisfies the following properties:

- $R_p(0) = p$, and
- $dR_p(0) = \operatorname{id}_{T_p\mathcal{M}}$, i.e. $\frac{d}{dt} R_p(tv) \big|_{t=0} = v$ for every $v \in T_p\mathcal{M}$.

The second property is called the **local rigidity** condition: it ensures that, to first order, moving along $t \mapsto R_p(tv)$ agrees with moving along the geodesic $t \mapsto \operatorname{Exp}_p(tv)$, and in particular it preserves gradients at $p$.

On an arbitrary embedded submanifold $\mathcal{M} \subseteq \mathbb{R}^n$, a natural retraction is obtained by taking the Euclidean step and projecting the result back onto the manifold, i.e. $R_p(v) := \operatorname{proj}_{\mathcal{M}}(p + v)$.

**Example 3. (Retraction on a sphere).** Let $\mathcal{M} = S^2$. For $p \in S^2$ and $v \in T_p S^2$, the map

$$ R_p(v) := \frac{p + v}{\lVert p + v \rVert} $$

is a retraction: we take the step $p + v$ in the ambient space $\mathbb{R}^3$ and normalize the result back onto the sphere.

Therefore, the Riemannian gradient descent in Algorithm 2 can be modified as follows.

**Algorithm 3 (Riemannian gradient descent with retraction).**

- Pick an arbitrary $p_0 \in \mathcal{M}$, pick a step size $\alpha > 0$, and let $t := 0$
- While the stopping criterion is not satisfied:
    - Compute the Riemannian gradient of $f$ at $p_t$, i.e. $h_t := \operatorname{grad} f|_{p_t}$
    - Move in the direction $-h_t$, i.e. $p_{t+1} := R_{p_t}(-\alpha h_t)$
    - Let $t := t + 1$
- Return $p_t$
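To illustrate Algorithm 3 end to end, the sketch below minimizes the hypothetical objective $f(p) = p^{\mathsf{T}} A p$ over the unit sphere $S^2$, whose minimizer is the eigenvector of $A$ with the smallest eigenvalue. It uses the standard fact that, on an embedded submanifold with the induced metric, the Riemannian gradient is the tangential projection of the Euclidean gradient, together with the normalization retraction of Example 3; the matrix $A$, step size, and iteration count are assumptions for the example.

```python
import numpy as np

A = np.diag([3.0, 2.0, 1.0])      # hypothetical objective: f(p) = p^T A p

def riem_grad(p):
    g = 2.0 * A @ p               # Euclidean gradient of f
    return g - (p @ g) * p        # project onto the tangent space T_p S^2

def retract(p, v):
    return (p + v) / np.linalg.norm(p + v)   # R_p(v): step in R^3, renormalize

p = np.array([1.0, 1.0, 1.0]) / np.sqrt(3)   # arbitrary starting point on S^2
alpha = 0.1                                  # step size
for _ in range(500):
    h = riem_grad(p)                         # h_t := grad f at p_t
    p = retract(p, -alpha * h)               # p_{t+1} := R_{p_t}(-alpha h_t)

print(round(p @ A @ p, 6))   # approaches the smallest eigenvalue of A, here 1.0
```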

## Natural gradient descent

One of the most important applications of Riemannian gradient descent in machine learning is optimization on statistical manifolds. We define a statistical manifold to be a set of probability distributions $\{ p_\theta(x) : \theta \in \Theta \}$, identified with their parameters $\theta$.

The most common objective function $f$ in this setting is the expected loss, e.g. the expected negative log-likelihood $f(\theta) := \mathbb{E} \left[ -\log p_\theta(x) \right]$.

The metric tensor of a statistical manifold is given by the *Fisher information matrix*

$$ F(\theta) := \mathbb{E}_{p_\theta(x)} \left[ \nabla_\theta \log p_\theta(x) \, \nabla_\theta \log p_\theta(x)^{\mathsf{T}} \right] . $$

The Riemannian gradient on this manifold can therefore be represented by the column vector $F(\theta)^{-1} \nabla_\theta f(\theta)$, called the *natural gradient*. The resulting method is called *natural gradient descent* and is presented in Algorithm 4 below.

**Algorithm 4 (Natural gradient descent).**

- Pick an arbitrary $\theta_0 \in \Theta$, pick a step size $\alpha > 0$, and let $t := 0$
- While the stopping criterion is not satisfied:
    - Compute the natural gradient of $f$ at $\theta_t$, i.e. $h_t := F(\theta_t)^{-1} \nabla_\theta f(\theta_t)$
    - Move in the direction $-h_t$, i.e. $\theta_{t+1} := \theta_t - \alpha h_t$
    - Let $t := t + 1$
- Return $\theta_t$
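As a small worked example of Algorithm 4, the sketch below fits a one-dimensional Gaussian $\mathcal{N}(\mu, \sigma^2)$ to data by natural gradient descent on the average negative log-likelihood. The parametrization $\theta = (\mu, \sigma)$ and its Fisher matrix $F = \operatorname{diag}(1/\sigma^2, \, 2/\sigma^2)$ are standard; the data, step size, and iteration count are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=10_000)   # synthetic data

mu, sigma = 0.0, 1.0   # arbitrary starting parameters theta_0
alpha = 0.5            # step size
for _ in range(100):
    m2 = np.mean((x - mu) ** 2)
    grad = np.array([(mu - x.mean()) / sigma**2,      # d NLL / d mu
                     1.0 / sigma - m2 / sigma**3])    # d NLL / d sigma
    F = np.diag([1.0 / sigma**2, 2.0 / sigma**2])     # Fisher information matrix
    nat_grad = np.linalg.solve(F, grad)               # h_t := F^{-1} grad
    mu, sigma = np.array([mu, sigma]) - alpha * nat_grad

print(round(mu, 2), round(sigma, 2))   # close to the sample mean and std of x
```

Note that preconditioning by $F^{-1}$ makes the update invariant (to first order) to how the distribution is parametrized, which is precisely the Riemannian point of view.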

## Conclusion

Optimization on Riemannian manifolds is an interesting and important application of geometry. It generalizes the optimization methods of Euclidean spaces to Riemannian manifolds. Specifically, adapting the gradient descent method to a Riemannian manifold requires us to use the Riemannian gradient as the search direction and the exponential map or a retraction to move between points on the manifold.

One major difficulty exists: computing and storing the matrix representation of the metric, e.g. the Fisher information matrix, whose size is quadratic in the number of parameters, and inverting it at every iteration is prohibitively expensive for large models such as deep neural networks. Approximations such as the Kronecker-factored approximate curvature (K-FAC) of Martens and Grosse (2015) make natural gradient descent tractable in practice.

## References

- Lee, John M. “Smooth manifolds.” Introduction to Smooth Manifolds. Springer, New York, NY, 2013. 1-31.
- Lee, John M. Riemannian manifolds: an introduction to curvature. Vol. 176. Springer Science & Business Media, 2006.
- Fels, Mark Eric. “An Introduction to Differential Geometry through Computation.” (2016).
- Absil, P-A., Robert Mahony, and Rodolphe Sepulchre. Optimization algorithms on matrix manifolds. Princeton University Press, 2009.
- Boumal, Nicolas. Optimization and estimation on manifolds. Diss. Catholic University of Louvain, Louvain-la-Neuve, Belgium, 2014.
- Graphics: https://tex.stackexchange.com/questions/261408/sphere-tangent-to-plane.
- Martens, James, and Roger Grosse. “Optimizing neural networks with kronecker-factored approximate curvature.” International conference on machine learning. 2015.
- Patterson, Sam, and Yee Whye Teh. “Stochastic gradient Riemannian Langevin dynamics on the probability simplex.” Advances in neural information processing systems. 2013.
- Suzuki, Atsushi, Yosuke Enokida, and Kenji Yamanishi. “Riemannian TransE: Multi-relational Graph Embedding in Non-Euclidean Space.” (2018).