
# The Invariance of the Hessian and Its Eigenvalues, Determinant, and Trace

Let $f: \mathbb{R}^d \to \mathbb{R}$ be a twice-differentiable function, e.g., the loss landscape of a neural network with parameters $\theta \in \mathbb{R}^d$. Let $\varphi: \mathbb{R}^d \to \mathbb{R}^d$ be a **reparametrization**, i.e., a differentiable map with a differentiable inverse, mapping $\theta \mapsto \psi := \varphi(\theta)$.

Suppose we transform $f$ into $\tilde{f} := f \circ \varphi^{-1}$, so that $\tilde{f}(\psi) = f(\theta)$, and let $H$ denote the Hessian matrix of $f$. The following claims are commonly made in the deep learning literature:

- The *eigenvalues* of $H$ are not invariant.
- The *determinant* of $H$ is not invariant.
- The *trace* of $H$ is not invariant.
- Seen as a *bilinear map*, the Hessian is not invariant outside the critical points of $f$.

In this post, we shall see that these quantities are actually invariant under reparametrization! Although the argument comes from Riemannian geometry, it holds even under the default assumptions of calculus, the standard setting assumed by deep learning algorithms and practitioners.

**Note.**
Throughout this post, we use the Einstein summation convention.
That is, whenever an index appears once as an upper index and once as a lower index, it is summed over, and the summation symbol is omitted.
For example:

$$ a_i b^i := \sum_{i=1}^d a_i b^i . $$
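This convention maps directly onto NumPy's `einsum`, which we can use as a quick sanity check (a minimal sketch with made-up numbers):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

# a_i b^i: the repeated index i is summed over, with no explicit summation symbol
s = np.einsum("i,i->", a, b)
print(s)  # 32.0
```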

## The Hessian as a Bilinear Map

In calculus, the Hessian matrix of $f$ is the matrix of second partial derivatives of $f$.

The Hessian matrix defines a bilinear function, i.e., given arbitrary vectors $v, w \in \mathbb{R}^d$, we can compute

$$ H(v, w) := H_{ij} \, v^i w^j , $$

where we have defined

$$ H_{ij} := \frac{\partial^2 f}{\partial \theta^i \, \partial \theta^j} . $$
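As a concrete sketch (the function and vectors below are made up for illustration): for $f(\theta) = (\theta^1)^2 + 3\,\theta^1\theta^2$, the Hessian is constant, and the bilinear map is a contraction over both of its indices:

```python
import numpy as np

# Hessian of the made-up quadratic f(θ) = (θ¹)² + 3 θ¹ θ² (constant everywhere)
H = np.array([[2.0, 3.0],
              [3.0, 0.0]])

v = np.array([1.0, 2.0])
w = np.array([0.0, 1.0])

# H(v, w) = H_ij v^i w^j: contract both indices of H against the vectors
val = np.einsum("ij,i,j->", H, v, w)
print(val)  # 3.0
```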

Under the reparametrization $\psi = \varphi(\theta)$, the chain rule gives the transformation of these second partial derivatives:

$$ \tilde{H}_{kl} = \frac{\partial \theta^i}{\partial \psi^k} \frac{\partial \theta^j}{\partial \psi^l} H_{ij} + \frac{\partial^2 \theta^i}{\partial \psi^k \, \partial \psi^l} \frac{\partial f}{\partial \theta^i} . $$

However, notice that if we evaluate this expression at a critical point of $f$, the gradient $\partial f / \partial \theta^i$ vanishes and the second term drops out, leaving

$$ \tilde{H}_{kl} = \frac{\partial \theta^i}{\partial \psi^k} \frac{\partial \theta^j}{\partial \psi^l} H_{ij} . $$

Meanwhile, if $v$ is a vector with components $v^i$ in the $\theta$-coordinates, its components transform as

$$ \tilde{v}^k = \frac{\partial \psi^k}{\partial \theta^i} v^i , $$

because the Jacobian of the reparametrization (i.e. change of coordinates) $\partial \psi / \partial \theta$ maps vector components in the $\theta$-coordinates to vector components in the $\psi$-coordinates.

Notice that, at a critical point,

$$ \tilde{H}(\tilde{v}, \tilde{w}) = \left( \frac{\partial \theta^i}{\partial \psi^k} \frac{\partial \theta^j}{\partial \psi^l} H_{ij} \right) \left( \frac{\partial \psi^k}{\partial \theta^m} v^m \right) \left( \frac{\partial \psi^l}{\partial \theta^n} w^n \right) = \delta^i_m \, \delta^j_n \, H_{ij} \, v^m w^n = H(v, w) , $$

because the Jacobians of $\varphi$ and $\varphi^{-1}$ cancel. Hence, the Hessian at a critical point, seen as a bilinear map, is *invariant* under reparametrization.
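This invariance at a critical point is easy to verify numerically. Below is a hedged sketch: a random symmetric matrix stands in for the Hessian at a critical point, and a random invertible matrix for the Jacobian of the reparametrization (all values are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

H = rng.normal(size=(3, 3))
H = H + H.T                      # stand-in Hessian at a critical point (symmetric)
A = rng.normal(size=(3, 3))      # stand-in Jacobian ∂ψ/∂θ of the reparametrization
J = np.linalg.inv(A)             # Jacobian ∂θ/∂ψ of the inverse map

v, w = rng.normal(size=3), rng.normal(size=3)

H_tilde = J.T @ H @ J            # covariant transformation of the Hessian
v_tilde, w_tilde = A @ v, A @ w  # contravariant transformation of the vectors

# The Jacobians cancel: H̃(ṽ, w̃) = H(v, w)
print(v_tilde @ H_tilde @ w_tilde, v @ H @ w)
```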

## The Non-Invariance of the Hessian

While the Hessian, as a bilinear map at a minimum, is (functionally) invariant, some of its downstream quantities are not. Let us illustrate this using the determinant—one can also easily show similar results for trace and eigenvalues.

First, recall that the components of the Hessian at a critical point transform as

$$ \tilde{H}_{kl} = \frac{\partial \theta^i}{\partial \psi^k} \frac{\partial \theta^j}{\partial \psi^l} H_{ij} . $$

In matrix notation, this is

$$ \tilde{H} = J^\top H J , $$

where $J := \partial \theta / \partial \psi$ is the Jacobian of the inverse reparametrization $\varphi^{-1}$. Thus, in general,

$$ \det \tilde{H} = (\det J)^2 \det H \neq \det H . $$
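A small numerical example (with a made-up Hessian and Jacobian) makes the non-invariance of the determinant concrete:

```python
import numpy as np

H = np.array([[2.0, 1.0],
              [1.0, 3.0]])       # made-up Hessian at a minimum; det(H) = 5
J = np.array([[2.0, 0.0],
              [1.0, 1.0]])       # made-up Jacobian ∂θ/∂ψ; det(J) = 2

H_tilde = J.T @ H @ J            # Hessian in the new coordinates

# ≈ 20 = (det J)² · det H, i.e. four times det H rather than det H itself
print(np.linalg.det(H_tilde))
```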

## The Riemannian Hessian

From the Riemannian-geometric perspective, the components of the Hessian of $f$ are given by

$$ (\nabla^2 f)_{ij} = \frac{\partial^2 f}{\partial \theta^i \, \partial \theta^j} - \Gamma^k_{ij} \frac{\partial f}{\partial \theta^k} , $$

where $\Gamma^k_{ij}$ are the connection coefficients, which encode how the basis vectors change from point to point.

Under a reparametrization, the *connection coefficients* transform as

$$ \tilde{\Gamma}^m_{kl} = \frac{\partial \theta^i}{\partial \psi^k} \frac{\partial \theta^j}{\partial \psi^l} \frac{\partial \psi^m}{\partial \theta^n} \Gamma^n_{ij} + \frac{\partial \psi^m}{\partial \theta^n} \frac{\partial^2 \theta^n}{\partial \psi^k \, \partial \psi^l} . $$

And thus, combined with the transformation of the “calculus Hessian” (i.e. the second partial derivatives) from the previous section, the Riemannian Hessian transforms as

$$ (\widetilde{\nabla^2 f})_{kl} = \frac{\partial \theta^i}{\partial \psi^k} \frac{\partial \theta^j}{\partial \psi^l} (\nabla^2 f)_{ij} , $$

since the non-tensorial second terms of the two transformation rules cancel each other.

Note that while this transformation rule is very similar to the transformation of the “calculus Hessian” *at a critical point*, the transformation rule of the Riemannian Hessian applies everywhere on $\mathbb{R}^d$.

**This means that, seen as a bilinear map, the Hessian is invariant everywhere on $\mathbb{R}^d$** (not just at the critical points as before).
How does this discrepancy happen?
It arises because, in calculus, we ignore the connection coefficients $\Gamma^k_{ij}$: they vanish in Cartesian coordinates under the Euclidean metric, but they do not vanish in general coordinates.
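Here is a one-dimensional sanity check of this cancellation (the function $f(\theta) = \sin\theta$ and the reparametrization $\psi = \theta^3$ are made up for illustration). Away from a critical point, the plain second derivative fails to transform tensorially, but subtracting the connection term of the pulled-back Euclidean metric restores the tensorial rule:

```python
import numpy as np

f, df, d2f = np.sin, np.cos, lambda t: -np.sin(t)

theta0 = 1.2                             # a non-critical point of f
psi0 = theta0 ** 3                       # made-up reparametrization ψ = θ³
dtheta = (1 / 3) * psi0 ** (-2 / 3)      # dθ/dψ
d2theta = -(2 / 9) * psi0 ** (-5 / 3)    # d²θ/dψ²

# "calculus" Hessian of f̃(ψ) := f(θ(ψ)): plain second derivative via the chain rule
calc_hess = d2f(theta0) * dtheta**2 + df(theta0) * d2theta

# connection coefficient of the pulled-back Euclidean metric g̃(ψ) = (dθ/dψ)²
gamma = d2theta / dtheta

# Riemannian Hessian in ψ-coordinates: subtract Γ̃ · (df̃/dψ)
riem_hess = calc_hess - gamma * (df(theta0) * dtheta)

# tensorial rule: (dθ/dψ)² times the Hessian f''(θ) in θ-coordinates
expected = dtheta**2 * d2f(theta0)

print(riem_hess, expected)   # equal, while calc_hess alone differs
```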

## The Invariance of the Hessian Eigenvalues, Determinant, and Trace

Let us focus on the determinant of the Hessian. As discussed above, it is not invariant. This is true even if the Riemannian Hessian above is used. How do we make sense of this?

To make sense of this, we need to fully understand the object we care about when we talk about the determinant of the Hessian as a measure of the flatness of the loss landscape of $f$.

The loss landscape of $f$ is the *graph* of $f$, i.e., the hypersurface $\{ (\theta, f(\theta)) : \theta \in \mathbb{R}^d \}$ in $\mathbb{R}^{d+1}$. As a surface, it has curvatures at each point: the principal, Gaussian, and mean curvatures.

These curvatures can actually be derived from the Hessian of $f$. However, we *must* first derive the **shape operator** with the help of the metric. (The shape operator is a linear operator, mapping a vector to a vector.)
Suppose the matrix representation of the metric on the parameter space is $G$. Then, at a critical point of $f$, the shape operator of the loss landscape is

$$ S := G^{-1} H . $$

The principal, Gaussian, and mean curvatures of the loss landscape are then the eigenvalues, determinant, and trace of $S$, respectively.

But notice that under a reparametrization, the metric transforms in the same covariant way as the Hessian:

$$ \tilde{G} = J^\top G J . $$

So, even when $G = I$, i.e. the Euclidean metric in Cartesian coordinates, we *must not* ignore the metric in the shape operator, however trivial it might be, if we care about reparametrization.
*This is the cause of the non-invariance of the Hessian’s eigenvalues, determinant, and trace observed in deep learning!*

First, let us see the transformation of the shape operator by combining the transformation rules of $G$ and $H$:

$$ \tilde{S} = \tilde{G}^{-1} \tilde{H} = (J^\top G J)^{-1} (J^\top H J) = J^{-1} G^{-1} J^{-\top} J^\top H J = J^{-1} S J . $$

That is, the shape operator transforms by a similarity transformation.

If we take the determinant of both sides, we have:

$$ \det \tilde{S} = \det(J^{-1}) \, \det S \, \det J = \det S . $$

That is, **the determinant of the Hessian, seen as a shape operator, is invariant!**

What about the trace of $S$? By the cyclic property of the trace,

$$ \operatorname{tr} \tilde{S} = \operatorname{tr}(J^{-1} S J) = \operatorname{tr}(S J J^{-1}) = \operatorname{tr} S , $$

and so **the trace is also invariant**.
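Both identities are easy to check numerically. A hedged sketch with made-up matrices, using the Euclidean metric $G = I$ in the original coordinates:

```python
import numpy as np

H = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])   # made-up Hessian at a minimum
G = np.eye(3)                     # Euclidean metric in the θ-coordinates
J = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])   # made-up (invertible) Jacobian ∂θ/∂ψ

S = np.linalg.inv(G) @ H          # shape operator in θ-coordinates

G_tilde = J.T @ G @ J             # the metric transforms too!
H_tilde = J.T @ H @ J
S_tilde = np.linalg.inv(G_tilde) @ H_tilde   # equals J⁻¹ S J: a similarity transform

print(np.linalg.det(S_tilde), np.linalg.det(S))   # equal
print(np.trace(S_tilde), np.trace(S))             # equal
```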

Finally, we can also show a general invariance result for eigenvalues.
Recall that $\lambda$ is an eigenvalue of $S$ with eigenvector $v$ if $S v = \lambda v$.

Let $\tilde{v} := J^{-1} v$ be the transformed eigenvector. Then

$$ \tilde{S} \tilde{v} = J^{-1} S J J^{-1} v = J^{-1} S v = \lambda J^{-1} v = \lambda \tilde{v} , $$

where the last step is done by multiplying both sides of $S v = \lambda v$ by the inverse of the Jacobian; recall that $J$ is invertible since $\varphi$ is a reparametrization, i.e., it has a differentiable inverse.

Therefore, we identify that **all eigenvalues of the shape operator $S$ are invariant under reparametrization**, with the eigenvectors transforming as $v \mapsto J^{-1} v$.
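The spectrum-preserving similarity transform can be checked numerically as well (made-up matrices again; with $G = I$, the shape operator is just the Hessian in the original coordinates):

```python
import numpy as np

S = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])     # made-up shape operator in θ-coordinates (G = I)
J = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])     # made-up (invertible) Jacobian ∂θ/∂ψ

S_tilde = np.linalg.inv(J) @ S @ J  # shape operator after reparametrization

eig = np.sort(np.linalg.eigvals(S).real)
eig_tilde = np.sort(np.linalg.eigvals(S_tilde).real)
print(eig)
print(eig_tilde)   # same spectrum; the eigenvectors transform as v ↦ J⁻¹v
```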

## Non-Invariance from the Tensor Analysis Viewpoint

In tensor analysis, this issue is very easy to identify.
First, the Hessian represents a bilinear map, so it is a *covariant 2-tensor*.
Meanwhile, when we talk about eigenvalues, we refer to the spectral theorem, and this theorem applies to *linear maps*.
So, there is a *type mismatch* here.

To apply the spectral theorem on the Hessian, we need to express it as a linear map.
This can be done by viewing the Hessian as a linear map from the tangent space to itself, which is a *1-contravariant 1-covariant tensor*.
That is, we need to “raise” one of the indices of $H_{ij}$ with the help of the inverse metric $g^{ik}$:

$$ S^i_j := g^{ik} H_{kj} . $$
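A tiny sketch (with made-up matrices) of why the type distinction matters: under a non-Euclidean metric, the eigenvalues of the raw covariant matrix $H$ and of the index-raised linear map $g^{-1}H$ differ:

```python
import numpy as np

H = np.array([[2.0, 1.0],
              [1.0, 3.0]])    # covariant 2-tensor (a bilinear map)
G = np.array([[1.0, 0.0],
              [0.0, 4.0]])    # a made-up non-Euclidean metric

# Raise one index with the inverse metric: S^i_j = g^{ik} H_{kj}
S = np.linalg.inv(G) @ H

print(np.linalg.eigvals(H))   # eigenvalues of the raw covariant matrix
print(np.linalg.eigvals(S))   # eigenvalues of the (1,1)-tensor: generally different
```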

## Conclusion

The reason why “flatness measures” derived from the calculus version of the Hessian are not invariant is simply that we measure them on the wrong object. The correct object is the shape operator, which is obtained with the help of the metric (even when the latter is Euclidean).

Moreover, the reason why Newton’s method is not invariant (see Sec. 12 of Martens, 2020) is that we ignore the second term, involving the connection coefficients $\Gamma^k_{ij}$, in the transformation rule of the Hessian.

Ignoring those geometric quantities is totally justified in calculus and deep learning, since we always assume the Euclidean metric along with Cartesian coordinates. But this simplification makes us “forget” the correct transformation of the Hessian, giving rise to the pathological non-invariance issues observed in deep learning.