The Invariance of the Hessian and Its Eigenvalues, Determinant, and Trace
Let $f: \mathbb{R}^d \to \mathbb{R}$ be a twice-differentiable (loss) function of the parameters $\theta = (\theta^1, \dots, \theta^d)$, and let $H$ denote its Hessian matrix.
Suppose we transform $\theta$ into new parameters $\varphi = \varphi(\theta)$, i.e., we reparametrize $f$. The following statements about such a reparametrization are commonly made:
- The eigenvalues of $H$ are not invariant.
- The determinant of $H$ is not invariant.
- The trace of $H$ is not invariant.
- Seen as a bilinear map, the Hessian is not invariant outside the critical points of $f$.
In this post, we shall see that these quantities are actually invariant under reparametrization, once we treat the Hessian as the right geometric object! Although the argument comes from Riemannian geometry, it also holds under the default assumptions of calculus, i.e. the standard setting assumed by deep learning algorithms and practitioners.
Note.
Throughout this post, we use the Einstein summation convention.
That is, whenever an index appears once as an upper index and once as a lower index in a product, a sum over that index is implied and the summation symbol is omitted.
For example: $a_i v^i := \sum_{i=1}^d a_i v^i$.
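For readers who prefer code, this is the same contraction that `numpy.einsum` performs. The following tiny snippet (with arbitrary example vectors, not taken from the post) is only an illustration:

```python
import numpy as np

# The Einstein convention a_i v^i means "sum over the repeated index i";
# numpy's einsum expresses the same contraction explicitly.
a = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])
print(np.einsum('i,i->', a, v))  # 32.0, i.e. sum_i a_i v^i
print(a @ v)                     # the same contraction written as a dot product
```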
The Hessian as a Bilinear Map
In calculus, the Hessian matrix $H$ of $f$ at a point $\theta$ is the matrix of second partial derivatives, with components

$$H_{ij} := \frac{\partial^2 f}{\partial \theta^i \, \partial \theta^j}.$$

The Hessian matrix defines a bilinear function, i.e., given arbitrary vectors $v, w \in \mathbb{R}^d$, it produces the number

$$\mathbf{H}(v, w) := H_{ij} \, v^i w^j = v^\top H w,$$

where we have defined $\mathbf{H}$ to be the bilinear map whose components are the entries $H_{ij}$ of the Hessian matrix.

Under the reparametrization $\varphi = \varphi(\theta)$, the chain rule gives the following transformation rule for the components of the Hessian:

$$\tilde{H}_{ij} = \frac{\partial \theta^k}{\partial \varphi^i} \frac{\partial \theta^l}{\partial \varphi^j} \, H_{kl} + \frac{\partial^2 \theta^k}{\partial \varphi^i \, \partial \varphi^j} \, \frac{\partial f}{\partial \theta^k},$$

so, because of the second term, the Hessian does not transform like a tensor in general.

However, notice that if we evaluate the Hessian at a critical point $\theta_*$ of $f$ (e.g. a minimum of the loss), the gradient $\partial f / \partial \theta^k$ vanishes and so does the second term, leaving

$$\tilde{H}_{ij} = \frac{\partial \theta^k}{\partial \varphi^i} \frac{\partial \theta^l}{\partial \varphi^j} \, H_{kl}.$$

Meanwhile, if $v$ and $w$ are vectors, their components transform contravariantly:

$$\tilde{v}^i = \frac{\partial \varphi^i}{\partial \theta^k} \, v^k, \qquad \tilde{w}^j = \frac{\partial \varphi^j}{\partial \theta^l} \, w^l,$$

because the Jacobian of the reparametrization (i.e. change of coordinates) $\partial \varphi / \partial \theta$ maps vectors expressed in the $\theta$-coordinates to vectors expressed in the $\varphi$-coordinates.

Notice that, at a critical point,

$$\tilde{\mathbf{H}}(\tilde{v}, \tilde{w}) = \tilde{H}_{ij} \, \tilde{v}^i \tilde{w}^j = \frac{\partial \theta^k}{\partial \varphi^i} \frac{\partial \theta^l}{\partial \varphi^j} H_{kl} \, \frac{\partial \varphi^i}{\partial \theta^m} v^m \, \frac{\partial \varphi^j}{\partial \theta^n} w^n = \delta^k_m \, \delta^l_n \, H_{kl} \, v^m w^n = \mathbf{H}(v, w),$$

i.e., the value of the bilinear map does not change under the reparametrization $\varphi$. Seen as a bilinear map, the Hessian is therefore invariant at the critical points of $f$.
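As a sanity check, here is a minimal numerical sketch of this invariance (the quadratic loss, the reparametrization $\theta = (\varphi_1^3, \ \varphi_2 - \varphi_1)$, and the test vectors are illustrative choices, not from the post): at the minimum, the value $v^\top H w$ agrees in both coordinate systems once the vectors are transformed with the Jacobian.

```python
import torch
from torch.autograd.functional import hessian, jacobian

def f(theta):                     # loss in the original theta-coordinates
    return (theta[0] - 1.0)**2 + 10.0 * (theta[1] - 2.0)**2

def t(phi):                       # theta expressed in the new phi-coordinates
    return torch.stack([phi[0]**3, phi[1] - phi[0]])

def f_tilde(phi):                 # the reparametrized loss f(theta(phi))
    return f(t(phi))

theta_star = torch.tensor([1.0, 2.0])  # minimum of f
phi_star = torch.tensor([1.0, 3.0])    # the same point in phi-coordinates

H = hessian(f, theta_star)             # "calculus" Hessian at the minimum
H_tilde = hessian(f_tilde, phi_star)   # Hessian after reparametrization
J = jacobian(t, phi_star)              # Jacobian d theta / d phi at that point
J_inv = torch.linalg.inv(J)

# Vectors transform contravariantly: v_tilde = J^{-1} v
v, w = torch.tensor([1.0, 2.0]), torch.tensor([-3.0, 0.5])
v_t, w_t = J_inv @ v, J_inv @ w

print(v @ H @ w)             # H(v, w) in the old coordinates
print(v_t @ H_tilde @ w_t)   # the same number in the new coordinates
```

Away from the minimum, the two numbers would generally differ, because of the extra gradient term in the transformation rule above.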
The Non-Invariance of the Hessian
While the Hessian, as a bilinear map at a minimum, is (functionally) invariant, some of its downstream quantities are not. Let us illustrate this using the determinant—one can also easily show similar results for the trace and the eigenvalues.
First, recall that the components of the Hessian transform (at a critical point) as

$$\tilde{H}_{ij} = \frac{\partial \theta^k}{\partial \varphi^i} \frac{\partial \theta^l}{\partial \varphi^j} \, H_{kl}.$$

In matrix notation, this is $\tilde{H} = J^\top H J$, where $J := \partial \theta / \partial \varphi$ is the Jacobian of the change of coordinates, with entries $J^k_{\ i} = \partial \theta^k / \partial \varphi^i$.

Thus, in general,

$$\det \tilde{H} = \det(J^\top) \, \det H \, \det J = (\det J)^2 \, \det H \neq \det H,$$

i.e., the determinant of the Hessian is not invariant under reparametrization.
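A tiny numerical check makes this scaling explicit (the Hessian and Jacobian below are arbitrary illustrative matrices):

```python
import numpy as np

# At a critical point the Hessian transforms as H_tilde = J^T H J,
# so its determinant is rescaled by (det J)^2.
rng = np.random.default_rng(0)
H = np.diag([2.0, 20.0])                       # Hessian at a minimum (theta-coordinates)
J = rng.normal(size=(2, 2))                    # Jacobian d theta / d phi
H_tilde = J.T @ H @ J

print(np.linalg.det(H_tilde))                  # equals (det J)^2 * det(H) ...
print(np.linalg.det(J)**2 * np.linalg.det(H))
print(np.linalg.det(H))                        # ... which is not det(H) in general
```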
The Riemannian Hessian
From the Riemannian-geometric perspective, the components of the Hessian of $f$ are

$$(\nabla^2 f)_{ij} = \frac{\partial^2 f}{\partial \theta^i \, \partial \theta^j} - \Gamma^k_{ij} \, \frac{\partial f}{\partial \theta^k},$$

where $\Gamma^k_{ij}$ are the connection coefficients (Christoffel symbols) of the metric on the parameter space. (Under the Euclidean metric in Cartesian coordinates, i.e. the default assumption in calculus and deep learning, all $\Gamma^k_{ij}$ are zero and we recover the familiar matrix of second partial derivatives.)

Under a reparametrization $\varphi = \varphi(\theta)$, the Christoffel symbols do not transform tensorially; they pick up an extra term:

$$\tilde{\Gamma}^k_{ij} = \frac{\partial \varphi^k}{\partial \theta^c} \frac{\partial \theta^a}{\partial \varphi^i} \frac{\partial \theta^b}{\partial \varphi^j} \, \Gamma^c_{ab} + \frac{\partial \varphi^k}{\partial \theta^c} \, \frac{\partial^2 \theta^c}{\partial \varphi^i \, \partial \varphi^j}.$$

And thus, combined with the transformation of the “calculus Hessian” (i.e. second partial derivatives) from the previous section, the non-tensorial terms cancel and the Riemannian Hessian transforms as:

$$(\widetilde{\nabla^2 f})_{ij} = \frac{\partial \theta^k}{\partial \varphi^i} \frac{\partial \theta^l}{\partial \varphi^j} \, (\nabla^2 f)_{kl}.$$

Note that while this transformation rule is very similar to the transformation of the “calculus Hessian” at a critical point, the transformation rule of the Riemannian Hessian applies everywhere on the parameter space, not just at the critical points of $f$.
This means, seen as a bilinear map, the Hessian is invariant everywhere on the parameter space: for any vectors $v$ and $w$, the number $(\nabla^2 f)(v, w) = (\nabla^2 f)_{ij} \, v^i w^j$ is the same in every coordinate system.
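Here is a small symbolic sketch of this tensorial transformation in one dimension, using SymPy (the loss $f(\theta) = \theta^4 - 2\theta^2$ and the cubic reparametrization $\theta = \psi^3$ are illustrative choices, not from the post): starting from the Euclidean metric in the $\theta$-coordinate, we pull the metric back to the $\psi$-coordinate, compute the single Christoffel symbol, and check that the Riemannian Hessian transforms with the squared Jacobian everywhere, not only at critical points.

```python
import sympy as sp

theta, psi = sp.symbols('theta psi', real=True)
f = theta**4 - 2*theta**2           # an arbitrary smooth loss
t = psi**3                          # theta expressed in the new coordinate psi

# Riemannian Hessian in theta-coordinates (Euclidean metric, so Gamma = 0)
hess_theta = sp.diff(f, theta, 2)

# Pulled-back metric and Christoffel symbol in psi-coordinates
dt = sp.diff(t, psi)
g_psi = dt**2                                          # g_tilde = (dtheta/dpsi)^2
Gamma = sp.simplify(sp.diff(g_psi, psi) / (2*g_psi))   # Gamma = (1/2) g^{-1} dg/dpsi

# "Calculus" Hessian of the reparametrized loss, then the Riemannian correction
f_psi = f.subs(theta, t)
hess_psi = sp.diff(f_psi, psi, 2) - Gamma*sp.diff(f_psi, psi)

# Tensorial rule in 1D: (Riemannian Hessian)_psi = (dtheta/dpsi)^2 * (Riemannian Hessian)_theta
expected = dt**2 * hess_theta.subs(theta, t)
print(sp.simplify(hess_psi - expected))    # 0, so the rule holds at every psi
```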
The Invariance of the Hessian Eigenvalues, Determinant, and Trace
Let us focus on the determinant of the Hessian. As discussed above, it is not invariant. This is true even if the Riemannian Hessian above is used. How do we make sense of this?
To make sense of this, we need to fully understand the object we care about when we talk about the determinant of the Hessian as a measure of the flatness of the loss landscape of $f$.

The loss landscape of $f$ is the graph of $f$, i.e. the hypersurface $\{(\theta, f(\theta)) : \theta \in \mathbb{R}^d\}$ embedded in $\mathbb{R}^{d+1}$. Its “flatness” at a point is quantified by the curvatures of this hypersurface: the principal curvatures, the Gaussian curvature, and the mean curvature.

These curvatures can actually be derived from the Hessian of $f$, combined with the metric $G = (g_{ij})$ of the parameter space: the relevant object is the shape operator

$$S := G^{-1} H, \qquad S^i_{\ j} = g^{ik} \, H_{kj},$$

where $g^{ik}$ are the components of the inverse metric and $H$ denotes the matrix of the (Riemannian) Hessian from the previous section.

The principal, Gaussian, and mean curvatures of the loss landscape are then the eigenvalues, determinant, and trace of $S$, respectively.

But notice that under a reparametrization $\varphi = \varphi(\theta)$, the metric also transforms covariantly:

$$\tilde{g}_{ij} = \frac{\partial \theta^k}{\partial \varphi^i} \frac{\partial \theta^l}{\partial \varphi^j} \, g_{kl}, \qquad \text{i.e.} \qquad \tilde{G} = J^\top G J.$$

So, even when the metric is Euclidean in the original coordinates, i.e. $G = I$, it is in general not the identity matrix in the new coordinates, and we cannot simply ignore it.

First, let us see the transformation of the shape operator by combining the transformation rules of the metric and of the Hessian:

$$\tilde{S} = \tilde{G}^{-1} \tilde{H} = (J^\top G J)^{-1} (J^\top H J) = J^{-1} G^{-1} (J^\top)^{-1} J^\top H J = J^{-1} \, (G^{-1} H) \, J = J^{-1} S J.$$

If we take the determinant of both sides, we have:

$$\det \tilde{S} = \det(J^{-1}) \, \det S \, \det J = \det S.$$
That is, the determinant of the Hessian, seen as a shape operator, is invariant!
What about the trace of $S$? Using the cyclic property of the trace,

$$\operatorname{tr} \tilde{S} = \operatorname{tr}(J^{-1} S J) = \operatorname{tr}(S \, J J^{-1}) = \operatorname{tr} S,$$
and so the trace is also invariant.
Finally, we can also show a general invariance result for eigenvalues.
Recall that $\lambda$ is an eigenvalue of $S$ with eigenvector $v$ if and only if $S v = \lambda v$.
Let $\tilde{v} := J^{-1} v$ be the components of $v$ in the new coordinates. Then

$$\tilde{S} \tilde{v} = (J^{-1} S J)(J^{-1} v) = J^{-1} S v = \lambda \, J^{-1} v = \lambda \tilde{v},$$

where the last steps amount to multiplying both sides of the eigenvalue equation $S v = \lambda v$ by the inverse of the Jacobian—recall that $J$ is invertible since a reparametrization is a valid change of coordinates.
Therefore, we identify that $\lambda$ is also an eigenvalue of $\tilde{S}$, with the corresponding eigenvector $\tilde{v} = J^{-1} v$. The eigenvalues of the Hessian, seen as a shape operator, are thus invariant as well.
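The following numerical sketch checks all three invariances at once (the Hessian matrix and the random Jacobian are arbitrary illustrative choices, and the metric in the original coordinates is assumed Euclidean): the determinant, trace, and eigenvalues of the shape operator $S = G^{-1} H$ coincide before and after the change of coordinates, whereas those of the raw Hessian do not.

```python
import numpy as np

# In theta-coordinates we assume the Euclidean metric G = I, so S = G^{-1} H = H.
# After a reparametrization with Jacobian J = d theta / d phi, both the metric and
# the Hessian transform, and the shape operator becomes S_tilde = J^{-1} S J.
rng = np.random.default_rng(1)
H = np.array([[2.0, 0.5], [0.5, 20.0]])    # (Riemannian) Hessian, theta-coordinates
G = np.eye(2)                              # Euclidean metric, theta-coordinates
J = rng.normal(size=(2, 2))                # Jacobian of the change of coordinates

G_tilde = J.T @ G @ J
H_tilde = J.T @ H @ J
S = np.linalg.solve(G, H)                  # shape operator, theta-coordinates
S_tilde = np.linalg.solve(G_tilde, H_tilde)

print(np.linalg.det(S), np.linalg.det(S_tilde))    # equal
print(np.trace(S), np.trace(S_tilde))              # equal
print(np.sort(np.linalg.eigvals(S).real),
      np.sort(np.linalg.eigvals(S_tilde).real))    # equal up to numerical error
# For comparison: det, trace, and eigenvalues of H_tilde alone do NOT match those of H.
```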
Non-Invariance from the Tensor Analysis Viewpoint
In tensor analysis, this issue is very easy to identify. First, the Hessian represents a bilinear map, so it is a covariant 2-tensor. Meanwhile, when we talk about eigenvalues, we refer to the spectral theorem and this theorem applies to linear maps. So, there is a type mismatch here.
To apply the spectral theorem to the Hessian, we need to express it as a linear map.
This can be done by viewing the Hessian as a linear map from the tangent space to itself, which is a 1-contravariant 1-covariant tensor.
That is, we need to “raise” one of the indices of the Hessian with the inverse metric:

$$S^i_{\ j} = g^{ik} \, (\nabla^2 f)_{kj},$$

which is exactly the shape operator from the previous section.
Conclusion
The reason why “flatness measures” derived from the calculus version of the Hessian are not invariant is simply that we measure them on an incorrect object. The correct object is the shape operator, which is obtained with the help of the metric (even when the latter is Euclidean).
Moreover, the reason why Newton’s method is not invariant (see Sec. 12 of Martens, 2020) is that we ignore the second term involving the connection coefficients $\Gamma^k_{ij}$ in the Riemannian Hessian, as well as the metric that turns the Hessian into a linear map.
Ignoring those geometric quantities is totally justified in calculus and deep learning, since we always assume the Euclidean metric along with Cartesian coordinates. But this simplification makes us “forget” about the correct transformation of the Hessian, giving rise to the pathological non-invariance issues observed in deep learning.