In machine learning, especially in the study of neural networks, the Hessian matrix is often treated synonymously with curvature, in the following sense. Suppose $L: \mathbb{R}^n \to \mathbb{R}$ is the loss function of a network with parameters $\theta \in \mathbb{R}^n$, and let $H(\theta) := \nabla^2 L(\theta)$ be its Hessian, the $n \times n$ matrix with entries $H_{ij} = \partial_i \partial_j L(\theta)$.

Often, one calls the Hessian matrix the "curvature matrix" of the *loss landscape* of the network, i.e. of the graph of $L$. In this post, we make this folklore precise: we will see in which sense, and under which conditions, the Hessian actually measures the curvature of the loss landscape.

## Loss Landscapes as Hypersurfaces

We begin by formalizing what exactly a *loss landscape* is, via Euclidean hypersurface theory. We call an $n$-dimensional embedded submanifold $M \subset \mathbb{R}^{n+1}$ an **(Euclidean) hypersurface** of $\mathbb{R}^{n+1}$. We equip $M$ with the *induced metric* $g$, defined at each $p \in M$ by

$$g_p(u, v) := \langle u, v \rangle$$

for all tangent vectors $u, v \in T_p M$, where $\langle \cdot, \cdot \rangle$ denotes the Euclidean inner product of $\mathbb{R}^{n+1}$.

Intuitively, the induced inner product on $T_p M$ is nothing but the Euclidean inner product of the ambient space, restricted to vectors tangent to $M$.

Let $f: \mathbb{R}^n \to \mathbb{R}$ be a smooth function. The **graph** of $f$ is the hypersurface

$$\Gamma(f) := \{ (x, f(x)) : x \in \mathbb{R}^n \} \subset \mathbb{R}^{n+1},$$

which is conveniently described by a single global parametrization

$$X: \mathbb{R}^n \to \mathbb{R}^{n+1}, \qquad X(x) = (x, f(x)),$$

called the *graph parametrization*. Coming back to our neural network setting, assuming that the loss $L$ is smooth, the loss landscape of the network is the graph $\Gamma(L)$, i.e. a hypersurface of $\mathbb{R}^{n+1}$ with the graph parametrization $X(\theta) = (\theta, L(\theta))$.
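To make the induced metric tangible, here is a minimal NumPy sketch (the quadratic function below is an arbitrary stand-in for a real loss). The tangent vectors of the graph parametrization are $\partial_i X = (e_i, \partial_i L)$, so in these coordinates the induced metric of a graph has matrix $I + \nabla L \, \nabla L^\top$:

```python
import numpy as np

# Induced metric of a graph, computed two ways (a minimal sketch; the
# quadratic function below is an arbitrary stand-in for a real loss).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
grad_L = lambda theta: A @ theta  # gradient of L(theta) = 0.5 theta^T A theta

theta = np.array([0.7, -0.4])
n = theta.size

# Tangent basis of the graph at (theta, L(theta)): rows are d_i X = (e_i, d_i L).
tangents = np.hstack([np.eye(n), grad_L(theta)[:, None]])

# Induced metric: pairwise Euclidean inner products of the tangent vectors...
g = tangents @ tangents.T

# ...which, for a graph, equals I + grad L grad L^T.
g_closed_form = np.eye(n) + np.outer(grad_L(theta), grad_L(theta))
print(np.allclose(g, g_closed_form))  # True
```
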

## The Second Fundamental Form and Shape Operator

Consider vector fields $X, Y$ tangent to $M$, and let $\bar\nabla$ denote the Euclidean directional derivative of the ambient space $\mathbb{R}^{n+1}$. At each $p \in M$, we may decompose $\bar\nabla_X Y$ into its tangential and normal components:

$$\bar\nabla_X Y = (\bar\nabla_X Y)^\top + (\bar\nabla_X Y)^\perp,$$

where $(\cdot)^\top$ and $(\cdot)^\perp$ denote the orthogonal projections onto $T_p M$ and its orthogonal complement, respectively. The tangential part turns out to be the Levi-Civita connection of the induced metric (this is the *Gauss formula*); the normal part defines the **second fundamental form** as the map

$$\mathrm{II}(X, Y) := (\bar\nabla_X Y)^\perp.$$

See the following figure for an intuition.

Since $M$ has codimension one in $\mathbb{R}^{n+1}$, at each point there are exactly two unit normal vectors, so a unit normal field on $M$ is determined up to sign. One choice is the unit normal field oriented *outward* relative to $M$.

Another choice is the same unit normal field but negated, i.e. oriented *inward* relative to $M$. For a graph $\Gamma(f)$, the two choices are simply the upward- and downward-pointing unit normal fields.

Fix a unit normal field $N$ on $M$. We define the **scalar second fundamental form** of $M$ as the symmetric bilinear map

$$h(X, Y) := \langle \mathrm{II}(X, Y), N \rangle,$$

so that $\mathrm{II}(X, Y) = h(X, Y) \, N$.

Furthermore, we define the **shape operator** of $M$ as the linear map $s: T_p M \to T_p M$ characterized by

$$\langle s X, Y \rangle = h(X, Y) \qquad \text{for all } X, Y \in T_p M.$$

Based on the characterization above, we can alternatively view $s$ as the map $X \mapsto -\bar\nabla_X N$ (the *Weingarten equation*): the shape operator measures how the unit normal tilts as one moves along $M$.

Note that, at each point $p \in M$, the bilinear form $h$ is symmetric, and hence the shape operator $s$ is self-adjoint with respect to the induced metric.

Altogether, this means that at each $p \in M$ the spectral theorem applies: $s$ has $n$ real eigenvalues, and $T_p M$ admits an orthonormal basis of eigenvectors of $s$.

## Principal Curvatures

The previous fact about the shape operator motivates the following definitions. The eigenvalues $\kappa_1, \dots, \kappa_n$ of $s$ at $p$ are called the **principal curvatures** of $M$ at $p$. Moreover, we also define the corresponding orthonormal eigenvectors as the *principal directions*, the determinant

$$K := \det s = \kappa_1 \cdots \kappa_n$$

as the *Gaussian curvature*, and the normalized trace

$$H_{\mathrm{mean}} := \frac{1}{n} \operatorname{tr} s = \frac{1}{n} (\kappa_1 + \dots + \kappa_n)$$

as the *mean curvature*. The intuition of the principal curvatures and directions is this: the principal directions are the tangent directions in which $M$ bends the most and the least inside $\mathbb{R}^{n+1}$, and the principal curvatures quantify how strongly it bends in those directions.
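As a concrete numerical sketch (the unit sphere is an arbitrary example of my choosing), we can compute these quantities for the upper unit hemisphere viewed as the graph of $f(x, y) = \sqrt{1 - x^2 - y^2}$, using the standard graph formulas $g = I + \nabla f \, \nabla f^\top$ and $h = \nabla^2 f / \sqrt{1 + \|\nabla f\|^2}$ (the latter is derived for the loss landscape in Proposition 1 below). Every tangent direction of a sphere is a principal direction, with curvature $-1$ for the upward normal:

```python
import numpy as np

# Principal curvatures of the upper unit hemisphere, viewed as the graph of
# f(x, y) = sqrt(1 - x^2 - y^2), with the upward-pointing unit normal.
def grad_f(p):
    return -p / np.sqrt(1.0 - p @ p)

def hess_f(p):
    r2 = 1.0 - p @ p
    return -(r2 * np.eye(2) + np.outer(p, p)) / r2 ** 1.5

p = np.array([0.3, -0.2])
gf = grad_f(p)
g = np.eye(2) + np.outer(gf, gf)        # induced metric (first fundamental form)
h = hess_f(p) / np.sqrt(1.0 + gf @ gf)  # scalar second fundamental form
shape_op = np.linalg.solve(g, h)        # s = g^{-1} h

kappas = np.sort(np.linalg.eigvals(shape_op).real)
print(kappas)                    # both principal curvatures equal -1
print(np.linalg.det(shape_op))   # Gaussian curvature K = kappa_1 * kappa_2 = 1
print(np.trace(shape_op) / 2)    # mean curvature = -1
```

The sign of the principal curvatures flips if we pick the downward normal instead; the Gaussian curvature of a surface is unaffected, since the two sign flips cancel in the product.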

Principal and mean curvatures are not intrinsic to a hypersurface: there exist pairs of hypersurfaces that are isometric, yet have different principal curvatures, and hence different mean curvatures. Consider the following two surfaces.

The first (left) surface is the plane described by the parametrization $X(u, v) = (u, v, 0)$; all of its principal curvatures are zero. The second (right) surface is a cylinder of radius one, described by $X(u, v) = (\cos u, \sin u, v)$. The two are isometric (rolling the plane into a cylinder stretches nothing), but the cylinder has principal curvatures $0$ and $1$, so its mean curvature is nonzero.

Remarkably, the Gaussian curvature *is* intrinsic: all isometric hypersurfaces of dimension two have the same Gaussian curvature. This is Gauss's celebrated *Theorema Egregium*. Consistently, the plane and the cylinder above both have Gaussian curvature zero. For hypersurfaces of higher dimension, intrinsic curvature quantities can likewise be extracted from the second fundamental form via the Gauss equation; see [3].
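The plane-versus-cylinder comparison can be checked numerically. The sketch below (the finite-difference helpers `num_grad`, `num_hess`, and `curvatures` are my own illustrative names, not from any library) represents a half-cylinder as the graph of $\sqrt{1 - x^2}$ and confirms that both surfaces have zero Gaussian curvature, while only the plane has zero mean curvature:

```python
import numpy as np

def num_grad(f, p, eps=1e-5):
    # Central-difference gradient of f at p.
    return np.array([(f(p + eps * e) - f(p - eps * e)) / (2 * eps)
                     for e in np.eye(len(p))])

def num_hess(f, p, eps=1e-4):
    # Central-difference Hessian of f at p.
    n = len(p)
    H = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            ei, ej = np.eye(n)[i] * eps, np.eye(n)[j] * eps
            H[i, j] = (f(p + ei + ej) - f(p + ei - ej)
                       - f(p - ei + ej) + f(p - ei - ej)) / (4 * eps ** 2)
    return H

def curvatures(f, p):
    """Principal, Gaussian, and mean curvature of the graph of f at p."""
    grad, hess = num_grad(f, p), num_hess(f, p)
    g = np.eye(len(p)) + np.outer(grad, grad)   # induced metric
    h = hess / np.sqrt(1.0 + grad @ grad)       # scalar second fundamental form
    kappas = np.sort(np.linalg.eigvals(np.linalg.solve(g, h)).real)
    return kappas, kappas.prod(), kappas.mean()

plane = lambda p: 0.0 * p[0]
cylinder = lambda p: np.sqrt(1.0 - p[0] ** 2)  # half-cylinder of radius 1

pt = np.array([0.2, 0.5])
for name, surf in [("plane", plane), ("cylinder", cylinder)]:
    k, K, Hm = curvatures(surf, pt)
    # Both have Gaussian curvature K = 0, but different mean curvatures.
    print(name, np.round(k, 4), K, Hm)
```
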

## The Loss Landscape’s Hessian

Now we are ready to draw a geometric connection between the principal curvatures of the loss landscape and the Hessian of the loss $L$.

If we think of the loss landscape as the graph $\Gamma(L)$ with graph parametrization $X(\theta) = (\theta, L(\theta))$, then the coordinate vector fields

$$\partial_i X(\theta) = (e_i, \partial_i L(\theta))$$

for each $i = 1, \dots, n$ form a basis of the tangent space at $X(\theta)$, where $e_i$ is the $i$-th standard basis vector of $\mathbb{R}^n$.

Let us suppose we are given a unit normal field to $\Gamma(L)$; we take the upward-pointing one, $N = (-\nabla L, 1) / \sqrt{1 + \|\nabla L\|^2}$.

**Proposition 1.** *Suppose the loss landscape $\Gamma(L)$ is equipped with the graph parametrization $X$ and the upward-pointing unit normal field $N$. Then the scalar second fundamental form at $X(\theta)$ is given by*

$$h_{ij} = \langle \partial_i \partial_j X, N \rangle = \frac{H_{ij}}{\sqrt{1 + \|\nabla L\|^2}},$$

*where $H_{ij} = \partial_i \partial_j L(\theta)$ are the entries of the Hessian of $L$.*

*Proof.* To show the first equality, one can refer to Proposition 8.23 in [3], which works for any parametrization and not just the graph parametrization. Now recall that

$$X(\theta) = (\theta, L(\theta)),$$

and thus

$$\partial_i \partial_j X(\theta) = (0, \partial_i \partial_j L(\theta)).$$

Taking the inner product with the unit normal field $N = (-\nabla L, 1) / \sqrt{1 + \|\nabla L\|^2}$, we obtain

$$h_{ij} = \frac{\partial_i \partial_j L}{\sqrt{1 + \|\nabla L\|^2}} = \frac{H_{ij}}{\sqrt{1 + \|\nabla L\|^2}},$$

where $H$ is the Hessian matrix of $L$. $\blacksquare$
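Proposition 1 can also be checked numerically. In the sketch below (the quartic loss is an arbitrary stand-in of my choosing), $h$ is computed a second, independent way via the identity $h_{ij} = -\langle \partial_j N, \partial_i X \rangle$, which follows from differentiating $\langle N, \partial_i X \rangle = 0$, and compared against the closed form above:

```python
import numpy as np

# Numerical check of Proposition 1 with a toy loss (an arbitrary stand-in):
# L(theta) = 0.25 |theta|^4 + theta_1 theta_2.
def grad(t):
    return (t @ t) * t + np.array([t[1], t[0]])

def hess(t):
    return (t @ t) * np.eye(2) + 2.0 * np.outer(t, t) + np.array([[0.0, 1.0],
                                                                  [1.0, 0.0]])

def unit_normal(t):
    # Upward-pointing unit normal of the graph of L.
    g = grad(t)
    return np.append(-g, 1.0) / np.sqrt(1.0 + g @ g)

theta, eps = np.array([0.5, -1.2]), 1e-6
h_weingarten = np.empty((2, 2))
for j in range(2):
    e = np.zeros(2); e[j] = eps
    # Finite-difference derivative of the normal field along theta_j.
    dN = (unit_normal(theta + e) - unit_normal(theta - e)) / (2 * eps)
    for i in range(2):
        dX_i = np.append(np.eye(2)[i], grad(theta)[i])  # d_i X = (e_i, d_i L)
        h_weingarten[i, j] = -dN @ dX_i                  # h_ij = -<d_j N, d_i X>

h_formula = hess(theta) / np.sqrt(1.0 + grad(theta) @ grad(theta))
print(np.allclose(h_weingarten, h_formula, atol=1e-6))  # True
```
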

Finally, we show the connection between the principal curvatures and the scalar second fundamental form, and hence between the principal curvatures and the Hessian. The following proposition says that at a critical point, the unit normal vector can be chosen as $e_{n+1} = (0, \dots, 0, 1)$, and there the principal curvatures are exactly the eigenvalues of the Hessian.

**Proposition 2.** *Suppose $\theta_*$ is a critical point of $L$, i.e. $\nabla L(\theta_*) = 0$. Then, at the point $p = X(\theta_*)$ of the loss landscape, the principal curvatures are the eigenvalues of the Hessian $H(\theta_*)$, and the principal directions are spanned by the corresponding eigenvectors.*

*Proof.* We can assume w.l.o.g. that the basis of $T_p \Gamma(L)$ is given by the coordinate vector fields $\partial_i X$. At the critical point we have $\nabla L(\theta_*) = 0$, so $\partial_i X = (e_i, 0)$: this basis is orthonormal, the induced metric is represented by the identity matrix, and the upward unit normal is $N = (0, \dots, 0, 1)$.

It follows by Proposition 1 that the matrix of the scalar second fundamental form is $h = H(\theta_*)$, since the normalizing factor $\sqrt{1 + \|\nabla L\|^2}$ equals one. Because the metric is the identity in this basis, the shape operator is also represented by $H(\theta_*)$, and hence its eigenvalues and eigenvectors, i.e. the principal curvatures and directions, are those of the Hessian. $\blacksquare$
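A quick numerical illustration of Proposition 2 (the toy loss below, with a known minimum at $\theta_* = (1, 0)$, is an arbitrary choice of mine): evaluating the full graph formulas at the critical point reproduces the Hessian's eigenvalues exactly.

```python
import numpy as np

# Toy loss with a known minimum at theta* = (1, 0):
# L(theta) = (theta_1^2 - 1)^2 + theta_2^2.
grad = lambda t: np.array([4 * t[0] * (t[0] ** 2 - 1), 2 * t[1]])
hess = lambda t: np.array([[12 * t[0] ** 2 - 4, 0.0], [0.0, 2.0]])

theta_star = np.array([1.0, 0.0])
assert np.allclose(grad(theta_star), 0.0)  # theta* is indeed a critical point

# Full graph formulas; at a critical point they collapse to g = I, h = Hessian.
gf = grad(theta_star)
g = np.eye(2) + np.outer(gf, gf)
h = hess(theta_star) / np.sqrt(1.0 + gf @ gf)
shape_op = np.linalg.solve(g, h)

# Principal curvatures = Hessian eigenvalues at the minimum.
print(np.sort(np.linalg.eigvals(shape_op).real))      # [2. 8.]
print(np.sort(np.linalg.eigvalsh(hess(theta_star))))  # [2. 8.]
```
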

As a side note, we can actually have a more general statement: At any point in a hypersurface with any parametrization, the principal curvatures give a concise description of the local shape of the hypersurface by approximating it with the graph of a quadratic function. See Prop. 8.24 in [3] for a detailed discussion.

## Flatness and Generalization

In deep learning, there have been interesting works connecting the "flatness" of the loss landscape's local minima with the generalization performance of an NN. The conjecture is that the flatter a minimum is, the better the network generalizes. "Flatness" here often refers to the eigenvalues or trace of the Hessian matrix at the minimum. However, this has been disputed, e.g. by [4], and rightly so.

As we have seen previously, at a minimum, the principal and mean curvatures (the eigenvalues and trace of the Hessian of $L$) are extrinsic quantities: they depend on how the loss landscape sits inside $\mathbb{R}^{n+1}$, i.e. on the particular parametrization of the network. Indeed, [4] show that for ReLU networks one can rescale the weights of successive layers without changing the function the network computes, while making the Hessian's eigenvalues at a minimum arbitrarily large.

It is clear that the principal curvatures change even though, functionally, the NN still represents the same function. Thus, we cannot actually connect the notions of "flatness" that are common in the literature to the generalization ability of the NN. A definitive connection between them must start with some intrinsic notion of flatness; for starters, the Gaussian curvature, which can be easily computed since, at a minimum, it is just the determinant of the Hessian.
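The rescaling argument of [4] can be sketched in a few lines. Below, a two-parameter linear "network" $f(x) = w_2 w_1 x$ stands in for the ReLU case (an illustrative simplification of mine), with loss $L(w_1, w_2) = (w_1 w_2 - 1)^2$, which is minimized whenever $w_1 w_2 = 1$:

```python
import numpy as np

# Sketch of the rescaling argument of [4]: two minima of
# L(w1, w2) = (w1 * w2 - 1)^2 representing the SAME function x -> x,
# yet with very different Hessian eigenvalues.
def hess_L(w):
    w1, w2 = w
    u = w1 * w2
    # Hessian of (u - 1)^2 with u = w1 * w2.
    return 2.0 * np.array([[w2 ** 2, 2 * u - 1],
                           [2 * u - 1, w1 ** 2]])

sharp = {}
for w in [np.array([1.0, 1.0]), np.array([10.0, 0.1])]:
    eigs = np.linalg.eigvalsh(hess_L(w))
    sharp[tuple(w)] = eigs.max()
    print(w, eigs)

# The largest eigenvalue ("sharpness") is 4 at (1, 1) but ~200 at (10, 0.1):
# extrinsic flatness is not invariant under reparametrization.
```
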

## References

- Martens, James. “New Insights and Perspectives on the Natural Gradient Method.” arXiv preprint arXiv:1412.1193 (2014).
- Dangel, Felix, Stefan Harmeling, and Philipp Hennig. “Modular Block-diagonal Curvature Approximations for Feedforward Architectures.” AISTATS. 2020.
- Lee, John M. Riemannian Manifolds: An Introduction to Curvature. Vol. 176. Springer Science & Business Media, 2006.
- Dinh, Laurent, et al. “Sharp Minima can Generalize for Deep Nets.” ICML, 2017.
- Spivak, Michael D. A Comprehensive Introduction to Differential Geometry. Publish or Perish, 1970.