Volume Forms and Probability Density Functions Under Change of Variables
Suppose we have $\mathbb{R}^n$ equipped with the Cartesian coordinates; the latter represents a point in $\mathbb{R}^n$ with $x = (x^1, \dots, x^n)$, an $n$-tuple of numbers, via the identity function $\mathrm{Id}_{\mathbb{R}^n}$. This works because $\mathbb{R}^n$ itself is already defined as the space of $n$-tuples of numbers.
(Note that the superscript in $x^i$ is not a power, but just an index; we write e.g. $(x^i)^2$ if we need to take the power.)
Here are some interesting objects to study in this setting.
Riemannian Metrics
In $\mathbb{R}^n$, we usually have the standard Euclidean inner product $\langle v, w \rangle = \sum_{i=1}^n v^i w^i$, where $v$ and $w$ are two vectors.
More generally, we can write an inner product in terms of an inner product matrix $G$ as $\langle v, w \rangle_G = v^\top G w$.
The matrix $G$, which is symmetric positive definite, is called (the matrix representation of) a Riemannian metric.
In the case of the Euclidean inner product, we have $G = I$, the identity matrix.
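As a small illustration of the formula $\langle v, w \rangle_G = v^\top G w$, here is a minimal NumPy sketch. The helper name `inner` and the particular matrix `G` are hypothetical choices made only for this example.

```python
import numpy as np


def inner(v, w, G=None):
    """Inner product <v, w>_G = v^T G w; G defaults to the identity matrix."""
    v = np.asarray(v, dtype=float)
    w = np.asarray(w, dtype=float)
    if G is None:
        G = np.eye(v.shape[0])
    return float(v @ G @ w)


v = np.array([1.0, 2.0])
w = np.array([3.0, -1.0])

# A hypothetical symmetric positive definite metric matrix (illustration only).
G = np.array([[2.0, 0.5],
              [0.5, 1.0]])

euclidean = inner(v, w)      # G = I recovers the ordinary dot product
weighted = inner(v, w, G)    # the same vectors measured with the metric G
```

With $G = I$ the result coincides with `np.dot(v, w)`; a different $G$ changes lengths and angles without changing the vectors themselves.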
Volume Forms
Another interesting object is the volume form $dx := dx^1 \wedge \dots \wedge dx^n$.
This is a differential form of degree $n$, meaning that it takes $n$ vectors as arguments and returns a number.
There is a deeper meaning in the notation, but for the purpose of this post, it suffices to say that $dx$ measures the (signed) volume of a parallelepiped spanned by $n$ vectors.
Indeed, the evaluation $dx(v_1, \dots, v_n)$ on $n$ vectors is obtained by computing the determinant of the matrix resulting from stacking the tuples $v_1, \dots, v_n$.
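The determinant description can be tried out directly. The following is a minimal sketch in NumPy; the function name `dx` simply mirrors the notation in the text and is not a library call.

```python
import numpy as np


def dx(*vectors):
    """Evaluate the standard volume form on n vectors in R^n via a determinant."""
    V = np.stack([np.asarray(v, dtype=float) for v in vectors])  # rows are the vectors
    if V.shape[0] != V.shape[1]:
        raise ValueError("need exactly n vectors of length n")
    return float(np.linalg.det(V))


unit_square = dx([1, 0], [0, 1])   # area of the unit square
stretched = dx([2, 0], [0, 3])     # area of a 2-by-3 rectangle
flipped = dx([0, 1], [1, 0])       # swapping the arguments flips the sign
```

The sign change under swapping two arguments reflects that a differential form is alternating; the absolute value is the usual volume.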
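An important fact is that if $f$ is any continuous function on $\mathbb{R}^n$, then $f \, dx$ is also a differential form of degree $n$, and it is a volume form whenever $f$ is nowhere zero.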
The Riemannian metric and the volume form can be combined to obtain a special volume form
$$dV_G := |\det G|^{1/2} \, dx,$$
called the Riemannian volume form.
In the case of $\mathbb{R}^n$ with the Cartesian coordinates and the standard dot product, $\det G = \det I = 1$, so $dV_I = dx$ is a special case.
The idea here is that a non-identity $G$ “distorts” the Cartesian grid, and thus the volume changes proportionally to the distortion.
For this reason, $dV_G$ is the natural volume form for any choice of metric $G$, and for any Riemannian manifold in general.
Indeed, technically speaking, it is the unique volume form that evaluates to one on parallelepipeds spanned by (positively oriented) orthonormal basis vectors, where orthonormality is measured with $G$.
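The "evaluates to one on an orthonormal basis" property can be checked numerically. This is a sketch under stated assumptions: the matrix `G` is a hypothetical example, and the $G$-orthonormal basis is built from a Cholesky factor, which is one convenient construction among many.

```python
import numpy as np


def riemannian_volume(G, *vectors):
    """Evaluate dV_G = |det G|^{1/2} dx on n vectors in R^n."""
    V = np.stack([np.asarray(v, dtype=float) for v in vectors])  # rows are the vectors
    return float(np.sqrt(abs(np.linalg.det(G))) * np.linalg.det(V))


# Hypothetical symmetric positive definite metric (illustration only).
G = np.array([[2.0, 0.5],
              [0.5, 1.0]])

# With G = L L^T, the columns of E = L^{-T} satisfy e_i^T G e_j = delta_ij.
L = np.linalg.cholesky(G)
E = np.linalg.inv(L).T
e1, e2 = E[:, 0], E[:, 1]

gram = E.T @ G @ E                                  # should be the identity
vol_riemannian = riemannian_volume(G, e1, e2)       # should be one
vol_euclidean = float(np.linalg.det(np.stack([e1, e2])))  # generally not one
```

The Euclidean volume of the same parallelepiped is $|\det G|^{-1/2}$, which is exactly the distortion that the factor $|\det G|^{1/2}$ in $dV_G$ compensates for.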
Volume Forms and Measures
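A non-negative volume form $f \, dx$ induces a measure $\mu$ via $\mu(A) := \int_A f \, dx$ for every Borel measurable subset $A$ of $\mathbb{R}^n$.
One can then see that $dx$ is the volume form corresponding to the Lebesgue measure.
Suppose we have a probability measure $P$ (with support $\mathbb{R}^n$) and assume that it can be expressed as $p \, dx$.
Then, $p$ is the probability density function (pdf) of $P$ under the reference measure $dx$, i.e., it is positive everywhere and it integrates to one under $dx$, that is, $\int_{\mathbb{R}^n} p \, dx = 1$.
Another way to define $p$ as a pdf is via the Radon-Nikodym derivative
$$p = \frac{p \, dx}{dx}.$$
Then it’s clear that we can take any volume form as the reference measure, not just $dx$.
E.g., we can take
$$p_G := \frac{p \, dx}{dV_G} = \frac{p \, dx}{|\det G|^{1/2} \, dx} = p \, |\det G|^{-1/2},$$
which is a pdf under $dV_G$ since it’s still positive (note that $G$ is positive definite) and
$$\int_{\mathbb{R}^n} p_G \, dV_G = \int_{\mathbb{R}^n} p \, |\det G|^{-1/2} \, |\det G|^{1/2} \, dx = \int_{\mathbb{R}^n} p \, dx = 1,$$
i.e., it integrates to one under $dV_G$.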
Change of Variables
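Now, assume that we have another coordinate system for $\mathbb{R}^n$, say, representing each element of $\mathbb{R}^n$ with $y = (y^1, \dots, y^n)$ instead.
The change-of-coordinates function, mapping $x \mapsto y$, is a diffeomorphism: a differentiable function with a differentiable inverse.
Let’s call it $\varphi$, and call its Jacobian matrix $J = (\partial y^i / \partial x^j)$, with inverse $J^{-1} = (\partial x^i / \partial y^j)$.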
Here are some rules for transforming a metric and a volume form.
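If $G$ is a matrix representation of a Riemannian metric in $x$-coordinates, then
$$\hat{G} = J^{-\top} G J^{-1}$$
is the matrix representation of the same metric in $y$-coordinates.
Consequently, the determinant of the metric transforms into $\det \hat{G} = (\det J^{-1})^2 \det G$.
This transformation rule ensures that if $\hat{v}, \hat{w}$ are the representations of $v, w$ in $y$-coordinates, then $\langle v, w \rangle_G = \langle \hat{v}, \hat{w} \rangle_{\hat{G}}$.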
That is, the value of the inner product is independent of the choice of coordinates.
In other words, this rule is to make sure we are referring to the same abstract object (in this case inner product, which is an abstract function) even when we use a different representation.
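A quick numerical sanity check of this rule, as a sketch: the matrices `G` and `J` below are hypothetical (any symmetric positive definite `G` and invertible `J` would do), and the vectors are assumed to transform as $\hat{v} = J v$, the usual rule for tangent vectors under a change of coordinates.

```python
import numpy as np

# Hypothetical metric in x-coordinates (symmetric, strictly diagonally dominant, hence SPD).
G = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.2],
              [0.0, 0.2, 1.5]])

# Hypothetical Jacobian of the coordinate change at some point (det J = 7, so invertible).
J = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 3.0]])
J_inv = np.linalg.inv(J)

# Metric in y-coordinates: G_hat = J^{-T} G J^{-1}.
G_hat = J_inv.T @ G @ J_inv

v = np.array([1.0, -2.0, 0.5])
w = np.array([0.3, 0.7, -1.0])
v_hat, w_hat = J @ v, J @ w          # assumed vector transformation rule

inner_x = float(v @ G @ w)               # <v, w>_G
inner_y = float(v_hat @ G_hat @ w_hat)   # <v_hat, w_hat>_{G_hat}

det_lhs = float(np.linalg.det(G_hat))
det_rhs = float(np.linalg.det(J_inv) ** 2 * np.linalg.det(G))
```

Both representations give the same inner product because the factors of $J$ and $J^{-1}$ cancel, and the two determinants agree as stated in the text.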
Now, if $f \, dx$ is a volume form in $x$-coordinates, then
$$(f \circ \varphi^{-1}) \, |\det J^{-1}| \, dy$$
is the same volume form in $y$-coordinates.
In particular, we have the relation $dx = |\det J^{-1}| \, dy$ [2, Corollary 14.21].
Again, this rule is to ensure coordinate independence.
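As a consequence, integrals are also invariant under a change of coordinates:
$$\int_A f \, dx = \int_{\varphi(A)} (f \circ \varphi^{-1}) \, |\det J^{-1}| \, dy,$$
where $\varphi(A) = \{\varphi(x) : x \in A\}$ is the image of $A$ under $\varphi$.
Notice that this is just the standard change-of-variable rule in calculus.
But one thing to keep in mind is that the Jacobian-determinant term is part of the transformation of $dx$, not of the function $f$ itself.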
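The invariance of the integral can be illustrated in one dimension. The sketch below assumes a hypothetical change of coordinates $y = \sinh(x)$, so that $|\det J^{-1}| = |dx/dy| = 1/\sqrt{1 + y^2}$, and takes $f$ to be the standard normal density; the hand-written trapezoid rule is only there to keep the example self-contained.

```python
import numpy as np


def trapezoid(values, grid):
    """Trapezoid rule on a (possibly non-uniform) grid."""
    return float(np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(grid)))


def f(x):
    """Standard normal density, used here as the function in front of dx."""
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)


phi, phi_inv = np.sinh, np.arcsinh     # hypothetical diffeomorphism of R


def abs_det_J_inv(y):
    """|dx/dy| for x = arcsinh(y)."""
    return 1.0 / np.sqrt(1.0 + y**2)


x = np.linspace(-8.0, 8.0, 200_001)
y = phi(x)                             # the same points expressed in y-coordinates

integral_x = trapezoid(f(x), x)                                  # int f dx
integral_y = trapezoid(f(phi_inv(y)) * abs_det_J_inv(y), y)      # int (f o phi^{-1}) |det J^{-1}| dy
without_jacobian = trapezoid(f(phi_inv(y)), y)                   # drops the factor that belongs to dx
```

The first two integrals agree (both are close to one), while dropping the Jacobian factor gives $\mathbb{E}[\cosh Z] = e^{1/2}$ for $Z \sim \mathcal{N}(0, 1)$ instead, which shows that the factor really is needed to express $dx$ in terms of $dy$.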
Pdfs Under Change of Variables
From elementary probability theory, we have the transformation of a pdf $p_x$ (defined w.r.t. $dx$):
$$p_y = (p_x \circ \varphi^{-1}) \, |\det J^{-1}|,$$
and this is known to be problematic because of the additional Jacobian-determinant term.
For instance, the mode of $p_x$ doesn’t correspond to the mode of $p_y$.
That is, modes of pdfs are not coordinate-independent.
Maximum a posteriori (MAP) estimation, which is the standard estimation method for neural networks, is thus pathological, since an arbitrary reparametrization/change of variables will yield a different MAP estimate; see e.g. [1, Sec. 5.2.1.4].
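To make the mode shift concrete, here is a one-dimensional sketch. It assumes a hypothetical density $p_x = \mathcal{N}(1, 1)$ and a hypothetical reparametrization $y = \sinh(x)$; the modes are located by a simple grid search rather than by an optimizer.

```python
import numpy as np


def p_x(x):
    """Hypothetical pdf in x-coordinates: a normal density with mean 1 and unit variance."""
    return np.exp(-0.5 * (x - 1.0) ** 2) / np.sqrt(2.0 * np.pi)


def p_y(y):
    """Textbook transformation: p_y = (p_x o phi^{-1}) |det J^{-1}| with y = sinh(x)."""
    return p_x(np.arcsinh(y)) / np.sqrt(1.0 + y**2)


x_grid = np.linspace(-6.0, 8.0, 140_001)
y_grid = np.sinh(x_grid)

x_mode = x_grid[np.argmax(p_x(x_grid))]        # mode in x-coordinates (at x = 1)
y_mode = y_grid[np.argmax(p_y(y_grid))]        # mode in y-coordinates
y_mode_in_x = np.arcsinh(y_mode)               # that mode mapped back to x-coordinates
```

Mapping the mode of `p_y` back to $x$-coordinates lands near $x \approx 0.52$ rather than at $x = 1$: the maximizer solves $x + \tanh(x) = 1$ because of the extra factor $1/\cosh(x)$ contributed by the Jacobian determinant.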
Or are they?
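The reason for the above transformation rule between $p_x$ and $p_y$ is to ensure invariance of the integral, i.e., to ensure that $p_y$ is a valid pdf w.r.t. $dy$:
$$\int_{\mathbb{R}^n} p_y \, dy = \int_{\mathbb{R}^n} (p_x \circ \varphi^{-1}) \, |\det J^{-1}| \, dy = \int_{\mathbb{R}^n} p_x \, dx = 1.$$
However, as we have seen before, $|\det J^{-1}|$ is part of the transformation of $dx$, i.e. $dx = |\det J^{-1}| \, dy$!
So, the problem in pdf maximization actually arises because we attribute the Jacobian determinant to the wrong part of the volume measure $p_x \, dx$.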
This can only be detected if we see things holistically as the transformation of the whole volume form, and not just view it as the transformation of the function independently.
This leads to a very straightforward solution to the non-invariance problem.
Simply transform $p_x$ into $p_x \circ \varphi^{-1}$.
This is just the transformation rule of a standard function, so its extrema will always be coordinate-independent.
It is still a pdf w.r.t. $dx$; just don’t forget to add the Jacobian-determinant term as part of the transformation from $dx$ to $dy$.
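The proposed fix can be checked in the same kind of one-dimensional setting. This sketch again assumes a hypothetical $p_x = \mathcal{N}(1, 1)$ and a hypothetical map $y = \sinh(x)$; the names `p_invariant` and `dx_in_y` are made up for the example.

```python
import numpy as np


def trapezoid(values, grid):
    """Trapezoid rule on a (possibly non-uniform) grid."""
    return float(np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(grid)))


def p_x(x):
    """Hypothetical pdf in x-coordinates: normal with mean 1 and unit variance."""
    return np.exp(-0.5 * (x - 1.0) ** 2) / np.sqrt(2.0 * np.pi)


phi, phi_inv = np.sinh, np.arcsinh     # hypothetical diffeomorphism of R


def p_invariant(y):
    """Transform the pdf as a plain function: p_x o phi^{-1}, no Jacobian factor."""
    return p_x(phi_inv(y))


def dx_in_y(y):
    """The reference measure dx written in y-coordinates: |det J^{-1}| = 1 / sqrt(1 + y^2)."""
    return 1.0 / np.sqrt(1.0 + y**2)


x_grid = np.linspace(-7.0, 9.0, 160_001)
y_grid = phi(x_grid)

y_mode = y_grid[np.argmax(p_invariant(y_grid))]
x_mode_recovered = phi_inv(y_mode)                       # maps back to the original mode

# Integrate against the transformed reference measure, not against dy alone.
total_mass = trapezoid(p_invariant(y_grid) * dx_in_y(y_grid), y_grid)
```

The mode found in $y$-coordinates maps back to $x = 1$, and the density still integrates to one once the Jacobian determinant is kept with the measure rather than with the function.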
Riemannian Pdfs Under Change of Variables
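What about a Riemannian pdf $p_G$ under the Riemannian volume form $dV_G$?
First, recall that $p_G = p \, |\det G|^{-1/2}$.
So, transforming $p$ as a function and replacing $G$ with $\hat{G}$,
$$p_G \mapsto (p \circ \varphi^{-1}) \, |\det \hat{G}|^{-1/2} = (p \circ \varphi^{-1}) \, |\det G|^{-1/2} \, |\det J^{-1}|^{-1}.$$
This seems problematic since now we have the Jacobian-determinant term again, just like the “incorrect” transformation of a pdf in the previous section.
It actually is!
Just look at the following integral, which attempts to show that the transformed density integrates to one under $dV_{\hat{G}} = |\det \hat{G}|^{1/2} \, dy$:
$$\int (p \circ \varphi^{-1}) \, |\det G|^{-1/2} \, |\det J^{-1}|^{-1} \, |\det \hat{G}|^{1/2} \, dy = \int (p \circ \varphi^{-1}) \, |\det G|^{-1/2} \, |\det J^{-1}|^{-1} \, |\det G|^{1/2} \, |\det J^{-1}| \, dy = \int (p \circ \varphi^{-1}) \, dy.$$
We now don’t have the $|\det J^{-1}|$ term anymore.
So we can’t apply the relation $dx = |\det J^{-1}| \, dy$ to complete the steps.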
What gives?
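This is actually because there is a Jacobian-determinant term that we forget about when we don’t see things as a whole.
The complete way to see a pdf is in terms of the Radon-Nikodym derivative.
So, let’s see, in $x$-coordinates, we have:
$$p_G = \frac{p \, dx}{dV_G} = \frac{p \, dx}{|\det G|^{1/2} \, dx}.$$
Now in $y$-coordinates, we have the following by transforming both the volume forms in the numerator and the denominator:
$$\frac{(p \circ \varphi^{-1}) \, |\det J^{-1}| \, dy}{(|\det G|^{1/2} \circ \varphi^{-1}) \, |\det J^{-1}| \, dy} = (p \circ \varphi^{-1}) \, (|\det G| \circ \varphi^{-1})^{-1/2}.$$
The key is to view $|\det G|^{1/2}$ as a function in front of $dx$, which, by the transformation rule discussed previously, transforms into $|\det G|^{1/2} \circ \varphi^{-1}$.
For brevity, we might as well write it down as $|\det G|^{1/2}$, just remember that the domain of this function is the $y$-coordinates.
Compare this to before: we now don’t have the Jacobian-determinant term!
Performing the integration as before, under $dV_{\hat{G}} = |\det G|^{1/2} \, |\det J^{-1}| \, dy$:
$$\int (p \circ \varphi^{-1}) \, |\det G|^{-1/2} \, |\det G|^{1/2} \, |\det J^{-1}| \, dy = \int (p \circ \varphi^{-1}) \, |\det J^{-1}| \, dy = \int p \, dx = 1.$$
And therefore, we have shown that $(p \circ \varphi^{-1}) \, |\det G|^{-1/2}$ is the correct transformation of $p_G$.
Notice that this is again just a transformation of a standard function, and so the modes are coordinate-independent.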
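The two candidate transformations of $p_G$ can be compared numerically in one dimension. Everything specific in this sketch is an assumption made for illustration: the density $p = \mathcal{N}(1, 1)$, the position-dependent $1 \times 1$ metric $G(x) = 1 + x^2$, and the map $y = \sinh(x)$.

```python
import numpy as np


def trapezoid(values, grid):
    """Trapezoid rule on a (possibly non-uniform) grid."""
    return float(np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(grid)))


def p(x):
    """Hypothetical pdf w.r.t. dx: normal with mean 1 and unit variance."""
    return np.exp(-0.5 * (x - 1.0) ** 2) / np.sqrt(2.0 * np.pi)


def g(x):
    """Hypothetical 1x1 metric 'matrix' G(x) in x-coordinates."""
    return 1.0 + x**2


phi, phi_inv = np.sinh, np.arcsinh


def dxdy(y):
    """J^{-1} in one dimension."""
    return 1.0 / np.sqrt(1.0 + y**2)


def g_hat(y):
    """Metric in y-coordinates: J^{-T} G J^{-1} reduces to G * (dx/dy)^2."""
    return g(phi_inv(y)) * dxdy(y) ** 2


def q_correct(y):
    """(p o phi^{-1}) |det G|^{-1/2}, with G evaluated at phi^{-1}(y)."""
    return p(phi_inv(y)) / np.sqrt(g(phi_inv(y)))


def q_naive(y):
    """(p o phi^{-1}) |det G_hat|^{-1/2}, which carries an extra Jacobian factor."""
    return p(phi_inv(y)) / np.sqrt(g_hat(y))


x = np.linspace(-7.0, 9.0, 160_001)
y = phi(x)
dV_hat = np.sqrt(g_hat(y))     # density of dV_{G_hat} with respect to dy

mass_correct = trapezoid(q_correct(y) * dV_hat, y)
mass_naive = trapezoid(q_naive(y) * dV_hat, y)

x_mode_pG = x[np.argmax(p(x) / np.sqrt(g(x)))]            # mode of p_G in x-coordinates
x_mode_correct = phi_inv(y[np.argmax(q_correct(y))])      # mode of q_correct, mapped back
x_mode_naive = phi_inv(y[np.argmax(q_naive(y))])          # mode of q_naive, mapped back
```

In this setup `mass_correct` is close to one and the mode of `q_correct` maps back to the mode of $p_G$, whereas `mass_naive` equals $\mathbb{E}[\cosh X] = (e^{3/2} + e^{-1/2})/2$ for $X \sim \mathcal{N}(1, 1)$ and its mode lands elsewhere.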
Conclusion
Two take-aways from this post.
First, be aware of the correct transformation of objects.
In particular, for a volume form $f \, dx$, the Jacobian determinant is part of the transformation of $dx$, not of the function $f$.
This way, we don’t have any problem with MAP estimation.
Second, it’s best to see things as a whole to avoid confusion.
For pdfs, write them holistically as Radon-Nikodym derivatives.
Then, the correct transformations can easily be applied without confusion.
References
[1] Murphy, Kevin P. Machine Learning: A Probabilistic Perspective. MIT Press, 2012.
[2] Lee, John M. Introduction to Smooth Manifolds. 2003.