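Suppose we have a model parameterized by a parameter vector $\theta$ that models a distribution $p(x \vert \theta)$, and that we estimate $\theta$ by maximizing the likelihood $p(x \vert \theta)$ wrt. $\theta$. To assess the quality of our estimate of $\theta$, we define the score function:

$$
s(\theta) = \nabla_\theta \log p(x \vert \theta) \, ,
$$

that is, the score function is the gradient of the log likelihood function. The following result about the score function is an important building block for our discussion.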

**Claim:**
The expected value of the score wrt. our model is zero.

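*Proof.* Below, the gradient is wrt. $\theta$.

$$
\begin{aligned}
\mathop{\mathbb{E}}_{p(x \vert \theta)} \left[ s(\theta) \right] &= \mathop{\mathbb{E}}_{p(x \vert \theta)} \left[ \nabla \log p(x \vert \theta) \right] \\
&= \int \nabla \log p(x \vert \theta) \, p(x \vert \theta) \, \text{d}x \\
&= \int \frac{\nabla p(x \vert \theta)}{p(x \vert \theta)} \, p(x \vert \theta) \, \text{d}x \\
&= \int \nabla p(x \vert \theta) \, \text{d}x \\
&= \nabla \int p(x \vert \theta) \, \text{d}x \\
&= \nabla 1 \\
&= 0 \, .
\end{aligned}
$$

$\square$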
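The claim can be checked numerically on a small example. The sketch below assumes a Bernoulli model $p(x \vert \theta) = \theta^x (1 - \theta)^{1 - x}$, which is not part of the discussion above and is chosen only because the expectation reduces to a two-term sum. The function names `pmf`, `score`, and `expected_score` are illustrative.

```python
def pmf(x, theta):
    """Bernoulli probability p(x | theta) for x in {0, 1}."""
    return theta if x == 1 else 1.0 - theta


def score(x, theta):
    """Derivative of log p(x | theta) with respect to theta."""
    return x / theta - (1 - x) / (1.0 - theta)


def expected_score(theta):
    """Exact expectation of the score under the model (a two-term sum)."""
    return sum(pmf(x, theta) * score(x, theta) for x in (0, 1))


for theta in (0.1, 0.3, 0.9):
    # Each printed expectation is zero up to floating-point rounding.
    print(theta, expected_score(theta))
```

The two terms cancel for any $\theta$ in $(0, 1)$, mirroring the step in the proof where the density in the denominator cancels against the density in the expectation.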

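But how certain are we of our estimate? We can define an uncertainty measure around the expected estimate. That is, we look at the covariance of the score under our model. Taking the result from above, the mean of the score is zero, so the covariance is:

$$
\mathop{\mathbb{E}}_{p(x \vert \theta)} \left[ (s(\theta) - 0) \, (s(\theta) - 0)^{\text{T}} \right] \, .
$$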

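We can then interpret it as information. The covariance of the score function above is the definition of Fisher Information. As we assume $\theta$ is a vector, the Fisher Information takes the form of a matrix, called the Fisher Information Matrix:

$$
\mathrm{F} = \mathop{\mathbb{E}}_{p(x \vert \theta)} \left[ \nabla \log p(x \vert \theta) \, \nabla \log p(x \vert \theta)^{\text{T}} \right] \, .
$$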

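However, our likelihood function is usually complicated, and computing the expectation is intractable. We can approximate the expectation in $\mathrm{F}$ using the empirical distribution given by our training data $X = \{ x_1, x_2, \dots, x_N \}$. In this form, $\mathrm{F}$ is called the Empirical Fisher:

$$
\mathrm{F} = \frac{1}{N} \sum_{i=1}^{N} \nabla \log p(x_i \vert \theta) \, \nabla \log p(x_i \vert \theta)^{\text{T}} \, .
$$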
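As a rough illustration of averaging outer products of the score over data, the sketch below assumes a univariate Gaussian model with $\theta = (\mu, \sigma)$, for which the exact Fisher Information Matrix has the known closed form $\mathrm{diag}(1 / \sigma^2, \, 2 / \sigma^2)$. The model, the parameter values, and the helper names are assumptions made for this example. The data are drawn from the model itself, which is why the average should land close to the exact matrix; with real training data that need not hold.

```python
import random


def score(x, mu, sigma):
    """Gradient of log N(x | mu, sigma^2) with respect to (mu, sigma)."""
    d = x - mu
    return [d / sigma**2, (d**2 - sigma**2) / sigma**3]


def empirical_fisher(data, mu, sigma):
    """Average of the outer products s s^T over the data (2x2 nested list)."""
    F = [[0.0, 0.0], [0.0, 0.0]]
    for x in data:
        s = score(x, mu, sigma)
        for i in range(2):
            for j in range(2):
                F[i][j] += s[i] * s[j] / len(data)
    return F


rng = random.Random(0)   # fixed seed so the run is repeatable
mu, sigma = 1.0, 2.0     # illustrative parameter values
data = [rng.gauss(mu, sigma) for _ in range(50_000)]

F_hat = empirical_fisher(data, mu, sigma)
F_exact = [[1 / sigma**2, 0.0], [0.0, 2 / sigma**2]]  # known closed form
print(F_hat)
print(F_exact)
```

The estimate is a Monte Carlo average, so it only agrees with the closed form up to sampling noise that shrinks as the number of samples grows.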

## Fisher and Hessian

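One property of $\mathrm{F}$ that is not obvious is that it can be interpreted as the negative expected Hessian of our model's log likelihood.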

**Claim:**
The negative expected Hessian of the log likelihood is equal to the Fisher Information Matrix $\mathrm{F}$.

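*Proof.* The Hessian of the log likelihood is given by the Jacobian of its gradient:

$$
\begin{aligned}
\text{H}_{\log p(x \vert \theta)} &= \text{J} \left( \nabla \log p(x \vert \theta) \right) = \text{J} \left( \frac{\nabla p(x \vert \theta)}{p(x \vert \theta)} \right) \\
&= \frac{\text{H}_{p(x \vert \theta)} \, p(x \vert \theta) - \nabla p(x \vert \theta) \, \nabla p(x \vert \theta)^{\text{T}}}{p(x \vert \theta) \, p(x \vert \theta)} \\
&= \frac{\text{H}_{p(x \vert \theta)}}{p(x \vert \theta)} - \left( \frac{\nabla p(x \vert \theta)}{p(x \vert \theta)} \right) \left( \frac{\nabla p(x \vert \theta)}{p(x \vert \theta)} \right)^{\text{T}} \, ,
\end{aligned}
$$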

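where the second line is a result of applying the quotient rule of derivatives. Taking the expectation wrt. our model, we have:

$$
\begin{aligned}
\mathop{\mathbb{E}}_{p(x \vert \theta)} \left[ \text{H}_{\log p(x \vert \theta)} \right] &= \mathop{\mathbb{E}}_{p(x \vert \theta)} \left[ \frac{\text{H}_{p(x \vert \theta)}}{p(x \vert \theta)} \right] - \mathop{\mathbb{E}}_{p(x \vert \theta)} \left[ \left( \frac{\nabla p(x \vert \theta)}{p(x \vert \theta)} \right) \left( \frac{\nabla p(x \vert \theta)}{p(x \vert \theta)} \right)^{\text{T}} \right] \\
&= \int \frac{\text{H}_{p(x \vert \theta)}}{p(x \vert \theta)} \, p(x \vert \theta) \, \text{d}x - \mathop{\mathbb{E}}_{p(x \vert \theta)} \left[ \nabla \log p(x \vert \theta) \, \nabla \log p(x \vert \theta)^{\text{T}} \right] \\
&= \text{H}_{\int p(x \vert \theta) \, \text{d}x} - \mathrm{F} \\
&= \text{H}_{1} - \mathrm{F} \\
&= -\mathrm{F} \, .
\end{aligned}
$$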

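Thus we have $\mathrm{F} = -\mathop{\mathbb{E}}_{p(x \vert \theta)} \left[ \text{H}_{\log p(x \vert \theta)} \right]$.

$\square$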
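The identity can be sanity-checked on a one-parameter example, where the Hessian is just a second derivative. The sketch below again assumes a Bernoulli model, which is an illustrative choice and not part of the derivation. It approximates the second derivative of the log likelihood by central finite differences (the step size `h` is an arbitrary choice), so the Hessian side is computed independently of the score.

```python
import math


def log_pmf(x, theta):
    """Bernoulli log likelihood log p(x | theta) for x in {0, 1}."""
    return x * math.log(theta) + (1 - x) * math.log(1.0 - theta)


def score(x, theta):
    """Derivative of log p(x | theta) with respect to theta."""
    return x / theta - (1 - x) / (1.0 - theta)


def hessian_fd(x, theta, h=1e-4):
    """Central finite-difference second derivative of log p in theta."""
    return (log_pmf(x, theta + h) - 2 * log_pmf(x, theta)
            + log_pmf(x, theta - h)) / h**2


def expect(f, theta):
    """Exact expectation of f(x, theta) under the Bernoulli model."""
    return (1.0 - theta) * f(0, theta) + theta * f(1, theta)


theta = 0.3
fisher = expect(lambda x, t: score(x, t) ** 2, theta)
neg_expected_hessian = -expect(hessian_fd, theta)
# Both values should be close to 1 / (theta * (1 - theta)).
print(fisher, neg_expected_hessian)
```

The two numbers agree only up to the finite-difference error, which is small for this step size but not zero.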

Indeed, knowing this result, we can see the role of $\mathrm{F}$ as a measure of the curvature of the log likelihood function.

## Conclusion

The Fisher Information Matrix is defined as the covariance of the score function. It is a curvature matrix and can be interpreted as the negative expected Hessian of the log likelihood function. Thus the immediate application of $\mathrm{F}$ is as a drop-in replacement for the Hessian in second-order optimization methods.

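One of the most exciting results about $\mathrm{F}$ is its connection to the KL divergence, which gives rise to the natural gradient method.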

## References

- Martens, James. “New insights and perspectives on the natural gradient method.” arXiv preprint arXiv:1412.1193 (2014).
- Ly, Alexander, et al. “A tutorial on Fisher information.” Journal of Mathematical Psychology 80 (2017): 40-55.