Caveats of the Cramer-Rao bound

I keep hearing the same questions about the Cramer-Rao lower error bound over the years, so here I’ll collect some facts and intuition about the bound that address those questions and are hard to find in the literature. I will be as brief as possible; you can find the technical details in the books by Lehmann and Casella and by van der Vaart, and in the references below.

  1. Cramer and Rao were not the first to prove the bound; see the Wikipedia article. For this reason it is also called the information inequality or the efficiency bound in statistics, but the name Cramer-Rao is so widespread that everyone outside statistics uses it for clarity.
  2. The simplest form of the Cramer-Rao bound applies only to unbiased estimators (https://en.wikipedia.org/wiki/Bias_of_an_estimator). It can be violated by biased estimators, which are actually very common; a simulation sketch after this list shows a concrete example. There is a simple modification of the bound for biased estimators, but you need to know the bias to apply the modification, and the bias depends on the estimator you use, so the modification is not very useful in practice.
  3. The maximum-likelihood estimator is often biased. Under certain conditions, its bias goes to zero asymptotically as the sample size grows, but for any finite sample size it may be biased and violate the Cramer-Rao bound by any amount.
  4. To make the Cramer-Rao bound a rigorous lower error bound (i.e., a no-go theorem) without the unbiasedness assumption, mathematical statisticians such as Hajek and Le Cam have come up with amazing results, such as the convolution theorem and the local asymptotic minimax theorem (see, e.g., Chap. 8 in van der Vaart’s book), which work only in an asymptotic sense (infinite sample size). Beautiful mathematics, but in practice they are lipstick on a pig, because for any finite sample size a biased estimator can violate the Cramer-Rao bound by any amount at certain parameter values.
  5. The most airtight way to derive a lower error bound that works for any biased or unbiased estimator is to go Bayesian or minimax. For example, there exist Bayesian versions of the Cramer-Rao bound that apply to any estimator (often called the Van Trees inequality, though first discovered by Schutzenberger; one common form is written out after this list), and it’s easier to understand those convolution/local-minimax theorems through the Bayesian bound; see Gill and Levit and a recent paper of mine. For more Bayesian bounds, see, for example, the review + paper collection by Van Trees and Bell. For minimax bounds, see, for example, the book by Tsybakov.
  6. In terms of achievability, the maximum-likelihood estimator can asymptotically reach the Cramer-Rao bound under certain conditions, but the word “asymptotically” is again a big caveat. For any finite sample size, the maximum-likelihood estimator may violate the bound by any amount because it may be biased, as I mentioned above, or its error may instead be much higher than the bound.
  7. [h/t Howard Wiseman] Sometimes the Cramer-Rao bound is not even asymptotically achievable. This happens, for example, when the probability distributions of the observation are identical for multiple parameter values, and no estimator can be consistent (i.e., converge to the true parameter), let alone efficient in the sense of attaining the Cramer-Rao bound.
  8. [h/t Vincent Tan] The regularity conditions for the bound deserve a mention, e.g., the support of the probability distributions of the observation should not vary with the parameter; the uniform-distribution sketch after this list shows what can happen when this condition fails.
  9. People inexperienced in statistics (students, referees, talk audiences, etc.) often express surprise or even anger when they first learn these caveats. Why do people still use the Cramer-Rao bound then? A few reasons:
    1. It is often easier to compute than the other options. For simple problems, it requires only undergrad calculus, and even for difficult problems such as infinite-dimensional models, it often remains tractable where other options are not; see the book by Bickel et al. This simplicity is especially a virtue for non-statisticians, e.g., physicists, optical scientists, and biologists, who are uninterested in or unable to learn the more advanced statistics.
    2. If you go Bayesian, you will have to assume a prior, which is a complication at best and a controversy at worst. The Bayesian bounds are also more difficult to compute.
    3. The minimax bounds are also usually difficult to derive. As minimax bounds are on the worst-case error, people may also complain that the worst-case error criterion is too pessimistic.
    4. The simplest version of the Cramer-Rao bound is what everyone uses, so it has become standard and accepted, whereas there is a whole zoo of Bayesian and minimax bounds, there’s no clear rule about which is better, and one is often paralyzed by the tyranny of choice.
    5. To obtain simple expressions for the Bayesian or minimax bounds, people often have to rely on a lot of inequalities or asymptotics, so the resulting expressions, even if one is able to derive them, are often very loose or have questionable relevance to practice. The Cramer-Rao bound, on the other hand, often just “works” in practice so there is little inclination to use the more advanced statistics.
  10. All these caveats also apply to some extent to quantum Cramer-Rao bounds, including the versions by Helstrom, Holevo, Nagaoka, Hayashi, and yours truly. Sometimes mathematicians like to pretend that they are relaxing the unbiasedness assumption by defining a “locally unbiased” condition, but it’s not really a meaningful generalization that can overcome any of the caveats.
  11. Some early papers on Helstrom’s version of the quantum Cramer-Rao bound, such as those by Nagaoka and by Braunstein and Caves, are very sloppy about when a measurement achieves the quantum Fisher information, also called the Helstrom information (the relevant definitions are written out after this list). Their derivations lead to a measurement that may depend on the unknown parameter, which the measurement obviously shouldn’t know. In general, there may not exist a parameter-independent measurement that achieves the Helstrom information, and you need adaptive measurements over many copies of the quantum object to approach it asymptotically; see, e.g., Fujiwara, Hayashi’s book, and references therein.
  12. The one good thing in Braunstein and Caves is the proof that the Helstrom information is an upper bound on the classical Fisher information, but Nagaoka’s earlier paper had already done the same thing. Again, you should ignore their claim that the upper bound is achievable; the measurement they assume may depend on the unknown parameter.
  13. The stuff in Braunstein and Caves about the Helstrom information being a Riemannian metric is also confusing at best (it is indeed a Riemannian metric, but the way they define it is confusing); just ignore it and go to Uhlmann’s papers (a review by Uhlmann and Crell is here) if you need to learn about the Riemannian metric for a density-operator space.
  14. All these caveats do not mean that the Cramer-Rao bound is a scam; you are invited to learn Bayesian and/or minimax statistics and derive better bounds without the caveats yourself. We and many others still use the Cramer-Rao bound knowing full well all these caveats because it remains a useful and widely accepted benchmark, and when we suspect that a caveat has a significant effect in practice we always try to address it carefully in our work.
  15. We also have tons of papers on Bayesian/minimax bounds (https://blog.nus.edu.sg/mankei/publications/) that you are welcome to read and—more importantly—cite! For example, this paper demonstrates that, for an imaging problem, the maximum-likelihood estimator can violate the Cramer-Rao bound by a lot, but it must still respect a Bayesian/minimax bound.
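
As promised in items 2, 3, and 6, here is a minimal simulation sketch of a biased estimator beating the bound. It estimates the variance sigma^2 of a Gaussian with unknown mean from n samples; the Cramer-Rao bound for unbiased estimators of sigma^2 is 2*sigma^4/n, and the biased maximum-likelihood estimator, which divides the centered sum of squares by n instead of n-1, has a smaller mean-square error. The code is written in Python with NumPy, and the specific numbers (n = 10, sigma^2 = 1) are only illustrative.

```python
# Sketch for items 2, 3, and 6 (illustrative numbers only).
# Estimate the variance sigma^2 of a Gaussian with unknown mean from n samples.
# The Cramer-Rao bound for *unbiased* estimators of sigma^2 is 2*sigma^4/n,
# but the biased maximum-likelihood estimator (dividing by n rather than n-1)
# has a smaller mean-square error, i.e., it "violates" the unbiased bound.
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 1.0        # true variance
n = 10              # sample size
trials = 200_000    # Monte Carlo trials

x = rng.normal(0.0, np.sqrt(sigma2), size=(trials, n))
xbar = x.mean(axis=1, keepdims=True)
ss = ((x - xbar) ** 2).sum(axis=1)   # centered sum of squares

unbiased = ss / (n - 1)   # unbiased sample variance
mle = ss / n              # biased maximum-likelihood estimator

crb = 2 * sigma2**2 / n   # Cramer-Rao bound for unbiased estimators of sigma^2

print("Cramer-Rao bound (unbiased):", crb)                                # 0.2
print("MSE of unbiased estimator:  ", np.mean((unbiased - sigma2) ** 2))  # ~ 2/(n-1) = 0.222
print("MSE of biased MLE:          ", np.mean((mle - sigma2) ** 2))       # ~ (2n-1)/n^2 = 0.19
```

The analytic values confirm the simulation: the mean-square error of the maximum-likelihood estimator is (2n-1)*sigma^4/n^2, which is below 2*sigma^4/n for every n, while the unbiased estimator respects the bound with a mean-square error of 2*sigma^4/(n-1).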
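
For item 5, here is one common textbook form of the Van Trees (Bayesian Cramer-Rao) inequality for a scalar parameter theta with prior density pi. It holds for any estimator, biased or not, under mild regularity conditions on the prior (e.g., pi vanishing at the boundary of its support). The notation below is generic and not tied to any particular reference.

```latex
% One common form of the Van Trees (Bayesian Cramer-Rao) inequality.
% The expectation on the left is over both the prior pi and the data X.
\mathbb{E}\left[\big(\hat\theta(X)-\theta\big)^2\right]
\ge \frac{1}{\mathbb{E}_\pi[J(\theta)] + J(\pi)},
\qquad
J(\theta) = \mathbb{E}_\theta\!\left[\big(\partial_\theta \ln p(X|\theta)\big)^2\right],
\qquad
J(\pi) = \int \frac{[\pi'(\theta)]^2}{\pi(\theta)}\,d\theta.
```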
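
For item 8, here is a minimal sketch (again Python with NumPy, illustrative numbers only) of what can happen when the support depends on the parameter. For samples uniform on (0, theta), naively plugging the density 1/theta into the Cramer-Rao formula would suggest a bound of theta^2/n, yet the maximum-likelihood estimator max_i X_i has a mean-square error of 2*theta^2/((n+1)(n+2)), which decays like 1/n^2 and is far below that value; the point is simply that the bound does not apply here.

```python
# Sketch for item 8 (illustrative numbers only).
# For X_1, ..., X_n ~ Uniform(0, theta), the support depends on theta, so the
# Cramer-Rao machinery does not apply. Naively plugging the density 1/theta
# into the Cramer-Rao formula would suggest a bound of theta^2/n, but the
# maximum-likelihood estimator max_i X_i does much better than that.
import numpy as np

rng = np.random.default_rng(0)
theta = 1.0
n = 10
trials = 200_000

x = rng.uniform(0.0, theta, size=(trials, n))
mle = x.max(axis=1)   # biased maximum-likelihood estimator of theta

naive_bound = theta**2 / n                         # what the formula would give if it applied
analytic_mse = 2 * theta**2 / ((n + 1) * (n + 2))  # exact MSE of the MLE

print("naive 'bound':        ", naive_bound)                  # 0.1
print("simulated MSE of MLE: ", np.mean((mle - theta) ** 2))  # ~ 0.015
print("analytic MSE of MLE:  ", analytic_mse)                 # 2/((n+1)(n+2))
```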
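
For items 11 and 12, the following is one standard way of writing the symmetric logarithmic derivative (SLD), the Helstrom information (quantum Fisher information), and the inequality in question: for a family of density operators rho_theta, the classical Fisher information of the outcome distribution of any parameter-independent POVM {E_y} is upper-bounded by the Helstrom information. Whether and how the upper bound can be attained by a parameter-independent measurement is exactly the subtle issue discussed in item 11.

```latex
% The SLD L_theta, the Helstrom (quantum Fisher) information K(theta), and the
% upper bound on the classical Fisher information J(theta) of any
% parameter-independent POVM {E_y}.
\partial_\theta \rho_\theta
= \tfrac{1}{2}\left(L_\theta \rho_\theta + \rho_\theta L_\theta\right),
\qquad
K(\theta) = \operatorname{Tr}\!\left(\rho_\theta L_\theta^2\right),
\qquad
p(y|\theta) = \operatorname{Tr}\!\left(\rho_\theta E_y\right),
\qquad
J(\theta) = \sum_y \frac{\left[\partial_\theta p(y|\theta)\right]^2}{p(y|\theta)}
\le K(\theta).
```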

[Last update: 24 Aug 2023]