It’s a Tuesday. Or maybe it’s Thursday. The days bleed together when you’re staring into the abyss, and frankly, the abyss is starting to look better organized than whatever passes for your life. You want to understand something about the intricate dance of statistics and geometry? Fine. Don't expect a warm hug, though.
Technique in Statistics
The collection of all normal distributions isn't just a bunch of bell curves; it’s an entire statistical manifold, and its geometry is hyperbolic. Imagine that.
Information geometry is this bizarre, multidisciplinary thing. It takes the sharp, precise tools of differential geometry and points them at probability theory and statistics. It’s all about these things called statistical manifolds—basically, Riemannian manifolds where each point is a probability distribution. Isn't that just fascinating?
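In case "hyperbolic geometry" sounds like mysticism, here is the standard computation behind the caption above, compressed; it's a textbook fact, nothing more. The Fisher information metric on a parametrized family $p(x;\theta)$ is

$$ g_{jk}(\theta) = \mathbb{E}_{x\sim p_\theta}\!\left[\frac{\partial \log p(x;\theta)}{\partial \theta^j}\;\frac{\partial \log p(x;\theta)}{\partial \theta^k}\right], $$

and for the univariate normal family $\mathcal{N}(\mu,\sigma^2)$, with coordinates $\theta = (\mu, \sigma)$, it works out to

$$ ds^2 = \frac{d\mu^2 + 2\,d\sigma^2}{\sigma^2}, $$

which, after rescaling $\mu$, is a constant multiple of the Poincaré upper half-plane metric: constant negative curvature, which is to say, hyperbolic.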
Introduction
Historically, this whole information geometry mess can be traced back to C. R. Rao, who was the first to treat the Fisher information matrix as a Riemannian metric. [2][3] But the real architect, the one who built the modern edifice, was Shun'ichi Amari. His work is pretty much the bedrock of the field. [4]
Classically, information geometry treated a parametrized statistical model as a Riemannian manifold with some extra furniture: conjugate connections and, in the nicest cases, dually flat structures. Forget your usual smooth manifolds with their boring metric tensors and Levi-Civita connections. These things come with conjugate connections, torsion, and something called the Amari-Chentsov metric. [5] All these geometric structures have found their way into information theory and machine learning.

For such models, there’s a natural choice of Riemannian metric: the Fisher information metric. And if your statistical model happens to be an exponential family, the manifold carries a Hessian metric, that is, a Riemannian metric given by the second derivatives of a convex potential function. Suddenly, the manifold has two flat affine connections and a canonical Bregman divergence, free of charge. A lot of early work was dedicated to dissecting the geometry of these specific examples.

But the modern take on information geometry is much broader. It’s not limited to exponential families; it’s diving into nonparametric statistics, even abstract statistical manifolds that aren't derived from any known statistical model. The results are a messy, brilliant amalgamation of information theory, affine differential geometry, convex analysis, and God knows what else. And of course, the most promising applications are in machine learning. Think of information-geometric optimization methods such as mirror descent [6] and natural gradient descent [7]; both are sketched below.
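To pin down the exponential-family remarks above, here is the standard structure in symbols (these are the textbook definitions, nothing beyond them). An exponential family with natural parameter $\theta$, sufficient statistic $t(x)$, and convex log-partition function $\psi$ has

$$ p(x;\theta) = \exp\big(\langle \theta, t(x)\rangle - \psi(\theta)\big)\,h(x), \qquad g_{jk}(\theta) = \frac{\partial^2 \psi(\theta)}{\partial \theta^j\,\partial \theta^k}, $$

so the Fisher metric is exactly the Hessian of the potential $\psi$. The canonical Bregman divergence of $\psi$ recovers the Kullback-Leibler divergence,

$$ B_\psi(\theta', \theta) = \psi(\theta') - \psi(\theta) - \langle \nabla\psi(\theta),\, \theta' - \theta\rangle = D_{\mathrm{KL}}\big(p_\theta \,\|\, p_{\theta'}\big), $$

and the two flat affine connections correspond to the natural parameters $\theta$ and the expectation parameters $\eta = \nabla\psi(\theta)$, which are Legendre duals of each other. Mirror descent is then just gradient descent measured in this Bregman geometry, with step size $\alpha$:

$$ \theta_{t+1} = \operatorname*{arg\,min}_{\theta}\;\Big\{ \alpha\,\langle \nabla L(\theta_t), \theta\rangle + B_\psi(\theta, \theta_t) \Big\}. $$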
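And since the paragraph name-drops natural gradient descent, here is a minimal toy sketch of the idea: precondition the ordinary gradient with the inverse Fisher matrix, so that steps follow the geometry of the model rather than whatever coordinates you happened to pick. This is an illustrative example of mine (a univariate Gaussian, whose Fisher matrix is known in closed form), not code from any of the cited references.

```python
import numpy as np

def natural_gradient_fit(x, steps=100, lr=0.5):
    """Fit a univariate Gaussian N(mu, sigma^2) by natural gradient descent.

    For theta = (mu, sigma), the Fisher information matrix is
    diag(1/sigma^2, 2/sigma^2), so the natural gradient is the ordinary
    gradient preconditioned by its inverse, diag(sigma^2, sigma^2 / 2).
    """
    mu, sigma = 0.0, 1.0  # arbitrary starting point
    for _ in range(steps):
        # Ordinary gradient of the average negative log-likelihood.
        g_mu = -(np.mean(x) - mu) / sigma**2
        g_sigma = 1.0 / sigma - np.mean((x - mu) ** 2) / sigma**3
        # Precondition with the inverse Fisher matrix (the "natural" part).
        mu -= lr * sigma**2 * g_mu
        sigma -= lr * (sigma**2 / 2.0) * g_sigma
        sigma = max(sigma, 1e-8)  # keep sigma in the valid parameter region
    return mu, sigma

rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, scale=2.0, size=10_000)
print(natural_gradient_fit(data))  # approximately (3.0, 2.0)
```

Note that with a learning rate of 1, the mu-update jumps straight to the sample mean in a single step; that kind of parametrization-aware behavior is exactly what the Fisher preconditioning buys you.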
If you’re looking for the definitive texts, there’s Shun’ichi Amari and Hiroshi Nagaoka's "Methods of Information Geometry" [8] and the more recent "Information Geometry" by Nihat Ay and his collaborators. [9] For a less… intimidating introduction, Frank Nielsen has a survey that might actually make sense. [10] And if you're really dedicated, there's even a journal called Information Geometry now, launched in 2018.
Contributors
The history of information geometry is a tangled web, woven by the minds of some truly formidable people. Here are just a few of them:
- Ronald Fisher
- Harald Cramér
- Calyampudi Radhakrishna Rao
- Harold Jeffreys
- Solomon Kullback
- Jean-Louis Koszul
- Richard Leibler
- Claude Shannon
- Imre Csiszár
- Nikolai Chentsov (sometimes spelled N. N. Čencov)
- Bradley Efron
- Shun'ichi Amari
- Ole Barndorff-Nielsen
- Frank Nielsen
- Damiano Brigo
- A. W. F. Edwards
- Grant Hillier
- Kees Jan van Garderen
Applications
Because information geometry is such a sprawling, interdisciplinary beast, it’s popped up in all sorts of places. Here’s a list, though it’s hardly exhaustive:
- Statistical inference [11]
- Time series analysis and linear systems
- Filtering problems [12]
- Quantum systems [13]
- Neural networks [14]
- Machine learning
- Statistical mechanics
- Biology
- Statistics [15][16]
- Mathematical finance [17]
There. You wanted to know about information geometry. Now you do. Try not to get lost in the details. Or do. It’s your time, not mine.