
Quadratic Form (Statistics)

In the rather sterile world of multivariate statistics, we encounter a concept known as the quadratic form. Imagine you have a collection of random variables, gathered into a vector ε of dimension n. Now, picture a meticulously constructed, symmetric n × n matrix Λ. When you multiply these together in a specific way – ε^T Λ ε – what you get is a single scalar value. That, my friend, is a quadratic form in ε. Simple enough, I suppose, for those who appreciate such things.
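If you prefer numbers to notation, a minimal sketch in Python (NumPy) shows what that scalar looks like; the vector and matrix below are arbitrary illustrations, not anything taken from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# One realization of a 3-dimensional random vector ε (values are arbitrary).
eps = rng.normal(size=3)

# A symmetric 3 × 3 matrix Λ, built by symmetrizing a random matrix.
A = rng.normal(size=(3, 3))
Lam = (A + A.T) / 2

# The quadratic form ε^T Λ ε: a single scalar.
q = eps @ Lam @ eps
print(q)
```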

Expectation: What's the Average Outcome?

Now, let's talk about what we can expect from this quadratic form. It's not rocket science, but it does require a bit of rigor. It can be proven, and I’m not going to hold your hand through it, that:

\operatorname{E}\left[\varepsilon^{T}\Lambda\varepsilon\right] = \operatorname{tr}\left[\Lambda\Sigma\right] + \mu^{T}\Lambda\mu

This equation tells us the expected value of our quadratic form. Here, μ represents the expected value of ε, and Σ is its variance-covariance matrix. The 'tr' bit? That's the trace of a matrix, which is just the sum of its diagonal elements. The crucial point here is that this result doesn't hinge on ε behaving nicely, like following a multivariate normal distribution. All it needs is for μ and Σ to exist. Don't get caught up in unnecessary details; the essentials are what matter.
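As a sanity check rather than a proof, the formula can be compared against a Monte Carlo average. The mean, covariance, and Λ below are made-up examples; the normal sampler is used purely for convenience, since the result holds for any distribution with those first two moments:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4

# Made-up mean vector, positive-definite covariance, and symmetric Λ.
mu = rng.normal(size=n)
B = rng.normal(size=(n, n))
Sigma = B @ B.T + n * np.eye(n)
A = rng.normal(size=(n, n))
Lam = (A + A.T) / 2

# Closed-form expectation: tr(ΛΣ) + μ^T Λ μ.
expected = np.trace(Lam @ Sigma) + mu @ Lam @ mu

# Monte Carlo estimate of E[ε^T Λ ε] from many draws of ε.
eps = rng.multivariate_normal(mu, Sigma, size=200_000)
simulated = np.einsum('ij,jk,ik->i', eps, Lam, eps).mean()

print(expected, simulated)  # the two should agree to a couple of decimals
```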

For those who find solace in bound pages, a rather exhaustive treatment of quadratic forms in random variables can be found in the work by Mathai and Provost.[^2] It’s dense, I’m sure, but it’s there if you’re truly committed to the minutiae.

Proof: How Do We Know?

You want to know how we arrive at that expectation formula? Fine. Since the quadratic form ε^T Λ ε is, as we established, a scalar, it’s equal to its own trace:

\varepsilon^{T}\Lambda\varepsilon = \operatorname{tr}(\varepsilon^{T}\Lambda\varepsilon)

Now, take the expectation of both sides. A fundamental property of the trace operator is its cyclic property, which allows us to rearrange the expression inside the trace:

\operatorname{E}[\operatorname{tr}(\varepsilon^{T}\Lambda\varepsilon)] = \operatorname{E}[\operatorname{tr}(\Lambda\varepsilon\varepsilon^{T})]

The trace, being a linear combination of matrix elements, commutes with the expectation operator due to the linearity of expectation. This leads us to:

\operatorname{E}[\operatorname{tr}(\Lambda\varepsilon\varepsilon^{T})] = \operatorname{tr}(\Lambda\operatorname{E}(\varepsilon\varepsilon^{T}))

A standard result in the theory of variances tells us that E(εε^T) is precisely the covariance matrix Σ plus the outer product of the mean vector, μμ^T. So, we have:

\operatorname{tr}(\Lambda\operatorname{E}(\varepsilon\varepsilon^{T})) = \operatorname{tr}(\Lambda(\Sigma + \mu\mu^{T}))

Distributing Λ and applying the linearity of the trace, we can split this into two terms:

\operatorname{tr}(\Lambda\Sigma + \Lambda\mu\mu^{T}) = \operatorname{tr}(\Lambda\Sigma) + \operatorname{tr}(\Lambda\mu\mu^{T})

And by the cyclic property, tr(Λμμ^T) = tr(μ^T Λ μ). Since μ^T Λ μ is a scalar, its trace is simply the scalar itself. This brings us back to the elegant conclusion:

\operatorname{tr}(\Lambda\Sigma) + \operatorname{tr}(\mu^{T}\Lambda\mu) = \operatorname{tr}(\Lambda\Sigma) + \mu^{T}\Lambda\mu

There. Satisfied?
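If not, the two load-bearing identities – the cyclic property of the trace and E(εε^T) = Σ + μμ^T – can be checked numerically in a few lines; every numerical value below is an arbitrary example, not part of the proof:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
mu = rng.normal(size=n)
B = rng.normal(size=(n, n))
Sigma = B @ B.T + np.eye(n)
A = rng.normal(size=(n, n))
Lam = (A + A.T) / 2

# Cyclic property on a single draw: ε^T Λ ε = tr(Λ ε ε^T).
eps = rng.multivariate_normal(mu, Sigma)
print(eps @ Lam @ eps, np.trace(Lam @ np.outer(eps, eps)))

# Second-moment identity E(ε ε^T) = Σ + μ μ^T, checked by simulation.
samples = rng.multivariate_normal(mu, Sigma, size=200_000)
second_moment = samples.T @ samples / len(samples)
print(np.abs(second_moment - (Sigma + np.outer(mu, mu))).max())  # small
```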

Variance in the Gaussian Case: When Things Get Predictable

The variance of a quadratic form can be a messy affair, highly dependent on the distribution of ε. However, if ε happens to follow a multivariate normal distribution, then things simplify considerably, assuming Λ is symmetric. In this fortunate scenario, the variance is given by:

\operatorname{var}\left[\varepsilon^{T}\Lambda\varepsilon\right] = 2\operatorname{tr}\left[\Lambda\Sigma\Lambda\Sigma\right] + 4\mu^{T}\Lambda\Sigma\Lambda\mu

This formula[^3] is a cornerstone for analyzing quadratic forms under normality.

Furthermore, this can be generalized to compute the covariance between two such quadratic forms, say ε^T Λ₁ ε and ε^T Λ₂ ε, again assuming Λ₁ and Λ₂ are symmetric:

\operatorname{cov}\left[\varepsilon^{T}\Lambda_{1}\varepsilon,\ \varepsilon^{T}\Lambda_{2}\varepsilon\right] = 2\operatorname{tr}\left[\Lambda_{1}\Sigma\Lambda_{2}\Sigma\right] + 4\mu^{T}\Lambda_{1}\Sigma\Lambda_{2}\mu

This further generalization, noted in [^4], provides a more complete picture of the relationships between these quantities.
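Both formulas can again be checked by simulation. The sketch below assumes nothing beyond what is stated above – normality of ε and symmetric Λ₁, Λ₂ – and all numerical values are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4

def random_symmetric(rng, n):
    A = rng.normal(size=(n, n))
    return (A + A.T) / 2

# Invented mean, positive-definite covariance, and two symmetric matrices.
mu = rng.normal(size=n)
B = rng.normal(size=(n, n))
Sigma = B @ B.T + np.eye(n)
Lam1, Lam2 = random_symmetric(rng, n), random_symmetric(rng, n)

# Closed-form variance and covariance under normality.
var_formula = 2 * np.trace(Lam1 @ Sigma @ Lam1 @ Sigma) + 4 * mu @ Lam1 @ Sigma @ Lam1 @ mu
cov_formula = 2 * np.trace(Lam1 @ Sigma @ Lam2 @ Sigma) + 4 * mu @ Lam1 @ Sigma @ Lam2 @ mu

# Monte Carlo comparison; here the normality assumption actually matters.
eps = rng.multivariate_normal(mu, Sigma, size=500_000)
q1 = np.einsum('ij,jk,ik->i', eps, Lam1, eps)
q2 = np.einsum('ij,jk,ik->i', eps, Lam2, eps)
print(var_formula, q1.var())
print(cov_formula, np.cov(q1, q2)[0, 1])
```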

As a side note, when ε is multivariate normal, a quadratic form like this follows a generalized chi-squared distribution. It’s more complex than the standard chi-squared distribution, but it describes the exact distribution of these forms rather than merely their first two moments.

Computing the Variance in the Non-Symmetric Case: When Symmetry Isn't Guaranteed

What if Λ isn't symmetric? Does all our elegant math fall apart? Not entirely. We can exploit the fact that ε^T Λ^T ε is identical to ε^T Λ ε: the former is the transpose of the latter, and a scalar is equal to its own transpose. This allows us to define a new, symmetric matrix:

\tilde{\Lambda} = \frac{\Lambda + \Lambda^{T}}{2}

Then, the quadratic form ε^T Λ̃ ε is identical to the original quadratic form ε^T Λ ε. Therefore, the expressions for the mean and variance remain the same, provided we substitute our original Λ with this new, symmetric Λ̃. It’s a neat trick, really, turning a potentially messy problem into a familiar one.
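A two-line check of the trick, with an intentionally non-symmetric matrix (all values arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5
Lam = rng.normal(size=(n, n))        # deliberately NOT symmetric
Lam_sym = (Lam + Lam.T) / 2          # the symmetrized matrix (Λ + Λ^T) / 2

eps = rng.normal(size=n)
print(eps @ Lam @ eps, eps @ Lam_sym @ eps)  # identical up to rounding
```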

Examples of Quadratic Forms: Where Do We See This?

Let's ground this in something tangible. Suppose you have a set of observations, denoted by the vector y, and an operator matrix H. The residual sum of squares, a common metric in statistical modeling, can be expressed as a quadratic form in y:

\textrm{RSS} = y^{T}(I - H)^{T}(I - H)y

Here, I is the identity matrix. Now, if H is not only symmetric but also idempotent (meaning H^2 = H), and if the errors in your model are Gaussian with a covariance matrix of σ^2 I, then the quantity RSS/σ^2 follows a noncentral chi-squared distribution. The number of degrees of freedom, k, and the noncentrality parameter, λ, are determined by the trace and a specific quadratic form involving H:

k = \operatorname{tr}\left[(I - H)^{T}(I - H)\right]

\lambda = \mu^{T}(I - H)^{T}(I - H)\mu / 2

These parameters are found by matching the first two central moments of a noncentral chi-squared random variable to the formulas we discussed earlier. A particularly interesting case arises when Hy is an unbiased estimator of μ. In this situation, the noncentrality parameter λ becomes zero, and RSS/σ^2 simplifies to a central chi-squared distribution. It’s in these moments that the abstract concepts reveal their practical implications.
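As a rough illustration of this example – not a recipe from any particular source – the sketch below uses the ordinary least-squares hat matrix H = X(X^T X)⁻¹X^T, which is symmetric and idempotent, and computes RSS, k, and λ for simulated Gaussian data; the design matrix, coefficients, and noise level are all invented:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, sigma = 50, 3, 0.7                    # invented sizes and noise level

# Ordinary least squares: H = X (X^T X)^{-1} X^T is symmetric and idempotent.
X = rng.normal(size=(n, p))
H = X @ np.linalg.solve(X.T @ X, X.T)

beta = rng.normal(size=p)
mu = X @ beta                               # E[y]; note that H μ = μ here
y = mu + sigma * rng.normal(size=n)

M = (np.eye(n) - H).T @ (np.eye(n) - H)
rss = y @ M @ y                             # the quadratic form y^T (I-H)^T (I-H) y
k = np.trace(M)                             # degrees of freedom: n - p = 47 here
lam = mu @ M @ mu / 2                       # noncentrality; ≈ 0 since H μ = μ

print(rss / sigma**2, k, lam)               # RSS/σ^2 has mean k when λ = 0
```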
