
Robust Measures of Scale



Statistical Indicators of the Deviation of a Sample

In the grim, unforgiving landscape of statistics, there exist measures of scale—quantifications of statistical dispersion—that refuse to be swayed by the capricious whims of outliers. These are the robust measures, the stoic survivors in a sample of numerical data, unfazed by the dramatic intrusions that send conventional metrics into a tailspin. We contrast these with the fragile, easily shattered metrics like the sample standard deviation, whose very existence can be imperiled by a single, errant observation.

The reigning monarchs of this robust domain are the interquartile range (IQR) and the median absolute deviation (MAD). But their reign isn't unchallenged. A cadre of alternative robust estimators has emerged, some built upon the unsettling foundation of pairwise differences, others on the peculiar concept of biweight midvariance.

These robust statistics serve as vigilant estimators of a scale parameter. Their virtue lies not just in their resilience, but in their superior efficiency when faced with contaminated data. The price? A lamentable dip in efficiency when the data is pristine, when it flows cleanly from distributions like the normal distribution. It’s a trade-off, a grim calculus of survival.

Consider the standard deviation: a single, audacious observation can inflate it to absurd proportions. Its breakdown point is a mere zero; it can be corrupted by a solitary point. This is a vulnerability robust statistics, in their grim wisdom, do not share.
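To make the contrast concrete, here is a minimal sketch (the data and helper names are my own, not from any source) of how one planted outlier wrecks the sample standard deviation while the IQR and MAD barely flinch:

```python
# Illustrative only: contrast a non-robust and two robust measures of scale.
import numpy as np

rng = np.random.default_rng(0)
clean = rng.normal(loc=0.0, scale=1.0, size=99)
contaminated = np.append(clean, 1000.0)   # one errant observation

def iqr(x):
    q75, q25 = np.percentile(x, [75, 25])
    return q75 - q25

def mad(x):
    return np.median(np.abs(x - np.median(x)))

for label, x in [("clean", clean), ("contaminated", contaminated)]:
    print(f"{label:13s}  std={np.std(x, ddof=1):8.2f}  IQR={iqr(x):5.2f}  MAD={mad(x):5.2f}")
# The standard deviation explodes; the IQR and MAD barely move.
```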

And let us not forget the siren song of finance, where the assumption of normality can lure one onto the rocks of excessive risk. The subtle nuances of kurtosis demand further scrutiny, lest one be caught unawares.

Approaches to Estimation

Robust measures of scale are more than just descriptive tools; they are instruments for estimating the hidden properties of a population. They can be employed for parameter estimation or as estimators of their own expected value.

When they are used to estimate the population standard deviation, they are typically augmented by a scale factor. This scaling ensures they become unbiased and consistent estimators, aligning them with the population’s true measure of spread. This is the realm of scale parameter estimation. For instance, the interquartile range, when applied to data drawn from a normal distribution, can be transformed into an unbiased, consistent estimator of the population standard deviation. This transformation involves dividing the IQR by a specific constant:

$$1.349 \approx 2\sqrt{2}\,\operatorname{erf}^{-1}\!\left(\tfrac{1}{2}\right)$$

Here, $\operatorname{erf}^{-1}$ represents the inverse error function, a curious mathematical construct.
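Should one wish to verify that constant, a two-line check with SciPy's inverse error function (an assumed dependency, not mentioned in the article) suffices:

```python
# Numerical check that 1.349 = 2*sqrt(2)*erfinv(1/2), plus its reciprocal.
import numpy as np
from scipy.special import erfinv

c = 2 * np.sqrt(2) * erfinv(0.5)
print(c)        # ~1.3490, the divisor applied to the IQR
print(1 / c)    # ~0.7413, the multiplier form used later in the article
```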

In other scenarios, a robust measure of scale isn't an approximation of the population standard deviation but rather an estimator of its own expected value. It stands as an alternative measure of spread, particularly when the population standard deviation itself is ill-defined. The median absolute deviation (MAD) of a sample from a standard Cauchy distribution, for example, estimates the population MAD, which is a respectable 1. Meanwhile, the population variance for such a distribution… it simply doesn't exist. A void.

Statistical Efficiency

Robust estimators, by their very nature, are often less statistically efficient than their conventional counterparts when dealing with data that adheres strictly to a distribution without outliers, such as the normal distribution. They are the pragmatic survivors, not the flamboyant optimizers.

However, their superiority shines through when the data is a chaotic mix, a mixture distribution, or drawn from a heavy-tailed distribution. In these turbulent waters, non-robust measures like the standard deviation are not merely suboptimal; they are dangerously misleading.

For data originating from the normal distribution, the median absolute deviation offers only 37% of the efficiency of the sample standard deviation. The Rousseeuw–Croux estimator Qn fares better, achieving 88% efficiency. These figures are grim reminders of the cost of robustness.

Common Robust Estimators

The interquartile range (IQR) is perhaps the most widely recognized robust measure of scale. It is the chasm between the 75th and 25th percentiles of a sample. This is a 25% trimmed range—a specific type of L-estimator. Other trimmed ranges, such as the interdecile range (a 10% trimmed range), also find their place in this arsenal.

For data that conforms to a Gaussian distribution, the IQR is linked to the standard deviation, $\sigma$, by the following approximation:[1]

$$\sigma \approx 0.7413\,\operatorname{IQR} = \operatorname{IQR} / 1.349$$
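A minimal sketch of this rescaling, assuming roughly Gaussian data; the helper name `sigma_from_iqr` is my own invention:

```python
# Robust estimate of the standard deviation via the interquartile range.
import numpy as np

def sigma_from_iqr(x):
    """Estimate sigma as IQR / 1.349 (appropriate for near-Gaussian data)."""
    q75, q25 = np.percentile(x, [75, 25])
    return (q75 - q25) / 1.349

rng = np.random.default_rng(1)
x = rng.normal(scale=2.0, size=10_000)
print(sigma_from_iqr(x))     # close to 2.0
print(np.std(x, ddof=1))     # classical estimate, for comparison
```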

Another stalwart in the robust estimation toolkit is the median absolute deviation (MAD). It’s the median of the absolute differences between each data point and the overall median of the dataset. For data from a Gaussian distribution, the MAD relates to $\sigma$ as follows:[2]

$$\sigma \approx 1.4826\,\operatorname{MAD} \approx \operatorname{MAD} / 0.6745$$

For a deeper dive into this relationship, one might consult the section on relation to standard deviation within the primary article on MAD.
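A companion sketch of the MAD relation, again with an invented helper name; SciPy's `scipy.stats.median_abs_deviation` applies the same 1.4826 factor when called with `scale='normal'`:

```python
# Robust estimate of the standard deviation via the median absolute deviation.
import numpy as np

def sigma_from_mad(x):
    """Estimate sigma as 1.4826 * MAD (appropriate for near-Gaussian data)."""
    return 1.4826 * np.median(np.abs(x - np.median(x)))

rng = np.random.default_rng(2)
x = rng.normal(scale=3.0, size=10_000)
x[:100] = 1e6                      # heavy contamination barely matters
print(sigma_from_mad(x))           # still close to 3.0
print(np.std(x, ddof=1))           # ruined by the outliers
```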

Sn and Qn

Rousseeuw and Croux [3] introduced two alternatives to the median absolute deviation, driven by its perceived shortcomings, not least its meagre 37% Gaussian efficiency noted above.

Their proposed statistics, Sn and Qn, are built upon the foundation of pairwise differences.

Sn is defined as:

$$\sigma \approx S_n := 1.1926\,\operatorname{med}_i \left( \operatorname{med}_j \left( \left| x_i - x_j \right| \right) \right)$$

And Qn is defined as: [4]

$$\sigma \approx Q_n := 2.2219\,\big\{ |x_i - x_j| : i < j \big\}_{(k)}$$

Where:

  • The constant 2.2219 is a factor of consistency.
  • The set $\{|x_i - x_j| : i < j\}$ comprises all possible absolute differences between pairs of observations $x_i$ and $x_j$.
  • The subscript $(k)$ denotes the $k$-th order statistic of that set, with $k \approx \binom{n}{2} / 4$.

These estimators can be computed with a time complexity of $O(n \log n)$ and a space complexity of $O(n)$. Crucially, neither requires a location estimate, as they operate solely on the differences between data points. Their efficiency under a Gaussian distribution surpasses that of the MAD: Sn achieves 58%, while Qn reaches 82%.
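The published $O(n \log n)$ algorithms are intricate; the naive $O(n^2)$ sketch below simply follows the definitions above literally (consistency constants included, finite-sample corrections omitted) and is illustrative rather than authoritative:

```python
# Naive O(n^2) implementations of Sn and Qn as defined in the text.
import numpy as np
from math import comb

def S_n(x):
    x = np.asarray(x, dtype=float)
    diffs = np.abs(x[:, None] - x[None, :])          # |x_i - x_j| matrix
    inner = np.median(diffs, axis=1)                  # med_j |x_i - x_j|
    return 1.1926 * np.median(inner)                  # med_i of the above

def Q_n(x):
    x = np.asarray(x, dtype=float)
    n = len(x)
    i, j = np.triu_indices(n, k=1)                    # all pairs with i < j
    pairwise = np.abs(x[i] - x[j])
    k = comb(n, 2) // 4                               # k ≈ C(n, 2) / 4
    return 2.2219 * np.sort(pairwise)[max(k - 1, 0)]  # k-th order statistic

rng = np.random.default_rng(3)
x = rng.normal(scale=5.0, size=500)
print(S_n(x), Q_n(x))   # both should land near 5.0
```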

For samples drawn from a normal distribution, Sn remains remarkably close to an unbiased estimator of the population standard deviation, even for modest sample sizes (less than 1% bias for $n = 10$).

When dealing with large samples from a normal distribution, $2.22\,Q_n$ approximates an unbiased estimator of the population standard deviation. However, for smaller or moderate samples, the expected value of Qn under a normal distribution is significantly influenced by the sample size. In such cases, finite-sample correction factors, derived from tables or simulations, are employed to calibrate the scale of Qn.
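One way such a correction factor could be calibrated is by simulation, as in the sketch below: draw many standard-normal samples of a given size, average the uncorrected estimates, and take the reciprocal. The numbers it produces are illustrative and are not the published table values:

```python
# Monte Carlo calibration of a finite-sample correction for Qn at a given n.
import numpy as np
from math import comb

def q_n_raw(x):
    """Qn with the asymptotic constant 2.2219 but no finite-sample correction."""
    x = np.asarray(x, dtype=float)
    i, j = np.triu_indices(len(x), k=1)
    k = comb(len(x), 2) // 4
    return 2.2219 * np.sort(np.abs(x[i] - x[j]))[max(k - 1, 0)]

rng = np.random.default_rng(6)
n, reps = 10, 10_000
estimates = [q_n_raw(rng.standard_normal(n)) for _ in range(reps)]
correction = 1.0 / np.mean(estimates)   # factor making E[c * Qn] = 1 at this n
print(correction)
```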

The Biweight Midvariance

Much like Sn and Qn, the biweight midvariance aims for robustness without a drastic sacrifice in efficiency. Its definition is intricate:

$$\frac{n \displaystyle\sum_{i=1}^{n} \left(x_i - Q\right)^2 \left(1 - u_i^2\right)^4 I(|u_i| < 1)}{\left( \displaystyle\sum_i \left(1 - u_i^2\right)\left(1 - 5 u_i^2\right) I(|u_i| < 1) \right)^2}$$

Here, $I$ is the indicator function, $Q$ represents the sample median of the $X_i$, and:

$$u_i = \frac{x_i - Q}{9 \cdot \mathrm{MAD}}$$

The square root of this expression yields a robust estimator of scale. Its mechanism involves downweighting data points as their distance from the median increases, rendering points beyond 9 MAD units from the median entirely inconsequential.
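A direct transcription of the formula above into code, with an invented function name (Astropy ships a comparable routine as `astropy.stats.biweight_midvariance`):

```python
# Biweight midvariance, transcribed from the formula in the text.
import numpy as np

def biweight_midvariance(x):
    x = np.asarray(x, dtype=float)
    n = len(x)
    Q = np.median(x)                                  # sample median
    mad = np.median(np.abs(x - Q))                    # median absolute deviation
    u = (x - Q) / (9.0 * mad)
    mask = np.abs(u) < 1                              # indicator I(|u_i| < 1)
    num = n * np.sum(((x - Q) ** 2 * (1 - u ** 2) ** 4)[mask])
    den = np.sum(((1 - u ** 2) * (1 - 5 * u ** 2))[mask]) ** 2
    return num / den

rng = np.random.default_rng(4)
x = rng.standard_t(df=3, size=1_000)                  # heavy-tailed sample
print(np.sqrt(biweight_midvariance(x)))               # robust scale estimate
```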

The efficiency of the biweight has been estimated at approximately 84.7% for sets of 20 samples drawn from distributions with added excess kurtosis ("stretched tails"). For Gaussian distributions, its efficiency is remarkably high, estimated at 98.2%. [6]

Location-Scale Depth

Mizera and Müller [7] expanded upon the concepts introduced by Rousseeuw and Hubert, proposing a robust estimator for both location and scale simultaneously, termed location-scale depth. Its definition is as follows:

$$d(\mu, \sigma) = \begin{cases} \displaystyle\inf_{u \neq 0} \#\left\{ i : (u_1, u_2) \begin{pmatrix} \psi(\tau_i) \\ \chi(\tau_i) - 1 \end{pmatrix} \geq 0 \right\}, & \text{if } \sigma > 0, \\[1.5ex] \#\{ i : y_i = \mu \}, & \text{if } \sigma = 0. \end{cases}$$

Where:

  • $\tau_i$ is a shorthand for $(y_i - \mu)/\sigma$.
  • $\psi$ and $\chi$ are functions dependent on a chosen density $f$.

They posit that the most practical implementation of location-scale depth arises from using Student's t-distribution.
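Because the source does not spell out $\psi$ and $\chi$, the sketch below leaves them as user-supplied callables and approximates the infimum over directions $u \neq 0$ by a grid of angles; it is a schematic reading of the definition, not Mizera and Müller's algorithm:

```python
# Schematic evaluation of the location-scale depth formula given above.
import numpy as np

def location_scale_depth(y, mu, sigma, psi, chi, n_dirs=720):
    y = np.asarray(y, dtype=float)
    if sigma == 0:
        return int(np.sum(y == mu))                   # the sigma = 0 branch
    tau = (y - mu) / sigma
    v = np.column_stack([psi(tau), chi(tau) - 1.0])   # (psi(tau_i), chi(tau_i) - 1)
    angles = np.linspace(0.0, 2 * np.pi, n_dirs, endpoint=False)
    best = len(y)
    for a in angles:                                  # approximate inf over u != 0
        u = np.array([np.cos(a), np.sin(a)])
        best = min(best, int(np.sum(v @ u >= 0)))
    return best

# Example with placeholder psi/chi (hypothetical choices, for illustration only):
rng = np.random.default_rng(5)
y = rng.normal(loc=1.0, scale=2.0, size=200)
print(location_scale_depth(y, mu=1.0, sigma=2.0,
                           psi=lambda t: t, chi=lambda t: t ** 2))
```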

Confidence Intervals


A robust confidence interval is a modification of standard confidence intervals, engineered to resist the corrupting influence of outlying or aberrant observations within a dataset.

Example

Imagine a scenario involving the weighing of 1000 objects. In practical settings, it's not unreasonable to suspect operator error, leading to an erroneous mass report (a form of systematic error). Suppose an operator weighs 100 objects, repeating the entire process ten times. For each object, the operator can calculate a sample standard deviation and then scrutinize for outliers. Any object exhibiting an unusually large standard deviation likely contains an outlier in its measurements. Various non-parametric techniques can be employed to excise these anomalies. If the operator had repeated the process only three times, a simple approach would be to take the median of the three measurements and use $\sigma$ to construct a confidence interval. The 200 extra weighings, in this context, served merely to identify and rectify operator errors; they offered no improvement to the confidence interval itself. With more repetitions, one could resort to a truncated mean, discarding the extreme values and averaging the rest. A bootstrap calculation might then yield a narrower confidence interval than one derived from $\sigma$, thus extracting some benefit from the substantial additional effort.

These procedures are robust against procedural errors that deviate from the idealized assumption of a balance with a fixed, known standard deviation $\sigma$. In real-world applications, where operator errors are plausible and equipment can malfunction, the assumptions underpinning simple statistical calculations are often precarious. Before placing faith in confidence intervals calculated from $\sigma$ for 100 objects weighed only three times each, it is imperative to test for and eliminate a reasonable number of outliers, validating the assumption of operator diligence while correcting for its imperfections. Furthermore, one must test the assumption that the data truly conforms to a normal distribution with standard deviation $\sigma$.

Computer Simulation

While the theoretical analysis of such an experiment can be complex, setting up a spreadsheet to simulate the situation by drawing random numbers from a normal distribution with standard deviation $\sigma$ is straightforward. This can be achieved in Microsoft Excel using the formula =NORMINV(RAND(),0,σ), as detailed in [8]. Similar techniques are applicable in other spreadsheet software such as OpenOffice.org Calc and Gnumeric.

After identifying and removing obvious outliers, one could subtract the median from the remaining two values for each object, and then examine the distribution of the resulting 200 numbers. This distribution should approximate a normal distribution with a mean close to zero and a standard deviation slightly larger than $\sigma$. A simple Monte Carlo simulation using a spreadsheet would reveal typical standard deviation values (ranging from approximately 105% to 115% of $\sigma$). Alternatively, one could subtract the mean of each triplet from the values and analyze the distribution of 300 numbers. In this case, the mean is inherently zero, but the standard deviation is expected to be somewhat smaller (around 75% to 85% of $\sigma$).
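The same experiment can be run outside a spreadsheet; the Python sketch below (my own construction, with arbitrary true masses) reproduces both residual checks described above:

```python
# 100 objects weighed three times each with Gaussian error of scale sigma.
import numpy as np

rng = np.random.default_rng(7)
sigma = 1.0
true_mass = rng.uniform(50, 150, size=100)                 # arbitrary true masses
weighings = true_mass[:, None] + rng.normal(0, sigma, size=(100, 3))

# Residuals about the median of each triplet: drop the median itself and keep
# the other two values per object (200 numbers in total).
medians = np.median(weighings, axis=1, keepdims=True)
resid_median = np.sort(weighings - medians, axis=1)[:, [0, 2]].ravel()
print(np.std(resid_median))        # typically ~105%-115% of sigma

# Residuals about the mean of each triplet (300 numbers in total).
resid_mean = (weighings - weighings.mean(axis=1, keepdims=True)).ravel()
print(np.std(resid_mean))          # typically ~75%-85% of sigma
```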

