Euclidean Distance

Look, if you must understand distance, let's get this over with. Don't expect me to hold your hand.


In the tedious world of mathematics, the Euclidean distance between two points in Euclidean space is, to put it simply, the length of the line segment connecting them. It’s the straightest path. The one you can't take in real life because there's always something in the way. You can calculate this length using the Cartesian coordinates of the points and a tired, old theorem attributed to Pythagoras. For this reason, some people, presumably for lack of a better nickname, call it the Pythagorean distance.

These names are relics, tributes to the ancient Greek mathematicians Euclid and Pythagoras. In the rigid, deductive geometry that Euclid championed in his Elements, distances weren't abstract numbers. That would have been too convenient. Instead, they were represented by line segments of the same length, which were then declared "equal." The entire concept is baked into the function of a compass, a tool used to draw a circle, which is nothing more than a collection of points all equidistant from a single center point. The revolutionary leap of connecting the Pythagorean theorem to a numerical distance calculation didn't happen until the 18th century, because progress, as you know, is glacially slow.

The distance between two objects that aren't simple points is usually defined as the shortest possible distance between any pair of points, one from each object—a minimalist's approach to a complex problem. There are, of course, formulas for specific cases, like the distance from a point to a line. In the more esoteric realms of advanced mathematics, the idea of distance has been abstracted into things called metric spaces, where distances other than Euclidean are not only possible but studied with alarming enthusiasm. And in some corners of statistics and optimization, for reasons of convenience we'll get to later, they often use the square of the Euclidean distance instead of the distance itself. Try to keep up.

Distance formulas

One dimension

Let's start with the training wheels. The distance between any two points on the real line is just the absolute value of the numerical difference between their coordinates. It's their absolute difference. So, if you have two points, p and q, marooned on this line, the distance between them is given by a formula so simple it’s almost insulting:

d(p,q) = |p-q|

There is a more convoluted formula that gives you the exact same result but has the dubious honor of generalizing to higher dimensions. It looks like this:

d(p,q) = √((p-q)²)

In this version, the act of squaring and then taking the square root brutally forces any negative number to become positive while leaving positive numbers untouched. It's a mathematical sledgehammer to crack a nut, but it's the one we're stuck with.
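If you'd rather see it in code than in prose, here is a minimal Python sketch of both versions (the function names are my invention, not standard notation):

```python
import math

def dist_1d(p, q):
    """One-dimensional Euclidean distance: just the absolute difference."""
    return abs(p - q)

def dist_1d_sqrt(p, q):
    """The convoluted version that generalizes to higher dimensions."""
    return math.sqrt((p - q) ** 2)

# Both formulas agree, whichever point comes first.
print(dist_1d(3, -4), dist_1d_sqrt(3, -4))  # 7 7.0
```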

Two dimensions

Now, for the Euclidean plane, where things get slightly more interesting. Let's say point p has Cartesian coordinates (p₁, p₂) and point q has coordinates (q₁, q₂). The distance between them is given by:

d(p,q) = √((p₁-q₁)² + (p₂-q₂)²)

You can derive this by applying the Pythagorean theorem to a right triangle constructed with horizontal and vertical sides, where the line segment from p to q serves as the hypotenuse. The two squared terms inside the square root represent the areas of the squares on the horizontal and vertical legs of the triangle. The outer square root then graciously converts the area of the square on the hypotenuse back into the length of the hypotenuse itself. It's a neat trick, I suppose.
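For the literal-minded, the two-dimensional formula translates directly into Python (a sketch, with a made-up function name):

```python
import math

def dist_2d(p, q):
    """Euclidean distance in the plane, straight from the Pythagorean theorem."""
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

# A 3-4-5 right triangle, the oldest example in the book.
print(dist_2d((0, 0), (3, 4)))  # 5.0
```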

For those who prefer their calculations pre-packaged, this can also be expressed using Pythagorean addition, an operation available in many software libraries as hypot. The formula then becomes a cryptic line of code:

d(p,q) = (p₁-q₁) ⊕ (p₂-q₂) = hypot(p₁-q₁, p₂-q₂)

Naturally, it's also possible to calculate this for points defined by polar coordinates, for those who enjoy making things difficult. If p has polar coordinates (r, θ) and q has (s, ψ), their distance is found using the law of cosines, a cousin of the Pythagorean theorem that has to work a bit harder:

d(p,q) = √(r² + s² - 2rs cos(θ-ψ))

And if p and q are expressed as complex numbers in the complex plane, you can use the same formula as in the one-dimensional case. The absolute value sign here, however, denotes the complex norm, a slightly more sophisticated concept:

d(p,q) = |p-q|
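In Python this needs no new machinery at all: the built-in `abs()` already computes the complex norm, so the one-dimensional formula works verbatim:

```python
# abs() on a complex number is the complex norm, so |p - q| just works.
p = 3 + 4j
q = 0 + 0j
print(abs(p - q))  # 5.0
```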

Higher dimensions

In three dimensions, which is where you live, the formula is extended with a predictable lack of imagination. For points given by their Cartesian coordinates, the distance is:

d(p,q) = √((p₁-q₁)² + (p₂-q₂)² + (p₃-q₃)²)

In general, for points in an n-dimensional Euclidean space, you just keep adding terms. The distance is:

d(p,q) = √((p₁-q₁)² + (p₂-q₂)² + ⋯ + (pₙ-qₙ)²)

This can also be written more compactly, presumably to save ink, using the Euclidean norm of the Euclidean vector difference. It's just a fancy shorthand:

d(p,q) = ||p-q||
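The general formula is a one-liner in Python, and since version 3.8 the standard library even ships it as `math.dist` (a sketch with arbitrary example points):

```python
import math

p = (1.0, 2.0, 3.0, 4.0)
q = (5.0, 6.0, 7.0, 8.0)

# The general formula, spelled out term by term...
d = math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

# ...and the standard-library shorthand (Python 3.8+), which matches it.
assert math.isclose(d, math.dist(p, q))
print(d)  # 8.0
```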

Objects other than points

When dealing with pairs of objects that are not both single points, the distance is most simply defined as the smallest distance between any two points from the two objects. It's the path of least resistance. Of course, more complicated generalizations like the Hausdorff distance exist for those who find simplicity boring.

Known formulas for these scenarios include:

  • The distance from a point to a line, in the Euclidean plane
  • The distance from a point to a plane, in three-dimensional Euclidean space
  • The distance between two lines, in three-dimensional Euclidean space

The distance from a point to a curve can also be used to define its parallel curve, which is another curve whose points all maintain the same standoffish distance from the original.
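As one concrete case, the distance from a point to a line has a well-known closed form, sketched here in Python (function name mine):

```python
import math

def point_line_dist(px, py, a, b, c):
    """Distance from the point (px, py) to the line ax + by + c = 0:
    |a*px + b*py + c| / sqrt(a**2 + b**2)."""
    return abs(a * px + b * py + c) / math.hypot(a, b)

# Distance from the origin to the line x + y - 2 = 0: sqrt(2).
print(point_line_dist(0, 0, 1, 1, -2))
```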

Properties

Euclidean distance is the poster child for distance in a metric space. It dutifully obeys all the defining properties, which are as follows:

  • It is symmetric. For any two points p and q, d(p,q) = d(q,p). This means that, unlike a bitter argument or a city with one-way streets, the distance between two points doesn't depend on which one is the start and which is the destination. A rare instance of fairness.
  • It is positive. The distance between any two distinct points is a positive number. The distance from a point to itself is, mercifully, zero.
  • It obeys the triangle inequality. For any three points p, q, and r, d(p,q) + d(q,r) ≥ d(p,r). In layman's terms, traveling from p to r by way of q can never be shorter than going directly from p to r. The universe offers no shortcuts, a lesson it insists on teaching repeatedly.
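A quick numerical sanity check of all three properties, with arbitrary points of my choosing (this demonstrates, it does not prove):

```python
import math

def d(p, q):
    return math.dist(p, q)

p, q, r = (0.0, 0.0), (3.0, 4.0), (6.0, 0.0)

assert d(p, q) == d(q, p)            # symmetric: direction doesn't matter
assert d(p, q) > 0 and d(p, p) == 0  # positive between distinct points, zero to itself
assert d(p, q) + d(q, r) >= d(p, r)  # triangle inequality: no shortcuts
```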

Another property, Ptolemy's inequality, concerns the distances among four points p, q, r, and s. It states that:

d(p,q)⋅d(r,s) + d(q,r)⋅d(p,s) ≥ d(p,r)⋅d(q,s)

For points in a plane, this can be understood by imagining a quadrilateral. The sum of the products of its opposite sides is always greater than or equal to the product of its diagonals. Ptolemy's inequality, however, applies more generally to points in Euclidean spaces of any dimension, regardless of their arrangement. For points in metric spaces that are not Euclidean, this inequality may gleefully fail. The field of Euclidean distance geometry obsesses over properties like this, using them to test whether a given set of distances could possibly come from points in a Euclidean space.
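A brute-force spot check of Ptolemy's inequality on random quadruples of planar points (a numerical illustration, not a proof; the small tolerance absorbs floating-point noise at the equality case):

```python
import math
import random

def d(p, q):
    return math.dist(p, q)

random.seed(0)  # reproducible randomness
for _ in range(1000):
    p, q, r, s = [(random.random(), random.random()) for _ in range(4)]
    # Products of "opposite sides" dominate the product of "diagonals".
    assert d(p, q) * d(r, s) + d(q, r) * d(p, s) >= d(p, r) * d(q, s) - 1e-12
```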

According to the Beckman–Quarles theorem, any transformation of a Euclidean space that preserves unit distances must be an isometry, meaning it preserves all distances. It's a principle of surprising rigidity.

Squared Euclidean distance

In many applications, particularly when comparing distances, it's more convenient to just omit the final square root. The square root is a monotonic function; it doesn't change the order of the values. If d₁² > d₂², it follows that d₁ > d₂. The value you get from this omission is the square of the Euclidean distance, logically named the squared Euclidean distance.

For example, a Euclidean minimum spanning tree can be built using only the ordering of distances, not their actual values. Comparing squared distances gives the same result but neatly avoids the computational cost of square roots and sidesteps potential issues with numerical precision. As an equation, it's a simple sum of squares:

d²(p,q) = (p₁-q₁)² + (p₂-q₂)² + ⋯ + (pₙ-qₙ)²
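Nearest-neighbor selection is the canonical example: only the ordering of distances matters, so the square root can be skipped entirely. A sketch (point values are arbitrary):

```python
def sq_dist(p, q):
    """Squared Euclidean distance: all of the ordering, none of the square roots."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

# The nearest point under squared distance is the nearest point, full stop.
points = [(1, 1), (4, 5), (-2, 0)]
query = (0, 0)
nearest = min(points, key=lambda p: sq_dist(p, query))
print(nearest)  # (1, 1)
```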

Beyond this practical shortcut, squared Euclidean distance is fundamentally important in statistics. It's the core of the method of least squares, a standard technique for fitting models to data by minimizing the average of the squared distances between observed and estimated values. It also serves as the simplest form of divergence for comparing probability distributions. Adding squared distances together, as in least squares, corresponds to an operation on the original distances known as Pythagorean addition. In cluster analysis, using squared distances can amplify the effect of longer distances, making outliers even more pronounced.

Be warned: squared Euclidean distance does not form a metric space, because it violates the triangle inequality. It is, however, a smooth, strictly convex function of the two points. The standard distance, by contrast, is not smooth near pairs of equal points, and it is convex but not strictly convex. This smoothness makes the squared distance a favorite in optimization theory, where it allows the tools of convex analysis to be used. Since squaring is a monotonic function for non-negative values, minimizing the squared distance is equivalent to minimizing the Euclidean distance. The optimization problem is the same, but one version is far easier to solve.
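The triangle-inequality violation is easiest to see with three collinear points, where the plain distances satisfy it with equality but their squares do not:

```python
# Three collinear points on the real line.
p, q, r = 0.0, 1.0, 2.0

d_pq, d_qr, d_pr = abs(p - q), abs(q - r), abs(p - r)

assert d_pq + d_qr >= d_pr                  # 1 + 1 >= 2: holds, barely
assert not (d_pq**2 + d_qr**2 >= d_pr**2)   # 1 + 1 < 4: violated
```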

The set of all squared distances between pairs of points in a finite set can be stored in a Euclidean distance matrix, a format used extensively in distance geometry.

Generalizations

In the more abstract corners of mathematics, when Euclidean space is viewed as a vector space, its distance is tied to a norm called the Euclidean norm, defined as the distance of any vector from the origin. One of its most important properties is that it remains unchanged under arbitrary rotations of space around the origin. According to Dvoretzky's theorem, every sufficiently high-dimensional normed vector space has high-dimensional subspaces on which the norm is approximately Euclidean; the Euclidean norm is the only norm with this property. This concept can be extended to infinite-dimensional vector spaces as the L² norm or L² distance. The Euclidean distance also gives Euclidean space the structure of a topological space, the Euclidean topology, with open balls serving as its neighborhoods.

Other common ways to measure distance in real coordinate spaces and function spaces include:

  • Chebyshev distance (L∞ distance), which measures distance as the maximum of the distances along each coordinate. It's the lazy way.
  • Taxicab distance (L¹ distance), or Manhattan distance, which measures distance as the sum of the absolute differences of the coordinates, as if you were a taxi forced to navigate a grid of streets.
  • Minkowski distance (Lᵖ distance), a generalization that manages to unify Euclidean, taxicab, and Chebyshev distances under one intimidating formula.
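All three alternatives fit in a few lines of Python (function names mine; the Minkowski exponent is called `power` to spare the letter p):

```python
def chebyshev(p, q):
    """L-infinity distance: the maximum coordinate difference. The lazy way."""
    return max(abs(pi - qi) for pi, qi in zip(p, q))

def taxicab(p, q):
    """L1 distance: the sum of absolute coordinate differences."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

def minkowski(p, q, power):
    """Lp distance: taxicab at power=1, Euclidean at power=2,
    and Chebyshev in the limit as power grows."""
    return sum(abs(pi - qi) ** power for pi, qi in zip(p, q)) ** (1 / power)

a, b = (0, 0), (3, 4)
print(chebyshev(a, b), taxicab(a, b), minkowski(a, b, 2))  # 4 7 5.0
```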

For points on surfaces in three dimensions, the Euclidean distance (a straight line through the object) must be distinguished from the geodesic distance (the shortest path along the surface). For measuring great-circle distances on Earth or other spherical bodies, you can't just tunnel through the planet. Instead, you use specialized tools like the haversine distance, which calculates great-circle distances from longitude and latitude, or Vincenty's formulae for even more precise measurements on a spheroid.
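A sketch of the haversine calculation in Python, assuming a spherical Earth of radius 6371 km (a common approximation; for Vincenty-grade precision on a spheroid you'd need considerably more effort):

```python
import math

def haversine(lat1, lon1, lat2, lon2, radius=6371.0):
    """Great-circle distance in kilometres between two (latitude, longitude)
    points in degrees, on a sphere of the given radius."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))

# A quarter of the equator: from (0°, 0°) to (0°, 90°).
print(haversine(0, 0, 0, 90))  # about 10007.5 km
```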

History

Euclidean distance is the distance in Euclidean space. Both are named after the ancient Greek mathematician Euclid, whose Elements was the standard geometry textbook for centuries, a testament to how long a good idea can last. The concepts of length and distance are, of course, far older, appearing in some of the earliest "protoliterate" bureaucratic documents from Sumer in the fourth millennium BC. It seems humanity has always been obsessed with measurement. It's even been hypothesized that children develop an understanding of distance before they grasp the related concepts of speed and time.

But the idea of distance as a single number defined from two points doesn't actually appear in Euclid's Elements. He approached it implicitly, through the congruence of line segments and the concept of proportionality.

The Pythagorean theorem is also ancient, but it couldn't take its central role in measuring distances until René Descartes invented Cartesian coordinates in 1637, finally bridging the gap between geometry and algebra. The distance formula itself was first published in 1731 by Alexis Clairaut. Because of this lineage, Euclidean distance is sometimes called Pythagorean distance.

While cultures have been accurately measuring long, non-Euclidean distances on the Earth's surface for ages (see history of geodesy), the notion that Euclidean distance might not be the only way to measure distance between points in a mathematical space didn't emerge until the 19th-century formulation of non-Euclidean geometry. The definition of the Euclidean norm and distance for geometries of more than three dimensions also first appeared in the 19th century, in the work of Augustin-Louis Cauchy, who evidently decided three dimensions weren't enough of a headache.