
Orthogonal Matrix

For matrices whose elements are drawn from the complex number field, see the entry on unitary matrices.



In linear algebra, an orthogonal matrix, also called an orthonormal matrix, is a real square matrix whose columns and rows are orthonormal vectors.

This property can be expressed as the matrix equation:

Q^{\mathrm{T}}Q = QQ^{\mathrm{T}} = I

Here, $Q^{\mathrm{T}}$ denotes the transpose of the matrix $Q$, and $I$ stands for the identity matrix.

This leads to an equivalent characterization: a matrix $Q$ is orthogonal if and only if its transpose equals its inverse:

Q^{\mathrm{T}} = Q^{-1}

where $Q^{-1}$ is the inverse of $Q$.

An orthogonal matrix $Q$ is necessarily invertible (with inverse $Q^{-1} = Q^{\mathrm{T}}$) and unitary ($Q^{-1} = Q^{*}$), where $Q^{*}$ is the Hermitian adjoint (conjugate transpose) of $Q$. Consequently, it is also a normal matrix ($Q^{*}Q = QQ^{*}$) over the real numbers.

The determinant of any orthogonal matrix is either +1 or −1. As a linear transformation, an orthogonal matrix preserves the inner product of vectors, and therefore acts as an isometry of Euclidean space: it transforms vectors without altering their lengths or the angles between them. Such transformations are rotations, reflections, or combinations of the two (rotoreflections). In other words, it is a unitary transformation.

The set of all $n \times n$ orthogonal matrices, under matrix multiplication, forms a group, designated O($n$) and known as the orthogonal group. Within it sits the subgroup SO($n$) of orthogonal matrices with determinant +1. The elements of SO($n$) are called special orthogonal matrices, and each one, interpreted as a linear transformation, is a rotation.

Overview

A visual intuition for multiplication by the transpose of a matrix can be helpful. Consider an orthogonal matrix $A$ and its transpose $A^{\mathrm{T}}$. The $(i,j)$-th element of the product $AA^{\mathrm{T}}$ is zero whenever $i \neq j$, because the $i$-th row of $A$ is, by definition, orthogonal to the $j$-th row of $A$.

An orthogonal matrix is the real specialization of a unitary matrix, and is thus always a normal matrix. Although we consider only real matrices here, the definition can be used for matrices with entries from any field. However, orthogonal matrices arise naturally from dot products, and for matrices of complex numbers the requirement becomes that of a unitary matrix. The defining characteristic of orthogonal matrices is their preservation of the dot product: for any two vectors $\mathbf{u}$ and $\mathbf{v}$ in an $n$-dimensional real Euclidean space,

\mathbf{u} \cdot \mathbf{v} = (Q\mathbf{u}) \cdot (Q\mathbf{v})

provided $Q$ is an orthogonal matrix. To see the connection to inner products, consider a vector $\mathbf{v}$ in an $n$-dimensional real Euclidean space. Written with respect to an orthonormal basis, the squared length of $\mathbf{v}$ is $\mathbf{v}^{\mathrm{T}}\mathbf{v}$. If a linear transformation, in matrix form $Q\mathbf{v}$, preserves vector lengths, then the following relationship must hold:

\mathbf{v}^{\mathrm{T}}\mathbf{v} = (Q\mathbf{v})^{\mathrm{T}}(Q\mathbf{v}) = \mathbf{v}^{\mathrm{T}}Q^{\mathrm{T}}Q\mathbf{v}

Consequently, finite-dimensional linear isometries (rotations, reflections, and their combinations) correspond precisely to orthogonal matrices. The converse is equally true: orthogonal matrices imply orthogonal transformations. Note, however, that linear algebra also includes orthogonal transformations between spaces that may be neither finite-dimensional nor of the same dimension; these have no direct equivalent in the form of an orthogonal matrix.
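As a quick numerical illustration (not part of the original article), the following NumPy snippet checks both the defining identity and the dot-product preservation for a small rotation matrix; the angle and test vectors are arbitrary choices.

```python
# Minimal sketch: an orthogonal matrix satisfies Q^T Q = I and preserves dot products.
import numpy as np

theta = 0.7
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # rotation by theta

u = np.array([1.0, 2.0])
v = np.array([-3.0, 0.5])

assert np.allclose(Q.T @ Q, np.eye(2))             # Q^T Q = I
assert np.isclose(u @ v, (Q @ u) @ (Q @ v))        # dot product is preserved
```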

Orthogonal matrices are important for both theoretical and practical reasons. The $n \times n$ orthogonal matrices form the group O($n$), the orthogonal group, which, together with its subgroups, is widely used in mathematics and the physical sciences. A prime example is the point group describing the symmetry of a molecule, which is always a subgroup of O(3). Furthermore, because of their favorable properties in floating-point computation, orthogonal matrices are crucial to many algorithms in numerical linear algebra, such as the QR decomposition. As another illustration, with appropriate normalization the discrete cosine transform, a cornerstone of data compression techniques such as MP3, is represented by an orthogonal matrix.

Examples

The following examples showcase some fundamental orthogonal matrices and offer a glimpse into their geometric interpretations:

  • \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} This is the identity matrix, representing the identity transformation: it leaves all vectors unchanged.

  • \begin{bmatrix} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{bmatrix} This matrix represents a rotation about the origin by an angle $\theta$.

  • \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} This matrix corresponds to a reflection across the $x$-axis.

  • \begin{bmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix} This is a permutation matrix that rearranges the coordinate axes.

Elementary Constructions
Lower Dimensions

In one dimension, the simplest orthogonal matrices are the $1 \times 1$ matrices [1] and [−1]. These can be interpreted as the identity transformation and a reflection through the origin of the real line, respectively.

For $2 \times 2$ matrices, the general form is:

\begin{bmatrix} p & t \\ q & u \end{bmatrix}

The condition of orthogonality imposes three equations on the elements $p, t, q, u$:

\begin{aligned} 1 &= p^{2}+t^{2} \\ 1 &= q^{2}+u^{2} \\ 0 &= pq+tu \end{aligned}

Considering the first equation, we can, without loss of generality, set $p = \cos\theta$ and $t = \sin\theta$. This leaves two possibilities for $q$ and $u$: either $q = -\sin\theta$ and $u = \cos\theta$, or $q = \sin\theta$ and $u = -\cos\theta$. The first case corresponds to a rotation by an angle $\theta$ (with $\theta = 0$ yielding the identity matrix). The second case represents a reflection across a line that makes an angle of $\theta/2$ with the positive $x$-axis.

\begin{bmatrix} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{bmatrix} \quad \text{(rotation),} \qquad \begin{bmatrix} \cos \theta & \sin \theta \\ \sin \theta & -\cos \theta \end{bmatrix} \quad \text{(reflection)}

A special case of the reflection matrix occurs when $\theta = 90^\circ$, which yields the matrix:

\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}

This matrix represents a reflection across the line $y = x$ (at a $45^\circ$ angle) and, as a consequence, it swaps the $x$ and $y$ coordinates. It is also a permutation matrix, characterized by having exactly one 1 in each row and each column, with all other entries being zero. The identity matrix is also a permutation matrix.

A reflection is an involutory matrix, meaning it is its own inverse. This property implies that a reflection matrix must be symmetric (i.e., equal to its transpose), in addition to being orthogonal. The product of two rotation matrices results in another rotation matrix, and the product of two reflection matrices also yields a rotation matrix.

Higher Dimensions

Regardless of the dimension, it is always possible to classify orthogonal matrices as purely rotational or not, but for $3 \times 3$ matrices and larger the non-rotational matrices can be more complicated than simple reflections. Consider these examples:

\begin{bmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & -1 \end{bmatrix}

The first matrix represents an inversion through the origin, negating all coordinates. The second represents a rotoinversion, that is, a rotation combined with an inversion, about the $z$-axis.

Rotations become more intricate in higher dimensions. They can no longer be fully described by a single angle; instead, they may act within multiple planar subspaces simultaneously. While a $3 \times 3$ rotation matrix can be conveniently described by its axis and angle, this description is limited to three dimensions. For dimensions greater than three, two or more angles are required, each associated with a specific plane of rotation.

Nevertheless, we have elementary building blocks for permutations, reflections, and rotations that apply in any dimension.

Primitives

The most basic form of permutation is a transposition, which can be obtained from the identity matrix by swapping two of its rows. Any $n \times n$ permutation matrix can be constructed as a product of at most $n-1$ such transpositions.

A Householder reflection is constructed from a non-null vector $\mathbf{v}$ using the following formula:

Q = I - 2 \frac{\mathbf{v}\mathbf{v}^{\mathrm{T}}}{\mathbf{v}^{\mathrm{T}}\mathbf{v}}

In this expression, the numerator $\mathbf{v}\mathbf{v}^{\mathrm{T}}$ is a symmetric matrix, while the denominator $\mathbf{v}^{\mathrm{T}}\mathbf{v}$ is a scalar, the squared magnitude of $\mathbf{v}$. Geometrically, this operation is a reflection in the hyperplane perpendicular to $\mathbf{v}$, negating any vector component parallel to $\mathbf{v}$. If $\mathbf{v}$ is a unit vector, the formula simplifies to $Q = I - 2\mathbf{v}\mathbf{v}^{\mathrm{T}}$. Householder reflections are typically used in numerical algorithms to zero out the elements below the main diagonal of a column. Remarkably, any $n \times n$ orthogonal matrix can be constructed as a product of at most $n$ such reflections.
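The following sketch (illustrative code, not from the article; the helper name householder and the sample vector are invented) builds a Householder reflection from the formula above and uses it to zero every entry of a vector below its first component, the step used repeatedly in QR factorization.

```python
import numpy as np

def householder(x):
    """Return Q = I - 2 v v^T / (v^T v) such that Q @ x = (±||x||, 0, ..., 0)."""
    v = x.astype(float).copy()
    v[0] += np.sign(x[0] if x[0] != 0 else 1.0) * np.linalg.norm(x)
    return np.eye(len(x)) - 2.0 * np.outer(v, v) / (v @ v)

x = np.array([3.0, 1.0, 4.0, 1.0])
Q = householder(x)
print(np.round(Q @ x, 10))                 # all entries after the first are zero
print(np.allclose(Q @ Q.T, np.eye(4)))     # Q is orthogonal (and symmetric)
```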

A Givens rotation acts on a two-dimensional subspace spanned by two coordinate axes, rotating vectors within this plane by a chosen angle. These rotations are typically used to introduce a zero at a specific subdiagonal position of a matrix. Any $n \times n$ rotation matrix can be constructed as a product of at most $\frac{n(n-1)}{2}$ such planar rotations. For $3 \times 3$ matrices, three Givens rotations suffice to generate any rotation; by fixing the sequence of these rotations, all $3 \times 3$ rotation matrices can be described (though not always uniquely) by three angles, commonly called Euler angles.
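A minimal sketch of the construction, assuming the standard recipe (the function name givens and the sample matrix are invented for illustration): the cosine and sine are chosen from the two entries being combined so that the lower one becomes zero.

```python
import numpy as np

def givens(n, i, j, a, b):
    """Rotation in the (i, j) plane sending the pair (a, b) to (r, 0), r = hypot(a, b)."""
    r = np.hypot(a, b)
    c, s = a / r, b / r
    G = np.eye(n)
    G[i, i] = c; G[i, j] = s
    G[j, i] = -s; G[j, j] = c
    return G

A = np.array([[6.0, 5.0],
              [4.0, 3.0],
              [2.0, 1.0]])
G = givens(3, 0, 1, A[0, 0], A[1, 0])   # combine rows 0 and 1 to zero A[1, 0]
print(np.round(G @ A, 10))              # the (1, 0) entry becomes 0
```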

A Jacobi rotation has the same mathematical form as a Givens rotation but is used to zero both off-diagonal entries of a $2 \times 2$ symmetric submatrix.

Properties

Matrix Properties

A real square matrix is orthogonal if and only if its columns form an orthonormal basis of the Euclidean space $\mathbb{R}^n$ with the standard Euclidean dot product, which holds if and only if its rows also form an orthonormal basis of $\mathbb{R}^n$. It might be tempting to call a matrix with merely orthogonal (but not orthonormal) columns an orthogonal matrix, but such matrices lack special interest and have no distinct name; they only satisfy $M^{\mathrm{T}}M = D$, where $D$ is a diagonal matrix.

The determinant of any orthogonal matrix must be either +1 or −1. This conclusion stems directly from fundamental properties of determinants:

1 = \det(I) = \det(Q^{\mathrm{T}}Q) = \det(Q^{\mathrm{T}})\det(Q) = (\det(Q))^{2}

However, the converse is not true: simply having a determinant of $\pm 1$ does not guarantee orthogonality, even if the columns are orthogonal. The following counterexample illustrates this:

\begin{bmatrix} 2 & 0 \\ 0 & \frac{1}{2} \end{bmatrix}

This matrix has a determinant of $2 \times \frac{1}{2} = 1$, and its columns are orthogonal, but it is not an orthogonal matrix because the column vectors are not unit vectors.

In the context of permutation matrices, the determinant corresponds precisely to the signature of the permutation, being +1 for an even permutation and −1 for an odd permutation. This is because the determinant is an alternating function of the rows (or columns) of a matrix.

Stronger than the determinant restriction is the fact that any orthogonal matrix can be diagonalized over the complex numbers to exhibit a full set of eigenvalues, all of which must have complex modulus 1.

Group Properties

The inverse of any orthogonal matrix is itself orthogonal, and the matrix product of two orthogonal matrices is again orthogonal. Hence the set of all $n \times n$ orthogonal matrices forms a group under matrix multiplication. This group is a compact Lie group of dimension $\frac{n(n-1)}{2}$, known as the orthogonal group and denoted by O($n$).

The orthogonal matrices with determinant +1 form a path-connected normal subgroup of O($n$) of index 2: the special orthogonal group SO($n$), whose elements are precisely the rotations. The quotient group O($n$)/SO($n$) is isomorphic to O(1), with the projection map assigning +1 or −1 according to the determinant. Orthogonal matrices with determinant −1 do not include the identity, and therefore do not form a subgroup but only a coset, which is also (separately) path-connected. Thus every orthogonal group splits into two components. Because the projection map splits, O($n$) is a semidirect product of SO($n$) by O(1). In practical terms, any orthogonal matrix can be obtained by taking a rotation matrix and possibly negating one of its columns, as we saw in the $2 \times 2$ case. If $n$ is odd, the semidirect product is in fact a direct product, and any orthogonal matrix can be produced by taking a rotation matrix and possibly negating all of its columns. This follows from the property of determinants that negating a single column negates the determinant, so negating an odd number of columns (but not an even number) negates the determinant.

Now consider $(n+1) \times (n+1)$ orthogonal matrices whose bottom-right entry equals 1. The remaining entries of the last column (and the last row) must be zeros, and the product of any two such matrices has the same form. The upper-left $n \times n$ submatrix is itself an $n \times n$ orthogonal matrix. Therefore O($n$) is a subgroup of O($n+1$) (and of all higher-dimensional orthogonal groups).

\begin{bmatrix} & & & 0 \\ & \mathrm{O}(n) & & \vdots \\ & & & 0 \\ 0 & \cdots & 0 & 1 \end{bmatrix}

Since any orthogonal matrix can be reduced to this constrained form by elementary reflections (Householder matrices), a sequence of such reflections can bring any orthogonal matrix to the identity; thus the orthogonal group is a reflection group. The last column can be fixed to any unit vector, and each distinct choice gives a different embedding of O($n$) in O($n+1$). In this way, O($n+1$) is a bundle over the unit sphere $S^n$ with fiber O($n$).

Similarly, SO($n$) is a subgroup of SO($n+1$), and any special orthogonal matrix can be generated by a sequence of Givens plane rotations using an analogous procedure. The bundle structure persists:

\mathrm{SO}(n) \hookrightarrow \mathrm{SO}(n+1) \to S^{n}

A single rotation can introduce a zero in the first row of the last column, and a series of $n-1$ such rotations can zero all entries of the last column of an $n \times n$ rotation matrix except the last one. Since the planes of rotation are fixed, each rotation contributes only one degree of freedom, its angle. By induction, SO($n$) therefore has $\frac{n(n-1)}{2}$ degrees of freedom, and so does O($n$).

Permutation matrices are simpler still; they form not a Lie group but a finite group, the symmetric group $S_n$ of order $n!$. By the same kind of argument, $S_n$ is a subgroup of $S_{n+1}$. The permutation matrices corresponding to even permutations form the alternating group, of order $\frac{n!}{2}$.

Canonical Form

More generally, the action of any orthogonal matrix decomposes into independent actions on orthogonal two-dimensional subspaces. Specifically, if $Q$ is a special orthogonal matrix, there always exists an orthogonal matrix $P$ (a change of basis) such that $P^{\mathrm{T}}QP$ is block diagonal:

P^{\mathrm{T}}QP = \begin{bmatrix} R_{1} & & \\ & \ddots & \\ & & R_{k} \end{bmatrix} \quad (n \text{ even}), \qquad P^{\mathrm{T}}QP = \begin{bmatrix} R_{1} & & & \\ & \ddots & & \\ & & R_{k} & \\ & & & 1 \end{bmatrix} \quad (n \text{ odd})

Here, $R_1, \dots, R_k$ are $2 \times 2$ rotation matrices, and all other entries are zero. Exceptionally, a rotation block may be diagonal, $\pm I$. Thus, negating one column if necessary, and noting that a $2 \times 2$ reflection matrix diagonalizes to a +1 and a −1, any orthogonal matrix can be brought to the form:

P^{\mathrm{T}}QP = \begin{bmatrix} \begin{matrix} R_{1} & & \\ & \ddots & \\ & & R_{k} \end{matrix} & 0 \\ 0 & \begin{matrix} \pm 1 & & \\ & \ddots & \\ & & \pm 1 \end{matrix} \end{bmatrix}

The matrices $R_1, \dots, R_k$ correspond to conjugate pairs of eigenvalues lying on the unit circle in the complex plane, so this decomposition confirms that all eigenvalues of an orthogonal matrix have absolute value 1. If $n$ is odd, at least one real eigenvalue, +1 or −1, must exist. For a $3 \times 3$ rotation matrix, the eigenvector associated with the eigenvalue +1 defines the axis of rotation.
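A short numerical check of this eigenvalue property (not from the article; the random matrix is generated only for illustration):

```python
# The eigenvalues of an orthogonal matrix all have modulus 1.
import numpy as np

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))   # a random 5x5 orthogonal matrix
print(np.abs(np.linalg.eigvals(Q)))                # every modulus is 1 to machine precision
```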

Lie Algebra

Consider an orthogonal matrix $Q$ whose entries are differentiable functions of a parameter $t$, such that $Q(0) = I$. Differentiating the orthogonality condition $Q^{\mathrm{T}}Q = I$ with respect to $t$ yields:

\dot{Q}^{\mathrm{T}}Q + Q^{\mathrm{T}}\dot{Q} = 0

Evaluating this equation at $t = 0$, where $Q = I$, leads to the condition:

\dot{Q}^{\mathrm{T}} = -\dot{Q}

In the language of Lie groups, this signifies that the Lie algebra of an orthogonal matrix group is composed of skew-symmetric matrices. Conversely, the matrix exponential of any skew-symmetric matrix results in an orthogonal matrix, specifically a special orthogonal matrix.
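A small sketch, assuming SciPy's matrix exponential scipy.linalg.expm, illustrates this converse direction: exponentiating an arbitrary skew-symmetric matrix produces a special orthogonal matrix (the sample entries are arbitrary).

```python
import numpy as np
from scipy.linalg import expm

S = np.array([[ 0.0, -0.3,  0.2],
              [ 0.3,  0.0, -0.1],
              [-0.2,  0.1,  0.0]])        # skew-symmetric: S^T = -S

Q = expm(S)
print(np.allclose(Q.T @ Q, np.eye(3)))    # orthogonal
print(np.isclose(np.linalg.det(Q), 1.0))  # determinant +1, i.e. a rotation
```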

As an illustration, consider the three-dimensional quantity known in physics as angular velocity. This represents an infinitesimal rotation and thus corresponds to a vector in the Lie algebra $\mathfrak{so}(3)$ tangent to SO(3). Given an angular velocity vector $\boldsymbol{\omega} = (\alpha\theta, \beta\theta, \gamma\theta)$, where $(\alpha, \beta, \gamma)$ is a unit vector, the corresponding skew-symmetric matrix form of $\boldsymbol{\omega}$ is:

\Omega = \begin{bmatrix} 0 & -\gamma\theta & \beta\theta \\ \gamma\theta & 0 & -\alpha\theta \\ -\beta\theta & \alpha\theta & 0 \end{bmatrix}

The exponential of this matrix $\Omega$ is the orthogonal matrix representing a rotation around the axis $(\alpha, \beta, \gamma)$ by an angle $\theta$. Setting $c = \cos(\theta/2)$ and $s = \sin(\theta/2)$, the exponential can be expressed as:

\exp(\Omega) = \begin{bmatrix} 1-2s^{2}+2\alpha^{2}s^{2} & 2\alpha\beta s^{2}-2\gamma sc & 2\alpha\gamma s^{2}+2\beta sc \\ 2\alpha\beta s^{2}+2\gamma sc & 1-2s^{2}+2\beta^{2}s^{2} & 2\beta\gamma s^{2}-2\alpha sc \\ 2\alpha\gamma s^{2}-2\beta sc & 2\beta\gamma s^{2}+2\alpha sc & 1-2s^{2}+2\gamma^{2}s^{2} \end{bmatrix}

Numerical Linear Algebra

Benefits

Numerical analysis takes advantage of many of the properties of orthogonal matrices in numerical linear algebra, where they arise naturally. For instance, constructing an orthonormal basis for a vector space or performing an orthogonal change of basis are common operations that involve orthogonal matrices. Having determinant $\pm 1$ and all eigenvalues of magnitude 1 is of great benefit for numeric stability. One consequence is that the condition number of an orthogonal matrix is 1 (the minimum possible value), so errors are not magnified when multiplying by it. Many algorithms rely on orthogonal matrices such as Householder reflections and Givens rotations precisely for this reason. Additionally, an orthogonal matrix is not only invertible, but its inverse is available at essentially no computational cost, simply by transposing the matrix.

Permutation matrices play a critical role in the success of numerous algorithms, including the widely used Gaussian elimination with partial pivoting, where permutations are employed to manage the pivoting. However, these permutations are rarely represented explicitly as matrices; their special structure allows more efficient representations, such as a simple list of $n$ indices.

Similarly, algorithms that use Householder and Givens matrices typically employ specialized methods for multiplication and storage. For example, a Givens rotation affects only two rows of a matrix it multiplies, reducing a full matrix multiplication of order $n^3$ to a much more efficient order $n$. When these reflections and rotations are used to introduce zeros into a matrix, the space vacated is enough to store the data needed to reproduce the transformation, and to do so robustly. (Following Stewart (1976), rotation angles are not stored explicitly, since that is both expensive and badly behaved.)
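A hedged sketch of this efficiency point (the helper apply_givens_rows and the sample data are invented for illustration): applying a Givens rotation in place touches only the two affected rows, so the cost is linear in the row length rather than cubic in the matrix size.

```python
import numpy as np

def apply_givens_rows(A, i, j, c, s):
    """Overwrite rows i and j of A with their rotation by (c, s); O(n) work."""
    Ai, Aj = A[i, :].copy(), A[j, :].copy()
    A[i, :] = c * Ai + s * Aj
    A[j, :] = -s * Ai + c * Aj

A = np.arange(12, dtype=float).reshape(4, 3)
theta = 0.5
apply_givens_rows(A, 1, 3, np.cos(theta), np.sin(theta))
print(A)   # only rows 1 and 3 have changed
```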

Decompositions

Several fundamental matrix decompositions (as detailed by Golub & Van Loan, 1996) prominently feature orthogonal matrices; a brief numerical illustration follows the list. These include:

  • QR decomposition: expresses a matrix $M$ as a product $M = QR$, where $Q$ is orthogonal and $R$ is upper triangular.
  • Singular value decomposition (SVD): decomposes a matrix $M$ as $M = U \Sigma V^{\mathrm{T}}$, where $U$ and $V$ are orthogonal and $\Sigma$ is a diagonal matrix of singular values.
  • Eigendecomposition of a symmetric matrix (spectral theorem): decomposes a symmetric matrix $S$ as $S = Q \Lambda Q^{\mathrm{T}}$, where $Q$ is orthogonal and $\Lambda$ is a diagonal matrix of eigenvalues.
  • Polar decomposition: factors a matrix $M$ as $M = QS$, where $Q$ is orthogonal and $S$ is a symmetric positive-semidefinite matrix.
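As a brief numerical illustration (not from the cited text), the orthogonal factors returned by NumPy's QR and SVD routines can be checked directly against the defining identity; the test matrix is a random placeholder.

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))

Q, R = np.linalg.qr(M)
U, s, Vt = np.linalg.svd(M)

for F in (Q, U, Vt.T):
    assert np.allclose(F.T @ F, np.eye(4))   # each factor satisfies F^T F = I
print("QR and SVD factors are orthogonal to machine precision")
```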

Examples

Consider an overdetermined system of linear equations, a scenario that frequently arises when repeated measurements of a physical phenomenon are taken to compensate for experimental errors. Such a system can be written as $A\mathbf{x} = \mathbf{b}$, where $A$ is an $m \times n$ matrix with $m > n$.

A QR decomposition reduces $A$ to an upper triangular matrix $R$. For instance, if $A$ is a $5 \times 3$ matrix, $R$ has the structure:

R = \begin{bmatrix} \cdot & \cdot & \cdot \\ 0 & \cdot & \cdot \\ 0 & 0 & \cdot \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}

The linear least squares problem seeks the vector $\mathbf{x}$ that minimizes $\|A\mathbf{x} - \mathbf{b}\|$, which is equivalent to projecting $\mathbf{b}$ onto the subspace spanned by the columns of $A$. Assuming the columns of $A$ (and hence of $R$) are linearly independent, the projection solution is found from the normal equations $A^{\mathrm{T}}A\mathbf{x} = A^{\mathrm{T}}\mathbf{b}$. Now $A^{\mathrm{T}}A$ is an $n \times n$ square matrix and invertible, and it equals $R^{\mathrm{T}}R$. The superfluous rows of zeros in $R$ do not affect the product, leaving a matrix already in factored form (lower triangular times upper triangular), as in Gaussian elimination or Cholesky decomposition. Here orthogonality matters not only for reducing $A^{\mathrm{T}}A = (R^{\mathrm{T}}Q^{\mathrm{T}})QR$ to $R^{\mathrm{T}}R$, but also for allowing a stable solution without magnifying numerical errors.
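A minimal sketch of this use of QR for least squares, under the assumption that $A$ has full column rank (the data here are random placeholders); the triangular system $R\mathbf{x} = Q^{\mathrm{T}}\mathbf{b}$ is solved by back-substitution.

```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))          # 5 equations, 3 unknowns
b = rng.standard_normal(5)

Q, R = np.linalg.qr(A)                   # reduced QR: Q is 5x3, R is 3x3
x = solve_triangular(R, Q.T @ b)         # back-substitution only

print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))  # matches lstsq
```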

For underdetermined systems, or cases involving a non-invertible matrix, the singular value decomposition (SVD) is equally useful. With $A$ factored as $U \Sigma V^{\mathrm{T}}$, a robust solution uses the Moore-Penrose pseudoinverse $V \Sigma^{+} U^{\mathrm{T}}$, where $\Sigma^{+}$ is obtained from $\Sigma$ by replacing each non-zero diagonal entry with its reciprocal. The solution vector is then $\mathbf{x} = V \Sigma^{+} U^{\mathrm{T}}\mathbf{b}$.
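A corresponding sketch for the SVD route (again with placeholder data), building the pseudoinverse solution exactly as described and comparing it with NumPy's np.linalg.pinv:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 5))                  # underdetermined: 3 x 5
b = rng.standard_normal(3)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
s_plus = np.where(s > 1e-12, 1.0 / s, 0.0)       # reciprocals of the nonzero singular values
x = Vt.T @ (s_plus * (U.T @ b))                  # x = V Sigma^+ U^T b

print(np.allclose(x, np.linalg.pinv(A) @ b))     # agrees with np.linalg.pinv
```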

The case of a square, invertible matrix also holds interest. Suppose, for instance, that $A$ is a $3 \times 3$ rotation matrix that has been computed as the composition of numerous rotations. Because of floating-point arithmetic, $A$ will have drifted slightly away from perfect orthogonality. A Gram–Schmidt process could re-orthogonalize the columns, but it is not the most reliable, efficient, or numerically stable method. The polar decomposition factors a matrix into a pair, one factor of which is the unique orthogonal matrix closest to the original matrix, or one of the closest if the original matrix is singular. (Closeness can be measured by any matrix norm invariant under an orthogonal change of basis, such as the spectral norm or the Frobenius norm.) For a matrix that is already nearly orthogonal, rapid convergence to the orthogonal factor can be achieved by a Newton-like method due to Higham (1986, 1990), which repeatedly averages the matrix with its inverse transpose. Dubrulle (1999) has proposed an accelerated method with a convenient convergence criterion.

Consider the example of a non-orthogonal matrix:

\begin{bmatrix} 3 & 1 \\ 7 & 5 \end{bmatrix}

For this matrix, the simple averaging algorithm takes seven iterations to converge to an orthogonal matrix:

\begin{bmatrix} 3 & 1 \\ 7 & 5 \end{bmatrix} \rightarrow \begin{bmatrix} 1.8125 & 0.0625 \\ 3.4375 & 2.6875 \end{bmatrix} \rightarrow \cdots \rightarrow \begin{bmatrix} 0.8 & -0.6 \\ 0.6 & 0.8 \end{bmatrix}
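A minimal sketch of that simple averaging iteration, $Q \leftarrow \tfrac{1}{2}\left(Q + (Q^{-1})^{\mathrm{T}}\right)$, reproducing the sequence shown above (this is illustrative code, not the article's; the loop count of seven follows the example):

```python
import numpy as np

Q = np.array([[3.0, 1.0],
              [7.0, 5.0]])
for _ in range(7):
    Q = 0.5 * (Q + np.linalg.inv(Q).T)   # average with the inverse transpose
print(np.round(Q, 6))                    # converges to [[0.8, -0.6], [0.6, 0.8]]
```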

An accelerated version of this method can achieve the result in just two steps (using $\gamma = 0.353553, 0.565685$):

\begin{bmatrix} 3 & 1 \\ 7 & 5 \end{bmatrix} \rightarrow \begin{bmatrix} 1.41421 & -1.06066 \\ 1.06066 & 1.41421 \end{bmatrix} \rightarrow \begin{bmatrix} 0.8 & -0.6 \\ 0.6 & 0.8 \end{bmatrix}

In contrast, the Gram-Schmidt process yields a less accurate result, with a Frobenius distance of 8.28659 compared to the minimum achievable distance of 8.12404:

\begin{bmatrix} 3 & 1 \\ 7 & 5 \end{bmatrix} \rightarrow \begin{bmatrix} 0.393919 & -0.919145 \\ 0.919145 & 0.393919 \end{bmatrix}

Randomization

Some numerical applications, such as Monte Carlo methods and the exploration of high-dimensional data spaces, require the generation of uniformly distributed random orthogonal matrices. Here "uniform" is defined in terms of the Haar measure, which essentially requires that the distribution not change if multiplied by any arbitrary orthogonal matrix. Orthogonalizing matrices with independent, uniformly distributed random entries does not produce uniformly distributed orthogonal matrices [citation needed]. However, the QR decomposition of a matrix with independent, normally distributed random entries does, provided the diagonal entries of $R$ are restricted to be positive (Mezzadri, 2006). Stewart (1980) introduced a more efficient approach, later generalized by Diaconis & Shahshahani (1987) as the "subgroup algorithm", which also works for generating permutations and rotations: to generate an $(n+1) \times (n+1)$ orthogonal matrix, take an $n \times n$ orthogonal matrix and a uniformly distributed unit vector of dimension $n+1$, construct a Householder reflection from that vector, and apply it to the smaller matrix (embedded in the larger size with a 1 at the bottom-right corner).
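A sketch of the QR-based recipe just described (the function name random_orthogonal is invented; flipping the sign of each column enforces a positive diagonal in $R$):

```python
import numpy as np

def random_orthogonal(n, rng):
    """Haar-distributed orthogonal matrix via QR of a Gaussian matrix."""
    Z = rng.standard_normal((n, n))        # i.i.d. normal entries
    Q, R = np.linalg.qr(Z)
    return Q * np.sign(np.diag(R))         # flip columns so that diag(R) > 0

rng = np.random.default_rng(4)
Q = random_orthogonal(4, rng)
print(np.allclose(Q.T @ Q, np.eye(4)))
```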

Nearest Orthogonal Matrix

The problem of finding the orthogonal matrix $Q$ closest to a given matrix $M$ is closely related to the Orthogonal Procrustes problem. Several methods yield the unique solution. The simplest is to compute the singular value decomposition of $M$ and replace its singular values with ones. An alternative method expresses $Q$ explicitly but requires the computation of a matrix square root:

Q = M \left(M^{\mathrm{T}}M\right)^{-\frac{1}{2}}

This expression can be combined with the Babylonian method for extracting the matrix square root to form a recurrence relation that converges quadratically to an orthogonal matrix:

Q_{n+1} = 2M \left(Q_{n}^{-1}M + M^{\mathrm{T}}Q_{n}\right)^{-1}

where $Q_0 = M$.
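A sketch comparing the two routes just described on a nearly orthogonal matrix (the test matrix and iteration count are arbitrary choices for illustration): the SVD construction and the quadratically convergent recurrence agree.

```python
import numpy as np

M = np.array([[0.82, -0.59],
              [0.61,  0.80]])                      # slightly non-orthogonal

# SVD route: replace the singular values of M with ones, i.e. Q = U V^T.
U, _, Vt = np.linalg.svd(M)
Q_svd = U @ Vt

# Recurrence Q_{n+1} = 2 M (Q_n^{-1} M + M^T Q_n)^{-1}, starting from Q_0 = M.
Q = M.copy()
for _ in range(8):
    Q = 2.0 * M @ np.linalg.inv(np.linalg.inv(Q) @ M + M.T @ Q)

print(np.allclose(Q, Q_svd))                       # both give the same orthogonal factor
```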

These iterative methods are numerically stable provided that the condition number of $M$ is less than three.[5]

Utilizing a first-order approximation of the inverse and the same initialization yields a modified iteration:

\begin{aligned} N_{n} &= Q_{n}^{\mathrm{T}}Q_{n} \\ P_{n} &= \tfrac{1}{2}Q_{n}N_{n} \\ Q_{n+1} &= 2Q_{n} + P_{n}N_{n} - 3P_{n} \end{aligned}

Spin and Pin

A subtle technical issue arises in certain applications of orthogonal matrices. Not only are the group components with determinant +1 and −1 not connected to each other, but even the +1 component, SO($n$), is not simply connected (except for the trivial case SO(1)). Consequently, it is sometimes advantageous, or even necessary, to work with a covering group of SO($n$), the spin group Spin($n$). Likewise, O($n$) has covering groups, the pin groups Pin($n$). For $n > 2$, Spin($n$) is simply connected and thus the universal covering group of SO($n$). By far the most famous example of a spin group is Spin(3), which is nothing but SU(2), the group of unit quaternions.

The Pin and Spin groups are embedded within Clifford algebras, which themselves can be constructed from orthogonal matrices.

Rectangular Matrices

If $Q$ is not a square matrix, the conditions $Q^{\mathrm{T}}Q = I$ and $QQ^{\mathrm{T}} = I$ are not equivalent. The condition $Q^{\mathrm{T}}Q = I$ says that the columns of $Q$ are orthonormal, which can only happen if $Q$ is an $m \times n$ matrix with $n \leq m$ (due to linear dependence). Conversely, $QQ^{\mathrm{T}} = I$ says that the rows of $Q$ are orthonormal, which requires $n \geq m$.

There isn't a universally standardized terminology for these types of matrices. They are variously referred to as "semi-orthogonal matrices," "orthonormal matrices," "orthogonal matrices," or sometimes simply "matrices with orthonormal rows/columns."

In the case where $n \leq m$, matrices with orthonormal columns may be identified as orthogonal $k$-frames and are elements of the Stiefel manifold.


Notes

  • ^ "Paul's online math notes" full citation needed, Paul Dawkins, Lamar University, 2008. Theorem 3(c)
  • ^ F. Gantmacher, Theory of matrices, vol. 1, Chelsea, 1959, p. 285.
  • ^ Serge Lang, Linear Algebra, 3rd ed., Springer, 1987, p. 230.
  • ^ "Finding the Nearest Orthonormal Matrix", Berthold K.P. Horn, MIT.
  • ^ "Newton's Method for the Matrix Square Root" Archived 2011-09-29 at the Wayback Machine, Nicholas J. Higham, Mathematics of Computation, Volume 46, Number 174, 1986.