Concepts from linear algebra
"Characteristic root" redirects here. For the root of a characteristic equation, see Characteristic equation (calculus).
In the vast, often tedious, landscape of linear algebra, an eigenvector (pronounced, for those still struggling with basic phonetics, /ˈaɪɡən-/ EYE-gən-) or characteristic vector emerges as a concept of fundamental, if somewhat unexciting, importance. At its core, an eigenvector is a (nonzero) vector that possesses the peculiar quality of having its direction utterly unchanged (or, for the sake of variety, merely reversed) when subjected to a specific linear transformation. To be more precise, and frankly, we must be precise here, an eigenvector v of a linear transformation T is simply scaled by a constant factor, call it λ, when that linear transformation is applied to it. This elegant, if somewhat uninspired, relationship is formally expressed as {\displaystyle T\mathbf {v} =\lambda \mathbf {v} }. The corresponding eigenvalue, also known as a characteristic value or characteristic root, is this multiplying factor λ. It's a number, of course, and can be anything from a negative value that flips the vector's orientation to a complex number, because why should things ever be straightforward?
Geometrically, vectors are those multi-dimensional quantities, often depicted as arrows, that possess both magnitude and direction. A linear transformation, in its tireless work, can rotate, stretch, or even shear the vectors it acts upon. The eigenvectors of such a transformation are the elite few, the special vectors that experience only stretching or shrinking, utterly impervious to rotation or shear. Their direction, in essence, remains steadfast. The associated eigenvalue, then, quantifies precisely how much an eigenvector is stretched or shrunk. Should this eigenvalue happen to be negative, the eigenvector's direction is, with a certain dramatic flourish, reversed. One might say it's a fundamental property, so basic it’s almost insulting to explain.
These eigenvectors and their corresponding eigenvalues are indispensable for characterizing a linear transformation. Consequently, they permeate nearly every field where linear algebra dares to tread, from the shifting strata of geology to the perplexing subatomic realm of quantum mechanics. Their ubiquity isn't surprising; after all, understanding the intrinsic behaviors of a system is rather useful. A particularly critical scenario arises when a system is modeled by a linear transformation whose outputs are relentlessly fed back as inputs to the very same transformation (feedback). In such cases, the largest eigenvalue typically commands the most attention, as it dictates the system's long-term behavior after countless applications of the transformation. Its associated eigenvector, naturally, represents the system's steady state, the predictable equilibrium it eventually settles into – if anything in the universe truly settles.
Matrices
For an n × n matrix A and a nonzero n-vector v, if the act of multiplying A by v (denoted as Av) merely results in scaling v by a factor λ, where λ is a scalar, then, by definition, v is an eigenvector of A, and λ is its corresponding eigenvalue. This pivotal relationship, foundational to much of what we discuss, is concisely expressed as:
{\displaystyle A\mathbf {v} =\lambda \mathbf {v} }
Considering an n-dimensional vector space and the arbitrary selection of a basis, a direct and unambiguous correspondence exists between linear transformations operating from the vector space onto itself and n-by-n square matrices. This means that, within a finite-dimensional vector space, one can equivalently define eigenvalues and eigenvectors using either the abstract language of linear transformations or the more concrete, computationally friendly language of matrices. It's two sides of the same coin, for those who appreciate such metaphors.
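As a quick, hedged illustration (not part of the standard exposition), the sketch below checks this defining relation numerically with NumPy; the matrix and the candidate eigenpair are arbitrary choices borrowed from the worked two-dimensional example later in the article.

```python
import numpy as np

# Minimal sketch: verify A v = lambda v for a candidate eigenpair.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
v = np.array([1.0, 1.0])   # candidate eigenvector
lam = 3.0                  # candidate eigenvalue

print(np.allclose(A @ v, lam * v))   # True: v is an eigenvector with eigenvalue 3
```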
Overview
Eigenvalues and eigenvectors, as you might have gathered, loom rather prominently in the rigorous analysis of linear transformations. The prefix "eigen-," which so many find perplexing, is, in fact, merely adopted from the German word eigen. This term is cognate with the English word "own," and carries connotations of 'proper,' 'characteristic,' or 'intrinsic.' It implies a unique, inherent property, not merely something generic. Originally, these concepts were employed to dissect the principal axes of the rotational motion of rigid bodies – a problem of considerable practical importance, one would assume. Today, their applications are far-reaching and, frankly, rather predictable: they are found in stability analysis, ensuring structures don't simply collapse; in vibration analysis, determining how things oscillate; in the abstract world of atomic orbitals, mapping electron behavior; in the surprisingly tangible realm of facial recognition, identifying individuals; and in the elegant process of matrix diagonalization, simplifying complex systems.
At its core, an eigenvector v of a linear transformation T is a nonzero vector that, upon T's application, maintains its original direction. The transformation simply scales the eigenvector by a scalar value λ, which we call an eigenvalue. This fundamental condition is succinctly captured by the equation:
{\displaystyle T(\mathbf {v} )=\lambda \mathbf {v} ,}
This, for those taking notes, is referred to as the eigenvalue equation or eigenequation. Generally, λ can be any scalar that the field allows. For instance, λ might be negative, in which case the eigenvector's direction is reversed as part of the scaling. It could also be zero, implying the vector is collapsed to the origin, or even a complex number, because reality is rarely simple enough to be purely real.
Consider the accompanying illustration: a shear mapping. Observe how the red arrow, a rather typical vector, changes its direction rather dramatically. However, the blue arrow, being an eigenvector, remains resolute; its direction is preserved. Since its length also remains unaltered, its eigenvalue is precisely 1. It's almost as if it knows something the other vectors don't. Another image displays a 2 × 2 real and symmetric matrix at work, stretching and shearing the plane. The red lines represent the eigenvectors, those two singular directions where any point will merely slide along them, without deviation.
Let's delve slightly deeper into the Mona Lisa example, a rather curious application for such mathematical rigor. Each point on the painting can be conceptualized as a vector originating from the painting's center. The linear transformation demonstrated here is a shear mapping, a rather crude operation that shifts points in the top half to the right and points in the bottom half to the left, with the displacement proportional to their distance from the central horizontal axis. Consequently, vectors pointing to individual pixels in the original image are tilted and their lengths altered. However, any vector lying precisely along the horizontal axis, possessing no vertical component, remains untouched by this transformation. These vectors, therefore, are eigenvectors of this particular transformation, as their direction is entirely preserved. Furthermore, because their lengths are also unchanged, their corresponding eigenvalue is exactly one. A rather undignified fate for such a famous smile, reduced to a collection of vectors.
Linear transformations manifest in a myriad of forms, operating on vectors within diverse vector spaces. This versatility means that eigenvectors themselves can assume various guises. For example, a linear transformation might be embodied by a differential operator, such as
{\displaystyle {\tfrac {d}{dx}}}
. In such a scenario, the eigenvectors are not mere arrows but functions, aptly named eigenfunctions, which are scaled by the action of that differential operator. A prime example is the exponential function, where
{\displaystyle {\frac {d}{dx}}e^{\lambda x}=\lambda e^{\lambda x}.}
Here, the operation of differentiation simply multiplies the function by λ, leaving its "direction" (its functional form) intact. Alternatively, the linear transformation could be an n × n matrix, in which case the eigenvectors are typically represented as n × 1 matrices, or column vectors. When the linear transformation is expressed as an n × n matrix A, the eigenvalue equation can be concisely written as the matrix multiplication
{\displaystyle A\mathbf {v} =\lambda \mathbf {v} ,}
where the eigenvector v is an n × 1 matrix. For matrices, eigenvalues and eigenvectors are essential tools for decomposing the matrix, a process often involving diagonalizing it, which simplifies many otherwise intractable calculations.
The concepts of eigenvalues and eigenvectors give rise to an entire family of closely related mathematical ideas, all liberally adorned with the "eigen-" prefix, because mathematicians, like everyone else, enjoy a good naming convention:
- The collective assembly of all eigenvectors of a linear transformation, each meticulously paired with its corresponding eigenvalue, is rather grandly termed the eigensystem of that transformation. It’s the complete set of intrinsic behaviors, if you will.
- The collection of all eigenvectors of T that correspond to the same eigenvalue, augmented by the ever-present zero vector, forms what is known as an eigenspace, or the characteristic space of T associated with that specific eigenvalue. This space represents all possible "directions" that remain invariant for a given scaling factor.
- Should a set of eigenvectors of T be so accommodating as to form a basis for the domain of T, then this particularly useful basis is elevated to the status of an eigenbasis. It's the ideal coordinate system for understanding the transformation.
History
Eigenvalues, despite their current omnipresence in linear algebra and matrix theory curricula, did not spring fully formed from a textbook. Historically, their genesis lies in the investigation of quadratic forms and the often-frustrating realm of differential equations. It seems humanity always finds a way to complicate things before simplifying them.
In the 18th century, the venerable Leonhard Euler embarked upon a detailed study of the rotational motion of a rigid body. It was in this context that he stumbled upon the profound significance of the principal axes – those special axes around which a body rotates without wobble. A few decades later, Joseph-Louis Lagrange made the crucial realization that these very principal axes were, in fact, the eigenvectors of the inertia matrix – a rather elegant connection between geometry and algebra.
The early 19th century witnessed Augustin-Louis Cauchy extending this work, recognizing its utility in classifying quadric surfaces and generalizing these insights to arbitrary dimensions. Cauchy, ever the innovator, also bestowed upon us the term racine caractéristique (characteristic root), the direct ancestor of our modern "eigenvalue," a term that, rather stubbornly, survives in the characteristic equation.
Later, in 1822, Joseph Fourier, building upon the foundations laid by Lagrange and Pierre-Simon Laplace, leveraged these concepts to solve the notoriously difficult heat equation using the method of separation of variables, as detailed in his seminal treatise, The Analytic Theory of Heat (Théorie analytique de la chaleur). Charles-François Sturm further refined Fourier's ideas, drawing them to Cauchy's attention. This collaboration led to the significant discovery that real symmetric matrices invariably possess real eigenvalues – a property that simplifies many practical problems. This finding was subsequently broadened by Charles Hermite in 1855 to encompass what are now known as Hermitian matrices, which are, in essence, complex generalizations of symmetric matrices.
Around this same period, Francesco Brioschi demonstrated that the eigenvalues of orthogonal matrices always reside on the unit circle in the complex plane – a result indicating preservation of length. Concurrently, Alfred Clebsch elucidated the analogous result for skew-symmetric matrices. Finally, Karl Weierstrass, in his contributions to stability theory (a field initiated by Laplace), highlighted a critical point: defective matrices, those lacking a full set of linearly independent eigenvectors, could lead to system instability, a rather inconvenient truth.
Meanwhile, Joseph Liouville delved into eigenvalue problems strikingly similar to Sturm's, giving birth to the field now recognized as Sturm–Liouville theory, a cornerstone of applied mathematics. Towards the close of the 19th century, Hermann Schwarz meticulously studied the first eigenvalue of Laplace's equation across various domains, while Henri Poincaré investigated Poisson's equation a few years later. It seems everyone wanted a piece of the eigenvalue pie.
The dawn of the 20th century saw David Hilbert extending these concepts to the eigenvalues of integral operators, conceptualizing these operators as infinite-dimensional matrices. Hilbert, in 1904, was the first to explicitly employ the German word eigen, meaning "own" or "characteristic," to designate eigenvalues and eigenvectors, though he might have been influenced by a similar usage from Hermann von Helmholtz. For a time, the accepted English term was "proper value," but the more distinctive, and frankly, less ambiguous, "eigenvalue" has since become the standard. A small victory for clarity, perhaps.
The quest for practical methods to compute these values led to significant advancements. The first numerical algorithm for calculating eigenvalues and eigenvectors emerged in 1929, when Richard von Mises unveiled the power method, a relatively simple iterative technique. Fast forward to 1961, and two independent breakthroughs occurred: John G. F. Francis and Vera Kublanovskaya each proposed the QR algorithm, which remains one of the most widely used and efficient methods today. It seems even the most abstract concepts eventually require a calculator.
Eigenvalues and eigenvectors of matrices
- See also: Euclidean vector and Matrix (mathematics)
Eigenvalues and eigenvectors are, rather predictably, often introduced to students within the confines of linear algebra courses, where the focus invariably gravitates towards matrices. This is not without reason. Furthermore, linear transformations operating over a finite-dimensional vector space can be quite conveniently represented using matrices. This representation becomes particularly dominant and useful in numerical and computational applications, where abstract concepts need to be translated into something a machine can process.
Observe the illustration: matrix A, in its action, stretches the vector x. Crucially, it does not alter its fundamental direction. Thus, x is, by definition, an eigenvector of A. It’s a visual confirmation of the core concept.
Let's consider n-dimensional vectors which are constructed as ordered lists of n scalars. For example, in three dimensions, we might encounter vectors such as
{\displaystyle \mathbf {x} ={\begin{bmatrix}1\\-3\\4\end{bmatrix}}\quad {\mbox{and}}\quad \mathbf {y} ={\begin{bmatrix}-20\\60\\-80\end{bmatrix}}.}
These vectors are said to be scalar multiples of one another, or, more colloquially, parallel or collinear, if there exists a scalar λ such that
{\displaystyle \mathbf {x} =\lambda \mathbf {y} .}
In this specific instance, a quick calculation reveals that
{\displaystyle \lambda =-{\frac {1}{20}}}
. A simple observation, but an important one for context.
Now, let's consider the operation of a linear transformation on these n-dimensional vectors, a transformation defined by an n-by-n matrix A. This operation transforms a vector v into a new vector w, expressed as:
{\displaystyle A\mathbf {v} =\mathbf {w} ,}
Or, in its more explicit, component-wise form, for those who appreciate the details:
{\displaystyle {\begin{bmatrix}A_{11}&A_{12}&\cdots &A_{1n}\\A_{21}&A_{22}&\cdots &A_{2n}\\\vdots &\vdots &\ddots &\vdots \\A_{n1}&A_{n2}&\cdots &A_{nn}\end{bmatrix}}{\begin{bmatrix}v_{1}\\v_{2}\\\vdots \\v_{n}\end{bmatrix}}={\begin{bmatrix}w_{1}\\w_{2}\\\vdots \\w_{n}\end{bmatrix}}}
where, for each row i, the component {\displaystyle w_{i}} is given by the sum:
{\displaystyle w_{i}=A_{i1}v_{1}+A_{i2}v_{2}+\cdots +A_{in}v_{n}=\sum _{j=1}^{n}A_{ij}v_{j}.}
Now, if it so happens that the output vector w is a mere scalar multiple of the input vector v – that is, if
{\displaystyle A\mathbf {v} =\mathbf {w} =\lambda \mathbf {v} ,} (1)
then v is unequivocally an eigenvector of the linear transformation represented by A, and the scaling factor λ is its corresponding eigenvalue. Equation ( 1 ) is, quite simply, the eigenvalue equation for the matrix A. It’s the definition, distilled.
This Equation ( 1 ) can be equivalently, and perhaps more usefully for computation, restated as:
{\displaystyle \left(A-\lambda I\right)\mathbf {v} =\mathbf {0} ,} (2)
where I represents the n-by-n identity matrix, a matrix that leaves vectors unchanged, and 0 is the zero vector. This formulation highlights that finding eigenvectors is essentially solving a homogeneous system of linear equations.
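To make the homogeneous-system reading concrete, here is a hedged sketch (assuming NumPy; the 2 × 2 matrix is just an example) checking that each eigenpair returned by np.linalg.eig satisfies ( A − λI ) v = 0.

```python
import numpy as np

# Sketch: each eigenpair (lam, v) of A solves the homogeneous system (A - lam*I) v = 0.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigenvalues, eigenvectors = np.linalg.eig(A)   # eigenvectors are the columns

I = np.eye(A.shape[0])
for lam, v in zip(eigenvalues, eigenvectors.T):
    print(np.allclose((A - lam * I) @ v, 0))   # True for every eigenpair
```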
Eigenvalues and the characteristic polynomial
- Main article: Characteristic polynomial
Equation ( 2 ), which we just established as
{\displaystyle \left(A-\lambda I\right)\mathbf {v} =\mathbf {0} ,}
possesses a nonzero solution for v if and only if the determinant of the matrix ( A − λI ) is precisely zero. This condition is not merely a mathematical quirk; it's the gateway to finding the eigenvalues. Therefore, the eigenvalues of A are those specific values of λ that satisfy the equation:
{\displaystyle \det(A-\lambda I)=0} (3)
Applying the renowned Leibniz formula for determinants, the left-hand side of equation ( 3 ) unfurls into a polynomial function of the variable λ. The degree of this polynomial is invariably n, which is the order of the matrix A. Its coefficients are, naturally, dependent on the entries of A, with the notable exception that its term of degree n is always (−1)^n λ^n. This specific polynomial is christened the characteristic polynomial of A. Consequently, Equation ( 3 ) is known as the characteristic equation or, in certain circles, the secular equation of A.
The characteristic polynomial of an n-by-n matrix A, being a polynomial of degree n, will yield at most n complex number roots. These roots, which are the eigenvalues themselves, can be unearthed either by the rather satisfying process of factoring the characteristic polynomial or, more often in practical scenarios, by employing numerical root-finding algorithms. The characteristic polynomial can be factored into a product of n linear terms, a process that reveals each individual eigenvalue:
{\displaystyle \det(A-\lambda I)=(\lambda _{1}-\lambda )(\lambda _{2}-\lambda )\cdots (\lambda _{n}-\lambda ),} (4)
where the complex numbers λ 1 , λ 2 , ... , λ n represent the eigenvalues. It's crucial to note that these eigenvalues are not necessarily distinct; some may repeat, a phenomenon quantified by their algebraic multiplicity.
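As a hedged illustration of this relationship (NumPy assumed; the 2 × 2 matrix below is the example explored next), the characteristic polynomial's coefficients and roots can be computed directly:

```python
import numpy as np

# Sketch: the roots of the characteristic polynomial are the eigenvalues.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

coeffs = np.poly(A)            # coefficients of det(t*I - A), highest degree first: [1, -4, 3]
print(np.roots(coeffs))        # [3. 1.] -- the roots of the characteristic polynomial
print(np.linalg.eigvals(A))    # the same values, computed as eigenvalues directly
```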
As a brief illustrative example, to be explored in more detail later, consider the rather unassuming matrix:
{\displaystyle A={\begin{bmatrix}2&1\\1&2\end{bmatrix}}.}
Taking the determinant of ( A − λI ), its characteristic polynomial is calculated as:
{\displaystyle \det(A-\lambda I)={\begin{vmatrix}2-\lambda &1\\1&2-\lambda \end{vmatrix}}=3-4\lambda +\lambda ^{2}.}
Setting this characteristic polynomial to zero, we find its roots at λ = 1 and λ = 3. These, then, are the two eigenvalues of A. The corresponding eigenvectors for each eigenvalue λ can be determined by solving for the components of v in the equation ( A − λI ) v = 0. For this specific example, the eigenvectors are any nonzero scalar multiples of:
{\displaystyle \mathbf {v} _{\lambda =1}={\begin{bmatrix}1\\-1\end{bmatrix}},\quad \mathbf {v} _{\lambda =3}={\begin{bmatrix}1\\1\end{bmatrix}}.}
It's a common misconception that if the entries of matrix A are all real numbers, then its eigenvalues must also be real. While the coefficients of the characteristic polynomial will indeed be real, the eigenvalues themselves may very well possess nonzero imaginary parts – a fact that often surprises those new to the field. Consequently, the entries of the corresponding eigenvectors may also be complex numbers. Similarly, eigenvalues can be irrational numbers even if all entries of A are rational numbers, or even integers. However, a small comfort: if all entries of A are algebraic numbers (which includes all rationals), then the eigenvalues are guaranteed to be algebraic numbers as well.
A salient feature of polynomials with real coefficients is that their non-real roots always manifest in pairs of complex conjugates. This means that for each complex eigenvalue, its conjugate will also be an eigenvalue, with their imaginary parts differing only in sign while sharing the same real part. Furthermore, if the polynomial's degree is odd, the intermediate value theorem guarantees that at least one of its roots must be real. Therefore, any real matrix with an odd order (e.g., 3x3, 5x5) will necessarily possess at least one real eigenvalue. Conversely, a real matrix of even order (e.g., 2x2, 4x4) might, in its infinite capacity for mischief, have no real eigenvalues at all. The eigenvectors associated with these complex eigenvalues are, predictably, also complex and, much like their eigenvalues, appear in complex conjugate pairs.
Spectrum of a matrix
The spectrum of a matrix is not some grand celestial phenomenon, but rather the complete list of its eigenvalues, with each eigenvalue repeated according to its multiplicity. In an alternative, slightly more formal notation, it is the set of eigenvalues along with their respective multiplicities. It’s a comprehensive inventory of a matrix's intrinsic scaling factors.
An important quantity derived from this spectrum is the maximum absolute value among all eigenvalues. This is rather grandly known as the spectral radius of the matrix. It often dictates the convergence properties and long-term behavior of systems described by the matrix.
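A hedged sketch of this quantity (NumPy assumed; the matrix is an arbitrary example) computes the spectral radius as the largest absolute eigenvalue:

```python
import numpy as np

# Sketch: spectral radius = max |eigenvalue|.
A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])                 # eigenvalues are -1 and -2
spectral_radius = max(abs(np.linalg.eigvals(A)))
print(spectral_radius)                       # 2.0
```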
Algebraic multiplicity
Let λ i denote an eigenvalue of an n-by-n matrix A. The algebraic multiplicity, denoted as μ A ( λ i ), of this eigenvalue is defined as its multiplicity as a root of the characteristic polynomial. More precisely, it is the largest integer k such that ( λ − λ i )^k evenly divides that polynomial. This tells us how "many times" an eigenvalue appears as a root.
Suppose a matrix A has dimension n and d distinct eigenvalues, where d ≤ n. While equation ( 4 ) expresses the characteristic polynomial of A as a product of n linear terms (some potentially repeating), it can also be more compactly written as the product of d terms, each corresponding to a distinct eigenvalue and raised to the power of its algebraic multiplicity:
{\displaystyle \det(A-\lambda I)=(\lambda _{1}-\lambda )^{\mu _{A}(\lambda _{1})}(\lambda _{2}-\lambda )^{\mu _{A}(\lambda _{2})}\cdots (\lambda _{d}-\lambda )^{\mu _{A}(\lambda _{d})}.}
If, by some fortunate coincidence, d = n, then the right-hand side is simply the product of n distinct linear terms, which aligns perfectly with equation ( 4 ). The magnitude of each eigenvalue's algebraic multiplicity is inherently constrained by the dimension n, adhering to the following rules:
{\displaystyle {\begin{aligned}1&\leq \mu _{A}(\lambda _{i})\leq n,\\\mu _{A}&=\sum _{i=1}^{d}\mu _{A}\left(\lambda _{i}\right)=n.\end{aligned}}}
If μ A ( λ i ) = 1, the eigenvalue λ i is, rather uncreatively, designated a simple eigenvalue. It's the simplest case, naturally. If μ A ( λ i ) happens to be equal to the geometric multiplicity of λ i , denoted as γ A ( λ i ) (a concept we'll explore shortly), then λ i is deemed a semisimple eigenvalue. This equality implies a certain "well-behaved" nature, allowing for simpler diagonalization.
Eigenspaces, geometric multiplicity, and the eigenbasis for matrices
Given a specific eigenvalue λ of the n × n matrix A, let's define the set E. This set comprises all vectors v that satisfy equation ( 2 ), which, for those who need a reminder, is
{\displaystyle \left(A-\lambda I\right)\mathbf {v} =\mathbf {0} }
. So, quite simply:
{\displaystyle E=\left\{\mathbf {v} :\left(A-\lambda I\right)\mathbf {v} =\mathbf {0} \right\}.}
On one hand, this set E is precisely the kernel or nullspace of the matrix A − λI . This means it contains all vectors that are mapped to the zero vector by the transformation (A - λI). On the other hand, by its very definition, any nonzero vector satisfying this condition is an eigenvector of A associated with λ. Thus, the set E is the union of the zero vector with the collection of all eigenvectors of A corresponding to λ. In essence, E is the nullspace of A − λI . This space E is formally referred to as the eigenspace or characteristic space of A associated with the eigenvalue λ. In general, λ is a complex number, and the eigenvectors are complex n × 1 matrices (column vectors). Because every nullspace inherently constitutes a linear subspace of the domain, E is, by its nature, a linear subspace of
{\displaystyle \mathbb {C} ^{n}}
, the n-dimensional complex vector space.
Because the eigenspace E holds the distinguished status of being a linear subspace, it exhibits the property of being closed under vector addition. This means that if any two vectors u and v belong to the set E (denoted as u , v ∈ E ), then their sum, u + v, will also reside within E. This can be verified by applying the distributive property of matrix multiplication: A(u+v) = Au + Av = λu + λv = λ(u+v). Similarly, due to its nature as a linear subspace, E is also closed under scalar multiplication. If v ∈ E and α is a complex number, then α v ∈ E. This is straightforwardly confirmed by noting the commutative property of scalar multiplication with complex matrices: A(αv) = α(Av) = α(λv) = λ(αv). As long as u + v and α v do not happen to be the zero vector, they too are eigenvectors of A associated with λ, demonstrating the linear structure of these "invariant directions."
The dimension of the eigenspace E associated with λ, or, to put it another way, the maximum number of linearly independent eigenvectors that can be associated with λ, is termed the eigenvalue's geometric multiplicity, denoted as
{\displaystyle \gamma _{A}(\lambda )}
. Since E is, as established, the nullspace of A − λI , the geometric multiplicity of λ is equivalent to the dimension of the nullspace of A − λI , which is also known as the nullity of A − λI . This quantity is intrinsically linked to the size and rank of A − λI by the equation:
{\displaystyle \gamma _{A}(\lambda )=n-\operatorname {rank} (A-\lambda I).}
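A brief, hedged sketch of this formula (NumPy assumed; the defective 2 × 2 matrix is a hypothetical example chosen to make the point):

```python
import numpy as np

# Sketch: geometric multiplicity = n - rank(A - lam*I).
A = np.array([[2.0, 1.0],
              [0.0, 2.0]])        # eigenvalue 2 has algebraic multiplicity 2
lam = 2.0
n = A.shape[0]
geometric_multiplicity = n - np.linalg.matrix_rank(A - lam * np.eye(n))
print(geometric_multiplicity)     # 1, strictly less than the algebraic multiplicity
```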
By the very definition of eigenvalues and eigenvectors, an eigenvalue's geometric multiplicity must be at least one; every eigenvalue, by its nature, must have at least one associated eigenvector. Furthermore, an eigenvalue's geometric multiplicity is inherently bounded and cannot exceed its algebraic multiplicity. Additionally, as we've noted, an eigenvalue's algebraic multiplicity cannot exceed the dimension n. These fundamental relationships are summarized as:
{\displaystyle 1\leq \gamma _{A}(\lambda )\leq \mu _{A}(\lambda )\leq n}
To demonstrate the inequality that the geometric multiplicity cannot exceed the algebraic multiplicity (i.e.,
{\displaystyle \gamma _{A}(\lambda )\leq \mu _{A}(\lambda )}
), let's consider B = A − λI , where λ is a specific complex number, and the eigenspace associated with λ is, by definition, the nullspace of B. Suppose the dimension of this eigenspace is
{\displaystyle k=\gamma _{A}(\lambda )}
. This implies that if we perform Gaussian elimination to bring B into its echelon form, the last k rows of this echelon form will consist entirely of zeros. Consequently, there exists an invertible matrix E, which is the product of the elementary matrices from the Gauss-Jordan reduction, such that:
{\displaystyle EB={\begin{bmatrix}\ast &\ast \\\mathbf {0} _{k\times (n-k)}&\mathbf {0} _{k\times k}\end{bmatrix}}.}
From this structure, it follows that the last k rows of the matrix EB − tE are merely (− t ) times the corresponding last k rows of E. Due to basic properties of determinants (specifically, homogeneity and linearity with respect to rows), the polynomial t^k must evenly divide the polynomial det( EB − tE ). However, we also know that det( EB − tE ) = det E ⋅ det( B − tI ) = p A ( t + λ ) det E , where p A ( t ) is the characteristic polynomial of A. Since det E is a non-zero constant, it implies that ( t − λ )^k divides p A ( t ). This directly demonstrates that the algebraic multiplicity of λ must be at least k, thus proving that geometric multiplicity cannot exceed algebraic multiplicity. It's a rather elegant piece of logical deduction, if you appreciate such things.
Now, let's assume A possesses d distinct eigenvalues, λ 1 , ... , λ d , where d ≤ n, and the geometric multiplicity of each λ i is γ A ( λ i ). The total geometric multiplicity of A, which is simply the sum of the geometric multiplicities of all its distinct eigenvalues, is given by:
{\displaystyle \gamma _{A}=\sum _{i=1}^{d}\gamma _{A}(\lambda _{i}),\quad d\leq \gamma _{A}\leq n,}
This total geometric multiplicity represents the dimension of the sum of all the eigenspaces corresponding to A's eigenvalues, or, equivalently, the maximum number of linearly independent eigenvectors of A that can be found. If, in a particularly fortunate scenario,
{\displaystyle \gamma _{A}=n}
, then several highly desirable conditions hold:
- The direct sum of the eigenspaces associated with all of A's eigenvalues spans the entire vector space {\displaystyle \mathbb {C} ^{n}}. This means that every vector in the space can be uniquely expressed as a sum of vectors from these eigenspaces.
- A basis for {\displaystyle \mathbb {C} ^{n}} can be constructed entirely from n linearly independent eigenvectors of A. Such a basis is, quite fittingly, called an eigenbasis. It's the ideal coordinate system for analyzing the transformation.
- Any vector within {\displaystyle \mathbb {C} ^{n}} can be elegantly expressed as a linear combination of the eigenvectors of A. This is a powerful statement, implying that the eigenvectors essentially form the fundamental building blocks for the entire space under the transformation.
Additional properties
Let A be an arbitrary n × n matrix of complex numbers with eigenvalues λ 1 , ... , λ n . Each eigenvalue appears μ A ( λ i ) times in this list, where μ A ( λ i ) denotes the eigenvalue's algebraic multiplicity. These relationships give rise to several fundamental properties that are, frankly, quite convenient:
- The trace of A, which is defined as the sum of its diagonal elements, also happens to be precisely the sum of all its eigenvalues. This is a rather elegant connection between two seemingly disparate properties:
{\displaystyle \operatorname {tr} (A)=\sum _{i=1}^{n}a_{ii}=\sum _{i=1}^{n}\lambda _{i}=\lambda _{1}+\lambda _{2}+\cdots +\lambda _{n}.}
- Similarly, the determinant of A, that single scalar value that tells us about scaling and invertibility, is also the product of all its eigenvalues. This is another fundamental identity, connecting the overall scaling effect of a matrix to its intrinsic scaling factors:
{\displaystyle \det(A)=\prod _{i=1}^{n}\lambda _{i}=\lambda _{1}\lambda _{2}\cdots \lambda _{n}.}
- Should one consider the kth power of A (i.e., A^k, for any positive integer k), its eigenvalues are simply the kth powers of the original eigenvalues: λ_1^k, ... , λ_n^k. This makes calculating powers of matrices significantly easier if you can diagonalize them.
- The matrix A is invertible if and only if every single one of its eigenvalues is nonzero. If even one eigenvalue is zero, it means at least one direction is collapsed to the origin, making the transformation irreversible.
- If A is, in fact, invertible, then the eigenvalues of its inverse, A^−1, are simply the reciprocals of the original eigenvalues:
{\textstyle {\frac {1}{\lambda _{1}}},\ldots ,{\frac {1}{\lambda _{n}}}}
. Furthermore, each eigenvalue's geometric multiplicity remains the same. As a bonus, since the characteristic polynomial of the inverse matrix is a reciprocal polynomial of the original, the eigenvalues also share the same algebraic multiplicity.
- If A is equal to its conjugate transpose A ∗ (or, equivalently, if A is a Hermitian matrix), then every single eigenvalue is guaranteed to be a real number. The same holds true for any symmetric real matrix, a special case of a Hermitian matrix. This is a remarkably useful property in physics and engineering.
- If A is not only Hermitian but also positive-definite, then all its eigenvalues are strictly positive. If it's positive-semidefinite, they are non-negative. Conversely, for negative-definite and negative-semidefinite matrices, the eigenvalues are strictly negative or non-positive, respectively. These properties are crucial for understanding the "energy" or "definiteness" of a system.
- If A is a unitary matrix, a transformation that preserves inner products (and thus lengths), then every eigenvalue λ i has an absolute value of 1 (| λ i | = 1). This means they lie on the unit circle in the complex plane, indicating a rotational or reflective action.
- More generally, if A is an n × n matrix and { λ 1 , ... , λ k } are its eigenvalues, then the eigenvalues of the matrix I + A (where I is the identity matrix) are simply { λ 1 + 1, ... , λ k + 1}. Even more broadly, if
{\displaystyle \alpha \in \mathbb {C} }
is any complex scalar, the eigenvalues of αI + A are { λ 1 + α , ... , λ k + α }. This pattern extends to polynomials: for any polynomial P, the eigenvalues of the matrix P ( A ) are { P ( λ 1 ), ... , P ( λ k )}. This means that if you know the eigenvalues of A, you can easily find the eigenvalues of any polynomial function of A.
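A hedged numerical sketch of a few of these properties (NumPy assumed; the random 4 × 4 matrix is arbitrary):

```python
import numpy as np

# Sketch: trace = sum of eigenvalues, determinant = product, and shifting by alpha*I
# shifts every eigenvalue by alpha.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
eigs = np.linalg.eigvals(A)

print(np.isclose(np.trace(A), eigs.sum()))          # trace equals the sum of eigenvalues
print(np.isclose(np.linalg.det(A), eigs.prod()))    # determinant equals the product
alpha = 2.0
shifted = np.linalg.eigvals(A + alpha * np.eye(4))
print(np.allclose(np.sort(shifted), np.sort(eigs + alpha)))   # eigenvalues shift by alpha
```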
Left and right eigenvectors
- See also: left and right (algebra)
In many disciplines, especially those with a practical bent, vectors are conventionally represented as matrices with a single column. This convention leads to the term "eigenvector" in the context of matrices almost invariably referring to a right eigenvector. This is a column vector that performs a right multiplication of the n × n matrix A in the defining equation, equation ( 1 ), which we've seen countless times by now:
{\displaystyle A\mathbf {v} =\lambda \mathbf {v} .}
However, the eigenvalue and eigenvector problem can also be formulated for row vectors that perform a left multiplication of matrix A. In this alternative formulation, the defining equation becomes:
{\displaystyle \mathbf {u} A=\kappa \mathbf {u} ,}
where κ is a scalar and u is a 1 × n matrix (a row vector). Any row vector u that satisfies this equation is termed a left eigenvector of A, and κ is its associated eigenvalue. To connect this to our more familiar right eigenvector concept, one can take the transpose of this equation:
{\displaystyle A^{\textsf {T}}\mathbf {u} ^{\textsf {T}}=\kappa \mathbf {u} ^{\textsf {T}}.}
Comparing this transposed equation to equation ( 1 ), it immediately becomes clear that a left eigenvector of A is nothing more than the transpose of a right eigenvector of A^T, and, crucially, they share the exact same eigenvalue. Furthermore, since the characteristic polynomial of A^T is identical to the characteristic polynomial of A, it logically follows that the left and right eigenvectors of A are associated with the very same set of eigenvalues. It's almost as if the universe has a sense of symmetry.
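The following hedged sketch (NumPy assumed; the 2 × 2 matrix is an arbitrary non-symmetric example) checks that left eigenvectors of A are right eigenvectors of A^T with the same eigenvalues:

```python
import numpy as np

# Sketch: left eigenvectors of A = right eigenvectors of A.T, with identical eigenvalues.
A = np.array([[1.0, 2.0],
              [0.0, 3.0]])

right_vals, right_vecs = np.linalg.eig(A)    # A v = lambda v
left_vals, left_vecs = np.linalg.eig(A.T)    # A.T u = kappa u, i.e. u.T A = kappa u.T

print(np.allclose(np.sort(right_vals), np.sort(left_vals)))  # same set of eigenvalues
u = left_vecs[:, 0]
print(np.allclose(u @ A, left_vals[0] * u))                  # u acts as a left eigenvector of A
```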
Diagonalization and the eigendecomposition
- Main article: Eigendecomposition of a matrix
Suppose, for a moment, that the eigenvectors of matrix A are sufficiently numerous and well-behaved to form a basis for the entire vector space. This is equivalent to stating that A possesses n linearly independent eigenvectors, let's call them v 1 , v 2 , ..., v n , each paired with its associated eigenvalue λ 1 , λ 2 , ..., λ n . It's worth noting that these eigenvalues do not necessarily need to be distinct. In this felicitous scenario, we can construct a square matrix Q whose columns are precisely these n linearly independent eigenvectors of A:
{\displaystyle Q={\begin{bmatrix}\mathbf {v} _{1}&\mathbf {v} _{2}&\cdots &\mathbf {v} _{n}\end{bmatrix}}.}
Because each column of Q is, by design, an eigenvector of A, the operation of right-multiplying A by Q has a very specific, simplifying effect: it scales each column of Q by its corresponding eigenvalue. This results in:
{\displaystyle AQ={\begin{bmatrix}\lambda _{1}\mathbf {v} _{1}&\lambda _{2}\mathbf {v} _{2}&\cdots &\lambda _{n}\mathbf {v} _{n}\end{bmatrix}}.}
With this insight, let us define a diagonal matrix Λ, where each diagonal element Λ ii is precisely the eigenvalue associated with the ith column of Q. With this construction, the previous equation can be rewritten with remarkable elegance as:
{\displaystyle AQ=Q\Lambda .}
Since the columns of Q are, by our initial assumption, linearly independent, Q is necessarily invertible. Armed with this invertibility, we can right-multiply both sides of the equation by Q −1 , yielding:
{\displaystyle A=Q\Lambda Q^{-1},}
Alternatively, by left-multiplying both sides by Q −1 , we arrive at an equally important form:
{\displaystyle Q^{-1}AQ=\Lambda .}
This process, where A is broken down into a matrix of its eigenvectors, a diagonal matrix containing its eigenvalues, and the inverse of the eigenvector matrix, is known as the eigendecomposition. It is, in essence, a similarity transformation. A matrix A that can be subjected to this process is said to be similar to the diagonal matrix Λ, or, more simply, diagonalizable. The matrix Q, in this context, serves as the change of basis matrix for the similarity transformation. Fundamentally, matrices A and Λ represent the identical linear transformation, but expressed within two distinct bases. The eigenvectors themselves are utilized as the basis when the linear transformation is represented by Λ, which is often a far simpler form to work with.
Conversely, imagine a matrix A that is known to be diagonalizable. This implies there exists a non-singular square matrix P such that P −1 AP results in some diagonal matrix D. If we left-multiply both sides of this expression by P, we get AP = PD. From this, it follows that each column of P must be an eigenvector of A, with its corresponding eigenvalue being the diagonal element from D in the same column. Since the columns of P must be linearly independent for P to be invertible, we can conclude that there exist n linearly independent eigenvectors of A. This leads to a crucial insight: the eigenvectors of A form a basis for the vector space if and only if A is diagonalizable.
A matrix that stubbornly refuses to be diagonalizable is termed a defective matrix. For these less cooperative matrices, the straightforward notion of eigenvectors must be generalized to generalized eigenvectors, and the elegant diagonal matrix of eigenvalues expands into the more complex, but still structured, Jordan normal form. Over an algebraically closed field, such as the complex numbers, any matrix A is guaranteed to possess a Jordan normal form. This means it admits a basis composed of generalized eigenvectors and a decomposition into generalized eigenspaces, ensuring that even the "defective" matrices can be understood, albeit with a bit more effort.
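A hedged sketch of the eigendecomposition in practice (NumPy assumed; the symmetric 2 × 2 example matrix from earlier is reused because it is diagonalizable):

```python
import numpy as np

# Sketch: eigendecomposition A = Q Lambda Q^{-1} for a diagonalizable matrix.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigenvalues, Q = np.linalg.eig(A)     # columns of Q are linearly independent eigenvectors
Lambda = np.diag(eigenvalues)

print(np.allclose(A, Q @ Lambda @ np.linalg.inv(Q)))   # A = Q Lambda Q^{-1}
print(np.allclose(np.linalg.inv(Q) @ A @ Q, Lambda))   # Q^{-1} A Q = Lambda
```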
Variational characterization
- Main article: Min-max theorem
In the specific realm of Hermitian matrices (which include symmetric real matrices as a special case), eigenvalues are not merely roots of a polynomial; they can be given a powerful variational characterization. The largest eigenvalue of a Hermitian matrix H, for instance, corresponds to the maximum value that the quadratic form x^T H x / x^T x can attain. Furthermore, any vector x that actually achieves this maximum value is, by definition, an eigenvector of H. This principle is formally captured by the Min-max theorem, a cornerstone of spectral theory for these matrices. It provides a way to "feel out" the eigenvalues by probing the matrix's action on vectors.
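A hedged numerical sketch of this characterization (NumPy assumed; the symmetric matrix and the random probe vectors are arbitrary):

```python
import numpy as np

# Sketch: for a symmetric matrix H, the Rayleigh quotient x^T H x / x^T x is maximized
# by an eigenvector of the largest eigenvalue, and never exceeds that eigenvalue.
H = np.array([[2.0, 1.0],
              [1.0, 2.0]])

def rayleigh(x):
    return x @ H @ x / (x @ x)

eigenvalues, eigenvectors = np.linalg.eigh(H)            # eigh is tailored to Hermitian matrices
top = eigenvectors[:, np.argmax(eigenvalues)]
print(rayleigh(top))                                     # 3.0, the largest eigenvalue

probes = np.random.default_rng(1).standard_normal((1000, 2))
print(max(rayleigh(x) for x in probes) <= 3.0 + 1e-9)    # True: no vector beats the top eigenvector
```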
Matrix examples
Two-dimensional matrix example
The transformation matrix A = [ 2 1; 1 2 ] demonstrates a clear visual effect: it preserves the direction of magenta vectors that are parallel to v_{λ=1} = [1 −1]^T and of blue vectors parallel to v_{λ=3} = [1 1]^T. These are, quite obviously, the eigenvectors. Conversely, the red vectors, not being parallel to either of these special directions, have their directions altered by the transformation. The magenta vectors, corresponding to an eigenvalue of 1, experience no change in length, while the blue vectors, with an eigenvalue of 3, are stretched to three times their original length. For those who need more, an extended version showing all four quadrants is available. It's a rather neat visual summary of the concept.
Let's revisit the matrix:
{\displaystyle A={\begin{bmatrix}2&1\\1&2\end{bmatrix}}.}
The accompanying figure on the right provides a visual representation of how this linear transformation impacts point coordinates within the plane. The eigenvectors v of this transformation are precisely those vectors that satisfy equation ( 1 ), and the values of λ for which the determinant of the matrix ( A − λI ) equals zero are, of course, the eigenvalues.
To explicitly calculate the characteristic polynomial of A, we take the determinant of ( A − λI ):
{\displaystyle {\begin{aligned}\det(A-\lambda I)&=\left|{\begin{bmatrix}2&1\\1&2\end{bmatrix}}-\lambda {\begin{bmatrix}1&0\\0&1\end{bmatrix}}\right|={\begin{vmatrix}2-\lambda &1\\1&2-\lambda \end{vmatrix}}\\[6pt]&=(2-\lambda )(2-\lambda )-(1)(1)\\[6pt]&=4-4\lambda +\lambda ^{2}-1\\[6pt]&=3-4\lambda +\lambda ^{2}\\[6pt]&=(\lambda -3)(\lambda -1).\end{aligned}}}
Setting this characteristic polynomial equal to zero, we readily find its roots at λ = 1 and λ = 3. These, then, are the two eigenvalues of A.
Now, let's determine the eigenvectors. For λ = 1, equation ( 2 ) transforms into:
{\displaystyle (A-I)\mathbf {v} _{\lambda =1}={\begin{bmatrix}2-1&1\\1&2-1\end{bmatrix}}{\begin{bmatrix}v_{1}\\v_{2}\end{bmatrix}}={\begin{bmatrix}1&1\\1&1\end{bmatrix}}{\begin{bmatrix}v_{1}\\v_{2}\end{bmatrix}}={\begin{bmatrix}0\\0\end{bmatrix}}}
This matrix equation expands to a system of linear equations that reduces to the single condition {\displaystyle 1v_{1}+1v_{2}=0}.
Any nonzero vector where v 1 = − v 2 will satisfy this equation. Therefore, a representative eigenvector corresponding to λ = 1 is:
{\displaystyle \mathbf {v} _{\lambda =1}={\begin{bmatrix}v_{1}\\-v_{1}\end{bmatrix}}={\begin{bmatrix}1\\-1\end{bmatrix}}}
And, of course, any scalar multiple of this vector is also a valid eigenvector for λ = 1.
Next, for λ = 3, equation ( 2 ) becomes:
{\displaystyle (A-3I)\mathbf {v} _{\lambda =3}={\begin{bmatrix}2-3&1\\1&2-3\end{bmatrix}}{\begin{bmatrix}v_{1}\\v_{2}\end{bmatrix}}={\begin{bmatrix}-1&1\\1&-1\end{bmatrix}}{\begin{bmatrix}v_{1}\\v_{2}\end{bmatrix}}={\begin{bmatrix}0\\0\end{bmatrix}}}
This matrix equation expands to the system {\displaystyle -1v_{1}+1v_{2}=0;\quad 1v_{1}-1v_{2}=0.}
Both equations simplify to v 1 = v 2 . Hence, any nonzero vector where v 1 = v 2 is a solution. A representative eigenvector for λ = 3 is:
{\displaystyle \mathbf {v} _{\lambda =3}={\begin{bmatrix}v_{1}\\v_{1}\end{bmatrix}}={\begin{bmatrix}1\\1\end{bmatrix}}}
And, predictably, any scalar multiple of this vector also serves as an eigenvector for λ = 3.
Thus, we confirm that the vectors v λ =1 and v λ =3 are indeed the eigenvectors of A, corresponding to the eigenvalues λ = 1 and λ = 3, respectively. It’s a straightforward, if somewhat tedious, process.
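For readers who prefer a numerical cross-check, the hedged sketch below (NumPy assumed) reproduces this example; note that np.linalg.eig returns unit-length eigenvectors, i.e. scalar multiples of the vectors found above.

```python
import numpy as np

# Sketch: cross-checking the worked 2x2 example.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)    # the values 3 and 1 (order may vary)
print(eigenvectors)   # columns proportional to [1, 1] and [1, -1]
```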
Three-dimensional matrix example
Let's escalate to three dimensions with the matrix:
{\displaystyle A={\begin{bmatrix}2&0&0\\0&3&4\\0&4&9\end{bmatrix}}.}
Calculating the characteristic polynomial of A involves finding the determinant of ( A − λI ):
{\displaystyle \det(A-\lambda I)=\left|{\begin{bmatrix}2&0&0\\0&3&4\\0&4&9\end{bmatrix}}-\lambda {\begin{bmatrix}1&0&0\\0&1&0\\0&0&1\end{bmatrix}}\right|={\begin{vmatrix}2-\lambda &0&0\\0&3-\lambda &4\\0&4&9-\lambda \end{vmatrix}}.}
Expanding along the first row (the block-diagonal structure makes this straightforward), this simplifies to:
{\displaystyle {\begin{aligned}(2-\lambda ){\bigl [}(3-\lambda )(9-\lambda )-4\cdot 4{\bigr ]}&=(2-\lambda )(27-12\lambda +\lambda ^{2}-16)\\&=(2-\lambda )(\lambda ^{2}-12\lambda +11)\\&=(2-\lambda )(\lambda -1)(\lambda -11)\\&=-\lambda ^{3}+14\lambda ^{2}-35\lambda +22.\end{aligned}}}
The roots of this characteristic polynomial are, rather conveniently, 2, 1, and 11. These are the three distinct eigenvalues of A. These eigenvalues correspond to the eigenvectors [1 0 0]^T, [0 −2 1]^T, and [0 1 2]^T, respectively, or, as always, any nonzero scalar multiple thereof. It's a slightly more involved calculation, but the principle remains the same.
Three-dimensional matrix example with complex eigenvalues
Now for something a little less straightforward: consider the cyclic permutation matrix:
{\displaystyle A={\begin{bmatrix}0&1&0\\0&0&1\\1&0&0\end{bmatrix}}.}
This matrix performs a rather neat trick: it shifts the coordinates of a vector up by one position, and then takes the first coordinate and moves it to the bottom. Its characteristic polynomial is 1 − λ^3, the roots of which are not all real, as you might expect given the cyclic nature of the transformation:
{\displaystyle {\begin{aligned}\lambda _{1}&=1\\\lambda _{2}&=-{\frac {1}{2}}+i{\frac {\sqrt {3}}{2}}\\\lambda _{3}&=\lambda _{2}^{*}=-{\frac {1}{2}}-i{\frac {\sqrt {3}}{2}}\end{aligned}}}
where i is the ubiquitous imaginary unit with i^2 = −1. These complex eigenvalues are, as expected for real matrices, a complex conjugate pair.
For the sole real eigenvalue, λ 1 = 1, any vector with three equal nonzero entries will serve as an eigenvector. For example:
{\displaystyle A{\begin{bmatrix}5\\5\\5\end{bmatrix}}={\begin{bmatrix}5\\5\\5\end{bmatrix}}=1\cdot {\begin{bmatrix}5\\5\\5\end{bmatrix}}.}
This makes intuitive sense: if all components are the same, shifting them cyclically leaves the vector unchanged.
Now, for the complex conjugate pair of eigenvalues, we note their interesting relationships:
{\displaystyle \lambda _{2}\lambda _{3}=1,\quad \lambda _{2}^{2}=\lambda _{3},\quad \lambda _{3}^{2}=\lambda _{2}.}
Using these, we can find the complex eigenvectors:
{\displaystyle A{\begin{bmatrix}1\\\lambda _{2}\\\lambda _{3}\end{bmatrix}}={\begin{bmatrix}\lambda _{2}\\\lambda _{3}\\1\end{bmatrix}}=\lambda _{2}\cdot {\begin{bmatrix}1\\\lambda _{2}\\\lambda _{3}\end{bmatrix}},}
And similarly for the other complex eigenvalue:
{\displaystyle A{\begin{bmatrix}1\\\lambda _{3}\\\lambda _{2}\end{bmatrix}}={\begin{bmatrix}\lambda _{3}\\\lambda _{2}\\1\end{bmatrix}}=\lambda _{3}\cdot {\begin{bmatrix}1\\\lambda _{3}\\\lambda _{2}\end{bmatrix}}.}
Therefore, the remaining two eigenvectors of A are complex: v λ2 = [1 λ2 λ3]^T and v λ3 = [1 λ3 λ2]^T, corresponding to eigenvalues λ 2 and λ 3, respectively. As predicted, these two complex eigenvectors also form a complex conjugate pair, demonstrating the underlying symmetry even in the complex domain:
{\displaystyle \mathbf {v} _{\lambda _{2}}=\mathbf {v} _{\lambda _{3}}^{*}.}
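A hedged numerical sketch (NumPy assumed) confirming the complex conjugate eigenvalues of this permutation matrix:

```python
import numpy as np

# Sketch: the cyclic permutation matrix has one real eigenvalue and a complex conjugate pair.
A = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])
eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)   # approximately -0.5 +/- 0.866j and 1.0 (order may vary)
```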
Diagonal matrix example
Matrices that possess entries exclusively along their main diagonal are, rather uncreatively, termed diagonal matrices. A rather convenient property of these matrices is that their eigenvalues are simply the diagonal elements themselves. Let's consider an example:
{\displaystyle A={\begin{bmatrix}1&0&0\\0&2&0\\0&0&3\end{bmatrix}}.}
The characteristic polynomial of A is derived by taking the determinant of ( A − λI ):
{\displaystyle \det(A-\lambda I)=(1-\lambda )(2-\lambda )(3-\lambda ),}
which, when set to zero, clearly yields the roots λ 1 = 1, λ 2 = 2, and λ 3 = 3. These roots are, as expected, precisely the diagonal elements and, consequently, the eigenvalues of A. It's almost too easy.
Each diagonal element corresponds to an eigenvector whose only nonzero component resides in the same row as that diagonal element. In our example, the eigenvalues correspond to the following eigenvectors:
{\displaystyle \mathbf {v} _{\lambda _{1}}={\begin{bmatrix}1\\0\\0\end{bmatrix}},\quad \mathbf {v} _{\lambda _{2}}={\begin{bmatrix}0\\1\\0\end{bmatrix}},\quad \mathbf {v} _{\lambda _{3}}={\begin{bmatrix}0\\0\\1\end{bmatrix}},}
respectively. And, naturally, any scalar multiples of these vectors are also valid eigenvectors. This is the simplest possible case, providing a clear intuition for what eigenvectors and eigenvalues represent.
Triangular matrix example
A matrix where all elements situated above the main diagonal are zero is designated a lower triangular matrix. Conversely, if all elements below the main diagonal are zero, it's an upper triangular matrix. Much like diagonal matrices, a similarly convenient property holds for triangular matrices: their eigenvalues are simply the elements residing on the main diagonal. Another simplification for our benefit.
Consider the following lower triangular matrix:
{\displaystyle A={\begin{bmatrix}1&0&0\\1&2&0\\2&3&3\end{bmatrix}}.}
The characteristic polynomial of A is determined by calculating the determinant of ( A − λI ):
{\displaystyle \det(A-\lambda I)=(1-\lambda )(2-\lambda )(3-\lambda ),}
which yields the roots λ 1 = 1, λ 2 = 2, and λ 3 = 3. These roots are, as predicted, precisely the diagonal elements and, therefore, the eigenvalues of A.
These eigenvalues correspond to the following eigenvectors, which require a bit more calculation than for a diagonal matrix, but still follow the standard procedure:
{\displaystyle \mathbf {v} _{\lambda _{1}}={\begin{bmatrix}1\\-1\\{\frac {1}{2}}\end{bmatrix}},\quad \mathbf {v} _{\lambda _{2}}={\begin{bmatrix}0\\1\\-3\end{bmatrix}},\quad \mathbf {v} _{\lambda _{3}}={\begin{bmatrix}0\\0\\1\end{bmatrix}},}
respectively, along with their inevitable scalar multiples.
Matrix with repeated eigenvalues example
As demonstrated in the previous examples, the eigenvalues of a lower triangular matrix are simply its diagonal elements. Let's consider a matrix where some of these eigenvalues happen to be repeated:
{\displaystyle A={\begin{bmatrix}2&0&0&0\\1&2&0&0\\0&1&3&0\\0&0&1&3\end{bmatrix}},}
Its characteristic polynomial is, by the property of triangular matrices, the product of its diagonal elements:
{\displaystyle \det(A-\lambda I)={\begin{vmatrix}2-\lambda &0&0&0\\1&2-\lambda &0&0\\0&1&3-\lambda &0\\0&0&1&3-\lambda \end{vmatrix}}=(2-\lambda )^{2}(3-\lambda )^{2}.}
The roots of this polynomial, and thus the eigenvalues, are 2 and 3. In this case, the algebraic multiplicity of each eigenvalue is 2; they are both double roots. The sum of the algebraic multiplicities of all distinct eigenvalues is μ A = 2 + 2 = 4, which correctly equals n, the order of the characteristic polynomial and the dimension of A. So far, so predictable.
However, a subtle but critical distinction emerges when we consider the geometric multiplicity. For the eigenvalue 2, its geometric multiplicity is only 1. This is because its eigenspace is spanned by just one linearly independent vector, which can be found to be [0 1 −1 1]^T, making its eigenspace 1-dimensional. Similarly, for the eigenvalue 3, its geometric multiplicity is also 1, as its eigenspace is spanned by a single vector, [0 0 0 1]^T. The total geometric multiplicity γ A for this matrix is 1 + 1 = 2, which is the smallest it could possibly be for a matrix with two distinct eigenvalues. This disparity between algebraic and geometric multiplicities indicates that this matrix is defective, meaning it cannot be fully diagonalized using only its eigenvectors. This concept of geometric multiplicity, and its often-unfortunate divergence from algebraic multiplicity, is a crucial detail, as defined in an earlier section.
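As a hedged check of this example (NumPy assumed), the geometric multiplicities can be computed from ranks:

```python
import numpy as np

# Sketch: algebraic multiplicity 2 but geometric multiplicity 1 for both eigenvalues.
A = np.array([[2.0, 0.0, 0.0, 0.0],
              [1.0, 2.0, 0.0, 0.0],
              [0.0, 1.0, 3.0, 0.0],
              [0.0, 0.0, 1.0, 3.0]])
n = A.shape[0]
for lam in (2.0, 3.0):
    geom = n - np.linalg.matrix_rank(A - lam * np.eye(n))
    print(lam, geom)   # prints 2.0 1 and 3.0 1: the matrix is defective
```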
Eigenvector-eigenvalue identity
For a Hermitian matrix A – a class of matrices that often simplifies life in quantum mechanics – the square of the absolute value of the α-th component of a normalized eigenvector can be computed using a rather elegant identity that relies solely on the matrix's eigenvalues and the eigenvalues of its corresponding minor matrix. This identity is expressed as:
{\displaystyle |v_{i\alpha }|^{2}={\frac {\prod _{k}(\lambda _{i}(A)-\lambda _{k}(A_{\alpha }))}{\prod _{k\neq i}(\lambda _{i}(A)-\lambda _{k}(A))}},}
where
{\textstyle A_{\alpha }}
denotes the submatrix formed by the rather surgical removal of the α-th row and column from the original matrix A. This identity, a surprisingly fundamental piece of linear algebra, also extends its utility to diagonalizable matrices. It has been, rather amusingly, "rediscovered" numerous times throughout the mathematical literature, suggesting its inherent elegance and utility.
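A hedged numerical check of this identity (NumPy assumed; the symmetric 3 × 3 matrix and the chosen indices are arbitrary):

```python
import numpy as np

# Sketch: verify the eigenvector-eigenvalue identity for one component of one eigenvector.
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
lam, V = np.linalg.eigh(A)          # normalized eigenvectors are the columns of V

i, alpha = 0, 1                      # i-th eigenvector, alpha-th component
A_minor = np.delete(np.delete(A, alpha, axis=0), alpha, axis=1)   # remove row and column alpha
lam_minor = np.linalg.eigvalsh(A_minor)

lhs = abs(V[alpha, i]) ** 2
rhs = np.prod(lam[i] - lam_minor) / np.prod([lam[i] - lam[k] for k in range(len(lam)) if k != i])
print(np.isclose(lhs, rhs))          # True
```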
Eigenvalues and eigenfunctions of differential operators
- Main article: Eigenfunction
The fundamental definitions of eigenvalues and eigenvectors, typically introduced in the finite-dimensional world of matrices, remain remarkably valid even when the underlying vector space expands into the boundless realms of infinite-dimensional spaces, such as a Hilbert space or a Banach space. A particularly prevalent class of linear transformations that operate within these infinite-dimensional spaces are the differential operators acting on function spaces – a concept crucial in fields like quantum mechanics and signal processing.
Let D be a linear differential operator acting on the space
{\displaystyle C^{\infty }(\mathbb {R} )}
, which comprises all infinitely differentiable real functions of a real argument t. The eigenvalue equation for such an operator D takes the form of a differential equation:
{\displaystyle Df(t)=\lambda f(t)}
The functions that manage to satisfy this equation are, quite appropriately, termed eigenvectors of D. However, given their functional nature, they are more commonly and specifically referred to as eigenfunctions. They are the functions that, when acted upon by the differential operator, simply scale without changing their fundamental form.
Derivative operator example
Consider the most fundamental of differential operators, the derivative operator, denoted as
{\displaystyle {\tfrac {d}{dt}}}
. Its corresponding eigenvalue equation is:
{\displaystyle {\frac {d}{dt}}f(t)=\lambda f(t).}
This particular differential equation is, thankfully, quite solvable. One can achieve this by multiplying both sides by dt / f ( t ) and then performing integration. The solution, which is the ubiquitous exponential function
{\displaystyle f(t)=f(0)e^{\lambda t},}
is, rather elegantly, the eigenfunction of the derivative operator. In this specific and rather instructive case, the eigenfunction itself is intrinsically a function of its associated eigenvalue, λ. Notably, for λ = 0, the eigenfunction f ( t ) simplifies to a constant. This means that constants are functions whose rate of change is zero, a fact that should surprise precisely no one.
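One can confirm the eigenfunction property symbolically; this is a minimal sketch assuming SymPy is available:

import sympy as sp

t, lam = sp.symbols('t lambda')
f = sp.exp(lam * t)                           # candidate eigenfunction, taking f(0) = 1
print(sp.simplify(sp.diff(f, t) - lam * f))   # prints 0: (d/dt) f = lambda * f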
General definition
The profound concept of eigenvalues and eigenvectors, while often introduced through the lens of matrices, extends quite naturally and elegantly to arbitrary linear transformations operating within arbitrary vector spaces. This is where the true power of abstraction lies. Let V represent any vector space defined over some field K of scalars, and let T denote a linear transformation that maps V into itself:
{\displaystyle T:V\to V.}
We formally state that a nonzero vector v ∈ V qualifies as an eigenvector of T if and only if there exists a scalar λ ∈ K such that the following fundamental equation holds:
{\displaystyle T(\mathbf {v} )=\lambda \mathbf {v} .}
This equation, fundamental to the entire theory, is known as the eigenvalue equation for T. The scalar λ, which quantifies the scaling effect, is the eigenvalue of T corresponding to the eigenvector v. It is essential to grasp that T ( v ) represents the outcome of applying the transformation T to the vector v , while λ v is simply the product of the scalar λ with the vector v . This distinction, while subtle, is critical for understanding the nature of these intrinsic properties.
Eigenspaces, geometric multiplicity, and the eigenbasis
Given a specific eigenvalue λ, let us consider the set E, which is defined as:
{\displaystyle E=\left\{\mathbf {v} :T(\mathbf {v} )=\lambda \mathbf {v} \right\},}
This set E encompasses the union of the zero vector with all the eigenvectors associated with λ. This collection E is formally known as the eigenspace or characteristic space of T associated with λ. Crucially, E is also precisely the kernel (or nullspace) of the linear transformation T − λI: a vector is sent to zero by T − λI exactly when it belongs to this eigenspace.
By the very definition of a linear transformation, it adheres to the following properties:
{\displaystyle {\begin{aligned}T(\mathbf {x} +\mathbf {y} )&=T(\mathbf {x} )+T(\mathbf {y} ),\\T(\alpha \mathbf {x} )&=\alpha T(\mathbf {x} ),\end{aligned}}}
for any vectors x , y ∈ V and any scalar α ∈ K . Therefore, if u and v are both eigenvectors of T corresponding to the same eigenvalue λ (meaning u , v ∈ E ), then their sum and scalar multiples also exhibit this behavior:
{\displaystyle {\begin{aligned}T(\mathbf {u} +\mathbf {v} )&=T(\mathbf {u} )+T(\mathbf {v} )=\lambda \mathbf {u} +\lambda \mathbf {v} =\lambda (\mathbf {u} +\mathbf {v} ),\\T(\alpha \mathbf {v} )&=\alpha T(\mathbf {v} )=\alpha (\lambda \mathbf {v} )=\lambda (\alpha \mathbf {v} ).\end{aligned}}}
Thus, both u + v and α v are either the zero vector or, more interestingly, eigenvectors of T also associated with λ. This means that u + v and α v both belong to E, establishing that E is closed under vector addition and scalar multiplication. Consequently, the eigenspace E associated with λ is, by definition, a linear subspace of V. If this subspace happens to have a dimension of 1, it is sometimes, rather poetically, referred to as an eigenline.
The geometric multiplicity, denoted as γ T ( λ ), of an eigenvalue λ is defined as the dimension of the eigenspace associated with λ. Equivalently, it is the maximum number of linearly independent eigenvectors that can be associated with that specific eigenvalue. By the very definition of eigenvalues and eigenvectors, γ T ( λ ) must be at least 1, as every eigenvalue is guaranteed to have at least one corresponding eigenvector.
The eigenspaces of T, each corresponding to a distinct eigenvalue, invariably form a direct sum. A significant consequence of this is that eigenvectors corresponding to different eigenvalues are always linearly independent. Therefore, the sum of the dimensions of all the eigenspaces cannot possibly exceed the dimension n of the vector space V upon which T operates. This also implies that there cannot be more than n distinct eigenvalues, a rather reassuring upper bound.
Any subspace that is spanned by the eigenvectors of T forms an invariant subspace of T. Furthermore, the restriction of T to such a subspace is, conveniently, diagonalizable. More importantly, if the entire vector space V can be spanned by the eigenvectors of T – or, to state it another way, if the direct sum of all the eigenspaces associated with T's eigenvalues constitutes the entire vector space V – then a basis for V, aptly named an eigenbasis, can be constructed from linearly independent eigenvectors of T. When T is so accommodating as to admit an eigenbasis, T is, by definition, diagonalizable. This is the ideal scenario for simplifying the analysis of the transformation.
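As a small numerical illustration of that ideal scenario, here is a minimal sketch assuming NumPy; the 2 × 2 matrix is the one used in the calculation examples further down, and since it has two distinct eigenvalues its eigenvectors form an eigenbasis:

import numpy as np

A = np.array([[4.0, 1.0],
              [6.0, 3.0]])
evals, P = np.linalg.eig(A)     # columns of P are eigenvectors; together they form an eigenbasis
D = np.diag(evals)
print(np.allclose(A, P @ D @ np.linalg.inv(P)))   # True: A = P D P^{-1}, i.e. A is diagonalizable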
Spectral theory
- Main article: Spectral theory
If λ happens to be an eigenvalue of T, then the operator ( T − λI ) is demonstrably not one-to-one. This implies that its inverse, ( T − λI ) −1 , simply does not exist. This direct relationship holds true for finite-dimensional vector spaces. However, in the more expansive realm of infinite-dimensional vector spaces, the converse is not necessarily true; the operator ( T − λI ) might still lack an inverse even if λ is not an eigenvalue. It seems infinite dimensions delight in adding complexity.
For this very reason, in the more abstract domain of functional analysis, eigenvalues are elegantly generalized to the concept of the spectrum of a linear operator T. This spectrum is defined as the set of all scalars λ for which the operator ( T − λI ) does not possess a bounded inverse. Crucially, the spectrum of an operator always encompasses all its eigenvalues, but it is not strictly confined to them; it's a broader, more inclusive set that captures a wider range of the operator's intrinsic properties.
Associative algebras and representation theory
- Main article: Weight (representation theory)
One can, if one so desires, generalize the algebraic object that acts upon the vector space. Instead of a singular operator, we can consider an algebra representation – essentially, an associative algebra acting on a module. The comprehensive study of such actions forms the rich and complex field of representation theory.
Within this framework, the representation-theoretical concept of weight serves as a direct analog to eigenvalues. Similarly, weight vectors and weight spaces are the corresponding analogs of eigenvectors and eigenspaces, respectively. It's a way of abstracting the core ideas to broader algebraic structures.
A rather specialized, but related, concept is the Hecke eigensheaf, which is characterized as a tensor-multiple of itself and finds its place in the intricate tapestry of the Langlands correspondence – a topic that, for most, remains delightfully opaque.
Dynamic equations
The most straightforward difference equations take the general form:
{\displaystyle x_{t}=a_{1}x_{t-1}+a_{2}x_{t-2}+\cdots +a_{k}x_{t-k}.}
The solution for x in terms of t for this equation is elegantly discovered by employing its characteristic equation:
{\displaystyle \lambda ^{k}-a_{1}\lambda ^{k-1}-a_{2}\lambda ^{k-2}-\cdots -a_{k-1}\lambda -a_{k}=0,}
This characteristic equation can be derived by constructing a matrix form of a system of equations. This system consists of the original difference equation augmented by k – 1 trivial identities (e.g., x t –1 = x t –1 , ..., x t – k +1 = x t – k +1 ). This yields a k-dimensional first-order system for the stacked variable vector [ x t ⋅⋅⋅ x t – k +1 ] T in terms of its once-lagged value. The characteristic equation of this system's matrix then provides the necessary roots. This equation, when solved, yields k characteristic roots, λ 1 , ... , λ k , which are then used to construct the general solution equation:
{\displaystyle x_{t}=c_{1}\lambda _{1}^{t}+\cdots +c_{k}\lambda _{k}^{t}.}
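As a concrete sketch of that recipe, assuming NumPy and using the Fibonacci recurrence x t = x t−1 + x t−2 as an illustrative choice not drawn from the text: the companion matrix of the stacked system supplies the characteristic roots as its eigenvalues, and the constants c i follow from the initial conditions.

import numpy as np

# Illustrative recurrence: x_t = 1*x_{t-1} + 1*x_{t-2} (Fibonacci).
a = np.array([1.0, 1.0])            # coefficients a_1, ..., a_k
k = len(a)

# Companion matrix of the stacked first-order system [x_t, ..., x_{t-k+1}].
C = np.zeros((k, k))
C[0, :] = a
C[1:, :-1] = np.eye(k - 1)

roots = np.linalg.eigvals(C)        # characteristic roots lambda_1, ..., lambda_k

# Fit c_1, ..., c_k to the initial values x_0 = 0, x_1 = 1.
t_init = np.arange(k)
M = roots[np.newaxis, :] ** t_init[:, np.newaxis]   # M[t, j] = lambda_j ** t
c = np.linalg.solve(M, np.array([0.0, 1.0]))

# Evaluate the general solution x_t = sum_j c_j * lambda_j**t at t = 10.
print(np.real_if_close((c * roots ** 10).sum()))    # ~55.0, the 10th Fibonacci number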
A strikingly similar methodology is applied for solving a differential equation of the form:
{\displaystyle {\frac {d^{k}x}{dt^{k}}}+a_{k-1}{\frac {d^{k-1}x}{dt^{k-1}}}+\cdots +a_{1}{\frac {dx}{dt}}+a_{0}x=0.}
In both cases, eigenvalues (or characteristic roots) are the keys to understanding the system's dynamic behavior over time. It's almost as if the universe prefers to operate in predictable, exponential ways.
Calculation
- Main article: Eigenvalue algorithm
The actual calculation of eigenvalues and eigenvectors is a domain where the elegant theory, so neatly presented in elementary linear algebra textbooks, often diverges quite sharply from the messy realities of practical application. It's a classic case of "it looks good on paper."
Classical method
The classical method, for those who appreciate tradition, involves a two-step process: first, one determines the eigenvalues, and then, for each eigenvalue found, one proceeds to calculate its corresponding eigenvectors. While conceptually straightforward, this approach is, in several critical ways, poorly suited for computations involving non-exact arithmetics, such as the ubiquitous floating-point numbers used in virtually all computer calculations. The precision of the universe is rarely matched by our machines.
Eigenvalues
The eigenvalues of a matrix A can, in theory, be determined by diligently finding the roots of its characteristic polynomial. This task is generally manageable for small 2 × 2 matrices, but the computational difficulty escalates with alarming rapidity as the size of the matrix increases. It’s a harsh reality that complexity grows faster than one would like.
Theoretically, the coefficients of the characteristic polynomial can be computed exactly, as they are simply sums of products of matrix elements. Furthermore, algorithms exist that can find all the roots of a polynomial of arbitrary degree to any desired accuracy. However, this seemingly robust approach is rarely viable in practical scenarios. The primary culprit? The coefficients would inevitably be contaminated by unavoidable round-off errors during computation. Compounding this issue, the roots of a polynomial can be extraordinarily sensitive functions of their coefficients, a phenomenon famously illustrated by Wilkinson's polynomial, where tiny changes in coefficients lead to dramatic shifts in roots. Even for matrices whose elements are integers, the exact calculation becomes nontrivial because the sums involved are excessively long. The constant term of the polynomial, for instance, is the determinant, which for an n × n matrix is a staggering sum of n! different products. The combinatorial explosion is a rather inconvenient truth.
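For small matrices the classical route does work. Here is a minimal sketch assuming NumPy, using the 2 × 2 matrix from the eigenvector example below: compute the characteristic polynomial's coefficients, find its roots, and compare against a direct numerical solver.

import numpy as np

A = np.array([[4.0, 1.0],
              [6.0, 3.0]])

coeffs = np.poly(A)                 # characteristic polynomial coefficients: [1, -7, 6]
print(np.roots(coeffs))             # [6., 1.] -- eigenvalues via the classical route
print(np.linalg.eigvals(A))         # the same eigenvalues from a standard numerical solver

For a 2 × 2 matrix the two routes agree to machine precision; for large matrices, the round-off and sensitivity issues described above make the polynomial detour inadvisable.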
Explicit algebraic formulas for the roots of a polynomial exist only if the degree n is 4 or less. This is not a mere oversight; it's a fundamental mathematical limitation. According to the profound Abel–Ruffini theorem, there is simply no general, explicit, and exact algebraic formula for the roots of a polynomial with degree 5 or more. (This generality matters, as any polynomial of degree n can be constructed as the characteristic polynomial of some companion matrix of order n.) Therefore, for matrices of order 5 or greater, the eigenvalues and eigenvectors cannot be obtained through an explicit algebraic formula. Instead, they must be approximated using numerical methods. Even the exact formula for the roots of a degree 3 polynomial, while existing, is often numerically impractical due to its complexity and susceptibility to round-off errors.
Eigenvectors
Once the (exact) value of an eigenvalue is known, the corresponding eigenvectors can be determined by searching for nonzero solutions to the eigenvalue equation, which, at this point, has transformed into a system of linear equations with known coefficients. For example, once we establish that 6 is an eigenvalue of the matrix:
{\displaystyle A={\begin{bmatrix}4&1\\6&3\end{bmatrix}}}
we can proceed to find its eigenvectors by solving the equation Av = 6 v , which translates to:
{\displaystyle {\begin{bmatrix}4&1\\6&3\end{bmatrix}}{\begin{bmatrix}x\\y\end{bmatrix}}=6\cdot {\begin{bmatrix}x\\y\end{bmatrix}}}
This matrix equation is equivalent to the following two linear equations:
{\displaystyle \left\{{\begin{aligned}4x+{\hphantom {3}}y&=6x\\6x+3y&=6y\end{aligned}}\right.}
Simplifying these, we get:
{\displaystyle \left\{{\begin{aligned}-2x+{\hphantom {3}}y&=0\\6x-3y&=0\end{aligned}}\right.}
Both equations, rather conveniently, reduce to the single linear equation y = 2 x . Therefore, any vector of the form [ a 2 a ] T , for any nonzero real number a, is an eigenvector of A associated with the eigenvalue λ = 6. It's a line of invariant directions.
The matrix A, as it happens, has another eigenvalue, λ = 1. A similar calculation reveals that the corresponding eigenvectors are the nonzero solutions of 3 x + y = 0. This means any vector of the form [ b −3 b ] T , for any nonzero real number b, is an eigenvector for λ = 1. The process is systematic, if a little repetitive.
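The same answers drop out of a numerical solver; a minimal sketch assuming NumPy:

import numpy as np

A = np.array([[4.0, 1.0],
              [6.0, 3.0]])
evals, evecs = np.linalg.eig(A)
print(evals)                         # 6 and 1 (the order may differ)
print(evecs)                         # columns are unit eigenvectors, parallel to [1, 2] and [1, -3]

# Sanity check: A v = lambda v for each eigenpair.
for lam, v in zip(evals, evecs.T):
    assert np.allclose(A @ v, lam * v)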
Simple iterative methods
- Main article: Power iteration
The converse approach to the classical method – that is, first seeking the eigenvectors and then determining each eigenvalue from its found eigenvector – proves to be far more amenable to computational methods. It seems computers, like some people, prefer to work backwards. The simplest algorithm in this category is the power method, which involves selecting an arbitrary starting vector and then repeatedly multiplying it by the matrix (with an optional step of normalizing the vector to prevent its elements from growing to unmanageable sizes). Through this iterative process, the vector gradually converges towards an eigenvector corresponding to the dominant (largest magnitude) eigenvalue.
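A minimal sketch of the power method, assuming NumPy and reusing the 2 × 2 matrix from the examples above; the starting vector and iteration count are arbitrary choices:

import numpy as np

def power_iteration(A, num_iter=200, seed=0):
    """Approximate a dominant eigenvector of A by repeated multiplication and normalization."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(A.shape[0])
    for _ in range(num_iter):
        v = A @ v
        v /= np.linalg.norm(v)       # optional normalization keeps the entries bounded
    return v

A = np.array([[4.0, 1.0],
              [6.0, 3.0]])
v = power_iteration(A)
print(v / v[0])                      # ~[1., 2.], the direction for the dominant eigenvalue 6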
A useful variation on this theme involves multiplying the vector by ( A − μI ) −1 , where μ is a complex scalar. This modification causes the vector to converge to an eigenvector associated with the eigenvalue closest to
{\displaystyle \mu \in \mathbb {C} }
. This "inverse iteration" allows targeting specific eigenvalues.
Once v is (or is a sufficiently good approximation of) an eigenvector of A, the corresponding eigenvalue can then be conveniently computed using the Rayleigh quotient:
{\displaystyle \lambda ={\frac {\mathbf {v} ^{*}A\mathbf {v} }{\mathbf {v} ^{*}\mathbf {v} }}}
where v ∗ denotes the conjugate transpose of v . This provides an elegant way to extract the scaling factor once the invariant direction is identified.
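Continuing the power-method sketch above (same assumptions, same A and v), the Rayleigh quotient recovers the dominant eigenvalue in one line:

# v.conj() plays the role of v* in the formula (v is real here, so it is a no-op).
lam = (v.conj() @ A @ v) / (v.conj() @ v)
print(lam)                           # ~6.0, the dominant eigenvalue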
Modern methods
Efficient and accurate methods for computing eigenvalues and eigenvectors of arbitrary matrices were not fully realized until the advent of the QR algorithm, which was ingeniously designed in 1961. This algorithm revolutionized numerical linear algebra. Further advancements have been made; for instance, combining the Householder transformation with LU decomposition can result in an algorithm that boasts even better convergence properties than the standard QR algorithm. For very large Hermitian sparse matrices – matrices with a vast number of zero entries – the Lanczos algorithm stands out as a particularly efficient iterative method for computing eigenvalues and eigenvectors, among a host of other specialized possibilities. It seems the quest for computational efficiency is never-ending.
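As an illustration of the sparse Hermitian case, here is a minimal sketch assuming SciPy; the 1-D discrete Laplacian is an arbitrary stand-in for a large sparse symmetric matrix, and scipy's eigsh wraps an implicitly restarted Lanczos-type iteration (ARPACK):

import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import eigsh

n = 10_000
# 1-D discrete Laplacian: large, sparse, symmetric (hence Hermitian).
L = diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")

# Lanczos-type iteration for the 5 largest-magnitude eigenvalues and their eigenvectors.
vals, vecs = eigsh(L, k=5, which="LM")
print(vals)                          # five eigenvalues just below 4, the upper edge of the spectrum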
It's worth noting that most numeric methods designed to compute the eigenvalues of a matrix also, as a rather convenient by-product, determine a set of corresponding eigenvectors during the computation. Although, sometimes, implementers, in their pursuit of minimal memory and computational overhead, choose to discard this eigenvector information as soon as it is no longer explicitly required. A pragmatic, if somewhat cold, approach.
Applications
Geometric transformations
Eigenvectors and eigenvalues are remarkably useful tools for unraveling and understanding the effects of linear transformations on various geometric shapes. They reveal the intrinsic directions of stretching, shrinking, or reversal. The following table provides a concise overview of several common geometric transformations in the plane, along with their respective 2 × 2 matrices, eigenvalues, and eigenvectors. It's a quick reference for those who prefer concrete examples.
Eigenvalues of geometric transformations
Unequal scaling