
Gradient


Contents
  • 1. Motivation
  • 2. Notation
  • 3. Definition
  • 4. Relationship with derivative

The concept of a multivariate derivative is a fundamental cornerstone in mathematics, extending the familiar notion of a derivative from single-variable calculus into the more complex, multi-dimensional realm. While the term “derivative” might conjure images of simple slopes, its multivariate counterpart, the gradient, is far more sophisticated, describing the intricate landscape of how a function changes across multiple dimensions. For those seeking simpler definitions of incline, one might refer to Slope, or for a similarly spelled angular unit, Gradian. However, for the true essence of change in a multi-dimensional function, the gradient is the precise tool. Other contexts for the term can be found under Gradient (disambiguation).


The visual representation to the right, marked by the distinctive blue arrows, elegantly illustrates the gradient. It precisely denotes the direction of greatest change for a given scalar function . The underlying values of this function are meticulously rendered in greyscale, with the magnitude of the function increasing progressively from white (representing low values) to dark (representing high values). This visual alone should clarify the fundamental premise, though I suspect some will still find a way to misunderstand.


Part of a series of articles about Calculus

The following mathematical expression represents the Fundamental theorem of calculus , a central concept that links differentiation and integration: $$ \int _{a}^{b}f'(t)\,dt=f(b)-f(a) $$ This theorem, a pillar of calculus , underpins many of the ideas discussed here, establishing a profound connection between the rate of change and the accumulation of quantities.



In the realm of vector calculus , the gradient of a scalar-valued differentiable function —let’s call it $f$—that operates on several variables is not just a number, but a vector field . One might also refer to it as a vector-valued function , if precision is your goal. This vector field, typically represented as $\nabla f$, possesses a rather specific and crucial property: its value at any given point $p$ in the function’s domain provides both the precise direction and the specific rate of the fastest possible increase of the function from that point. It’s the ultimate guide to ascending the steepest path.

This mathematical construct is not arbitrary; the gradient behaves predictably, transforming just as a proper vector should under any change of the basis of the space where the variables of $f$ reside. If, by some chance, the gradient of a function happens to be non-zero at a particular point $p$, then the direction indicated by this gradient vector is, without question, the direction in which the function’s value will increase most rapidly when moving away from $p$. Furthermore, the sheer magnitude of this gradient vector quantifies this rate of increase, representing the greatest possible absolute directional derivative at that point. [^1] Conversely, a point where the gradient is precisely the zero vector is rather uncreatively termed a stationary point . This is where the function, at least infinitesimally, isn’t going anywhere fast. Such points are, unsurprisingly, of monumental importance in optimization theory , where the relentless pursuit of minima (or maxima) is paramount. It also finds its way into the trendy fields of machine learning and artificial intelligence , where algorithms like gradient descent rely on it to iteratively minimize cost functions, blindly following the path of steepest decline until some semblance of optimal performance is achieved.
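The gradient-descent procedure alluded to above can be sketched in a few lines; the quadratic objective and learning rate here are illustrative choices of this editor, not anything prescribed by the text:

```python
# Minimal gradient-descent sketch: minimize the illustrative function
# f(x, y) = (x - 3)^2 + (y + 1)^2 by repeatedly stepping *against* the gradient.

def grad_f(x, y):
    # Analytic gradient of f: (2(x - 3), 2(y + 1)).
    return (2 * (x - 3), 2 * (y + 1))

def gradient_descent(start, lr=0.1, steps=100):
    x, y = start
    for _ in range(steps):
        gx, gy = grad_f(x, y)
        # Move opposite the direction of steepest increase.
        x, y = x - lr * gx, y - lr * gy
    return x, y

x, y = gradient_descent((0.0, 0.0))
# converges toward the minimizer (3, -1)
```

Blindly following the path of steepest decline, as promised, the iterate lands arbitrarily close to the stationary point where the gradient vanishes.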

In more abstract, coordinate-free terms—for those who prefer such elegant generalities—the gradient of a function $f(\mathbf{r})$ can be defined through the relationship: $$ df=\nabla f\cdot d\mathbf{r} $$ Here, $df$ represents the total infinitesimal change in the function $f$ that results from an infinitesimal displacement $d\mathbf{r}$. From this expression, it becomes evident that $df$ achieves its maximum value when the infinitesimal displacement $d\mathbf{r}$ is perfectly aligned with the direction of the gradient $\nabla f$. The symbol $\nabla$, often written as an upside-down triangle and pronounced “del” or “nabla,” is itself a vector differential operator . It’s a shorthand for a collection of partial derivatives, a mathematical entity that operates on functions to produce other functions, or in this case, vector fields.

When operating within a coordinate system where the basis vectors are conveniently constant and do not vary with position—a luxury not always afforded in more complex geometries—the gradient is straightforwardly given by a vector [^a] whose components are simply the partial derivatives of $f$ evaluated at point $p$. [^2] That is to say, for a function $f\colon \mathbb{R}^{n}\to \mathbb{R}$ mapping from $n$-dimensional Euclidean space to the real numbers, its gradient $\nabla f\colon \mathbb{R}^{n}\to \mathbb{R}^{n}$ is defined at a point $p=(x_{1},\ldots ,x_{n})$ as the vector [^b]: $$ \nabla f(p)={\begin{bmatrix}{\frac {\partial f}{\partial x_{1}}}(p)\\ \vdots \\ {\frac {\partial f}{\partial x_{n}}}(p)\end{bmatrix}} $$ It is crucial to note, and perhaps a point often overlooked by the less meticulous, that this definition of the gradient is only valid if the function $f$ is actually differentiable at point $p$. One might assume that the existence of partial derivatives in every direction guarantees differentiability, but such an assumption is a perilous trap. There exist functions for which individual partial derivatives are perfectly well-defined in every direction, yet the function itself fails to be differentiable at that point. Furthermore, the elegant simplification of defining the gradient as merely the vector of partial derivatives holds true only when the basis of the coordinate system in use is orthonormal . For any other basis—a scenario far more common in the wild expanse of physics and engineering—the metric tensor at that specific point must be meticulously factored into the calculation. Ignoring this detail is a recipe for mathematical disaster.
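The vector-of-partial-derivatives definition lends itself to a direct numerical check via central differences; the function below is an arbitrary illustration chosen by this editor, not one drawn from the article:

```python
import math

# Central-difference approximation of the gradient: one partial derivative
# per coordinate, each estimated by (f(p + h e_i) - f(p - h e_i)) / (2h).
def numerical_gradient(f, p, h=1e-6):
    grad = []
    for i in range(len(p)):
        plus = list(p); plus[i] += h
        minus = list(p); minus[i] -= h
        grad.append((f(plus) - f(minus)) / (2 * h))
    return grad

f = lambda v: v[0] ** 2 + math.sin(v[1])   # illustrative f(x, y) = x^2 + sin y
g = numerical_gradient(f, [1.0, 0.0])
# analytic gradient at (1, 0) is (2x, cos y) = (2, 1)
```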

Consider, for instance, the function $f(x,y)={\frac {x^{2}y}{x^{2}+y^{2}}}$ everywhere except at the origin, where $f(0,0)=0$. This function, despite possessing well-defined partial derivatives in every direction at the origin, is decidedly not differentiable at that point. It lacks a well-defined tangent plane, which is the very essence of differentiability in multiple dimensions. [^3] In such a peculiar example, if one were to rotate the $x-y$ coordinate system, the standard formula for the gradient (as a simple vector of partials) would fail to transform correctly as a true vector, becoming dependent on the arbitrary choice of basis for the coordinate system. Worse still, it might even fail to point towards the “steepest ascent” in certain orientations, rendering its primary utility moot. For functions that are genuinely differentiable, and for which the standard formula for the gradient holds, it can be rigorously demonstrated that it always transforms as a proper vector under any basis transformation, and it always points towards the direction of fastest increase. Consistency, it seems, is a virtue reserved for the well-behaved.
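One can verify this pathology numerically. The sketch below, using only the function defined in the text, shows that both partial derivatives at the origin vanish, while the directional derivative along $(1,1)/\sqrt{2}$ does not—so the dot-product formula fails there:

```python
import math

# The counterexample from the text: f(x, y) = x^2 y / (x^2 + y^2), f(0,0) = 0.
def f(x, y):
    return 0.0 if x == y == 0 else x * x * y / (x * x + y * y)

t = 1e-8
# Both partial derivatives at the origin are exactly 0: f vanishes on both axes.
fx = (f(t, 0) - f(-t, 0)) / (2 * t)
fy = (f(0, t) - f(0, -t)) / (2 * t)

# Directional derivative along the unit vector v = (1, 1)/sqrt(2):
v = (1 / math.sqrt(2), 1 / math.sqrt(2))
dv = (f(t * v[0], t * v[1]) - f(0, 0)) / t   # ≈ 1/(2*sqrt(2)) ≈ 0.354

# Were f differentiable at the origin, dv would equal (fx, fy)·v = 0. It does not.
```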

The gradient holds a curious relationship with the total derivative , $df$: they are, in a sense, dual to one another. While the value of the gradient at a point manifests as a tangent vector —a vector anchored at each point, indicating a direction in the space itself—the value of the derivative at that same point is a cotangent vector —a linear functional that operates on vectors. [^c] Their connection, however, is not tenuous: the dot product of the gradient of $f$ at a point $p$ with any other tangent vector $\mathbf{v}$ yields precisely the directional derivative of $f$ at $p$ along the direction of $\mathbf{v}$. That is, $\nabla f(p)\cdot \mathbf {v} ={\frac {\partial f}{\partial \mathbf {v} }}(p)=df_{p}(\mathbf {v} )$. This elegant equivalence underscores their intertwined nature. The gradient, in its expansive utility, also admits multiple generalizations, extending its reach to more abstract functions on manifolds , a topic we shall grudgingly delve into later.

Motivation

One might consider a room, perhaps one with poorly regulated heating, where the temperature is described by a scalar field , denoted as $T$. This means that at every single point $(x,y,z)$ within that room, there is a specific temperature $T(x,y,z)$, completely independent of time (we’re keeping things simple, for now). If you were to calculate the gradient of $T$ at any given point in that room, the resulting vector would unfailingly indicate the precise direction in which the temperature rises most rapidly as you move away from $(x,y,z)$. Furthermore, the sheer length, or magnitude , of this gradient vector would quantitatively tell you how fast that temperature rises in that specific direction. It’s the ultimate guide to finding the hottest (or coldest, if you reverse the direction) spot.

Now, let’s consider a landscape, perhaps a rather inconveniently steep one, where the height above sea level at any point $(x,y)$ on the horizontal plane is given by a function $H(x,y)$. The gradient of $H$ at a particular point is not just any vector; it’s a two-dimensional plane vector that points unequivocally in the direction of the steepest slope or grade at that exact spot. The actual steepness of this slope at that point is then directly quantified by the magnitude of that gradient vector. It tells you exactly how much effort you’d need to exert if you were foolish enough to try and walk directly uphill.

The utility of the gradient extends beyond merely identifying the direction of greatest change. It can also be cleverly employed to ascertain how a scalar field varies in other directions, directions that are not necessarily the path of steepest ascent. This is achieved through the elegant operation of taking a dot product . Imagine, if you will, that the steepest possible slope on a particular hill is a formidable 40%. A path that leads directly uphill will, naturally, have a 40% slope. However, a winding road that traverses the hill at an angle to the direct uphill path will, by definition, present a shallower slope. For instance, if this road deviates by a 60° angle from the direct uphill direction (when both directions are projected onto the horizontal plane, naturally), then the slope experienced along that road will be the dot product between the gradient vector (representing the steepest slope) and a unit vector aligned with the road. This dot product, you see, quantifies how much the unit vector along the road “aligns” with the steepest slope. [^d] Thus, the slope along the road would be 40% multiplied by the cosine of 60°, which, if you’re keeping track, is a more manageable 20%.
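For the incurably skeptical, the arithmetic of the winding-road example can be confirmed in two lines:

```python
import math

# The hill from the text: steepest slope 40%, road at 60° to the uphill direction.
steepest = 0.40
angle = math.radians(60)
road_slope = steepest * math.cos(angle)   # dot product with a unit vector along the road
# road_slope = 0.40 * 0.5 = 0.20, i.e. a 20% grade
```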

More generally, if the function describing the hill’s height, $H$, is sufficiently differentiable —meaning it’s smooth enough not to have any sudden, jarring changes—then the gradient of $H$ dotted with any unit vector will yield the exact slope of the hill in the direction specified by that unit vector. This is precisely what is known as the directional derivative of $H$ along that unit vector. It’s a remarkably versatile tool for dissecting how functions change in any desired orientation.

Notation

The gradient of a function, let’s call it $f$, evaluated at a specific point $a$, is most commonly and succinctly expressed as $\nabla f(a)$. This is the standard, the default, the notation one should expect to encounter. However, as is often the case in mathematics, a multitude of alternative representations exist, perhaps to cater to differing aesthetic preferences or to emphasize particular aspects of the mathematical object.

Among these, one might also encounter:

  • ${\vec {\nabla }}f(a)$: This notation serves to explicitly underscore the inherent vector nature of the result. It’s a vector, in case you were in any doubt.
  • $\operatorname {grad} f$: A more verbose, yet equally valid, way to denote the gradient, particularly favored in some older texts or specific fields.
  • $\partial_{i}f$ and $f_{i}$: These are components of the gradient, often used in conjunction with Einstein notation . In this compact notation, repeated indices (like the $i$ here) implicitly imply a summation over all possible values of that index. This saves ink and time, assuming you’re fluent in its conventions.

Definition

The image provided, depicting the gradient of the 2D function $f(x,y)=-(\cos^{2}x+\cos^{2}y)^{2}$ as a projected vector field on the bottom plane, offers a compelling visual summary of the concept. It shows, quite literally, the direction and magnitude of the steepest ascent at every point in the domain.

The gradient, which can be thought of as a gradient vector field, of a scalar function $f(x_{1}, x_{2}, x_{3}, \dots, x_{n})$ is typically denoted either as $\nabla f$ or, with a more explicit nod to its vector identity, $\vec{\nabla} f$. The symbol $\nabla$ itself, known as nabla , represents the vector differential operator , affectionately referred to as del . As previously mentioned, the notation $\operatorname {grad} f$ is also a perfectly acceptable, albeit slightly more antiquated, way to represent this mathematical entity.

The gradient of $f$ is uniquely defined as the specific vector field such that its dot product with any unit vector $\hat{\mathbf{v}}$ at any given point $x$ yields precisely the directional derivative of $f$ along $\hat{\mathbf{v}}$. In more formal terms: $$ {\big (}\nabla f(x){\big )}\cdot {\hat {\mathbf {v} }}=D_{\mathbf {v} }f(x) $$ where the right-hand side, $D_{\mathbf{v}}f(x)$, is the directional derivative , a quantity that can be expressed in various ways depending on the chosen notation. Formally, and this is a subtle but crucial distinction, the derivative itself is considered dual to the gradient; this relationship will be elaborated upon further in a dedicated section, for those who appreciate such intricacies.

It’s worth noting a practical consideration: when a function’s value also depends on an additional parameter, such as time, the term “gradient” typically refers exclusively to the vector composed of its spatial derivatives. This is to differentiate it from a total derivative that would include the time component. For example, if you have a temperature field $T(x,y,z,t)$, the gradient $\nabla T$ would only involve $\frac{\partial T}{\partial x}$, $\frac{\partial T}{\partial y}$, and $\frac{\partial T}{\partial z}$, effectively capturing the spatial variation at a fixed moment in time. This specialized application is sometimes referred to as the Spatial gradient .

A particularly powerful and reassuring property of the gradient is that both its magnitude and its direction are entirely independent of the specific coordinate representation chosen to describe the function. [^4] [^5] This means that the intrinsic rate and direction of steepest ascent of a function are fundamental properties of the function itself, not artifacts of the arbitrary grid you impose upon it. It’s a truly invariant quantity, reflecting a deeper underlying reality.

Cartesian coordinates

For those who prefer their mathematical spaces neatly aligned and orthogonal, the three-dimensional Cartesian coordinate system , equipped with a standard Euclidean metric , offers the most straightforward expression for the gradient. If the gradient exists for a function $f$, it is given by: $$ \nabla f={\frac {\partial f}{\partial x}}\mathbf {i} +{\frac {\partial f}{\partial y}}\mathbf {j} +{\frac {\partial f}{\partial z}}\mathbf {k} $$ Here, $\mathbf{i}$, $\mathbf{j}$, and $\mathbf{k}$ represent the standard unit vectors, each pointing along the positive directions of the $x$, $y$, and $z$ coordinates, respectively. These are the fundamental building blocks of this coordinate system, providing a clear directional framework. Each term in the sum represents the rate of change of the function along one of the principal axes, and the gradient vector combines these rates into a single, comprehensive direction of steepest change.

Let’s illustrate with a concrete example, since abstract notions often prove elusive. Consider the function: $$ f(x,y,z)=2x+3y^{2}-\sin(z) $$ To find its gradient, we simply compute the partial derivative with respect to each variable and assemble them into our vector: $$ \nabla f(x,y,z)=2\mathbf {i} +6y\mathbf {j} -\cos(z)\mathbf {k} $$ This vector tells you, at any point $(x,y,z)$, the direction and magnitude of the fastest increase for $f$. For instance, if you were at $(1,1,0)$, the gradient would be $2\mathbf{i} + 6\mathbf{j} - \mathbf{k}$, indicating a strong upward trend in the $y$-direction.
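Should one distrust the algebra, the worked example admits a finite-difference sanity check (the step size $h$ is, of course, an arbitrary choice):

```python
import math

# The worked example from the text: f(x, y, z) = 2x + 3y^2 - sin z,
# with gradient (2, 6y, -cos z), checked numerically at the point (1, 1, 0).
def f(x, y, z):
    return 2 * x + 3 * y ** 2 - math.sin(z)

def grad_f(x, y, z):
    return (2.0, 6 * y, -math.cos(z))

p = (1.0, 1.0, 0.0)
analytic = grad_f(*p)   # (2, 6, -1), as stated in the text

# Central differences along each axis.
h = 1e-6
numeric = tuple(
    (f(*[c + h * (i == j) for j, c in enumerate(p)]) -
     f(*[c - h * (i == j) for j, c in enumerate(p)])) / (2 * h)
    for i in range(3)
)
# numeric matches analytic component by component
```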

Alternatively, this can be expressed more compactly as a column vector: $$ \nabla f(x,y,z)={\begin{bmatrix}2\\ 6y\\ -\cos z\end{bmatrix}} $$ It is a common practice in certain applications to represent the gradient either as a row vector or a column vector of its components within a rectangular coordinate system. For the sake of consistency and clarity, this article adheres to the convention of the gradient being a column vector, while its dual, the derivative, will be represented as a row vector. This distinction, though seemingly minor, is crucial in the broader context of linear algebra and tensor calculus.

Cylindrical and spherical coordinates

While Cartesian coordinates are wonderfully intuitive, the universe, in its infinite complexity, often demands more specialized coordinate systems to describe phenomena with natural symmetries. For those situations, we grudgingly turn to cylindrical and spherical coordinates. For a more exhaustive treatment of how the del operator behaves in these systems, one might consult the main article: Del in cylindrical and spherical coordinates .

In the case of cylindrical coordinates , which are particularly suited for problems exhibiting axial symmetry, the gradient is given by the following expression [^6]: $$ \nabla f(\rho ,\varphi ,z)={\frac {\partial f}{\partial \rho }}\mathbf {e} _{\rho }+{\frac {1}{\rho }}{\frac {\partial f}{\partial \varphi }}\mathbf {e} _{\varphi }+{\frac {\partial f}{\partial z}}\mathbf {e} _{z} $$ Here, $\rho$ denotes the radial distance from the $z$-axis (the axial distance), $\varphi$ is the azimuthal or azimuth angle (the angle around the $z$-axis), and $z$ is, rather predictably, the axial coordinate along the $z$-axis. The terms $\mathbf{e}_{\rho}$, $\mathbf{e}_{\varphi}$, and $\mathbf{e}_{z}$ are the unit vectors that point along their respective coordinate directions. Notice the presence of the $1/\rho$ term in the $\varphi$ component. This is not an arbitrary addition; it accounts for the fact that a change in $\varphi$ at a smaller radius $\rho$ corresponds to a smaller physical displacement than the same change in $\varphi$ at a larger radius. The coordinate lines for $\varphi$ effectively “stretch” as you move away from the $z$-axis, and this term normalizes that effect, ensuring the gradient correctly reflects the physical rate of change.
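The cylindrical formula can be checked numerically. In the sketch below the test function $f(\rho,\varphi,z)=\rho^{2}\sin\varphi+z$ is an illustrative choice of this editor; projecting the numerically computed Cartesian gradient onto $\mathbf{e}_{\rho}$ and $\mathbf{e}_{\varphi}$ should reproduce the formula's components:

```python
import math

# Illustrative function in Cartesian form: rho^2 * sin(phi) equals rho * y,
# so f(x, y, z) = y * sqrt(x^2 + y^2) + z.
def f_cart(x, y, z):
    return y * math.hypot(x, y) + z

# A test point given in cylindrical coordinates, and its Cartesian image.
rho, phi, z = 2.0, 0.7, 1.0
x, y = rho * math.cos(phi), rho * math.sin(phi)

# Numerical Cartesian gradient via central differences.
h = 1e-6
gx = (f_cart(x + h, y, z) - f_cart(x - h, y, z)) / (2 * h)
gy = (f_cart(x, y + h, z) - f_cart(x, y - h, z)) / (2 * h)
gz = (f_cart(x, y, z + h) - f_cart(x, y, z - h)) / (2 * h)

# Components predicted by the cylindrical-coordinate formula:
g_rho = 2 * rho * math.sin(phi)   # df/drho
g_phi = rho * math.cos(phi)       # (1/rho) * df/dphi  — note the 1/rho factor
# Project the Cartesian gradient onto the local unit vectors e_rho, e_phi:
proj_rho = gx * math.cos(phi) + gy * math.sin(phi)
proj_phi = -gx * math.sin(phi) + gy * math.cos(phi)
# proj_rho ≈ g_rho, proj_phi ≈ g_phi, and gz ≈ df/dz = 1
```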

Similarly, in spherical coordinates , which are indispensable for problems possessing spherical symmetry, and again assuming a Euclidean metric , the gradient takes on a different, yet equally precise, form [^6]: $$ \nabla f(r,\theta ,\varphi )={\frac {\partial f}{\partial r}}\mathbf {e} _{r}+{\frac {1}{r}}{\frac {\partial f}{\partial \theta }}\mathbf {e} _{\theta }+{\frac {1}{r\sin \theta }}{\frac {\partial f}{\partial \varphi }}\mathbf {e} _{\varphi } $$ In this system, $r$ represents the radial distance from the origin, $\theta$ is the polar angle (measured from the positive $z$-axis), and $\varphi$ is the azimuthal angle (identical to its cylindrical counterpart). The vectors $\mathbf{e}_{r}$, $\mathbf{e}_{\theta}$, and $\mathbf{e}_{\varphi}$ are, once more, the local unit vectors aligned with these coordinate directions, representing the normalized covariant basis . The terms $1/r$ and $1/(r \sin\theta)$ serve analogous purposes to the $1/\rho$ in cylindrical coordinates, correcting for the varying “sizes” of the coordinate steps as one moves through space. The $\theta$ and $\varphi$ coordinate lines, much like lines of longitude and latitude on a globe, are not uniformly spaced in physical distance, and these scaling factors ensure the gradient accurately reflects the true directional rates of change.

For the gradient’s manifestations in other orthogonal coordinate systems , one can refer to the detailed exposition in Orthogonal coordinates (Differential operators in three dimensions) . Each system, with its unique scaling factors, provides a tailored lens through which to view the multivariate derivative.

General coordinates

When the elegant simplicity of Cartesian, cylindrical, or spherical coordinates proves insufficient—a common occurrence when dealing with complex geometries or curved spaces—we must resort to the more abstract framework of general coordinates . Let’s denote these coordinates as $x^1, \dots, x^i, \dots, x^n$, where $n$ is the number of dimensions of the domain. It’s important to clarify that the upper index here refers to the position of the coordinate in a list, so $x^2$ is merely the second coordinate, not the quantity $x$ squared. The index variable $i$ is a placeholder for an arbitrary element $x^i$.

Using the powerful shorthand of Einstein notation , the gradient can be expressed in its most general form as: $$ \nabla f={\frac {\partial f}{\partial x^{i}}}g^{ij}\mathbf {e} _{j} $$ This expression, while compact, is laden with meaning. The term $\frac{\partial f}{\partial x^{i}}$ represents the partial derivative of the function $f$ with respect to the $i$-th coordinate. The $g^{ij}$ are the components of the inverse metric tensor , a fundamental object that encapsulates the geometry of the space and allows us to convert between covariant and contravariant components. The $\mathbf{e}_j$ are the unnormalized local covariant basis vectors , which essentially define the directions of the coordinate axes at each point. The Einstein summation convention, a convenience for those fluent in tensor calculus, implies that we sum over both repeated indices $i$ and $j$. This formula effectively takes the partial derivatives (which are covariant components of the differential) and “raises” them to contravariant components to form a vector, using the metric.

Its dual, the total differential or exterior derivative , $df$, which represents a covector or linear form , is written as: $$ \mathrm {d} f={\frac {\partial f}{\partial x^{i}}}\mathbf {e} ^{i} $$ Here, $\mathbf{e}^{i}$ are the unnormalized local contravariant basis covectors , which are dual to the covariant basis vectors.

Specifically, $\mathbf{e}^{i}=\mathrm{d}x^{i}$ represents the differential of the $i$-th coordinate, while $\mathbf{e}_{i}=\partial \mathbf{x}/\partial x^{i}$ denotes the tangent vector along the $i$-th coordinate curve. The inverse metric tensor , $g^{ij}$, is the crucial component that mediates the transformation between these covariant and contravariant representations, reflecting the intrinsic geometry of the space.

If, by some stroke of luck or clever coordinate choice, the coordinates happen to be orthogonal, the expression for the gradient (and the differential ) simplifies considerably. In such cases, we can express it in terms of the normalized bases, which we denote as ${\hat {\mathbf {e}}}_{i}$ and ${\hat {\mathbf {e}}}^{i}$, by introducing the scale factors (also known as Lamé coefficients ) $h_{i}=\lVert \mathbf {e} _{i}\rVert ={\sqrt {g_{ii}}}=1/\lVert \mathbf {e} ^{i}\rVert$: $$ \nabla f={\frac {\partial f}{\partial x^{i}}}g^{ij}{\hat {\mathbf {e}}}_{j}{\sqrt {g_{jj}}}=\sum _{i=1}^{n}\,{\frac {\partial f}{\partial x^{i}}}{\frac {1}{h_{i}}}\mathbf {\hat {e}} _{i} $$ (And for the differential: $\mathrm {d} f=\sum _{i=1}^{n}\,{\frac {\partial f}{\partial x^{i}}}{\frac {1}{h_{i}}}\mathbf {\hat {e}} ^{i}$).
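As a worked illustration (plane polar coordinates, an example not spelled out in the text), the general formula reproduces the familiar $1/r$ factor:

```latex
% Plane polar coordinates (r, \theta): metric, inverse metric, and scale factors
%   g_{ij} = \operatorname{diag}(1, r^2), \quad g^{ij} = \operatorname{diag}(1, 1/r^2),
%   h_r = 1, \quad h_\theta = r.
% Raising the index on the partial derivatives and normalizing the basis:
\nabla f
  = \frac{\partial f}{\partial x^{i}}\, g^{ij}\, \mathbf{e}_{j}
  = \frac{\partial f}{\partial r}\, \mathbf{e}_{r}
    + \frac{1}{r^{2}} \frac{\partial f}{\partial \theta}\, \mathbf{e}_{\theta}
  = \frac{\partial f}{\partial r}\, \hat{\mathbf{e}}_{r}
    + \frac{1}{r} \frac{\partial f}{\partial \theta}\, \hat{\mathbf{e}}_{\theta},
\qquad \mathbf{e}_{\theta} = r\, \hat{\mathbf{e}}_{\theta}.
```

The unnormalized $1/r^{2}$ coefficient becomes the familiar $1/r$ once the basis vector $\mathbf{e}_{\theta}$, whose length is $r$, is normalized—exactly the scaling-factor bookkeeping described above.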

It’s important to realize that in this specific formulation for orthogonal coordinates, we cannot utilize Einstein notation because it becomes impossible to avoid the repetition of more than two indices, which would violate the convention. Furthermore, despite the use of upper and lower indices, the normalized basis vectors ${\hat {\mathbf {e}}}_{i}$, ${\hat {\mathbf {e}}}^{i}$, and the scale factors $h_{i}$ themselves are neither contravariant nor covariant quantities. They are, rather, scalar magnitudes and normalized directions specific to the chosen coordinate system.

This latter, more explicit expression, when specialized, neatly recovers the formulas previously given for cylindrical and spherical coordinates. It’s a testament to the unifying power of general coordinate systems, even if their initial appearance can be somewhat daunting.

Relationship with derivative




Relationship with total derivative

The gradient, a vector field, holds an intimately close relationship with the total derivative , often referred to as the total differential $df$. In essence, they are transpose or, more precisely, dual to each other. This distinction is not merely semantic; it reflects a fundamental difference in the mathematical objects they represent.

Following the convention where vectors in $\mathbb{R}^{n}$ are represented by column vectors , and covectors (which are linear maps from $\mathbb{R}^{n}$ to $\mathbb{R}$) are represented by row vectors [^a], the gradient $\nabla f$ and the derivative $df$ are expressed with the exact same components, yet they are transposes of one another. To be explicit: The gradient at a point $p$ is given as a column vector: $$ \nabla f(p)={\begin{bmatrix}{\frac {\partial f}{\partial x_{1}}}(p)\\ \vdots \\ {\frac {\partial f}{\partial x_{n}}}(p)\end{bmatrix}}; $$ while the derivative at the same point $p$ is given as a row vector: $$ df_{p}={\begin{bmatrix}{\frac {\partial f}{\partial x_{1}}}(p)&\cdots &{\frac {\partial f}{\partial x_{n}}}(p)\end{bmatrix}}. $$ Despite sharing identical components, the crucial difference lies in the kind of mathematical object each represents. At any given point, the derivative $df_p$ is fundamentally a cotangent vector , or a linear form (sometimes referred to as a covector). Its role is to take a vector input and produce a scalar output, thereby expressing precisely how much the (scalar) output of the function changes for a given infinitesimal change in the (vector) input. It’s a “measuring stick” for change along a direction.

In stark contrast, at the same point, the gradient $\nabla f(p)$ is a tangent vector . It represents an infinitesimal change in the input vector space itself, pointing in a specific direction with a specific magnitude. In symbolic terms, the gradient $\nabla f(p)$ is an element of the tangent space at point $p$, denoted as $\nabla f(p)\in T_{p}\mathbb{R}^{n}$. The derivative $df_p$, however, is a map from this tangent space to the real numbers, expressed as $df_{p}\colon T_{p}\mathbb{R}^{n}\to \mathbb{R}$.

Now, for those who appreciate a touch of philosophical convenience, the tangent spaces at each point of $\mathbb{R}^{n}$ can be “naturally” identified [^e] with the vector space $\mathbb{R}^{n}$ itself. Similarly, the cotangent space at each point can be naturally identified with the dual vector space $(\mathbb{R}^{n})^{*}$ of covectors. This means that, in practice, the value of the gradient at a point can be intuitively thought of as a vector residing in the original $\mathbb{R}^{n}$ space, rather than strictly as an abstract tangent vector. This simplification, while convenient, should not obscure the underlying formal distinction.

From a computational perspective, the equivalence between these dual concepts becomes strikingly clear. Given a tangent vector $v$, one can multiply it by the derivative (treated as a matrix operation), and the result is precisely equivalent to taking the dot product with the gradient: $$ (df_{p})(v)={\begin{bmatrix}{\frac {\partial f}{\partial x_{1}}}(p)&\cdots &{\frac {\partial f}{\partial x_{n}}}(p)\end{bmatrix}}{\begin{bmatrix}v_{1}\\ \vdots \\ v_{n}\end{bmatrix}}=\sum _{i=1}^{n}{\frac {\partial f}{\partial x_{i}}}(p)\,v_{i}={\begin{bmatrix}{\frac {\partial f}{\partial x_{1}}}(p)\\ \vdots \\ {\frac {\partial f}{\partial x_{n}}}(p)\end{bmatrix}}\cdot {\begin{bmatrix}v_{1}\\ \vdots \\ v_{n}\end{bmatrix}}=\nabla f(p)\cdot v $$ This algebraic identity elegantly demonstrates how these seemingly distinct mathematical objects are deeply interconnected, providing different yet equivalent perspectives on the same fundamental concept of change.
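The identity can be spot-checked on a small concrete case; the components below are arbitrary illustrative numbers, not values from the text:

```python
# Row-vector-times-column equals dot-with-gradient, on a tiny 3-D example.
df_p = [2.0, -1.0, 0.5]   # row vector of partials at p (the derivative)
v = [1.0, 4.0, -2.0]      # an arbitrary tangent vector

# Apply the derivative as a linear map: 1x3 matrix times 3x1 column vector.
apply_df = sum(row * col for row, col in zip(df_p, v))

# Take the dot product with the gradient, which has the same components
# arranged as a column vector.
dot_grad = sum(g * vi for g, vi in zip(df_p, v))

# Both evaluate to 2*1 + (-1)*4 + 0.5*(-2) = -3.0
```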

Differential or (exterior) derivative

The concept of the “best linear approximation” is central to understanding derivatives in multivariable calculus. For a differentiable function $f:\mathbb{R}^{n}\to \mathbb{R}$ at a specific point $x$ within $\mathbb{R}^{n}$, this best linear approximation is itself a linear map that transforms vectors from $\mathbb{R}^{n}$ to $\mathbb{R}$. This map is frequently denoted as $df_{x}$ or $Df(x)$ and is formally referred to as the differential or the total derivative of $f$ at $x$. It essentially provides a linear “model” of the function’s behavior in the immediate vicinity of $x$.

The function $df$, which maps each point $x$ to its corresponding differential $df_{x}$, is known as the total differential or, in the more advanced language of differential geometry, the exterior derivative of $f$. This $df$ is a prime example of a differential 1-form , which is a smoothly varying assignment of a cotangent vector (a linear functional) to each point in the domain. It’s a field of linear approximations, if you will.

Much in the same way that the derivative of a single-variable function precisely quantifies the slope of the tangent line to the function’s graph at a given point [^7], the directional derivative of a function with several variables extends this idea. It represents the slope of the tangent hyperplane —a higher-dimensional generalization of a tangent line—in the specific direction of the chosen vector. It’s the multi-dimensional equivalent of “how steep is it if I walk this way?”

The fundamental relationship connecting the gradient and the differential is elegantly expressed by the formula: $$ (\nabla f)_{x}\cdot v=df_{x}(v) $$ for any vector $v\in \mathbb{R}^{n}$. Here, $\cdot$ denotes the dot product . This formula states that taking the dot product of a vector with the gradient of $f$ at point $x$ is mathematically identical to applying the differential of $f$ at $x$ to that same vector. In essence, the gradient provides a vector representation of the differential, allowing us to compute directional derivatives via a simpler dot product.

If we conceptualize $\mathbb{R}^{n}$ as the space of $n$-dimensional column vectors (composed of real numbers), then we can conveniently regard the differential $df$ as a row vector whose components are the partial derivatives : $$ \left({\frac {\partial f}{\partial x_{1}}},\dots ,{\frac {\partial f}{\partial x_{n}}}\right) $$ In this representation, the application of the differential $df_{x}(v)$ is straightforwardly given by matrix multiplication of this row vector with the column vector $v$. Assuming the standard Euclidean metric on $\mathbb{R}^{n}$ (which provides the standard definition of dot product and vector length), the gradient is then simply the corresponding column vector, which is precisely the transpose of the row vector representing the differential: $$ (\nabla f)_{x}=(df_{x})^{\mathsf {T}} $$ This highlights their intrinsic duality, where one is merely the “flipped” version of the other in a coordinate-dependent representation.

Linear approximation to a function

The concept of the best linear approximation to a function is a cornerstone of multivariable calculus , offering a powerful way to understand and predict a function’s behavior in the immediate vicinity of a known point. The gradient of a function $f$ that maps from the Euclidean space $\mathbb{R}^{n}$ to $\mathbb{R}$—at any particular point $x_0$ in $\mathbb{R}^{n}$—serves to characterize this optimal linear approximation of $f$ around $x_0$.

The approximation itself is elegantly simple, yet profoundly useful: $$ f(x)\approx f(x_{0})+(\nabla f)_{x_{0}}\cdot (x-x_{0}) $$ This formula holds true for values of $x$ that are sufficiently “close” to $x_0$. Here, $(\nabla f)_{x_0}$ represents the gradient of $f$ meticulously computed at the specific point $x_0$, and the dot symbol $\cdot$ signifies the standard dot product operation within $\mathbb{R}^{n}$.

Let’s dissect this. The term $f(x_0)$ is simply the function’s value at the known point. The second term, $(\nabla f)_{x_0}\cdot (x-x_0)$, represents the linear change in the function as you move from $x_0$ to $x$. Geometrically, this equation describes the tangent hyperplane to the graph of $f$ at the point $(x_0, f(x_0))$. The gradient $(\nabla f)_{x_0}$ dictates the slope and orientation of this hyperplane. For small displacements $(x-x_0)$, this tangent hyperplane provides an excellent estimate for the actual value of the function $f(x)$.

This equation is not merely an approximation; it is precisely equivalent to the first two terms of the multivariable Taylor series expansion of $f$ around the point $x_0$. The Taylor series provides a polynomial approximation of a function, and the linear approximation is simply its lowest-order, or first-order, manifestation. This fundamental relationship underscores the gradient’s role in local analysis and its utility in fields ranging from numerical methods to optimization. It’s the mathematical equivalent of knowing the immediate terrain to predict your next few steps.
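The quality of this first-order approximation can be seen concretely. The sketch below (Python, with a hypothetical function $f(x, y) = x^2 + y^2$, whose gradient $(2x, 2y)$ is computed by hand) evaluates the tangent-plane estimate near $x_0 = (1, 1)$ and compares it with the true value:

```python
# Hypothetical example function and its hand-computed gradient.
def f(x, y):
    return x * x + y * y

def grad_f(x, y):
    return (2 * x, 2 * y)

x0 = (1.0, 1.0)
x = (1.1, 1.05)                       # a point "close" to x0
gx, gy = grad_f(*x0)
# linear (tangent-hyperplane) approximation: f(x0) + ∇f(x0) · (x - x0)
approx = f(*x0) + gx * (x[0] - x0[0]) + gy * (x[1] - x0[1])
exact = f(*x)                         # true value: 1.21 + 1.1025 = 2.3125
assert abs(approx - 2.3) < 1e-9       # the linear estimate is 2.3
assert abs(exact - approx) < 0.02     # the error is second order in the step
```

For a step of size roughly 0.1, the approximation error here is 0.0125, consistent with the quadratic remainder of the Taylor expansion.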

Relationship with Fréchet derivative

For those who venture beyond the confines of basic Euclidean spaces, the Fréchet derivative offers the most expansive and rigorous definition of a derivative for functions between Banach spaces . Let $U$ be an open set in $\mathbb{R}^{n}$. If the function $f: U \to \mathbb{R}$ is differentiable in this more general sense, then its differential $df$ is precisely the Fréchet derivative of $f$.

Consequently, the gradient $\nabla f$ emerges as a function mapping from $U$ to $\mathbb{R}^{n}$, satisfying a stringent limit definition: $$ \lim _{h\to 0}{\frac {|f(x+h)-f(x)-\nabla f(x)\cdot h|}{|h|}}=0 $$ Here, $\cdot$ denotes the dot product , and $|h|$ is the Euclidean norm (length) of the vector $h$. This limit condition effectively states that the difference between the actual change in $f$ ($f(x+h)-f(x)$) and its linear approximation ($\nabla f(x)\cdot h$) must vanish faster than the magnitude of the displacement $h$ itself. This rigorous definition ensures that the linear approximation provided by the gradient is truly the “best” possible in a local sense.
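The limit condition can be probed numerically: as the displacement $h$ shrinks, the error of the linear approximation should vanish faster than $|h|$, so the ratio in the definition tends to zero. A minimal sketch, using a hypothetical function $f(x, y) = \sin(x)\,y$ and a fixed unit direction:

```python
import math

# Hypothetical example function and its hand-computed gradient.
def f(x, y):
    return math.sin(x) * y

def grad_f(x, y):
    return (math.cos(x) * y, math.sin(x))

x = (0.7, 1.3)
u = (0.6, 0.8)                  # an arbitrary unit direction, |u| = 1
ratios = []
for t in (1e-2, 1e-3, 1e-4):
    h = (t * u[0], t * u[1])    # displacement of Euclidean norm t
    gx, gy = grad_f(*x)
    # |f(x+h) - f(x) - ∇f(x)·h|, the error of the linear approximation
    num = abs(f(x[0] + h[0], x[1] + h[1]) - f(*x) - (gx * h[0] + gy * h[1]))
    ratios.append(num / t)      # divide by |h| = t
# the Fréchet condition: the ratio shrinks (here roughly linearly) as |h| -> 0
assert ratios[0] > ratios[1] > ratios[2]
```

The ratio decreases roughly linearly in $|h|$ for this smooth function, reflecting the quadratic remainder term.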

As a direct consequence of this profound relationship, the familiar properties of the derivative, which one might take for granted, extend gracefully to the gradient. It is crucial to remember, however, that the gradient itself is not the derivative, but rather its dual, a vector representation derived from the derivative.

Linearity : The gradient exhibits linearity, a property that makes computations wonderfully predictable. If $f$ and $g$ are two real-valued functions, both differentiable at a point $a \in \mathbb{R}^{n}$, and $\alpha$ and $\beta$ are any two scalar constants, then the linear combination $\alpha f + \beta g$ is also differentiable at $a$. More importantly, its gradient follows a simple additive rule: $$ \nabla \left(\alpha f+\beta g\right)(a)=\alpha \nabla f(a)+\beta \nabla g(a) $$ This means you can differentiate sums and scalar multiples of functions independently, a convenience that should not be underestimated.

Product rule : For functions that are products of other functions, the product rule dictates their derivative behavior. If $f$ and $g$ are real-valued functions, both differentiable at a point $a \in \mathbb{R}^{n}$, then their product $fg$ is also differentiable at $a$. The gradient of this product is given by: $$ \nabla (fg)(a)=f(a)\nabla g(a)+g(a)\nabla f(a) $$ This rule ensures that the rate of change of a product is correctly accounted for by considering the change in each factor weighted by the other.
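The linearity and product rules can be verified against a numerical gradient. The following sketch (Python; the example functions $f(x,y) = x^2 + y$ and $g(x,y) = 3xy$ are hypothetical choices) checks the product rule via central finite differences:

```python
# Hypothetical example functions.
def f(x, y): return x * x + y
def g(x, y): return 3 * x * y

def num_grad(func, x, y, eps=1e-6):
    # numerical gradient by central finite differences
    return ((func(x + eps, y) - func(x - eps, y)) / (2 * eps),
            (func(x, y + eps) - func(x, y - eps)) / (2 * eps))

a = (1.2, -0.5)
# left-hand side: the gradient of the product fg, computed numerically
lhs = num_grad(lambda x, y: f(x, y) * g(x, y), *a)
# right-hand side: f(a)∇g(a) + g(a)∇f(a), the product rule
gf = num_grad(f, *a)
gg = num_grad(g, *a)
rhs = (f(*a) * gg[0] + g(*a) * gf[0], f(*a) * gg[1] + g(*a) * gf[1])
assert all(abs(l - r) < 1e-4 for l, r in zip(lhs, rhs))
```

The same harness verifies linearity by replacing the product with $\alpha f + \beta g$.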

Chain rule : The chain rule is arguably one of the most powerful tools in calculus, allowing us to differentiate composite functions. In the context of gradients, it manifests in several crucial forms.

First, consider a scenario where a scalar-valued function $f: A \to \mathbb{R}$ (defined on a subset $A$ of $\mathbb{R}^{n}$) is composed with a parametric curve $g: I \to \mathbb{R}^{n}$, where $I \subset \mathbb{R}$ is a subset of the real numbers. If $f$ is differentiable at a point $a$ and $g$ is differentiable at a point $c \in I$ such that $g(c) = a$, then the derivative of the composite function $(f \circ g)$ with respect to its single variable $c$ is given by: $$ (f\circ g)'(c)=\nabla f(a)\cdot g'(c) $$ Here, $\circ$ is the composition operator , meaning $(f \circ g)(x) = f(g(x))$. This form tells us that the rate of change of $f$ along the curve $g$ is the dot product of $f$’s gradient at $a$ with the tangent vector of the curve $g$ at $c$.

More generally, if the domain $I$ is instead a subset of $\mathbb{R}^{k}$ (meaning $g$ is a vector-valued function of several variables, $g: I \to \mathbb{R}^{n}$), then the chain rule for the gradient of the composite function $\nabla (f \circ g)$ is expressed as: $$ \nabla (f\circ g)(c)={\big (}Dg(c){\big )}^{\mathsf {T}}{\big (}\nabla f(a){\big )} $$ In this sophisticated form, $Dg(c)$ represents the Jacobian matrix of $g$ evaluated at $c$, and $(Dg(c))^{\mathsf{T}}$ denotes its transpose. This rule effectively combines the rates of change of all input variables through the Jacobian with the gradient of the outer function.

For the second primary form of the chain rule, imagine a real-valued function $h: I \to \mathbb{R}$ defined on a subset $I$ of $\mathbb{R}$, and suppose $h$ is differentiable at the point $f(a) \in I$. Then, the gradient of the composite function $(h \circ f)$ is given by: $$ \nabla (h\circ f)(a)=h'{\big (}f(a){\big )}\nabla f(a) $$ This form is particularly useful when you have a scalar function $f$ that feeds into another single-variable scalar function $h$. It states that the gradient of the composite is simply the derivative of $h$ with respect to its input (which is $f(a)$), scaled by the gradient of $f$ itself.
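The curve form of the chain rule has a particularly clean test case. If $f(x, y) = x^2 + y^2$ and $g(t) = (\cos t, \sin t)$ traces the unit circle, then $f \circ g \equiv 1$, so $(f \circ g)'(c)$ must be zero; the formula $\nabla f(a) \cdot g'(c)$ should agree. A minimal sketch under these (hypothetical) choices:

```python
import math

# f is constant (= 1) along the unit circle traced by g.
def f(x, y): return x * x + y * y
def grad_f(x, y): return (2 * x, 2 * y)
def g(t): return (math.cos(t), math.sin(t))
def g_prime(t): return (-math.sin(t), math.cos(t))

c = 0.9
a = g(c)
# chain rule: (f∘g)'(c) = ∇f(a) · g'(c); since f∘g is constant, this is 0
val = sum(gi * vi for gi, vi in zip(grad_f(*a), g_prime(c)))
assert abs(val) < 1e-12
```

Geometrically, this is the orthogonality of the gradient to level sets in miniature: the tangent to the circle is a direction of no change, so its dot product with the gradient vanishes.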

These properties, stemming from the fundamental definition of the Fréchet derivative, ensure that the gradient is a consistent and powerful tool for analyzing functions across diverse mathematical structures.

Further properties and applications

Level sets

For those seeking to understand the geometric implications of the gradient, its relationship with level sets is profoundly illuminating. A level surface, more generally termed an isosurface in three dimensions (or simply a level set in any dimension), is defined as the collection of all points within a function’s domain where that function yields a specific, constant value. Imagine contour lines on a topographical map; each line represents a level set of constant altitude.

If the function $f$ is differentiable , then as we’ve already established, the dot product $(\nabla f)_{x} \cdot v$ of the gradient at a point $x$ with an arbitrary vector $v$ provides the directional derivative of $f$ at $x$ in the direction of $v$. This means it tells you how much the function changes if you move in that particular direction.

From this, a crucial property emerges: the gradient of $f$ is always orthogonal (perpendicular) to the level sets of $f$. Why is this so? Consider moving along a level set. By definition, the function’s value remains constant along this path. Therefore, the directional derivative of $f$ in any direction tangent to the level set must be zero. Since the dot product of two vectors is zero if and only if they are orthogonal (assuming neither is the zero vector), it logically follows that the gradient, which represents the direction of greatest change, must be perpendicular to any direction of no change on the level set.

For a concrete illustration, consider a level surface in three-dimensional space, which can be described by an equation of the form $F(x, y, z) = c$, where $c$ is a constant. The gradient of $F$, denoted $\nabla F$, will then be a vector that is precisely normal to this surface at every point. This property is immensely useful in fields like computer graphics for calculating surface normals, or in physics for understanding equipotential surfaces.
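This normal-vector property is trivial to confirm for the sphere. In the sketch below (Python; the specific point and tangent vector are chosen for illustration), $F(x,y,z) = x^2 + y^2 + z^2$, so $\nabla F = (2x, 2y, 2z)$ is radial and must be orthogonal to any direction tangent to the sphere:

```python
# F(x, y, z) = x² + y² + z² = c defines a sphere; ∇F = (2x, 2y, 2z) is radial.
p = (1.0, 2.0, 2.0)          # a point on the sphere of radius 3 (c = 9)
grad_F = (2 * p[0], 2 * p[1], 2 * p[2])
tangent = (2.0, -1.0, 0.0)   # orthogonal to p, hence tangent to the sphere at p
# the gradient is normal to the level surface: its dot product with any
# tangent direction is zero
dp = sum(g * t for g, t in zip(grad_F, tangent))
assert dp == 0.0
```

This is exactly the computation a graphics renderer performs when it derives a surface normal from an implicit surface equation.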

This concept generalizes beautifully. Any embedded hypersurface within a Riemannian manifold —a more complex space where distances and angles are defined smoothly—can be characterized by an equation $F(P) = 0$, provided that its differential $dF$ is nowhere zero (ensuring it’s a “smooth” surface). In such a sophisticated setting, the gradient of $F$ will consistently serve as the normal vector to the hypersurface.

Even in the realm of affine algebraic hypersurfaces , defined by polynomial equations $F(x_1, \dots, x_n) = 0$, the gradient retains its significance. At what is known as a singular point of the hypersurface, the gradient of $F$ will be the zero vector (this is, in fact, the definition of a singular point in this context). However, at any non-singular point, the gradient will unfailingly provide a nonzero normal vector to the hypersurface. The gradient, it seems, is an indispensable guide to the topography of mathematical spaces.

Conservative vector fields and the gradient theorem

The gradient of a scalar function, when viewed across its entire domain, produces what is known as a gradient field. This type of vector field possesses a particularly important property: a (continuous) gradient field is always a conservative vector field . This is not a mere coincidence but a fundamental characteristic with significant implications in physics and engineering.

What does it mean for a vector field to be conservative? It implies that the line integral of the field along any path connecting two points depends solely on the starting and ending points of that path, and is entirely independent of the specific route taken between them. This path independence is a hallmark of conservative fields, often associated with forces that do no net work over a closed loop. For such fields, the line integral can be elegantly evaluated using the gradient theorem , which is essentially the fundamental theorem of calculus for line integrals extended to multiple dimensions. This theorem states that the line integral of a gradient field from point A to point B is simply the difference in the scalar function’s value at B and A.
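Path independence can be demonstrated numerically. The sketch below (Python; the potential $f(x, y) = xy^2$ is a hypothetical choice) integrates the gradient field along two different routes from $(0,0)$ to $(1,1)$ and checks both against $f(B) - f(A)$, as the gradient theorem predicts:

```python
# Potential f(x, y) = x * y², so its gradient field is (y², 2xy).
def f(x, y):
    return x * y * y

def grad_f(x, y):
    return (y * y, 2 * x * y)

def line_integral(path, n=2000):
    # crude midpoint-rule approximation of the line integral of grad_f
    total = 0.0
    for i in range(n):
        t0, t1 = i / n, (i + 1) / n
        x0, y0 = path(t0)
        x1, y1 = path(t1)
        xm, ym = path((t0 + t1) / 2)
        gx, gy = grad_f(xm, ym)
        total += gx * (x1 - x0) + gy * (y1 - y0)
    return total

A, B = (0.0, 0.0), (1.0, 1.0)
straight = lambda t: (t, t)       # straight-line path from A to B
curved = lambda t: (t, t * t)     # a different route, same endpoints
expected = f(*B) - f(*A)          # gradient theorem: the answer is 1.0
assert abs(line_integral(straight) - expected) < 1e-3
assert abs(line_integral(curved) - expected) < 1e-3
```

Both routes give the same value to within discretization error, which is precisely what "conservative" means.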

Conversely, and this completes the elegant duality, any (continuous) conservative vector field can always be expressed as the gradient of some scalar function. This scalar function is often referred to as a “potential function” or “scalar potential.” The existence of such a potential function simplifies many problems in physics, particularly in electrostatics and fluid dynamics, where forces or velocities can be derived from a scalar field rather than dealing with the complexities of vector integration directly. The gradient, therefore, serves as the bridge between scalar potentials and the vector fields they generate, providing a powerful framework for understanding fundamental forces and flows.

Gradient is direction of steepest ascent

It’s a foundational truth in multivariable calculus , one that the gradient itself seems to declare with unwavering certainty: the gradient of a function $f\colon \mathbb{R}^{n}\to \mathbb{R}$ at any given point $x$ is, without exception, the direction of its steepest ascent. This means it’s the direction in which the function’s value increases most rapidly. Put another way, the gradient vector at a point maximizes the function’s directional derivative at that point.

To understand why this is rigorously true, let $v\in \mathbb{R}^{n}$ be an arbitrary unit vector , meaning its magnitude is exactly 1. The directional derivative of $f$ in the direction of $v$ at point $x$ is defined as: $$ \nabla _{v}f(x)=\lim _{h\rightarrow 0}{\frac {f(x+vh)-f(x)}{h}} $$ This expression represents the instantaneous rate of change of $f$ as we move from $x$ in the direction of $v$. Now, let’s substitute the function $f(x+vh)$ with its Taylor series expansion around $x$. For a differentiable function, the first-order Taylor expansion is $f(x+vh) = f(x) + \nabla f(x) \cdot (vh) + R$, where $R$ denotes higher-order terms that become negligible as $h \to 0$. Plugging this into the directional derivative definition: $$ \nabla _{v}f(x)=\lim _{h\rightarrow 0}{\frac {(f(x)+\nabla f\cdot vh+R)-f(x)}{h}} $$ The $f(x)$ terms cancel out. Dividing the remaining terms by $h$ and then taking the limit as $h$ approaches zero, the higher-order remainder term $R/h$ also vanishes. This leaves us with: $$ \nabla _{v}f(x) = \nabla f \cdot v $$ This crucial result shows that the directional derivative in any direction $v$ is simply the dot product of the gradient with that direction vector.

Now, to find the maximum value of this directional derivative, we turn to the Cauchy–Schwarz inequality [^8]. This inequality states that for any two vectors, the absolute value of their dot product is less than or equal to the product of their magnitudes. Applying this to our situation: $$ |\nabla _{v}f(x)|=|\nabla f\cdot v|\leq |\nabla f||v| $$ Since $v$ is a unit vector , its magnitude $|v|=1$. Therefore, the inequality simplifies to: $$ |\nabla _{v}f(x)|\leq |\nabla f| $$ This tells us that the absolute value of the directional derivative can never exceed the magnitude of the gradient itself. The maximum possible value for the directional derivative is thus $|\nabla f|$.

To achieve this maximum, we need to choose the unit vector $v$ such that the equality in the Cauchy–Schwarz inequality holds. This occurs when $v$ points in the exact same direction as $\nabla f$. Thus, the unit vector that maximizes the directional derivative is: $$ v^{*}=\nabla f/|\nabla f| $$ This vector $v^{*}$ is simply the normalized gradient vector. When we substitute this specific $v^{*}$ back into our directional derivative formula: $$ |\nabla _{v^{*}}f(x)|={\big |}\nabla f\cdot (\nabla f/|\nabla f|){\big |}=|\nabla f|^{2}/|\nabla f|=|\nabla f| $$ This confirms that the maximum possible value of the directional derivative is indeed the magnitude of the gradient, and it occurs precisely in the direction of the gradient. The gradient is, therefore, the unequivocal compass pointing towards the steepest ascent, its length quantifying the intensity of that climb.
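The steepest-ascent claim can be checked by brute force: sample many unit directions and confirm that no directional derivative exceeds $|\nabla f|$, while the best sampled direction comes arbitrarily close to it. A sketch with a hypothetical function $f(x, y) = x^2 + 3y^2$:

```python
import math

def grad_f(x, y):
    return (2 * x, 6 * y)   # gradient of the hypothetical f(x, y) = x² + 3y²

p = (1.0, 0.5)
g = grad_f(*p)              # (2.0, 3.0)
gnorm = math.hypot(*g)      # |∇f| = sqrt(13)

best = -float("inf")
for k in range(360):
    theta = 2 * math.pi * k / 360
    v = (math.cos(theta), math.sin(theta))   # a unit direction
    dd = g[0] * v[0] + g[1] * v[1]           # directional derivative ∇f · v
    best = max(best, dd)

# Cauchy–Schwarz: no direction beats |∇f| ...
assert best <= gnorm + 1e-12
# ... and the best sampled direction gets within angular-sampling error of it
assert gnorm - best < 1e-3
```

Refining the angular grid drives the gap to zero, with the maximizer converging to the normalized gradient.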

Generalizations

The concept of the derivative, and by extension the gradient, is far too useful to be confined to simple scalar functions in Euclidean space. Mathematics, in its relentless pursuit of generality, has extended these ideas to more complex functions and spaces, leading to powerful generalizations that underpin much of modern physics and advanced engineering.

Jacobian

When dealing with functions that map from one Euclidean space to another, rather than just to the real numbers, the gradient evolves into a more sophisticated entity: the Jacobian matrix . This matrix stands as the generalization of the gradient for vector-valued functions of several variables and for differentiable maps between different Euclidean spaces or, indeed, more abstract manifolds . [^9] [^10] For those who truly wish to push the boundaries, a further generalization exists for functions defined between Banach spaces , known as the Fréchet derivative .

Suppose we have a function $f: \mathbb{R}^{n} \to \mathbb{R}^{m}$, meaning it takes $n$ input variables and produces $m$ output variables. Let’s also assume that each of its first-order partial derivatives exists everywhere on $\mathbb{R}^{n}$. In this scenario, the Jacobian matrix of $f$ is defined as an $m \times n$ matrix, typically denoted by $\mathbf{J}_{\mathbf{f}}(\mathbf{x})$ or simply $\mathbf{J}$.

The entry in the $(i, j)$-th position of this matrix, denoted $\mathbf{J}_{ij}$, is given by: $$ \mathbf {J} _{ij}={\frac {\partial f_{i}}{\partial x_{j}}} $$ Here, $f_i$ refers to the $i$-th component function of the vector-valued function $f$, and $x_j$ is the $j$-th input variable. Explicitly, the Jacobian matrix takes the form: $$ \mathbf {J} ={\begin{bmatrix}{\dfrac {\partial \mathbf {f} }{\partial x_{1}}}&\cdots &{\dfrac {\partial \mathbf {f} }{\partial x_{n}}}\end{bmatrix}}={\begin{bmatrix}\nabla ^{\mathsf {T}}f_{1}\\\vdots \\\nabla ^{\mathsf {T}}f_{m}\end{bmatrix}}={\begin{bmatrix}{\dfrac {\partial f_{1}}{\partial x_{1}}}&\cdots &{\dfrac {\partial f_{1}}{\partial x_{n}}}\\\vdots &\ddots &\vdots \\{\dfrac {\partial f_{m}}{\partial x_{1}}}&\cdots &{\dfrac {\partial f_{m}}{\partial x_{n}}}\end{bmatrix}} $$ Notice a critical insight here: each row of the Jacobian matrix is, in fact, the transpose of the gradient of one of the component functions of $f$. That is, the $i$-th row is $\nabla^{\mathsf{T}} f_i$. So, while the gradient of a scalar function gives a vector, the Jacobian of a vector function provides a matrix where each row is essentially a “gradient-like” entity for one of the output dimensions. It’s a comprehensive map of all possible rates of change, showing how each output component varies with respect to each input component.
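The row-by-row structure is easy to see in code. The sketch below builds a numerical Jacobian for a hypothetical map $F: \mathbb{R}^2 \to \mathbb{R}^2$, $F(x, y) = (x^2 y,\; 5x + y)$, and checks it against the hand-computed matrix $\begin{bmatrix}2xy & x^2\\5 & 1\end{bmatrix}$:

```python
def F(x, y):
    # hypothetical vector-valued map R² -> R²
    return (x * x * y, 5 * x + y)

def jacobian(func, x, y, eps=1e-6):
    # m×n Jacobian via central differences; row i is the (transposed)
    # gradient of the i-th component function
    fx_p, fx_m = func(x + eps, y), func(x - eps, y)
    fy_p, fy_m = func(x, y + eps), func(x, y - eps)
    m = len(fx_p)
    return [[(fx_p[i] - fx_m[i]) / (2 * eps),
             (fy_p[i] - fy_m[i]) / (2 * eps)] for i in range(m)]

J = jacobian(F, 2.0, 3.0)
# analytic Jacobian at (2, 3): [[2xy, x²], [5, 1]] = [[12, 4], [5, 1]]
expected = [[12.0, 4.0], [5.0, 1.0]]
assert all(abs(J[i][j] - expected[i][j]) < 1e-4
           for i in range(2) for j in range(2))
```

Each row of `J` is exactly the numerical gradient of one output component, mirroring the $\nabla^{\mathsf{T}} f_i$ decomposition above.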

Gradient of a vector field

The concept of a gradient doesn’t stop at scalar functions. It can be extended to vector fields themselves, though the resulting entity is far more complex than a simple vector. When we take the total derivative of a vector field, the outcome is not another vector, but rather a linear mapping that transforms vectors into other vectors. This type of object, which describes how vectors change and rotate in space, is known as a tensor .

In the relatively familiar context of rectangular coordinates, the gradient of a vector field $\mathbf{f} = (f^1, f^2, f^3)$ is defined as: $$ \nabla \mathbf {f} =g^{jk}{\frac {\partial f^{i}}{\partial x^{j}}}\mathbf {e} _{i}\otimes \mathbf {e} _{k} $$ Here, the Einstein summation notation is implicitly used (summing over repeated indices $j, k, i$). The term $\mathbf{e}_i \otimes \mathbf{e}_k$ represents the tensor product of the basis vectors $\mathbf{e}_i$ and $\mathbf{e}_k$, which forms a dyadic tensor of type (2,0). This tensor essentially captures all the partial derivatives of all the components of the vector field with respect to all the coordinate variables.

Interestingly, this entire expression is equivalent to the transpose of the Jacobian matrix that would be formed by taking the derivatives of the vector field’s components: $$ {\frac {\partial f^{i}}{\partial x^{j}}}={\frac {\partial (f^{1},f^{2},f^{3})}{\partial (x^{1},x^{2},x^{3})}} $$ This means that the gradient of a vector field tells us not just how each component of the field changes, but how the entire vector changes in response to spatial shifts, including stretching, shrinking, and rotation.

When we move beyond flat rectangular coordinates to curvilinear coordinates , or more generally, to a curved manifold , the situation becomes even more intricate. The gradient of a vector field must then account for the curvature of the coordinate system itself, incorporating Christoffel symbols : $$ \nabla \mathbf {f} =g^{jk}\left({\frac {\partial f^{i}}{\partial x^{j}}}+{\Gamma ^{i}}_{jl}f^{l}\right)\mathbf {e} _{i}\otimes \mathbf {e} _{k} $$ In this formula, $g^{jk}$ are the components of the inverse metric tensor , which still defines the geometry, and the $\mathbf{e}_i$ are the coordinate basis vectors. The new terms, ${\Gamma ^{i}}_{jl}$, are the Christoffel symbols, which arise specifically to correct for the fact that basis vectors in curvilinear systems change direction from point to point. They ensure that the derivative of a vector field correctly reflects the intrinsic change in the vector, rather than just the change due to the chosen coordinate system’s orientation.

Expressed more invariantly, without reference to specific coordinates, the gradient of a vector field $\mathbf{f}$ can be defined using the Levi-Civita connection and the metric tensor [^11]: $$ \nabla ^{a}f^{b}=g^{ac}\nabla _{c}f^{b} $$ Here, $\nabla_c$ represents the covariant derivative associated with the Levi-Civita connection, which provides a way to differentiate vector fields in a manner that is independent of the coordinate system. This definition underscores the deep geometric nature of the gradient of a vector field, positioning it as a fundamental tensor operation.

Riemannian manifolds

The ultimate generalization of the gradient takes us into the sophisticated realm of Riemannian manifolds . A Riemannian manifold $(M, g)$ is essentially a smooth space where every point is equipped with an inner product (the metric $g$) on its tangent space, allowing us to measure lengths of tangent vectors and angles between them. This provides a framework for doing calculus on curved spaces.

For any smooth function $f$ defined on such a Riemannian manifold $(M, g)$, the gradient of $f$, denoted $\nabla f$, is defined as the unique vector field that satisfies a specific condition. For any arbitrary vector field $X$ on the manifold, the inner product of the gradient of $f$ with $X$ must be equal to the directional derivative of $f$ in the direction of $X$: $$ g(\nabla f,X)=\partial _{X}f $$ To be more precise, at any given point $x \in M$, the inner product $g_x$ (which is the metric tensor evaluated at $x$) applied to the gradient of $f$ at $x$ and the vector field $X$ at $x$ yields the directional derivative of $f$ in the direction of $X$, evaluated at $x$: $$ g_{x}{\big (}(\nabla f)_{x},X_{x}{\big )}=(\partial _{X}f)(x) $$ Here, $\partial_X f$ is the function that assigns to each point $x \in M$ the value of the directional derivative of $f$ in the direction of $X$, computed at $x$. In a specific coordinate chart $\varphi$ that maps an open subset of $M$ to an open subset of $\mathbb{R}^{n}$, $(\partial_X f)(x)$ can be explicitly written as: $$ \sum _{j=1}^{n}X^{j}{\big (}\varphi (x){\big )}{\frac {\partial }{\partial x_{j}}}(f\circ \varphi ^{-1}){\Bigg |}_{\varphi (x)} $$ where $X^j$ denotes the $j$-th component of the vector field $X$ in this coordinate chart. This formula essentially projects the problem back into Euclidean space within the chart, calculates the directional derivative, and then relates it back to the manifold.

The local form of the gradient within such a coordinate system takes on a familiar structure, but with a crucial reliance on the inverse metric tensor : $$ \nabla f=g^{ik}{\frac {\partial f}{\partial x^{k}}}\mathbf {e} _{i} $$ Here, $g^{ik}$ are the components of the inverse metric tensor, which effectively “raises” the index of the partial derivatives (which are components of the 1-form $df$) to convert them into the components of a vector field. The $\mathbf {e} _{i}$ are the local basis vectors.
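A concrete instance of this local formula is the flat plane in polar coordinates, where the metric is $g = \operatorname{diag}(1, r^2)$ and hence $g^{ik} = \operatorname{diag}(1, 1/r^2)$, so the gradient components in the coordinate basis are $(\partial f/\partial r,\; r^{-2}\,\partial f/\partial\theta)$. The sketch below checks this against the ordinary Euclidean gradient for the test function $f = x = r\cos\theta$ (whose Cartesian gradient is exactly $(1, 0)$):

```python
import math

# Polar coordinates on the plane: metric g = diag(1, r²), inverse diag(1, 1/r²).
# Raising the index of df = (∂f/∂r, ∂f/∂θ) gives the gradient components
# in the coordinate basis (∂_r, ∂_θ):
def grad_polar(df_dr, df_dtheta, r):
    return (df_dr, df_dtheta / (r * r))

# Test function f = x = r cosθ: ∂f/∂r = cosθ, ∂f/∂θ = -r sinθ.
r, theta = 2.0, 0.6
gr, gtheta = grad_polar(math.cos(theta), -r * math.sin(theta), r)

# Convert back to Cartesian using ∂_r = (cosθ, sinθ), ∂_θ = (-r sinθ, r cosθ);
# the result must match the Euclidean gradient ∇x = (1, 0).
gx = gr * math.cos(theta) + gtheta * (-r * math.sin(theta))
gy = gr * math.sin(theta) + gtheta * (r * math.cos(theta))
assert abs(gx - 1.0) < 1e-12 and abs(gy) < 1e-12
```

Without the $1/r^2$ factor from the inverse metric, the $\theta$-component would be wrong by a factor of $r^2$, which is the whole point of "raising the index."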

Just as in Euclidean space, the gradient of a function on a Riemannian manifold is intimately connected to its exterior derivative . The directional derivative of $f$ in the direction of $X$ at $x$ is equivalent to applying the differential of $f$ at $x$ to the vector $X$ at $x$: $$ (\partial _{X}f)(x)=(df)_{x}(X_{x}) $$ More precisely, the gradient $\nabla f$ is the vector field that is associated with the differential 1-form $df$ through a profound geometric operation known as the musical isomorphism . This isomorphism, often denoted $\sharp = \sharp^g \colon T^*M \to TM$ (called “sharp”), is defined by the metric $g$. It acts as a bridge, transforming a cotangent vector (like $df$) into a tangent vector (like $\nabla f$) by using the metric to define a canonical correspondence between the dual spaces.

The relationship between the exterior derivative and the gradient of a function on $\mathbb{R}^{n}$ that we explored earlier is merely a special, simplified case of this grander picture. In $\mathbb{R}^{n}$, the metric is the flat Euclidean metric given by the dot product , which simplifies the musical isomorphism considerably. Thus, the gradient in Euclidean space is but a shadow of its true, manifold-dwelling form.

See also

For those who find themselves with an insatiable curiosity for the interconnectedness of mathematical concepts, the following related topics might offer further avenues for exploration.

  • Curl – A vector operator that characterizes the infinitesimal rotation of a 3D vector field, measuring its circulation density.
  • Divergence – Another vector operator that quantifies the magnitude of a vector field’s source or sink at a given point, measuring its flux density.
  • Four-gradient – The relativistic analogue of the gradient operation, extending its concept to four-dimensional spacetime in special and general relativity .
  • Hessian matrix – A square matrix of second-order partial derivatives of a scalar-valued function, crucial for analyzing local extrema and concavity in multivariable calculus.
  • Skew gradient – A less common variant of the gradient, typically found in specialized areas of continuum mechanics or fluid dynamics.
  • Spatial gradient – A gradient whose components are exclusively spatial derivatives, often used when a function also depends on time but only its spatial variation is of interest.

Notes

[^a] This article uses the convention that column vectors represent vectors, and row vectors represent covectors, but the opposite convention is also common.

[^b] Strictly speaking, the gradient is a vector field $f\colon \mathbb{R}^{n}\to T\mathbb{R}^{n}$, and the value of the gradient at a point is a tangent vector in the tangent space at that point, $T_{p}\mathbb{R}^{n}$, not a vector in the original space $\mathbb{R}^{n}$. However, all the tangent spaces can be naturally identified with the original space $\mathbb{R}^{n}$, so these do not need to be distinguished; see § Generalizations and relationship with the derivative.

[^c] The value of the gradient at a point can be thought of as a vector in the original space $\mathbb{R}^{n}$, while the value of the derivative at a point can be thought of as a covector on the original space: a linear map $\mathbb{R}^{n}\to \mathbb{R}$.

[^d] The dot product (the slope of the road around the hill) would be 40% if the angle between the road and the steepest slope is 0°, i.e. when they are completely aligned, and flat when the angle is 90°, i.e. when the road is perpendicular to the steepest slope.

[^e] Informally, “naturally” identified means that this can be done without making any arbitrary choices. This can be formalized with a natural transformation.