The proper generalized decomposition (PGD) is an iterative numerical method for the efficient solution of boundary value problems (BVPs), that is, partial differential equations (PDEs) such as Poisson's equation or Laplace's equation together with a set of boundary conditions. Intuitively, it finds solutions for systems that are constrained at their edges, like predicting the sag of a trampoline given how tightly it is anchored.
The PGD algorithm proceeds by successive enrichment: at each computational cycle, a new component, called a mode, is computed and added to the current approximation of the solution. In principle, the more modes obtained, the closer the approximation comes to the theoretical solution. Unlike the principal components produced by proper orthogonal decomposition (POD), PGD modes are not necessarily orthogonal to each other; they need not represent independent directions in the solution space.
By selecting only the most relevant PGD modes, a reduced order model of the full solution can be obtained. Because of this ability to compress complex, high-dimensional problems into more manageable forms, PGD is considered a dimensionality reduction algorithm.
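The effect of keeping only the dominant modes can be illustrated numerically. The sketch below uses an SVD-based truncation (the mechanism behind POD, which, unlike PGD, produces orthogonal modes) on a field that is an exact sum of two separable products; the field and grid sizes are illustrative assumptions.

```python
import numpy as np

# Illustration of mode truncation: a smooth 2D field built from two
# separable products is captured exactly by two rank-1 modes. SVD (the
# basis of POD) yields orthogonal modes; PGD modes play an analogous
# role in a reduced order model but are not orthogonal.
x = np.linspace(0, 1, 100)
y = np.linspace(0, 1, 100)
X, Y = np.meshgrid(x, y, indexing="ij")
u = np.sin(np.pi * X) * np.exp(Y) + 0.5 * X * Y   # exact rank-2 field

U, s, Vt = np.linalg.svd(u)
u2 = (U[:, :2] * s[:2]) @ Vt[:2, :]               # keep only 2 modes
err = np.linalg.norm(u - u2) / np.linalg.norm(u)  # relative error
```

Here `err` sits at machine precision because the field has exact rank 2; for a general smooth field, the error decays rapidly as more modes are retained.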
Description
The Proper Generalized Decomposition method is defined by the following features:
- A variational formulation of the problem.
- A discretization of the domain, often in the style of the finite element method.
- The assumption that the solution can be approximated as a separated representation.
- A numerical greedy algorithm used to find the solution.
Variational formulation
In the Proper Generalized Decomposition method, the variational formulation is the principle that recasts the problem into a form whose solution can be approximated by minimizing (or, in some cases, maximizing) a functional. A functional is a function that takes another function as input and returns a scalar value; here the input function represents the problem's solution, and the goal is to find the one that yields the optimal scalar output.
The Bubnov-Galerkin method is the most frequently implemented variational formulation within the PGD framework, chosen for its ability to furnish approximate solutions to problems expressed as partial differential equations (PDEs). The core idea is to project the original problem onto a smaller, more manageable space spanned by a finite collection of basis functions, which are selected to resemble the characteristics of the problem's actual solution space. Rather than solving the differential equations directly, the method seeks an approximate solution that satisfies the integral (weak) form of the PDEs over the problem's domain. The differential problem is thereby transformed into a search for the coefficients that best satisfy this integral equation within the chosen function space.
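As a concrete illustration, here is a minimal Bubnov-Galerkin sketch for the one-dimensional Poisson problem −u″ = f on (0, 1) with homogeneous boundary conditions, using a manufactured source so the exact solution is known. The sine basis and the trapezoid quadrature are illustrative choices, not part of any specific PGD implementation.

```python
import math

# Bubnov-Galerkin sketch for -u'' = f on (0,1), u(0) = u(1) = 0, with
# f(x) = pi^2 sin(pi x), so the exact solution is u(x) = sin(pi x).
# Trial and test functions are the same sine basis (the defining trait
# of Bubnov-Galerkin, as opposed to Petrov-Galerkin).

def integrate(g, a=0.0, b=1.0, n=2000):
    """Composite trapezoid rule."""
    h = (b - a) / n
    return h * (0.5 * (g(a) + g(b)) + sum(g(a + i * h) for i in range(1, n)))

f = lambda x: math.pi ** 2 * math.sin(math.pi * x)
N = 4  # number of basis functions
phi  = [lambda x, n=n: math.sin(n * math.pi * x) for n in range(1, N + 1)]
dphi = [lambda x, n=n: n * math.pi * math.cos(n * math.pi * x) for n in range(1, N + 1)]

# This sine basis is orthogonal in the energy inner product, so the
# stiffness matrix is diagonal and each Galerkin coefficient decouples:
# c_n = (f, phi_n) / (phi_n', phi_n').
coeffs = [integrate(lambda x: f(x) * phi[n](x)) /
          integrate(lambda x: dphi[n](x) ** 2) for n in range(N)]

u_h = lambda x: sum(c * p(x) for c, p in zip(coeffs, phi))
```

With this manufactured source, the projection recovers the exact solution: `coeffs` is approximately `[1, 0, 0, 0]` and `u_h(0.5)` is approximately 1.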
Although the Bubnov-Galerkin method enjoys the widest adoption, other variational formulations are occasionally employed in PGD, their selection contingent on the specific demands and characteristics of the problem at hand. These alternatives offer different trade-offs in stability, accuracy, and computational overhead:
- Petrov-Galerkin method: Closely related to the Bubnov-Galerkin approach, except that the test functions used to project the residual of the differential equation are deliberately chosen to be different from the trial functions that approximate the solution. For certain types of problems, this choice leads to enhanced stability and improved accuracy.
- Collocation method: In contrast to integral-based methods, the differential equation is required to be satisfied exactly at a finite, pre-determined number of points in the domain, called collocation points. This approach is often simpler to implement than its integral counterparts, but can offer less stability for certain problem classes.
- Least squares method: The squared residual of the differential equation is minimized over the entire domain. This is particularly valuable for problems where other formulations struggle with stability or convergence.
- Mixed finite element method: For problems involving multiple interacting physical quantities, additional variables, such as fluxes or gradients, are introduced and approximated concurrently with the primary variable. This simultaneous approximation can improve accuracy and stability, especially for problems governed by incompressibility or conservation laws.
- Discontinuous Galerkin method: A variant of the Galerkin method that permits the solution to be discontinuous across the boundaries of computational elements, making it well suited to problems featuring sharp gradients, shocks, or other inherent discontinuities where continuous methods falter.
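For contrast with the integral-based formulations, the following sketch applies the collocation method to the same kind of one-dimensional model problem: the ODE is enforced exactly at a few chosen points. The basis, the manufactured source term, and the collocation points are illustrative assumptions.

```python
import numpy as np

# Collocation sketch for -u'' = f on (0,1), u(0) = u(1) = 0, with the
# manufactured source f(x) = pi^2 sin(pi x) (exact solution sin(pi x)).
# Unlike Galerkin, the ODE is enforced pointwise at N interior
# collocation points rather than in an integral sense.
N = 4
basis = lambda n, x: np.sin(n * np.pi * x)                # trial functions
d2    = lambda n, x: -(n * np.pi) ** 2 * np.sin(n * np.pi * x)
f     = lambda x: np.pi ** 2 * np.sin(np.pi * x)

xc = np.linspace(0, 1, N + 2)[1:-1]                       # collocation points
# Row j enforces  -sum_n c_n * basis_n''(x_j) = f(x_j).
A = np.array([[-d2(n, xj) for n in range(1, N + 1)] for xj in xc])
b = f(xc)
c = np.linalg.solve(A, b)                                  # c ~ [1, 0, 0, 0]
```

Because the source lies exactly in the span of the first basis function, the pointwise conditions recover the exact coefficient vector.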
Domain discretization
The discretization of the domain is a well-defined sequence of procedures that is essential for any numerical simulation. It encompasses three stages: (a) the creation of finite element meshes, which partition the continuous physical space into a collection of smaller, manageable units; (b) the definition of basis functions (often called shape functions) on idealized reference elements; and (c) the mapping of these reference elements onto the specific geometric elements of the mesh. This foundational step allows the continuous problem to be approximated and computed in a discrete fashion.
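The three stages can be sketched in one dimension with linear elements; the mesh coordinates and element numbering below are illustrative assumptions.

```python
# Sketch of the three discretization stages in 1D with linear elements:
# (a) mesh the domain, (b) define shape functions on the reference
# element [-1, 1], (c) map the reference element onto each mesh element.

nodes = [0.0, 0.3, 0.7, 1.0]                 # (a) mesh of the domain [0, 1]
elements = [(0, 1), (1, 2), (2, 3)]          #     element connectivity

shape = [lambda xi: 0.5 * (1 - xi),          # (b) reference shape functions
         lambda xi: 0.5 * (1 + xi)]          #     (linear, partition of unity)

def to_physical(e, xi):
    """(c) Map reference coordinate xi in [-1, 1] into element e."""
    a, b = nodes[elements[e][0]], nodes[elements[e][1]]
    return shape[0](xi) * a + shape[1](xi) * b
```

For example, `to_physical(1, 0.0)` returns the midpoint of the second element, 0.5.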
Separate representation
PGD rests on the assumption that the solution u of a multi-dimensional problem can be approximated as a separated representation: rather than being treated as a monolithic entity, the solution is written as a sum of products, where each term is a product of functions, each depending on only a single variable (or a distinct subset of variables). Mathematically, this decomposition takes the form:
{\displaystyle \mathbf {u} \approx \mathbf {u} ^{N}(x_{1},x_{2},\ldots ,x_{d})=\sum _{i=1}^{N}\mathbf {X} _{1}^{i}(x_{1})\cdot \mathbf {X} _{2}^{i}(x_{2})\cdots \mathbf {X} _{d}^{i}(x_{d}),}
Here, u^N denotes the approximation of the true solution u, N is the number of addends (modes), and X_1^i(x_1), X_2^i(x_2), …, X_d^i(x_d) are the functional products, each X_j^i being a function solely of its corresponding variable x_j. Crucially, neither the number of addends N required for a given accuracy nor the functional products X_j^i are known in advance; both must be discovered by the algorithm through its iterative process.
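One practical consequence of the separated form is cheap point evaluation: only N·d one-dimensional function evaluations are needed, regardless of the dimension d. A minimal sketch, with arbitrary illustrative modes:

```python
import math

# Evaluating a separated representation u^N at a point: the sum-of-products
# structure requires only N*d one-dimensional function evaluations.
def eval_separated(modes, point):
    """modes[i][j] is X_j^i, a 1D function of the j-th coordinate."""
    return sum(
        math.prod(X_j(x_j) for X_j, x_j in zip(mode, point))
        for mode in modes
    )

# Two modes in three variables (hypothetical illustrative functions):
modes = [
    (math.sin, math.cos, math.exp),
    (lambda x: x, lambda y: y ** 2, lambda z: 1.0),
]
val = eval_separated(modes, (0.5, 0.0, 1.0))
```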
Greedy algorithm
The solution of the PGD problem is sought with a greedy algorithm; in most implementations, a fixed point algorithm is applied iteratively to the weak formulation of the problem. At each step, the algorithm takes the locally best component it can find, in the hope that these choices lead toward the global optimum.
At each iteration i of the algorithm, a new mode of the solution is computed. Each mode is not a single value but a complete set of numerical values for the functional products X_1^i(x_1), …, X_d^i(x_d), which are then added to the existing approximation, a process described as "enriching" the solution. The term "enrich" is used here deliberately, rather than "improve": because of the greedy nature of the algorithm, some newly computed modes may not enhance the accuracy of the approximation and could, in certain circumstances, actually worsen it. The number of computed modes required to reach an approximation below a specified error threshold is dictated by the stopping criterion of the iterative algorithm.
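The loop structure, greedy enrichment with a fixed-point iteration per mode, can be sketched on a discrete analogue: computing successive rank-1 modes of a known 2D field via alternating-direction iterations on the residual. A real PGD solves the weak form of a PDE rather than approximating a known field, so this is only a structural illustration; the field and iteration counts are assumptions.

```python
import numpy as np

# Greedy enrichment sketch on a discrete 2D field: each cycle computes one
# new rank-1 mode X(x)*Y(y) by a fixed-point (alternating-direction)
# iteration on the current residual, then enriches the approximation.
rng = np.random.default_rng(0)
U = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 40))  # rank-3 field

approx = np.zeros_like(U)
for mode in range(3):                      # greedy enrichment loop
    R = U - approx                         # residual to be captured
    Y = np.ones(U.shape[1])
    for _ in range(200):                   # fixed-point iteration on (X, Y)
        X = R @ Y / (Y @ Y)                # best X for the current Y
        Y = R.T @ X / (X @ X)              # best Y for the current X
    approx += np.outer(X, Y)               # enrich with the converged mode

err = np.linalg.norm(U - approx) / np.linalg.norm(U)
```

Since the field is exactly rank 3, three greedy modes recover it to high accuracy; in a genuine PGD, the enrichment would stop when the residual of the weak form falls below the chosen error threshold.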
Features
PGD is particularly suited to high-dimensional problems, where classical numerical approaches become impractical. Its primary advantage is the ability to circumvent the curse of dimensionality: the exponential growth of computational complexity as the number of dimensions or variables in a problem increases. By transforming a single, monolithic multi-dimensional problem into a series of decoupled, lower-dimensional sub-problems, PGD dramatically reduces the computational expense, since solving these simpler problems is far less resource-intensive than tackling the original multidimensional problem directly.
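A back-of-the-envelope comparison makes the point; the grid size, dimension, and mode count below are illustrative assumptions.

```python
# Cost comparison, assuming n grid points per dimension, d dimensions,
# and N separated modes: a full tensor grid needs n**d unknowns, while a
# separated representation needs only N one-dimensional functions per
# dimension, i.e. N*d*n unknowns.
n, d, N = 100, 10, 20
full_grid = n ** d          # 10**20 unknowns: intractable
separated = N * d * n       # 20,000 unknowns: trivial
ratio = full_grid / separated
```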
Furthermore, PGD can re-frame parametric problems, those whose solutions depend on various input parameters, within an expanded multidimensional framework by treating the parameters as additional coordinates. The solution then takes on a correspondingly extended separated representation:
{\displaystyle \mathbf {u} \approx \mathbf {u} ^{N}(x_{1},\ldots ,x_{d};k_{1},\ldots ,k_{p})=\sum _{i=1}^{N}\mathbf {X} _{1}^{i}(x_{1})\cdots \mathbf {X} _{d}^{i}(x_{d})\cdot \mathbf {K} _{1}^{i}(k_{1})\cdots \mathbf {K} _{p}^{i}(k_{p}),}
In this extended formulation, an additional series of functional products K_1^i(k_1), K_2^i(k_2), …, K_p^i(k_p) is incorporated into the equation, each K_j^i depending exclusively on its corresponding parameter k_j. The resulting approximation is therefore not a single answer but a computational vademecum: a generalized meta-model that encapsulates the particular solutions for every possible value of the involved parameters, so the problem need not be re-solved for each new parameter setting.
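The online use of such a vademecum can be sketched as follows; the modes here are hypothetical placeholders, not the output of an actual PGD solve.

```python
import math

# Vademecum sketch: once the spatial modes X_i(x) and parameter modes
# K_i(k) are computed offline, evaluating the solution for any new
# parameter value is a cheap online sum of products; nothing is re-solved.
X = [lambda x: math.sin(math.pi * x), lambda x: math.sin(2 * math.pi * x)]
K = [lambda k: 1.0 / k,               lambda k: 0.1 * k]

def u(x, k):
    """Evaluate u(x; k) = sum_i X_i(x) * K_i(k)."""
    return sum(Xi(x) * Ki(k) for Xi, Ki in zip(X, K))

# Online stage: sweep the parameter without re-solving anything.
values = [u(0.25, k) for k in (0.5, 1.0, 2.0)]
```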
Sparse Subspace Learning
The Sparse Subspace Learning (SSL) method leverages hierarchical collocation techniques to obtain a numerical approximation of parametric models that are often computationally expensive or complex to analyze through traditional means. What sets SSL apart from conventional projection-based reduced order modeling (ROM) techniques is its use of a collocation framework, which makes the approach non-intrusive: it does not require direct modification of the underlying simulation code, a significant advantage when dealing with proprietary or legacy software. This non-intrusiveness is achieved through sparse adaptive sampling of the parametric space: instead of brute-force evaluation across all parameter combinations, SSL selects a minimal yet representative set of points.
Through this sparse sampling, the method recovers the underlying low-dimensional structure of the parametric solution subspace while simultaneously learning the explicit functional dependency of the solution on the parameters. A sparse low-rank approximate tensor representation of the parametric solution is then constructed incrementally. This construction requires only access to the outputs of a deterministic solver: there is no need to inspect the solver's internal mechanics, since providing inputs and observing outputs suffices. This inherent non-intrusiveness makes the SSL approach adaptable and straightforwardly applicable to challenging problems, including systems with strong nonlinearity or non-affine weak forms, where other methods often struggle.
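The sparse adaptive sampling idea can be sketched in one parameter dimension: starting from the interval endpoints, new evaluation points are added only where the current surrogate disagrees with the black-box solver. The piecewise-linear surrogate, midpoint refinement, and stand-in solver below are illustrative simplifications of the hierarchical collocation machinery, not the SSL algorithm itself.

```python
import math

def solver(k):
    """Stand-in for an expensive deterministic solver; only its outputs
    are used, mirroring the non-intrusive setting described above."""
    return math.exp(-k) * math.sin(3 * k)

def adaptive_sample(a, b, tol=1e-2, max_pts=500):
    """Hierarchically refine only where the piecewise-linear surrogate
    mispredicts the solver at an interval midpoint."""
    pts = {a: solver(a), b: solver(b)}
    active = [(a, b)]
    while active and len(pts) < max_pts:
        lo, hi = active.pop()
        mid = 0.5 * (lo + hi)
        linear = 0.5 * (pts[lo] + pts[hi])   # surrogate prediction at mid
        exact = solver(mid)
        if abs(exact - linear) > tol:        # refine only where needed
            pts[mid] = exact
            active += [(lo, mid), (mid, hi)]
    return dict(sorted(pts.items()))

samples = adaptive_sample(0.0, 2.0)
```

The sample set clusters where the response varies quickly and stays coarse elsewhere, which is the sense in which the sampling is sparse compared with a uniform grid at the same accuracy.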