For other uses, see Probability (disambiguation).
Not to be confused with probability theory, game theory, graph theory, or statistics.
Probability
Figure: The probabilities of rolling several numbers using two dice.
Probability is a branch of mathematics and statistics concerned with the quantification of uncertainty. It provides a rigorous framework for describing events and assigning numerical values to the likelihood of their occurrence. The probability of an event is a number between 0 and 1: a probability of 0 signifies an event that is virtually impossible, while a probability of 1 indicates an event that is almost certain to occur. [note 1] [1] [2] This numerical value is often expressed as a percentage, from 0% (impossibility) to 100% (certainty).
Consider the simple example of tossing a fair, unbiased coin. Since the coin is fair, the two possible outcomes, “heads” and “tails”, are equally probable, so the probability of “heads” equals the probability of “tails.” Because no other outcomes are possible, the probability of either “heads” or “tails” is 1/2, which can equivalently be written as 0.5 or 50%.
These foundational concepts have been formalized into an axiomatic mathematical system known as probability theory, which is used widely across many areas of study, including statistics, pure mathematics, the sciences, finance, gambling, artificial intelligence, machine learning, computer science, game theory, and philosophy. Probability theory is used, for example, to draw inferences about the expected frequency of events and, more broadly, to describe the underlying mechanics and regularities of complex systems. [3]
Etymology
See also: History of probability § Etymology, and Glossary of probability and statistics
Further information: Likelihood
The word “probability” derives from the Latin probabilitas, which could also mean “probity”: a measure of the authority or credibility of a witness in a legal case in historical Europe, often correlated with the witness's nobility or social standing. This classical sense centered on the trustworthiness of a statement or a person, judged by perceived moral character or social weight, and so differs markedly from the modern understanding of “probability.”
By contrast, the modern meaning of probability, as a cornerstone of scientific and mathematical inquiry, is an objective measure of the weight of empirical evidence, arrived at through inductive reasoning and statistical inference rather than through a judgment of character. [4]
Interpretations
Main article: Probability interpretations
When dealing with random experiments (experiments that are both random and well defined) in a purely theoretical setting, such as the archetypal coin toss, probabilities can be numerically ascertained as the number of desired outcomes divided by the total number of all possible outcomes. This is termed “theoretical probability,” in contrast to empirical probability, which concerns probabilities estimated from actual, real-world experiments and observations. The resulting probability is always a number between 0 and 1; a larger value indicates that the desired outcome is more likely to occur.
Consider, for instance, tossing a coin twice. The complete sample space of possible outcomes comprises “head-head,” “head-tail,” “tail-head,” and “tail-tail.” The probability of the specific outcome “head-head” is 1 out of these 4 equally likely outcomes: 1/4, 0.25, or 25%. The probability of obtaining at least one head encompasses the outcomes “head-head,” “head-tail,” and “tail-head,” yielding 3 out of 4 possibilities, or 0.75; this event is therefore considerably more likely to occur.
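The two-coin calculation can be checked by brute-force enumeration. The following is a minimal Python sketch, written purely for illustration (the outcome labels are arbitrary), that enumerates the sample space and counts favourable outcomes:

```python
from itertools import product

# Enumerate the sample space of two fair coin tosses.
sample_space = list(product("HT", repeat=2))  # [('H','H'), ('H','T'), ('T','H'), ('T','T')]

# Theoretical probability = favourable outcomes / total outcomes.
p_two_heads = sum(1 for o in sample_space if o == ("H", "H")) / len(sample_space)
p_at_least_one_head = sum(1 for o in sample_space if "H" in o) / len(sample_space)

print(p_two_heads)          # 0.25
print(p_at_least_one_head)  # 0.75
```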
In practical application, however, there are two major competing categories of probability interpretations, whose adherents hold fundamentally divergent views about the nature of probability itself:
Objectivists assign numerical values to describe some objective or physical state of affairs. The most widely embraced form of objective probability is frequentist probability, which holds that the probability of a random event is the relative frequency of that event's occurrence when the underlying experiment is repeated an indefinite number of times; probability is conceived as the limiting relative frequency observed “in the long run.” [5] A modification of this view is propensity probability, which interprets probability not as a frequency but as an inherent tendency, or disposition, of a particular experiment to yield a certain outcome, even if that experiment is, for practical reasons, performed only once.
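The frequentist notion of a limiting relative frequency can be illustrated by simulation. This is a small Python sketch, not a formal argument: it assumes a simulated fair coin, and the particular sample counts are arbitrary.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

def relative_frequency(n_flips):
    """Fraction of heads observed in n_flips simulated fair-coin tosses."""
    heads = sum(random.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

for n in (10, 1_000, 100_000):
    print(n, relative_frequency(n))  # tends toward 0.5 as n grows
```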
Subjectivists, on the other hand, assign numerical probabilities as a representation of a degree of belief. [6] The degree of belief has been interpreted as “the price at which you would buy or sell a bet that pays 1 unit of utility if E, 0 if not E,” [7] though this economic framing is not universally accepted. [8] The most prominent form of subjective probability is Bayesian probability, which incorporates both expert knowledge (expressed as a subjective prior probability distribution) and new experimental data (encapsulated in a likelihood function). The product of the prior and the likelihood, once normalized, yields a posterior probability distribution that synthesizes all information available to date. [9] According to Aumann's agreement theorem, Bayesian agents who begin with sufficiently similar prior beliefs will, upon sharing information, converge towards similar posterior beliefs; sufficiently divergent initial priors, however, can lead to persistently different conclusions regardless of how much information the agents share. [10]
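The prior-times-likelihood update can be made concrete with a toy discrete example. The sketch below is illustrative only: it assumes a coin whose bias is one of three hypothetical values (0.3, 0.5, 0.7, chosen arbitrarily) under a uniform prior, and uses a binomial likelihood.

```python
from math import comb

# Hypothetical candidate biases with a uniform subjective prior.
prior = {0.3: 1/3, 0.5: 1/3, 0.7: 1/3}

def bayes_update(prior, heads, tosses):
    """Posterior is proportional to prior times binomial likelihood, normalized to sum to 1."""
    unnormalized = {
        b: p * comb(tosses, heads) * b**heads * (1 - b)**(tosses - heads)
        for b, p in prior.items()
    }
    total = sum(unnormalized.values())
    return {b: w / total for b, w in unnormalized.items()}

posterior = bayes_update(prior, heads=7, tosses=10)
print(posterior)  # most of the probability mass moves to the 0.7 hypothesis
```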
History
Main article: History of probability
Further information: History of statistics
The rigorous, scientific study of probability is, in the grand scheme of mathematics, a relatively recent development. While the long history of gambling demonstrates a persistent interest in quantifying notions of chance throughout recorded history, precise mathematical descriptions and formal theories emerged considerably later. There are reasons for this slow development: even as games of chance provided the initial impetus for mathematical inquiry into probability, many fundamental issues [note 2] remained obscured for centuries by superstition and a lack of systematic thought. [11]
According to Richard Jeffrey, “Before the middle of the seventeenth century, the term ‘probable’ (Latin probabilis) meant approvable, and was applied in that sense, univocally, to opinion and to action. A probable action or opinion was one such as sensible people would undertake or hold, in the circumstances.” [12] This highlights a profound semantic shift: the concept of “probable” was initially tied to social consensus and good judgment, not numerical likelihood. In specific legal contexts, however, ‘probable’ could refer to propositions supported by substantial evidence, anticipating the eventual scientific meaning. [13]
Figures: Gerolamo Cardano (16th century); Christiaan Huygens, who published one of the first books on probability (17th century).
The sixteenth-century Italian polymath Gerolamo Cardano first demonstrated the practical efficacy of defining odds as the ratio of favourable to unfavourable outcomes, a simple but profound insight implying that the probability of an event can be quantified as the ratio of favourable outcomes to the total number of all possible outcomes. [14] His work, though elementary by modern standards, laid crucial conceptual groundwork.
Beyond Cardano's foundational but somewhat isolated contributions, the formal doctrine of probabilities began to coalesce with the 1654 correspondence between the French mathematicians Pierre de Fermat and Blaise Pascal, sparked by a problem concerning games of chance. Shortly thereafter, Christiaan Huygens (1657) provided the earliest known comprehensive scientific treatise on the subject. [15] The field matured further with the posthumous publication of Jakob Bernoulli's Ars Conjectandi in 1713 and Abraham de Moivre's The Doctrine of Chances in 1718, both of which firmly established probability as a distinct branch of mathematics. [16] On the genesis of the concept, see Ian Hacking's The Emergence of Probability [4] and James Franklin's The Science of Conjecture. [17]
The genesis of the theory of errors can be tenuously traced back to Roger Cotes ’s Opera Miscellanea (published posthumously in 1722). However, it was a seminal memoir prepared by Thomas Simpson in 1755 (and subsequently printed in 1756) that first systematically applied this nascent theory to the intricate discussion of errors arising from observational measurements. [18] A re-publication of this significant memoir in 1757 explicitly articulated the foundational axioms: that positive and negative errors are equally probable, and that specific, assignable limits delineate the entire range of potential errors. Simpson’s work also ventured into the realm of continuous errors, providing an early description of what would later be recognized as a probability curve.
The first two influential laws governing error distribution were both proposed by Pierre-Simon Laplace. His first law, published in 1774, posited that the frequency of an error could be expressed as an exponential function of the numerical magnitude of the error, disregarding its sign. His second law of error, introduced in 1778, stated that the frequency of the error is an exponential function of the square of the error. This second formulation is now widely known as the normal distribution or, somewhat controversially, the Gauss law. As one observer noted, “It is difficult historically to attribute that law to Gauss, who in spite of his well-known precocity had probably not made this discovery before he was two years old.” [19]
Daniel Bernoulli (1778) further enriched the field by introducing the principle of the maximum product of the probabilities of a system comprising concurrent errors.
Figure: Carl Friedrich Gauss.
Adrien-Marie Legendre (1805) made a significant contribution with the development of the method of least squares, which he introduced in his Nouvelles méthodes pour la détermination des orbites des comètes (New Methods for Determining the Orbits of Comets). [20] Remarkably, and in apparent ignorance of Legendre's prior work, an Irish-American writer named Robert Adrain, then editor of “The Analyst” (1808), independently derived the law of facility of error, which he expressed as:
$$ \phi (x)=ce^{-h^{2}x^{2}} $$
where $h$ represents a constant directly related to the precision of observation, and $c$ is a scale factor ensuring that the total area beneath the curve equals 1. Adrain furnished two distinct proofs for this law, the second bearing a striking resemblance to the one later provided by John Herschel in 1850. [ citation needed ] Carl Friedrich Gauss subsequently presented the first proof of this law to gain widespread recognition in Europe (the third known proof, after Adrain's two), publishing it in 1809. Additional proofs and refinements were later offered by Laplace (1810, 1812), Gauss again (1823), James Ivory (1825, 1826), Hagen (1837), Friedrich Bessel (1838), W.F. Donkin (1844, 1856), and Morgan Crofton (1870). Other notable contributors to the theory of errors included Robert Leslie Ellis (1844), Augustus De Morgan (1864), James Whitbread Lee Glaisher (1872), and Giovanni Schiaparelli (1875). Christian August Friedrich Peters's (1856) formula [ clarification needed ] for $r$, the probable error of a single observation, remains a well-known result in the field.
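For completeness, the scale factor can be checked against the standard Gaussian integral; this is a routine verification, not drawn from the historical sources cited above:

$$
\int_{-\infty}^{\infty} e^{-h^{2}x^{2}}\,dx = \frac{\sqrt{\pi}}{h}
\qquad\Longrightarrow\qquad
c = \frac{h}{\sqrt{\pi}},
$$

so that $\int_{-\infty}^{\infty} \phi(x)\,dx = c \cdot \frac{\sqrt{\pi}}{h} = 1$, as required.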
Throughout the nineteenth century, numerous authors expanded upon the general theory of probability. Prominent among them were Laplace himself, Sylvestre Lacroix (1816), Littrow (1833), Adolphe Quetelet (1853), Richard Dedekind (1860), Helmert (1872), Hermann Laurent (1873), Liagre, Didion, and Karl Pearson . The theoretical exposition of probability was significantly refined and clarified by the contributions of Augustus De Morgan and George Boole , whose work cemented its logical foundations.
A significant advancement in the early 20th century occurred in 1906, when Andrey Markov introduced [21] the concept of Markov chains. These models proved profoundly influential, playing a critical role in the development of stochastic processes theory and its applications across many disciplines. The modern, rigorous theory of probability, built on the framework of measure theory, was developed by the Russian mathematician Andrey Kolmogorov in 1931. [22] His axiomatic approach provides the mathematical foundation that underpins all contemporary probability theory.
On the geometric front, contributors to The Educational Times who advanced the field of integral geometry included Miller, Crofton, McColl, Wolstenholme, Watson, and Artemas Martin . [23]
Theory
Main article: Probability theory
Like other theories, the theory of probability is a representation of its concepts in formal terms, that is, in terms that can be considered separately from their meaning and manipulated by the rules of mathematics and logic. Any results of these manipulations are then interpreted or translated back into the problem domain.
At least two successful attempts have been made to rigorously formalize probability: the Kolmogorov formulation and the Cox formulation. In Kolmogorov's formulation (see also probability space), sets are interpreted as events and probability itself as a measure on a class of sets; this approach provides a robust and widely accepted axiomatic foundation. In Cox's theorem, probability is taken as a primitive, that is, a concept not subject to further analysis, and the emphasis is on constructing a consistent assignment of probability values to propositions. Despite their differing starting points, the fundamental laws of probability derived from the two formulations are, apart from minor technical distinctions, identical.
Other methods exist for quantifying uncertainty, such as the Dempster–Shafer theory and possibility theory, but these are essentially different and not compatible with the commonly understood and widely applied laws of probability.
Applications
Probability theory is not merely an abstract mathematical construct; it is applied throughout everyday life, particularly in risk assessment and modeling. The insurance industry and financial markets rely on actuarial science, a discipline rooted in probability, to determine pricing and inform trading decisions. Governments employ probabilistic methods in areas ranging from environmental regulation to the analysis of entitlement programs and the enforcement of financial regulation.
A vivid illustration of probability theory's application in equity trading is the effect of the perceived probability of widespread conflict in the Middle East on oil prices, which in turn ripples through the global economy. Should a commodity trader assess that the likelihood of war has increased, that conviction can instantly send the commodity's prices soaring or plummeting, simultaneously signaling the assessment to other traders. Consequently, such probabilities are rarely assessed in isolation, nor are they necessarily derived through purely rational processes. The field of behavioral finance emerged precisely to describe the impact of such collective phenomena, often termed groupthink, on pricing, policy decisions, and even peace and conflict. [24]
Beyond financial evaluations, probability is a valuable analytical tool for discerning trends in biology (such as the dynamics of disease spread) and ecology (for example, in the construction and interpretation of biological Punnett squares). [25] As in finance, risk assessment viewed through a statistical lens provides a means to calculate the likelihood of undesirable events materializing, which can in turn inform protocols and strategies designed to mitigate or entirely circumvent such circumstances. Probability is also employed in the design of games of chance: casinos leverage it to ensure a guaranteed long-term profit, while calibrating payouts to players frequently enough to sustain continued play. [26]
Another significant application of probability theory in daily life is reliability. Many consumer products, from automobiles to consumer electronics, incorporate reliability theory into their fundamental design, with the objective of systematically reducing the probability of product failure. The calculated probability of failure can, and often does, directly influence a manufacturer's decisions regarding the scope and duration of a product's warranty. [27]
Furthermore, sophisticated cache language models and other statistical language models , which are integral components of modern natural language processing systems, stand as compelling examples of the practical application of probability theory. These models leverage probabilistic principles to predict and understand the nuances of human language.
Mathematical treatment
Figure: Calculation of probability (risk) vs. odds.
See also: Probability axioms
Let us consider an experiment capable of yielding a multitude of distinct results. The comprehensive collection of all such possible results is formally designated as the sample space of the experiment, frequently symbolized by the Greek capital letter $\Omega$. The power set of this sample space is then constructed by contemplating every conceivable distinct collection of these possible results. For illustrative purposes, imagine the simple act of rolling a standard six-sided die. This action can produce six discrete possible results: {1, 2, 3, 4, 5, 6}. Now, consider a specific collection of possible results, for instance, the event that the die lands on an odd number. This specific collection, represented as the subset {1, 3, 5}, constitutes an element within the power set of the sample space of dice rolls. These defined collections are universally referred to as “events.” In this particular scenario, {1, 3, 5} is the event corresponding to the die showing an odd number. If the results that actually manifest during the experiment happen to fall within a given event, then that event is said to have occurred.
A probability, in its most fundamental mathematical sense, is a way of assigning to every event a numerical value between zero and one, inclusive. This assignment is not arbitrary; it is bound by the requirement that the event comprising all possible results (in our die-rolling example, the entire sample space {1, 2, 3, 4, 5, 6}) must be assigned a value of one. To qualify as a probability, the assignment of values must further satisfy the axiom of additivity: for any collection of mutually exclusive events (events that share no common results, such as the distinct events {1, 6}, {3}, and {2, 4}), the probability that at least one of the events will occur is given by the sum of the probabilities of all the individual events in that collection. [28]
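These requirements can be verified mechanically for the die example. The following minimal Python sketch assumes a uniform measure for a fair die (each event gets probability |event| / 6) and checks normalization and additivity with exact arithmetic:

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die.
omega = frozenset({1, 2, 3, 4, 5, 6})

def P(event):
    """Uniform probability measure: |event| / |omega|."""
    assert event <= omega, "events must be subsets of the sample space"
    return Fraction(len(event), len(omega))

assert P(omega) == 1  # the certain event has probability one

# Additivity over the mutually exclusive events {1, 6}, {3}, and {2, 4}.
a, b, c = frozenset({1, 6}), frozenset({3}), frozenset({2, 4})
assert P(a | b | c) == P(a) + P(b) + P(c) == Fraction(5, 6)
```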
The probability of an event $A$ is typically denoted in one of several ways: $P(A)$, [29] $p(A)$, or $\text{Pr}(A)$. [30] This precise mathematical definition of probability, grounded in measure theory, possesses the remarkable capacity to extend its applicability to infinite sample spaces , and even to uncountably infinite sample spaces, by leveraging the sophisticated concept of a measure .
The counterpart to an event $A$ is its opposite, or complementary event, often referred to as “not $A$”: the event that $A$ does not occur. Various notations denote this complement, including $A'$, $A^c$, $\overline{A}$, $A^{\complement}$, $\neg A$, and $\sim A$. Its probability is given by the relationship $P(\text{not } A) = 1 - P(A)$. [31] For instance, the probability of not rolling a six on a six-sided die is 1 minus the probability of rolling a six: $1 - \frac{1}{6} = \frac{5}{6}$. For a fuller treatment, see Complementary event.
If two distinct events, $A$ and $B$, both manifest during a single execution of an experiment, this scenario is termed the intersection, or more formally, the joint probability of $A$ and $B$. This is typically denoted as $P(A \cap B)$.
Independent events
Figure: Events A and B depicted as mutually exclusive vs. non-mutually exclusive in the sample space Ω.
If two events, $A$ and $B$, are independent (the occurrence of one does not influence the occurrence of the other), then their joint probability is the product of their individual probabilities: [29]
$$ P(A \text{ and } B) = P(A \cap B) = P(A)P(B). $$
For example, if two coins are flipped, the probability of both landing on heads is the product of the probability of the first coin being heads (1/2) and the probability of the second coin being heads (1/2): $\frac{1}{2} \times \frac{1}{2} = \frac{1}{4}$. [32]
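Independence can also be checked by enumeration. A short Python sketch with two fair dice, where event $A$ depends only on the first die and event $B$ only on the second (the particular events are chosen arbitrarily for illustration):

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))  # 36 equally likely pairs

def P(event):
    return Fraction(len(event), len(rolls))

A = {r for r in rolls if r[0] % 2 == 0}  # first die shows an even number
B = {r for r in rolls if r[1] > 4}       # second die shows 5 or 6

# P(A and B) == P(A) * P(B): 1/2 * 1/3 == 1/6
assert P(A & B) == P(A) * P(B) == Fraction(1, 6)
```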
Mutually exclusive events
Main article: Mutual exclusivity
If two events, $A$ and $B$, possess the characteristic that one or the other can occur, but they can never occur simultaneously during a single trial of an experiment, then they are formally classified as mutually exclusive events .
For such mutually exclusive events , the probability of both occurring is denoted as $P(A \cap B)$, and is, by definition, zero:
$$ P(A \text{ and } B) = P(A \cap B) = 0. $$
Conversely, if two events are mutually exclusive , the probability of either one occurring (i.e., the union of the events) is denoted as $P(A \cup B)$, and is simply the sum of their individual probabilities:
$$ P(A \text{ or } B) = P(A \cup B) = P(A) + P(B) - P(A \cap B) = P(A) + P(B) - 0 = P(A) + P(B). $$
To illustrate, consider the probability of rolling a 1 or a 2 on a fair six-sided die. Since rolling a 1 and rolling a 2 are mutually exclusive events (one cannot do both simultaneously), the probability is the sum of their individual probabilities: $P(1 \text{ or } 2) = P(1) + P(2) = \frac{1}{6} + \frac{1}{6} = \frac{2}{6} = \frac{1}{3}$.
Not (necessarily) mutually exclusive events
When the events in question are not necessarily mutually exclusive â meaning they can occur simultaneously â the calculation of the probability of either event occurring requires a slight adjustment to avoid double-counting. This is articulated by the principle of inclusion-exclusion for two events:
$$ P\left(A \text{ or } B\right) = P(A \cup B) = P\left(A\right) + P\left(B\right) - P\left(A \text{ and } B\right). $$
Rewritten using standard set notation for clarity:
$$ P\left(A \cup B\right) = P\left(A\right) + P\left(B\right) - P\left(A \cap B\right). $$
As a concrete example, imagine drawing a single card from a standard deck of 52 playing cards. What is the probability of drawing either a heart or a face card (Jack, Queen, King)? There are 13 hearts, 12 face cards (3 per suit), but 3 of these face cards are also hearts (the Jack, Queen, and King of hearts). If we simply added $P(\text{Heart}) + P(\text{Face Card})$, we would double-count these 3 cards. Therefore, we subtract the probability of drawing a card that is both a heart and a face card:
$$ \frac{13}{52} + \frac{12}{52} - \frac{3}{52} = \frac{22}{52} = \frac{11}{26}. $$
Here, the cards that are both hearts and face cards were included in the “13 hearts” count and also in the “12 face cards” count, hence the necessity of subtracting their joint probability to ensure they are counted only once in the union.
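The card computation can be verified by enumerating the deck. A small Python sketch (the rank and suit labels are arbitrary):

```python
from fractions import Fraction

ranks = [str(n) for n in range(2, 11)] + ["J", "Q", "K", "A"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = [(rank, suit) for rank in ranks for suit in suits]  # 52 cards

hearts = {card for card in deck if card[1] == "hearts"}
faces = {card for card in deck if card[0] in {"J", "Q", "K"}}

def P(event):
    return Fraction(len(event), len(deck))

# Inclusion-exclusion: subtract the doubly counted heart face cards.
assert P(hearts | faces) == P(hearts) + P(faces) - P(hearts & faces)
print(P(hearts | faces))  # 11/26
```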
This pattern of inclusion-exclusion can be systematically extended to accommodate multiple events that are not necessarily mutually exclusive. For three events, $A$, $B$, and $C$, the calculation proceeds as follows:
$$
\begin{aligned}
P(A\cup B\cup C) &= P((A\cup B)\cup C) \\
&= P(A\cup B) + P(C) - P((A\cup B)\cap C) \\
&= P(A) + P(B) - P(A\cap B) + P(C) - P((A\cap C)\cup(B\cap C)) \\
&= P(A) + P(B) + P(C) - P(A\cap B) - \bigl(P(A\cap C) + P(B\cap C) - P((A\cap C)\cap(B\cap C))\bigr) \\
&= P(A) + P(B) + P(C) - P(A\cap B) - P(A\cap C) - P(B\cap C) + P(A\cap B\cap C)
\end{aligned}
$$
where the final step uses the identity $(A\cap C)\cap(B\cap C) = A\cap B\cap C$. This pattern of inclusion-exclusion can be generalized and applied to any number of events, as in the sketch below.
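The general pattern can be written as a short routine. The following Python sketch computes the probability of a union of arbitrarily many events over a finite, equally likely sample space by summing signed intersection terms; it is a direct transcription of the inclusion-exclusion principle, not an optimized algorithm:

```python
from fractions import Fraction
from itertools import combinations

def union_probability(events, omega):
    """P(E1 or ... or En) by inclusion-exclusion, assuming equally likely outcomes."""
    total = Fraction(0)
    for k in range(1, len(events) + 1):
        sign = (-1) ** (k + 1)  # add odd-sized intersections, subtract even-sized
        for subset in combinations(events, k):
            intersection = set.intersection(*map(set, subset))
            total += sign * Fraction(len(intersection), len(omega))
    return total

# One fair die roll: P(even or high or one) = |{1, 2, 4, 5, 6}| / 6 = 5/6.
omega = set(range(1, 7))
print(union_probability([{2, 4, 6}, {5, 6}, {1}], omega))  # 5/6
```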
Conditional probability
Conditional probability delves into the likelihood of some event $A$ occurring given that some other event $B$ has already occurred or is known to be true. This is a crucial concept, as it allows for the updating of probabilities based on new information. Conditional probability is formally written as $P(A \mid B)$, and is verbally interpreted as “the probability of $A$, given $B$.” It is mathematically defined by the ratio: [33]
$$ P(A \mid B) = \frac{P(A \cap B)}{P(B)} $$
This definition holds provided that $P(B) > 0$. If $P(B) = 0$, the expression $P(A \mid B)$ is undefined by this formula. In that case, if $A$ and $B$ are independent, then $P(A \cap B) = P(A)P(B) = 0$. It is, however, possible to define a conditional probability for some zero-probability events, for instance by using a σ-algebra of such events (as arises when dealing with a continuous random variable). [34]
As a practical illustration, consider a bag containing 2 red balls and 2 blue balls (4 balls in total). The probability of drawing a red ball on the first draw is $\frac{2}{4} = \frac{1}{2}$; the probability of drawing a particular colour on the second draw, however, depends on the colour of the first ball drawn (which is not replaced). If a red ball was drawn first, 1 red and 2 blue balls remain, so the probability of drawing another red ball is $\frac{1}{3}$. If a blue ball was drawn first, 2 red and 1 blue remain, and the probability of subsequently drawing a red ball is $\frac{2}{3}$.
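The ball-drawing example can likewise be checked by enumerating ordered draws without replacement. A minimal Python sketch (the ball labels are arbitrary):

```python
from fractions import Fraction
from itertools import permutations

balls = ["R1", "R2", "B1", "B2"]      # 2 red, 2 blue
draws = list(permutations(balls, 2))  # 12 equally likely ordered pairs

first_red = [d for d in draws if d[0].startswith("R")]
both_red = [d for d in draws if d[0].startswith("R") and d[1].startswith("R")]

# P(second red | first red) = P(both red) / P(first red)
print(Fraction(len(both_red), len(first_red)))  # 1/3
```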
Inverse probability
Main article: Inverse probability
Within the realm of probability theory and its manifold applications, Bayes’ rule serves as a fundamental principle that elegantly relates the odds of one event, $A_1$, to another event, $A_2$, both before (the prior probability) and after (the posterior probability) conditioning on the occurrence of a third event, $B$. The odds of $A_1$ relative to $A_2$ is simply the ratio of their respective probabilities. When one is interested in an arbitrary number of events $A$, rather than just a pair, Bayes’ rule can be more broadly stated as: “the posterior is proportional to the prior times the likelihood,” typically expressed as:
$$ P(A | B) \propto P(A)P(B | A) $$
Here, the proportionality symbol ($\propto$) means that the left-hand side is proportional to (i.e., equals a constant times) the right-hand side as $A$ varies, for fixed or given $B$ (Lee, 2012; Bertsch McGrayne, 2012). In this form the rule goes back to Laplace (1774) and Cournot (1843); see Fienberg (2005) for the historical context.
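For two competing events $A_1$ and $A_2$, the proportional form is equivalent to the odds form of Bayes' rule, in which the normalizing constant $P(B)$ cancels; this is a standard identity following directly from the definition of conditional probability:

$$
\frac{P(A_1 \mid B)}{P(A_2 \mid B)} = \frac{P(A_1)}{P(A_2)} \cdot \frac{P(B \mid A_1)}{P(B \mid A_2)},
$$

that is, the posterior odds equal the prior odds multiplied by the likelihood ratio (sometimes called the Bayes factor).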
Summary of probabilities
| Event | Probability |
| --- | --- |
| $A$ | $P(A) \in [0, 1]$ |
| not $A$ | $P(A^{\complement}) = 1 - P(A)$ |
| $A$ or $B$ | $P(A \cup B) = P(A) + P(B) - P(A \cap B)$, which reduces to $P(A) + P(B)$ if $A$ and $B$ are mutually exclusive |
| $A$ and $B$ | $P(A \cap B) = P(A \mid B)P(B) = P(B \mid A)P(A)$, which reduces to $P(A)P(B)$ if $A$ and $B$ are independent |
| $A$ given $B$ | $P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(B \mid A)P(A)}{P(B)}$ |
Probability: The Art of Quantifying Uncertainty
The concept of probability stands as a foundational pillar within both mathematics and statistics , serving as the primary intellectual tool for grappling with the inherent unpredictability of the universe. It is, quite simply, the branch of human endeavor that concerns itself with events and the precise numerical descriptions of just how likely those events are to actually manifest.
At its core, the probability assigned to any given event is a number meticulously confined to the interval between 0 and 1, inclusive. This numerical value acts as a direct indicator of likelihood: a larger probability unequivocally signifies a greater chance of an event occurring. [note 1] [1] [2] For convenience, or perhaps for those who prefer their certainty in simpler terms, this number is frequently expressed as a percentage, spanning the range from 0% (absolute impossibility) to 100% (absolute certainty).
Consider the rather mundane, yet perfectly illustrative, example of tossing a fair (meaning, scrupulously unbiased) coin. Because the coin is, by definition, fair, the two possible outcomes â“heads” and “tails”âare inherently equally probable. Thus, the probability of observing “heads” is identical to the probability of observing “tails.” Since the laws of physics, in this particular instance, permit no other outcomes, the probability of either “heads” or “tails” occurring is precisely 1/2. This fraction can, of course, be equivalently represented as 0.5, or even 50%, for those who prefer their numerical expressions in different formats.
These fundamental concepts are not merely abstract notions; they have been meticulously woven into an axiomatic mathematical formalization known as probability theory . This robust theoretical framework finds widespread application across an astonishingly diverse array of areas of study . These include, but are by no means limited to, the rigorous fields of statistics , pure mathematics , the various sciences , the complex world of finance , the strategically intricate domain of gambling , the rapidly evolving disciplines of artificial intelligence and machine learning , the foundational principles of computer science , the competitive strategies explored in game theory , and even the philosophical inquiries into the nature of knowledge and belief. In these varied contexts, probability theory is employed to, for example, draw sophisticated inferences about the expected frequency of events. More broadly, it serves as an indispensable tool for describing the underlying mechanics and predictable regularities that govern the behavior of complex systems , offering a window into the otherwise opaque dance of cause and effect. [3]
Etymology
See also: History of probability § Etymology , and Glossary of probability and statistics Further information: Likelihood
The very word “probability” itself can be traced back to its Latin progenitor, probabilitas. Intriguingly, this Latin term carried a dual meaning, encompassing not only a nascent sense of likelihood but also the concept of “probity.” In the context of historical Europe, particularly within the legal systems, “probity” referred to a measure of the authority or credibility attributed to a witness in a legal case . This measure of trustworthiness was, rather predictably and often unfortunately, frequently correlated with the witness’s social standing or nobility . A noble’s word, it seems, held more probabilitas than a commoner’s.
This historical etymology reveals a fascinating and significant divergence from the modern, scientific understanding of probability. The archaic meaning was deeply intertwined with social judgment, moral character, and the perceived reliability of a human source. In essence, it was a subjective assessment of believability. In stark contrast, the contemporary meaning of probability, as it is employed in mathematics and science, functions as an objective and quantifiable measure of the weight of empirical evidence . This modern interpretation is meticulously arrived at through systematic processes of inductive reasoning and rigorous statistical inference , rather than through an appraisal of a person’s social standing or inherent trustworthiness. [4] It seems humanity eventually realized that the universe doesn’t care about your noble lineage when it comes to predicting outcomes.
Interpretations
Main article: Probability interpretations
When one engages with the abstract realm of random experiments âthat is, experiments that are both inherently random and meticulously well-defined âwithin a purely theoretical setting (such as the classic thought experiment of tossing a coin), probabilities can be straightforwardly quantified. This is achieved by taking the number of desired outcomes and dividing it by the total number of all possible outcomes. This particular method yields what is known as “theoretical probability,” a concept distinct from empirical probability , which, by its nature, deals with probabilities derived from observations in actual, real-world experiments. As always, the resultant probability remains a number strictly bounded between 0 and 1; a higher value invariably indicates a greater predisposition for the desired outcome to occur.
To illustrate, consider the seemingly simple act of tossing a coin twice. The complete set of all possible outcomes, the sample space , would be: “head-head,” “head-tail,” “tail-head,” and “tail-tail.” The probability of obtaining the specific outcome of “head-head” is clearly 1 out of these 4 equally likely outcomes, which translates numerically to 1/4, or 0.25, or even 25% for those who prefer percentages. Conversely, the probability of achieving at least one head would encompass three of these four outcomes (“head-head,” “head-tail,” “tail-head”), yielding a probability of 3 out of 4, or 0.75. This event is, naturally, significantly more likely to occur than the specific “head-head” scenario.
However, once we leave the pristine, theoretical sandbox and venture into the messy, practical application of probability, a fundamental divergence in philosophical approach becomes apparent. There are two major, often competing, categories of probability interpretations , each with its own adherents who hold distinct views regarding the fundamental nature of probability itself:
Objectivists are those who endeavor to assign numerical probabilities as descriptions of some inherent, objective, or physical state of affairs in the world. The most prevalent form of objective probability is frequentist probability . This interpretation posits that the probability of a random event is, in essence, the relative frequency of that event’s occurrence when the underlying experiment is repeated an indefinite number of times under identical conditions. It fundamentally views probability as the limiting relative frequency observed “in the long run” â a concept that, while mathematically sound, can feel frustratingly asymptotic to human observers. [5c A related, yet distinct, modification is propensity probability , which interprets probability not as a frequency of past events, but as an inherent tendency or disposition of a particular experiment or system to yield a certain outcome, even if the experiment is performed only once. It’s less about what has happened and more about what is capable of happening.
Subjectivists , conversely, assign numerical probabilities not to external states, but as a representation of a degree of belief held by an individual. [6] This “degree of belief” has been ingeniously, if somewhat provocatively, interpreted as “the price at which you would buy or sell a bet that pays 1 unit of utility if E, 0 if not E.” [7] This economic framing, while insightful for some, is not universally accepted as the sole or definitive interpretation of subjective belief. [8] The most widely adopted form of subjective probability is Bayesian probability . This powerful framework allows for the systematic integration of existing “expert knowledge” (which is formalized as a subjective prior probability distribution ) with new experimental data (captured by a likelihood function ). The product of this prior belief and the likelihood, once mathematically normalized, yields a posterior probability distribution . This posterior distribution embodies all the information known to date, representing an updated and refined degree of belief. [9] A key consequence, articulated by Aumann’s agreement theorem , is that Bayesian agents who begin with sufficiently similar prior beliefs will, upon sharing information, eventually converge towards similar posterior beliefs. However, and this is a critical observation, if their initial priors are sufficiently different, they can lead to persistently divergent conclusions, regardless of how much information the agents subsequently share. [10] It seems even the most rigorous logic struggles to overcome the stubbornness of initial assumptions.
History
Main article: History of probability Further information: History of statistics
The scientific and systematic study of probability is, when viewed against the vast timeline of mathematics , a relatively recent intellectual endeavor. While the enduring human penchant for gambling undeniably demonstrates a long-standing, if often intuitive, interest in quantifying notions of chance throughout recorded history , precise mathematical descriptions and a formal theoretical framework emerged considerably later. There are, in retrospect, several compelling reasons for this rather slow development of the mathematics of probability. Even as games of chance provided a powerful impetus for the mathematical study of probability, many fundamental issues [note 2c remained stubbornly obscured by pervasive superstitions and a general resistance to rigorous, systematic thought. [11] One might suggest that for centuries, humanity preferred to attribute unpredictable outcomes to divine whim or fickle luck rather than to the cold logic of numbers.
According to the incisive analysis of Richard Jeffrey , “Before the middle of the seventeenth century, the term ‘probable’ (Latin probabilis) meant approvable, and was applied in that sense, univocally, to opinion and to action. A probable action or opinion was one such as sensible people would undertake or hold, in the circumstances.” [12] This observation highlights a profound semantic and conceptual shift. The original meaning of “probable” was rooted in social consensus, good judgment, and what was considered reasonable behavior, not in objective numerical likelihood. Nevertheless, it is worth noting that in specific legal contexts of the era, “probable” could also be applied to propositions for which there existed strong supporting evidence, foreshadowing its eventual scientific meaning. [13]
Gerolamo Cardano (16th century) Christiaan Huygens published one of the first books on probability (17th century).
It was the brilliant, if somewhat unconventional, sixteenth-century Italian polymath Gerolamo Cardano who first effectively demonstrated the utility of defining odds as a straightforward ratio of favorable outcomes to unfavorable outcomes. This seemingly simple yet deeply insightful approach inherently implied that the probability of an event could be rigorously determined by the ratio of favorable outcomes to the total number of all possible outcomes. [14] Cardano’s work, though foundational, remained somewhat isolated for a time.
The true genesis of the doctrine of probabilities as a distinct mathematical discipline is widely attributed to the seminal correspondence between the eminent French mathematicians Pierre de Fermat and Blaise Pascal in 1654. Their exchange, ignited by a problem related to the fair division of stakes in an unfinished game of chance, provided the critical spark. Following closely on their heels, Christiaan Huygens (1657) produced what is recognized as the earliest comprehensive and scientific treatment of the subject, systematically laying out its principles. [15] The field gained further mathematical maturity with the posthumous publication of Jakob Bernoulli ’s monumental Ars Conjectandi in 1713, and Abraham de Moivre ’s equally significant The Doctrine of Chances in 1718. Both works firmly established probability as a legitimate and robust branch of mathematics . [16] For those interested in the intricate historical tapestry of how the very concept of mathematical probability emerged, the works of Ian Hacking ’s The Emergence of Probability [4] and James Franklin’s The Science of Conjecture [17] offer invaluable insights.
The theoretical underpinnings of the theory of errors can be traced, albeit somewhat indirectly, to Roger Cotes ’s Opera Miscellanea, which saw posthumous publication in 1722. However, it was a pivotal memoir meticulously prepared by Thomas Simpson in 1755 (and subsequently printed in 1756) that marked the first systematic application of this burgeoning theory to the practical and crucial discussion of errors inherent in observational data. [18] A subsequent reprint of this significant memoir in 1757 explicitly articulated foundational axioms: specifically, that positive and negative errors are considered equally probable, and that certain assignable limits definitively delineate the entire range of all potential errors. Simpson’s pioneering work also ventured into the realm of continuous errors, providing an early conceptual description of what would later become known as a probability curve.
The initial two highly influential laws governing the distribution of errors were both formulated by the brilliant Pierre-Simon Laplace . His first law, published in 1774, proposed that the frequency of an error could be precisely expressed as an exponential function of the numerical magnitude of the error, without regard to its sign. The second law of error, introduced by Laplace in 1778, asserted that the frequency of the error is an exponential function of the square of the error. This second formulation has since become famously known as the normal distribution or, somewhat contentiously, the Gauss law. As was dryly noted at the time, “It is difficult historically to attribute that law to Gauss, who in spite of his well-known precocity had probably not made this discovery before he was two years old.” [19] A rather pointed way to suggest that historical credit should be accurately assigned, even among geniuses.
Daniel Bernoulli (1778) further enriched the field by introducing the principle of the maximum product of the probabilities, specifically in the context of a system of concurrent errors, adding a layer of sophistication to error analysis.
Carl Friedrich Gauss
Adrien-Marie Legendre (1805) made a significant and independent stride by developing the method of least squares , which he unveiled in his influential work, Nouvelles mĂ©thodes pour la dĂ©termination des orbites des comĂštes (New Methods for Determining the Orbits of Comets). [20] Remarkably, and without apparent knowledge of Legendre’s prior contribution, an Irish-American writer named Robert Adrain , then serving as editor of “The Analyst” (1808), independently deduced the fundamental law of facility of error. He articulated this law as:
$$ \phi (x)=ce^{-h^{2}x^{2}} $$
where $h$ is a constant directly indicative of the precision of observation, and $c$ is a scale factor meticulously chosen to ensure that the total area under the curve precisely equals 1. Adrain provided two distinct proofs for his derivation, with the second bearing a striking resemblance to the proof later offered by John Herschel in 1850. [ citation needed ] Carl Friedrich Gauss subsequently presented the first proof of this law that gained widespread recognition across Europe (making it the third known proof after Adrain’s), publishing his findings in 1809. Over the ensuing decades, further proofs and refinements were contributed by a host of mathematicians, including Laplace (1810, 1812), Gauss again (1823), James Ivory (1825, 1826), Hagen (1837), Friedrich Bessel (1838), W.F. Donkin (1844, 1856), and Morgan Crofton (1870). Other notable figures who enriched this field included Robert Leslie Ellis (1844), Augustus De Morgan (1864), James Whitbread Lee Glaisher (1872), and Giovanni Schiaparelli (1875). Christian August Friedrich Peters ’s (1856) formula [ clarification needed ] for $r$, representing the probable error of a single observation, remains a widely recognized result.
Throughout the nineteenth century, a multitude of authors contributed significantly to the general theory of probability. Among the prominent figures were Laplace himself, Sylvestre Lacroix (1816), Littrow (1833), Adolphe Quetelet (1853), Richard Dedekind (1860), Helmert (1872), Hermann Laurent (1873), Liagre, Didion, and Karl Pearson . The exposition and logical rigor of the theory were substantially enhanced and clarified by the profound contributions of Augustus De Morgan and George Boole , whose work laid crucial foundations for modern logic and set theory.
A pivotal moment in the early 20th century occurred in 1906 when Andrey Markov introduced [21] the groundbreaking notion of Markov chains . These mathematical models proved to be immensely influential, playing a critical and foundational role in the subsequent development of stochastic processes theory and its vast array of applications across numerous scientific and engineering disciplines. The modern, rigorous theory of probability, built upon the sophisticated and abstract framework of measure theory , was ultimately developed by the brilliant Soviet mathematician Andrey Kolmogorov in 1931. [22] His axiomatic approach provided the robust and universally accepted mathematical foundation that underpins all contemporary probability theory, finally bringing true order to the concept of chance.
On the geometric side of this evolving field, notable contributors to The Educational Times who advanced the study of integral geometry included Miller, Crofton, McColl, Wolstenholme, Watson, and Artemas Martin . [23]
Theory
Main article: Probability theory
Like all ambitious theories that seek to impose intellectual order on the sprawling complexity of the universe, the theory of probability functions as a formal representation of its underlying concepts. This means its terms are defined abstractly, allowing them to be considered, and more importantly, manipulated independently of their immediate real-world meaning. These formal terms are then subjected to the rigorous rules of mathematics and logic , and any results derived from these operations are subsequently interpreted or translated back into the original problem domain, ideally providing clearer insight.
Humanity, in its persistent quest for intellectual rigor, has produced at least two highly successful attempts to formally axiomatize probability: the Kolmogorov formulation and the Cox formulation. In Kolmogorov’s formulation (which is deeply intertwined with the concept of a probability space ), abstract sets are interpreted as events , and probability itself is defined as a measure assigned to a specific class of these sets. This approach provides a robust and widely accepted axiomatic foundation for the entire field. In contrast, Cox’s theorem takes probability as a primitive, or fundamental, concept â one that is not subjected to further analysis or decomposition. Here, the emphasis shifts to the construction of a consistent and coherent assignment of probability values to logical propositions. Crucially, despite their differing starting points and philosophical underpinnings, the fundamental laws of probability derived from both formulations are, apart from minor technical distinctions, essentially identical. It seems even divergent paths can lead to the same mathematical truths, which is either reassuring or profoundly boring, depending on your perspective.
It is important to acknowledge that other methods exist for quantifying uncertainty, such as the DempsterâShafer theory or possibility theory . However, these alternative frameworks are fundamentally different in their conceptual structure and, critically, are not compatible with the universally understood and applied laws of probability. They operate under distinct logical paradigms, offering different lenses through which to view uncertainty, but they are not interchangeable with classical probability theory.
Applications
Probability theory is far from an esoteric academic exercise; its applications are deeply embedded in the practical realities of everyday life, particularly in the critical domains of risk assessment and sophisticated modeling . The colossal insurance industry and the dynamic world of financial markets , for instance, are inextricably linked to actuarial science , a discipline that is fundamentally built upon probabilistic principles. Actuaries employ these principles to meticulously determine pricing strategies, assess potential liabilities, and inform complex trading decisions. Governments, too, extensively utilize probabilistic methods in a wide array of public policy areas, ranging from the formulation of environmental regulation to the intricate analysis of entitlement programs and the vigilant oversight of financial regulation . It seems even the most bureaucratic entities recognize the utility of quantifying uncertainty, however reluctantly.
A particularly salient example of probability theory’s pervasive influence in equity trading manifests in the profound effect of perceived geopolitical instability on commodity prices. Consider, for instance, how the perceived probability of a widespread conflict erupting in the Middle East can trigger immediate and significant fluctuations in global oil prices, which then generate cascading ripple effects throughout the entire economy. Should an astute commodity trader assess that the likelihood of war has increased, this conviction can instantly send that commodity’s prices either soaring or plummeting, simultaneously transmitting this informed opinion to a vast network of other traders. This highlights a crucial point: such probabilities are rarely assessed in isolation, nor are they necessarily derived through purely rational and objective processes. The burgeoning field of behavioral finance emerged precisely to scrutinize and elucidate the profound impact of such collective cognitive phenomena, often pejoratively termed groupthink , on pricing mechanisms, policy formulation, and even the delicate balance between peace and conflict. [24] It seems the market, like humanity itself, is often more driven by collective sentiment than by cold, hard numbers.
Beyond financial evaluations, probability serves as an indispensable analytical tool for discerning and predicting trends in diverse scientific fields such as biology (for example, understanding the dynamics of disease spread within a population) and ecology (such as predicting genetic outcomes using biological Punnett squares ). [25] As with finance, risk assessment, when viewed through a rigorous statistical lens, provides a powerful means to calculate the likelihood of undesirable events materializing. This, in turn, can directly inform the implementation of preventative protocols and strategic interventions designed to mitigate or entirely circumvent such adverse circumstances. Unsurprisingly, probability is also meticulously employed in the design and management of games of chance . Casinos, in a masterstroke of applied mathematics and self-interest, leverage probabilistic principles to ensure a guaranteed long-term profit, while simultaneously calibrating payouts to players frequently enough to maintain engagement and encourage continued, albeit ultimately losing, participation. [26] It’s a finely tuned system for separating people from their money.
Another profoundly significant and often overlooked application of probability theory in the fabric of everyday life is found in the realm of reliability . A vast array of consumer products, from the complex engineering of automobiles to the intricate circuitry of consumer electronics, meticulously integrate reliability theory into their fundamental design and manufacturing processes. The overarching objective is to systematically reduce the probability of product failure over its anticipated lifespan. The calculated probability of failure can, and frequently does, directly influence a manufacturer’s decisions regarding the scope, duration, and specific terms of a product’s warranty . [27]
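As a toy illustration of how a failure probability might feed a warranty decision, the following sketch assumes a constant failure rate (an exponential lifetime model); both the mean lifetime and the warranty period below are invented for the example, not drawn from the article:

```python
import math

# Toy reliability sketch under an assumed exponential lifetime model:
# if a product fails at a constant rate, the probability it fails before
# time t is F(t) = 1 - exp(-t / mean_lifetime).
mean_lifetime_years = 10.0   # assumed mean time to failure (illustrative)
warranty_years = 2.0         # assumed warranty period (illustrative)

p_fail_in_warranty = 1.0 - math.exp(-warranty_years / mean_lifetime_years)
print(f"P(failure within warranty) ≈ {p_fail_in_warranty:.3f}")  # ≈ 0.181
```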
Furthermore, sophisticated computational models such as the cache language model and other statistical language models , which are integral components of modern natural language processing (NLP) systems, stand as compelling examples of the practical and powerful application of probability theory. These models leverage probabilistic principles to predict, analyze, and generate human language, demonstrating how abstract mathematical concepts can be used to decipher even the complexities of human communication.
Mathematical treatment
Calculation of probability (risk) vs odds
See also: Probability axioms
Let us consider an experiment capable of producing a finite or countably infinite number of distinct results. The comprehensive collection of all such possible results is formally designated as the sample space of the experiment, often symbolized by the Greek capital letter $\Omega$. The power set of this sample space is then formed by considering all possible unique collections of these results. For a concrete example, imagine the simple act of rolling a standard six-sided die. This action can yield six discrete possible results, forming the sample space {1, 2, 3, 4, 5, 6}. Now, consider a specific collection of these possible results, such as the event that the die lands on an odd number. This specific collection, represented as the subset {1, 3, 5}, constitutes an element within the power set of the sample space of dice rolls. These defined collections of outcomes are universally referred to as “events.” In this particular case, {1, 3, 5} is the event that the die shows an odd number. If the actual result of the experiment falls within a given event, that event is said to have occurred.
A probability, in its most fundamental mathematical definition, is a function that systematically assigns to every event a specific numerical value. This value is strictly confined to the closed interval between zero and one, inclusive. This assignment is not arbitrary; it is governed by a crucial set of axioms. A primary requirement is that the event encompassing all possible results (which, in our die-rolling example, is the entire sample space {1, 2, 3, 4, 5, 6}) must be assigned a value of one, representing absolute certainty. To legitimately qualify as a probability measure, this assignment of values must further satisfy the requirement that for any collection of mutually exclusive events (events that, by definition, share no common results, such as the distinct events {1,6}, {3}, and {2,4} from our die roll), the probability that at least one of these events will occur is precisely given by the sum of the probabilities of all the individual events within that collection. [28] It’s a rather elegant system, if one appreciates order.
The probability of an event $A$ is conventionally written using one of several notations: $P(A)$, [29] $p(A)$, or $\text{Pr}(A)$. [30] This precise mathematical definition of probability, rooted in measure theory , possesses the profound capacity to extend its applicability beyond finite sample spaces, encompassing even countably infinite and uncountably infinite sample spaces, by leveraging the sophisticated concepts of measurable sets and probability measures.
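As an informal illustration of these axioms (a sketch, not part of the formal treatment), one can model the die’s sample space and a uniform probability measure on events in a few lines of Python:

```python
from fractions import Fraction

# The die-rolling sample space Ω and a uniform probability measure on events
# (subsets of Ω), mirroring the definitions in the text.
omega = frozenset({1, 2, 3, 4, 5, 6})

def P(event):
    """Uniform measure: |event| / |Ω|, an exact fraction in [0, 1]."""
    assert event <= omega, "events must be subsets of the sample space"
    return Fraction(len(event), len(omega))

odd = frozenset({1, 3, 5})        # the event "the die shows an odd number"
print(P(odd))                     # 1/2
print(P(omega))                   # 1 -- the certain event gets probability one

# Additivity: for mutually exclusive events, probabilities sum.
a, b, c = frozenset({1, 6}), frozenset({3}), frozenset({2, 4})
assert P(a | b | c) == P(a) + P(b) + P(c)   # 5/6 on both sides
```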
The antithesis, or complementary event , of an event $A$ is the event “not $A$,” signifying that $A$ does not occur. It is commonly denoted by various symbols, including $A'$, $A^c$, $\overline{A}$, $A^{\complement}$, $\neg A$, or $\sim A$. Its probability is determined by the straightforward relationship $P(\text{not } A) = 1 - P(A)$. [31] As an illustrative example, the probability of not rolling a six on a standard six-sided die is calculated as 1 minus the probability of rolling a six, which is $1 - \frac{1}{6} = \frac{5}{6}$. For a more comprehensive discussion of this concept, one may consult the article on Complementary event .
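A quick numerical check of the complement rule, again assuming a fair six-sided die:

```python
from fractions import Fraction

# Complement rule: P(not A) = 1 - P(A), for A = "rolling a six".
p_six = Fraction(1, 6)
assert 1 - p_six == Fraction(5, 6)

# Equivalently, counting outcomes directly in the sample space:
omega = {1, 2, 3, 4, 5, 6}
not_six = omega - {6}
assert Fraction(len(not_six), len(omega)) == Fraction(5, 6)
```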
When two distinct events, $A$ and $B$, both occur during a single performance of an experiment, this co-occurrence is termed the intersection, or more formally, the joint probability of $A$ and $B$. This is typically denoted as $P(A \cap B)$.
Independent events
Events A and B depicted as mutually exclusive vs non-mutually exclusive in space Ω
If two events, $A$ and $B$, are truly independent (meaning the occurrence or non-occurrence of one event has absolutely no influence on the occurrence or non-occurrence of the other), then their joint probability is simply the product of their individual probabilities: [29]
$$ P(A \text{ and } B) = P(A \cap B) = P(A)P(B). $$
For instance, if one were to flip two separate coins, the probability of both coins landing on heads is calculated by multiplying the probability of the first coin being heads (which is $\frac{1}{2}$) by the probability of the second coin also being heads (also $\frac{1}{2}$), resulting in $\frac{1}{2} \times \frac{1}{2} = \frac{1}{4}$. [32] A surprisingly simple concept that many still manage to misunderstand in their daily lives.
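The same product rule can be verified by brute-force enumeration of the joint sample space, as in this small sketch:

```python
from fractions import Fraction
from itertools import product

# Enumerate the four equally likely outcomes of two fair coin flips and
# confirm P(first=H and second=H) = P(first=H) * P(second=H).
joint = list(product("HT", repeat=2))   # ('H','H'), ('H','T'), ('T','H'), ('T','T')

p_both = Fraction(sum(1 for o in joint if o == ("H", "H")), len(joint))
p_first_heads = Fraction(sum(1 for o in joint if o[0] == "H"), len(joint))
p_second_heads = Fraction(sum(1 for o in joint if o[1] == "H"), len(joint))

assert p_both == p_first_heads * p_second_heads == Fraction(1, 4)
```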
Mutually exclusive events
Main article: Mutual exclusivity
If two events, $A$ and $B$, possess the inherent characteristic that one or the other can occur, but they can never occur simultaneously within a single trial of an experiment, then they are rigorously defined as mutually exclusive events . They simply cannot coexist in the same outcome.
For two such mutually exclusive events , the probability of both occurring is denoted as $P(A \cap B)$, and by definition, this probability is precisely 0:
$$ P(A \text{ and } B) = P(A \cap B) = 0. $$
Conversely, if two events are mutually exclusive , then the probability of either one occurring (which is the union of the events) is denoted as $P(A \cup B)$. In this specific case, the probability is simply the sum of their individual probabilities, as there is no overlap to account for:
$$ P(A \text{ or } B) = P(A \cup B) = P(A) + P(B) - P(A \cap B) = P(A) + P(B) - 0 = P(A) + P(B). $$
For example, the probability of rolling a 1 or a 2 on a fair six-sided die is calculated by summing the individual probabilities: $P(1 \text{ or } 2) = P(1) + P(2) = \frac{1}{6} + \frac{1}{6} = \frac{2}{6} = \frac{1}{3}$. This is because rolling a 1 and rolling a 2 are distinct and cannot happen simultaneously.
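A short sketch confirming the addition rule for these two die events:

```python
from fractions import Fraction

# Rolling a 1 and rolling a 2 are mutually exclusive events on a fair die,
# so their union probability is the plain sum of the individual probabilities.
omega = {1, 2, 3, 4, 5, 6}
A, B = {1}, {2}

assert A & B == set()                         # P(A and B) = 0: no shared outcomes
p_union = Fraction(len(A | B), len(omega))    # counting the union directly
assert p_union == Fraction(1, 6) + Fraction(1, 6) == Fraction(1, 3)
```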
Not (necessarily) mutually exclusive events
When the events under consideration are not necessarily mutually exclusive (meaning there is a possibility that they can occur simultaneously), the calculation of the probability of at least one of them occurring requires a more nuanced approach. Simply adding their individual probabilities would double-count the outcomes in which both events occur. To correct for this, the probability of their intersection must be subtracted. This principle is formally known as the Principle of Inclusion-Exclusion for two events:
$$ P\left(A \text{ or } B\right) = P(A \cup B) = P\left(A\right) + P\left(B\right) - P\left(A \text{ and } B\right). $$
For clarity, using standard set notation, this can be rewritten as:
$$ P\left(A \cup B\right) = P\left(A\right) + P\left(B\right) - P\left(A \cap B\right). $$
Consider a classic example: drawing a single card from a standard deck of 52 playing cards. What is the probability of drawing either a heart or a face card (Jack, Queen, or King)? There are 13 hearts in the deck ($P(\text{Heart}) = \frac{13}{52}$). There are 12 face cards in the deck ($P(\text{Face Card}) = \frac{12}{52}$). However, there are 3 cards that are both hearts and face cards (the Jack of Hearts, Queen of Hearts, and King of Hearts). These 3 cards are included in both the count of hearts and the count of face cards. To avoid double-counting them when calculating the probability of drawing either a heart or a face card, we apply the inclusion-exclusion principle:
$$ \frac{13}{52} + \frac{12}{52} - \frac{3}{52} = \frac{25-3}{52} = \frac{22}{52} = \frac{11}{26}. $$
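The same count can be verified mechanically; this sketch builds the 52-card deck and checks both sides of the inclusion-exclusion identity:

```python
from fractions import Fraction
from itertools import product

# Build a standard 52-card deck as (rank, suit) pairs.
ranks = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))

hearts = {card for card in deck if card[1] == "hearts"}          # 13 cards
faces = {card for card in deck if card[0] in {"J", "Q", "K"}}    # 12 cards

def P(event):
    return Fraction(len(event), len(deck))

# Direct union count agrees with P(A) + P(B) - P(A and B).
assert P(hearts | faces) == P(hearts) + P(faces) - P(hearts & faces)
assert P(hearts | faces) == Fraction(11, 26)   # 22/52 in lowest terms
```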
This principle can be systematically expanded for any number of events that are not necessarily mutually exclusive. For three events, $A$, $B$, and $C$, the formula becomes more involved, demonstrating the increasing complexity as more overlaps are introduced:
$$
\begin{aligned}
P\left(A\cup B\cup C\right) &= P\left(\left(A\cup B\right)\cup C\right) \\
&= P\left(A\cup B\right)+P\left(C\right)-P\left(\left(A\cup B\right)\cap C\right) && \text{(two-event rule for } A\cup B \text{ and } C\text{)} \\
&= P\left(A\right)+P\left(B\right)-P\left(A\cap B\right)+P\left(C\right)-P\left(\left(A\cap C\right)\cup \left(B\cap C\right)\right) && \text{(distributing } \cap\, C \text{ over } \cup\text{)} \\
&= P\left(A\right)+P\left(B\right)+P\left(C\right)-P\left(A\cap B\right)-\left(P\left(A\cap C\right)+P\left(B\cap C\right)-P\left(A\cap B\cap C\right)\right) && \text{(two-event rule for } A\cap C \text{ and } B\cap C\text{, since } (A\cap C)\cap(B\cap C)=A\cap B\cap C\text{)} \\
&= P\left(A\right)+P\left(B\right)+P\left(C\right)-P\left(A\cap B\right)-P\left(A\cap C\right)-P\left(B\cap C\right)+P\left(A\cap B\cap C\right)
\end{aligned}
$$
It can be observed, then, that this intricate pattern of adding individual probabilities, subtracting probabilities of pairwise intersections, adding probabilities of triple intersections, and so on, can be generalized and repeated for any arbitrary number of events. This is the full power of the Principle of Inclusion-Exclusion, a necessary tool for navigating complex probabilistic scenarios where overlaps are common.
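For the concretely minded, here is a sketch of that general alternating-sign formula applied to three overlapping die events (the events themselves are invented for illustration) and checked against a direct count of the union:

```python
from fractions import Fraction
from functools import reduce
from itertools import combinations

# General inclusion-exclusion for events given as subsets of a finite,
# equally likely sample space: add single-event probabilities, subtract
# pairwise intersections, add triple intersections, and so on.
def P(event, omega):
    return Fraction(len(event), len(omega))

def inclusion_exclusion(events, omega):
    total = Fraction(0)
    for k in range(1, len(events) + 1):
        sign = (-1) ** (k + 1)                       # +, -, +, ...
        for combo in combinations(events, k):
            total += sign * P(reduce(set.intersection, combo), omega)
    return total

omega = set(range(1, 7))                             # one fair die
events = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}]           # illustrative overlapping events

union = set().union(*events)
assert inclusion_exclusion(events, omega) == P(union, omega)  # both are 5/6
```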
Conditional probability
Conditional probability is a fundamental concept that quantifies the likelihood of some event $A$ occurring, given that some other event $B$ has already occurred, or is known to be true. It represents a refinement of our knowledge, adjusting probabilities based on new information. Conditional probability is formally expressed as $P(A \mid B)$, and is read as “the probability of $A$, given $B$.” It is rigorously defined by the following ratio: [33]
$$ P(A \mid B) = \frac{P(A \cap B)}{P(B)} $$
This definition is valid only when $P(B) > 0$. If $P(B) = 0$, then $P(A \mid B)$ is formally undefined by this particular expression, as division by zero is not permitted. In such a scenario, if $A$ and $B$ are independent, then their joint probability $P(A \cap B) = P(A)P(B) = 0$. However, it is still possible to define a conditional probability for certain zero-probability events in more advanced contexts, for instance by utilizing the abstract mathematical framework of a $\sigma$-algebra of such events (which might arise when dealing with a continuous random variable, where the probability of any single point is zero). [34]
Let’s consider a concrete example: imagine a bag containing 2 red balls and 2 blue balls, totaling 4 balls. Initially, the probability of drawing a red ball is a straightforward $\frac{2}{4} = \frac{1}{2}$. However, once a ball is drawn and not replaced, the composition of the bag changes, and thus the probabilities for subsequent draws are altered. For instance, if a red ball was drawn first, the bag now contains 1 red ball and 2 blue balls. The conditional probability of then drawing another red ball becomes $\frac{1}{3}$. Conversely, if a blue ball was drawn first, the bag would then contain 2 red balls and 1 blue ball. In this modified state, the conditional probability of drawing a red ball next would be $\frac{2}{3}$. This simple illustration demonstrates how new information (the outcome of the first draw) directly updates and refines our understanding of future probabilities.
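The ball-drawing example can likewise be checked by enumerating every equally likely ordered pair of draws:

```python
from fractions import Fraction
from itertools import permutations

# 2 red and 2 blue balls, drawn twice without replacement: enumerate all
# 12 equally likely ordered pairs and compute P(second red | first red).
balls = ["R1", "R2", "B1", "B2"]
draws = list(permutations(balls, 2))

first_red = [d for d in draws if d[0].startswith("R")]
both_red = [d for d in first_red if d[1].startswith("R")]

# P(A | B) = P(A and B) / P(B), here reduced to a ratio of outcome counts.
p_cond = Fraction(len(both_red), len(first_red))
assert p_cond == Fraction(1, 3)
```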
Inverse probability
Main article: Inverse probability
Within the extensive domain of probability theory and its manifold practical applications, Bayes’ rule serves as a cornerstone, providing a fundamental mechanism for relating the odds of one event, $A_1$, to another event, $A_2$, both before (prior to) and after (posterior to) conditioning on the occurrence of some other event, $B$. The odds of $A_1$ relative to $A_2$ is simply defined as the ratio of their respective probabilities. When one is interested in an arbitrary number of events, $A$, rather than being restricted to just two, Bayes’ rule can be more generally rephrased using a proportionality relationship: “the posterior is proportional to the prior times the likelihood.” This is typically expressed as:
$$ P(A | B) \propto P(A)P(B | A) $$
In this expression, the proportionality symbol ($\propto$) indicates that the left-hand side (the posterior probability of $A$ given $B$) is directly proportional to (i.e., equals a constant multiplied by) the right-hand side ($P(A)$, the prior probability of $A$, multiplied by $P(B | A)$, the likelihood of $B$ given $A$). This relationship holds as $A$ varies, for a fixed or given event $B$ (as detailed by Lee, 2012; Bertsch McGrayne, 2012). This particular form of Bayes’ rule has historical roots extending back to the pioneering work of Laplace (1774) and Cournot (1843); further historical context can be found in Fienberg (2005). It is, in essence, the mathematical framework for rational belief updating, a process humans often attempt, usually with less rigor.
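As a minimal worked illustration of “the posterior is proportional to the prior times the likelihood” (the two hypotheses and the 9-to-1 prior below are invented for the example, not taken from the article):

```python
from fractions import Fraction

# Two hypotheses about a coin, with an assumed prior, after observing one head.
prior = {"fair": Fraction(9, 10), "two-headed": Fraction(1, 10)}       # illustrative
likelihood_heads = {"fair": Fraction(1, 2), "two-headed": Fraction(1)}  # P(heads | H)

# Unnormalized posterior: prior times likelihood, for each hypothesis.
unnormalized = {h: prior[h] * likelihood_heads[h] for h in prior}

# Normalize so the posterior sums to 1 (this fixes the proportionality constant).
total = sum(unnormalized.values())
posterior = {h: p / total for h, p in unnormalized.items()}

assert posterior["fair"] == Fraction(9, 11)
assert posterior["two-headed"] == Fraction(2, 11)
```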
Summary of probabilities
| Event | Probability |
| --- | --- |
| $A$ | $P(A) \in [0, 1]$ |
| not $A$ | $P(A^{\complement}) = 1 - P(A)$ |
| $A$ or $B$ | $P(A \cup B) = P(A) + P(B) - P(A \cap B)$, which reduces to $P(A) + P(B)$ if $A$ and $B$ are mutually exclusive |
| $A$ and $B$ | $P(A \cap B) = P(A \mid B)\,P(B)$, which reduces to $P(A)P(B)$ if $A$ and $B$ are independent |
| $A$ given $B$ | $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$ |
Interpretations
Main article: Probability interpretations
When one grapples with random experiments (that is, experiments that are both inherently random and well-defined) within a purely theoretical setting (such as the classic thought experiment of tossing a coin), probabilities can be straightforwardly quantified: the number of desired outcomes is divided by the total number of all possible outcomes. This method yields what is known as “theoretical probability,” a concept distinct from empirical probability , which, by its nature, deals with probabilities derived from observations of actual, real-world experiments. As always, the resultant probability remains a number between 0 and 1; a higher value invariably indicates a greater predisposition for the desired outcome to occur.
To illustrate, consider the seemingly simple act of tossing a coin twice. The complete set of all possible outcomes, the sample space , would be: “head-head,” “head-tail,” “tail-head,” and “tail-tail.” The probability of obtaining the specific outcome of “head-head” is clearly 1 out of these 4 equally likely outcomes, which translates numerically to 1/4, or 0.25, or even 25% for those who prefer percentages. Conversely, the probability of achieving at least one head would encompass three of these four outcomes (“head-head,” “head-tail,” “tail-head”), yielding a probability of 3 out of 4, or 0.75. This event is, naturally, significantly more likely to occur than the specific “head-head” scenario.
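A brief counting sketch of the two-toss example:

```python
from fractions import Fraction
from itertools import product

# The sample space of two coin tosses: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))

p_head_head = Fraction(sum(1 for o in outcomes if o == ("H", "H")), len(outcomes))
p_at_least_one_head = Fraction(sum(1 for o in outcomes if "H" in o), len(outcomes))

assert p_head_head == Fraction(1, 4)
assert p_at_least_one_head == Fraction(3, 4)
```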
However, once we leave the pristine, theoretical sandbox and venture into the messy, practical application of probability, a fundamental divergence in philosophical approach becomes apparent. There are two major, often competing, categories of probability interpretations , each with its own adherents who hold distinct views regarding the fundamental nature of probability itself:
Objectivists are those who endeavor to assign numerical probabilities as descriptions of some inherent, objective, or physical state of affairs in the world. The most prevalent form of objective probability is frequentist probability . This interpretation posits that the probability of a random event is, in essence, the relative frequency of that event’s occurrence when the underlying experiment is repeated an indefinite number of times under identical conditions. It fundamentally views probability as the limiting relative frequency observed “in the long run,” a concept that, while mathematically sound, can feel frustratingly asymptotic to human observers. [5] A related, yet distinct, modification is propensity probability , which interprets probability not as a frequency of past events, but as an inherent tendency or disposition of a particular experiment or system to yield a certain outcome, even if the experiment is performed only once. It’s less about what has happened and more about what is capable of happening.
Subjectivists , conversely, assign numerical probabilities not to external states, but as a representation of a degree of belief held by an individual. [6] This “degree of belief” has been ingeniously, if somewhat provocatively, interpreted as “the price at which you would buy or sell a bet that pays 1 unit of utility if E, 0 if not E.” [7] This economic framing, while insightful for some, is not universally accepted as the sole or definitive interpretation of subjective belief. [8] The most widely adopted form of subjective probability is Bayesian probability . This powerful framework allows for the systematic integration of existing “expert knowledge” (which is formalized as a subjective prior probability distribution ) with new experimental data (captured by a likelihood function ). The product of this prior belief and the likelihood, once mathematically normalized, yields a posterior probability distribution . This posterior distribution embodies all the information known to date, representing an updated and refined degree of belief. [9] A key consequence, articulated by Aumann’s agreement theorem , is that Bayesian agents who begin with sufficiently similar prior beliefs will, upon sharing information, eventually converge towards similar posterior beliefs. However, and this is a critical observation, if their initial priors are sufficiently different, they can lead to persistently divergent conclusions, regardless of how much information the agents subsequently share. [10] It seems even the most rigorous logic struggles to overcome the stubbornness of initial assumptions.
History
Main article: History of probability Further information: History of statistics
The scientific and systematic study of probability is, when viewed against the vast timeline of mathematics , a relatively recent intellectual endeavor. While the enduring human penchant for gambling undeniably demonstrates a long-standing, if often intuitive, interest in quantifying notions of chance throughout recorded history , precise mathematical descriptions and a formal theoretical framework emerged considerably later. There are, in retrospect, several compelling reasons for this rather slow development of the mathematics of probability. Even as games of chance provided a powerful impetus for the mathematical study of probability, many fundamental issues [note 2] remained stubbornly obscured by pervasive superstitions and a general resistance to rigorous, systematic thought. [11] One might suggest that for centuries, humanity preferred to attribute unpredictable outcomes to divine whim or fickle luck rather than to the cold logic of numbers.
According to the incisive analysis of Richard Jeffrey , “Before the middle of the seventeenth century, the term ‘probable’ (Latin probabilis) meant approvable, and was applied in that sense, univocally, to opinion and to action. A probable action or opinion was one such as sensible people would undertake or hold, in the circumstances.” [12] This observation highlights a profound semantic and conceptual shift. The original meaning of “probable” was rooted in social consensus, good judgment, and what was considered reasonable behavior, not in objective numerical likelihood. Nevertheless, it is worth noting that in specific legal contexts of the era, “probable” could also be applied to propositions for which there existed strong supporting evidence, foreshadowing its eventual scientific meaning. [13]
Gerolamo Cardano (16th century)
Christiaan Huygens published one of the first books on probability (17th century).
It was the brilliant, if somewhat unconventional, sixteenth-century Italian polymath Gerolamo Cardano who first effectively demonstrated the utility of defining odds as a straightforward ratio of favorable outcomes to unfavorable outcomes. This seemingly simple yet deeply insightful approach inherently implied that the probability of an event could be rigorously determined by the ratio of favorable outcomes to the total number of all possible outcomes. [14] Cardano’s work, though foundational, remained somewhat isolated for a time.
The true genesis of the doctrine of probabilities as a distinct mathematical discipline is widely attributed to the seminal correspondence between the eminent French mathematicians Pierre de Fermat and Blaise Pascal in 1654. Their exchange, ignited by a problem related to the fair division of stakes in an unfinished game of chance, provided the critical spark. Following closely on their heels, Christiaan Huygens (1657) produced what is recognized as the earliest comprehensive and scientific treatment of the subject, systematically laying out its principles. [15] The field gained further mathematical maturity with the posthumous publication of Jakob Bernoulli ’s monumental Ars Conjectandi in 1713, and Abraham de Moivre ’s equally significant The Doctrine of Chances in 1718. Both works firmly established probability as a legitimate and robust branch of mathematics . [16] For those interested in the intricate historical tapestry of how the very concept of mathematical probability emerged, the works of Ian Hacking ’s The Emergence of Probability [4] and James Franklin’s The Science of Conjecture [17] offer invaluable insights.
The theoretical underpinnings of the theory of errors can be traced, albeit somewhat indirectly, to Roger Cotes ’s Opera Miscellanea, which saw posthumous publication in 1722. However, it was a pivotal memoir meticulously prepared by Thomas Simpson in 1755 (and subsequently printed in 1756) that marked the first systematic application of this burgeoning theory to the practical and crucial discussion of errors inherent in observational data. [18] A subsequent reprint of this significant memoir in 1757 explicitly articulated foundational axioms: specifically, that positive and negative errors are considered equally probable, and that certain assignable limits definitively delineate the entire range of all potential errors. Simpson’s pioneering work also ventured into the realm of continuous errors, providing an early conceptual description of what would later become known as a probability curve.
The initial two highly influential laws governing the distribution of errors were both formulated by the brilliant Pierre-Simon Laplace . His first law, published in 1774, proposed that the frequency of an error could be precisely expressed as an exponential function of the numerical magnitude of the error, without regard to its sign. The second law of error, introduced by Laplace in 1778, asserted that the frequency of the error is an exponential function of the square of the error. This second formulation has since become famously known as the normal distribution or, somewhat contentiously, the Gauss law. As was dryly noted at the time, “It is difficult historically to attribute that law to Gauss, who in spite of his well-known precocity had probably not made this discovery before he was two years old.” [19] A rather pointed way to suggest that historical credit should be accurately assigned, even among geniuses.
Daniel Bernoulli (1778) further enriched the field by introducing the principle of the maximum product of the probabilities, specifically in the context of a system of concurrent errors, adding a layer of sophistication to error analysis.
Carl Friedrich Gauss
Adrien-Marie Legendre (1805) made a significant and independent stride by developing the method of least squares , which he unveiled in his influential work, Nouvelles mĂ©thodes pour la dĂ©termination des orbites des comĂštes (New Methods for Determining the Orbits of Comets). [20] Remarkably, and without apparent knowledge of Legendre’s prior contribution, an Irish-American writer named Robert Adrain , then serving as editor of “The Analyst” (1808), independently deduced the fundamental law of facility of error. He articulated this law as:
$$ \phi (x)=ce^{-h^{2}x^{2}} $$
where $h$ is a constant directly indicative of the precision of observation, and $c$ is a scale factor meticulously chosen to ensure that the total area under the curve precisely equals 1. Adrain provided two distinct proofs for his derivation, with the second bearing a striking resemblance to the proof later offered by John Herschel in 1850. [citation needed] Carl Friedrich Gauss subsequently presented the first proof of this law that gained widespread recognition across Europe (making it the third known proof after Adrain’s), publishing his findings in 1809. Over the ensuing decades, further proofs and refinements were contributed by a host of mathematicians, including Laplace (1810, 1812), Gauss again (1823), James Ivory (1825, 1826), Hagen (1837), Friedrich Bessel (1838), W.F. Donkin (1844, 1856), and Morgan Crofton (1870). Other notable figures who enriched this field included Robert Leslie Ellis (1844), Augustus De Morgan (1864), James Whitbread Lee Glaisher (1872), and Giovanni Schiaparelli (1875). Christian August Friedrich Peters’s (1856) formula [clarification needed] for $r$, representing the probable error of a single observation, remains a widely recognized result.
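As a quick check of that normalization (using the standard Gaussian integral, which the text does not spell out):

$$ \int_{-\infty}^{\infty} c\, e^{-h^{2}x^{2}}\, dx \;=\; c\,\frac{\sqrt{\pi}}{h} \;=\; 1 \qquad\Longrightarrow\qquad c = \frac{h}{\sqrt{\pi}}. $$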
Throughout the nineteenth century, a multitude of authors contributed significantly to the general theory of probability. Among the prominent figures were Laplace himself, Sylvestre Lacroix (1816), Littrow (1833), Adolphe Quetelet (1853), Richard Dedekind (1860), Helmert (1872), Hermann Laurent (1873), Liagre, Didion, and Karl Pearson . The exposition and logical rigor of the theory were substantially enhanced and clarified by the profound contributions of Augustus De Morgan and George Boole , whose work laid