Phonetics

For the study of phonemes, or how languages organize sounds, one might consult Phonology – a distinctly different, though often confused, domain. For the method of teaching reading and writing, which, let's be frank, some of you could use, see Phonics. For other uses, because obviously, this isn't complicated enough, see Phonetics (disambiguation).

Phonetics is a branch of linguistics that, for reasons unfathomable to some, dedicates itself to the exhaustive study of how humans manage to produce and perceive sounds. Or, in the case of sign languages, the equivalent aspects of physical articulation and visual interpretation. It's a truly exhausting endeavor, if you ask me, cataloging every grunt, sigh, and precisely formed utterance. Linguists who specialize in meticulously documenting the physical properties of human speech — or perhaps, more accurately, human noise — are known as phoneticians. This field, a sprawling testament to humanity's enduring fascination with its own noises, is traditionally segmented into three rather distinct, yet interconnected, sub-disciplines: articulatory phonetics, which examines how sounds are physically made; acoustic phonetics, which scrutinizes the sound waves themselves; and auditory phonetics, which delves into how those waves are perceived.

Traditionally, the minimal linguistic unit that phoneticians concern themselves with is the phone—a distinct speech sound in a language. This is often, and incorrectly, conflated with the phonological unit of a phoneme. To be excruciatingly clear, a phoneme is an abstract, conceptual categorization of phones, defined as the smallest unit capable of discerning meaning between sounds in any given language. Think of phones as the raw material, and phonemes as the structural blueprints.
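The blueprint metaphor can be made concrete with a small sketch. The table below is an illustrative assumption, not an inventory from any source: it uses the commonly cited English allophones of /t/ and /p/, and the names `PHONE_TO_PHONEME`, `phonemes_of`, and `same_phoneme` are hypothetical.

```python
# Illustrative assumption: a tiny, hand-written table of English phones.
# Each phone (a concrete sound) is assigned to one phoneme (an abstract category).
PHONE_TO_PHONEME = {
    "tʰ": "t",  # aspirated, as in "top"
    "t": "t",   # unaspirated, as in "stop"
    "ɾ": "t",   # flap, as in many American pronunciations of "butter"
    "pʰ": "p",  # aspirated, as in "pin"
    "p": "p",   # unaspirated, as in "spin"
}


def phonemes_of(phones):
    """Map a sequence of phones to the phonemes they realize."""
    return [PHONE_TO_PHONEME[phone] for phone in phones]


def same_phoneme(phone_a, phone_b):
    """True if two distinct phones are variants of one phoneme."""
    return PHONE_TO_PHONEME[phone_a] == PHONE_TO_PHONEME[phone_b]


print(same_phoneme("tʰ", "ɾ"))   # physically different sounds, one category
print(same_phoneme("tʰ", "pʰ"))  # different categories
```

The mapping is many-to-one: several phones collapse into a single phoneme, which is why the two units should not be conflated.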

Phonetics, at its core, grapples with two fundamental aspects of human speech: its production (the intricate ways humans conjure sounds from their biological machinery) and its perception (the complex mechanisms by which that speech is subsequently understood, or, more often, misunderstood). The communicative modality of a language dictates how that language is both produced and perceived. Languages employing oral-aural modalities, such as English, produce speech by manipulating the vocal tract and perceive it aurally, through the ears. Conversely, sign languages, like Australian Sign Language (Auslan) and American Sign Language (ASL), operate on a manual-visual modality. Here, "speech" is produced manually, through precise movements of the hands and body, and perceived visually. Furthermore, ASL and certain other sign languages even possess a manual-manual dialect, specifically designed for tactile signing by deafblind speakers. In these instances, signs are both produced and perceived through touch, adding yet another layer of complexity to the human communication spectrum.

History

Antiquity

The earliest known systematic inquiry into phonetics wasn't by some grand Western philosopher, but rather by Sanskrit grammarians, who, with an impressive foresight, began their studies as far back as the 6th century BCE. Among these pioneering investigators, the Hindu scholar Pāṇini stands out. His monumental four-part grammar, meticulously penned around 350 BCE, remains a foundational text in modern linguistics. It is still lauded today as "the most complete generative grammar of any language yet written," a rather humbling achievement considering its age. Pāṇini's work laid the groundwork for many modern linguistic principles, offering detailed descriptions of several crucial phonetic concepts, including the elusive notion of voicing. This ancient account astutely characterized resonance as being generated either by tone, which occurs when the vocal folds are closed and vibrating, or by noise, which is produced when the vocal folds are open and air flows freely. The phonetic principles embedded within his grammar are considered "primitives," serving as the bedrock for his theoretical analysis rather than being the direct subjects of that analysis themselves. Nevertheless, these profound principles can be logically inferred and reconstructed from his intricate system of phonology.

The Sanskrit discipline dedicated to the study of phonetics is known as Shiksha. The 1st-millennium BCE Taittiriya Upanishad, a text of profound philosophical and spiritual significance, offers a concise yet comprehensive definition of this field:

Om! We will explain the Shiksha.
Sounds and accentuation,
Quantity (of vowels) and the expression (of consonants),
Balancing (Saman) and connection (of sounds),
So much about the study of Shiksha. || 1 ||

Taittiriya Upanishad 1.2, Shikshavalli, translated by Paul Deussen

This ancient text highlights not only the basic elements of sound but also the crucial aspects of prosody and their interaction, demonstrating an understanding that goes far beyond simple articulation.

Modern

Following the remarkable early insights of Pāṇini and his contemporaries, phonetics saw few significant advances for millennia. A few limited investigations by Greek and Roman grammarians provided minor contributions, but attention during this vast interval drifted, unfortunately, away from the nuanced distinction between spoken and written language, which had been a critical driving force behind Pāṇini's meticulous account. Instead, it narrowed to concentrate almost exclusively on the isolated physical properties of speech.

A sustained, renewed interest in phonetics didn't truly re-emerge until around 1800 CE, marking the dawn of the modern era of the discipline. The term "phonetics" itself, in its current academic usage, was first recorded in 1841. This period of reawakening coincided, rather conveniently, with significant developments in medicine and the advent of novel audio and visual recording technologies. These innovations provided phoneticians with unprecedented access to new, far more detailed data, allowing for deeper insights into the ephemeral nature of speech.

This early modern period saw the development of influential tools, such as a phonetic alphabet based on articulatory positions by Alexander Melville Bell. His system, famously known as visible speech, gained considerable prominence as a pedagogical tool, particularly in the oral education of deaf children. It was an attempt to make the invisible mechanics of speech tangible, a noble, if inherently limited, endeavor.

Before the widespread availability of reliable audio recording equipment, phoneticians were forced to rely heavily on a tradition of practical phonetic training. This was deemed essential to ensure that transcriptions and findings remained consistent across different researchers, a monumental task given the subjective nature of human perception. This rigorous training encompassed two critical components: ear training, which involved developing an acute ability to recognize and differentiate speech sounds, and production training, which demanded the capacity to accurately produce those very sounds. Phoneticians were expected to learn to recognize by ear the various sounds of the International Phonetic Alphabet (IPA). Indeed, the International Phonetic Association still, to this day, tests and certifies speakers on their ability to accurately produce the phonetic patterns of English, though it has, perhaps wisely, discontinued this practice for other languages, acknowledging the sheer scope of such an undertaking.

As a refinement of his visible speech method, Melville Bell also developed a descriptive system for vowels based on their height and backness, which ultimately led to the concept of 9 cardinal vowels. As part of their practical phonetic training, aspiring phoneticians were expected to learn to produce these cardinal vowels with exacting precision. The idea was that these fixed points would serve as stable articulatory anchors, allowing phoneticians to calibrate their perception and transcription of other, more variable phones during demanding fieldwork. However, this approach faced a significant critique from Peter Ladefoged in the 1960s. His experimental evidence suggested that cardinal vowels were primarily auditory rather than strictly articulatory targets, effectively challenging the long-held claim that they represented immutable articulatory anchors by which all other articulations could be reliably judged. The human ear, it seems, is a more complicated instrument than once thought.

Production

Main article: Language production

Language production is not merely a simple utterance but a complex, multi-stage process, consisting of several interdependent cognitive and motor processes. These processes collectively transform an abstract, nonlinguistic message—a thought, an intention—into a concrete, spoken or signed linguistic signal. There's an ongoing, rather fervent, debate among linguists regarding the precise temporal organization of language production. Do these processes unfold in a rigid, sequential series of stages (known as serial processing), or do they occur, at least partially, in parallel, overlapping and influencing each other? The answer, as is often the case in these matters, is likely "it's complicated."

After a speaker has identified the specific message they intend to convey and initiated the process of linguistic encoding, the first concrete step involves selecting the individual words (more formally termed lexical items) that will best represent that message. This crucial process is known as lexical selection. The chosen words are not arbitrary; they are meticulously selected based on their meaning, which in linguistics is referred to as semantic information. Lexical selection then activates the word's lemma, an abstract mental representation that encapsulates both the semantic content and the grammatical properties of the word, but crucially, not yet its sound.

Once an utterance has been, at least partially, planned, it then progresses into the phonological encoding stage. In this phase of language production, the abstract mental representation of the selected words is assigned its concrete phonological content. This manifests as a precise sequence of phonemes that are destined to be produced. These phonemes are not just abstract symbols; they are specified for a detailed set of articulatory features. These features denote particular motor goals, such as the precise timing of lip closure or the exact positioning of the tongue within the oral cavity. Subsequently, these meticulously specified phonemes are coordinated into a sophisticated sequence of muscle commands. These commands are then dispatched to the relevant muscles of the vocal tract, and, assuming they are executed properly (a non-trivial assumption given human fallibility), the intended sounds are finally produced.

Thus, the entire convoluted process of production, from the initial glimmer of a message to its audible manifestation as sound, can be summarized as the following sequence, a rather linear depiction of what is, in reality, a chaotic dance of neurons and muscles:

  • Message planning (The intent, the thought, the cosmic sigh)
  • Lemma selection (Choosing the right word from the vast, messy mental lexicon)
  • Retrieval and assignment of phonological word forms (Giving the chosen words their auditory clothing)
  • Articulatory specification (Translating abstract sounds into concrete muscle movements)
  • Muscle commands (Sending the orders to the biological machinery)
  • Articulation (The actual, messy act of making sound)
  • Speech sounds (The resulting noise, hopefully conveying the original message)

Place of articulation

Main article: Place of articulation

Sounds that are formed by a complete or partial constriction of the vocal tract are aptly termed consonants. These sounds are primarily produced within the vocal tract, most often within the mouth, and the precise location of this constriction fundamentally alters the resulting acoustic output. Given the intimate and non-negotiable connection between the position of the tongue, lips, or other articulators and the sound that ultimately emerges, the place of articulation stands as a paramount concept across all subdisciplines of phonetics. It's where the action happens, quite literally.

Sounds are, in part, categorized by the specific location where a constriction is formed, as well as by the particular part of the body actively involved in creating that constriction. Consider, for instance, the English words "fought" and "thought." These constitute a classic minimal pair, meaning they differ by only a single phonetic element. In this case, the difference lies not in the location of the constriction, but rather in the organ making the constriction. The "f" sound in "fought" is a labiodental articulation, formed by the lower lip pressing against the upper teeth. In stark contrast, the "th" sound in "thought" is a linguodental articulation, produced by the tongue making contact with the teeth. Constrictions primarily involving the lips are generically referred to as labials, while those involving the tongue are, somewhat predictably, termed lingual.
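For those who prefer their minimal pairs machine-checked, here is a rough sketch. The broad transcriptions are an assumption (vowel quality varies by accent), and the helper `differing_segment` and both lookup tables are hypothetical names introduced only for illustration.

```python
# Broad transcriptions, assumed for illustration; the vowel varies by accent.
FOUGHT = ["f", "ɔ", "t"]
THOUGHT = ["θ", "ɔ", "t"]

# Articulators as described in the text: same target, different active organ.
ACTIVE_ARTICULATOR = {"f": "lower lip", "θ": "tongue"}
PASSIVE_ARTICULATOR = {"f": "upper teeth", "θ": "upper teeth"}


def differing_segment(word_a, word_b):
    """Return the index of the only differing segment, or None.

    None means the two words are not a minimal pair: they have
    different lengths, are identical, or differ in several places.
    """
    if len(word_a) != len(word_b):
        return None
    diffs = [i for i, (a, b) in enumerate(zip(word_a, word_b)) if a != b]
    return diffs[0] if len(diffs) == 1 else None


index = differing_segment(FOUGHT, THOUGHT)
first, second = FOUGHT[index], THOUGHT[index]
print(first, second)
print(PASSIVE_ARTICULATOR[first] == PASSIVE_ARTICULATOR[second])  # same location
print(ACTIVE_ARTICULATOR[first] == ACTIVE_ARTICULATOR[second])    # different organ
```

The check works on segments rather than spelling, since "fought" and "thought" differ by several letters but only one sound.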

Constrictions made with the tongue, being the most agile and versatile articulator, can occur at various points within the vocal tract. These are broadly classified into three primary categories: coronal, dorsal, and radical places of articulation. Coronal articulations are executed using the front part of the tongue—the tip or blade. Dorsal articulations employ the back of the tongue, the body, against the roof of the mouth. Finally, radical articulations are produced deep within the pharynx, often involving the tongue root or epiglottis. However, these broad divisions, while useful, are often insufficient for precisely distinguishing and describing the full spectrum of human speech sounds. For example, in English, both the [s] and [ʃ] sounds (as in "sip" and "ship") are classified as coronal, yet they are clearly produced in distinct areas of the mouth. To account for such subtle but crucial differences, phoneticians require more granular and detailed categories of place of articulation, based on the specific region of the mouth where the primary constriction occurs.
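The need for two levels of description can be sketched as a small data structure. The four-sound inventory is a hypothetical sample chosen for illustration, and `Place`, `same_major_class`, and `same_region` are names invented here, not part of any standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    major_class: str  # "labial", "coronal", "dorsal", or "radical"
    region: str       # finer-grained location of the constriction


# Hypothetical sample inventory, for illustration only.
PLACES = {
    "p": Place("labial", "bilabial"),
    "s": Place("coronal", "alveolar"),
    "ʃ": Place("coronal", "post-alveolar"),
    "k": Place("dorsal", "velar"),
}


def same_major_class(a, b):
    """True if both sounds share a broad articulator class."""
    return PLACES[a].major_class == PLACES[b].major_class


def same_region(a, b):
    """True if both sounds share the finer-grained place."""
    return PLACES[a].region == PLACES[b].region


print(same_major_class("s", "ʃ"))  # both coronal
print(same_region("s", "ʃ"))       # but not the same region
```

The broad class alone cannot separate [s] from [ʃ]; only the second field does, which is the argument for the more granular categories.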

Labial

Articulations that involve the lips, those rather expressive facial features, can be formed in three primary configurations. These include sounds made with both lips (bilabial), those produced with one lip and the teeth—specifically, the lower lip acting as the active articulator against the upper teeth (thus, labiodental), and, less commonly, those involving the tongue and the upper lip (linguolabial). Depending on the specific definitional criteria employed, some or all of these articulations may be grouped under the broader umbrella of labial articulations.

Bilabial consonants are, as their name suggests, formed using both lips. In the production of these sounds, the lower lip typically performs the greatest displacement to meet the upper lip, which, contrary to popular intuition, also exhibits a slight active downward movement. However, in rapid speech or with strong airflow, the force of air moving through the aperture (the opening between the lips) can sometimes cause the lips to separate faster than their muscular action might otherwise dictate. Unlike most other articulations, where one articulator is often a hard surface like teeth or the palate, both articulators in bilabial stops are composed of soft tissue. This anatomical characteristic means that bilabial stops are statistically more prone to being produced with incomplete closures than articulations involving rigid surfaces. Furthermore, bilabial stops are somewhat anomalous in that an articulator from the upper section of the vocal tract—the upper lip—demonstrates active downward movement, rather than remaining purely passive.

Linguolabial consonants are produced by the blade of the tongue approaching or making contact with the upper lip. Similar to bilabial articulations, the upper lip displays a minor movement towards the more active lingual articulator. Interestingly, articulations within this group do not possess their own dedicated symbols in the International Phonetic Alphabet; instead, they are represented by combining an apical symbol with a diacritic, implicitly categorizing them within the coronal class. These sounds are not universally common, but they are attested in a number of languages indigenous to Vanuatu, such as Tangoa.

Labiodental consonants are formed by the lower lip ascending to make contact with the upper teeth. These consonants most frequently manifest as fricatives, where air is forced turbulently through a narrow constriction, though labiodental nasals are also typologically common across the world's languages. There is an ongoing debate among phoneticians as to whether truly distinct labiodental plosives occur in any natural language. While the existence of such sounds is contentious, a number of languages have been reported to feature labiodental plosives, including Zulu, Tonga, and Shubi. The very debate underscores the subtle and often ambiguous nature of phonetic classification.

Coronal

Coronal consonants are those produced using the highly agile tip or blade of the tongue. Due to the inherent flexibility and dexterity of the front of the tongue, this category encompasses a vast array of sounds, distinguished not only by their specific place of articulation but also by the nuanced posture of the tongue itself. The coronal places of articulation define the regions of the mouth where the tongue makes contact or creates a constriction. These include the dental, alveolar, and post-alveolar locations.

Tongue postures specifically utilizing the tip of the tongue can be further classified: they are apical if the very top of the tongue tip is employed; laminal if the broader blade of the tongue is used; or sub-apical if the tongue tip is curled backward, bringing the underside of the tongue into play. Coronals are unique as a group in their remarkable versatility, as every single manner of articulation (from stops to fricatives to approximants) is attested within this category. Australian languages are particularly renowned for their extensive and complex systems of coronal contrasts, exhibiting a rich diversity both within and across the languages of the region.

Dental consonants are formed by the tip or blade of the tongue making contact with the upper teeth. These are typically subdivided into two groups based on the precise part of the tongue used: apical dental consonants are produced with the tongue tip touching the teeth, while interdental consonants are produced with the blade of the tongue, often with the tongue tip protruding slightly in front of the teeth. No known language employs both types contrastively, though they may exist allophonically (as non-meaning-distinguishing variants) within a single language. Alveolar consonants are produced with the tip or blade of the tongue at the alveolar ridge, the bony ridge located just behind the upper teeth. Like dental consonants, these can similarly be either apical or laminal.

Cross-linguistically, dental consonants and alveolar consonants are frequently contrasted, leading to a number of fascinating generalizations about global phonetic patterns. The different places of articulation tend to also be distinguished by the part of the tongue used for their production: most languages featuring dental stops employ laminal dentals, whereas languages with alveolar stops typically utilize apical alveolar stops. It is exceedingly rare for a language to maintain two consonants at the exact same place of articulation that contrast solely in their laminality versus apicality, though Taa (ǃXóõ) stands as a notable counterexample to this pattern, defying neat categorization. Furthermore, if a language possesses only one type of stop—either a dental stop or an alveolar stop—it will generally be laminal if it is dental, and usually apical if it is alveolar. However, languages such as Temne and Bulgarian inconveniently refuse to follow this neat generalization. If a language does indeed feature both an apical and a laminal stop, the laminal stop is more frequently affricated (meaning it has a fricative release), as observed in Isoko. Yet, Dahalo presents the inverse pattern, with its alveolar stops exhibiting greater affrication. It seems human speech, much like humans themselves, thrives on exceptions.

Retroflex consonants are a particularly intriguing and somewhat debated category, with several different definitions vying for prominence depending on whether the emphasis is placed on the precise position of the tongue or the contact point on the roof of the mouth. Generally speaking, they represent a class of articulations in which the tip of the tongue is curled upwards to some discernible degree. Consequently, retroflex articulations can manifest at various locations along the roof of the mouth, including the alveolar, post-alveolar, and even palatal regions. If the underside of the tongue tip makes contact with the roof of the mouth, the articulation is specifically termed sub-apical. However, apical post-alveolar sounds are also frequently described as retroflex. Classic examples of sub-apical retroflex stops are commonly found in Dravidian languages, where they form a robust phonemic contrast. In some languages indigenous to the southwest United States, the subtle but contrastive difference between dental and alveolar stops is, in fact, attributed to a slight retroflexion of the alveolar stop. Acoustically, retroflexion typically has a noticeable effect on the higher formants of the speech signal, providing a measurable acoustic signature for this unique tongue posture.

Articulations that occur just behind the alveolar ridge, known as post-alveolar consonants, have been burdened with a confusing array of different terminologies over the years. Apical post-alveolar consonants are often, and somewhat loosely, referred to as retroflex, blurring the lines of classification. Meanwhile, laminal post-alveolar articulations are sometimes called palato-alveolar, emphasizing their dual contact points. In the specialized literature concerning Australian languages, these laminal stops are frequently described as 'palatal,' despite being produced further forward in the mouth than the region typically and prototypically considered palatal. This terminological variation highlights the challenges of standardizing phonetic descriptions across diverse linguistic traditions. Furthermore, due to inherent individual anatomical variations, the precise articulation of palato-alveolar stops—and indeed, coronals in general—can exhibit considerable variability even within a single speech community, making precise classification a perpetually moving target.

Dorsal

Dorsal consonants are those produced using the body of the tongue, rather than its tip or blade. These articulations typically occur at the palate, velum, or uvula, utilizing the broad surface of the tongue's back.

Palatal consonants are formed by pressing the body of the tongue against the hard palate, which forms the rigid roof of the mouth. These sounds are frequently contrasted with velar or uvular consonants. However, it is remarkably rare for a language to maintain a phonemic contrast among all three simultaneously, with Jaqaru being cited as a possible, intriguing example of such a three-way distinction.

Velar consonants are produced by the body of the tongue making contact or close approximation with the velum, or soft palate, located at the back of the roof of the mouth. These sounds are incredibly common across the world's languages; it is a near-universal truth that almost all languages possess at least one velar stop. Because both velars and vowels involve the body of the tongue as a primary articulator, they are highly susceptible to the effects of coarticulation with neighboring vowels. This means their precise articulation can shift considerably, being produced as far forward as the hard palate or as far back as the uvula, depending on the surrounding phonetic context. These contextual variations are typically categorized into front, central, and back velars, mirroring the traditional classification of the vowel space. They can often be deceptively difficult to distinguish phonetically from palatal consonants, though velars are generally produced slightly behind the area of prototypical palatal consonants.

Uvular consonants are formed by the body of the tongue contacting or closely approaching the uvula, the small fleshy appendage hanging at the very back of the soft palate. These sounds are comparatively rare, estimated to occur in only about 19 percent of the world's languages. Large geographical regions, particularly in the Americas and Africa, exhibit no languages with uvular consonants whatsoever. In languages that do feature uvular consonants, stops are the most frequently attested type, followed by continuants, a category that includes nasals and fricatives. One might wonder why such a seemingly inconvenient place of articulation persists.

Pharyngeal and laryngeal

Consonants produced by constrictions deep within the throat are termed pharyngeals, while those formed by a constriction within the larynx itself are known as laryngeals. Laryngeals are, by definition, created using the vocal folds, as the larynx is situated too far down the throat to be reached by the tongue. The pharynx, however, is sufficiently close to the oral cavity that parts of the tongue, specifically the tongue root, can indeed reach it to form a constriction.

Radical consonants are those that employ either the root of the tongue or the epiglottis during their production, occurring at the very farthest reaches of the vocal tract. Pharyngeal consonants are produced by retracting the root of the tongue far enough into the throat to nearly touch the posterior wall of the pharynx. Due to the inherent physiological difficulties and constraints in this deep region, only fricatives and approximants can typically be produced in this manner. Epiglottal consonants are formed using the epiglottis in conjunction with the back wall of the pharynx. Instances of epiglottal stops have been documented, notably in Dahalo. It's generally considered physiologically impossible to produce voiced epiglottal consonants because the cavity between the glottis and the epiglottis is simply too diminutive to permit sustained voicing.

Glottal consonants are those produced exclusively using the vocal folds within the larynx. Since the vocal folds are the primary source of phonation and are situated below the oro-nasal vocal tract, a number of theoretically possible glottal consonants are, in practice, impossible to produce; a voiced glottal stop is one example. However, three glottal consonants are indeed possible: a voiceless glottal stop and two glottal fricatives, all of which are attested in various natural languages. Glottal stops, produced by a complete closure of the vocal folds, are remarkably common across the world's languages, a testament to their utility in speech. While many languages employ them primarily to demarcate phrase boundaries or as allophones, some languages, such as Arabic and Huautla Mazatec, elevate them to the status of contrastive phonemes, where their presence or absence changes word meaning. Additionally, in some contexts, glottal stops can be realized as laryngealization of the following vowel. It is also worth noting that glottal stops, particularly when occurring between vowels, frequently do not achieve a complete, sustained closure. True, sustained glottal stops are more typically observed when they are geminated (doubled).

The larynx

Further information: Larynx

Ah, the larynx. Commonly, and rather simplistically, known as the "voice box," this intricate cartilaginous structure perched at the top of the trachea is, in fact, the primary orchestrator of phonation. Within this delicate apparatus, the vocal folds (often mistakenly referred to as "chords") are either held together in a precise configuration that allows them to vibrate, or held apart to prevent vibration. The subtle, yet profoundly impactful, positions of these vocal folds are achieved through the minute movements of the arytenoid cartilages. The intrinsic laryngeal muscles are the unsung heroes here, responsible not only for manipulating the arytenoid cartilages but also for finely modulating the tension of the vocal folds themselves. If the vocal folds are not close enough together or not tense enough, they will either vibrate sporadically, resulting in phenomena like creaky or breathy voice depending on the degree of looseness, or they will fail to vibrate at all, leading to voicelessness.

Beyond the precise positioning of the vocal folds, a critical prerequisite for their vibration is the continuous flow of air across them. Without this aerodynamic force, they remain stubbornly silent. The minimum difference in pressure across the glottis (the space between the vocal folds) required to initiate and sustain voicing is estimated to be a mere 1–2 cm H₂O (roughly 98–196 pascals). This pressure differential can drop below the threshold required for phonation for two main reasons: either an increase in pressure above the glottis (supraglottal pressure) or a decrease in pressure below the glottis (subglottal pressure). Subglottal pressure is diligently maintained by the coordinated action of the respiratory muscles. Supraglottal pressure, in the absence of any constrictions or articulations, is roughly equal to atmospheric pressure. However, because articulations, especially consonants, inherently involve constrictions of the airflow, the pressure within the cavity behind these constrictions can build up significantly, leading to a higher supraglottal pressure. This delicate balance of pressures is what allows for the complex interplay of voiced and voiceless sounds that characterize human speech.
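The arithmetic behind that threshold is simple enough to sketch. Everything here is a simplification under stated assumptions: pressures are gauge values relative to atmospheric, the threshold is fixed at the upper end of the 1–2 cm H₂O range, and the subglottal value of 8 cm H₂O is an illustrative number rather than a measurement. The function names are hypothetical.

```python
CMH2O_TO_PA = 98.0665  # conventional value of 1 cm H2O in pascals


def to_pascals(pressure_cmh2o):
    """Convert a pressure from cm H2O to pascals."""
    return pressure_cmh2o * CMH2O_TO_PA


def transglottal_pressure(subglottal, supraglottal):
    """Pressure drop across the glottis, in cm H2O (gauge values)."""
    return subglottal - supraglottal


def can_voice(subglottal, supraglottal, threshold=2.0):
    """True if the drop across the glottis meets the assumed threshold."""
    return transglottal_pressure(subglottal, supraglottal) >= threshold


# Assumed illustrative values, in cm H2O relative to atmospheric pressure.
open_tract = can_voice(subglottal=8.0, supraglottal=0.0)
# Behind a tight constriction, supraglottal pressure builds up.
closed_tract = can_voice(subglottal=8.0, supraglottal=7.0)

print(open_tract, closed_tract)
print(round(to_pascals(1.0)), round(to_pascals(2.0)))
```

In the second case the subglottal pressure is unchanged; voicing fails only because the pressure above the glottis has risen, which is the first of the two failure routes described above.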

Lexical access

According to the prevailing lexical access model, also known as the two-stage theory of lexical access, the process of retrieving words from our mental lexicon involves two distinct, sequential stages of cognitive processing. The first stage, termed lexical selection, is responsible for furnishing the information about lexical items that is necessary to construct what is known as the functional-level representation of an utterance. During this phase, words are retrieved based on their specific semantic (meaning) and syntactic (grammatical) properties. Crucially, their phonological forms (that is, how they sound) are not yet made available at this initial stage. The second stage, referred to as the retrieval of wordforms, then provides the specific information required for building the positional-level representation of the utterance. This is where the abstract semantic and syntactic information is finally mapped onto concrete phonological sequences, preparing the word for articulation. This staged approach helps to explain various speech errors and how our brains manage the immense complexity of transforming thoughts into spoken language.
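A toy rendering of the two stages may help, with the usual caveat that a dictionary lookup is not a brain. The lexicon entries, the concept labels, the broad transcriptions, and the function names `select_lemma` and `retrieve_wordform` are all hypothetical, chosen only to show that sound is unavailable until the second stage.

```python
# Stage 1 store: meaning and grammar only, deliberately no sound.
LEMMAS = {
    "feline pet": {"lemma": "cat", "category": "noun"},
    "canine pet": {"lemma": "dog", "category": "noun"},
}

# Stage 2 store: phonological forms, keyed by lemma.
# Broad transcriptions, assumed for illustration.
WORDFORMS = {
    "cat": ["k", "æ", "t"],
    "dog": ["d", "ɒ", "g"],
}


def select_lemma(concept):
    """Stage 1, lexical selection: semantic and syntactic information."""
    return LEMMAS[concept]


def retrieve_wordform(lemma_entry):
    """Stage 2, wordform retrieval: the phoneme sequence for a lemma."""
    return WORDFORMS[lemma_entry["lemma"]]


entry = select_lemma("feline pet")
print(entry)                     # no phonological content yet
print(retrieve_wordform(entry))  # sound arrives only at stage 2
```

Keeping the two stores separate mirrors the claim that a speaker can know which word they want, and its grammatical category, before its sound is available.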

Articulatory models

When the human apparatus is engaged in the act of producing speech, the various articulators—lips, tongue, jaw, velum, and so on—execute intricate movements, often making contact with specific locations within the vocal tract. This dynamic interplay directly results in measurable changes to the acoustic signal that we perceive as speech. Some sophisticated models of speech production take these physical movements as their foundational premise, aiming to model articulation within a defined coordinate system. This system may be intrinsic to the body, mapping the positions and angles of internal joints, or extrinsic, referring to external spatial coordinates.

Intrinsic coordinate systems, for instance, typically model the movement of articulators as precise positions and angles of joints within the body. Models focusing on the jaw, a relatively simple articulator, often employ two to three degrees of freedom to represent its translation and rotation. However, these models encounter significant challenges when attempting to accurately represent the tongue. Unlike the rigid, jointed structures of the jaw or limbs, the tongue is a muscular hydrostat—a marvel of biological engineering akin to an elephant's trunk or an octopus's arm—which fundamentally lacks joints. This unique physiological structure means that while movements of the jaw tend to follow relatively straight lines during both speech and mastication, the movements of the tongue are inherently more complex and typically trace curvilinear paths, defying simple linear modeling.

The observation of these curvilinear movements has been used to argue that articulations might be planned in extrinsic rather than intrinsic space. Extrinsic coordinate systems are not limited to physical spatial coordinates, however; they can also encompass acoustic coordinate spaces. Models that posit that movements are planned in extrinsic space, whether physical or acoustic, inevitably encounter what is known as the inverse problem: explaining how a specific observed path or acoustic signal can be mapped back to the underlying muscle and joint configurations that produced it. For example, the human arm, with its seven degrees of freedom and 22 muscles, can reach the same final position through a multitude of different joint and muscle configurations. This one-to-many mapping applies equally to models of planning in extrinsic acoustic space: there is no single, unique mapping from desired physical or acoustic targets to the muscle movements required to achieve them. Yet concerns about the severity of the inverse problem in speech may be somewhat exaggerated, given that speech is a highly learned skill, managed by neurological structures that have, over evolutionary time, become exquisitely adapted for this very purpose.
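The one-to-many mapping can be seen even in a drastically reduced case. The sketch below uses a hypothetical two-joint planar arm (far simpler than the seven-degree-of-freedom arm mentioned above, and not a model of any articulator) and the standard closed-form inverse kinematics for such a linkage. All function names are illustrative.

```python
import math


def forward_kinematics(l1, l2, shoulder, elbow):
    """Endpoint (x, y) of a two-link planar arm for the given joint angles."""
    x = l1 * math.cos(shoulder) + l2 * math.cos(shoulder + elbow)
    y = l1 * math.sin(shoulder) + l2 * math.sin(shoulder + elbow)
    return x, y


def inverse_kinematics(l1, l2, x, y):
    """Return both (shoulder, elbow) configurations that reach (x, y).

    Even with only two joints, a reachable target generally has two
    solutions (mirror-image elbow bends), so the target alone does not
    determine the joint configuration.
    """
    cos_elbow = (x * x + y * y - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    if abs(cos_elbow) > 1:
        raise ValueError("target is out of reach")
    solutions = []
    for elbow in (math.acos(cos_elbow), -math.acos(cos_elbow)):
        shoulder = math.atan2(y, x) - math.atan2(
            l2 * math.sin(elbow), l1 + l2 * math.cos(elbow)
        )
        solutions.append((shoulder, elbow))
    return solutions


if __name__ == "__main__":
    for shoulder, elbow in inverse_kinematics(1.0, 1.0, 1.0, 1.0):
        print((shoulder, elbow), forward_kinematics(1.0, 1.0, shoulder, elbow))
```

Adding more joints only widens the set of solutions, which is why a planner working from extrinsic targets needs some additional principle to pick one.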

The equilibrium-point model offers a compelling potential resolution to the vexing inverse problem. It proposes that movement targets are not represented as static positions, but rather as the equilibrium point of the forces exerted by opposing muscle pairs acting on a joint. Crucially, muscles within this model are conceptualized as springs, and the target position is effectively the equilibrium point for this idealized spring-mass system. By utilizing this spring-like analogy, the equilibrium-point model can elegantly account for compensatory movements and rapid responses when planned articulations are disrupted or perturbed. These are considered coordinate models because they fundamentally assume that these complex muscle positions are represented as specific points in space—the aforementioned equilibrium points—where the spring-like actions of the muscles converge.
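As a rough illustration of the spring analogy, the sketch below models one articulator position controlled by two opposing linear springs and shows that, after a perturbation, the system settles back to the same equilibrium point without any explicit replanning. The one-dimensional setup, the linear springs, the damping term, and all parameter values are assumptions made for the example, not part of the model as described above.

```python
def equilibrium_point(k_a, rest_a, k_b, rest_b):
    """Position where the forces of two opposing linear springs cancel."""
    return (k_a * rest_a + k_b * rest_b) / (k_a + k_b)


def simulate(x0, k_a, rest_a, k_b, rest_b,
             mass=1.0, damping=4.0, dt=0.001, steps=10000):
    """Integrate a damped mass pulled by two springs; return final position.

    Uses semi-implicit Euler integration: velocity is updated first, then
    position is updated with the new velocity.
    """
    x, v = x0, 0.0
    for _ in range(steps):
        force = -k_a * (x - rest_a) - k_b * (x - rest_b) - damping * v
        v += (force / mass) * dt
        x += v * dt
    return x


if __name__ == "__main__":
    # Hypothetical stiffnesses and rest positions for the two "muscles".
    params = dict(k_a=30.0, rest_a=1.0, k_b=10.0, rest_b=-1.0)
    print(equilibrium_point(**params))
    # Start from two different perturbed positions; both settle at the
    # same equilibrium point.
    print(simulate(0.9, **params))
    print(simulate(-0.3, **params))
```

Changing the stiffness or rest length of either spring moves the equilibrium point, which is the sense in which a target can be specified through muscle parameters rather than as an explicit trajectory.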

Gestural approaches to speech production propose a fundamentally different perspective, arguing that articulations are not represented as particular coordinates to be hit, but rather as dynamic movement patterns. The minimal unit in these models is a "gesture," which is defined as a group of "functionally equivalent articulatory movement patterns that are actively controlled with reference to a given speech-relevant goal (e.g., a bilabial closure)." These 'gestures' are understood to represent coordinative structures or "synergies"—task-dependent groupings of muscles that function collaboratively as a single, integrated unit, rather than as individual, independently controlled muscles. This framework significantly reduces the perceived degrees of freedom required for articulation planning, a persistent challenge for intrinsic coordinate models, by allowing for any movement that successfully achieves the speech goal, rather than rigidly encoding specific movements within the abstract representation. Furthermore, the phenomenon of coarticulation—where neighboring sounds influence each other's articulation—is particularly well-explained by gestural models. They posit that articulations at faster speech rates can be understood as complex composites of independent gestures, which are simply compressed and overlapped in time compared to their slower-rate counterparts.

Acoustics

A waveform (top), spectrogram (middle), and transcription (bottom) of a woman saying "Wikipedia" displayed using the Praat software for linguistic analysis

Speech sounds, those fleeting disturbances we call language, are fundamentally created by the modification of an airstream. This modification, in turn, generates a sound wave that propagates through the air. The work of modification is performed by the various articulators, and it is the precise interplay of different places and manners of articulation that produces the diverse acoustic results we hear. Because the overall posture and configuration of the entire vocal tract (not merely the isolated position of the tongue) can profoundly affect the resulting sound, the manner of articulation is essential for accurately describing any given speech sound. Consider the English words "tack" and "sack." Both begin with alveolar sounds, but they differ critically in how far the tongue is positioned from the alveolar ridge. This seemingly minor spatial difference has profound effects on the airflow and, consequently, on the sounds that are ultimately produced. Similarly, the direction and the source of the airstream can dramatically influence the character of the sound. The most common airstream mechanism in human speech is pulmonic, which uses air exhaled from the lungs. However, the glottis and even the tongue itself can also be harnessed to generate alternative airstreams, adding further complexity to the acoustic landscape of language.

Voicing and phonation types

A fundamental distinction among speech sounds is whether they are voiced. Sounds are considered voiced when the vocal folds initiate vibration during the process of phonation. Many sounds possess the inherent capability of being produced both with and without phonation, offering a binary choice. However, certain physical constraints within the vocal tract can render phonation inherently difficult or even outright impossible for particular articulations. When articulations are voiced, the primary source of the acoustic energy—the "noise"—is the periodic vibration of the vocal folds. Conversely, articulations like voiceless plosives inherently lack an internal acoustic source, and their presence is often noticeable precisely by the brief period of silence they create. Other voiceless sounds, such as fricatives, manage to generate their own distinct acoustic source through turbulent airflow, entirely independent of phonation.

Phonation itself is intricately controlled by the muscles of the larynx, and languages, in their infinite complexity, exploit far more acoustic detail than a simple binary voiced/voiceless distinction. During phonation, the vocal folds vibrate at a specific rate. This rhythmic vibration generates a periodic acoustic waveform, characterized by a fundamental frequency and its associated harmonics. The fundamental frequency of this acoustic wave can be finely manipulated by adjusting the muscular tension within the larynx. Listeners perceive this fundamental frequency as pitch. Languages leverage this precise pitch manipulation in various ways: in tonal languages, it conveys lexical information, distinguishing words that are otherwise identical in their segmental composition. In many other languages, pitch is employed to mark prosodic or pragmatic information, such as emphasizing certain words or indicating a question versus a statement.
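The relationship between a periodic waveform, its fundamental frequency, and its harmonics can be sketched in a few lines. The example synthesizes a crude periodic signal and recovers its fundamental frequency by picking the autocorrelation peak. Autocorrelation is one common way to estimate pitch and is swapped in here for illustration. The sample rate, search range, and harmonic amplitudes are arbitrary assumptions, and the synthetic signal is not a realistic glottal source.

```python
import math


def harmonic_frequencies(f0, count):
    """Harmonics sit at integer multiples of the fundamental frequency."""
    return [k * f0 for k in range(1, count + 1)]


def synthesize_voiced(f0, sample_rate, duration, n_harmonics=5):
    """Sum of sinusoids at the harmonics of f0, with 1/k amplitudes."""
    n = int(sample_rate * duration)
    return [
        sum(
            math.sin(2 * math.pi * k * f0 * i / sample_rate) / k
            for k in range(1, n_harmonics + 1)
        )
        for i in range(n)
    ]


def estimate_f0(signal, sample_rate, f_min=75.0, f_max=400.0):
    """Estimate f0 as sample_rate / lag at the autocorrelation maximum.

    Only lags corresponding to frequencies between f_min and f_max
    are searched.
    """
    lag_min = int(sample_rate / f_max)
    lag_max = min(int(sample_rate / f_min), len(signal) - 1)
    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        score = sum(
            signal[i] * signal[i + lag] for i in range(len(signal) - lag)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


if __name__ == "__main__":
    wave = synthesize_voiced(f0=200.0, sample_rate=8000, duration=0.05)
    print(harmonic_frequencies(200.0, 5))
    print(estimate_f0(wave, sample_rate=8000))
```

Raising or lowering `f0` in the synthesis step is the toy counterpart of adjusting laryngeal tension: the spacing of the harmonics changes with it, and so does the pitch a listener would hear.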

For the vocal folds to vibrate, two conditions must be met: they must be in the correct anatomical position, and there must be a sufficient flow of air across the glottis. Phonation types are conceptualized as existing along a continuum of glottal states, ranging from a completely open glottis (resulting in voicelessness) to a completely closed glottis (producing a glottal stop). The optimal position for vibration, and the most common phonation type in typical speech, is modal voice, which resides comfortably in the middle of these two extremes. If the glottis is slightly wider than for modal voice, breathy voice (or murmur) occurs, characterized by a simultaneous vibration and turbulent airflow. Conversely, if the vocal folds are brought even closer together, the result is creaky voice, often described as a low-pitched, irregular series of pulses.

The normal, default phonation pattern employed in most everyday speech is modal voice. In this state, the vocal folds are held relatively close together with a moderate, balanced tension. They vibrate as a cohesive, single unit, producing a periodic and acoustically efficient waveform with a full glottal closure during each cycle, and notably, no aspiration (a puff of air) following the release. If the vocal folds are pulled further apart than in modal voicing, they simply cannot vibrate, resulting in the production of voiceless phones. Conversely, if they are held too firmly and tightly together, they cease vibrating altogether, instead producing a glottal stop—a complete, abrupt cessation of airflow.

If the vocal folds are held slightly further apart than the optimal position for modal voicing, they yield distinct phonation types such as breathy voice (often referred to as murmur) and whispery voice. In these states, the tension across the vocal ligaments is reduced compared to modal voicing, which permits air to flow more freely through the glottis even as the folds vibrate. Both breathy voice and whispery voice exist on a continuum, loosely characterized by a transition from the more periodic waveform of breathy voice to the more noisy, turbulent waveform of whispery voice. Acoustically, both tendencies typically manifest as a dampening of the first formant, with whispery voice exhibiting more extreme deviations in this regard.

Holding the vocal folds more tightly together than in modal voice results in creaky voice (also known as vocal fry). Here, the overall tension across the vocal folds is less than in modal voice, but they are held so tightly adducted that only the ligamentous margins of the vocal folds vibrate. The resulting acoustic pulses are highly irregular, with characteristically low pitch and low amplitude, creating a distinctive 'creaking' or 'rattling' sound.

While some languages, such as Hawaiian (which, notably, does not contrast voiced and voiceless plosives), may not maintain a voicing distinction for certain consonants, it is a universal truth that all languages utilize voicing to some degree. For example, no language is known to possess a phonemic voicing contrast for vowels; all known vowels are canonically voiced. (Though, to be precise, there are languages, like Japanese, where vowels can be produced as voiceless in specific, predictable contexts, but this is an allophonic variation, not a contrastive one.) Other intricate positions of the glottis, such as breathy and creaky voice, are employed in a number of languages, like Jalapa Mazatec, to establish contrastive phonemes and differentiate word meanings. In other languages, such as English, these phonation types exist purely allophonically, adding expressive nuance but not altering the lexical identity of a word.

Several methods exist to determine whether a speech segment is voiced or not. The simplest, and most low-tech, involves feeling the larynx (Adam's apple) during speech and noting the presence or absence of vibrations. For more precise measurements, acoustic analysis through a spectrogram or spectral slice is employed. In a spectrographic analysis, voiced segments typically reveal a prominent "voicing bar" (a region of high acoustic energy) in the low-frequency range. When examining a spectral slice, which represents the acoustic spectrum at a given point in time, a computational model of the pronounced vowel can be used to reverse the filtering effects of the mouth, thereby revealing the underlying spectrum of the glottis. A computational model of the unfiltered glottal signal is then fitted to this inverse-filtered acoustic signal to determine the characteristics of the glottis. Beyond acoustic methods, visual analysis is also available through specialized medical equipment such as ultrasound and endoscopy, the latter offering a direct, if invasive, view of the vocal folds in action.
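The voicing-bar idea can be approximated numerically: a voiced frame concentrates energy at low frequencies, while frication-like noise spreads it across the spectrum. The sketch below computes the share of spectral energy below a cutoff using a naive discrete Fourier transform. The 300 Hz cutoff, the 0.4 decision threshold, and the synthetic test frames are all assumptions chosen for the example. Real speech would need windowing and a better-motivated criterion.

```python
import cmath
import math
import random


def low_band_energy_ratio(frame, sample_rate, cutoff_hz=300.0):
    """Fraction of spectral energy at or below cutoff_hz (naive DFT).

    Uses the one-sided spectrum without edge-bin correction, which is
    adequate for a rough ratio.
    """
    n = len(frame)
    low = total = 0.0
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(frame)
        )
        energy = abs(coeff) ** 2
        total += energy
        if k * sample_rate / n <= cutoff_hz:
            low += energy
    return low / total if total > 0 else 0.0


def is_voiced(frame, sample_rate, threshold=0.4):
    """Crude voicing decision: is most energy in the low-frequency band?"""
    return low_band_energy_ratio(frame, sample_rate) >= threshold


def make_voiced_frame(f0=120.0, sample_rate=8000, n=200, n_harmonics=5):
    """Synthetic periodic frame: harmonics of f0 with 1/k amplitudes."""
    return [
        sum(
            math.sin(2 * math.pi * k * f0 * i / sample_rate) / k
            for k in range(1, n_harmonics + 1)
        )
        for i in range(n)
    ]


def make_noise_frame(n=200, seed=0):
    """Synthetic aperiodic frame: seeded uniform noise."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


if __name__ == "__main__":
    print(is_voiced(make_voiced_frame(), 8000))
    print(is_voiced(make_noise_frame(), 8000))
```

This is the computational equivalent of glancing at the bottom of a spectrogram, and it makes no attempt at the inverse filtering described above.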

Vowels

IPA: Vowels

| | Front | Central | Back |
| :-- | :-- | :-- | :-- |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Phonetics is a branch of linguistics that, for reasons unfathomable to some, dedicates itself to the exhaustive study of how humans manage to produce and perceive sounds or, in the case of sign languages, the equivalent aspects of physical articulation and visual interpretation. It's a truly exhausting endeavor, if you ask me, cataloging every grunt, sigh, and precisely formed utterance. Linguists who specialize in meticulously documenting the physical properties of human speech (or perhaps, more accurately, human noise) are known as phoneticians. This field, a sprawling testament to humanity's enduring fascination with its own noises, is traditionally segmented into three rather distinct, yet interconnected, sub-disciplines: articulatory phonetics, which examines how sounds are physically made; acoustic phonetics, which scrutinizes the sound waves themselves; and auditory phonetics, which delves into how those waves are perceived by the human ear and brain.

Traditionally, the minimal linguistic unit that phoneticians concern themselves with is the phone: a distinct, concrete speech sound as it is actually produced and heard in a language. This is often, and incorrectly, conflated with the phonological unit of a phoneme. To be excruciatingly clear, a phoneme is an abstract, conceptual categorization of phones, defined as the smallest unit capable of distinguishing one word's meaning from another's in a given language. Think of phones as the raw, physical manifestations of sound, and phonemes as the underlying, abstract mental categories that native speakers use to organize those sounds. A single phoneme can have multiple allophones (different phones) that do not change the meaning of a word, a nuance often lost on the uninitiated.
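
As a rough illustration of the phone/phoneme relationship, the Python sketch below stores a toy inventory in which each phoneme owns a set of allophones. The inventory, the function names, and the choice of English /t/ and /p/ are assumptions made purely for illustration; the listed variants are a heavily simplified textbook picture of some English varieties, not a complete analysis of any of them.

```python
# Hypothetical toy inventory: each phoneme maps to the phones (allophones)
# that realize it. Simplified; real inventories vary by dialect and context.
ALLOPHONES = {
    "/t/": {"tʰ", "t", "ɾ", "ʔ"},
    "/p/": {"pʰ", "p"},
}


def phoneme_of(phone):
    """Return the phoneme whose allophone set contains `phone`, or None."""
    for phoneme, phones in ALLOPHONES.items():
        if phone in phones:
            return phoneme
    return None


def same_phoneme(phone_a, phone_b):
    """True when both phones are known and belong to the same phoneme."""
    category = phoneme_of(phone_a)
    return category is not None and category == phoneme_of(phone_b)
```

Under these assumptions, `same_phoneme("tʰ", "ɾ")` holds even though the two phones are physically quite different, which is the whole point of the distinction: swapping one allophone for another changes the sound, not the word.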

Phonetics, at its core, grapples with two fundamental aspects of human speech: its production (the intricate ways humans conjure sounds from their biological machinery, a marvel of inefficient design) and its perception (the complex mechanisms by which that speech is subsequently understood, or, more often, misunderstood, given the inherent flaws of human communication). The communicative modality of a language dictates the specific method through which a language is both generated and interpreted. Languages employing oral-aural modalities, such as English, produce speech by manipulating the vocal tract and perceive it aurally, through the ears. Conversely, sign languages, like Australian Sign Language (Auslan) and American Sign Language (ASL), operate on a manual-visual modality. Here, "speech" is produced manually, through precise movements of the hands, arms, and body, and perceived visually, through sight. Furthermore, ASL and certain other sign languages even possess a manual-manual dialect, specifically designed for tactile signing by deafblind speakers. In these instances, signs are both produced with the hands and perceived with the hands through touch, adding yet another layer of complexity to the human communication spectrum, proving that if there's a way to communicate, humans will find it, no matter how convoluted.

History

Antiquity

The earliest known systematic inquiry into phonetics wasn't by some grand Western philosopher, but rather by Sanskrit grammarians, who, with an impressive foresight that puts many modern scholars to shame, began their studies as far back as the 6th century BCE. Among these pioneering investigators, the Hindu scholar Pāṇini stands out. His monumental four-part grammar, meticulously penned around 350 BCE, remains a foundational text in modern linguistics. It is still lauded today as "the most complete generative grammar of any language yet written," a rather humbling achievement considering its antiquity and the subsequent millennia of human intellectual endeavor. Pāṇini's work laid the groundwork for many modern linguistic principles, offering detailed descriptions of several crucial phonetic concepts, including the elusive notion of voicing. This ancient account astutely characterized resonance as being generated either by tone, which occurs when the vocal folds are closed and vibrating, or by noise, which is produced when the vocal folds are open and air flows freely without vibration. The phonetic principles embedded within his grammar are considered "primitives," serving as the bedrock for his theoretical analysis rather than being the direct subjects of that analysis themselves. Nevertheless, these profound principles can be logically inferred and reconstructed from his intricate system of phonology, a testament to the depth of his analytical framework.

The Sanskrit discipline dedicated to the study of phonetics is known as Shiksha, meaning "instruction" or "learning." This ancient field was not merely descriptive but prescriptive, aimed at ensuring the correct pronunciation of Vedic texts, which was considered essential for their ritual efficacy. The 1st-millennium BCE Taittiriya Upanishad, a text of profound philosophical and spiritual significance within Hinduism, offers a concise yet comprehensive definition of this field, showcasing its holistic approach to spoken language:

Om! We will explain the Shiksha.
Sounds and accentuation, Quantity (of vowels) and the expression (of consonants),
Balancing (Saman) and connection (of sounds), So much about the study of Shiksha. || 1 |

Taittiriya Upanishad 1.2, Shikshavalli, translated by Paul Deussen.

This ancient text highlights not only the basic elements of sound—the individual phones—but also the crucial aspects of prosody (accentuation, quantity), and their intricate interaction within a continuous stream of speech (balancing and connection). This demonstrates an understanding that goes far beyond simple articulation, encompassing the rhythmic and melodic qualities of language, a level of sophistication that took Western linguistics centuries to rediscover.

Modern

Following the remarkable early insights of Pāṇini and his contemporaries, progress in phonetics largely stalled for millennia. A few limited investigations by Greek and Roman grammarians provided minor contributions, but the primary focus during this vast interval shifted, unfortunately, away from the nuanced distinction between spoken and written language, which had been a critical driving force behind Pāṇini's meticulous account. Instead, attention narrowed, concentrating almost exclusively on the isolated physical properties of speech, often neglecting its functional role in communication. It seems humanity occasionally forgets what's important.

A sustained, renewed interest in phonetics didn't truly re-emerge until around 1800 CE, marking the dawn of the modern era of the discipline. The term "phonetics" itself, in its current academic usage, was first recorded in 1841, finally giving a proper name to this renewed obsession with sounds. This period of reawakening coincided, rather conveniently, with significant developments in medicine, which provided new anatomical insights, and the advent of novel audio and visual recording technologies. These innovations provided phoneticians with unprecedented access to new, far more detailed data, allowing for deeper insights into the ephemeral nature of speech. Suddenly, the fleeting sounds could be captured, analyzed, and replayed, a game-changer for a field dedicated to the transient.

This early modern period saw the development of influential tools, such as a phonetic alphabet based on articulatory positions by Alexander Melville Bell, father of the more famous Alexander Graham Bell. His system, famously known as visible speech, gained considerable prominence as a pedagogical tool, particularly in the oral education of deaf children. It was an attempt to make the invisible mechanics of speech tangible, to provide a visual representation of how sounds were formed in the mouth, a noble, if inherently limited, endeavor to bridge the sensory gap.

Before the widespread availability of reliable audio recording equipment, phoneticians were forced to rely heavily on a tradition of practical phonetic training. This was deemed essential to ensure that transcriptions and findings remained consistent across different researchers, a monumental task given the subjective nature of human perception and the inherent variability of speech. This rigorous training encompassed two critical components: ear training, which involved developing an acute ability to recognize and differentiate speech sounds with a precision bordering on the superhuman, and production training, which demanded the capacity to accurately produce those very sounds, often from languages entirely foreign to the phonetician. Phoneticians were expected to master the recognition of the various sounds of the International Phonetic Alphabet (IPA) by ear. Indeed, the International Phonetic Association still, to this day, tests and certifies speakers on their ability to accurately produce the phonetic patterns of English, though it has, perhaps wisely, discontinued this practice for other languages, acknowledging the sheer scope and impracticality of such an undertaking for every language on Earth.

As a refinement of his visible speech method, Melville Bell also developed a descriptive system for vowels based on their height and backness, which ultimately led to the concept of 9 cardinal vowels. These were idealized, extreme points in the vowel space, supposedly representing the outermost limits of possible vowel articulations. As part of their practical phonetic training, aspiring phoneticians were expected to learn to produce these cardinal vowels with exacting precision, often through arduous practice. The idea was that these fixed points would serve as stable articulatory anchors, allowing phoneticians to calibrate their perception and transcription of other, more variable phones during demanding fieldwork. However, this approach faced a significant critique from Peter Ladefoged in the 1960s. His experimental evidence, based on acoustic analysis and X-ray studies, suggested that cardinal vowels were primarily auditory rather than strictly articulatory targets, effectively challenging the long-held claim that they represented immutable articulatory anchors by which all other articulations could be reliably judged. The human ear, it seems, is a more complicated instrument than once thought, and the relationship between articulation and perception is rarely as straightforward as we'd like to believe.

Production

Main article: Language production

Language production is not merely a simple utterance but a complex, multi-stage process, consisting of several interdependent cognitive and motor processes. These processes collectively transform an abstract, nonlinguistic message—a thought, an intention, or perhaps a deeply cynical observation—into a concrete, spoken or signed linguistic signal. There's an ongoing, rather fervent, debate among linguists regarding the precise temporal organization of language production. Do these processes unfold in a rigid, sequential series of stages (known as serial processing), where one stage must fully complete before the next begins? Or do they occur, at least partially, in parallel, overlapping and influencing each other in a messy, interactive fashion? The answer, as is often the case in these matters, is likely "it's complicated," with current research leaning towards more interactive models, acknowledging the brain's remarkable capacity for parallel processing.

After a speaker has identified the specific message they intend to convey and initiates the process of linguistic encoding, the first concrete step involves selecting the individual words—more formally termed lexical items—that will best represent that message. This crucial process is known as lexical selection. The chosen words are not arbitrary; they are meticulously selected based on their meaning, which in linguistics is referred to as semantic information. This lexical selection then activates the word's lemma, an abstract mental representation that encapsulates both the semantic content and the grammatical properties of the word (e.g., its part of speech, gender, number). Crucially, its phonological forms—that is, how it sounds—are not yet made available at this initial stage.

Once an utterance has been, at least partially, planned (because who plans everything perfectly?), it then progresses into the phonological encoding stage. In this phase of language production, the abstract mental representation of the selected words is assigned its concrete phonological content. This manifests as a precise sequence of phonemes that are destined to be produced. These phonemes are not just abstract symbols; they are specified for a detailed set of articulatory features. These features denote particular motor goals, such as the precise timing of lip closure, the exact positioning of the tongue within the oral cavity, or the state of the vocal folds. Subsequently, these meticulously specified phonemes are coordinated into a sophisticated sequence of muscle commands. These commands are then dispatched to the relevant muscles of the vocal tract, and, assuming they are executed properly (a non-trivial assumption given human fallibility and the sheer complexity of the motor system), the intended sounds are finally produced. This entire cascade, from thought to utterance, typically unfolds within a fraction of a second, a testament to the brain's astonishing, if imperfect, efficiency.

Thus, the entire convoluted process of production, from the initial glimmer of a message to its audible manifestation as sound, can be summarized as the following sequence, a rather linear depiction of what is, in reality, a chaotic dance of neurons and muscles, often with feedback loops and parallel processing:

  • Message planning: The initial cognitive stage where the speaker conceives the communicative intent and selects the overall message to be conveyed. This is where the thought forms, perhaps a particularly cutting remark.
  • Lemma selection: The process of retrieving abstract representations of words (lemmas) from the mental lexicon, based on their semantic and syntactic properties, independent of their sound form.
  • Retrieval and assignment of phonological word forms: Accessing the sound structure (phonological form) associated with the selected lemmas, transforming abstract words into sequences of phonemes.
  • Articulatory specification: Translating the phonological sequence into specific motor commands for the articulators, detailing the precise movements required for each sound.
  • Muscle commands: The neural signals sent from the brain to the muscles of the vocal tract, initiating the physical movements.
  • Articulation: The actual, messy, physical act of moving the articulators (tongue, lips, jaw, velum, larynx) to create the speech sounds.
  • Speech sounds: The resulting acoustic output, the audible manifestation of the original message, hopefully conveying what was intended.
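
Taken at face value as a strictly serial model (which, as noted earlier, is contested), the sequence above can be sketched as an ordered list in Python. The names `STAGES`, `remaining_stages`, and `precedes` are hypothetical, and nothing here attempts to model feedback loops or parallel processing; it only captures the ordering claims, such as lemma selection coming before any sound form is retrieved.

```python
# Stage labels follow the list above; the ordering is the only thing modeled.
STAGES = [
    "message planning",
    "lemma selection",
    "retrieval and assignment of phonological word forms",
    "articulatory specification",
    "muscle commands",
    "articulation",
    "speech sounds",
]


def remaining_stages(current):
    """Return `current` and every stage after it, in order."""
    return STAGES[STAGES.index(current):]


def precedes(earlier, later):
    """True when `earlier` comes before `later` in the serial ordering."""
    return STAGES.index(earlier) < STAGES.index(later)
```

For example, `precedes("lemma selection", "retrieval and assignment of phonological word forms")` is true, mirroring the point that a word's meaning and grammar are selected before its sound. An interactive model would need something richer than a flat list, which is rather the crux of the debate.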

Place of articulation

Main article: Place of articulation

Sounds that are formed by a complete or partial constriction of the vocal tract are aptly termed consonants. These sounds are primarily produced within the vocal tract, most often within the mouth, and the precise location of this constriction fundamentally alters the resulting acoustic output. Given the intimate and non-negotiable connection between the position of the tongue, lips, or other articulators and the sound that ultimately emerges, the place of articulation stands as a paramount concept across all subdisciplines of phonetics. It's where the action happens, quite literally, determining much of a consonant's identity.

Sounds are, in part, categorized by the specific location where a constriction is formed, as well as by the particular part of the body actively involved in creating that constriction. Consider, for instance, the English words "fought" and "thought." These constitute a classic minimal pair, meaning they differ by only a single phonetic element that changes meaning. In this case, the difference lies not in the general location of the constriction (both are roughly "at the teeth"), but rather in the organ making the constriction. The "f" sound in "fought" is a labiodental articulation, formed by the lower lip pressing against the upper teeth. In stark contrast, the "th" sound in "thought" is a linguodental articulation, produced by the tongue making contact with the teeth. Constrictions primarily involving the lips are generically referred to as labials, while those involving the tongue are, somewhat predictably, termed lingual. This seemingly minor distinction highlights the precision required in phonetic description.
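
The "fought"/"thought" comparison can be made mechanical with a small Python sketch. It treats each word as a list of phone symbols and checks whether two words of equal length differ in exactly one position. The function names and the broad transcriptions (including the choice of [ɔː] for the vowel, which varies by accent) are assumptions for illustration, and the check covers only form: a genuine minimal pair also requires that the two words differ in meaning, which no list comparison can confirm.

```python
def differing_positions(word_a, word_b):
    """Return indices where two equal-length transcriptions differ, else None."""
    if len(word_a) != len(word_b):
        return None
    return [i for i, (a, b) in enumerate(zip(word_a, word_b)) if a != b]


def is_minimal_pair(word_a, word_b):
    """True when the transcriptions differ in exactly one segment."""
    diffs = differing_positions(word_a, word_b)
    return diffs is not None and len(diffs) == 1


# Assumed broad transcriptions, one symbol per segment.
fought = ["f", "ɔː", "t"]
thought = ["θ", "ɔː", "t"]
```

With these inputs, `is_minimal_pair(fought, thought)` is true and `differing_positions(fought, thought)` points at the initial consonant, which is exactly where the labiodental and linguodental articulations part ways.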

Constrictions made with the tongue, being the most agile and versatile articulator in the vocal tract, can occur at various points within the oral and pharyngeal cavities. These are broadly classified into three primary categories: coronal, dorsal, and radical places of articulation. Coronal articulations are executed using the front part of the tongue—the tip or blade. Dorsal articulations employ the back of the tongue, the body, against the roof of the mouth. Finally, radical articulations are produced deep within the pharynx, often involving the tongue root or epiglottis. However, these broad divisions, while useful for initial categorization, are often insufficient for precisely distinguishing and describing the full spectrum of human speech sounds. For example, in English, both the [s] and [ʃ] sounds (as in "sip" and "ship") are classified as coronal, yet they are clearly produced in distinct areas of the mouth. To account for such subtle but crucial differences, phoneticians require more granular and detailed categories of place of articulation, based on the specific region of the mouth where the primary constriction occurs, often named after the passive articulator.
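
To see why the broad classes run out of resolution, consider the following Python sketch. It maps an active articulator to its major class and then describes [s] and [ʃ] with both an active articulator and a more specific place. The dictionary, the field names, and the choice of the tongue blade for both sounds are illustrative assumptions (speakers differ in whether they use the tip or the blade), not a description of any particular speaker.

```python
# Hypothetical lookup from active articulator to major place class.
MAJOR_CLASS = {
    "lower lip": "labial",
    "tongue tip": "coronal",
    "tongue blade": "coronal",
    "tongue body": "dorsal",
    "tongue root": "radical",
    "epiglottis": "radical",
}


def major_class(active_articulator):
    """Return the broad class for an active articulator."""
    return MAJOR_CLASS[active_articulator]


# Two English fricatives; tip-versus-blade varies by speaker (assumed blade).
S_SOUND = {"symbol": "s", "active": "tongue blade", "place": "alveolar"}
SH_SOUND = {"symbol": "ʃ", "active": "tongue blade", "place": "post-alveolar"}


def distinguished_only_by_place(sound_a, sound_b):
    """True when two sounds share a major class but differ in specific place."""
    same_class = major_class(sound_a["active"]) == major_class(sound_b["active"])
    return same_class and sound_a["place"] != sound_b["place"]
```

Here `distinguished_only_by_place(S_SOUND, SH_SOUND)` is true: both sounds are coronal, so the major class alone cannot tell them apart, and only the finer-grained place label does the work.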

Labial

Articulations that involve the lips, those rather expressive facial features, can be formed in three primary configurations. These include sounds made with both lips (bilabial), those produced with one lip and the teeth—specifically, the lower lip acting as the active articulator against the upper teeth (thus, labiodental), and, less commonly, those involving the tongue and the upper lip (linguolabial). Depending on the specific definitional criteria employed, some or all of these articulations may be grouped under the broader umbrella of labial articulations.

Bilabial consonants are, as their name suggests, formed using both lips. In the production of these sounds, the lower lip typically performs the greatest displacement to meet the upper lip, which, contrary to popular intuition, also exhibits a slight active downward movement, contributing to the closure. However, in rapid speech or with strong airflow, the force of air moving through the aperture (the opening between the lips) can sometimes cause the lips to separate faster than their muscular action might otherwise dictate. Unlike most other articulations, where one articulator is often a hard surface like teeth or the palate, both articulators in bilabial stops are composed of soft tissue. This anatomical characteristic means that bilabial stops are statistically more prone to being produced with incomplete closures than articulations involving rigid surfaces. Bilabial stops are also somewhat anomalous in that an articulator from the upper section of the vocal tract—the upper lip—demonstrates some active downward movement, rather than remaining purely passive.

Linguolabial consonants are produced by the blade of the tongue approaching or making contact with the upper lip. Like in bilabial articulations, the upper lip displays a minor movement towards the more active lingual articulator. Interestingly, articulations within this group do not possess their own dedicated symbols in the International Phonetic Alphabet; instead, they are represented by combining an apical symbol with a diacritic (a small mark added to a symbol), implicitly categorizing them within the coronal class. These sounds are not universally common, but they are attested in a number of languages indigenous to Vanuatu, such as Tangoa, demonstrating the astonishing diversity of human speech sounds.

Labiodental consonants are formed by the lower lip ascending to make contact with the upper teeth. These consonants most frequently manifest as fricatives (e.g., [f], [v]), where air is forced turbulently through a narrow constriction, though labiodental nasals are also typologically common across the world's languages. There is an ongoing debate among phoneticians as to whether truly distinct labiodental plosives occur in any natural language, as a complete seal between the lip and teeth can be difficult to maintain for a true stop. While the existence of such sounds is contentious, a number of languages have been reported to feature labiodental plosives, including Zulu, Tonga, and Shubi. The very debate underscores the subtle and often ambiguous nature of phonetic classification, proving that even seemingly simple sounds can hide complexities.

Coronal

Coronal consonants are those produced using the highly agile tip or blade of the tongue. Due to the inherent flexibility and dexterity of the front of the tongue, this category encompasses a vast array of sounds, distinguished not only by their specific place of articulation but also by the nuanced posture of the tongue itself. The coronal places of articulation define the regions of the mouth where the tongue makes contact or creates a constriction. These include the dental, alveolar, and post-alveolar locations, each offering a slightly different acoustic signature.

Tongue postures can be further classified by which part of the front of the tongue forms the constriction: they are apical if the top of the tongue tip (the apex) is employed; laminal if the broader blade of the tongue, just behind the tip, is used; or sub-apical if the tongue tip is curled backward, bringing the underside of the tongue into play, a rather acrobatic feat. Coronals are unique as a group in their remarkable versatility, as every single manner of articulation (from stops to fricatives to approximants and beyond) is attested within this category, making them a rich area of study. Australian languages are particularly renowned for their extensive and complex systems of coronal contrasts, exhibiting a rich diversity both within and across the languages of the region, often distinguishing between multiple points along the alveolar and palatal regions.

Dental consonants are formed with the tip or blade of the tongue making contact with the upper teeth. They are typically subdivided into two groups based on the precise part of the tongue used: apical dental consonants are produced with the tongue tip touching the teeth; interdental consonants are produced with the blade of the tongue as the tip of the tongue sticks out slightly in front of the teeth. No known language employs both types contrastively, though they may exist allophonically (as non-meaning-distinguishing variants) within a single language. Alveolar consonants are made with the tip or blade of the tongue at the alveolar ridge, just behind the upper teeth, and can similarly be apical or laminal. This small ridge is a busy place in the mouth.

Cross-linguistically, dental consonants and alveolar consonants are frequently contrasted, leading to a number of fascinating generalizations about global phonetic patterns. The different places of articulation tend to also be distinguished by the part of the tongue used for their production: most languages featuring dental stops employ laminal dentals, whereas languages with alveolar stops typically utilize apical alveolar stops. It is exceedingly rare for a language to maintain two consonants at the exact same place of articulation that contrast solely in their laminality versus apicality, though Taa (ǃXóõ) stands as a notable counterexample to this pattern, proudly defying neat categorization. Furthermore, if a language possesses only one type of stop—either a dental stop or an alveolar stop—it will generally be laminal if it is dental, and usually apical if it is alveolar. However, languages such as Temne and Bulgarian inconveniently refuse to follow this neat generalization, proving that linguistic rules often have more exceptions than actual rules. If a language does indeed feature both an apical and a laminal stop, the laminal stop is more frequently affricated (meaning it has a fricative release), as observed in Isoko. Yet, Dahalo presents the inverse pattern, with its alveolar stops exhibiting greater affrication. It seems human speech, much like humans themselves, thrives on inconvenient exceptions.

Retroflex consonants are a particularly intriguing and somewhat debated category, with several different definitions vying for prominence depending on whether the emphasis is placed on the precise position of the tongue or the contact point on the roof of the mouth. Generally speaking, they represent a class of articulations in which the tip of the tongue is curled upwards and backward to some discernible degree. Consequently, retroflex articulations can manifest at various locations along the roof of the mouth, including the alveolar, post-alveolar, and even palatal regions. If the underside of the tongue tip makes contact with the roof of the mouth, the articulation is specifically termed sub-apical, a truly impressive feat of lingual gymnastics. However, apical post-alveolar sounds where the top of the tongue tip contacts the post-alveolar region are also frequently described as retroflex. Classic examples of sub-apical retroflex stops are commonly found in Dravidian languages, where they form a robust phonemic contrast. In some languages indigenous to the southwest United States, the subtle but contrastive difference between dental and alveolar stops is, in fact, attributed to a slight retroflexion of the alveolar stop. Acoustically, retroflexion typically has a noticeable effect on the higher formants of the speech signal, specifically lowering the third formant, providing a measurable acoustic signature for this unique tongue posture.

Articulations that occur just behind the alveolar ridge, known as post-alveolar consonants, have been burdened with a confusing array of different terminologies over the years, a common affliction in linguistics. Apical post-alveolar consonants are often, and somewhat loosely, referred to as retroflex, blurring the lines of classification. Meanwhile, laminal post-alveolar articulations are sometimes called palato-alveolar, emphasizing their dual contact points. In the specialized literature concerning Australian languages, these laminal stops are frequently described as 'palatal,' though they are produced further forward in the mouth than the region typically and prototypically considered palatal. This terminological variation highlights the challenges of standardizing phonetic descriptions across diverse linguistic traditions and anatomical realities. Furthermore, due to inherent individual anatomical variation, the precise articulation of palato-alveolar stops (and coronals in general) can exhibit considerable variability even within a single speech community, making precise classification a perpetually moving target, much like trying to nail down a definitive human intention.

Dorsal

Dorsal consonants are those produced using the body of the tongue rather than its tip or blade. These articulations typically occur at the palate, velum, or uvula, utilizing the broad surface of the tongue's back.

Palatal consonants are formed by pressing the body of the tongue against the hard palate, which forms the rigid roof of the mouth. They are frequently contrasted with velar or uvular consonants, though it is remarkably rare for a language to maintain a phonemic contrast among all three simultaneously, with Jaqaru being cited as a possible, intriguing example of such a three-way distinction. The precise point of contact along the palate can vary, influencing the sound's quality.

Velar consonants are made using the tongue body against the velum, or soft palate, located at the back of the roof of the mouth. They are incredibly common cross-linguistically; it is a near-universal truth that almost all languages possess at least one velar stop (e.g., [k], [g]). Because both velars and vowels involve the body of the tongue as a primary articulator, they are highly susceptible to the effects of coarticulation with neighboring vowels. This means their precise articulation can shift considerably, being produced as far forward as the hard palate or as far back as the uvula, depending on the surrounding phonetic context. These contextual variations are typically categorized into front, central, and back velars, mirroring the traditional classification of the vowel space. They can often be deceptively difficult to distinguish phonetically from palatal consonants, though velars are generally produced slightly behind the area of prototypical palatal consonants, a subtle but measurable difference.

Uvular consonants are made by the tongue body contacting or closely approaching the uvula, the small fleshy appendage hanging at the very back of the soft palate. These sounds are comparatively rare, estimated to occur in only about 19 percent of the world's languages. Large geographical regions, particularly in the Americas and Africa, exhibit no languages with uvular consonants whatsoever, suggesting their less central role in universal phonology. In languages that do feature uvular consonants, stops are the most frequently attested type, followed by continuants (a category that includes nasals and fricatives). One might wonder why such a seemingly inconvenient and deep place of articulation persists, but then, humans do many inconvenient things.

Pharyngeal and laryngeal

Consonants produced by constrictions deep within the throat are termed pharyngeals, and those formed by a constriction within the larynx itself are known as laryngeal consonants. Laryngeal consonants are, by definition, created using the vocal folds, as the larynx is situated too far down the throat to be reached by the tongue. Pharyngeals, however, are formed close enough to the oral cavity that certain parts of the tongue, specifically the tongue root, can reach the site of constriction.

Radical consonants are those that employ either the root of the tongue or the epiglottis during their production, occurring at the very farthest reaches of the vocal tract. Pharyngeal consonants are made by retracting the root of the tongue far enough to almost touch the posterior wall of the pharynx. Due to the inherent physiological difficulties and constraints in this deep region, only fricatives and approximants can typically be produced in this manner. Epiglottal consonants are made with the epiglottis (the flap of cartilage that covers the windpipe during swallowing) and the back wall of the pharynx. Instances of epiglottal stops have been documented, notably in Dahalo, a language known for its phonetic complexity. It's generally considered physiologically impossible to produce voiced epiglottal consonants because the cavity between the glottis and the epiglottis is simply too diminutive to permit sustained voicing, a constraint of biological design.

Glottal consonants are those produced exclusively using the vocal folds within the larynx. Because the vocal folds are the primary source of phonation and are situated below the oro-nasal vocal tract, a number of theoretically possible glottal consonants are, in practice, impossible to produce: a voiced glottal stop, for example. However, three glottal consonants are indeed possible: a voiceless glottal stop ([ʔ], as in the middle of "uh-oh") and two glottal fricatives ([h], [ɦ]), all of which are attested in various natural languages. Glottal stops, produced by closing the vocal folds completely and abruptly, are remarkably common across the world's languages. While many languages use them primarily to demarcate phrase boundaries or as allophones, some languages, such as Arabic and Huautla Mazatec, elevate them to the status of contrastive phonemes, where their presence or absence changes word meaning. Additionally, in some contexts, glottal stops can be realized as laryngealization (a creaky quality) of the following vowel. It is also worth noting that glottal stops, particularly when occurring between vowels, frequently do not form a complete, sustained closure. True, complete glottal stops are more typically observed when they are geminated (doubled) or at the beginning of an utterance.

The larynx

Further information: Larynx

Ah, the larynx. Commonly, and rather simplistically, known as the "voice box," this intricate cartilaginous structure sitting at the top of the trachea (windpipe) is, in fact, the primary orchestrator of phonation, the process of producing vocal sounds. Within this delicate apparatus, the vocal folds (popularly called "vocal cords," though they are folds of mucous membrane rather than cords) are either held together in a precise configuration that allows them to vibrate, or held apart to prevent vibration. The subtle, yet profoundly impactful, positions of these vocal folds are achieved through the minute movements of the arytenoid cartilages, small pyramidal structures located atop the cricoid cartilage. The intrinsic laryngeal muscles are the unsung heroes here, responsible not only for manipulating the arytenoid cartilages but also for finely modulating the tension and length of the vocal folds themselves. If the vocal folds are not sufficiently close together or sufficiently tense, they will either vibrate sporadically, resulting in creaky or breathy voice depending on the degree of looseness, or fail to vibrate at all, resulting in voicelessness. It's a remarkably precise instrument, easily thrown off balance.

Beyond the precise positioning of the vocal folds, a critical prerequisite for their vibration is the continuous flow of air across them. Without this aerodynamic force, they remain stubbornly silent, much like a reed instrument with no one blowing into it. The minimal difference in pressure across the glottis (the space between the vocal folds) required to initiate and sustain voicing is estimated to be a mere 1–2 cm H₂O (approximately 98–196 pascals). This pressure differential can drop below the threshold required for phonation for two main reasons: either an increase in pressure above the glottis (supraglottal pressure) or a decrease in pressure below the glottis (subglottal pressure). Subglottal pressure is maintained by the coordinated action of the respiratory muscles, primarily the diaphragm and intercostals. Supraglottal pressure, in the absence of any constrictions or articulations, is roughly equal to atmospheric pressure. However, because articulations, especially consonants, involve constrictions of the airflow, the pressure within the cavity behind those constrictions can build up significantly, raising the supraglottal pressure and shrinking the differential. This delicate balance of pressures is what allows for the complex interplay of voiced and voiceless sounds that characterize human speech, a constant negotiation between breath and obstruction.
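
The pressure bookkeeping above can be sketched in a few lines. This is a minimal illustration, not a physiological model: the function names are invented for the example, the 2 cm H₂O default threshold is simply the upper end of the 1–2 cm H₂O range quoted above, and the sample pressures are illustrative gauge values (relative to atmospheric pressure) rather than measurements.

```python
# Conventional conversion factor: 1 cm H2O = 98.0665 Pa.
CMH2O_TO_PA = 98.0665


def cmh2o_to_pa(pressure_cmh2o: float) -> float:
    """Convert a pressure from centimetres of water to pascals."""
    return pressure_cmh2o * CMH2O_TO_PA


def transglottal_pressure(subglottal: float, supraglottal: float) -> float:
    """Pressure drop across the glottis (same units as the inputs)."""
    return subglottal - supraglottal


def can_sustain_voicing(subglottal: float, supraglottal: float,
                        threshold: float = 2.0) -> bool:
    """True if the drop across the glottis meets the assumed threshold.

    All values are in cm H2O; the threshold is an assumption taken from
    the upper end of the 1-2 cm H2O range.
    """
    return transglottal_pressure(subglottal, supraglottal) >= threshold


if __name__ == "__main__":
    # Open vocal tract: supraglottal pressure is roughly atmospheric (0 gauge).
    print(can_sustain_voicing(subglottal=8.0, supraglottal=0.0))
    # Oral closure: pressure builds up behind the constriction.
    print(can_sustain_voicing(subglottal=8.0, supraglottal=7.0))
    print(round(cmh2o_to_pa(2.0), 1), "Pa")
```

The second call shows why voicing tends to die out during a long stop closure: the subglottal pressure has not changed, but the rising pressure above the glottis erodes the differential.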

Lexical access

According to the prevailing lexical access model, also known as the two-stage theory of lexical access, the process of retrieving words from our mental lexicon involves two distinct, sequential stages of cognitive processing. This model attempts to simplify the daunting complexity of how we pull words from our mental dictionary when we want to speak. The first stage, termed lexical selection, furnishes the information about lexical items that is necessary to construct what is known as the functional-level representation of an utterance. During this phase, words are retrieved based on their semantic (meaning) and syntactic (grammatical) properties (e.g., whether a word is a noun or a verb, its gender in some languages, its argument structure). Crucially, their phonological forms, that is, how they sound, are not yet made available at this initial stage. It's like picking out the right concept without knowing how to pronounce it yet. The second stage, referred to as the retrieval of wordforms, then provides the information required for building the positional-level representation of the utterance. This is where the abstract semantic and syntactic information is finally mapped onto concrete phonological sequences, preparing the word for articulation by specifying its constituent phonemes and their order. This staged approach helps to explain the "tip-of-the-tongue" state, in which a word's meaning is accessed but its sound is not, as well as certain kinds of speech errors, and how our brains manage the immense complexity of transforming thoughts into spoken language.

Articulatory models

When the human apparatus is engaged in the act of producing speech, the various articulators—lips, tongue, jaw, velum, and so on—execute intricate movements, often making contact with specific locations within the vocal tract. This dynamic interplay directly results in measurable changes to the acoustic signal that we perceive as speech. Some sophisticated models of speech production take these physical movements as their foundational premise, aiming to model articulation within a defined coordinate system. This system may be intrinsic to the body, mapping the positions and angles of internal joints and muscles, or extrinsic, referring to external spatial coordinates, perhaps even in an abstract acoustic space.

Intrinsic coordinate systems, for instance, typically model the movement of articulators as precise positions and angles of joints within the body. Models focusing on the jaw, a relatively simple articulator, often employ two to three degrees of freedom (e.g., opening/closing, protrusion/retraction) to represent its translation and rotation. However, these models encounter significant challenges when attempting to accurately represent the tongue. Unlike the rigid, jointed structures of the jaw or limbs, the tongue is a muscular hydrostat—a marvel of biological engineering akin to an elephant's trunk or an octopus's arm—which fundamentally lacks joints. It achieves its complex shapes and movements through the coordinated contraction and relaxation of its intrinsic and extrinsic muscles. This unique physiological structure means that while movements of the jaw tend to follow relatively straight lines during both speech and mastication, the movements of the tongue are inherently more complex and typically trace curvilinear paths, defying simple linear modeling.

The observation of these curvilinear movements has been used to argue that articulations might be planned in extrinsic rather than intrinsic space, where targets are defined relative to the external environment or the acoustic output. However, it's important to note that extrinsic coordinate systems are not limited to physical spatial coordinates; they can also encompass acoustic coordinate spaces, where the target is a specific sound quality. Models that posit that movements are planned in extrinsic space, whether physical or acoustic, inevitably encounter what is known as the inverse problem. This problem arises from the challenge of explaining how a specific observed path or acoustic signal can be uniquely mapped back to the underlying muscle and joint configurations that produced it. For example, the human arm, with its seven degrees of freedom and 22 muscles, can achieve the same final position through a multitude of different joint and muscle configurations. This one-to-many mapping applies equally to models of planning in extrinsic acoustic space: there is no single, unique mapping from desired physical or acoustic targets to the precise muscle movements required to achieve them. Yet, concerns about the severity of the inverse problem in speech may be somewhat exaggerated, given that speech is an extraordinarily highly learned skill, intricately managed by neurological structures that have, over evolutionary time, become exquisitely adapted for this very purpose.
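
The one-to-many mapping at the heart of the inverse problem can be seen even in a toy case. The sketch below uses a planar arm with only two joints, far simpler than the seven-degree-of-freedom arm mentioned above and not a model of any speech articulator; the function names and unit link lengths are assumptions made for the illustration. Even here, a single target position is reached by two distinct joint configurations.

```python
import math


def forward(theta1: float, theta2: float, l1: float = 1.0, l2: float = 1.0):
    """Endpoint (x, y) of a two-link planar arm for the given joint angles."""
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return x, y


def inverse(x: float, y: float, l1: float = 1.0, l2: float = 1.0):
    """All joint-angle pairs (theta1, theta2) that place the endpoint at (x, y)."""
    cos_t2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    if not -1.0 <= cos_t2 <= 1.0:
        raise ValueError("target is out of reach")
    solutions = []
    # The elbow can bend either way: two configurations, one endpoint.
    for theta2 in (math.acos(cos_t2), -math.acos(cos_t2)):
        theta1 = math.atan2(y, x) - math.atan2(
            l2 * math.sin(theta2), l1 + l2 * math.cos(theta2)
        )
        solutions.append((theta1, theta2))
    return solutions


if __name__ == "__main__":
    for theta1, theta2 in inverse(1.0, 1.0):
        print((theta1, theta2), "->", forward(theta1, theta2))
```

Adding joints or muscles only enlarges the set of solutions, which is why a planner working in extrinsic space needs some further principle for choosing among them.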

The equilibrium-point model offers a compelling potential resolution to the vexing inverse problem. It proposes that movement targets are not represented as static positions, but rather as the equilibrium point of the forces exerted by opposing muscle pairs acting on a joint. Crucially, muscles within this model are conceptualized as springs, and the target position is effectively the equilibrium point for this idealized spring-mass system. By utilizing this spring-like analogy, the equilibrium-point model can elegantly account for compensatory movements and rapid responses when planned articulations are disrupted or perturbed (e.g., if the jaw is unexpectedly pushed out of position during speech). These are considered coordinate models because they fundamentally assume that these complex muscle positions are represented as specific points in space—the aforementioned equilibrium points—where the spring-like actions of the muscles converge. This provides a dynamic, rather than static, representation of articulatory goals.
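
The spring analogy can be made concrete with a one-dimensional toy system. This is a sketch under stated assumptions, not the model as formulated in the literature: two opposing "muscles" are treated as ideal linear springs with stiffnesses `k_a`, `k_b` and rest positions `r_a`, `r_b` (all hypothetical parameters), the articulator is a point mass, and critical damping is added so the motion settles. The target is never stored as a trajectory; it falls out of the spring settings, and a perturbed articulator drifts back to it unaided.

```python
import math


def equilibrium_point(k_a: float, r_a: float, k_b: float, r_b: float) -> float:
    """Position where the two spring forces cancel."""
    return (k_a * r_a + k_b * r_b) / (k_a + k_b)


def net_force(x: float, k_a: float, r_a: float, k_b: float, r_b: float) -> float:
    """Sum of the restoring forces of both springs at position x."""
    return -k_a * (x - r_a) - k_b * (x - r_b)


def settle(x0: float, k_a: float, r_a: float, k_b: float, r_b: float,
           mass: float = 1.0, dt: float = 0.001, steps: int = 5000) -> float:
    """Integrate the damped spring-mass system from x0 and return the final position."""
    damping = 2.0 * math.sqrt((k_a + k_b) * mass)  # critical damping (assumed)
    x, v = x0, 0.0
    for _ in range(steps):
        a = (net_force(x, k_a, r_a, k_b, r_b) - damping * v) / mass
        v += a * dt  # semi-implicit Euler: update velocity first,
        x += v * dt  # then position with the new velocity
    return x


if __name__ == "__main__":
    params = dict(k_a=10.0, r_a=0.0, k_b=30.0, r_b=1.0)
    target = equilibrium_point(**params)
    # Push the articulator away from its target and let it settle.
    final = settle(x0=target + 0.5, **params)
    print(target, round(final, 6))
```

Changing the stiffness ratio moves the equilibrium point, which is how such a system can specify a new target without computing a path to it.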

Gestural approaches to speech production propose a fundamentally different perspective, arguing that articulations are not represented as particular coordinates to be hit, but rather as dynamic movement patterns. The minimal unit in these models is a "gesture," which is defined as a group of "functionally equivalent articulatory movement patterns that are actively controlled with reference to a given speech-relevant goal (e.g., a bilabial closure)." These 'gestures' are understood to represent coordinative structures or "synergies"—task-dependent groupings of muscles that function collaboratively as a single, integrated unit, rather than as individual, independently controlled muscles. This framework significantly reduces the perceived degrees of freedom required for articulation planning, a persistent challenge for intrinsic coordinate models, by allowing for any movement that successfully achieves the speech goal, rather than rigidly encoding specific movements within the abstract representation. Furthermore, the phenomenon of coarticulation—where neighboring sounds influence each other's articulation—is particularly well-explained by gestural models. They posit that articulations at faster speech rates can be understood as complex composites of independent gestures, which are simply compressed and overlapped in time compared to their slower-rate counterparts, much like chords in music.

Acoustics

A waveform (top), spectrogram (middle), and transcription (bottom) of a woman saying "Wikipedia" displayed using the Praat software for linguistic analysis

Speech sounds, those fleeting disturbances we call language, are fundamentally created by the modification of an airstream, which generates a sound wave that propagates through the air. The crucial work of modification is performed by the various articulators, and it is the precise interplay of different places and manners of articulation that produces the diverse acoustic results we hear. Because the overall posture and configuration of the entire vocal tract, not merely the isolated position of the tongue, can profoundly affect the resulting sound, the manner of articulation holds immense importance for accurately describing any given speech sound. Consider the English words "tack" and "sack." Both begin with alveolar sounds, but they differ critically in how far the tongue is positioned from the alveolar ridge and how the air flows through that constriction. This seemingly minor spatial difference has profound effects on the airflow dynamics and, consequently, on the distinct sounds that are ultimately produced. Similarly, the direction and the very source of the airstream can dramatically influence the character of the sound. The most common airstream mechanism employed in human speech is pulmonic, which utilizes air exhaled from the lungs. However, the glottis and even the tongue itself can also be harnessed to generate alternative airstreams (glottalic and velaric, respectively), adding further complexity to the acoustic landscape of language, proving that if there's a way to make noise, humans will exploit it.

Voicing and phonation types

A major distinction among speech sounds is whether they are voiced. Sounds are considered voiced when the vocal folds initiate vibration during the process of phonation. Many sounds possess the inherent capability of being produced both with and without phonation, offering a binary choice. However, certain physical constraints within the vocal tract can render phonation inherently difficult or even outright impossible for particular articulations (e.g., a voiced epiglottal stop). When articulations are voiced, the primary source of the acoustic energy—the "noise"—is the periodic vibration of the vocal folds. Conversely, articulations like voiceless plosives (e.g., [p], [t], [k]) inherently lack an internal acoustic source during their closure phase, and their presence is often noticeable precisely by the brief period of silence they create, followed by a burst. Other voiceless sounds, such as fricatives (e.g., [s], [f]), manage to generate their own distinct acoustic source through turbulent airflow at the point of constriction, entirely independent of phonation.

Phonation itself is intricately controlled by the muscles of the larynx, and languages, in their infinite complexity, exploit far more acoustic detail than a simple binary voiced/voiceless distinction. During phonation, the vocal folds vibrate at a specific rate. This rhythmic vibration generates a periodic acoustic waveform, characterized by a fundamental frequency (F0) and its associated harmonics, which are integer multiples of the F0. The fundamental frequency of this acoustic wave can be finely manipulated by adjusting the muscular tension and length within the larynx. Listeners perceive this fundamental frequency as pitch. Languages leverage this precise pitch manipulation in various ways: in tonal languages, it conveys lexical information, distinguishing words that are otherwise identical in their segmental composition (e.g., Mandarin "ma" with different tones meaning "mother," "hemp," "horse," or "scold"). In many other languages, pitch is employed to mark prosodic or pragmatic information, such as emphasizing certain words, indicating a question versus a statement (intonation), or marking pitch accents within phrases.
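
The relationship between the fundamental frequency and its harmonics is simple enough to state in code. The sketch below is only an arithmetic illustration; the function names are invented for the example, and the 120 Hz value is an arbitrary choice rather than a measured voice.

```python
def f0_from_period(period_s: float) -> float:
    """Fundamental frequency in Hz for a vocal fold cycle lasting period_s seconds."""
    return 1.0 / period_s


def harmonics(f0_hz: float, count: int) -> list[float]:
    """The first `count` harmonics: integer multiples of F0, starting with F0 itself."""
    return [n * f0_hz for n in range(1, count + 1)]


if __name__ == "__main__":
    print(harmonics(120.0, 4))    # [120.0, 240.0, 360.0, 480.0]
    print(f0_from_period(0.005))  # a 5 ms cycle corresponds to about 200 Hz
```

Raising F0 spreads the harmonics further apart, which is why the same vowel sampled at a higher pitch has fewer harmonics falling within any given frequency band.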

For the vocal folds to vibrate, two conditions must be met: they must be in the correct anatomical position, and there must be a sufficient flow of air across the glottis. Phonation types are conceptualized as existing along a continuum of glottal states, ranging from a completely open glottis (resulting in voicelessness) to a completely closed glottis (producing a glottal stop). The optimal position for vibration, and the most common phonation type in typical speech, is modal voice, which resides comfortably in the middle of these two extremes. If the glottis is slightly wider than for modal voice, breathy voice (or murmur) occurs, characterized by a simultaneous vibration and turbulent airflow. Conversely, if the vocal folds are brought even closer together, but not fully closed, the result is creaky voice, often described as a low-pitched, irregular series of pulses, a sound that can be quite grating to listen to.

The normal, default phonation pattern employed in most everyday speech is modal voice. In this state, the vocal folds are held relatively close together with a moderate, balanced tension. They vibrate as a cohesive, single unit, producing a periodic and acoustically efficient waveform with a full glottal closure during each cycle, and notably, no aspiration (a puff of air) following the release. If the vocal folds are pulled further apart than in modal voicing, they simply cannot vibrate, resulting in the production of voiceless phones. If they are held too firmly and tightly together, they cease vibrating altogether, instead producing a glottal stop—a complete, abrupt cessation of airflow.

If the vocal folds are held slightly further apart than the optimal position for modal voicing, they yield distinct phonation types such as breathy voice (often referred to as murmur) and whispery voice. In these states, the tension across the vocal ligaments is reduced compared to modal voicing, which permits air to flow more freely through the glottis even as the folds vibrate. Both breathy voice and whispery voice exist on a continuum, loosely characterized by a transition from the more periodic waveform of breathy voice (with audible breathiness) to the more noisy, turbulent waveform of whispery voice (where vibration is minimal or absent). Acoustically, both tendencies typically manifest as a dampening of the first formant, with whispery voice showing more extreme deviations, making it less sonorous.

Holding the vocal folds more tightly together than in modal voice results in creaky voice (also known as vocal fry or laryngealized voice). Here, the tension across the vocal folds is typically lower than in modal voice, but they are held so tightly adducted that only the ligamentous margins of the vocal folds vibrate, often in an irregular, aperiodic fashion. The resulting acoustic pulses are highly irregular, with a characteristically low pitch and low amplitude, creating a distinctive 'creaking' or 'rattling' sound. This can be a sign of vocal fatigue or, in some languages, a linguistically meaningful distinction.

While some languages, such as Hawaiian (which, notably, does not contrast voiced and voiceless plosives), may not maintain a voicing distinction for certain consonants, it is a universal truth that all languages utilize voicing to some degree. For example, no language is known to possess a phonemic voicing contrast for vowels; all known vowels are canonically voiced. (Though, to be precise, there are languages, like Japanese, where vowels can be produced as voiceless in specific, predictable contexts, often between voiceless consonants or at the end of an utterance, but this is an allophonic variation, not a contrastive one.) Other intricate positions of the glottis, such as breathy and creaky voice, are employed in a number of languages, like Jalapa Mazatec, to establish contrastive phonemes and differentiate word meanings. In other languages, such as English, these phonation types exist purely allophonically, adding expressive nuance or marking discourse features, but not altering the lexical identity of a word.

Several methods exist to determine whether a speech segment is voiced or not. The simplest, and most low-tech, involves feeling the larynx (Adam's apple) during speech and noting the presence or absence of vibrations. For more precise measurements, acoustic analysis through a spectrogram or spectral slice is employed. In a spectrographic analysis, voiced segments typically reveal a prominent "voicing bar" (a region of high acoustic energy) in the low-frequency range, indicating regular vocal fold vibration. When examining a spectral slice, which represents the acoustic spectrum at a given point in time, a computational model of the pronounced vowel can be used to reverse the filtering effects of the mouth, thereby revealing the underlying spectrum of the glottis. A computational model of the unfiltered glottal signal is then fitted to this inverse-filtered acoustic signal to determine the characteristics of the glottis, such as its open quotient or spectral slope. Beyond acoustic methods, visual analysis is also available through specialized medical equipment, such as ultrasound and endoscopy, the latter offering a direct, if invasive, view of the vocal folds in action, providing undeniable proof of their movement (or lack thereof).
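
For readers who would rather not poke at their own throats, a crude acoustic check can be sketched in code. Note that this is neither the spectrographic voicing bar nor the inverse-filtering procedure described above: it swaps in a simpler, commonly used idea, namely that voiced frames are periodic and therefore correlate strongly with a copy of themselves shifted by one pitch period. Everything here is an assumption made for illustration: the function names, the 60–400 Hz search range, the 0.5 decision threshold, and the synthetic test signals standing in for real recordings.

```python
import math
import random

SAMPLE_RATE = 8000  # samples per second (assumed for the synthetic frames)


def normalized_autocorrelation(frame: list[float], lag: int) -> float:
    """Correlation between the frame and itself shifted by `lag` samples (-1 to 1)."""
    head, tail = frame[:-lag], frame[lag:]
    num = sum(a * b for a, b in zip(head, tail))
    den = math.sqrt(sum(a * a for a in head) * sum(b * b for b in tail))
    return num / den if den > 0 else 0.0


def is_voiced(frame: list[float], sample_rate: int, f0_min: float = 60.0,
              f0_max: float = 400.0, threshold: float = 0.5) -> bool:
    """Call a frame voiced if some lag in the plausible pitch range correlates strongly."""
    min_lag = int(sample_rate / f0_max)
    max_lag = int(sample_rate / f0_min)
    peak = max(normalized_autocorrelation(frame, lag)
               for lag in range(min_lag, max_lag + 1))
    return peak >= threshold


def synth_voiced(f0: float, n: int) -> list[float]:
    """Periodic stand-in for a voiced frame: a fundamental plus its second harmonic."""
    return [math.sin(2 * math.pi * f0 * i / SAMPLE_RATE)
            + 0.5 * math.sin(2 * math.pi * 2 * f0 * i / SAMPLE_RATE)
            for i in range(n)]


def synth_noise(n: int, seed: int = 0) -> list[float]:
    """Aperiodic stand-in for a voiceless fricative: Gaussian white noise."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


if __name__ == "__main__":
    print(is_voiced(synth_voiced(100.0, 400), SAMPLE_RATE))
    print(is_voiced(synth_noise(400), SAMPLE_RATE))
```

Real speech is messier than either synthetic frame, so a practical detector would need windowing, silence handling, and a threshold tuned on actual recordings.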

Vowels

IPA: Vowels

| | Front | Central | Back |
| :--- | :--- | :--- | :--- |