Latin Script

For those seeking the Latin alphabet as it was originally employed by the venerable Ancient Romans to inscribe their language, you’ll find its dedicated entry elsewhere. This particular discourse delves into the broader construct of the Latin script itself.

Scriptum Latinum

Script type: Alphabet

Period: c. 700 BC – present

Direction: Left-to-right

Languages: See List of Latin-script alphabets

Related scripts

Parent systems

Egyptian hieroglyphs
- Proto-Sinaitic script
  - Phoenician alphabet
    - Greek alphabet
      - Old Italic script
        Latin script

Child systems

Fraser alphabet (Lisu)
Osage script
(partially) several phonetic alphabets, such as IPA, which have been used to write languages with no native script
Deseret alphabet
(partially) Pollard script (Miao)
(partially) Caroline Island script (Woleaian)
(indirectly) Cherokee syllabary
(indirectly, partially) Yugtun script

Sister systems

ISO 15924: Latn (215), Latin

Unicode alias: Latin

Unicode range: See Latin characters in Unicode

The Latin script, often referred to with a shrug as the Roman script, is a writing system that has, rather impressively, endured for millennia. Its origins were not, as some might naively assume, a sudden revelation but a slow, iterative evolution. It is based on the letters of the classical Latin alphabet, which was itself not entirely original: it derives from a form of the Greek alphabet used in the ancient Greek city of Cumae, one of the Hellenic colonies of Magna Graecia in southern Italy.

The transformation did not stop there, of course. The Etruscans, an enigmatic civilization that flourished in the Italian Peninsula before the rise of Rome, adapted the Greek alphabet to the needs of their own language, putting their own stamp on it to the eventual benefit of their successors. The Ancient Romans then altered this Etruscan alphabet in turn and, in their characteristic fashion, adopted, adapted, and spread it across a growing empire. Because the alphabet has kept evolving ever since, several distinct Latin-script alphabets exist today. While the core remains, they differ from one another, and from the classical Latin alphabet on which they are based, in their sets of graphemes, their rules of collation (how letters are ordered), and the phonetic values assigned to their letters.

Perhaps the most significant testament to the Latin script’s adaptability is its role as the basis of the International Phonetic Alphabet (IPA). This global standard for phonetic transcription, designed to capture the myriad sounds of human speech, relies heavily on Latin letters, often modified or supplemented with diacritics. Meanwhile, the 26 most widely recognized and used letters form the ISO basic Latin alphabet, and they are exactly the letters of the English alphabet. This overlap is no coincidence: it reflects the global reach of English and the practical need for a widely accepted common denominator in digital communication.

It is hardly surprising, then, that the Latin script serves as the basis for more alphabets than any other writing system in existence. [1] This widespread influence has made it the most widely adopted writing system in the world, a rather inconvenient truth for any aspiring contenders. The Latin script is the standard way of writing the languages of Western and Central Europe, vast swathes of sub-Saharan Africa, the Americas, and Oceania, and it is used for many languages elsewhere as well, making it an inescapable part of the modern linguistic landscape.

Name

Calling the script either the “Latin script” or the “Roman script” is, predictably, a direct reference to its origins in ancient Rome. It is a straightforward, if uninspired, naming convention, even though some of its capital letters trace their lineage back to Greek prototypes. In the often thankless task of converting text from another writing system into the Latin script, the term “romanization” (or “romanisation” in British English) is frequently used. [2] [3] The Unicode Consortium, the global arbiter of digital character encoding, has with typical efficiency settled on the concise term “Latin” [4] for the script, an approach echoed by the International Organization for Standardization (ISO), [5] which further solidifies its universally accepted, if somewhat bland, name.

It is important to distinguish the script from its associated numeral system. The Roman numeral system is a separate entity, and its elements (I, V, X, L, C, D, M) are the Roman numerals. The numbers commonly used in everyday life (1, 2, 3, and so forth), however, are not Roman numerals at all: they are the Latin-script forms of the Hindu–Arabic numeral system, a system entirely separate from the historical Roman method of counting. Clarity, apparently, is a constant struggle.
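To make the distinction concrete, the following minimal Python sketch converts a Hindu–Arabic integer into Roman numerals built from the seven elements I, V, X, L, C, D, and M. It assumes the modern subtractive convention (IV, IX, XL, and so on) and the conventional 1 to 3999 range; Roman usage itself was less consistent, with forms such as IIII also attested. The name `to_roman` is purely illustrative.

```python
# Value/symbol pairs in descending order, including the subtractive pairs
# (CM, CD, XC, XL, IX, IV) used by the modern convention.
ROMAN_PAIRS = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def to_roman(n: int) -> str:
    """Convert an integer from 1 to 3999 into a Roman numeral string."""
    if not 1 <= n <= 3999:
        raise ValueError("this sketch only handles 1 to 3999")
    parts = []
    for value, symbol in ROMAN_PAIRS:
        # Emit the symbol as many times as its value fits into what remains.
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return "".join(parts)


print(to_roman(1987))  # MCMLXXXVII
print(to_roman(2024))  # MMXXIV
```

The greedy descending table works because each subtractive pair is listed just before the symbol it precedes, so the largest representable chunk is always consumed first.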

ISO basic Latin alphabet

The foundational set of characters recognized as the ISO basic Latin alphabet forms the essential core of this ubiquitous script. It comprises the following letters, presented in both their stately uppercase and more common lowercase forms:

Uppercase Latin alphabet

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

Lowercase Latin alphabet

a b c d e f g h i j k l m n o p q r s t u v w x y z
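In code, these same 26 × 2 letters are exposed by Python’s standard `string` module as `ascii_uppercase` and `ascii_lowercase`. The sketch below, which assumes nothing beyond the standard library, checks whether a single character belongs to the ISO basic Latin alphabet; the helper name `is_basic_latin_letter` is hypothetical. Accented letters such as é fall outside the set, even though they are Latin-script letters.

```python
import string

UPPER = string.ascii_uppercase  # "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
LOWER = string.ascii_lowercase  # "abcdefghijklmnopqrstuvwxyz"


def is_basic_latin_letter(ch: str) -> bool:
    """Return True only for a single character among the 52 basic letters."""
    # The length check stops "" or "AB" from matching as substrings.
    return len(ch) == 1 and (ch in UPPER or ch in LOWER)


# Each uppercase letter pairs with the lowercase letter at the same position.
pairs = list(zip(UPPER, LOWER))
print(len(pairs))                  # 26
print(is_basic_latin_letter("Q"))  # True
print(is_basic_latin_letter("é"))  # False
```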

Historically, the original Latin alphabet, in its elegant simplicity, used the letters I and V for both consonant and vowel sounds. This duality, perhaps charmingly minimalist to the Romans, became an increasing source of inconvenience and ambiguity as the alphabet was adapted to the more phonetically diverse Germanic and Romance languages. The lack of distinct characters for certain common sounds necessitated innovation and, eventually, new letters.

Consider the letter W. It did not appear out of thin air; it originated, quite literally, as a doubled V (VV). This ingenious, if somewhat inelegant, device was first used to represent the voiced labial–velar approximant /w/, a sound found in Old English as early as the 7th century. It came into common use by the later 11th century, displacing the older letter wynn (Ƿ ƿ), which had previously denoted the same sound.

The evolution of U and J followed a similar path of pragmatic necessity. In the Romance languages, the minuscule form of V was a rounded ‘u’. From it, a distinct rounded capital U was derived in the 16th century to represent the vowel, while a new, pointed minuscule ‘v’ was derived from V to denote the consonant. In the case of I, a word-final swash form, ‘j’, came to be used for the consonant, leaving the plain form ‘i’ primarily for the vowel. These conventions were not immediately universal; they remained erratic for centuries, a testament to the slow and often messy nature of standardization. J, for instance, was introduced into English for the consonant sound in the 17th century (it had previously appeared, though rarely, as a vowel), yet it was not universally recognized as a distinct letter in the alphabetic order until the 19th century, two centuries later.
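These splits can be made visible by running them in reverse. The Python sketch below folds modern spellings back into the 23 letters of the classical Latin alphabet, which had no J, U, or W: J collapses into I, U into V, and W into the doubled VV from which it arose. The function name `classicize` is hypothetical, and the mapping is a simplification that treats every W as VV and outputs capitals only, in the style of Roman inscriptions.

```python
# The 23 letters of the classical Latin alphabet (no J, U, or W).
CLASSICAL = set("ABCDEFGHIKLMNOPQRSTVXYZ")


def classicize(text: str) -> str:
    """Rewrite text in classical capitals: J -> I, U -> V, W -> VV."""
    upper = text.upper()
    return upper.replace("J", "I").replace("U", "V").replace("W", "VV")


print(classicize("Julius"))  # IVLIVS
```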

Spread

The distribution of the Latin script across the globe is a testament to its historical resilience and the far-reaching influence of the cultures that adopted it.

Map: official status of the Latin script by country (sole official or de facto official national script; co-official script at the national level; not officially used).

One might observe that Latin-script alphabets are sometimes extensively used even in the areas colored grey on the map, often due to the prevalence of unofficial second languages, such as French in Morocco or English in Egypt. Furthermore, the practice of Latin transliteration of official scripts, like pinyin in China, further demonstrates its pervasive, if sometimes unofficial, reach.

Spread of the Latin script

The journey of the Latin alphabet beyond its humble beginnings on the Italian Peninsula is intrinsically tied to the expansion of the Roman Empire. As Roman legions marched and Roman governance solidified, the Latin script spread, hand in hand with the Latin language, across the vast territories surrounding the Mediterranean Sea. This expansion was not uniform, however. In the eastern half of the Empire, including Greece, Türkiye, the Levant, and Egypt, Greek had a deeply entrenched tradition and remained the prevailing lingua franca of administration, commerce, and intellectual life, whereas Latin became the dominant spoken language of the western half. As the western Romance languages evolved from Vulgar Latin, they quite naturally continued to use and adapt the Latin alphabet, ensuring its continued prominence in the region.

Middle Ages

The Middle Ages proved another pivotal period for the Latin alphabet’s expansion, driven largely by the spread of Western Christianity. As missionaries and monastic orders ventured forth, the Latin alphabet, inextricably linked to the liturgy and sacred texts of the Roman Catholic Church, was gradually adopted by peoples across Northern Europe. Speakers of Celtic languages took it up, and it eventually displaced the Ogham alphabet; speakers of Germanic languages, previously written with Runic alphabets, likewise came to use the Latin script. Speakers of Baltic languages and of several Uralic languages, most notably Hungarian, Finnish, and Estonian, eventually succumbed to its influence as well.

The script’s influence also extended eastward, to the West Slavic languages and some South Slavic languages, whose speakers had embraced Roman Catholicism. Speakers of East Slavic languages, by contrast, largely adopted the Cyrillic alphabet, a script intrinsically linked with the Eastern Orthodox Church, and this religious divide still shows in the orthographic choices of nations today. Serbian is a particularly interesting case: with pragmatic flexibility, it uses both scripts. Cyrillic predominates in official communication, as stipulated by the Law on Official Use of the Language and Alphabet, [6] while the Latin script is widely used in other, less formal contexts, a testament to its versatility even in a traditionally Cyrillic sphere.

Since the 16th century

By the dawn of the 16th century, the Latin script, while significant, was still largely confined to the languages of Western, Northern, and Central Europe. Beyond these boundaries, other writing systems prevailed. The Orthodox Christian Slavs of Eastern and South-eastern Europe mostly used Cyrillic, a marker of their religious and cultural ties, and Greek speakers around the eastern Mediterranean continued the unbroken tradition of the Greek alphabet. The Arabic script was widespread and sacred within the Islamic world, used not only by Arabs but also by many non-Arab peoples, including the Iranians, Indonesians, Malays, and various Turkic peoples. Most of the rest of Asia used a variety of Brahmic scripts or the Chinese script.

This relatively stable landscape was dramatically reshaped by European colonization. As European powers expanded across the globe, the Latin script spread like a linguistic wildfire with the colonizers’ languages, chiefly in forms based on the Spanish, Portuguese, English, French, German, and Dutch alphabets. It reached the Americas and Oceania and took root in various regions of Asia and Africa.

This colonial imposition, or rather propagation, led to the adoption of the Latin script for numerous Austronesian languages, despite that family’s own indigenous writing traditions. Notable examples are the languages of the Philippines and the Malaysian and Indonesian languages, where the Latin script largely replaced earlier Arabic and indigenous Brahmic alphabets, a testament to the force of cultural and administrative influence. Latin letters even served as the basis for the forms of the Cherokee syllabary developed by Sequoyah; the sound values assigned to these visually familiar characters, however, are entirely different. [citation needed]

Under the influence of Portuguese missionaries, a Latin alphabet was devised for the Vietnamese language, which had previously been written with Chinese characters. This Latin-based alphabet replaced Chinese characters in administrative use in the 19th century, coinciding with French colonial rule, which underscores how closely linguistic shifts are interwoven with political power. In Goa, on the west coast of India, Portuguese and other European missionaries of the sixteenth and seventeenth centuries likewise introduced a Roman script for Konkani, an Indo-Aryan language that had previously been written in various indigenous scripts. [7]

Since the 19th century

The 19th century brought further shifts in script usage, often driven by emerging national identities and cultural reorientation. In a notable instance, the Romanians, conscious of their Romance heritage, deliberately returned to the Latin alphabet and abandoned the Romanian Cyrillic alphabet that had been in use. The move underscored their cultural alignment with Western Europe and the fact that Romanian is one of the Romance languages, a direct descendant of Latin.

Since the 20th century

The 20th century, a period of immense political and social upheaval, brought even more dramatic changes in script adoption. In 1928, as a cornerstone of Mustafa Kemal Atatürk’s reforms aimed at modernizing and Westernizing the young Republic of Türkiye, a Latin alphabet was officially adopted for the Turkish language. It replaced a modified Arabic alphabet that had served the language for centuries, symbolizing a decisive break with Ottoman tradition and a pivot towards a more European identity.

Across Eurasia, most of the Turkic-speaking peoples of the former USSR, including the Tatars, Bashkirs, Azeris, Kazakhs, Kyrgyz, and others, rode their own orthographic rollercoaster. In the 1930s their writing systems were replaced by the Latin-based Uniform Turkic alphabet, a Soviet project of linguistic standardization. This Latinization was short-lived: by the 1940s, a directive from Moscow required all these languages to switch to Cyrillic, a move often interpreted as an attempt to strengthen ties to Russia and to a broader Soviet identity.

After the collapse of the Soviet Union in 1991, many newly independent states seized the opportunity to reassert their cultural and linguistic autonomy. Three Turkic-speaking republics, Azerbaijan, Uzbekistan, and Turkmenistan, along with Romanian-speaking Moldova, officially reverted to Latin alphabets for their national languages, a powerful symbolic act of distancing themselves from the Soviet past. Kazakhstan, Kyrgyzstan, Iranian-speaking Tajikistan, and the breakaway region of Transnistria, by contrast, kept the Cyrillic alphabet, largely because of their continued close political and cultural ties with Russia.

Elsewhere, in the 1930s and 1940s, the majority of Kurds replaced the Arabic script they had traditionally used with two Latin alphabets. Although the official Kurdish government in Iraq still uses an Arabic alphabet for public documents, the Latin Kurdish alphabet is widely used throughout the region by the majority of Kurdish speakers, particularly in private and informal contexts.

In 1957, the People’s Republic of China, as part of its efforts at linguistic modernization and standardization, introduced a script reform for the Zhuang language, changing its orthography from Sawndip, a writing system based on adapted Chinese characters, to a Latin-script alphabet. The new system combined Latin, Cyrillic, and IPA letters to represent both the phonemes and the tones of Zhuang without using diacritics. In 1982 it was simplified and standardized to use only Latin-script letters, streamlining its use and potentially facilitating literacy.

Finally, with the fall of the Derg regime in 1991 and the end of decades of Amharic linguistic assimilation in Ethiopia, various ethnic groups seized the opportunity to abandon the Geʽez script, which was increasingly deemed unsuitable for languages outside the Semitic branch. [8] In the years that followed, Kafa, [9] Oromo, [10] Sidama, [11] Somali, [11] and Wolaitta all switched to Latin-based orthographies. The debate continues for others, with ongoing discussion about whether the Hadiyya and Kambaata languages should follow suit. [12]

21st century

Political maneuvering over script choice carried on around the turn of the millennium and beyond. On 15 September 1999, the authorities of Tatarstan, a republic within Russia, passed a law aiming to make the Latin script a co-official writing system alongside Cyrillic for the Tatar language by 2011. [13] This assertion of regional linguistic autonomy met swift opposition: a mere year later, the Russian government, emphasizing central control, overruled the law and effectively banned Latinization on its territory. [14]

In Central Asia, the government of Kazakhstan announced in 2015 that a Kazakh Latin alphabet would replace the Kazakh Cyrillic alphabet as the official writing system for the Kazakh language by 2025. [15] Much like Turkey’s earlier transition, the move signals a cultural and political orientation away from the Soviet past and towards a more globalized, and often Latin-script-dominated, future. There have also been discussions and proposals about similar shifts from Cyrillic to Latin in Ukraine, [16] Kyrgyzstan, [17] [18] and Mongolia. [19] Mongolia, however, has since chosen to revive the traditional Mongolian script instead of switching to Latin, [20] a decision that speaks to a desire to preserve its unique cultural heritage.

In a move towards greater linguistic unity and accessibility, Inuit Tapiriit Kanatami (ITK), the national organization representing Inuit in Canada, announced on 15 October 2019 its intention to introduce a unified writing system for the Inuit languages spoken across the country. The new system is based on the Latin alphabet and modeled on the one already in use for the Greenlandic language. [21]

Further demonstrating the ongoing nature of these orthographic transitions, the government of Uzbekistan declared on 12 February 2021 that it would finalize the complete transition from Cyrillic to Latin for the Uzbek language by 2023. [22] This wasn’t a new initiative; plans to switch to Latin had originally begun as far back as 1993 but had subsequently stalled, resulting in Cyrillic remaining in widespread use for decades. The renewed commitment highlights the persistent drive for linguistic reform. [23]

The Crimean Tatar language currently uses both Cyrillic and Latin scripts. The use of Latin was approved by Crimean Tatar representatives after the dissolution of the Soviet Union [24] but was never formally implemented by the regional government, and after Russia’s annexation of Crimea in 2014 the Latin script was dropped entirely on the peninsula. Crimean Tatars outside Crimea have nonetheless continued to use Latin, and on 22 October 2021 the government of Ukraine approved a proposal, strongly endorsed by the Mejlis of the Crimean Tatar People, to transition the Crimean Tatar language to the Latin script by 2025. [25] The decision reflects both political alignment and a commitment to cultural self-determination.

As of July 2020, a staggering 2.6 billion people, representing approximately 36% of the world’s population, actively use the Latin alphabet in some form or another. [26] Its global dominance, it seems, is undeniable.

International standards

By the 1960s, a critical juncture in the history of information technology, it had become glaringly obvious to the computer and telecommunications industries, particularly in the economically advanced nations of the First World, that a universally recognized, non-proprietary method of encoding characters was not merely a convenience but a necessity. Without such a standard, interoperability between systems would remain a chaotic, proprietary mess.

The International Organization for Standardization (ISO) took on the task of formally encapsulating the Latin alphabet in its ISO/IEC 646 standard. To ensure wide adoption and practical utility, the encapsulation was based on existing popular usage. Because the United States held an undeniably preeminent position in both industries during that decade, the standard was based on the already published American Standard Code for Information Interchange, commonly known as ASCII. ASCII includes the 26 letters of the English alphabet in both uppercase and lowercase forms (26 × 2 characters). Later and more comprehensive ISO standards, such as ISO/IEC 10646 (the Universal Coded Character Set, which is kept synchronized with Unicode), have continued to define these 26 × 2 letters as the “basic Latin alphabet,” while adding extensive extensions for the many other letters and diacritics required by other languages. It is a pragmatic compromise, ensuring both a universal baseline and the necessary flexibility.
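The practical consequence of that baseline can be sketched in Python, whose built-in `str.encode` supports both the `ascii` codec and UTF-8, one common encoding of Unicode. The helper `fits_in_ascii` is a hypothetical name for illustration: text that uses only the basic letters encodes into single 7-bit ASCII bytes, while letters added for other languages, such as Ł or Ñ, need a Unicode encoding.

```python
def fits_in_ascii(text: str) -> bool:
    """Return True if every character of text has an ASCII code."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


# Basic letters map to single bytes: A-Z are 65-90, a-z are 97-122.
print(list("Az".encode("ascii")))  # [65, 122]
print(fits_in_ascii("Latin"))      # True
print(fits_in_ascii("Łódź"))       # False
# UTF-8 encodes an extended letter such as Ñ as a multi-byte sequence.
print("Ñ".encode("utf-8"))         # b'\xc3\x91'
```

Note that in ASCII each lowercase letter sits exactly 32 positions after its uppercase counterpart, which is why simple case conversion for the basic letters is a fixed offset.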

National standards

Beyond international agreements, individual nations and regional blocs also develop their own standards for character encoding and data exchange. The German standard DIN 91379 is an example: it specifies a subset of Unicode letters, special characters, and sequences of letters and diacritic signs, with the aim of representing names correctly and unambiguously and simplifying data exchange within Europe. It supports all official languages of European Union and European Free Trade Association countries, which, rather tellingly, means covering not only Latin-based scripts but also the Greek and Cyrillic scripts, as well as the German minority languages. [clarification needed] To allow names in other writing systems to be transliterated into the Latin script according to the relevant ISO standards, DIN 91379 also provides all the necessary combinations of base letters and diacritic signs. [27] Efforts are under way to develop it further into a European CEN standard for greater consistency across the continent. [28]
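Sequences of base letters and diacritic signs matter because Unicode can represent many accented letters in two ways. The sketch below uses Python’s standard `unicodedata.normalize` to illustrate the idea; it does not reproduce the actual character list of DIN 91379, and the helper `nfc_code_points` is hypothetical. Under normalization form NFC, e followed by a combining acute accent collapses into the single precomposed letter é, whereas m with a combining circumflex has no precomposed form and remains a two-code-point sequence, the kind of combination a standard for names has to account for explicitly.

```python
import unicodedata


def nfc_code_points(text: str) -> list[str]:
    """Return the code points of text after NFC normalization, as U+XXXX."""
    normalized = unicodedata.normalize("NFC", text)
    return [f"U+{ord(ch):04X}" for ch in normalized]


# e + COMBINING ACUTE ACCENT composes to the precomposed letter é.
print(nfc_code_points("e\u0301"))  # ['U+00E9']
# m + COMBINING CIRCUMFLEX ACCENT has no precomposed form; it stays a pair.
print(nfc_code_points("m\u0302"))  # ['U+006D', 'U+0302']
```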

As used by various languages

The Latin alphabet, for all its perceived simplicity, is a chameleon. In the course of its remarkably widespread use, it has been adapted, perhaps grudgingly, to a bewildering array of new languages, often ones with phonemes absent from the languages already written in Roman characters. Representing these sounds required a creative suite of extensions: diacritics added to existing letters, multiple letters joined into ligatures, entirely new letter forms, or special functions assigned to pairs or triplets of letters. These new forms are then placed within the alphabet by defining an alphabetical order, or collation sequence, which, much like human preferences, can vary considerably from one language to another.
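The language dependence of collation is easy to demonstrate. Sorting by raw Unicode code points places ä (U+00E4) before å (U+00E5), but the Swedish alphabet ends with å, ä, ö in that order. The minimal Python sketch below sorts words against an explicit per-language alphabet string; the `SWEDISH` ordering and the `swedish_key` helper are illustrative assumptions, and real software would normally rely on a full collation implementation such as the Unicode Collation Algorithm rather than this toy key.

```python
# Swedish alphabet order (assumed for illustration): a-z, then å, ä, ö.
SWEDISH = "abcdefghijklmnopqrstuvwxyzåäö"
RANK = {letter: i for i, letter in enumerate(SWEDISH)}


def swedish_key(word: str) -> list[int]:
    """Map each letter to its position in the Swedish alphabet."""
    # Characters outside the table sort after all known letters.
    return [RANK.get(ch, len(SWEDISH)) for ch in word.lower()]


words = ["ål", "äpple", "zebra", "apa"]
print(sorted(words))                   # ['apa', 'zebra', 'äpple', 'ål']
print(sorted(words, key=swedish_key))  # ['apa', 'zebra', 'ål', 'äpple']
```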

Letters

For a more comprehensive and perhaps exhaustive enumeration, one might consult the List of Latin-script letters.

Some compelling examples of such “new” letters are the Runic letters wynn (Ƿ ƿ) and thorn (Þ þ), along with eth (Ð ð), all of which were added to the alphabet of Old English to capture sounds of that language. Another Irish letter, the insular ‘g’, developed into yogh (Ȝ ȝ), which was used in Middle English. The march of linguistic change eventually made these innovations obsolete, at least in English: wynn was replaced by the modern letter ‘w’, eth and thorn by the digraph ‘th’ (as in the pronunciation of English th), and yogh by the digraph ‘gh’. Yet nothing in linguistics truly vanishes: although none of these four letters is used in modern English or Irish, eth and thorn have stubbornly persisted in the modern Icelandic alphabet, and eth also appears in the Faroese alphabet.

Elsewhere, some West, Central, and Southern African languages have added a few letters to their orthographies, with sound values often similar to those of their IPA counterparts. For instance, Adangme uses Ɛ ɛ and Ɔ ɔ, while Ga uses Ɛ ɛ, Ŋ ŋ, and Ɔ ɔ. Hausa uses Ɓ ɓ and Ɗ ɗ for its characteristic implosives and Ƙ ƙ for an ejective. To keep these diverse systems consistent, Africanists have standardized these additions in the African reference alphabet.
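All of these additions are encoded in Unicode with their own code points and case pairs, so standard string functions can handle them without special tables. The short Python sketch below, relying only on the standard `unicodedata` module, prints the code point, official character name, and uppercase form of a few of the letters mentioned above; the selection is arbitrary.

```python
import unicodedata

# A sample of letters added to the Latin script for Old and Middle English
# and for several African languages.
letters = ["þ", "ð", "ȝ", "ɛ", "ŋ", "ɔ", "ɓ", "ɗ", "ƙ"]

for ch in letters:
    # Code point, official Unicode name, and uppercase counterpart.
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}  upper: {ch.upper()}")
```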

A particularly notable pair is the dotted and dotless I (İ i and I ı). These are not stylistic variants but two distinct letters, each with its own phonetic value, used in the Turkish, Azerbaijani, and Kazakh alphabets. [29] Azerbaijani also uses the letter Ə ə, which represents the near-open front unrounded vowel.
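The dotted/dotless distinction has a practical consequence for case conversion in software. Python's built-in str.upper() and str.lower() apply Unicode's default, language-independent mappings, which pair i with I; Turkish and Azerbaijani instead pair i with İ and ı with I. The helpers below (turkish_upper and turkish_lower are hypothetical names, not a standard API) are a minimal sketch that special-cases these two letters and leaves everything else to the default mapping; production code would more likely rely on a locale-aware library such as ICU.

```python
def turkish_upper(text: str) -> str:
    """Uppercase text using the Turkish/Azerbaijani pairing of dotted and dotless I.

    Hypothetical helper: only the i/İ and ı/I pairs are handled specially;
    every other character uses Python's default Unicode case mapping.
    """
    # i (U+0069) -> İ (U+0130), then ı (U+0131) -> I (U+0049)
    return text.replace("i", "\u0130").replace("\u0131", "I").upper()


def turkish_lower(text: str) -> str:
    """Lowercase text using the Turkish/Azerbaijani pairing of dotted and dotless I."""
    # I (U+0049) -> ı (U+0131), then İ (U+0130) -> i (U+0069)
    return text.replace("I", "\u0131").replace("\u0130", "i").lower()


if __name__ == "__main__":
    print("istanbul".upper())         # default mapping: ISTANBUL
    print(turkish_upper("istanbul"))  # Turkish rule:    İSTANBUL
    print(turkish_lower("IĞDIR"))     # Turkish rule:    ığdır
    # The default mapping turns İ into i plus a combining dot (two code points).
    print(len("\u0130".lower()))      # 2
```

The replacements run before the generic conversion because İ and ı are already in the target case, so the final upper() or lower() leaves them untouched.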

Multigraphs

For those who find single letters insufficiently complex, there are multigraphs. A digraph is a pair of letters used to represent a single sound, or occasionally a combination of sounds, that does not correspond to the written letters taken one after another. English examples include ‘ch’, ‘ng’, ‘rh’, ‘sh’, ‘ph’, and ‘th’. Dutch, with its own phonetic quirks, offers ‘ij’, ‘ee’, ‘ch’, and ‘ei’. The Dutch ‘ij’ follows an unusual capitalization rule: it is capitalized as ‘IJ’, or sometimes as the ligature ‘Ĳ’, but never as ‘Ij’. In handwriting it often takes the form of a ligature, ‘ĳ’, which closely resembles the letter ‘ÿ’.

Taking complexity a step further, a trigraph is composed of three letters working in concert, such as the German ‘sch’, the Breton ‘c’h’, or the Milanese ‘oeu’. In the orthographies of some languages, digraphs and trigraphs are not merely letter combinations but are formally recognized as independent letters of the alphabet. The rules for capitalizing them are, predictably, language-dependent: in some languages only the first letter is capitalized, while in others all component letters are capitalized together, even in words written in title case, where the letters following the multigraph stay lowercase. Consistency, it seems, is a luxury.
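The Dutch ‘ij’ rule is easy to get wrong in software, because generic capitalization functions treat ‘ij’ as two ordinary letters: Python's str.capitalize() turns ijsselmeer into Ijsselmeer, exactly the form Dutch orthography rules out. The following is a minimal sketch, using the hypothetical helper name dutch_capitalize, that treats a word-initial ‘ij’ as a single letter. It assumes every word-initial ‘ij’ is the digraph, which is a simplification rather than a full implementation of Dutch spelling rules.

```python
def dutch_capitalize(word: str) -> str:
    """Capitalize a Dutch word, treating a leading 'ij' as one letter.

    Hypothetical helper: assumes any word-initial 'ij' is the digraph
    and leaves the rest of the word unchanged.
    """
    if word[:2].lower() == "ij":
        # Both halves of the digraph are capitalized together.
        return "IJ" + word[2:]
    return word[:1].upper() + word[1:]


if __name__ == "__main__":
    print("ijsselmeer".capitalize())       # generic rule: Ijsselmeer
    print(dutch_capitalize("ijsselmeer"))  # Dutch rule:   IJsselmeer
    print(dutch_capitalize("amsterdam"))   # Amsterdam
```

Unlike str.capitalize(), the helper does not lowercase the rest of the word, so an already capitalized input such as IJmuiden passes through unchanged.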

Ligatures

A ligature is a rather elegant solution born from the practicalities of typography: two or more ordinary letters fused into a single glyph or character, often to improve readability or visual flow. Examples, each with its own history, include Æ æ (from ‘AE’, called ‘ash’), Œ œ (from ‘OE’, sometimes called ‘oethel’ or ‘eðel’), and the ubiquitous ‘&’ (the ampersand, which is, perhaps surprisingly, a graphical evolution of the Latin word et, ‘and’). A particularly striking example from German is ẞ ß, called ‘sharp S’ or ‘eszett’, which derives from the long s (ſ), an archaic form of ‘s’, followed by either ‘ʒ’ or ‘s’. Some ligatures have outgrown their typographic origins and now stand for distinct sounds or serve as letters in their own right, condensing centuries of written tradition into a single form.

Diacritics

The letter a with an acute diacritic

A diacritic, sometimes loosely called an “accent,” is a small symbol placed above or below a letter, or in some other position relative to it, such as the umlaut on the German characters ä, ö, and ü, or the marks on the Romanian characters ă, â, î, ș, and ț. Its primary function is to change the phonetic value of the letter to which it is added, but it may also modify the pronunciation of an entire syllable or word, indicate the start of a new syllable, or distinguish words that would otherwise be spelled identically. In Dutch, for example, een [ən] means “a” or “an,” while één [eːn] means “one.” As with the pronunciation of letters themselves, the effect of a diacritic depends entirely on the language in which it is used, a testament to the charming chaos of human communication.
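In Unicode, many letters with diacritics can be decomposed into a base letter followed by one or more combining marks, which makes the letter-plus-symbol structure described above visible in code. The sketch below uses Python's standard unicodedata module; the helper names decompose and strip_diacritics are hypothetical. Stripping the marks also shows why they matter: the Dutch één collapses into een. Letters such as ø or ł have no canonical decomposition, so this approach leaves them unchanged.

```python
import unicodedata


def decompose(text: str) -> list[str]:
    """Return the Unicode names of the code points in text after NFD decomposition."""
    return [unicodedata.name(ch) for ch in unicodedata.normalize("NFD", text)]


def strip_diacritics(text: str) -> str:
    """Drop combining marks, keeping only the base letters (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFD", text)
    # combining() returns the canonical combining class, which is non-zero
    # for the Latin diacritics discussed here.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    print(decompose("\u00e9"))  # é: ['LATIN SMALL LETTER E', 'COMBINING ACUTE ACCENT']
    print(decompose("\u0219"))  # ș: ['LATIN SMALL LETTER S', 'COMBINING COMMA BELOW']
    print(strip_diacritics("\u00e9\u00e9n"))  # een: the contrast with één is lost
```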

English is a rather notable outlier among the major modern European languages in that its native vocabulary requires no diacritics. [note 1] Historically, however, particularly in formal writing, a diaeresis was sometimes used to show that a second vowel begins a new syllable rather than forming a single vowel sound with the first, as in “coöperative” or “reëlect”. Most contemporary styles have dropped such marks, either omitting them or using a hyphen to mark the syllable break (“co-operative,” “re-elect”). [note 2] [30] Practical simplicity, it seems, usually wins out over orthographic precision, leaving a few abandoned diacritics behind as linguistic relics.

Collation

Collation is the often-overlooked art of establishing an order for characters, the system that dictates how words are sorted in dictionaries, phone books, and databases. Some modified letters, particularly those with diacritics, such as the Swedish å, ä, and ö, are treated as letters in their own right. They are therefore given their own place in the alphabet for collation purposes, separate from the base letter from which they were derived; in the Swedish alphabet they follow ‘z’, in the order å, ä, ö.

Other languages take a different approach. In German, the characters ä, ö, and ü, although visually distinct and phonetically significant, are not treated as separate letters for sorting; they are collated together with their unmodified base letters (a, o, and u, respectively). Digraphs and trigraphs vary in the same way: depending on the language, they are sorted either as letters in their own right or as ordinary sequences of their component letters. Different diacritics within a single language may also be treated differently. In Spanish, for example, ñ is a distinct letter of the alphabet and is sorted between ‘n’ and ‘o’, whereas the accented vowels (á, é, í, ó, ú, and ü) are collated together with their unaccented counterparts (a, e, i, o, and u). It is a system that, much like human nature, is full of subtle distinctions and occasional inconsistencies.
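A plain code-point sort gets all three of these conventions wrong: it happens to place the Swedish letters after ‘z’ but in the wrong order (ä before å), puts German ä after ‘z’ instead of with ‘a’, and sends Spanish ñ and á to the end as well. The sketch below, built around a hypothetical helper make_sort_key, derives a sort key from an explicit letter order plus a table of letters that are folded onto a base letter. It assumes case-insensitive ordering with a simple code-point tiebreak; real-world collation (the Unicode Collation Algorithm with CLDR locale data, available through libraries such as ICU) handles far more, including multi-level ties and digraphs.

```python
def make_sort_key(alphabet: str, folds: dict[str, str]):
    """Build a sort key from an explicit letter order (hypothetical helper).

    alphabet: every letter in collation order, lowercase.
    folds: letters that sort as if they were another letter (e.g. 'ä' -> 'a').
    """
    rank = {letter: i for i, letter in enumerate(alphabet)}

    def key(word: str) -> tuple[list[int], str]:
        primary = []
        for ch in word.lower():
            base = folds.get(ch, ch)
            # Characters outside the alphabet sort after it, by code point.
            primary.append(rank.get(base, len(alphabet) + ord(base)))
        # Words that tie on base letters fall back to a code-point comparison.
        return primary, word

    return key


swedish = make_sort_key("abcdefghijklmnopqrstuvwxyzåäö", {})
german = make_sort_key("abcdefghijklmnopqrstuvwxyz", {"ä": "a", "ö": "o", "ü": "u"})
spanish = make_sort_key(
    "abcdefghijklmnñopqrstuvwxyz",
    {"á": "a", "é": "e", "í": "i", "ó": "o", "ú": "u", "ü": "u"},
)

if __name__ == "__main__":
    print(sorted(["öl", "ål", "är", "zon"], key=swedish))  # ['zon', 'ål', 'är', 'öl']
    print(sorted(["Zucker", "Äpfel", "Apfel", "Bär"], key=german))  # ['Apfel', 'Äpfel', 'Bär', 'Zucker']
    print(sorted(["ñu", "nube", "oso", "árbol", "zapato"], key=spanish))  # ['árbol', 'nube', 'ñu', 'oso', 'zapato']
```

Keeping the alphabet and the fold table as plain data means each language differs only in configuration, which mirrors how the text describes the same letters being ranked differently from one language to the next.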

Capitalization

The languages that have, for better or worse, adopted the Latin script today generally follow a set of rather predictable, if sometimes arbitrary, capitalization rules. These typically dictate that paragraphs, sentences, and proper nouns begin with capital letters. The specific rules have, much like fashion trends, changed considerably over time, and different languages have developed their own conventions.

Old English, for instance, was rarely written with even proper nouns capitalized, a practice that seems surprisingly sparse to modern sensibilities. By contrast, English writers and printers of the 17th and 18th centuries frequently capitalized most, and sometimes all, nouns. This mirrors the practice of Modern German, where capitalizing every noun remains a grammatical rule, as in Alle Schwestern der alten Stadt hatten die Vögel gesehen, literally ‘All of the Sisters of the old City had seen the Birds’. It is a convention that, while consistent within its own system, highlights the diverse and often idiosyncratic paths linguistic conventions can take.

Romanization

Words from languages natively written in other scripts, such as the Arabic script or Chinese characters, are usually either transliterated (converted character by character) or transcribed (converted sound by sound) when they appear in Latin-script text or in multilingual international communication. This process is, rather uncreatively, termed romanization.

Although romanization is mostly used at unofficial levels, it became especially prominent in early computer messaging, when older systems reliably supported only the limited seven-bit ASCII code and non-Latin characters could not be represented directly. With the adoption of Unicode, which provides a vastly larger character repertoire, romanization has become less necessary. Even so, keyboards used to enter such text may still restrict users to romanized input when only ASCII or Latin-alphabet characters are available, a reminder that technological progress, much like human understanding, is rarely a clean, instantaneous leap.
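Transliteration in the character-by-character sense can be sketched as a lookup table. The example below uses Russian Cyrillic purely for illustration, because Chinese characters, for instance, must be romanized by pronunciation rather than with a simple per-character table. The table is a deliberately tiny, hypothetical subset, and real schemes such as ISO 9 or BGN/PCGN differ in detail. It also illustrates the ASCII constraint described above: the romanized output fits in seven-bit ASCII, while the original does not.

```python
# Hypothetical, deliberately tiny subset of a Russian Cyrillic-to-Latin table.
# Standards such as ISO 9 or BGN/PCGN differ in detail; illustration only.
LOWER = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e",
    "ж": "zh", "з": "z", "и": "i", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t",
    "у": "u", "ф": "f", "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh",
    "щ": "shch",
}

# Capital letters map to capitalized output (Ж -> "Zh"), suitable for title case.
TABLE = str.maketrans(
    {**LOWER, **{cyr.upper(): lat.capitalize() for cyr, lat in LOWER.items()}}
)


def romanize(text: str) -> str:
    """Transliterate character by character; unmapped characters pass through."""
    return text.translate(TABLE)


if __name__ == "__main__":
    for word in ["Москва", "Борщ"]:
        latin = romanize(word)
        # Only the romanized form fits in seven-bit ASCII.
        print(word, word.isascii(), "->", latin, latin.isascii())
```

Because a single Cyrillic letter may need several Latin letters (щ becomes “shch”), the table maps characters to strings rather than to single characters, which str.maketrans accepts when given a dictionary.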

Notes

^ In formal English writing, however, diacritics are often preserved on many loanwords, a polite nod to their origins. Examples include “café,” “naïve,” “façade,” “jalapeño,” and the German prefix “über-”.
^ For example, the following article uses a diaeresis in “coöperate,” a cedilla in “façade,” and a circumflex in “crêpe”: Grafton, Anthony (23 October 2006). “Books: The Nutty Professors, The history of academic charisma”. The New Yorker. [30]
