Right. So you want to dissect the mechanics of language, but through a lens that's, what, less... human? Fine. Don't expect me to hold your hand. This is Computational linguistics, after all. It’s about the how, not the why anyone bothers.
In plain terms: the use of computational tools for the study of linguistics.
This isn't some fluffy academic pursuit. This is about the cold, hard logic of how we string words together, and how we can make machines understand it. It’s a scientific field, naturally. And for those who get lost in the jargon, there’s a journal. Computational Linguistics (journal). Don't say I didn't warn you.
This whole endeavor is part of a larger tapestry, the grand, often tedious, subject of Linguistics. If you’re feeling ambitious, there’s an Outline, a dive into its History, and an Index for the truly dedicated.
General linguistics
This is where the foundational pieces of language are chipped away at. We’re talking about the changes over time in Diachronic studies, the meticulous craft of Lexicography (dictionaries, how quaint), the building blocks of words in Morphology, the sound systems in Phonology, the unspoken rules of interaction in Pragmatics, the very meaning of it all in Semantics, and the skeletal structure of sentences in Syntax. Then there's the messy intersection of syntax and meaning, the Syntax–semantics interface, and the vast diversity of languages in Typology. It's a lot.
Applied linguistics
This is where the theory gets its hands dirty. Think about the Acquisition of language, the cultural nuances in Anthropological linguistics, the practical applications in Applied linguistics, and the pure numbers in Mathematical linguistics. Then, of course, there's the reason we're here: Computational linguistics. We dissect Conversation analysis, build vast Corpus linguistics archives, break down Discourse analysis, and even dabble in the unsettling idea of Determinism in language. We measure the Distance between languages, painstakingly document them through Documentation, explore the Ethnography of communication and the intricate dance of Ethnomethodology. There's Forensic linguistics, tracing the History of linguistics itself, the curious world of Interlinguistics, the brain's language center in Neurolinguistics, the ancient art of Philology, the philosophical underpinnings in Philosophy of language, the physical sounds in Phonetics, the mind's language processing in Psycholinguistics, the social fabric of language in Sociolinguistics, the structure of Text itself, and the complex art of Translating and interpreting. And, of course, the very symbols we use in Writing systems.
Theoretical frameworks
These are the lenses through which we view language. There are the strict, rule-bound Formalist approaches, focusing on Constituency and Dependency. Then there's Distributionalism, the powerful engine of Generative grammar, the intricate system of Glossematics, and the diverse world of Functional approaches. This includes the mind-mapping of Cognitive grammar, the building blocks of Construction grammar, the dynamic flow of Functional discourse grammar, the evolution of grammar through Grammaticalization, the interactive nature of Interactional linguistics, the structured thought of the Prague circle, the layered meaning of Systemic functional linguistics, the evidence-based Usage-based models of language, and the foundational principles of Structuralism.
Topics
These are the specific puzzles we try to solve. The debate over the Autonomy of syntax, the principle of Compositionality, the tension between Conservative and innovative language, the divide between Descriptivism and Prescriptivism, the search for word origins in Etymology, the concept of Iconicity in language, the digital realm of Internet linguistics, the nuances of LGBTQ linguistics, the elusive Origin of language, the obscure field of Orismology, the rules of Orthography, the philosophical questions in Philosophy of linguistics, the process of Second-language acquisition, the fundamental Theory of language, and the precise use of Terminology.
Computational linguistics
This is where the gears grind. It's an interdisciplinary field, which means a lot of people trying to talk to each other. We're concerned with the computational modelling of natural language – making machines understand what we say. It's a messy business, pulling from linguistics, of course, but also computer science, the wild west of artificial intelligence, the sterile precision of mathematics, the rigid structure of logic, the abstract musings of philosophy, the intricate workings of the mind in cognitive science and cognitive psychology, the language processing in the brain via psycholinguistics, the cultural context from anthropology, and the very wiring of our being in neuroscience. It’s closely related to mathematical linguistics. Just another field trying to make sense of the chaos.
Origins
This whole mess started to overlap with artificial intelligence back in the 1950s. The Americans, bless their ambitious hearts, wanted to use computers to translate texts from foreign languages – specifically Russian scientific journals – into English. They figured that if computers could crunch numbers faster and more accurately than humans, they could surely master lexicon, morphology, syntax, and semantics with a few explicit rules. Turns out, that was a bit optimistic. After the great failure of rule-based approaches, a certain David Hays coined the term "computational linguistics" to distance the field from AI. He even co-founded the Association for Computational Linguistics (ACL) and the International Committee on Computational Linguistics (ICCL). What began as a clumsy translation effort has since morphed into the sprawling beast known as natural language processing.
Annotated corpora
To really dissect the English language, you need data. Lots of it. And not just raw text. You need it annotated. The Penn Treebank was one such effort, a monumental undertaking: IBM computer manuals, transcribed telephone conversations, and the like – over 4.5 million words of American English, meticulously tagged with part-of-speech information and syntactic structure. A digital dissection table, if you will.
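If you want to poke at it yourself, a small sample of the Penn Treebank ships with the NLTK Python library. A minimal sketch, assuming NLTK is installed and you can live with the roughly ten-percent slice it distributes:

```python
# Peek at the annotated Penn Treebank sample bundled with NLTK.
# (The full corpus is licensed separately; this is only a fragment.)
import nltk

nltk.download("treebank", quiet=True)
from nltk.corpus import treebank

# Part-of-speech annotation: each token paired with its tag.
# Prints something like [('Pierre', 'NNP'), ('Vinken', 'NNP'), (',', ','), ...]
print(treebank.tagged_sents()[0][:8])

# Syntactic annotation: the same sentence as a bracketed parse tree.
print(treebank.parsed_sents()[0])
```

The tags and trees are exactly the hand-labeled structure described above; the machines merely read what the annotators toiled over.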
Even Japanese sentence corpora have yielded their secrets, revealing a pattern of log-normality in sentence lengths. Fascinating.
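If you wonder what checking such a claim even looks like, here is a rough sketch using SciPy. The sentence lengths below are invented, so treat this as an illustration of the method, not a reproduction of the finding about the Japanese corpora:

```python
# Fit a log-normal distribution to sentence lengths and test the fit.
# The lengths here are made up for illustration only.
import numpy as np
from scipy import stats

sentence_lengths = np.array([5, 8, 12, 7, 22, 14, 9, 31, 6, 11, 18, 4, 13, 27, 10])

# Fit a log-normal, pinning the location parameter at zero.
shape, loc, scale = stats.lognorm.fit(sentence_lengths, floc=0)

# Kolmogorov-Smirnov test against the fitted distribution:
# a large p-value means log-normality cannot be rejected.
stat, p_value = stats.kstest(sentence_lengths, "lognorm", args=(shape, loc, scale))
print(f"KS statistic = {stat:.3f}, p = {p_value:.3f}")
```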
Modeling language acquisition
Children learn language. It's a miracle, or at least a highly complex process. At the heart of it is the problem of language acquisition. Children are mostly exposed to positive evidence – examples of what is said, never of what is ungrammatical. That was a major hurdle for the models of the late 1980s, which had none of today's deep learning machinery to fall back on.
But it turns out, with a steady diet of simple input and gradually improving memory and attention spans, languages can be learned. This explains, in part, the protracted period of language acquisition in human infants. Some researchers even used robots to test these theories. These artificial learners, mimicking children, built word-to-meaning mappings from action and perception, often without needing explicit grammatical structures. It’s like teaching a child by showing, not by lecturing.
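A toy version of that word-to-meaning learning is easy enough to sketch: pair each utterance with whatever the learner happens to perceive, count co-occurrences, and let each word settle on the referent it keeps turning up with. The scenes below are invented for illustration, not lifted from any of those robot studies:

```python
# Cross-situational word learning in miniature: no grammar, just counting
# which perceived referent each heard word co-occurs with most often.
from collections import defaultdict

episodes = [  # (words heard, referents perceived), invented examples
    (["look", "a", "ball"], {"BALL"}),
    (["the", "red", "ball"], {"BALL"}),
    (["see", "the", "cup"], {"CUP"}),
    (["a", "cup", "here"], {"CUP", "TABLE"}),
]

cooccurrence = defaultdict(lambda: defaultdict(int))
for words, referents in episodes:
    for word in words:
        for referent in referents:
            cooccurrence[word][referent] += 1

# Each word's best guess is the referent it has co-occurred with most often.
lexicon = {word: max(counts, key=counts.get) for word, counts in cooccurrence.items()}
print(lexicon["ball"], lexicon["cup"])  # BALL CUP
```

Crude, yes, but it makes the point: the mapping emerges from repeated exposure, showing rather than lecturing.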
And then there’s the Price equation and Pólya urn dynamics. Researchers have used these to predict linguistic evolution and peer back into the murky past of modern languages. It’s a bit like digital archaeology.
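To make the urn half of that concrete, here is a minimal Pólya-urn simulation of two competing variants, say an older and a newer form of a word. It is an invented, stripped-down stand-in for the far more careful models in that research:

```python
# Polya urn dynamics: sample a variant in proportion to its current count,
# then reinforce it. Different runs drift toward different dominant variants.
import random

def polya_urn(steps=10_000, old=1, new=1, reinforcement=1):
    counts = {"old": old, "new": new}
    for _ in range(steps):
        total = counts["old"] + counts["new"]
        pick = "old" if random.random() < counts["old"] / total else "new"
        counts[pick] += reinforcement
    return counts

final = polya_urn()
share_new = final["new"] / (final["old"] + final["new"])
print(f"share of the newer variant after the run: {share_new:.2f}")
```

Run it a few times and the final share settles somewhere different each time, which is rather the charm of urn models: history matters.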
Chomsky's theories
Of course, you can't talk about linguistics without mentioning Noam Chomsky. His theories have cast a long shadow, even over computational linguistics, particularly concerning how infants grasp grammar as formalized in ideals like Chomsky normal form. The challenge remains: how does an infant learn a grammar that deviates from that theoretical ideal? Researchers are still grappling with this, combining structural analysis with computational models and those massive linguistic corpora like the Penn Treebank to find patterns. It's a constant effort to map the intricate pathways of language acquisition.
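For the uninitiated, Chomsky normal form restricts a context-free grammar to rules of the shape A → B C or A → a, which is exactly what lets chart parsers like CKY run in cubic time. Below is a minimal recognizer over an invented toy grammar, just to make the formalism tangible; none of it comes from the acquisition research itself:

```python
# CKY recognition over a tiny grammar in Chomsky normal form (CNF):
# every rule is either A -> B C (two nonterminals) or A -> terminal.
from itertools import product

binary_rules = {           # A -> B C
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
}
lexical_rules = {          # A -> terminal
    "the": {"Det"}, "dog": {"N"}, "cat": {"N"}, "chased": {"V"},
}

def cky_recognize(words, start="S"):
    n = len(words)
    # chart[i][j] holds the nonterminals that can span words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        chart[i][i + 1] = set(lexical_rules.get(word, set()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for b, c in product(chart[i][k], chart[k][j]):
                    chart[i][j] |= binary_rules.get((b, c), set())
    return start in chart[0][n]

print(cky_recognize("the dog chased the cat".split()))  # True
```

Real child-directed speech, needless to say, does not arrive in anything so tidy, which is rather the point of the acquisition debate.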
Software
For those who want to get their hands dirty, there are tools. You might find some of these useful, or perhaps just more distractions.
See also: List of linguistic research software
See also
This is where they throw in anything vaguely related. A digital rabbit hole.
- Artificial intelligence in fiction
- Collostructional analysis
- Computational lexicology
- Computational Linguistics (journal)
- Computational models of language acquisition
- Computational semantics
- Computational semiotics
- Computer-assisted reviewing
- Dialog systems
- Glottochronology
- Grammar induction
- Human Speechome Project
- Internet linguistics
- Lexicostatistics
- Natural language processing
- Natural language user interface
- Quantitative linguistics
- Semantic relatedness
- Semantometrics
- Systemic functional linguistics
- Translation memory
- Universal Networking Language