Vanishing Languages: Hidden Knowledge at Risk of Extinction

Across the world, dying languages carry irreplaceable ecological knowledge, mathematical systems, and medical insights that disappear forever when the last speaker falls silent, and linguists are racing against time to document them.

The Library That Burns Every Time a Language Dies

Somewhere in the hills of Arunachal Pradesh, in northeastern India, a language called Koro is spoken by fewer than 800 people. It differs sharply from the neighboring tongues, and it encodes a system of spatial reasoning so distinct from Indo-European languages that its speakers orient themselves not by left and right, but by cardinal directions at all times, even indoors. When linguists first documented Koro in 2008, they were not finding a curiosity. They were finding a different cognitive architecture, one that had evolved over thousands of years and could vanish within a generation or two. By one widely cited estimate, the world loses a language roughly every two weeks, and some projections hold that as many as half of the estimated 7,000 languages alive today could be extinct by the end of this century.

What is rarely discussed in mainstream coverage is what exactly disappears alongside the grammar and the vocabulary, and why scientists across fields from ethnobotany to mathematics are increasingly alarmed. The common framing treats language death as a cultural tragedy, something to be mourned like the demolition of an old building or the disappearance of a regional cuisine. That framing, however well-intentioned, fundamentally undersells the stakes. Languages are not decorative overlays on universal human experience. They are distinct operating systems for perceiving, categorizing, and reasoning about the world, and when they go silent, they take with them entire bodies of empirical knowledge, alternative mathematical frameworks, and irreplaceable evidence about the range of human cognition. The loss is not sentimental. It is scientific.

Pharmacopeias Encoded in Grammar

Indigenous and minority languages frequently contain millennia of accumulated empirical observation about local ecosystems, and this knowledge is often untranslatable in any meaningful sense, not because the words lack equivalents but because the conceptual categories themselves do not exist in dominant languages. The Tzeltal Maya of southern Mexico use a pharmacological vocabulary that distinguishes dozens of plant states, including ripeness, moisture content, time of harvest, and lunar phase at picking, that correspond to measurable differences in alkaloid concentration. Western ethnobotanists working with Tzeltal speakers in the 1990s discovered that certain preparation methods described only in the language’s grammatical aspect system, not in any lexical term, were the key variable in whether a plant compound was medicinally active or inert. The language, in effect, stored a protocol. The instructions were not written in a recipe. They were written into the verb system.

This phenomenon, in which pharmacological or ecological knowledge is embedded in grammatical structure rather than in vocabulary, is far more widespread than most people realize. It means that word-for-word translation, or even careful dictionary-building, is insufficient to capture what a language knows. You can record every noun a Tzeltal elder uses to describe a medicinal plant and still lose the critical information if you fail to document the aspectual markers that specify when and how it must be processed. For researchers working against the clock with aging speaker communities, this creates an almost impossible challenge.

Similarly, the Seri language of Sonora, Mexico, spoken by fewer than 700 people, contains a taxonomic system for marine invertebrates that distinguishes species that European biology had not formally separated until the 20th century. Seri fishermen had maintained behavioral and ecological distinctions between organisms for generations, encoded in naming conventions and verb forms that implied different habitats, feeding patterns, and seasonal movements. When biologists finally cross-referenced this linguistic taxonomy with genetic analysis, the correspondence was striking. The language had preserved scientifically accurate distinctions that Western zoology had missed for centuries. The Seri were not operating on myth or tradition in any vague sense. They were operating on accumulated observational data, refined across generations, stored in a linguistic system that Western science was only beginning to take seriously, just as it was most at risk of disappearing.

Mathematical Systems Hidden in Number Words

The relationship between language and mathematical cognition is one of the more contested and fascinating areas of cognitive science, and the dying of languages is forcing a reckoning with assumptions that have gone largely unexamined. The Pirahã language of the Brazilian Amazon, spoken by a few hundred people along the Maici River, famously lacks number words beyond rough approximations of one, two, and many. For decades, this was treated as evidence of cognitive limitation, a kind of numerical blindness that seemed to confirm hierarchical assumptions about linguistic and intellectual development. More recent analysis, however, suggests the Pirahã system reflects a deliberate philosophical commitment to immediate experiential knowledge, a kind of radical empiricism built into the grammar itself, rather than any deficit. The language has no recursion in the standard linguistic sense, no tense, and no color terms beyond light and dark. It is not impoverished. It is organized around entirely different principles, ones that prioritize the directly witnessed over the abstract, the present over the historical.

The debate around Pirahã has been unusually contentious, partly because the implications are so significant. If a language can systematically encode a philosophy of knowledge rather than merely a vocabulary, then the loss of that language represents the loss of a coherent intellectual tradition, not just a communication system. The question of what Pirahã speakers can and cannot do mathematically without number words has generated dozens of studies, and the results remain genuinely ambiguous, a fact in itself revealing. The ambiguity suggests that the relationship between linguistic structure and cognitive capacity is more complex and more variable than the standard model allows.

More practically significant are the base systems embedded in dying languages. While most of the world counts in base 10, written today with Hindu-Arabic numerals, dozens of endangered languages use base-20, base-12, base-5, or even base-6 systems. The Ndom language of Indonesian Papua, on the island of New Guinea, uses a base-6 system. Mathematicians have noted that base-12 and base-6 systems have significant computational advantages for certain types of fractional arithmetic, which is why advocates of dozenal counting have long argued for base-12 over base-10. A base-12 system divides evenly by 2, 3, 4, and 6, making it considerably more flexible for mental calculation involving halves, thirds, and quarters than base-10, which divides cleanly only by 2 and 5. Base-6 divides evenly by 2 and 3, so halves and thirds come out cleanly there as well. These alternative mathematical frameworks, preserved in languages on the verge of extinction, represent independently developed solutions to problems of enumeration and calculation that the dominant mathematical tradition solved differently, and in some cases less elegantly.
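The divisibility argument can be checked directly. The following is a minimal Python sketch, not drawn from any of the languages discussed: the function name expand and the digit symbols A and B for ten and eleven are illustrative assumptions. It writes out a fraction digit by digit in a chosen base and reports whether the expansion ever ends.

```python
from fractions import Fraction

# Illustrative digit symbols; A and B stand for ten and eleven in base 12.
DIGITS = "0123456789AB"

def expand(frac, base, max_digits=12):
    """Write a fraction between 0 and 1 in the given base.

    Returns the digits as text, plus True if the expansion ended
    exactly within max_digits places and False if it was cut off.
    """
    digits = []
    for _ in range(max_digits):
        if frac == 0:
            break
        frac *= base            # shift one place to the left
        whole = int(frac)       # the part left of the point is the next digit
        digits.append(DIGITS[whole])
        frac -= whole           # keep only the remainder
    return "0." + "".join(digits), frac == 0

if __name__ == "__main__":
    for base in (6, 10, 12):
        for denom in (2, 3, 4, 5, 6):
            text, exact = expand(Fraction(1, denom), base)
            status = "ends" if exact else "repeats"
            print(f"1/{denom} in base {base}: {text} ({status})")
```

Under these assumptions, one third is simply 0.4 in base 12 and 0.2 in base 6, while in base 10 it runs on as 0.333 without ending. The trade-off runs the other way for fifths, which end in base 10 but repeat in base 12 and base 6.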

The New Science of Language Rescue

The field of documentary linguistics has undergone a technological revolution over the past decade, changing what is possible in language preservation. Where earlier generations of linguists relied on handwritten field notes and reel-to-reel recordings, contemporary teams use high-definition video, 3D motion capture for sign languages and gesture systems, and machine learning models trained to assist in rapid transcription. The Endangered Languages Project, a collaborative initiative supported by Google and over 100 academic partners, has digitized materials for more than 3,500 languages since 2012. The Endangered Archives Programme at the British Library has funded over 300 projects in 80 countries, producing archives of oral literature, ecological knowledge, ceremonial language, and everyday speech that would otherwise be lost within a generation.

These are genuine achievements, and the pace of documentation has accelerated meaningfully in the last decade. But digitization alone does not preserve a living knowledge system. Recordings of elders describing plant medicine or navigation techniques are only as useful as the interpretive framework that accompanies them, and that framework often requires fluent speakers to decode. A recording of a Tzeltal healer explaining a preparation method is not the same as having a fluent Tzeltal speaker who can recognize when a student has correctly understood the aspectual nuance. Archives preserve evidence. They do not preserve the cognitive tradition that gives the evidence meaning.

This is why a growing number of linguists argue that the goal cannot simply be documentation but must be revitalization, creating conditions under which communities choose to transmit their languages to children. The most frequently cited modern example is Welsh. Census counts put the number of speakers at roughly 500,000 in 1991 and about 538,000 in 2021, while broader survey-based estimates for 2021 run above 880,000, a resilience credited largely to Welsh-medium and compulsory Welsh-language education and a sustained public broadcasting infrastructure. The Maori language of New Zealand followed a similar path of school-based revival after the establishment of Kura Kaupapa Maori immersion schools in 1985. Hawaiian, once reduced to fewer than 2,000 native speakers in the 1980s, now has an estimated 18,000 speakers following decades of immersion schooling. These cases demonstrate that revitalization is possible, but they also share features that are not always replicable: a critical mass of remaining speakers, sustained political will, and significant institutional investment over multiple decades. For languages with fewer than a hundred speakers and no state apparatus behind them, the window for revitalization is narrow and closing.

What Cognitive Science Loses in the Silence

Beyond ecology and mathematics, dying languages represent irreplaceable data for understanding the range of human cognition. The Guugu Yimithirr language of Queensland, Australia, the source of the word kangaroo, borrowed by Captain Cook’s crew in 1770, uses absolute rather than relative spatial reference, as do several other Australian Aboriginal languages. Speakers of these languages maintain a continuous, automatic sense of cardinal direction so precise that researchers testing them found they could point accurately toward distant landmarks in unfamiliar rooms with no windows. This is not a cultural habit layered on top of ordinary cognition. Neuroimaging and behavioral studies suggest it reflects a fundamentally different allocation of spatial processing resources in the brain, one that develops in childhood through linguistic immersion and reshapes the underlying neural architecture over time.

The Kuuk Thaayorre, another Queensland language group, arrange time spatially from east to west rather than from left to right or front to back as English speakers do. When asked to arrange photographs of events in chronological order, Kuuk Thaayorre speakers consistently laid the sequence out from east to west, with earlier events to the east, regardless of which direction they themselves were facing. The finding directly challenges the assumption that temporal cognition is universal and language-independent. These are not anecdotes. They are data points in an ongoing scientific argument about the degree to which language shapes thought, an argument that cannot be resolved if the languages themselves disappear before they can be studied.

What makes this especially urgent is that the scientific study of linguistic diversity is still in its early stages. Many of the most significant findings about how language structures cognition have emerged only in the last thirty years, as researchers moved beyond the small sample of well-documented European and East Asian languages that dominated earlier cognitive science. The field is only beginning to understand the full range of variation, and that range is shrinking faster than it can be mapped. Each extinction is not merely a cultural loss. It is the permanent closure of a scientific experiment that took thousands of years to run, conducted by communities who had no idea they were running it, and observed by a scientific community that arrived, in most cases, far too late.
