An Introduction to Historical Linguistics
Definition and Examples
Historical linguistics, traditionally known as philology, is the branch of linguistics concerned with the development of languages over time (where linguistics usually looks at one language at a time, philology looks at them all).
The primary tool of historical linguistics is the comparative method, a way of identifying relations among languages that lack written records. For this reason, historical linguistics is sometimes called comparative-historical linguistics. This field of study has been around for centuries.
Linguists Silvia Luraghi and Vit Bubenik point out, "[The] official act of birth of comparative historical linguistics is conventionally indicated in Sir William Jones' The Sanscrit Language, delivered as a lecture at the Asiatic Society in 1786, in which the author remarked that the similarities between Greek, Latin, and Sanskrit hinted to a common origin, adding that such languages might also be related to Persian, Gothic and the Celtic languages," (Luraghi and Bubenik 2010).
Why Study Linguistic History?
The task of comparing insufficiently recorded languages to each other is not an easy one, but it is a worthwhile endeavor for those interested in learning about a group of people. "Linguistic history is basically the darkest of the dark arts, the only means to conjure up the ghosts of vanished centuries. With linguistic history, we reach farthest back into the mystery: humankind," (Campbell 2013).
Philology, to be useful, must take into account everything contributing to language changes. Without proper context and without studying the ways in which language is transmitted from one generation to the next, linguistic shifts could be grossly over-simplified. "[A] language is not some gradually and imperceptibly changing object which smoothly floats through time and space, as historical linguistics based on philological material all too easily suggests. Rather, the transmission of language is discontinuous, and a language is recreated by each child on the basis of the speech data it hears," (Kiparsky 1982).
Dealing With Historical Gaps
Of course, with any field of history comes a fair amount of uncertainty. And with that, a degree of educated guesswork. "[O]ne fundamental issue in historical linguistics concerns how best to deal with the inevitable gaps and discontinuities that exist in our knowledge of attested language varieties over time. ... One (partial) response is that—to put matters bluntly—in order to deal with gaps, we speculate about the unknown (i.e. about intermediate stages) based on the known. While we typically use loftier language to characterize this activity ... the point remains the same.
In this respect, one of the relatively established aspects of language that can be exploited for historical study is our knowledge of the present, where we normally have access to far more data than could ever possibly become available for any previously attested stage (at least before the age of audio and video recording), no matter how voluminous an earlier corpus may be," (Joseph and Janda 2003).
The Nature and Causes of Language Change
You might be wondering why language changes. According to William O'Grady et al., historical language change is distinctly human. As society and knowledge shift and grow, so, too, does communication. "Historical linguistics studies the nature and causes of language change. The causes of language change find their roots in the physiological and cognitive makeup of human beings. Sound changes usually involve articulatory simplification as in the most common type, assimilation. Analogy and reanalysis are particularly important factors in morphological change. Language contact resulting in borrowing is another important source of language change.
"All components of the grammar, from phonology to semantics, are subject to change over time. A change can simultaneously affect all instances of a particular sound or form, or it can spread through the language word by word by means of lexical diffusion. Sociological factors can play an important role in determining whether or not a linguistic innovation is ultimately adopted by the linguistic community at large. Since language change is systemic, it is possible, by identifying the changes that a particular language or dialect has undergone, to reconstruct linguistic history and thereby posit the earlier forms from which later forms have evolved," (O'Grady et al. 2009).
- Campbell, Lyle. Historical Linguistics: An Introduction. 3rd ed., Edinburgh University Press, 2013.
- Joseph, Brian D., and Richard D. Janda. "On Language, Change, and Language Change." The Handbook of Historical Linguistics. 1st ed., Wiley-Blackwell, 2003.
- Kiparsky, Paul. Explanation in Phonology. Foris Publications, 1982.
- Luraghi, Silvia, and Vit Bubenik. The Bloomsbury Companion to Historical Linguistics. Bloomsbury Publishing, 2010.
- O'Grady, William, et al. Contemporary Linguistics: An Introduction. 6th ed., Bedford/St. Martin's, 2009.
- Historical Linguistics
Historical linguistics is the scientific study of how languages change over time, which seeks to understand the relationships among languages and to reconstruct earlier stages of languages. At UGA, our primary focus is on historical Indo-European linguistics – the history and development of the Indo-European family of languages, which includes English.
Sociolinguistics, Heritage language communities, Historical linguistics, Germanic languages
Old English, Old Norse, the works of Tolkien
Historical Semitic linguistics and philology
Historical Semitic linguistics and philology. Diffusion models.
Literary representation of oral and stylistic registers (key, tone, modals).
Historical linguistics, Indo-European syntax and discourse structure
Language variation, Corpus linguistics, American English
Slavic prosody and the phonology/morphology interface; historical Slavic linguistics and accentology; and sociolinguistics, with a focus on questions of language and identity and language contact in the former Yugoslavia.
Language acquisition, Historical linguistics, Hispanic linguistics
Language variation and change, Romance languages
Middle High German, Historical Germanic linguistics
Historical Indo-European linguistics, Prepositional semantics, Slavic linguistics and folklore, Acquisition of Russian by foreign and heritage language learners
Historical Indo-European linguistics, Syntax
UGA Linguistics faculty members Dr. Margaret Renwick and Dr. Jon Forrest were featured in this week's Columns article , discussing the rapid decline of the Georgia accent among…
- Computational Linguistics
- Corpus Methods
- Language Acquisition
- Language Documentation
- Phonetics and Phonology
- Pragmatics and Discourse Analysis
- Psycholinguistics and Neurolinguistics
- Sociolinguistics and Language Variation
- Syntax and Morphology
Historical linguistics, the study of how languages change over time, subsumes both the general study of language change and the history of specific languages and language families. The intellectual spectrum thus defined bridges part of the gap between linguistic theory and the areas traditionally known as “philology.” At Harvard, the more theoretical aspects of historical linguistics are covered in courses offered by the Department of Linguistics, while courses dealing with the historical linguistics of specific languages are offered both by the Department of Linguistics and the relevant language departments.
Linguistic theory, the core of the modern field of linguistics, seeks to characterize the linguistic knowledge that normal human beings acquire in the course of mastering their native language between the ages of one and five. Studied as an internalized formal system, language is a source of insight into a wide range of human pursuits and abilities, some of them traditionally approached through the humanities, others through the social sciences, and others through the behavioral and natural sciences. The major divisions of linguistic theory are syntax, the study of sentence structure; phonology, the study of sounds and sound systems; morphology, the study of word structure; and semantics, the study of meaning. Courses in these areas regularly draw students from other Harvard departments, especially Psychology, Philosophy, and other departments associated with the Mind, Brain, Behavior Initiative. The secondary field in Linguistic Theory allows such students to receive official recognition for their linguistics coursework.
The contact person for Ph.D. students wishing to pursue a secondary field in Linguistic Theory or Historical Linguistics is the Director of Graduate Studies in Linguistics.
Campbell History of linguistics
by Lyle Campbell
2000 The history of linguistics. Handbook of Linguistics, ed. by Mark Aronoff and Janie Rees-Miller, 81-104. Oxford: Blackwells.
2007, Linguistics. An Interdisciplinary Journal of the Language Sciences
Review article of Brian D. Joseph / Richard D. Janda (eds.), The handbook of historical linguistics. Blackwell Handbooks in Linguistics: 2003. Linguistics 45/2007: 349-372.
Garabet K Moumdjian, Ph.D.
The Oxford Handbook of the History of Linguistics offers comprehensive coverage of the history of linguistics in a single volume and will serve as an introduction to the understanding of countless topics within the history of linguistics. This project began immediately after I had completed The Western Classical Tradition in Linguistics (Allan 2010a), which contains pretty much all I wanted to write on that subject; but even on topics within the history of linguistics that I covered in that book, there are other perspectives to be presented and, on many matters, much greater expertise than mine to be tapped. In addition, there are the non-western traditions to consider. So the present volume was conceived as a book that would make a significant contribution to the historiography of linguistics on a very wide range of topics. Thirty-four chapters, many covering a variety of issues, were commissioned from scholars who are expert in the field outlined in the title for each chapter. The size of the book necessarily favors concision over expansiveness, but there is a vast bibliography pointing to sources for further inquiry in all the fields covered in the book for readers wishing to pursue a special interest.
Smart Moves Journal IJELLH
The purpose of this paper is to provide a radical critical review of Lyle Campbell's (2013) Historical linguistics: An introduction (3rd edn.). Edinburgh: Edinburgh University Press. More precisely, it gives an overview, survey, and critique of the main topics, principles, and theories which the work covers. My review is based on using it as the main textbook for ENGL 358 Historical Linguistics for over 5 years, where the students say 'it's frightening'. The book comes in 538 (+ xxii) pages, consisting of seventeen chapters appended with a bibliography and three indices. The main topic of the book is language change at the levels of sounds (Ch. 2), meaning (Ch. 9), morphology (Ch. 10), syntax (Ch. 11), and writing or orthography (Ch. 15), and language classification (Chs. 5-6, 12, 14). The book, which I usually cover in one semester of 16 weeks, is interestingly encyclopedic and bulky with plenty of generally useful and practical examples and exercises, although some chapters, such as Chs. 7, 9, 12, 13, & 15, have almost no exercises at all. Overall the book is interesting and stimulating to read with a huge amount of information which may be confusing and unclear at times. However, it suffers from major drawbacks which will be considered chapter by chapter.
Historical Linguistics – the study of language change – is a major field in linguistics. With its long history and numerous subfields of its own, Historical Linguistics provides challenges to both beginning students and scholars not specialized in this field. This Glossary meets these challenges by providing accessible and widely representative definitions, discussion, and examples of key terms and concepts used in the field. It is written by two well-known authorities in this field. The book is extremely valuable to anyone wishing to understand historical linguistic terminology and concepts. Key Features • A handy, easily understandable pocket guide, and a valuable companion for courses in Historical Linguistics, history of individual languages, history of linguistics, and for anyone curious about how and why languages change
This is a prepublication version.
EDITOR: Laurel J. Brinton TITLE: English Historical Linguistics SUBTITLE: Approaches and Perspectives PUBLISHER: Cambridge University Press YEAR: 2017 REVIEWER: Heli Tissari, Stockholm University, Sweden
Carlos Assunção, Gonçalo Fernandes, Rolf Kemmler
2016, History of Linguistics 2014: Selected papers from the 13th International Conference on the History of the Language Sciences (ICHoLS XIII), Vila Real, Portugal, 25–29 August 2014
The historiography of linguistics, widely recognized as a viable and vibrant branch of linguistics since at least the 1970s, has built on earlier histories of linguistics with a view to elucidating, in particular, the theoretical and methodological underpinnings of past and present analyses of language. The successful institutionalization of this branch of study is confirmed by multiple dedicated journals, national and international societies, and a large community of practitioners at universities and other higher education institutions around the world. The present volume, a selection of papers from the 13th International Conference on the History of the Language Sciences held in Vila Real, Portugal in 2014, is representative of the broad spectrum of topics that occupy researchers of linguistic historiography. They not only constitute a selection of twenty papers presented at ICHoLS XIII, but are simultaneously representative of the overall quality that currently can be found in contributions to the field. The volume is divided chronologically into four parts, from classical antiquity (Greek and Sanskrit) to the end of the twentieth century.
Brian D Joseph
Quantifying the quantitative (re-)turn in historical linguistics
- Barbara McGillivray
- Gard B. Jenset (ORCID: orcid.org/0000-0001-7423-3112)
Humanities and Social Sciences Communications, volume 10, Article number: 37 (2023)
- Language and linguistics
Historical linguistics is the study of language change and stability, of the history of individual languages, and of the relatedness between languages. In spite of numerous acknowledgements, the adoption of quantitative methods in historical linguistics is still far from being mainstream and it falls below the level of other branches of linguistics. This comment considers the adoption of quantitative methods in recent historical linguistics research, and compares a study on 2012 publications with a similar study conducted seven years later. This comment argues for the advantages of a wider adoption of quantitative methods among historical linguists, and considers various reasons for the relatively slow progress in this direction. It also clarifies when quantitative methods are not the preferred route.
Why should we talk about quantitative historical linguistics?
Historical linguistics is the academic field that studies language change and language stability, explores the history of individual languages, and identifies the relatedness between languages (Harrison, 2003, p. 214). This can involve investigating how different aspects of languages such as grammar, sound, or meaning changed over the course of the history of a language and across languages (diachronic analyses), reconstructing the pre-history of languages and language families, and tracing words’ etymologies. Historical linguistics also covers the study of the languages of the past (historical languages) from a synchronic viewpoint, i.e. at a single point in time (see Campbell, 2021, among others).
Historical linguistics has been data-centric since its beginnings. Labov (1972, p. 100) acknowledged that historical linguistics makes the best use of “bad data”, referring to the numerous gaps in the evidence available to historical linguists. The empirical basis of historical linguistics has also long been recognised by other scholars: Rydén (1980, p. 38) wrote that the “study of the past […] must be basically empirical”, Fischer (2004, p. 57) that “[t]he historical linguist has only one firm knowledge base and that is the historical documents”, a status also recognised by Penke and Rosenbach (2007, p. 1) more recently.
In spite of these numerous acknowledgements, the adoption of quantitative methods in historical linguistics is still far from being mainstream and it falls below the level reached by other branches of linguistics. For example, Joseph (2008, p. 687) notes that, while linguistics has always been an empirical field, “the bar [seems to have been raised] on the nature of the evidence we work with”, noting, in particular, an increase in the reliance on corpus data. Similar arguments are put forward by Winter (2022), Kortmann (2021), and Brinton et al. (2021), among others. The fact that quantitative methods in historical linguistics are underused is a serious limitation because quantitative methods offer researchers the opportunity to test theoretical hypotheses that have been proposed on many historical linguistics phenomena. Moreover, quantitative methods can fruitfully complement qualitative example-based research: large- or medium-scale multivariate data analyses have the potential to provide descriptions of multidimensional phenomena where different factors are at play, which is a fairly typical situation in historical linguistics.
Going beyond historical linguistics into the broader field of digital humanities and cultural heritage studies, the recent availability of large cultural datasets (many of which are in textual form), coupled with breakthroughs in computational research (particularly machine learning, natural language processing, and scientific data analysis), has renewed excitement about the so-called “computational turn” in humanities research, concerned with applying and/or developing computational methods to answer research questions in the humanities (McGillivray et al., 2020). This trend is further supported and strengthened by the Open Science movement, which has brought issues of open data and reproducibility to the front of the scientific debate, also trickling into digital humanities discourse (cf. e.g. McGillivray et al., 2022).
Alongside specific examples of quantitative studies that can advance the field, it is also important to articulate a framework for doing quantitative historical linguistics research. In Jenset and McGillivray (2017) we introduced a corpus framework that aims to provide the methodological and epistemological “scaffolding” to bridge the gap between conceptual considerations and concrete quantitative techniques. In this comment, we present the results of a quantitative analysis of articles published in historical linguistics journals: based on this analysis, we argue for the importance of the wider adoption of quantitative methods in historical linguistics. This study updates the study we presented in Jenset and McGillivray (2017, pp. 25–35), which we will refer to as the “2012 study”. Our 2012 study focussed on 62 articles published in six historical linguistics journals in 2012 and found that 29% of the papers analysed were corpus-based, 40% were quantitative (as opposed to 80% of general linguistics articles in the study by Sampson, 2005), and corpus studies were more likely to adopt quantitative methods. Following the “chasm” model of technology adoption proposed by Moore (1991), and with appropriate caveats, this result pointed to historical linguistics being in an early phase of adoption of quantitative methods, with less than half of researchers adopting them and therefore falling into the category of “early majority”. In contrast to historical linguistics, the evidence for general linguistics points to this field having progressed to the full adoption of quantitative methods. A few years later, we wanted to check if the situation had changed and if the trend towards more quantitative studies had stopped or continued.
The aim of the analysis is to provide a snapshot of the field of historical linguistics today compared with the recent past. Following our 2012 study, the only previous quantitative study that has analysed the distribution of quantitative studies in historical linguistics journals, and to keep the task manageable, we selected six historical linguistics journals according to the following criteria (Jenset and McGillivray, 2017, p. 27):
1. Research journals (thus excluding monographs, edited books, and yearbooks);
2. Journals published in the English language;
3. Journals focussing specifically on historical linguistics and/or language change;
4. Journals that had a general scope (thus excluding specific subfields of historical linguistics such as historical pragmatics);
5. Linguistics journals (thus excluding interdisciplinary journals).
We based our methodology on this previous study to provide a longitudinal perspective on its findings. Therefore, we selected the same list of peer-reviewed academic journals we chose in Jenset and McGillivray (2017, pp. 25–35) according to criteria 1–5 above. The list of journals selected is:
- Folia Linguistica Historica
- Journal of Historical Linguistics
- Language Dynamics and Change
- Language Variation and Change
- Transactions of the Philological Society
We analysed all 63 research articles published in the journals listed above in 2019. The number of articles analysed is very close to the number we analysed in the 2012 study (62). We recognise that the size of this sample is rather limited, but we have decided to not expand the dataset further for a number of reasons. First, this analysis provides an empirical illustration of our argument, in line with the aims of a comment paper in contrast with a research paper. Second, as stated above, we kept the same selection criteria as our 2012 study to ensure a longitudinal perspective. Third, we carried out a statistical analysis that can measure the size of the effects detected and reveal whether indeed there is sufficient evidence for a statistically significant result. We selected all relevant research articles from the journal issues in question, excluding non-primary research such as editorials, comments, book reviews, and descriptions of software tools. We also excluded a very small number of articles that were not historical or diachronic, as well as introductions to special issues.
We read each article to collect the following information: the type of evidence base used in the paper (digital corpora, word lists, examples, etc.) and the statistical techniques used for the analysis if any (t-tests, regression models, principal component analysis, etc.). We then classified the articles across two dimensions: corpus-based vs. non-corpus-based and quantitative vs. non-quantitative.
A paper was described as being corpus-based if the authors used a corpus (or a subset of a corpus) as the main evidence source of their research. In other words, the study had to use a machine-readable collection of historical natural language data which is published or at least accessible by others (even if not freely). Therefore, studies based only on word lists, private resources, purpose-built collections not available to the academic community, or other language resources such as dictionaries were not considered to be corpus-based. The type of data used in corpus-based studies in our sample included existing corpora such as Lip (Lessico di frequenza dell’italiano parlato) or portions of them, annotated corpora such as treebanks, and corpora of elicited utterances from fieldwork. The type of data used in non-corpus-based studies includes historical dictionaries, texts quoted in previous literature and examples from texts and manuscripts.
We considered a study to be quantitative if its conclusion relied on quantitative evidence, for example by including statements about the frequency of a given construction or set of items, testing a hypothesis quantitatively in some form or another, or measuring the statistical significance of a phenomenon such as a correlation between two variables. Phylogenetic studies, although they did not tend to use corpus frequency data in our sample, were considered quantitative because they compute distances between linguistic features. The techniques used in the quantitative studies range from simple percentages to chi-square tests and t-tests, random forests, and regression models, including mixed effect models. Thus, the criterion for what we consider a quantitative study is not the presence of numbers in the article, nor is the definition as we operationalise it linked to any specific statistical technique. The only criterion we considered was whether the conclusion or main line of argumentation relied upon quantification in some form. We interpreted the absence of such quantification, determined by a close reading of each article, as indicating a qualitative study.
It is important to note that the two dimensions, corpus-based vs. non-corpus-based and quantitative vs. non-quantitative, although often correlated are nevertheless independent. A study may be corpus-based using qualitative methods, for example, if it relies on examples drawn from a corpus without presenting a quantitative analysis of them. On the other hand, a study may be quantitative without being corpus-based, for example, if it uses other evidence sources like phylogenetic research.
The articles covered a wide range of topics and linguistic subfields, from language typology and language classification to historical phonology, morphology, syntax, semantics and lexicon. The languages analysed include Latin, Ancient Greek, Gothic, English, Medieval French, Eastern Tukanoan, Ecuadorian Siona, Vera’a, Spanish, Bantu languages, Japanese, Russian, Old Saxon, Sanskrit, Celtic, Indian Punjabi, Italo-Romance languages, Dutch, German, and Grico.
One of the reviewers pointed out a potential risk of bias in our quantitative analysis, given that Transactions of the Philological Society (TPS) has a scope that might disproportionately attract studies of less attested and resourced languages, hence limiting the potential for quantitative analysis. However, this does not seem to be the case in our data. Of the 19 TPS articles in our sample we found that 13 were done on relatively well-attested and resourced languages: English (including Old and Middle English), Middle French, Middle Dutch, Old High German, Latin, Middle Norwegian and Old Irish.
Table 1 shows the number of articles in each category, alongside the percentages over the total number of articles (63). Of the articles analysed, 27 (43%) were qualitative and 36 (57%) were quantitative. Compared with the results from our 2012 study (Jenset and McGillivray, 2017, pp. 25–35), we notice an increase in the share of quantitative articles (57% vs. 40%) and corpus-based articles (49% vs. 29%). In other words, the qualitative/quantitative split seems to have changed in favour of quantitative studies, and the corpus-based/non-corpus-based split has likewise shifted in favour of corpus-based studies.
The majority (22 out of 31) of corpus-based articles are also quantitative, while the majority of those that are not corpus-based (18 out of 32) are qualitative. Out of the quantitative studies, 22 (or 61%) were corpus-based and 14 (or 39%) were not. The association between these two dimensions was statistically significant, as per a chi-squared test (χ² = 5.73, p < 0.05, φ = 0.29). This is similar to the 2012 study, which also found a statistically significant association between corpus-based and quantitative studies (χ² = 14.79, p << 0.05, φ = 0.49) but a larger effect size as measured by the φ coefficient. Both chi-squared tests are uncorrected, i.e. run without Yates’ continuity correction. Our original manuscript reported Yates’ corrected results (the default in R), but as a reviewer pointed out, Yates’ correction can be overly conservative. In our case, applying Yates’ correction resulted in a non-significant result for the 2019 data. All expected frequencies in the table are above five, meaning that the conditions for using an uncorrected test, as reported above, are met by Yates’ own criteria (Hitchcock, 2009). For completeness, we also ran Fisher’s exact test on the 2019 data, which also showed a significant result (p < 0.05, OR = 3.34). Although the statistically significant association between corpus-based and quantitative studies persists from 2012 to 2019, the degree of association between them is less strong.
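The mechanics of an uncorrected chi-squared test on a 2×2 table can be sketched as follows. This is a minimal illustration, not the authors' analysis script: the function name `chi_squared_2x2` is hypothetical, and the cell counts are read off the prose above rather than from Table 1, so the output is meant to show how χ², φ and a sample odds ratio are obtained, not to reproduce the reported figures.

```python
import math

def chi_squared_2x2(a, b, c, d):
    """Uncorrected (no Yates) chi-squared statistic, phi coefficient and
    sample odds ratio for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # Shortcut formula for the Pearson chi-squared statistic of a 2x2 table.
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Phi is the effect size for a 2x2 table: sqrt(chi2 / n).
    phi = math.sqrt(chi2 / n)
    # Cross-product odds ratio; Fisher's test in R reports a conditional
    # estimate instead, so the two values need not coincide.
    odds_ratio = (a * d) / (b * c)
    return chi2, phi, odds_ratio

# Counts as stated in the text (assumed layout):
# rows = corpus-based / not corpus-based, columns = quantitative / qualitative.
chi2, phi, odds_ratio = chi_squared_2x2(22, 9, 14, 18)

CRITICAL_VALUE_05_DF1 = 3.841  # chi-squared critical value, alpha = 0.05, 1 d.f.
print(f"chi2 = {chi2:.2f}, phi = {phi:.2f}, OR = {odds_ratio:.2f}")
print("significant at 0.05:", chi2 > CRITICAL_VALUE_05_DF1)
```

Keeping the statistic, the effect size and the odds ratio in one function makes it easy to rerun the same comparison on another year's counts.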
The 95% confidence intervals for quantitative articles in the 2012 study and in this study are presented in Table 2. These intervals indicate that, if the sample is representative, the proportion of quantitative articles in the underlying population of articles from these journals plausibly lies between 44% and 70% for the 2019 data and between 28% and 52% for the 2012 data. A binomial test shows a statistically significant difference between the two samples (p << 0.05).
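As a rough illustration of how such an interval can be obtained, the sketch below computes a 95% Wilson score interval for the 2019 proportion (36 quantitative articles out of 63). The text does not say which interval method was used, so the choice of the Wilson interval and the helper name `wilson_interval` are assumptions; other methods, such as an exact binomial interval, give slightly different bounds.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score confidence interval for a binomial proportion.
    z = 1.96 corresponds to a 95% confidence level."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width

# 2019 sample: 36 of the 63 articles were classified as quantitative.
low_2019, high_2019 = wilson_interval(36, 63)
print(f"2019: {low_2019:.1%} to {high_2019:.1%}")
```

The resulting bounds are close to, though not identical with, the 44% to 70% range quoted for 2019, which is what one would expect if a different interval method was used.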
To summarise our findings:
- Quantitative studies have gone from 40% of the sample in 2012 to 57% in 2019.
- Qualitative, corpus-based papers have increased since 2012. Qualitative, non-corpus papers have seen a decline.
- There is still a significant association between corpus use and quantitative methods, but the strength of the association between them, as measured by the φ coefficient, has decreased from 0.49 in 2012 to 0.29 in 2019.
Arguments in support of quantitative methods in historical linguistics
Not all historical linguistics research can (or should) be quantitative. For certain linguistic phenomena, we simply do not have (enough) data to conduct statistical investigations. In other areas, non-quantitative computer-assisted methods, such as phylogenetic trees or networks, are more suitable (List, 2021). And in some areas, notably historical phonology and morphology, the traditional approach is in many cases not just the best but the only method available.
Nonetheless, it is clear from our data that 2019 has seen a statistically significant increase in the proportion of articles using quantitative methods compared to 2012. The increase from 40% to 57% represents a growth of 42.5%. In our opinion, this growth is a good thing, because this methodological alignment between synchronic and diachronic linguistics can facilitate other types of alignment and help break down the artificial distinction once introduced by Saussure (Pierce and Boas, 2019). However, the 42.5% growth in quantitative papers must be seen in its proper context. Firstly, the growth unfolds over a period of 7 years, meaning that the compound annual growth is only about 5%. For comparison, 5% of our 2019 sample is about 3 papers. This suggests that the growth might be a gradual one, rather than an abrupt shift, although a year-by-year analysis would be required to rule out the possibility of any sudden jumps. In other words, although historical linguistics articles have seen considerable growth in quantitative methods compared to 2012, the field remains behind, or at least not conclusively level with, that of linguistics as a whole.
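The growth arithmetic in this paragraph can be checked with a few lines of Python. This is only a sketch of the calculation described in the text; the function name `compound_annual_growth` is a hypothetical helper, and the inputs are the percentages and sample size quoted above.

```python
def compound_annual_growth(start, end, years):
    """Constant yearly growth rate that turns `start` into `end` after `years` years."""
    return (end / start) ** (1 / years) - 1

share_2012 = 40  # % of quantitative articles in the 2012 sample
share_2019 = 57  # % of quantitative articles in the 2019 sample

total_growth = share_2019 / share_2012 - 1                       # relative growth overall
annual_growth = compound_annual_growth(share_2012, share_2019, 7)
papers_equivalent = annual_growth * 63                           # in units of 2019 articles

print(f"total growth: {total_growth:.1%}")
print(f"compound annual growth: {annual_growth:.1%}")
print(f"roughly {papers_equivalent:.0f} papers of the 2019 sample")
```

Expressing the yearly rate in papers makes the scale concrete: a change of about three articles per year across all the journals sampled.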
This raises the interesting question of what is a reasonable, or appropriate, level of quantitative studies in historical linguistics. This is a question that cannot be answered prescriptively, if at all. In this article, we restrict ourselves to observing firstly that in general, an increase in the adoption of quantitative methods is desirable both to open new avenues of research and to facilitate alignment with synchronic linguistics. Secondly, we observe that the proportion of quantitative studies is growing, which suggests that historical linguists see value in conducting and publishing more quantitatively oriented research.
This ties in with broader trends in the humanities, where recent years have seen a number of textbooks in quantitative methods aimed specifically at researchers and students in the humanities. Examples include Tilton (2015), Lemercier and Zalc (2019), McGillivray and Toth (2020) and Karsdorp et al. (2021). A similar proliferation of quantitative methods textbooks can be found in linguistics in this period, with recent examples from cognitive linguistics (Winter, 2022), psycholinguistics (van Rij et al., 2020), and sociolinguistics (Macaulay, 2009).
There does not seem to be a similar publication burst of quantitative methods textbooks specifically for historical linguistics. Perhaps this is because some techniques, e.g. regression modelling, can be taught equally well with synchronic data and because some textbooks, such as Baayen (2008) and Johnson (2008), include chapters relevant to historical linguistics. However, it is also noteworthy that a popular historical linguistics textbook such as Campbell (2021) devotes its chapter on quantitative historical linguistics almost exclusively to criticism (Jenset and McGillivray, 2017, p. 86), suggesting at least some degree of resistance to the adoption of quantitative methods in the field.
It is also worth considering the differences between historical linguistics and synchronic linguistics. The quantitative trend in linguistics generally seems driven partly by criticism of previous reliance on introspection in linguistics and partly by better access to quantitative or quantifiable data, such as web data or experimental data obtained via websites such as Amazon’s Mechanical Turk (Winter, 2022). Although web data might be an interesting source of data for some diachronic studies, historical linguistics in its prototypical sense is cut off from these data sources. There are no native speakers of Old English or Latin to be recruited from Mechanical Turk. Instead, historical linguists must, by necessity, make use of the various types of evidence available, whether textual evidence or the present-day languages themselves, as related entities produced by a historical process. This, then, might constitute a form of absolute limit on the degree to which quantitative methods can be applied in historical linguistics. To be clear, a complete ban on qualitative studies in historical linguistics would be both futile and undesirable (Jenset and McGillivray, 2017; Kortmann, 2021). However, we believe the field should be striving towards a high degree of adoption of quantitative methods, to the extent possible, for reasons of transparency, reproducibility, code and data sharing on a larger scale, as well as methodological alignment with linguistics in general and ultimately other adjoining fields.
Whence quantitative historical linguistics?
Based on our analysis, it seems clear that historical linguistics is undergoing, or has undergone, a quantitative turn, similar to linguistics in general (Winter, 2022; Brinton et al., 2021; Kortmann, 2021; Pierce and Boas, 2019; Janda, 2013; Joseph, 2008). It is difficult to judge if we have reached some natural or optimal level of application for quantitative methods in historical linguistics, or if there is still room to increase the proportion of quantitative studies further. Ultimately, that is a question for the future. However, after taking stock of where we are, it seems to us that a few clear challenges for the future can be formulated.
Firstly, there is the question of the quantitative turn itself, and its inclusion in historical linguistics. Although the proportion of quantitative studies in our sample has increased, we should not forget the qualitative side of quantitative methods. Not all quantitative methods are equally informative or well-adapted to historical linguistics. As a consequence, we see room for moving away from the classical null hypothesis tests, towards more advanced methods that can better account for the context of the data. Null-hypothesis tests are sometimes useful (we have used them here, for instance) but they can be problematic with historical data (Jenset and McGillivray, 2017, p. 96), and although multilevel/mixed-effects regression models have gained a firm foothold in historical linguistics, there is probably room for further adoption of such models in particular, and generally for a broader repertoire of techniques suited for specific research questions. Even if the quantitative method has been thoughtfully picked, it does not automatically follow that its use is well integrated with the linguistic problem at hand. The results, in our experience, are often studies where the conclusions and the quantitative analysis do not support each other. Kortmann (2021) discusses the same problem from a general linguistic point of view and argues (correctly in our view) that linguistic questions should lead the way in selecting the appropriate methods. This sounds (and is) reasonable but it is potentially challenging. It requires a wider overview of the available statistical methods as well as a deeper conceptual understanding of what they do. It is also challenging since it might break with community norms for researchers, journal reviewers, and editors alike.
Next, there is clearly a set of open questions for historical linguistics in general that is not limited to quantitative historical linguistics, but which quantitative approaches to historical linguistics must also inevitably grapple with. For example, using quantitative methods does not in itself address the problem that multiple explanations and hypotheses might be compatible with the observed historical data (Jenset and McGillivray, 2017, p. 47). Roberts et al. (2020) present a supporting tool to deal with this problem, which we find interesting and encouraging, but insufficient on its own. Instead, we will probably need an even closer alignment of theory, hypotheses, data, and methods. Another such general problem is data quality, with gaps and various forms of historical preservation bias (geographical, social, gender-based, etc.) as prominent examples. Again we can find interesting partial technological solutions, such as imputation techniques for missing data, and simulation experiments including agent-based modelling (Stevens and Harrington, 2022; Harrington et al., 2019). Yet despite these promising technical advances we still see the greatest gains stemming from a closer engagement between theory, methods, and the available, imperfect, data.
We also think that historical linguistics could stand to gain from making better use of the data already available. In some cases, this would undoubtedly require the development of more natural language processing (NLP) tools for historical language varieties, to allow further enrichment of historical data, in addition to what already exists (Jenset and McGillivray, 2017). Another way in which the existing data could be better leveraged is by further enriching it with human annotation. Numerous such projects exist, and many historical linguists have undoubtedly done much annotation work that could, and should, be shared with colleagues, e.g. as open datasets described in a data paper. However, we believe there is also a benefit in unlocking annotated historical data that are, in our experience, too often difficult to integrate with current quantitative modelling platforms and techniques. A quantitative analysis of syntactically annotated data (Taylor, 2020), such as those in chapters 6 and 7 of Jenset and McGillivray (2017), will often require programming skills (or else very lengthy manual re-recording of annotations) to extract the rich, detailed information needed to perform multivariate regression analyses. New tools such as TreeNet (Jenset, 2022) would partially help, but a combination of training historical linguists in coding or having research teams with more diverse skills seems inevitable. As such, this challenge speaks not only to current researchers in historical linguistics but also to the coming generations that they will be training.
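As a rough illustration of the kind of programming such extraction involves, the sketch below turns one bracketed parse tree into flat records that could feed a regression table. It assumes a Penn-style bracketed annotation in which every node carries an explicit label; the sample sentence, the labels, and the function names are all hypothetical, and the code is not how TreeNet or any particular treebank toolkit works.

```python
import re


def parse_bracketed(text: str) -> list:
    """Parse one bracketed tree into nested lists: [label, child, ...]."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read() -> list:
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '('")
        pos += 1
        node = [tokens[pos]]  # first item after '(' is the node label
        pos += 1
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.append(read())       # nested constituent
            else:
                node.append(tokens[pos])  # word form
                pos += 1
        pos += 1  # consume ')'
        return node

    tree = read()
    if pos != len(tokens):
        raise ValueError("trailing material after the tree")
    return tree


def constituent_rows(tree: list, depth: int = 0):
    """Yield one flat record per constituent, in top-down order."""
    children = [c for c in tree[1:] if isinstance(c, list)]
    words = [c for c in tree[1:] if isinstance(c, str)]
    yield {
        "label": tree[0],
        "depth": depth,
        "n_children": len(children),
        "word": words[0] if words else None,
    }
    for child in children:
        yield from constituent_rows(child, depth + 1)


# Hypothetical annotated sentence, for illustration only.
sample = "(IP (NP (PRO he)) (VBD came) (PP (P to) (NP (N town))))"
rows = list(constituent_rows(parse_bracketed(sample)))
for row in rows:
    print(row)
```

Each record could then be extended with the variables a given study needs (text, date, genre, and so on) before being passed to a statistical package. Real treebanks add complications, such as traces, indices, and unlabelled wrapper nodes, that this sketch deliberately ignores.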
The datasets analysed during the current study are available in the Dataverse repository: https://doi.org/10.7910/DVN/IIHRZ3.
Baayen RH (2008) Analyzing linguistic data: a practical introduction to statistics using R. Cambridge University Press, Cambridge
Brinton LJ, Honeybone P, Kortmann B, Seoane E (2021) 25 years of English Language and Linguistics: a celebration and analysis. English Lang Linguist 25(4):677–685
Campbell L (2021) Historical linguistics: an introduction. Edinburgh University Press, Edinburgh
Fischer O (2004) What counts as evidence in historical linguistics? Stud Lang 28(3):710–740
Harrington J, Gubian M, Stevens M, Schiel F (2019) Phonetic change in an Antarctic winter. J Acoust Soc Am 146(5):3327–3332
Harrison S (2003) On the limits of the comparative method. In: Joseph BD, Janda RD (eds) The handbook of historical linguistics. Blackwell, Malden, MA, pp. 213–243
Hitchcock DB (2009) Yates and contingency tables: 75 years later. Electron J Hist Probab Stat 5:1–14
Janda LA (2013) Quantitative methods in cognitive linguistics: an introduction. In: Janda LA (ed) Cognitive linguistics: the quantitative turn. De Gruyter Mouton, Berlin/Boston, pp. 1–32
Jenset GB (2022) TreeNet—a computational system for discovering constructions in historical parsed corpora. In: International Conference In Historical Linguistics 25 (Book of abstracts). Oxford https://ichl.ling-phil.ox.ac.uk/abstracts/209
Jenset G, McGillivray B (2017) Quantitative Historical Linguistics. A corpus framework. Oxford University Press, Oxford
Johnson K (2008) Quantitative methods in linguistics. Blackwell
Joseph BD (2008) The Editor’s Department: last scene of all. Language 84(4):686–690
Karsdorp F, Kestemont M, Riddell A (2021) Humanities data analysis: case studies with Python. Princeton University Press
Kortmann B (2021) Reflecting on the quantitative turn in linguistics. Linguistics 59(5):1207–1226
Labov W (1972) Some principles of linguistic methodology. Language Soc 1(1):97–120
Lemercier C, Zalc C (2019) Quantitative methods in the humanities: an introduction. University of Virginia Press
List JM (2021) Computer-assisted approaches to historical language comparison. Habilitation Thesis, Friedrich Schiller University, Jena
Macaulay RKS (2009) Quantitative methods in sociolinguistics. Bloomsbury
McGillivray B, Alex B, Ames S, Armstrong G, Beavan D, Ciula A et al (2020) The challenges and prospects of the intersection of humanities and data science: a White Paper from The Alan Turing Institute. figshare https://doi.org/10.6084/m9.figshare.12732164.v5
McGillivray B, Toth G (2020) Applying language technology in humanities research. Design, application, and the underlying logic. Palgrave Macmillan
McGillivray B, Marongiu P, Pedrazzini N, Ribary M, Wigdorowitz M, Zordan E (2022) Deep Impact: a study on the impact of data papers and datasets in the humanities and social sciences. Publications 10(4):39. https://doi.org/10.3390/publications10040039
Moore GA (1991) Crossing the chasm: marketing and selling high-tech products to mainstream customers. Harper Business, New York
Penke M, Rosenbach A (2007) What counts as evidence in linguistics? An introduction. In: Penke M, Rosenbach A (eds) What counts as evidence in linguistics. John Benjamins, Amsterdam, pp. 1–49
Pierce M, Boas HC (2019) Where was historical linguistics in 1968 and where is it now? In: Pierce M, Boas HC (eds) New directions for historical linguistics. Brill, pp. 1–41
Rij van J, Vaci N, Wurm LH, Feldman LB (2020) Alternative quantitative methods in psycholinguistics: implications for theory and design. In: Pirrelli V, Plag I, Dressler WU (eds) Word knowledge and word usage. A cross-disciplinary guide to the mental lexicon. De Gruyter Mouton
Roberts AK et al. (2020) CHIELD: the causal hypotheses in evolutionary linguistics database. J Language Evol 5(2):101–120. https://doi.org/10.1093/jole/lzaa001
Rydén M (1980) Syntactic variation in a historical perspective. In: Jacobson S (ed) Papers from the Scandinavian symposium on syntactic variation, Stockholm, 18–19 May 1979. Almqvist & Wiksell, Stockholm, pp. 37–45
Sampson GR (2005) Quantifying the shift towards empirical methods. Int J Corpus Linguist 10:10–36
Stevens M, Harrington J (2022) Individual variation and the coarticulatory path to sound change: agent-based modeling of /str/ in English and Italian. Glossa 7(1):1–34
Taylor A (2020) Treebanks in historical syntax. Annu Rev Linguist 6:195–212
Tilton L (2015) Humanities data in R. Exploring networks, geospatial data, images, and text. Springer
Winter B (2022) Mapping the landscape of exploratory and confirmatory data analysis in linguistics. In: Tay D, Pan M (eds) Data analytics in cognitive linguistics: methods and insights. De Gruyter Mouton, Berlin, Boston, pp. 13–48
Authors and affiliations.
King’s College London, London, UK
Springer Nature, London, UK
Gard B. Jenset
Correspondence to Barbara McGillivray.
GBJ is employed by Springer Nature. BMG declares no competing interests.
This article does not contain any studies with human participants performed by any of the authors.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article.
McGillivray, B., Jenset, G.B. Quantifying the quantitative (re-)turn in historical linguistics. Humanit Soc Sci Commun 10, 37 (2023). https://doi.org/10.1057/s41599-023-01531-2
Received: 19 August 2022
Accepted: 17 January 2023
Published: 30 January 2023
DOI: https://doi.org/10.1057/s41599-023-01531-2