More than Words

July 7, 2021

Functional linguistics, including corpus linguistic methods, has a lot to offer the study of legal meanings. (For those new to corpus linguistics, I provide a quick introduction, with a few key examples of some really cool findings, in Part I of this paper.) In this post I’ll outline a couple of key insights legal thinkers can gain from corpus work, and highlight some of the questions and concerns that should stay at the forefront of corpus-based legal inquiries. One overall point here is that corpus analysis, like any empirical methodology, requires a lot of decisions and interpretations that themselves require justification. And one crucial concern is the need to separate (1) claims about how language actually works from (2) claims about how law ought to work, two kinds of claims that are often conflated in legal interpretation.

Constructing a corpus involves choosing relevant speakers, genres, texts, situations.

Corpus linguistic inquiry needs a body of texts to examine, but not all texts will be relevant to all inquiries. A reliable corpus needs to represent the population of speakers whose speech is at issue in an analysis. If we want to know how American political elites used the term “carry a firearm” in the 1970s, for instance, we would probably not look to seventeenth-century Britons as exemplars, to pick on a well-known instance in which a Supreme Court opinion quoted the King James Bible to interpret a modern criminal statute. Academic researchers have to figure out which speech community they want to study and compile a corpus of that community’s speech. In some sense, the question an academic researcher asks determines what the corpus should contain. (Whether they can construct a relevant and representative corpus is a whole other question.)

For legal writers, the issue is more complicated. Legal interpretation aims not just to report on a particular community’s speech patterns but to influence our understandings of law. Exerting such influence is why legal interpreters invoke the language habits of others to begin with. That means that a legal writer using corpus approaches has to decide which language communities are relevant for the interpretation of law. Or to put it a bit more bluntly, we have to decide which communities should have the power to (help) determine the meaning of law. That is not quite the same task as determining which speech community’s speech one wants to examine. Rather, in legal interpretation the decision involves an underlying claim to democratically legitimate power over our understanding of the law.

Take for instance the corpus favored by many legal writers interested in constitutional interpretation: the Corpus of Founding Era American English (COFEA). This corpus includes records from the Constitution’s production and ratification, early American laws and administrative materials, legal treatises, and the writings of various founding-era political luminaries. If we use the COFEA to determine what a constitutional provision meant to people at the time of the founding, we posit that the tiny minority this corpus represents (political superstars, lawmakers and government agents, a few legal scholars) is the speech community that gets to determine what the Constitution meant at the time. Someone interested in the document’s original public meaning would probably find this inadequate. It’s more like the original 1% meaning, really.

Of course, one might decide that the original 1% is the most relevant speech community. There may be value in listening to what lawmakers say regarding the laws they write, and even to understanding how lawmakers use language outside the legal context. (Textualists take note.) Corpus linguistics helpfully reminds us that such choices are choices—not givens—and that they require justification. Relevance is not born; it is argued.

This is not to suggest that the COFEA is not useful. There are many questions it can help answer. But it cannot tell us how the majority of contemporaneous speakers would have understood the Constitution, simply because it does not have texts representing them. Indeed, perhaps there are not sufficient texts in existence to do so. After all, we have far fewer records from the enslaved people, women, and non-propertied white men of the day, who vastly outnumbered the men whose writing the COFEA contains. Empirically based research has to take into account the limitations of its evidence base and be clear-eyed about which answers, to which questions, it can yield.

Of course, a similar issue comes up in non-corpus-based analysis, where those looking for original understandings tend to turn to elites who had a direct interest in inscribing their own ideas about constitutional meaning. But corpus analysis, which requires users to choose a finite corpus of texts, reminds us that such decisions are unavoidable. In the legal realm, they’re also unavoidably normative. In academia as in law, you have to make an argument for the relevance of the texts you use rather than just assume it.

Communication takes more than the written words.

US legal interpretation often focuses on the meaning of words, usually just a couple of words at a time. Writers ask what “carry a firearm” means, or “bear arms.” Corpus linguistics research reminds us that words are not vessels that carry a meaning consistently across contexts, like a vase that holds the same water no matter whether it’s on the windowsill or on the table. Words are more like tools that people use to create meanings together. They have some consistency, sure, the way hammers generally work to drive something into something else. But what it is that a hammer hammers, and what it is that the hammering builds, matters. This may be why a lot of academic corpus research asks questions that go way beyond the meanings of words. It asks how language patterns define genres, speech communities, and time periods. It examines how linguistic choice structures information flow, conveys individual perspectives or normative associations, and connects interactional participants.

Not that words don’t matter. But the way words matter depends on how they’re put together—through syntactical relations, repeated patterns, genre conventions, situational regularities, and so on. That’s the sort of stuff that corpus linguistics is great at. Corpus studies thus have the potential to take legal analysis beyond the static, cramped confines of the individual word or little phrase and into the wider world of syntax and pragmatics—the stuff that gives language its communicative oomph. (Karen Sullivan’s article on the Second Amendment, which evaluates how “being” clauses have functioned in English over time, is a great example of what corpus analysis can do when it moves beyond a narrow focus on word meaning in the law.) Making legal writers more aware of linguistic structure’s effect on meaning, and loosening some of our legal tradition’s obsessive focus on words over communicative practice, would be a big contribution in itself.
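For readers who have not seen what looking at words in context involves in practice, here is a minimal sketch of a keyword-in-context (KWIC) listing, one of the basic displays in corpus work. The sentences are invented for illustration and come from no real corpus; the names `TOY_TEXTS` and `kwic` are hypothetical, and a real study would rely on a purpose-built corpus and tool rather than anything this simple.

```python
import re

# Invented sentences for illustration only; not taken from any real corpus.
TOY_TEXTS = [
    "The soldiers bear arms against the enemy.",
    "I cannot bear the noise.",
    "Citizens may bear arms in defence of themselves.",
]


def kwic(texts, keyword, width=2):
    """Return (left context, keyword, right context) for every hit.

    `width` is the number of words kept on each side of the keyword.
    """
    hits = []
    for text in texts:
        # Crude tokenizer: lowercase runs of letters and apostrophes.
        tokens = re.findall(r"[a-z']+", text.lower())
        for i, tok in enumerate(tokens):
            if tok == keyword:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                hits.append((left, tok, right))
    return hits


# Print one aligned line per occurrence of the keyword.
for left, kw, right in kwic(TOY_TEXTS, "bear"):
    print(f"{left:>20} | {kw} | {right}")
```

Even on three made-up sentences, the listing shows “bear” doing different work depending on what follows it, which is the kind of pattern a word-by-word dictionary lookup tends to hide.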

Corpus work is also great at finding the boundaries of linguistic genres, helping reveal the patterns that characterize different types of communication used for different purposes. That allows us to play one text off others that help define it in its relevant context. How, for instance, should we understand the “right” granted in the Second Amendment? If we look at corpora of law, political theory, and philosophy, will we find rights imagined as absolutes that tower above societal needs, or as flexible and socially embedded? How do such texts draw the bounds of rights of an individual within the needs of society (such as through time, place, and manner limitations)? Corpus inquiry could help ascertain the kind of weighing that typically goes (or went) into assessing the exercise of a right in particular situations.

Corpus methods could also alert legal writers to another crucial aspect of meaning production: the “paradigm” level interaction between words that appear in an utterance and other words that could have been chosen instead—as opposed to the “syntagm” level analysis of words that are co-present in an utterance, to which legal analysis usually limits itself. When I say “I like ice cream,” the words relate to each other syntagmatically, but the word “like” exists in a paradigm set with other words I could have used but chose not to. Your understanding of my feelings about ice cream depends in part on your knowledge that I could have said “love” or “hate” instead of “like.”

The same axis of meaning goes for legal text. What, for instance, might have gone into the paradigm set of the Second Amendment’s “arms” in 1791? Perhaps “musket” or “arrow” (as opposed to, say, “semi-automatic rifle,” not in any 1791 paradigm set). Corpus work can give us a sense of how word boundaries are molded by the words that could take their place—as well as indicate paradigms that words could not participate in at a given time, or in a given genre.
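To make the paradigm idea concrete, here is a rough sketch, assuming that words found in the same immediate slot (same word to the left, same word to the right) are candidates for the same paradigm set. The sentences are invented and are not drawn from COFEA or any historical source; `TOY_CORPUS` and `paradigm_candidates` are hypothetical names, and the one-word frame is a simplification of what real distributional analysis does.

```python
import re
from collections import Counter

# Invented example sentences; NOT drawn from COFEA or any real corpus.
TOY_CORPUS = [
    "the militia shall bear arms in defence of the state",
    "the militia shall bear muskets in defence of the state",
    "every man may keep arms in his house",
    "every man may keep muskets in his house",
    "every man may keep arrows in his house",
    "they carried arms to the field",
]


def tokenize(sentence):
    """Crude tokenizer: lowercase runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", sentence.lower())


def frames(tokens):
    """Yield (left neighbor, word, right neighbor) for each interior token."""
    for i in range(1, len(tokens) - 1):
        yield tokens[i - 1], tokens[i], tokens[i + 1]


def paradigm_candidates(sentences, target):
    """Count other words that occur in a (left, right) slot the target also fills."""
    all_frames = [f for s in sentences for f in frames(tokenize(s))]
    # Slots in which the target word is attested.
    target_slots = {(left, right) for left, word, right in all_frames if word == target}
    counts = Counter()
    for left, word, right in all_frames:
        if word != target and (left, right) in target_slots:
            counts[word] += 1
    return counts


print(paradigm_candidates(TOY_CORPUS, "arms").most_common())
# -> [('muskets', 2), ('arrows', 1)]
```

A word that never shows up in any of the target’s slots, like “rifle” here, simply gets no count, which is the toy version of a paradigm that a word could not participate in at a given time or in a given genre.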

Meaning making is not stable across speech communities (or time).

As I indicated above, a lot of academic corpus linguistics work tracks differences in language use, showing how speech communities are distinguishable through subtle distinctions in their communication habits. That has important implications for legal interpretation as well.

For one thing, again, it suggests that groups that stand in different relations to the law will often use and understand language differently. The Constitution’s writers were likely in a different speech community than many others affected by it, and those others themselves form many different speech communities—upper class and lower class, southern and northern, educated and uneducated, male and female, enslaved and free, and so on. Although we can strive to understand how different groups might have understood the Constitution, given the variety of speech communities involved it is quite likely that there was no single original public understanding, nor one usage of any given form at the relevant time. As I said above, choosing a community to investigate entails a claim that it’s that community that matters.

For another thing, linguistics of all sorts, including corpus study, shows that language usage and meaning change over time. Phonology changes, of course, as different sounds spread in popularity over different speech communities. But even more so, as history wears on, social relations shift, technology develops, and so on, the things words refer to, the ways people discuss them, and the values they ascribe to them change too. This is now a background assumption in a lot of linguistics research, which aims to show not that language use and meaning-making change over time, but how they do so in particular instances.

If we want to treat constitutional language the way a linguist would, then, we will assume that linguistic meaning changes over time as everything around it changes—from its readers and expositors and implementers, to the social norms and social structure in which it takes effect, to the legal regime that it functions in. Corpus linguistics, then, works on a rather non-originalist assumption. Linguistics does not recognize a fixation thesis, but rather assumes that language—as a social medium—continually develops along with the society around it.

Corpus linguistics can thus illuminate the divide between the normative and the empirical aspects of legal theory. Believing that the understandings of constitutional contemporaries should define our interpretation of the Constitution today is a normative position. That normative position can lead one to use the corpus technology in particular ways. But there’s nothing in the technology itself that mandates, or really even supports, such an approach to meaning.

Moving beyond the search bar.

I’ve suggested a few ways corpus linguistic research could be useful in constitutional interpretation. So far, these uses have not been explored much in corpus-based legal interpretation. Instead, that work has, with some notable exceptions, stayed focused on the word-level search for inherent meaning that has characterized legal interpretation for a long time. That’s unfortunate, because a broader look at the methodology and its implications could lead to an empirically informed turn in legal interpretation, one that treats legal language as the purposeful communicative practice that it is.