11th Circuit Concurrence Makes 'Modest Proposal' for Use of AI-Powered Large Language Models in Legal Interpretation

The Eleventh Circuit’s opinion in Snell v. United Specialty Ins. Co., 102 F.4th 1208 (11th Cir. 2024)—which resolved a garden-variety dispute over insurance coverage—was itself not noteworthy from an artificial intelligence and litigation perspective. Judge Kevin Newsom’s concurrence in the opinion, however, is a different story. Judge Newsom used his concurrence to talk through his thoughts on whether and how AI-powered large language models (LLMs) might assist with the interpretation of words and phrases in legal instruments.

The issue before the Eleventh Circuit was whether an insurance company breached its policy when it refused to defend the plaintiff, James Snell, after Snell was sued for negligent installation of a ground-level trampoline, accompanying retaining wall, and decorative wooden cap (the “ground-level trampoline project”).

Snell argued that the installation of the ground-level trampoline project constituted “landscaping,” a covered event in his insurance policy, and thus the insurance company was obligated to defend him. The insurance company rejected Snell’s demand, asserting that installing a ground-level trampoline project was not landscaping. Ultimately, the Eleventh Circuit found it could reject Snell’s claim without deciding whether the ground-level trampoline project was, or was not, “landscaping.”^[1]

But as the Eleventh Circuit was coming to that conclusion, Judge Newsom spent “hours and hours (and hours)” on the question of whether the ground-level trampoline project was “landscaping.” It was during these hours that “[p]erhaps in a fit of frustration, and most definitely on what can only be described as a lark,” Judge Newsom instructed one of his clerks to ask a popular LLM, “What is the ordinary meaning of ‘landscaping’?” Finding the LLM’s response “more sensible than [he] had thought it might—and definitely less nutty than [he] had feared,” Judge Newsom followed up with the ultimate question: “Is installing an in-ground trampoline ‘landscaping?’” Then, for good measure, Judge Newsom posed the same questions to a different LLM. Each LLM’s response indicated that the installation of the ground-level trampoline “just might be landscaping.”

Although the Eleventh Circuit concluded that it could resolve the appeal without deciding whether the ground-level trampoline project constituted “landscaping,” Judge Newsom’s LLM exercise prompted him to consider further the place of LLMs in the interpretation of legal texts. Judge Newsom first sketched out the five primary benefits he saw in using LLMs “as one implement among several in the textualist toolkit to inform ordinary-meaning analyses of legal instruments.” He then turned to what he viewed as the four primary drawbacks of using LLMs. Finding that the benefits of using LLMs generally outweighed the drawbacks, Judge Newsom finished his concurrence with concluding thoughts on how to maximize the value of LLMs in interpreting legal texts. Although Judge Newsom’s concurrence is worth reading in its entirety, his main points are summarized below.

As noted above, Judge Newsom first discusses the primary benefits of using LLMs to interpret legal texts.

1. ‘LLMs train on ordinary-language inputs.’

– According to Judge Newsom, this is the best reason to believe that LLMs may be useful. LLMs train on a “mind-bogglingly enormous amount of raw data taken from the internet,” from Ph.D. dissertations to gossip rags.

– Given this, LLMs can provide useful statistical predictions about how ordinary people ordinarily use words and phrases. How ordinary people use ordinary words and phrases is, of course, the core of the “ordinary meaning rule.”

2. ‘LLMs can “understand” context.’

– LLMs essentially convert language into math that computers can understand. Through this “mathematization,” LLMs can absorb and assess the use of terminology in context and detect language patterns at a granular level.

– LLMs’ understanding of context has become so sophisticated that they can easily distinguish, for example, between a prompt about the winged mammal “bat” and one about the wooden baseball “bat.”

3. ‘LLMs are accessible.’

– The availability and use of LLMs have proliferated in recent years, and there is no reason to think they will not continue to do so. This allows for “democratization” of the interpretative enterprise both by and for ordinary people.

– LLMs can also be an inexpensive research tool. The searches that Judge Newsom ran regarding the meaning of landscaping did not cost him anything.

4. ‘LLM research is relatively transparent.’

– While we lack perfect knowledge about what LLMs train on, we do broadly know they are trained on “tons and tons of internet data.”

– Dictionaries, that indispensable tool of textualists, are not necessarily more transparent than LLMs. Not all dictionaries explain how a word obtains its definition or how the dictionary was put together.

– Further, there are always multiple dictionaries to consult, and legal practitioners and judges seldom explain why they chose one definition over another.

5. ‘LLMs hold advantages over other empirical interpretive methods.’

– Judge Newsom’s final point in LLMs’ favor is that LLMs compare favorably to other non-dictionary textualist tools. Empiricists have critiqued the dictionary-focused approach to plain meaning interpretation and suggested other methods, including surveys of ordinary citizens as well as corpus linguistics, which “aims to gauge ordinary meaning by quantifying the patterns of words’ usages and occurrences in large bodies of language.”

– However, surveys may be too time- and resource-consuming to be practical, and corpus linguistics relies on the discretion of those compiling the data. LLMs, on the other hand, do not share those flaws.

After Judge Newsom walks through the benefits of using LLMs to discern the ordinary meaning of terms in legal documents, he turns to the potential drawbacks of their use, as well as why, in his opinion, those drawbacks should not preclude the use of LLMs.

1. ‘LLMs can “hallucinate.”’

– LLMs “hallucinate” when they produce false, inaccurate, or nonsensical information. Hallucinations are considered among the most serious objections to using LLMs.

– Judge Newsom reasons that LLMs are continuously improving, and we may see fewer hallucinations in the future. Further, hallucinations seem less likely to occur when asking for the ordinary meaning of a word, as opposed to asking a specific question for a specific answer.

– Additionally, lawyers can “hallucinate” too, at times shading facts or finessing or omitting adverse authorities.

2. ‘LLMs don’t capture offline speech, and thus might not fully account for underrepresented populations’ usages.’

– People living in communities with limited internet access may contribute less to the LLM’s online sources.

– Judge Newsom asserts that we shouldn’t overreact to this concern because all sources have limitations, noting that Merriam-Webster editors also rely on hard-copy sources to determine terms’ ordinary meanings.

– Further, LLMs are trained using extremely large data sets that include not only data that was “born” online but also material that was created in the physical world and then digitized and uploaded to the internet, thus better accounting for offline speech.

3. ‘Lawyers, judges, and would-be litigants might try to manipulate LLMs.’

– Lawyers and judges may strategically reverse-engineer their prompts to elicit the definition they want.

– This concern is not unique to LLMs. Lawyers and judges can also shop for the dictionary definitions that best fit their desired outcomes.

– It is also possible that prospective litigants, including AI companies promulgating the LLMs, may seek to corrupt the inputs upon which the LLMs train. However, this seems both incredibly difficult (LLMs train on “billions” of words) and, in the case of AI companies, against their long-term interests. Further, the larger concern may be mitigated by querying multiple different LLMs.

4. ‘Reliance on LLMs will lead us into dystopia.’

– Finally, Judge Newsom asks if consideration of LLM outputs in interpreting legal texts will inevitably put us on some dystopian path toward “robo judges” algorithmically resolving human disputes.

– Judge Newsom determines this is an unlikely outcome, stressing that he is “not, not, not” advocating that a judge query an LLM concerning the ordinary meaning of a word and then mechanistically apply that meaning to their facts to render judgment.

– As Chief Justice Roberts recently observed, the law will always concern gray areas that require the application of human judgment.

Judge Newsom finishes his concurrence with four thoughts on how to maximize LLMs’ utility. First: clarify the objective of the prompt. In his view, the best use of an LLM is to discern “how normal people use and understand language, not in applying a particular meaning to a particular set of facts to suggest an answer to a particular question.” Second: try different prompts, ideally across different models, and report the prompts used and the range of results obtained. Third: clarify the desired output, for instance by asking the LLM the same question multiple times and seeing how often the result remains constant. Fourth: consider the temporal dimensions of the request, since what a word means in 2024 might not be what it meant in 1787, 1868, or 1964.
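For readers who want to picture the repeated-query suggestion, the following is a minimal sketch in Python and not anything drawn from the concurrence itself. It assumes a hypothetical function, here called `ask`, that sends a prompt to some model and returns its reply as text; the names `tally_answers` and `agreement_rate` are likewise invented for illustration. The sketch counts only exact matches after light normalization, which is a simplification, since real model replies are free text and would likely need human review or coding before they could be compared.

```python
from collections import Counter
from typing import Callable, Dict, Iterable


def tally_answers(
    ask: Callable[[str], str],
    prompts: Iterable[str],
    repeats: int = 5,
) -> Dict[str, Counter]:
    """Ask each prompt `repeats` times and count the distinct replies.

    `ask` is a hypothetical stand-in for whatever sends a prompt to a
    model and returns its reply as text; it is not a real API.
    """
    report: Dict[str, Counter] = {}
    for prompt in prompts:
        counts: Counter = Counter()
        for _ in range(repeats):
            # Normalize lightly so trivial differences do not split the tally.
            counts[ask(prompt).strip().lower()] += 1
        report[prompt] = counts
    return report


def agreement_rate(counts: Counter) -> float:
    """Share of replies matching the most common reply (0.0 if no replies)."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return counts.most_common(1)[0][1] / total


if __name__ == "__main__":
    # Canned replies for illustration only; a real run would call a model.
    canned = iter(["Yes", "yes", "No", "yes "])
    report = tally_answers(
        lambda prompt: next(canned),
        ["Is installing an in-ground trampoline 'landscaping'?"],
        repeats=4,
    )
    for prompt, counts in report.items():
        print(prompt, dict(counts), agreement_rate(counts))
```

Reporting the full tally alongside the prompts, rather than a single reply, is one way to follow the suggestion that the range of results be disclosed.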

Beyond being a thoughtful addition to the caselaw discussing artificial intelligence in litigation, Judge Newsom’s concurrence is entertaining and engaging throughout—it is well worth a read for anyone interested in this area.

^[1] In his insurance application, which Alabama considers part of the insurance contract, Snell had denied that his work included “any recreational or playground equipment construction or erection,” mandating a decision in favor of the insurer regardless of whether the ground-level trampoline installation project was “landscaping.”
