
Federal Judge Approves Predictive Coding Technology for e-Discovery

In an eagerly anticipated decision, Magistrate Judge Andrew Peck of the Southern District of New York issued the first judicial opinion formally approving the use of “predictive coding” technology, which offers the promise of substantially reducing the costs associated with exhaustive human review of electronically stored information (ESI).1 Recognizing that the Bar has been hesitant to embrace this technology, Judge Peck wrote: “What the Bar should take away from this Opinion is that computer-assisted review is an available tool and should be seriously considered for use in large-data-volume cases where it may save the producing party (or both parties) significant amounts of legal fees in document review. Counsel no longer have to worry about being the ‘first’ or ‘guinea pig’ for judicial acceptance of computer-assisted review … [which] can now be considered judicially-approved for use in appropriate cases.”2

What is Predictive Coding?

The term “predictive coding” (also commonly referred to as “computer-assisted review” or “technology-assisted review”) refers to “tools … that use sophisticated algorithms to enable [a] computer to determine relevance based on interaction with (i.e. training by) a human reviewer.”3 The process involves a senior attorney, with extensive knowledge of the case issues, reviewing and coding a “seed set” of documents sampled from the universe of collected data. This individual document review “trains” the computer to recognize other relevant documents so that it can “predict” the reviewer’s coding for the entire dataset. “When the system’s predictions and the reviewer’s coding sufficiently coincide, the system has learned enough to make confident predictions for the remaining documents.”4
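The train-then-predict loop described above can be illustrated with a deliberately simplified sketch. It is not the algorithm used by any commercial predictive coding product or by the parties in Da Silva Moore; the word-weight scoring, the function names (`train`, `predict`, `agreement`), and the sample documents are all hypothetical and chosen only to show the idea of learning from a reviewer-coded seed set.

```python
import math
from collections import Counter


def train(seed_set):
    """Learn per-word log-odds weights from reviewer-coded (text, is_relevant) pairs."""
    rel, non = Counter(), Counter()
    for text, is_relevant in seed_set:
        (rel if is_relevant else non).update(set(text.lower().split()))
    n_rel = sum(1 for _, r in seed_set if r)
    n_non = len(seed_set) - n_rel
    # Add-one smoothing keeps weights finite for words seen in only one class.
    return {
        w: math.log((rel[w] + 1) / (n_rel + 2)) - math.log((non[w] + 1) / (n_non + 2))
        for w in set(rel) | set(non)
    }


def predict(weights, text):
    """Predict the reviewer's coding: True means 'relevant'."""
    score = sum(weights.get(w, 0.0) for w in set(text.lower().split()))
    return score > 0


def agreement(weights, coded_sample):
    """Fraction of a reviewer-coded sample on which the prediction matches the reviewer."""
    hits = sum(predict(weights, text) == label for text, label in coded_sample)
    return hits / len(coded_sample)


# Hypothetical seed set coded by a senior attorney.
seed = [
    ("maternity leave denied promotion", True),
    ("pregnancy pay cut complaint", True),
    ("lunch menu friday", False),
    ("office parking schedule", False),
]
weights = train(seed)
```

In this toy setup, `predict(weights, "promotion denied after maternity leave")` codes the document as relevant, and `agreement` plays the role of the check that the system's predictions and the reviewer's coding "sufficiently coincide" before the remaining documents are coded automatically.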

Properly implemented, predictive coding technology quickly narrows the pool of potentially relevant electronic documents and obviates the need for time-consuming, expensive, manual review of hundreds of thousands, if not millions, of documents. Indeed, Judge Peck recognized that in a case involving large quantities of documents, a “linear manual review is simply too expensive.”5

Independent studies suggest that computer-assisted review is as effective as, if not more effective than, conventional e-discovery tools, such as keyword and conceptual searches followed by manual, individual document review. Judge Peck noted that “while some lawyers still consider manual review to be the ‘gold standard,’ that is a myth, as statistics clearly show that computerized searches are at least as accurate, if not more so, than manual review.”6 Citing several judicial opinions criticizing the “keyword searches” commonly used by e-discovery practitioners, Judge Peck observed that having parties select keywords to cull potentially responsive records resembles the child’s game “Go Fish.” He noted that such searches are at once over-inclusive (locating numerous irrelevant documents) and under-inclusive (one cited study suggested that keywords returned only 20 percent of relevant documents).7
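In information-retrieval terms, "over-inclusive" corresponds to low precision and "under-inclusive" to low recall. The sketch below shows how the two measures are computed. The function name and the document counts are hypothetical; only the 20 percent recall figure echoes the study mentioned above.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a search result measured against the truly relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    # Precision: share of returned documents that are relevant (low means over-inclusive).
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: share of relevant documents that were returned (low means under-inclusive).
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical numbers: 100 relevant documents (ids 0-99); a keyword search
# returns 200 documents (ids 80-279), only 20 of which are relevant.
precision, recall = precision_recall(range(80, 280), range(100))
```

With these made-up counts, the search is both over-inclusive (precision of 0.1) and under-inclusive (recall of 0.2).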

The Predictive Coding Protocol Employed in Da Silva Moore

Da Silva Moore is a putative class action brought by female employees of “one of the world’s ‘big four’ advertising conglomerates,” alleging claims of gender and pregnancy discrimination. Defendant MSL collected approximately 3 million documents from custodians agreed upon by the parties. The parties agreed generally with Defendant’s plan to use predictive coding, but Plaintiffs objected to the specific, proposed methodology. Judge Peck ordered the parties to submit an ESI protocol, including predictive coding protocols, to which Plaintiffs objected.

The parties’ ESI protocol is noteworthy for its collaborative approach. The Court required that the “seed set” of documents reviewed and issue-coded by Defendant’s counsel to “train” the system include a 2,399-document random sample plus other relevant documents identified by keyword searches suggested by both Defendant and Plaintiffs. The parties agreed to run seven “iterative rounds,” in each of which defense counsel would review a sample of 500 documents to test whether the computer was accurately returning relevant documents. At the end of this process, Defendant would review another sample of documents coded as nonrelevant as a quality control check. The protocol also required Defendant to provide its adversary with all documents (less privileged documents), including relevance and issue tag codes, at each stage of the process.8
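The article does not say how the 2,399 figure was chosen. As an illustration only, a sample of about that size results from a standard sample-size calculation (Cochran's formula with a finite population correction) if one assumes a 95 percent confidence level (z = 1.96), a margin of error of plus or minus 2 percent, a worst-case 50 percent prevalence, and a population of roughly 3 million documents. Those parameters and the function name are assumptions, not details taken from the text above.

```python
def sample_size(population, z=1.96, margin=0.02, p=0.5):
    """Cochran's sample-size formula with finite population correction.

    Defaults are assumed for illustration: 95% confidence (z = 1.96),
    +/-2% margin of error, and worst-case prevalence p = 0.5.
    """
    n0 = (z ** 2) * p * (1 - p) / margin ** 2  # infinite-population sample size
    return n0 / (1 + (n0 - 1) / population)    # correct for the finite collection


# Roughly 3 million collected documents, as described in the article.
n = sample_size(3_000_000)
```

Under these assumptions, `round(n)` is 2399. The result barely changes with the size of the collection, which is why a sample of a few thousand documents can be used to estimate relevance rates across millions.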

Judge Peck’s Ruling

Judge Peck found that predictive coding technology furthers the goal of Rule 1 of the Federal Rules of Civil Procedure to “secure the just, speedy, and inexpensive determination” of lawsuits, as well as the “proportionality doctrine” set forth in Rule 26(b)(2)(C).9 He concluded that predictive coding was appropriate in Da Silva Moore in light of: (1) the parties’ agreement to use predictive coding; (2) the vast amount of ESI to be reviewed; (3) the superiority of computer-assisted review to the alternatives; (4) the need for cost effectiveness and proportionality under Rule 26(b)(2)(C); and (5) the transparent process proposed by Defendant.10 The Court also emphasized the importance of the parties embracing transparency when approaching an ESI protocol and “highly recommend[ed] that counsel in future cases be willing to at least discuss, if not agree to, such transparency in the computer-assisted review process.”11

Lessons for the Future

Judge Peck’s opinion — intended for a broader audience than the parties before him — provides litigants with four “lessons for the future” when considering implementing predictive coding technology:

  1. With respect to predictive coding protocols, the process needs to be “quality control verified.” 
  2. e-Discovery consultants have a constructive role to play, including appearing at discovery hearings to address the Court when these issues arise (jokingly referred to as “bring your geek to court day”).12
  3. Consider “staging” discovery, by starting with the most relevant sources of data, as a useful way to control discovery costs.
  4. Parties requesting documents with existing knowledge of the producing party’s systems should cooperate with their adversary to identify potential sources of ESI and inform the adversary’s records review and collection. The Court encouraged both sides to engage in “strategic proactive disclosure of information” in order to achieve a more efficient discovery process.13

As the first official judicial acceptance of predictive coding technology, Judge Peck’s ruling in Da Silva Moore paves the way for clients to embrace and adopt this technology. Whether this technology is appropriate for discovery in any given case depends on a variety of factors, including the volume of documents at issue and the amount in controversy in the dispute. It was important to Judge Peck that the parties agreed to use predictive coding technology; he noted that if one party objects to predictive coding, it is a “slightly more difficult question” for a court to approve using this technology. But with Da Silva Moore on the books, parties with large datasets at issue have formidable support when asking courts to authorize this type of cost-saving technology, even over objections. Predictive coding technology also shows cost-saving promise for purely internal document reviews, such as internal investigations, that do not require buy-in from adversaries.



1Monique Da Silva Moore v. Publicis Groupe & MSL Group, No. 11 Civ. 1279 (ALC)(AJP), Dkt. No. 96 (Slip Op.) (S.D.N.Y. Feb. 24, 2012) (“Da Silva Moore”).
2Da Silva Moore, slip op. at 25-26.
3Id. at 3.
4Id. at 4.
5Id. at 18.
7Id. at 20.
8Id. at 9-12.
9Id. at 21-22.
10Id. at 22.
11Id. at 23. Plaintiffs promptly appealed Judge Peck’s order to the District Court. We anticipate that the District Court will defer to Judge Peck’s expertise and affirm the Magistrate Judge’s ruling.
12Id. at 23-24.
13Id. at 24.