
Under a plan dubbed "Project Panama," Anthropic bought millions of physical books, cut them apart to scan them for AI training, and then disposed of them.
The large-scale project, which the AI startup launched in early 2024 and tried to keep secret from the public, came to light through documents unsealed in a copyright lawsuit.
The plan, referred to as "Project Panama" in the company's internal communications, aimed to physically scan a large portion of the world's books.
An internal planning document, revealed last week when court records were unsealed, stated: "Project Panama is our effort to destructively scan all the books in the world. We don't want it to be known that we are working on this project."
MILLIONS OF DOLLARS SPENT
According to court documents published by the Washington Post, Anthropic spent millions of dollars over approximately one year to purchase millions of books, cut their bindings, scan their pages, and use them to train its Claude language models.
Details about Project Panama, which had not previously been made public, emerged in a 4,000-page document filed in the copyright lawsuit brought against Anthropic by authors. The company, valued by investors at $183 billion, settled the lawsuit for $1.5 billion in August. However, a federal judge's decision last week to unseal the documents revealed in detail how aggressively Anthropic pursued books as training data.
These documents, along with records from similar lawsuits against other major tech companies like Meta, Google, and OpenAI, show how far AI companies have gone to acquire massive data sets to "train" their software.
"BOOKS WERE A GOLDMINE"
According to court records, books were considered a critical resource for AI companies. In a January 2023 document, one of Anthropic's co-founders argued that models trained on books could "learn to write well" instead of mimicking "low-quality internet language."
An internal Meta email from 2024 described access to digital book archives as "vital" for competing with rival AI companies.
However, the documents indicate that companies did not find it practical to obtain direct permission from publishers and authors. According to the lawsuits, Anthropic, Meta, and others sought ways to acquire books en masse without the authors' knowledge, including downloading pirated copies.
The documents also show that Meta employees were repeatedly warned that unauthorized downloading of millions of books could constitute copyright infringement. A December 2023 email stated that this practice was approved "after an escalation to MZ." "MZ" is believed to refer to Meta CEO Mark Zuckerberg.
LIBGEN DOWNLOADS
According to a newly unsealed court document, Anthropic co-founder Ben Mann spent 11 days in June 2021 downloading fiction and non-fiction books from the pirate library LibGen. Screenshots filed in the case show Mann using file-sharing software.
Anthropic, in its defense submitted to the court, stated that LibGen data was not used to train revenue-generating commercial models.
WHAT DO THE COURTS SAY?
Google, Microsoft, and OpenAI face similar copyright lawsuits. While most of these cases are ongoing, in two early decisions judges ruled that using books for AI training can, in some cases, be legal under copyright law's "fair use" doctrine.
In June, Judge William Alsup stated that Anthropic used the books in a "transformative" way, likening the process to teachers teaching students how to write. Around the same time, Judge Vince Chhabria, who presided over the Meta case, ruled that the authors could not prove that Meta's AI models harmed book sales.
However, companies can still face liability over how they acquired the books. Alsup also ruled that Anthropic's downloading and storing of millions of pirated books could constitute copyright infringement.
As a result, the judge allowed authors to pursue a class-action lawsuit over the books Anthropic downloaded from shadow libraries such as LibGen. The company agreed to pay $1.5 billion without admitting wrongdoing. Authors may claim approximately $3,000 per book.
HOW DID THEY GET SO MANY BOOKS?
According to the documents, the company considered obtaining books from libraries or used bookstores. The documents show that New York's famous Strand Bookstore was among the options, but Strand stated that it did not sell books to Anthropic.
Anthropic eventually purchased millions of books, often in batches of tens of thousands. According to the documents, the books' bindings were cut off with hydraulic cutting machines, the pages were digitized with high-speed scanners, and the remains were then sent for recycling.