Nvidia sued for using copyrighted books to train AI framework

Chip giant Nvidia has been sued in a US court for using copyrighted books to train its AI framework NeMo without seeking permission from their authors. Three authors, Brian Keene, Abdi Nazemian and Stewart O'Nan, have claimed that their books were part of a dataset of about 196,640 books used to train NeMo to simulate written language, according to a Reuters report.

Last October, after the authors raised a copyright infringement claim, three books were removed from the dataset: "Ghost Walk" by Brian Keene, "Like a Love Story" by Abdi Nazemian, and "Last Night at the Lobster" by Stewart O'Nan.

The three authors are now taking Nvidia to court and have filed a class action lawsuit in a San Francisco federal court, seeking unspecified damages. They believe that the takedown of their books shows that the company had used them to train NeMo.

The recent breakthrough in generative AI and the ensuing rush among enterprises to tap it have helped not just companies like OpenAI, which created ChatGPT, but also the likes of Nvidia, which provides the GPUs used to train and run such LLMs.

Last month, Nvidia's valuation reached $2 trillion, up from $1 trillion just nine months earlier, as demand for its chips soared on the back of the generative AI boom.

To speed up the adoption of generative AI, Nvidia released the NeMo AI framework last year and made it open source so that researchers can improve it. The framework provides training and inferencing tools for building generative AI models quickly and cost-effectively.

Copyright lawsuits are turning out to be a major concern for companies that build LLMs and need large volumes of data to train them.

In January 2023, a group of artists filed a class action suit in a California court against three tech firms, Stability AI, DeviantArt, and Midjourney, for using their copyrighted images without permission to train Stable Diffusion, another generative AI model, which converts text into images.

US-based lawyer Matthew Butterick, who filed the lawsuit on behalf of the artists Sarah Andersen, Kelly McKernan, and Karla Ortiz, said in a blog post that Stable Diffusion uses unauthorized copies of millions of copyrighted images without the knowledge or consent of the artists.

A few months before that, in November 2022, a lawsuit was filed against GitHub, Microsoft, and OpenAI for using public GitHub repositories to train OpenAI’s Codex, the underlying model behind GitHub Copilot, which can write code based on text prompts.

The lawsuit claims that the three firms have violated the legal rights of a vast number of creators who posted their code on GitHub under certain open-source licenses.