Staff Writer

Nvidia sued for using copyrighted books to train AI framework


Nvidia headquarters

Chip giant Nvidia has been sued in a US court for using copyrighted books to train its AI framework NeMo without seeking permission from their authors. Three authors, Brian Keene, Abdi Nazemian and Stewart O'Nan, claim that their books were part of a dataset of about 196,640 books used to train NeMo to simulate written language, according to a Reuters report.

 

Last October, after the authors raised a copyright infringement claim, three books were removed from the dataset: "Ghost Walk" by Brian Keene, "Like a Love Story" by Abdi Nazemian, and "Last Night at the Lobster" by Stewart O'Nan.


The three authors are now taking Nvidia to court and have filed a class action lawsuit seeking unspecified damages in a San Francisco federal court. They believe that the takedown of their books shows that the company had used them for training NeMo.


The recent breakthrough in generative AI and the ensuing frenzy among enterprises to tap it have helped not just companies like OpenAI, which created ChatGPT, but also the likes of Nvidia, which provides the GPUs used to train and run such LLMs.

Last month, Nvidia's valuation reached $2 trillion, up from $1 trillion just nine months earlier, as demand for its chips soared on the back of the generative AI boom.


To speed up the adoption of generative AI, Nvidia released the NeMo AI framework last year and made it open source so that researchers can improve it. The framework provides training and inference tools to build generative AI models quickly and cost-effectively.


Copyright lawsuits are turning out to be a major concern for companies that are building LLMs and need large volumes of data to train them. 


In January 2023, a group of artists filed a class action suit in a California court against three tech firms, Stability AI, DeviantArt, and Midjourney, for using their copyrighted images without permission to train Stable Diffusion, another generative AI model that converts text into images.


US-based lawyer Matthew Butterick, who filed the lawsuit on behalf of the artists Sarah Andersen, Kelly McKernan, and Karla Ortiz, said in a blog post that Stable Diffusion uses unauthorized copies of millions of copyrighted images without the knowledge or consent of the artists.


A few months before that, in November 2022, a lawsuit was filed against GitHub, Microsoft, and OpenAI for using public GitHub repositories to train OpenAI's Codex, the underlying model behind GitHub Copilot, which can write code based on text prompts.

The lawsuit claims that the three firms have violated the legal rights of a vast number of creators who posted their code on GitHub under certain open-source licenses.


