If the end of last year was like a dream come true for OpenAI and Microsoft and their artificial intelligence, the end of this year can be seen as nothing but a nightmare. It started with the change at the top and the saga of the firing and reinstatement of Sam Altman, but will end with the first lawsuit for copyright infringement.
The New York Times has filed a lawsuit in New York District Court alleging that OpenAI used millions of their articles to train its large language models. Microsoft used publisher content to train the model behind Microsoft Copilot. With this, the defendants violated the intellectual property right.
The New York Times ranks third in the representation of content on which GPT has been trained. Only Wikipedia and the US patent database are ahead of them. The Common Crawl dataset includes 16 million unique pieces of content from The Times, from News, Cooking, Wirecutter and Athletic, and over 66 million records of content from The Times.
TAs a result, ChatGPT, as well as Bing Chat, can transmit entire sections of text without even a comma error. Some of this content is available only to subscribers, but now, it’s possible to anyone who can come up with a “prompt” for ChatGPT to run the news. This is the second point in the lawsuit in which the New York Times accuses Open AI and Microsoft of doing serious damage to quality journalism.
Practically, ChatGPT and Bing Chat disable content protection and monetization. Instead of a subscription, people will always be able to visit either of the two assistants and will receive exactly the same information. This will significantly reduce the income of the New York Times, but also of all other news agencies. On the other hand, AI models make boatload of money from content that they didn’t have to pay anything for.
The New York Times spent the last few months trying to find a way for both companies, Microsoft and OpenAI, to pay for the use of their content. So far there is no solution, and as a result, the Times is now seeking billions of dollars in copyright infringement and damages.
The publisher is asking OpenAI and Microsoft to remove all content the AI was trained with. In addition, they request that their content is no longer used to train artificial intelligence. Publishers are divided when it comes to AI and LLM. Some of them, like the BBC and the New York Times, have closed access to OpenAI crawlers. Others, such as the Associated Press, have signed agreements with OpenAI to allow access to AI training content.
Although the New York Times is the first publisher to sue OpenAI, similar lawsuits have already been served to the creators of ChatGPT. A group of writers including comedian Sarah Silverman filed a lawsuit in September for using their work to train artificial intelligence without permission. The same lawsuit against OpenAI was filed by another group of writers led by John Grisham and Game of Thrones author George R. R. Martin.