2023 marked the rise of generative AI, and 2024 could well be the year its makers reckon with the fallout of the technology's industry-wide arms race. OpenAI is currently pushing back aggressively against recent lawsuits' claims that its products, including ChatGPT, were illegally trained on copyrighted texts. What's more, the company is making some bold legal claims as to why its programs should have access to other people's work.
In a blog post published on January 8, OpenAI accused The New York Times of "not telling the full story" in the media company's major copyright lawsuit, filed late last month. OpenAI argues that its scraping of online works falls within the purview of "fair use." The company also claims it collaborates with various news organizations (though not, among others, The Times) on dataset partnerships, and dismisses any "regurgitation" of outside copyrighted material as a "rare bug" it is working to eliminate. The company attributes such regurgitation to "memorization" issues, which become more common when content appears multiple times within training data, such as when it can be found on "lots of different public websites."
"The principle that training AI models is permitted as a fair use is supported by a wide range of [people and organizations]," OpenAI representatives wrote in Monday's post, linking to comments recently submitted to the US Copyright Office by several academics, startups, and content creators.
In a letter of support, for example, the language-learning software company Duolingo wrote that it believes "Output generated by an AI trained on copyrighted materials should not automatically be considered infringing—just as a work by a human author would not be considered infringing merely because the human author had learned how to write through reading copyrighted works."…