Lists containing the names of more than 16,000 artists allegedly used to train the Midjourney generative artificial intelligence (AI) programme have gone viral online, reinvigorating debates on copyright and consent in AI image creation. Among the names are Frida Kahlo, Walt Disney and Yayoi Kusama.
Outrage among artists on X (formerly Twitter) was first provoked by the posting of a Google spreadsheet named “Midjourney Style List”, supposedly retrieved from Midjourney developers during a process of refining the programme’s ability to mimic works of specific artists and styles. While access to the web document (which remains partially visible on the Internet Archive) was swiftly restricted, many of the artists and prompts which appeared also feature in publicly accessible court documents for a 2023 class-action lawsuit, within a 25-page list of names referenced in training images for the Midjourney programme.
Even though the practice of using human artists’ work without their permission to train generative AI programmes remains in uncertain legal territory, controversies surrounding documents like the “Midjourney Style List” shed light on the actual processes of converting copyrighted artwork into AI reference material.
In a series of posts on X, the artist Jon Lam (who works for the video-game developer Riot Games) shared screenshots of a chat in which Midjourney developers purportedly discuss preloading artist names and styles into the programme from Wikipedia and other sources, guaranteeing that selected artists’ work would be available for mimicry and prominently featured as reference material for image creation. One screenshot features an apparent post by Midjourney’s chief executive, David Holz, in which he welcomes the addition of 16,000 artists to the programme’s training. Another contains a message in which a chat member sarcastically addresses the issue of copyright, saying that “all you have to do is just use those scraped datasets and the [sic] conveniently forget what you used to train the model. Boom legal problems solved forever”. (Four members of the group responded to this with an enthusiastically affirmative “100” emoji.)
The “scraped” datasets mentioned in the chat are a central feature of the class-action lawsuit, also gaining attention online, which seeks to win compensation from Stability AI, Midjourney and DeviantArt for the non-consensual use of human artists’ work in training generative AI programmes. While the original lawsuit was partially dismissed by a federal judge in October for being “defective in numerous respects”, it was amended and refiled in November, adding several plaintiffs to the suit as well as the video generator Runway AI to the list of defendants.
Lam has urged artists who found their names among the list of more than 16,000 to sign on as additional plaintiffs, saying: “Gen AI techbros would have you believe the lawsuit is dead or thrown out, no, the lawsuit is still alive and well, and more evidence and plaintiffs have been added to the casefile.”
The updated case file notes that “the Court denied Stability AI’s attempt to dismiss plaintiffs’ most significant claim, namely the direct copyright-infringement claim for misappropriation of billions of images for AI training”. Midjourney’s attempt to dismiss the claim was also denied.
Central to the claim that Midjourney is guilty of copyright infringement is its programme’s use of the LAION-5B dataset, a collection of 5.85 billion images scraped from the internet, including copyrighted works. While all iterations of LAION were made public with the request that they “should only be used for academic research purposes”, the lawsuit alleges that Midjourney knowingly used the collection in its monetised services, training the company’s generative AI programme on LAION images. The case also claims that Midjourney’s use of Stability AI’s Stable Diffusion text-to-image software constitutes copyright infringement, as the programme was itself trained on a collection of uncredited, copyrighted works.
Tools for artists to combat copyright infringement have been mentioned in nearly all discussions of generative AI, with the University of Chicago’s Glaze programme among the most popular. With a stated goal of protecting artists from programmes like Midjourney and Stable Diffusion, Glaze alters the digital data of an image so that it “appears unchanged to human eyes, but appears to AI models like a dramatically different art style”. While imperfect, the free system has been increasingly recommended in response to new concerns about targeted style mimicry. After the “Midjourney Style List” surfaced, a post on X urging artists to “Glaze” their work received more than 1,000 likes and 400 reposts.
The website haveibeentrained.com has also been widely shared amongst artists, offering the opportunity to see whether one’s work has been included as a training image in a generative-AI programme. It also has a Do Not Train Registry, which lets artists exclude their works from cooperating datasets.