The thought of Deep Scratch ending up in a training dataset strikes me as a strange loop: it would fold right back into the book's own AI themes. But, alas, it probably would not be picked up.
The Books3 dataset contains 183,000 books downloaded from pirate sources. We know that companies such as Meta (creator of LLaMA), EleutherAI, and Bloomberg have used it to train their language models. OpenAI has not disclosed training details for GPT-3.5 or GPT-4, the models underlying ChatGPT, so we don't know whether it also used Books3. Either way, the class-action lawsuits against OpenAI should uncover more about the datasets it used, which we believe also include books obtained from pirate sources.
https://authorsguild.org/news/you-just-found-out-your-book-was-used-to-train-ai-now-what/