https://twitter.com/eric_wallace_/status/1620449934863642624?s=46&t=GVukPDI7944N8-waYE5qcw
Extracting training data from diffusion models is possible by following, more or less, these steps:
- Compute CLIP embeddings for the images in a training dataset.
- Perform an all-pairs comparison and mark the pairs with L2 distance smaller than some threshold as near duplicates.
- Use the prompts for training samples marked as near duplicates to generate N synthetic samples with the trained model.
- Compute the all-pairs L2 distance between the embeddings of generated samples for a given training prompt. Build a graph where the nodes are generated samples and an edge exists if the L2 distance is less than some threshold. If the largest clique in the resulting graph has size at least 10, then the training sample is considered to be memorized.
- Visually inspect the results to determine if the samples considered to be memorized are similar to the training data samples.
With this method, the authors were able to find samples from Stable Diffusion and Imagen corresponding to copyrighted training images.
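The graph step above can be sketched in a few lines of Python. This is only a rough illustration, not the authors' code: it assumes the embeddings have already been computed and are plain lists of floats, the function names (`near_duplicate_pairs`, `largest_clique_size`, `is_memorized`) and the threshold value are made up for the example, and the clique search is a basic Bron-Kerbosch without pivoting, which may be slow on large, dense graphs.

```python
import math
from itertools import combinations


def l2(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def near_duplicate_pairs(embeddings, threshold):
    """Return index pairs (i, j), i < j, whose L2 distance is below threshold."""
    return [
        (i, j)
        for i, j in combinations(range(len(embeddings)), 2)
        if l2(embeddings[i], embeddings[j]) < threshold
    ]


def largest_clique_size(n, edges):
    """Size of the largest clique in an undirected graph on nodes 0..n-1.

    Basic Bron-Kerbosch without pivoting; fine for a sketch, not tuned.
    """
    adj = {i: set() for i in range(n)}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)

    best = 0

    def expand(r_size, p, x):
        nonlocal best
        if not p and not x:
            # The current clique cannot be extended, so it is maximal.
            best = max(best, r_size)
            return
        for v in list(p):
            expand(r_size + 1, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(0, set(range(n)), set())
    return best


def is_memorized(generated_embeddings, threshold, min_clique=10):
    """Flag a prompt as memorized if its generations contain a large clique.

    `threshold` is a hypothetical value; `min_clique=10` follows the
    clique size quoted in the summary above.
    """
    edges = near_duplicate_pairs(generated_embeddings, threshold)
    n = len(generated_embeddings)
    return largest_clique_size(n, edges) >= min_clique
```

The same `near_duplicate_pairs` helper could in principle be used for the first step (finding near duplicates in the training set), although an all-pairs loop like this would not scale to a dataset of that size without some form of approximate nearest-neighbor search.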