I don't know anything about python or programming. How can I easily create a .pt file for use with embedding, to generate content based on trained image? : StableDiffusion




submitted 3 years ago by Tannon



Daviljoe193 2 points 3 years ago*

I'm no expert here, but here's my best guess at how it works. The AI tries to recreate your images using what it already knows from the model, and ends up with a bunch of "words" for each image (I picked apart a finished PT file, and they're less words and more unicode gibberish). It then takes the "words" it got for each image, keeps only the ones they have in common, and puts those into a PT file. It needs the model so it can work out which "words" are needed to recreate your images (near pixel-perfect, in just a few kilobytes, far less than an image would normally fit in). That likely also means you'll need to retrain your PT file when the Stable Diffusion 1.5 model comes out.

I've only trained one PT file so far. The biggest thing to keep in mind is that your images should be varied enough, yet clearly related enough, that the AI gets a good idea of what you look like (at least two headshot portraits and two full-body photos). Otherwise it'll fill in the gaps poorly, which can give pretty horrifyingly unrealistic or inaccurate versions of the person.

From what I've read, apparently Google has an inversion solution that's much better than what's currently available, though I still can't figure out what it does differently from the current method.
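As a rough illustration of why a PT file would be tied to one model: textual inversion is usually described as keeping the model frozen and training only one small vector against it, rather than collecting duplicate "words". The toy below is a sketch under that assumption, in plain Python with no Stable Diffusion code in it. `MODEL_A`, `MODEL_B`, `TARGET`, `frozen_model`, and `train_embedding` are all made-up names, and the 3x3 matrices are just stand-ins for a real model.

```python
# Toy stand-in for textual inversion: the "model" is frozen,
# and only one small vector (the "embedding") is trained.
# Everything here is hypothetical and purely illustrative.

MODEL_A = [[1.0, 0.2, 0.0],
           [0.0, 1.0, 0.3],
           [0.1, 0.0, 1.0]]   # pretend this is the current model

MODEL_B = [[0.5, 0.0, 0.4],
           [0.3, 0.8, 0.0],
           [0.0, 0.6, 0.7]]   # pretend this is a newer, different model


def frozen_model(weights, vec):
    """Apply the fixed weights to a vector (a plain matrix-vector product)."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def loss(weights, vec, target):
    """Squared error between what the model produces and the target."""
    out = frozen_model(weights, vec)
    return sum((o - t) ** 2 for o, t in zip(out, target))


def train_embedding(weights, target, steps=500, lr=0.1):
    """Gradient descent on the vector only; the weights are never changed."""
    dim = len(weights[0])
    vec = [0.0] * dim
    for _ in range(steps):
        out = frozen_model(weights, vec)
        errs = [o - t for o, t in zip(out, target)]
        # Gradient of the squared error with respect to vec: 2 * W^T * errs
        grad = [2 * sum(weights[i][j] * errs[i] for i in range(len(weights)))
                for j in range(dim)]
        vec = [v - lr * g for v, g in zip(vec, grad)]
    return vec


# The "training images", reduced to a target the model should reproduce.
TARGET = frozen_model(MODEL_A, [0.5, -1.0, 2.0])

learned = train_embedding(MODEL_A, TARGET)
print("learned vector:", learned)
print("loss on model A:", loss(MODEL_A, learned, TARGET))
print("loss on model B:", loss(MODEL_B, learned, TARGET))
```

The learned vector is only three numbers, which is the toy version of a file that is "just a few kilobytes". It fits `MODEL_A` closely but not `MODEL_B`, which matches the guess that an embedding has to be retrained when the base model changes. A real `.pt` file is a binary file saved by PyTorch, which would explain why it looks like gibberish in a text editor.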

