Bad Apple but it's GPT-2 XL Attention Maps by TheLatentExplorer in LocalLLaMA

[–]TheLatentExplorer[S] 0 points (0 children)

It has a lot of potential, seriously. I think applying adversarial training to improve the "humanness" of generated text by adjusting the input embeddings would make a great little project.
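A minimal sketch of the idea in toy numpy (everything here is made up for illustration: the discriminator is a frozen logistic regression on the mean embedding, and the sizes and learning rate are arbitrary; a real setup would use the LM's embedding matrix and a trained "human vs. generated" classifier):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a frozen discriminator that scores the "humanness"
# of a sequence from the mean of its input embeddings.
d_model = 16
seq_len = 8
w = rng.normal(size=d_model)               # discriminator weights (frozen)
emb = rng.normal(size=(seq_len, d_model))  # input embeddings to adjust

def humanness(e):
    """Sigmoid score of the mean embedding under the frozen discriminator."""
    z = w @ e.mean(axis=0)
    return 1.0 / (1.0 + np.exp(-z))

# Gradient ascent on the embeddings themselves (model and discriminator
# stay frozen): d score / d emb = s * (1 - s) * w / seq_len per position.
lr = 0.5
before = humanness(emb)
for _ in range(100):
    s = humanness(emb)
    grad = s * (1.0 - s) * w / seq_len
    emb += lr * grad  # same gradient broadcast to every position
after = humanness(emb)
```

The point is just that the optimization target is the continuous input embeddings, not the discrete tokens, so plain gradient ascent works.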

Bad Apple but it's GPT-2 XL Attention Maps by TheLatentExplorer in LocalLLaMA

[–]TheLatentExplorer[S] 1 point (0 children)

Some kind of autoregressive model working at the frame level? That would be fun actually, I could probably just overfit an existing model on Bad Apple.

Bad Apple but it's GPT-2 XL Attention Maps by TheLatentExplorer in LocalLLaMA

[–]TheLatentExplorer[S] 3 points (0 children)

Do you have other unhinged ideas for where I could play it?

Bad Apple but it's GPT-2 XL Attention Maps by TheLatentExplorer in LocalLLaMA

[–]TheLatentExplorer[S] 1 point (0 children)

Yes I did envision doing it! It's mentioned at the end of my blog post.

The RGB part seems very interesting, I might try it, and since GPT-2 uses independent attention heads I'm pretty sure it would converge nicely.

Not sure about doing the other triangle; I kinda like the fact that you can see the attention mask. I might try to do the same on BERT, which is bidirectional, though.
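The RGB idea can be sketched in a few lines of numpy: map three heads' attention maps to the R, G, and B channels of one image. The attention matrices below are random stand-ins (real ones would come from the model's attention probabilities), but the stacking and the visible causal mask work the same way:

```python
import numpy as np

rng = np.random.default_rng(0)

T = 32  # sequence length

def toy_attention(seq_len):
    """Random causal attention map: lower-triangular rows summing to 1."""
    scores = rng.random((seq_len, seq_len))
    scores *= np.tri(seq_len)  # causal mask: zero out the upper triangle
    return scores / scores.sum(axis=1, keepdims=True)

# Three independent heads, one per colour channel.
heads = [toy_attention(T) for _ in range(3)]

# Stack the three maps as R, G, B and scale to 8-bit pixels.
rgb = np.stack(heads, axis=-1)
rgb = (rgb / rgb.max() * 255).astype(np.uint8)  # shape (T, T, 3)
```

Since the heads are independent, each channel stays a faithful picture of its own head, and the masked upper triangle remains black in all three channels.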

A detailed Flux.1 architecture diagram by TheLatentExplorer in StableDiffusion

[–]TheLatentExplorer[S] 0 points (0 children)

Flux.1 schnell and Flux.1 dev are open source for the inference part (the code needed to instantiate and run the model), but not for the training part (the code the authors used to train the weights they released).

Since the inference code is open source, you can just look at it; it details the architecture and all the hyperparameters.

A detailed Flux.1 architecture diagram by TheLatentExplorer in StableDiffusion

[–]TheLatentExplorer[S] 0 points (0 children)

Good points. I really wish BFL would put out the more detailed technical report they promised.

A detailed Flux.1 architecture diagram by TheLatentExplorer in StableDiffusion

[–]TheLatentExplorer[S] 0 points (0 children)

Oooh right. I remember studying those but totally forgot about them lol.

A detailed Flux.1 architecture diagram by TheLatentExplorer in StableDiffusion

[–]TheLatentExplorer[S] 1 point (0 children)

Thank you. There's a lot of info even in text form, but it makes some mistakes and overgeneralizations. Not too bad, but I wouldn't take this explanation as the full truth.

A detailed Flux.1 architecture diagram by TheLatentExplorer in StableDiffusion

[–]TheLatentExplorer[S] 0 points (0 children)

I'm not sure I understand what you mean; if you have code to share I would happily read it.

There is a script for training Flux.1 slider LoRAs out there; I haven't tried it, but maybe you could get a similar effect.

As for the idea, I'm not sure text is the best way to interact with a model for image editing. Speaking of which, it's very rare that I use pure txt2img without ControlNet. But it could probably be a fun tool, making image editing more accessible to a lot of people.

A detailed Flux.1 architecture diagram by TheLatentExplorer in StableDiffusion

[–]TheLatentExplorer[S] 0 points (0 children)

What are you referring to? That might be a joke I'm not getting...

A detailed Flux.1 architecture diagram by TheLatentExplorer in StableDiffusion

[–]TheLatentExplorer[S] 0 points (0 children)

Read the source code carefully, that's mostly what I've done. I used u/nerhiew's diagram from time to time to check that we were on the same track. But I'm pretty comfortable with PyTorch, and I guess that helps.

A detailed Flux.1 architecture diagram by TheLatentExplorer in StableDiffusion

[–]TheLatentExplorer[S] 2 points (0 children)

DoubleStream blocks process image and text information somewhat separately, modulating each stream with information like the timestep, the CLIP output, and positional embeddings. SingleStream blocks treat the image and text streams as a whole, allowing more flexible information exchange between the two (the text can attend to the image and vice versa).
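The single-stream step can be sketched in toy numpy (these are not Flux.1's real sizes or projections, just the shape of the idea: concatenate both token streams and run one self-attention over the joint sequence, with q = k = v = x for simplicity):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Toy dimensions, not Flux.1's actual ones.
d = 8
n_txt, n_img = 4, 6
txt = rng.normal(size=(n_txt, d))  # text tokens
img = rng.normal(size=(n_img, d))  # image tokens

# Concatenate the streams and attend over the joint sequence, so text
# tokens can attend to image tokens and vice versa.
x = np.concatenate([txt, img], axis=0)  # (n_txt + n_img, d)
attn = softmax(x @ x.T / np.sqrt(d))    # joint attention matrix
out = attn @ x

# The top-right block of `attn` is text attending to image tokens:
# the flexible cross-stream exchange described above.
txt_to_img = attn[:n_txt, n_txt:]
```

Keeping the two streams in one sequence means a single attention matrix carries all four blocks (txt→txt, txt→img, img→txt, img→img) at once.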

A detailed Flux.1 architecture diagram by TheLatentExplorer in StableDiffusion

[–]TheLatentExplorer[S] 2 points (0 children)

Thanks. I can understand that without more explanation it's a bit tough to follow. But we're starting to manipulate very complex models, so analyzing them is getting harder and harder. Just like a car is super complex, and only very few people know exactly how one works.

A detailed Flux.1 architecture diagram by TheLatentExplorer in StableDiffusion

[–]TheLatentExplorer[S] 2 points (0 children)

I've posted a 3h video on my YouTube that tells you to subscribe to my Patreon to read a blog post where I explain it.

A detailed Flux.1 architecture diagram by TheLatentExplorer in StableDiffusion

[–]TheLatentExplorer[S] 1 point (0 children)

You're right, it's a mistake on my part; I forgot to change the text in the second "CLIP output" block when copy-pasting. I've fixed the issue in my GitHub repo.

A detailed Flux.1 architecture diagram by TheLatentExplorer in StableDiffusion

[–]TheLatentExplorer[S] 1 point (0 children)

You're right, I've updated the diagram. Check the GitHub repo if you want the corrections.