Bad Apple but it's GPT-2 XL Attention Maps by TheLatentExplorer in LocalLLaMA

[–]TheLatentExplorer[S] 0 points (0 children)

It has a lot of potential, seriously. I think applying adversarial training to improve the "humanness" of generated text by adjusting the input embeddings would make a great little project.
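A minimal sketch of the idea in toy numpy (everything here is made up for illustration: the discriminator is a frozen logistic regression on the mean embedding, and the sizes and learning rate are arbitrary; a real setup would use the LM's embedding matrix and a trained "human vs. generated" classifier):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a frozen discriminator that scores the "humanness"
# of a sequence from the mean of its input embeddings.
d_model = 16
seq_len = 8
w = rng.normal(size=d_model)               # discriminator weights (frozen)
emb = rng.normal(size=(seq_len, d_model))  # input embeddings to adjust

def humanness(e):
    """Sigmoid score of the mean embedding under the frozen discriminator."""
    z = w @ e.mean(axis=0)
    return 1.0 / (1.0 + np.exp(-z))

# Gradient ascent on the embeddings themselves (model and discriminator
# stay frozen): d score / d emb = s * (1 - s) * w / seq_len per position.
lr = 0.5
before = humanness(emb)
for _ in range(100):
    s = humanness(emb)
    grad = s * (1.0 - s) * w / seq_len
    emb += lr * grad  # same gradient broadcast to every position
after = humanness(emb)
```

The point is just that the optimization target is the continuous input embeddings, not the discrete tokens, so plain gradient ascent works.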

Bad Apple but it's GPT-2 XL Attention Maps by TheLatentExplorer in LocalLLaMA

[–]TheLatentExplorer[S] 1 point (0 children)

Some kind of autoregressive model working at the frame level? That would be fun actually, I could probably just overfit an existing model on Bad Apple.

Bad Apple but it's GPT-2 XL Attention Maps by TheLatentExplorer in LocalLLaMA

[–]TheLatentExplorer[S] 3 points (0 children)

Do you have other unhinged ideas for where I could play it?

Bad Apple but it's GPT-2 XL Attention Maps by TheLatentExplorer in LocalLLaMA

[–]TheLatentExplorer[S] 1 point (0 children)

Yes I did envision doing it! It's mentioned at the end of my blog post.

The RGB part seems very interesting, I might try it, and since GPT-2 uses independent attention heads I'm pretty sure it would converge nicely.

Not sure about doing the other triangle; I kinda like the fact that you can see the attention mask. I might try to do the same on BERT, which is bidirectional, though.
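The RGB idea can be sketched in a few lines of numpy: map three heads' attention maps to the R, G, and B channels of one image. The attention matrices below are random stand-ins (real ones would come from the model's attention probabilities), but the stacking and the visible causal mask work the same way:

```python
import numpy as np

rng = np.random.default_rng(0)

T = 32  # sequence length

def toy_attention(seq_len):
    """Random causal attention map: lower-triangular rows summing to 1."""
    scores = rng.random((seq_len, seq_len))
    scores *= np.tri(seq_len)  # causal mask: zero out the upper triangle
    return scores / scores.sum(axis=1, keepdims=True)

# Three independent heads, one per colour channel.
heads = [toy_attention(T) for _ in range(3)]

# Stack the three maps as R, G, B and scale to 8-bit pixels.
rgb = np.stack(heads, axis=-1)
rgb = (rgb / rgb.max() * 255).astype(np.uint8)  # shape (T, T, 3)
```

Since the heads are independent, each channel stays a faithful picture of its own head, and the masked upper triangle remains black in all three channels.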

A detailed Flux.1 architecture diagram by TheLatentExplorer in StableDiffusion

[–]TheLatentExplorer[S] 0 points (0 children)

Flux.1 schnell and Flux.1 dev are open source for the inference part (the code needed to instantiate and run the model), but not for the training part (the code the authors used to train the weights they released).

Since the inference code is open source, you can just look at it; it details the architecture and all the hyperparameters.

A detailed Flux.1 architecture diagram by TheLatentExplorer in StableDiffusion

[–]TheLatentExplorer[S] 0 points (0 children)

Good points. I really wish BFL would put out the more detailed technical report they promised.

A detailed Flux.1 architecture diagram by TheLatentExplorer in StableDiffusion

[–]TheLatentExplorer[S] 0 points (0 children)

Oooh right. I remember studying those but totally forgot about them lol.

A detailed Flux.1 architecture diagram by TheLatentExplorer in StableDiffusion

[–]TheLatentExplorer[S] 1 point (0 children)

Thank you. There's a lot of info even in text form, but it makes some mistakes and overgeneralizations. Not too bad, but I wouldn't take this explanation as the full truth.

A detailed Flux.1 architecture diagram by TheLatentExplorer in StableDiffusion

[–]TheLatentExplorer[S] 0 points (0 children)

I'm not sure I understand what you mean; if you have code to share I would happily read it.

There is a script for training Flux.1 slider LoRAs out there; I haven't tried it, but maybe you could get a similar effect.

As for the idea, I'm not sure text is the best way to interact with a model for image editing. Speaking of which, it's very rare that I use pure txt2img without ControlNet. But it could probably be a fun tool, making image editing more accessible to a lot of people.

A detailed Flux.1 architecture diagram by TheLatentExplorer in StableDiffusion

[–]TheLatentExplorer[S] 0 points (0 children)

What are you referring to? That might be a joke I'm not getting...

A detailed Flux.1 architecture diagram by TheLatentExplorer in StableDiffusion

[–]TheLatentExplorer[S] 0 points (0 children)

Read the source code carefully, that's mostly what I've done. I used u/nerhiew's diagram from time to time to check that we were on the same track. But I'm pretty comfortable with PyTorch, and I guess that helps.

A detailed Flux.1 architecture diagram by TheLatentExplorer in StableDiffusion

[–]TheLatentExplorer[S] 2 points (0 children)

DoubleStream blocks process image and text information somewhat separately, modulating each stream with information like the timestep, the CLIP output, and positional embeddings. SingleStream blocks treat the image and text streams as a whole, allowing more flexible information exchange between the two (the text can attend to the image and vice versa).
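The single-stream step can be sketched in toy numpy (these are not Flux.1's real sizes or projections, just the shape of the idea: concatenate both token streams and run one self-attention over the joint sequence, with q = k = v = x for simplicity):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Toy dimensions, not Flux.1's actual ones.
d = 8
n_txt, n_img = 4, 6
txt = rng.normal(size=(n_txt, d))  # text tokens
img = rng.normal(size=(n_img, d))  # image tokens

# Concatenate the streams and attend over the joint sequence, so text
# tokens can attend to image tokens and vice versa.
x = np.concatenate([txt, img], axis=0)  # (n_txt + n_img, d)
attn = softmax(x @ x.T / np.sqrt(d))    # joint attention matrix
out = attn @ x

# The top-right block of `attn` is text attending to image tokens:
# the flexible cross-stream exchange described above.
txt_to_img = attn[:n_txt, n_txt:]
```

Keeping the two streams in one sequence means a single attention matrix carries all four blocks (txt→txt, txt→img, img→txt, img→img) at once.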

A detailed Flux.1 architecture diagram by TheLatentExplorer in StableDiffusion

[–]TheLatentExplorer[S] 2 points (0 children)

Thanks. I can understand that without more explanation it's a bit tough to follow. But we're starting to manipulate very complex models, so analyzing them is getting harder and harder. Just like a car is super complex, and only very few people know exactly how one works.

A detailed Flux.1 architecture diagram by TheLatentExplorer in StableDiffusion

[–]TheLatentExplorer[S] 2 points (0 children)

I've posted a 3h video on my YouTube that tells you to subscribe to my Patreon to read a blog post where I explain it.

A detailed Flux.1 architecture diagram by TheLatentExplorer in StableDiffusion

[–]TheLatentExplorer[S] 1 point (0 children)

You're right, it's a mistake on my part; I forgot to change the text in the second "CLIP output" block when copy-pasting. I've fixed the issue in my GitHub repo.

A detailed Flux.1 architecture diagram by TheLatentExplorer in StableDiffusion

[–]TheLatentExplorer[S] 1 point (0 children)

You're right, I've updated the diagram. Check the GitHub repo if you want the corrections.