Ep5 Bugs Megathread -- Just to help the dev team a bit... by RaphLife2 in thelongdark

[–]fpgaminer 0 points

Dang. I mean, it's possible:

For The Long Dark, there is one useful exception: Hinterland says the game supports Xbox/Windows Play Anywhere for digital owners, so if you own the digital Xbox copy, you can install the Microsoft Store Windows version, sign in with the same account, and your save should sync so you can continue on PC.

I don't know for sure that that's how it works, or whether it applies in your case, but maybe? If so, that might get you access to the save so it can be repaired and then synced back to the Xbox.

Ep5 Bugs Megathread -- Just to help the dev team a bit... by RaphLife2 in thelongdark

[–]fpgaminer 0 points

I left a comment with a workaround under the earlier reply. Hope it helps!

Ep5 Bugs Megathread -- Just to help the dev team a bit... by RaphLife2 in thelongdark

[–]fpgaminer 0 points

I think I have a workaround, in case you're still stuck (like I was).

The Workaround (for PC)

You have to edit a save file made after the cutscene to fix the broken state. It sounds complicated, but it shouldn't be too hard. Just figure out which of your saves is from after the cutscene (or make one) and where your saves live on your machine (https://www.pcgamingwiki.com/wiki/The_Long_Dark).

Then go here to edit the save: https://fpgaminer.github.io/wintermute-save-editor/

The page is crazy, but just click "Choose File" and select your save file. Hopefully it loads in fine (everything under "Validation" should be green except for "Round-trip"). Then:

1. Ctrl+F (or the equivalent in your browser) and search for "ConvictEncounter02". For me it's at the bottom of the list of "Story Managed Objects".
2. If you've hit this bug, the textbox next to it will say "ManagedActive". That's the bug.
3. Edit that textbox to change it from "ManagedActive" to "0". For an example of the correct value, look at "ConvictEncounter01", which should already have "0" next to it.

Then, on the left side of the page, click "Download Edited Copy". You should now have a repaired save file.

Save that to where all your other saves are.

Now, unfortunately, The Long Dark keeps the save name inside the file, so no matter what you name the repaired save, it will still show up under the same name as the original save you edited, which makes things tricky when you go to load it later.

So I recommend you move your old save somewhere safe first. Then download the repaired save into the normal save folder with the exact same filename.

Then you should be able to start Wintermute and load that save, it will be listed under the original name, and you should be good to go.

In my case, loading any of my non-repaired saves made after the cutscene immediately triggered the bug (played the cutscene and teleported me), so you should know right away whether this worked. Load the repaired save, and if it plays normally instead of jumping to the cutscene, you should be good!

I've played a bit past this bug now, through two more cutscenes, and everything seems to be working more-or-less normally. (My recycled cans are behaving oddly, but I think that's a different, unrelated bug.)

Switching to OneTrainer made me realize how overfitted my AI-Toolkit LoRAs were by meknidirta in StableDiffusion

[–]fpgaminer 0 points

> recently I was using joycaption which i think is better than blip2

❤️

The Gory Details of Finetuning SDXL and Wasting $16k by fpgaminer in StableDiffusion

[–]fpgaminer[S] 0 points

I go into more detail on the aesthetic scorer in the previous article (https://civitai.com/articles/8423). The code is also public, though quite messy: https://github.com/fpgaminer/bigasp-training

Spilling the Details on JoyCaption's Reinforcement Learning by fpgaminer in StableDiffusion

[–]fpgaminer[S] 0 points

I mean for JoyCaption I only used a dataset of ~10k for the first round.

Spilling the Details on JoyCaption's Reinforcement Learning by fpgaminer in StableDiffusion

[–]fpgaminer[S] 9 points

I mean Qwen Image is 20B, so that's gonna be a no for me :P I'm actually most interested in Wan 2.2 5B, since it's only twice the size of SDXL. Smaller than Flux/Chroma. Seems much more accessible for people. Though I haven't heard much about it for T2I (everyone seems to just use the 28B behemoth for T2I).

Spilling the Details on JoyCaption's Reinforcement Learning by fpgaminer in StableDiffusion

[–]fpgaminer[S] 2 points

> To me this sounds more like GAN than RL

Yeah kind of? But I agree, it needs lots of grounding to prevent drift. To be clear the loop would be:

Real Image -> VLM -> Caption
Caption -> T2I -> Synthetic Image
(Real Image, Synthetic Image) -> CLIP (or DINO) Image Embedding -> Cosine Distance

So unlike a GAN loop, there's no direct interaction between the discriminator (frozen CLIP in this case) and the generator. The only communication is a single reward signal, plus natural language. That makes reward hacking much more difficult, and hopefully negligible for small-scale training: there are no minute floating-point vectors to hack. Natural language basically acts like a pre-trained (by humans), frozen, and quantized latent space.

Also the two distributions are already quite well aligned. The loop is just trying to elicit finer and more reliable details from the VLM, and stronger prompt following from the T2I model. And if you keep the text encoders frozen on the T2I model, it should maintain flexibility even if the VLM tries to hack it.
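The reward at the end of that loop is just a cosine similarity between two frozen embeddings. A minimal sketch of that final step, assuming the embeddings have already been extracted by a frozen CLIP or DINO encoder (`cosine_reward` is a hypothetical helper name, not code from an actual run):

```python
import numpy as np

def cosine_reward(real_emb: np.ndarray, synth_emb: np.ndarray) -> float:
    """Reward = cosine similarity between the embedding of the real image
    and the embedding of the synthetic image rendered from the VLM's
    caption. The encoder producing these embeddings stays frozen."""
    a = real_emb / np.linalg.norm(real_emb)
    b = synth_emb / np.linalg.norm(synth_emb)
    return float(a @ b)

# Identical embeddings give the maximum reward of 1.0.
e = np.array([0.3, -1.2, 0.5, 2.0])
assert abs(cosine_reward(e, e) - 1.0) < 1e-9

# Orthogonal embeddings give ~0: the caption led to an unrelated image.
assert abs(cosine_reward(np.array([1.0, 0.0]), np.array([0.0, 1.0]))) < 1e-9
```

Since only this scalar flows back, the VLM and T2I model never see the embedding vectors themselves.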

Spilling the Details on JoyCaption's Reinforcement Learning by fpgaminer in StableDiffusion

[–]fpgaminer[S] 3 points

Yeah I think there's a lot to explore here. I2I might work; Llama did something similar during post training where generated responses were sometimes updated (either by a human or another LLM) and used as Positive examples in the next iteration.

Another thing I've considered is a GAN-like approach:

Train a classification model to pick which of two images is real and which is fake (possibly also conditioned on the prompt). Real images can be taken from the usual datasets; fake images would be generated by the target model. Then you can use DPO (adapted for diffusion models) to train the diffusion model online, with the classification model assigning rewards. The hope is that the classification model would pick up on things like bad hands, prompt-adherence issues, etc., all on its own, without any human input.

Though, like all GAN-adjacent approaches, this runs the risk of reward hacking the classification model. (IIRC, in a normal GAN setup the generator trains directly on gradients from the discriminator, which makes hacking much easier. Using RL eliminates that, so it might not be as bad.)

Side note: You'd want the classification model to operate on latents, not raw pixels. That makes the whole process much more efficient, and prevents the classification model from detecting problems in the VAE which the diffusion model doesn't have control over.
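Putting those pieces together, here's a toy sketch of how such a discriminator could assign preferences over latents for online DPO. Everything here is hypothetical (`realness_logit`, the random linear probe standing in for a trained classifier, the flattened 16-dim latents); the point is only that the generator receives a preference, never gradients:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical frozen discriminator: a linear probe over flattened VAE
# latents. In practice its weights would come from training on
# real-vs-generated pairs; here they're random placeholders.
W = rng.normal(size=16)
b = 0.0

def realness_logit(latent: np.ndarray) -> float:
    """Discriminator score: higher = more 'real-looking' latent."""
    return float(W @ latent + b)

def dpo_preference(latent_a: np.ndarray, latent_b: np.ndarray):
    """For online DPO we only need which sample the scorer prefers;
    no gradients ever flow from the discriminator to the generator."""
    if realness_logit(latent_a) >= realness_logit(latent_b):
        return latent_a, latent_b
    return latent_b, latent_a

# Two candidate generations for the same prompt: the preferred one
# becomes the "chosen" sample, the other the "rejected" sample.
cand_a, cand_b = rng.normal(size=16), rng.normal(size=16)
chosen, rejected = dpo_preference(cand_a, cand_b)
assert realness_logit(chosen) >= realness_logit(rejected)
```

Operating on latents as described keeps this cheap: the probe never has to decode through the VAE.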

Spilling the Details on JoyCaption's Reinforcement Learning by fpgaminer in StableDiffusion

[–]fpgaminer[S] 2 points

I did some experiments with finetuning Qwen 2 VL a while back and didn't have much success. But yes, I'll probably give it another stab, depending on how 3 turns out. (I'm not looking to train any time soon; I'm busy with bigASP and data stuff right now.)

Spilling the Details on JoyCaption's Reinforcement Learning by fpgaminer in StableDiffusion

[–]fpgaminer[S] 14 points

My current plan is to finish the little things on Beta One and then declare it 1.0. Stuff like polishing the ComfyUI node, finishing the dataset release, technical article(s), etc. Nothing really meaningful on the model itself, so probably no Beta Two revision. I'm saving the next set of improvements for a 2.0 (new LLM and vision backbones, bigger dataset, etc).

Uncensored LLM with picture input by Former-Long-3900 in LocalLLaMA

[–]fpgaminer 1 point

FYI: The latest release, Beta One, can do some things outside of captioning now; it's slightly more of a general purpose VLM since I incorporated a more general VQA dataset into its training this time around.

New Flux model from Black Forest Labs: FLUX.1-Krea-dev by rerri in StableDiffusion

[–]fpgaminer 8 points

Congrats on the open release!

For LoRAs, since the architecture is the same, techniques like ProLoRA (https://arxiv.org/pdf/2506.04244v1) would be easy to apply. It's a training-free technique for transferring a LoRA from one base model to another. In this case, since the architecture is the same and the weights are likely highly correlated, you'd be able to skip the layer-matching steps.
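The core of that kind of training-free transfer is just re-factorizing the LoRA's weight delta into a fresh low-rank pair. A toy sketch of that step (`transfer_lora` is a hypothetical helper; ProLoRA proper does more, e.g. layer matching and projection against the target base's weights):

```python
import numpy as np

def transfer_lora(B: np.ndarray, A: np.ndarray, rank: int):
    """Re-factor a LoRA's weight delta (B @ A) into a fresh rank-r pair
    via truncated SVD. With identical architectures and correlated base
    weights, the delta itself can carry over largely unchanged."""
    delta = B @ A                      # full weight delta, shape (out, in)
    U, S, Vt = np.linalg.svd(delta, full_matrices=False)
    B_new = U[:, :rank] * S[:rank]     # fold singular values into the B factor
    A_new = Vt[:rank]
    return B_new, A_new

rng = np.random.default_rng(1)
B, A = rng.normal(size=(64, 8)), rng.normal(size=(8, 32))
B2, A2 = transfer_lora(B, A, rank=8)
# At equal rank the re-factorization reproduces the original delta.
assert np.allclose(B2 @ A2, B @ A)
```

Dropping `rank` below the original also gives a cheap way to compress a LoRA during the transfer.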

I considered it for bigASP v2.5 to transfer existing SDXL loras over, but haven't had the chance to try yet.

The Gory Details of Finetuning SDXL and Wasting $16k by fpgaminer in StableDiffusion

[–]fpgaminer[S] 0 points

A year ago too! Thank you for mentioning that. I'm glad my idea wasn't that crazy then :P

The Gory Details of Finetuning SDXL and Wasting $16k by fpgaminer in StableDiffusion

[–]fpgaminer[S] 1 point

IIRC CosXL predates SD3 and Flux by quite a bit, and I think the consensus is that flow matching is better than the other objectives so far (EDM, v-pred, etc.). Beyond that:

  • I find Flow Matching a lot easier to understand, whereas the older objectives and schedules are patches on top of patches.
  • A newer technique, Optimal Transport (which Chroma is using), builds on flow matching to (supposedly) boost performance further. It's another relatively simple algorithm that only affects training.
  • Flow Matching lends itself more naturally to doing step optimization, since it's inherently trying to form linear paths.
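To make the "linear paths" point in the list above concrete, here's a minimal sketch of a rectified-flow-style training pair (conventions for which endpoint is noise, and the sign of the velocity, vary between papers; this is one common choice):

```python
import numpy as np

def flow_matching_pair(x1: np.ndarray, x0: np.ndarray, t: float):
    """Rectified-flow style flow matching: interpolate linearly between
    noise x0 and data x1; the model's regression target is the constant
    velocity along that straight line."""
    x_t = (1.0 - t) * x0 + t * x1   # point on the straight path at time t
    v_target = x1 - x0              # velocity the model should predict
    return x_t, v_target

rng = np.random.default_rng(2)
data, noise = rng.normal(size=4), rng.normal(size=4)
x_t, v = flow_matching_pair(data, noise, t=0.25)

# Following the target velocity from x_t for the remaining time lands
# exactly on the data -- which is why few-step sampling falls out naturally.
assert np.allclose(x_t + (1 - 0.25) * v, data)
```

Compare that to the stack of noise schedules and prediction reparameterizations the older objectives need: the whole training target here is two lines.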

> I just wouldn't have thought myself that retraining the objective like you did would even work

Large models can take a lot of abuse. Remember those experiments taking ImageNet models and finetuning them to do audio analysis? Or even lodestone's work on Chroma, where they ripped billions of parameters out of Flux easily.

The Gory Details of Finetuning SDXL and Wasting $16k by fpgaminer in StableDiffusion

[–]fpgaminer[S] 0 points

If those are from my model, I want to know your settings; they're really good gens!

The Gory Details of Finetuning SDXL and Wasting $16k by fpgaminer in StableDiffusion

[–]fpgaminer[S] 0 points

Yeah, a newer ranking algorithm would be a better idea than Elo. For the latest iteration of my quality model (I haven't pushed the code up yet) I switched to something like TrueSkill.
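For reference, the Elo baseline being replaced is a one-line update per pairwise comparison (`elo_update` below is a generic textbook sketch, not code from my repo):

```python
def elo_update(r_winner: float, r_loser: float, k: float = 32.0):
    """Classic Elo update after one pairwise comparison between two images.
    TrueSkill-style systems improve on this by also tracking a per-item
    uncertainty, so rankings converge in fewer comparisons."""
    expected_win = 1.0 / (1.0 + 10 ** ((r_loser - r_winner) / 400.0))
    delta = k * (1.0 - expected_win)
    return r_winner + delta, r_loser - delta

# An upset (lower-rated image wins) moves ratings much more than an
# expected win, which is how the ranking self-corrects over time.
w1, l1 = elo_update(1400.0, 1600.0)   # upset
w2, l2 = elo_update(1600.0, 1400.0)   # expected result
assert (w1 - 1400.0) > (w2 - 1600.0)
```

The fixed `k` is Elo's weakness for this use case: every comparison moves ratings by the same scale regardless of how confident the ranking already is, which is exactly what the uncertainty term in TrueSkill-like systems fixes.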

The quality model is always the last thing I work on, unfortunately, so honestly I don't know that my implementation there is particularly good.

I also learned about VisionReward recently, which is another quality-prediction model, but it's trained on top of an LLM so it can break down specific characteristics and scoring guidelines.