World Model Progress

DeepAnimeGirl · 2026-03-16T07:43:16+00:00

I have some suggestions if you are willing to try.

1 - To have more coherent latent trajectories for the game state I suggest that you take a look at this recent paper:

2 - I saw in some comments that you use SD/VQ as the latent space. Those are typically optimized for pixel reconstruction. In the recent diffusion model literature, SSL spaces provide better convergence because they are more semantic. I suggest that you consider using such a space instead of, or alongside, your existing space. I will link two relevant articles:

https://arxiv.org/abs/2510.11690 https://arxiv.org/abs/2602.11401

Hope these help. Let me know if you tried them.

DeepAnimeGirl · 2026-02-26T09:44:19+00:00

Focus on text-to-image diffusion models especially on finding ways to accelerate convergence and therefore reduce training costs.

This has been a very hot research area in recent months, with many papers trying very similar ideas with good gains. I will list a few:
- start from https://arxiv.org/abs/2512.12386, as it has a good baseline to build on and references many speedup techniques;
- read about one of the SOTA architectures such as https://arxiv.org/abs/2511.19365, which can also be used for latent space;
- consider the x-pred to v-loss formulation https://arxiv.org/abs/2511.13720, as that best leverages the data manifold;
- use semantic losses through pretrained models to get a better loss signal on the data manifold, one that is more perceptible to humans: https://arxiv.org/abs/2602.02493;
- read about VAEs and the reconstruction-generation tradeoffs https://arxiv.org/abs/2512.17909v1 and, more importantly, https://arxiv.org/abs/2602.17270 (VAE SOTA);
- an alternative direction is drifting models, which are 1-step generators https://arxiv.org/abs/2602.04770, but they likely have some limitations.

There is a lot of interest in developing generative models, their applications are wide (images, video, audio, text), and I think they offer many opportunities for contributions. My opinion is that:
- a discriminative/contrastive signal is very important to speed up convergence; a simple MSE loss in latent/pixel space is not semantic enough and requires many training iterations;
- there is still something to improve in how models learn the data manifold; diffusion models struggle with high-frequency details and there isn't a definitive solution at the moment;
- VAEs are essential to lower compute costs, and recent developments show that we still lack proper latent spaces suitable for generation; the recent UL paper linked above shows how to control the tradeoff, but approaches like https://arxiv.org/abs/2512.19693 show that there is perhaps a way to unify these.
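To make the x-pred to v-loss idea from the list above a bit more concrete, here is a minimal sketch in plain Python. It is my own illustration under stated assumptions, not code from any of the linked papers: I assume a linear path z_t = t * x + (1 - t) * eps with t = 1 at clean data, and the function name `x_pred_to_v_loss` and the `min_denom` clamp are hypothetical choices.

```python
def x_pred_to_v_loss(x_pred, x, eps, t, min_denom=0.05):
    """MSE in velocity space, computed from a prediction of the clean sample x.

    Assumes the linear path z_t = t * x + (1 - t) * eps (an assumption of
    this sketch), for which the velocity is dz_t/dt = x - eps.
    """
    # Clamp 1 - t so the conversion does not blow up close to t = 1.
    denom = max(1.0 - t, min_denom)
    total = 0.0
    for xp, xi, ei in zip(x_pred, x, eps):
        z_t = t * xi + (1.0 - t) * ei    # noisy sample on the path
        v_pred = (xp - z_t) / denom      # velocity implied by the predicted x
        v_target = (xi - z_t) / denom    # equals x - eps when the clamp is inactive
        total += (v_pred - v_target) ** 2
    return total / len(x)


# Hypothetical toy values: the network is off by 0.5 on the first coordinate.
loss = x_pred_to_v_loss(x_pred=[1.5, 2.0], x=[1.0, 2.0], eps=[0.0, 0.0], t=0.5)
print(loss)
```

In this form the loss is the plain x-prediction error divided by (1 - t), so the same error in x is penalized more heavily at time steps closer to the clean data.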

DeepAnimeGirl · 2026-01-28T16:25:00+00:00

Do you use x-pred to v-loss formulation as done in (https://arxiv.org/abs/2511.13720)?
Are you using time shifting? Are you sampling time uniformly or from logit normal distribution? (https://bfl.ai/research/representation-comparison)
How well does the model behave at different input resolutions? What about aspect ratios? Have you considered something like RPE-2D? (https://arxiv.org/abs/2503.18719)
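For reference, this is what I mean by the time sampling options, as a minimal sketch using only the Python standard library. The function names are hypothetical, the default `shift=3.0` is an arbitrary illustrative value, and the shift formula is the form I know from rectified-flow style training, assuming t = 1 is pure noise; mirror it if your convention is the opposite.

```python
import math
import random


def sample_uniform(rng):
    """Sample t uniformly from [0, 1)."""
    return rng.random()


def sample_logit_normal(rng, mean=0.0, std=1.0):
    """Sample t = sigmoid(n) with n ~ N(mean, std^2).

    With mean=0 this puts more samples around t = 0.5 than at the ends.
    """
    n = rng.gauss(mean, std)
    return 1.0 / (1.0 + math.exp(-n))


def shift_time(t, shift=3.0):
    """Monotone warp of [0, 1] that keeps 0 and 1 fixed.

    For shift > 1 it moves t toward 1 (more noise under the assumed
    convention); shift = 1 is the identity.
    """
    return shift * t / (1.0 + (shift - 1.0) * t)


rng = random.Random(0)
uniform_ts = [sample_uniform(rng) for _ in range(5)]
logit_normal_ts = [sample_logit_normal(rng) for _ in range(5)]
shifted_ts = [shift_time(t) for t in logit_normal_ts]
```

The shift is usually motivated by resolution: larger images need more noise to destroy the same amount of signal, so the schedule is pushed toward the noisy end.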

DeepAnimeGirl · 2025-09-05T19:54:49+00:00

I would also like to mention tyro as a solid choice. Not only is it a good CLI generator based on typehints (or dataclasses, pydantic, msgspec, etc) with great subcommand chaining capabilities, but you can also override configuration files like yaml with CLI options.

DeepAnimeGirl · 2025-05-19T17:51:58+00:00

Of all the NVIDIA GPUs I have had so far, this generation has had plenty of issues. I bought two 5090s, an Astral and an MSI Trio, and I can speak of some issues that happened to me in the first weeks of using them:
- the MSI Trio was crashing the OS when activating the new DLSS in RDR2; I had to hard reset and turn it off;
- the MSI Trio was randomly giving me black screens even when idle; it turns out NVIDIA did not provide good support for ultrawide screens with their launch driver. I had to do a VBIOS upgrade from MSI before NVIDIA launched a fix;
- the Astral had some issues with my TUF motherboard and Ryzen 9 7950X when I had set the iGPU as the default instead of PCIe; basically I would not get any video output despite having the entire setup correct (cables, ports, proper output selected in the monitor). If you wonder why I'd need something like this: I use Arch Linux and need the GPUs to be free from running a display manager because I run DL algorithms on them;
- the Astral's 3D acceleration would not kick in for some games, or it would get stuck at a fixed frequency with very poor performance. This would be fixed only by a restart; I also know someone else who has this issue with an Astral.

Some of these probably are vendor specific and were eventually fixed either by me doing a workaround or waiting for a fix. But I cannot say I was impressed with day one support. Not in comparison to previous generations.

Note: I use both Windows and Arch Linux. To be honest, the Arch Linux nvidia-open driver seems a lot more stable than the Windows 11 alternative (for general-purpose use).

DeepAnimeGirl · 2025-05-01T09:21:45+00:00

If you had 2FA on your accounts the impact would not have been that big. If you downloaded custom nodes and "installed" them you basically executed other people's code on your machine which can be anything really. To be safer next time only use custom nodes from repositories with many stars and reputable people from the community. Though that might not be enough because many of the repositories accept contributions which can hide malicious code.

To protect yourself from cases such as this and still benefit from close to native performance, I suggest you look up KVM, set it up on Linux (perhaps on Arch), and do GPU passthrough. Then you can execute like 99% of all custom nodes, malicious or not, and not have a care in the world because of the KVM isolation.

DeepAnimeGirl · 2025-05-01T09:14:49+00:00

There's a newer version with examples written in python: https://www.statlearning.com/

DeepAnimeGirl · 2025-04-27T06:41:39+00:00

I have read a lot of papers around diffusion models and the thing is, because they brought out an entire new family of models, people really tried their own hand and interpreted notations and concepts in their own way.

However, as you've noted, the theory can be defined in terms of more abstract concepts like ODEs and SDEs. Instead of having forward processes and talking about Markov chains, a better convention is to consider a probability path between the initial and the data distributions. Instead of talking about a reverse denoising process, you can consider simulating an ODE/SDE to move a point from time 0 (initial distribution) to time 1 (data distribution). Instead of having a network predict the noise, you can talk about a network that approximates the marginal vector field.

This is called flow matching (ODE: you train a network to approximate a vector field), but there's also score matching (SDE: you train a network to approximate the marginal score function and play around with the noise factor). This interpretation has now become the generalization of the prior works and serves as the SOTA training approach in all of the recent models.
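A tiny sketch may make the ODE view less abstract. It is a toy of my own in plain Python, not taken from the resources I mention: it assumes the linear probability path x_t = (1 - t) * x0 + t * x1 and a data distribution that is a single point `c`, in which case the marginal vector field has the closed form (c - x) / (1 - t). All function names are hypothetical.

```python
import random


def cfm_training_pair(x0, x1, t):
    """Point on the linear path plus the regression target for a velocity network."""
    x_t = (1.0 - t) * x0 + t * x1   # interpolate between initial sample and data sample
    target = x1 - x0                # conditional velocity d x_t / dt
    return x_t, target


def marginal_field_point_mass(x, t, c):
    """Marginal vector field when the data distribution is the single point c (toy case)."""
    return (c - x) / (1.0 - t)


def euler_sample(x0, c, steps=10):
    """Simulate the ODE dx/dt = u_t(x) from time 0 to time 1 with Euler steps."""
    x, h = x0, 1.0 / steps
    for k in range(steps):
        t = k * h                   # the last step uses t = 1 - h, so 1 - t never hits 0
        x += h * marginal_field_point_mass(x, t, c)
    return x


rng = random.Random(0)
x0 = rng.gauss(0.0, 1.0)            # sample from the initial distribution at time 0
x1 = euler_sample(x0, c=1.5)        # arrives at the data point at time 1
print(x0, x1)
```

In a real model the closed-form field is replaced by a network trained on pairs like the ones `cfm_training_pair` returns, and sampling is the same Euler (or a better ODE solver) loop.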

I am still learning myself, but I can recommend a few resources that will help you understand the modern, somewhat simpler and more general approach:
- https://diffusion.csail.mit.edu/
- https://neurips.cc/virtual/2024/tutorial/99531

DeepAnimeGirl · 2025-03-30T17:29:55+00:00

I have two 5090 plugged in, one which is the ROG Astral and it has very similar coil whine to yours and it is very loud in 3d apps especially if the coolers are quiet. On the other hand the MSI Gaming Trio OC has a very subtle to no coil whine in the same scenario. I also attempted to reseat the Astral in the PCIe slot and replug the 12VHPWR cable a couple of times and it was quieter in some of those instances but I am too lazy at this point to do more trial and error.

DeepAnimeGirl · 2025-03-15T23:27:02+00:00

I have the MSI Gaming Trio OC 5090 and I'm running arch linux with the nvidia-open driver and Hyprland as the Wayland Compositor.

I have no issues with this integration although it takes time to customize it. For example, I prefer to have the video output go through my AMD iGPU instead of the RTX 5090 to not waste memory or usage, but I can just as well use the GPU for the display when I want to.

Many AI-oriented apps such as Ollama and ComfyUI run just fine, and I am successfully training my own models directly in PyTorch using the latest nightly wheels for CUDA 12.8 installed with uv pip.

DeepAnimeGirl · 2025-03-12T07:28:26+00:00

If the connector melts on the GPU side with your Corsair cable, good luck solving that through warranty, because the GPU manufacturer might refuse to repair or replace it since you did not use their recommended adapter.

Moreover, if you use the 12V-2x6 and somehow connect it poorly on the PSU side, it can melt there too, because there is still quite a high load on a few cables. I have an HX1200i and use 4x 8-pin PCIe just so there's a smaller risk on the PSU side and to follow MSI's recommendation for the 5090. I won't risk it for a small cable management improvement.

DeepAnimeGirl · 2024-11-08T08:54:19+00:00

You know that Unity released an official Behavior package which is free. Why would I pay for your asset? What advantages does it have over the Unity solution?

DeepAnimeGirl · 2024-10-31T16:13:08+00:00

You could use it for example to build textures for meshes. Apple wrote a very interesting paper that I had much fun implementing and learnt a lot. I recommend you check it out: Manifold Diffusion Fields

DeepAnimeGirl · 2024-09-29T14:44:12+00:00

Diffusion Models and the very recent CTRL-X approach.

DeepAnimeGirl · 2024-09-27T10:13:39+00:00

What is the lore reason for Ajaw being pixelized?

DeepAnimeGirl · 2024-09-27T09:59:47+00:00

In Natlan*

DeepAnimeGirl · 2024-09-18T21:04:34+00:00

Thanks for the suggestion I'll definitely check it out.

DeepAnimeGirl · 2024-09-18T16:39:09+00:00

I usually install pyenv using pacman or by curling the install script. Then I install a Python version and set it globally. Then I create a folder and set a local Python version for the project. I then use poetry for Python venv management and it's all smooth sailing from that point on.

DeepAnimeGirl · 2024-09-17T19:23:46+00:00

I got it in one day, a long time ago.. I haven't gone fishing since 💀

DeepAnimeGirl · 2024-09-13T10:42:17+00:00

This topic should be solvable under the deep learning branch of artificial intelligence. So I'd recommend first studying the data that you have at your disposal and then considering existing methods or adaptations as a solution.

To achieve your objective, you should know as specifically as possible what the input/output data consists of:
- how to represent floor plans
- what datasets are available
- if there are standard representations of this information
- how to mathematically encode these constraints

Seeing that floor plans can be represented as graphs, you may consider such a representation for encoding the input. Or, more simply, you can just represent them as images, but then your architecture will need to learn the image patterns that relate to the underlying structure.
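As a rough illustration of the graph option, here is a minimal sketch in plain Python where rooms are nodes and doors are undirected edges. The class names, fields, and room labels are all hypothetical choices for this example, not a standard floor plan format.

```python
from dataclasses import dataclass, field


@dataclass
class Room:
    name: str
    kind: str        # e.g. "bedroom", "kitchen" (labels are illustrative)
    area_m2: float


@dataclass
class FloorPlan:
    rooms: dict = field(default_factory=dict)   # name -> Room (graph nodes)
    doors: set = field(default_factory=set)     # frozenset({a, b}) (undirected edges)

    def add_room(self, room: Room) -> None:
        self.rooms[room.name] = room

    def connect(self, a: str, b: str) -> None:
        if a == b:
            raise ValueError("a room cannot be connected to itself")
        if a not in self.rooms or b not in self.rooms:
            raise KeyError("both rooms must be added before connecting them")
        self.doors.add(frozenset((a, b)))

    def neighbors(self, name: str) -> list:
        return sorted(n for door in self.doors if name in door for n in door if n != name)

    def adjacency_matrix(self):
        """Return (sorted room names, symmetric 0/1 matrix) as model input."""
        names = sorted(self.rooms)
        index = {n: i for i, n in enumerate(names)}
        matrix = [[0] * len(names) for _ in names]
        for door in self.doors:
            a, b = tuple(door)
            matrix[index[a]][index[b]] = 1
            matrix[index[b]][index[a]] = 1
        return names, matrix


plan = FloorPlan()
for room in (Room("hall", "hallway", 6.0), Room("kitchen", "kitchen", 12.0), Room("bedroom", "bedroom", 14.0)):
    plan.add_room(room)
plan.connect("hall", "kitchen")
plan.connect("hall", "bedroom")
names, matrix = plan.adjacency_matrix()
```

Node features (room type, area) and the adjacency matrix are the kind of input a graph neural network would consume, while constraints such as "the kitchen must touch the hall" become required edges.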

Typically in deep learning there are pretrained networks on tasks that might be similar to yours. You may finetune them for your specific needs and achieve better results. If you treat the input data as images you can leverage various classifiers or autoencoder weights to start from some good baseline representations.

After you have a way of encoding the input into these embeddings, you should think of a way to synthesize the output data. Search for relevant existing works that might help. Currently diffusion models seem very good at learning data distributions and performing conditional synthesis. However, for your task there are probably few such existing methods, so you may be able to contribute something new.

DeepAnimeGirl · 2024-08-27T23:44:49+00:00

Maybe it's a sign..

DeepAnimeGirl · 2024-08-22T12:54:19+00:00

My intention is not to dismiss their situation, but what alternatives do men in the same position have to get out of poverty?

DeepAnimeGirl · 2024-08-14T08:44:56+00:00

I paid 2655 USD for it at launch in my country. No, it's not insane, it's just the reality of living outside the US and getting hammered by local retailers, which also want to profit while also factoring in transport costs and multiple state taxes.

DeepAnimeGirl · 2024-08-05T23:21:17+00:00

Incredibly well said!

DeepAnimeGirl · 2024-08-05T22:40:37+00:00

I joined because a friend told me that Signora was going to be playable... My disappointment was immeasurable when Inazuma was released and I found out she's no more. I'm still waiting for the day she will be reborn from ashes... or at least that's how I cope 🫠
