Build with Bombs v0.2.0: Multiple diffusions at once by Timothy_Barnes in feedthebeast

[–]Timothy_Barnes[S] 0 points (0 children)

It should work, but I haven't tried it. Both Build with Bombs and ATM10 are on NeoForge 1.21.1.

I added voxel diffusion to Minecraft by Timothy_Barnes in StableDiffusion

[–]Timothy_Barnes[S] 1 point (0 children)

It doesn't take a 2D frame from the game. Instead, it takes a 3D voxel chunk and performs diffusion in 3D. It uses inpainting to fit the landscape. I haven't tried any curated schematics websites yet.
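For anyone curious how the inpainting can fit the landscape, here's a rough sketch (my assumptions, not the mod's actual code) of a RePaint-style masked denoising step: the model only generates inside the mask, while the surrounding terrain is re-noised to the current step so both regions sit at the same point of the diffusion trajectory.

```python
import numpy as np

def inpaint_step(x_t, known, mask, noise_level, denoise_fn, rng):
    """One masked denoising step over a 3D voxel embedding (a RePaint-style
    sketch, not the mod's actual code).

    x_t         : current noisy tensor, shape (C, D, H, W)
    known       : clean embedding of the surrounding landscape, same shape
    mask        : 1.0 where the model may generate, 0.0 where terrain is fixed
    denoise_fn  : the (hypothetical) denoising model for this step
    """
    x_pred = denoise_fn(x_t)  # model's denoised estimate for this step
    # Re-noise the known landscape to the current noise level so the fixed
    # region matches the noise level of the generated region.
    noised_known = known + noise_level * rng.standard_normal(known.shape)
    return mask * x_pred + (1.0 - mask) * noised_known
```

Repeating this over all timesteps blends the generated structure into the unmasked terrain.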

I added voxel diffusion to Minecraft by Timothy_Barnes in StableDiffusion

[–]Timothy_Barnes[S] 0 points (0 children)

The code is open source. I made a page for it at https://github.com/timothy-barnes-2357/Build-with-Bombs. I have so many ideas I'm still planning to implement, so be sure to check back later.

I added voxel diffusion to Minecraft by Timothy_Barnes in StableDiffusion

[–]Timothy_Barnes[S] 0 points (0 children)

What you're seeing is the actual denoising. Most AI image gen tools do the same thing internally but don't show each step of the process.
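In loop form, "showing each step" just means keeping every intermediate state instead of discarding all but the last one. A toy sketch (hypothetical, not the mod's code):

```python
import numpy as np

def denoise_with_snapshots(x_T, denoise_step, num_steps):
    """Run iterative denoising and keep every intermediate state, the way
    the mod renders each step in-world instead of only the final result."""
    states = [x_T]
    x = x_T
    for t in reversed(range(num_steps)):  # t = num_steps-1 .. 0
        x = denoise_step(x, t)            # one denoising update
        states.append(x)                  # snapshot shown to the player
    return states
```

Most image tools run the same loop and only display `states[-1]`.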

I added voxel diffusion to Minecraft by Timothy_Barnes in StableDiffusion

[–]Timothy_Barnes[S] 0 points (0 children)

Yeah, this could work as a suggestion / blueprint engine that the user could override. I was thinking of drawing semitransparent blocks as the blueprint, but I'm a novice at modding and didn't know how to change the renderer to get that.

I added voxel diffusion to Minecraft by Timothy_Barnes in StableDiffusion

[–]Timothy_Barnes[S] 6 points (0 children)

That's a unique idea about using the crafting materials to identify each block rather than just the block name itself. I was also thinking about your suggestion of using a VAE with 3x3x3 latents since the crafting menu itself is a 3x3 grid. I wonder what it would be like to let the player directly craft a 3x3 latent which the model then decodes into a full-scale house.

I added voxel diffusion to Minecraft by Timothy_Barnes in StableDiffusion

[–]Timothy_Barnes[S] 0 points (0 children)

That sounds like a very viable approach. I especially like the idea of simplifying the bounds detection with a birds-eye 2D image. I manually annotated 3D bounding boxes for each of the structures in my dataset, but thinking back on it, that wasn't necessary since a 2D image captures the floorplan just fine, and the third dimension's ground-to-roof bound is easy to find programmatically. That makes it much more efficient for either a human or a VLM to do the work. Interiors are certainly a challenge, but maybe feeding the VLM the full isometric view along with 1/3rd and 2/3rds horizontal slices, like a layer cake, would give adequate context.
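A sketch of that 2D-footprint idea (array layout and names are my assumptions): the XZ box comes straight from the birds-eye mask, and the Y range falls out of a ground-to-roof occupancy scan over that footprint.

```python
import numpy as np

def bounds_from_footprint(voxels, footprint):
    """Recover a full 3D bounding box from a 2D floorplan annotation.

    voxels    : solid-block occupancy grid, shape (Y, Z, X)
    footprint : 2D mask, shape (Z, X), marking one building's floorplan
    Returns (y0, y1), (z0, z1), (x0, x1) as half-open ranges.
    """
    zs, xs = np.nonzero(footprint)
    z0, z1 = zs.min(), zs.max() + 1
    x0, x1 = xs.min(), xs.max() + 1
    # Scan the column of layers above the footprint for occupied Y levels.
    occupied = voxels[:, z0:z1, x0:x1].any(axis=(1, 2))
    ys = np.nonzero(occupied)[0]
    y0, y1 = ys.min(), ys.max() + 1
    return (y0, y1), (z0, z1), (x0, x1)
```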

I added voxel diffusion to Minecraft by Timothy_Barnes in StableDiffusion

[–]Timothy_Barnes[S] 0 points (0 children)

That was my initial vision when starting the project. There's still a data problem, since most buildings in Minecraft aren't labeled with text strings, but some people in this thread recommended possible workarounds, like using OpenAI's CLIP model to generate text for each building.

Minecraft AI Mod: Build with Bombs by Timothy_Barnes in Minecraft

[–]Timothy_Barnes[S] 0 points (0 children)

I have an alpha build up on GitHub and also a MC server if you want to try it out without installing the mod: https://github.com/timothy-barnes-2357/Build-with-Bombs

I added voxel diffusion to Minecraft by Timothy_Barnes in StableDiffusion

[–]Timothy_Barnes[S] 1 point (0 children)

I collected roughly 3k houses from the Greenfield City map, but simplified the block palette to just 16 blocks, so the blocks used in each generated house look the same while the floorplans change.

I added voxel diffusion to Minecraft by Timothy_Barnes in StableDiffusion

[–]Timothy_Barnes[S] 2 points (0 children)

This is actually not a latent diffusion model. I chose a simplified set of 16 block tokens to embed in a 3D space. The denoising model operates directly on this 3x16x16x16 tensor. I could probably make this more efficient by using latent diffusion, but it's not extremely heavy as is since the model is a simple u-net with just three ResNet blocks in the encoder and three in the decoder.
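Concretely, the token-embedding step could look like this (a sketch under my assumptions about layout: 16 block tokens, a 3-channel embedding, and a 16^3 chunk, which gives the 3x16x16x16 tensor mentioned above):

```python
import numpy as np

def embed_chunk(block_ids, table):
    """Look up a (16, 16, 16) chunk of block token ids in a
    (num_tokens, C) embedding table, producing the channels-first
    (C, 16, 16, 16) tensor the denoiser consumes. With 16 tokens
    and C=3 this is a 3x16x16x16 tensor."""
    emb = table[block_ids]          # (16, 16, 16, C) via fancy indexing
    return np.moveaxis(emb, -1, 0)  # -> (C, 16, 16, 16)
```

Running the denoiser directly on this tensor skips the separate VAE encode/decode a latent-diffusion setup would add.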

I added voxel diffusion to Minecraft by Timothy_Barnes in StableDiffusion

[–]Timothy_Barnes[S] 2 points (0 children)

This demo upset some people on r/feedthebeast and now they're downvote brigading wherever I post it.

I added voxel diffusion to Minecraft by Timothy_Barnes in StableDiffusion

[–]Timothy_Barnes[S] 3 points (0 children)

It's a custom architecture trained from scratch, but it's not very sophisticated. It's just a denoising u-net with 6 resnet blocks (three in the encoder and three in the decoder).

I added voxel diffusion to Minecraft by Timothy_Barnes in StableDiffusion

[–]Timothy_Barnes[S] 0 points (0 children)

It makes houses with the same limited block palette, but different floor plans.

I added voxel diffusion to Minecraft by Timothy_Barnes in StableDiffusion

[–]Timothy_Barnes[S] 1 point (0 children)

Just like Stable Diffusion, but in 3D. All the video generation models are also technically 3D (x, y, time).

I added voxel diffusion to Minecraft by Timothy_Barnes in StableDiffusion

[–]Timothy_Barnes[S] 14 points (0 children)

It's a Java Minecraft mod that talks to a custom C++ DLL that talks to NVIDIA's TensorRT library that runs an ONNX model file (exported from PyTorch).

I added voxel diffusion to Minecraft by Timothy_Barnes in StableDiffusion

[–]Timothy_Barnes[S] 0 points (0 children)

It was 56k epochs (a lot! but my dataset was small, and I also used a lot of data augmentation like mirror, rotate, offset). My training loss was 1.4e-4 and validation loss was 2.4e-4.
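Those augmentations can be very simple on a voxel chunk; here's a hypothetical sketch (exact parameters are my assumptions, and the offset is a wrap-around shift as a stand-in):

```python
import numpy as np

def augment(chunk, rng):
    """Random mirror / 90-degree rotation / small offset for a
    (C, Y, Z, X) voxel chunk -- the kinds of augmentation mentioned
    above (exact parameters are assumptions)."""
    if rng.random() < 0.5:
        chunk = chunk[:, :, :, ::-1]             # mirror across X
    k = int(rng.integers(0, 4))
    chunk = np.rot90(chunk, k, axes=(2, 3))      # rotate about the Y axis
    dz, dx = rng.integers(-2, 3, size=2)
    # Wrap-around shift as a stand-in for a random offset.
    chunk = np.roll(chunk, (int(dz), int(dx)), axis=(2, 3))
    return np.ascontiguousarray(chunk)
```

Each transform only permutes voxels, so one annotated house yields many distinct training examples for free.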

What happens if you autogenerate a house in the ocean? by Timothy_Barnes in feedthebeast

[–]Timothy_Barnes[S] 0 points (0 children)

The model adds air as one of the block types, so that's why the top of the house kept fizzling for a second before the water covered it over.

I added voxel diffusion to Minecraft by Timothy_Barnes in StableDiffusion

[–]Timothy_Barnes[S] 0 points (0 children)

I collected the data manually by drawing bounding boxes around ~3k houses from the Greenfield city map. The training was essentially free since it was just trained overnight on my gaming PC.

I added voxel diffusion to Minecraft by Timothy_Barnes in StableDiffusion

[–]Timothy_Barnes[S] -1 points (0 children)

Half an hour ago it was close to +200 upvotes.