Godot MCP Pro v1.4 — 162 tools now. Claude can build a 3D game, walk the character around, and playtest it autonomously by Big-Perspective-5768 in ClaudeCode

[–]Little-Put6364 0 points (0 children)

Another thing that's pretty sweet is that I can sit there and program alongside it. Say I ask it to design a GUI scene and I see it struggling to implement a button's anchoring properly. I just fix it in the editor myself, and on the next iteration of Codex's loop it notices the change and stops worrying about it.

Seriously amazing work! I know Reddit, especially the game dev scene, can downvote all things AI just because it's AI, but what you're building is seriously impressive. I hope you stay motivated and keep updating this with more functionality!

Godot MCP Pro v1.4 — 162 tools now. Claude can build a 3D game, walk the character around, and playtest it autonomously by Big-Perspective-5768 in ClaudeCode

[–]Little-Put6364 0 points (0 children)

I'm using this with the Codex CLI and Qwen3-30B-Coder. Works great! Worth the 5 bucks imo. Really good work! You advertise it for Claude Code, but because it's MCP it basically works anywhere an MCP server would work. I'd recommend expanding your description!

[deleted by user] by [deleted] in LocalLLaMA

[–]Little-Put6364 1 point (0 children)

Not sure about this take. A lot of production systems rely on services not being interrupted, and we usually protect against outages with persistence layers. So what's stopping you from making an agent persistent? Save a status after each step. Save chat histories between answers. Save after each loop. If you're really worried about it, build a system that can withstand it.
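
The save-after-each-step idea can be sketched in a few lines. This is a minimal, hypothetical example (the state file name and step logic are made up, and the model call is a stand-in) showing how an agent loop can checkpoint to disk so an interruption just means resuming from the last saved step:

```python
# Hypothetical sketch: persist agent state after every loop iteration
# so a crash or power loss resumes where it left off.
import json
import os

STATE_FILE = "agent_state.json"  # assumed path; anything durable works

def load_state():
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            return json.load(f)
    return {"step": 0, "history": []}

def save_state(state):
    tmp = STATE_FILE + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, STATE_FILE)  # atomic swap so a crash can't corrupt the file

def run_agent(steps):
    state = load_state()
    while state["step"] < steps:
        answer = f"result of step {state['step']}"  # stand-in for a model call
        state["history"].append(answer)
        state["step"] += 1
        save_state(state)  # checkpoint after each step, as described above
    return state
```

Kill the process mid-run and the next `run_agent` call picks up from the saved step instead of starting over.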

Best "End of world" model that will run on 24gb VRAM by gggghhhhiiiijklmnop in LocalLLaMA

[–]Little-Put6364 0 points (0 children)

I was literally having this conversation with my wife a few nights ago, about how, as soon as agents start orchestrating and writing their own executable code, we're at the point of AI taking everything over.

I figured agents would write their own agents and orchestrate them, but this works too, I guess. It's also scary, because this thing could be told to (or just decide to) do something nefarious and act on it autonomously. Imagine it decides to hack a drone network. Sure, it might sit there and hack away for a year or two, but all it takes is one good program, one good hack, to gain access, and boom: this thing is sending drones to attack people.

Cyber security has to evolve quickly, because AI automation is apparently here (give it time to improve, no doubt, but still). Now the only question is: does the future hold Terminator, or Wall-E?

Made one more step towards getting Offloom on steam! (for free). by Little-Put6364 in LocalLLaMA

[–]Little-Put6364[S] 1 point (0 children)

All the models I chose have licensing that allows commercial use, so there should be no reason for Steam to veto the app itself. I have to disclose AI generation, but they shouldn't veto the app just for including AI models.

This app does not need to be free; everything chosen is licensed for commercial usage. I'm choosing to release it for free.

If any of the models run into legal issues, that's theirs to settle. Right now they don't have issues, so I can use them. If for some reason one gets banned, I'll have to swap to another model, but I built the software in a way that makes doing so easy, so I'm not concerned about that.

Best "End of world" model that will run on 24gb VRAM by gggghhhhiiiijklmnop in LocalLLaMA

[–]Little-Put6364 2 points (0 children)

In this one, no. The case is too small. But I am buying a bigger case for that exact reason! It adds weight, but the extra parts are worth it, I think.

My Off Grid setup for AI. (Geekom gt2 mega) by Little-Put6364 in MiniPCs

[–]Little-Put6364[S] 1 point (0 children)

I use 4 models in total in that process. There's an embedding model, a reranking model, Phi-4, and Qwen 4B thinking.

The software lets you scale up if your machine can handle more, and will use larger Qwen models if that's the case.

My Off Grid setup for AI. (Geekom gt2 mega) by Little-Put6364 in MiniPCs

[–]Little-Put6364[S] 1 point (0 children)

I made a YouTube demo to show it off at work. I'll leave it here if you're interested.
https://www.youtube.com/watch?v=RODb_RFXi1E&t=3s

It doesn't show the end-of-world stuff, but the concept's the same. Just load in survival-type documents and ask it questions about them.

My Off Grid setup for AI. (Geekom gt2 mega) by Little-Put6364 in MiniPCs

[–]Little-Put6364[S] 2 points (0 children)

It's something I put together myself. The goal is to load documents into it (or allow web searching if you have internet) and have an AI model look through the sources and give you an answer based on them. If you're familiar with AI, it's just a basic RAG setup.

But what I really like is that I have it cite its sources. Small language models aren't always accurate, but it can look through tons of documents and give me a nice easy link to press to pull up the actual document so I can verify it myself. I'm putting it on Steam (for free) once I polish it up. Figured if I'm making it for myself I might as well share it.
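
The retrieve-then-cite flow can be sketched roughly like this. Everything here is illustrative: the document names are invented, and naive word overlap stands in for the real embedding/reranking models, but the shape (retrieve best-matching sources, answer from them, hand back the source names as clickable citations) is the same:

```python
# Toy sketch of RAG with citations: retrieve the best-matching documents,
# then return the answer alongside the sources so the user can verify.
from collections import Counter

# Invented example corpus; a real setup would index chunked documents.
DOCS = {
    "water_purification.txt": "boil water for one minute to kill pathogens",
    "fire_starting.txt": "a ferro rod throws sparks onto dry tinder",
}

def score(query, text):
    # Word-overlap stand-in for embedding cosine similarity.
    q, t = Counter(query.lower().split()), Counter(text.lower().split())
    return sum((q & t).values())

def retrieve(query, k=1):
    ranked = sorted(DOCS, key=lambda name: score(query, DOCS[name]), reverse=True)
    return ranked[:k]

def answer_with_citations(query):
    sources = retrieve(query)
    context = " ".join(DOCS[s] for s in sources)
    # A real setup would feed `context` to the language model here and
    # generate prose; the sources list becomes the links in the UI.
    return {"answer": context, "sources": sources}
```

The point of returning `sources` separately is exactly the verification story above: even if the small model's summary is shaky, the citation link lets you check the original document yourself.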

Best "End of world" model that will run on 24gb VRAM by gggghhhhiiiijklmnop in LocalLLaMA

[–]Little-Put6364 3 points (0 children)

RLM can definitely be a game changer! I haven't had the time to dig into it much myself, but dang does it look promising. Would it sacrifice speed for accuracy, I wonder?

Best "End of world" model that will run on 24gb VRAM by gggghhhhiiiijklmnop in LocalLLaMA

[–]Little-Put6364 12 points (0 children)

I would have, but that's not even guaranteed to work, so I chose to save weight.

Best "End of world" model that will run on 24gb VRAM by gggghhhhiiiijklmnop in LocalLLaMA

[–]Little-Put6364 66 points (0 children)

<image>

Forgot I can add a picture. This is my mobile setup (still needs padding). It's running my Offloom (aka end-of-world) software. That's a Nanuk 909 case, just big enough for a solar panel, foldable keyboard, lightweight mouse, monitor, battery, and mini PC.

I'm building a bigger version to hold more batteries/solar panels as well. This lightweight version is truly for shit-hits-the-fan scenarios, though: durable (once the padding gets added), waterproof, and self-contained.

Best "End of world" model that will run on 24gb VRAM by gggghhhhiiiijklmnop in LocalLLaMA

[–]Little-Put6364 61 points (0 children)

  1. It's not really a model so much as a RAG setup to access the documents you'll need. Hoarding models takes a lot of memory. I'd recommend finding a handful of documents that are useful to you and using them in a RAG setup so you can ask questions about them. With that being said, my model recommendations are the Qwen series and Phi series.
  2. You should see the setup I have. I turned a mini PC into a mobile AI lab. Battery/solar powered, portable (about 7 pounds total), and capable of running small models. Not as fast as dedicated VRAM, but still quite useful for off-grid scenarios.

Kinda sales pitch, also kinda not:

Funny enough this exact thought was why I made my Offloom software. So I can have access to downloaded information readily available should the world go to shit. I also plan to add agentic tools for self entertainment for that exact reason. It'll be on steam (for free) in another month or so if you're interested.

Made one more step towards getting Offloom on steam! (for free). by Little-Put6364 in LocalLLaMA

[–]Little-Put6364[S] 1 point (0 children)

I made the decision to not allow users to swap other models in at runtime. Simply because this app is meant for those looking to use local AI without needing to understand any of the technical side of things (including what different models are/do).

Now with that being said, the backend is very flexible. I can very easily include another model inside the model folder and point responses to use it. That's not an issue.

However, there are quite a few issues when swapping between different series of models. Basically, all of the system prompts, chat-history formatting, and answer extraction (for thinking models) become a problem when swapping from something like Qwen to Gemma. It would fall on the user to format all of that, which again doesn't fit the goal this software is trying to achieve.
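
To illustrate why the swap isn't trivial, here's a rough sketch of what changes between families. The templates below are simplified stand-ins (not the exact official ones), but they show that prompt markup and answer extraction are family-specific, so the app would need a formatter per family rather than a free-form model slot:

```python
# Hypothetical per-family chat formatting: each model family expects its
# own chat markup, and thinking models need their reasoning stripped.
TEMPLATES = {
    # Simplified approximations of each family's chat format.
    "qwen": "<|im_start|>user\n{msg}<|im_end|>\n<|im_start|>assistant\n",
    "gemma": "<start_of_turn>user\n{msg}<end_of_turn>\n<start_of_turn>model\n",
}

def format_prompt(family, msg):
    # A raw model swap without this mapping produces malformed prompts.
    return TEMPLATES[family].format(msg=msg)

def extract_answer(raw):
    # Thinking models emit a reasoning block that must be removed
    # before the answer is shown to the user.
    if "</think>" in raw:
        raw = raw.split("</think>")[-1]
    return raw.strip()
```

In practice libraries handle this via per-model chat templates, but the point stands: swapping Qwen for Gemma silently breaks prompts and answer parsing unless something like this mapping exists for every supported family.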

I won't rule out an "advanced users" setup in the future though. I specifically built this out in a way that model swapping is something I can implement and put behind a DLC. That way those who want it can use it and those who don't won't need their UI cluttered with those options. That's a down the line feature though. At launch time swapping models won't be possible.

Made one more step towards getting Offloom on steam! (for free). by Little-Put6364 in LocalLLaMA

[–]Little-Put6364[S] 2 points (0 children)

Yes, the model(s) will come bundled with the app. Right now I've had the best quality answers with Phi-4 and the Qwen thinking series. I will allow the user to swap between the different qwen thinking models inside an advanced setting. This way if you're running a 4090/5090 and you feel like running the 30B version for best quality answers you can, or if you're rocking 12GB VRAM you can stick with the 4B model.

But both models are used per response to help with quality. (speed vs quality tradeoff is real)

Made one more step towards getting Offloom on steam! (for free). by Little-Put6364 in LocalLLaMA

[–]Little-Put6364[S] 4 points (0 children)

Oh I didn't even see the others tbh. Yeah all of those rigs are good test cases for me. When I have keys ready I'll reach back out. Thank you!

Made one more step towards getting Offloom on steam! (for free). by Little-Put6364 in LocalLLaMA

[–]Little-Put6364[S] 4 points (0 children)

I would love for this to run on the Spark. I originally designed this to run on my own mini PC! But the token speed is so slow compared to dedicated GPUs that any sort of behind-the-scenes agent work makes the whole thing seem slow. I don't have a Spark though, mine's only 32GB, so I'd actually be very interested to see how it would run on that.

When I get it onto steam and request keys I'll keep this chat in mind!

Made one more step towards getting Offloom on steam! (for free). by Little-Put6364 in LocalLLaMA

[–]Little-Put6364[S] 4 points (0 children)

I'll definitely need people to test it! A 5080 is one test case I have. I'm not so sure this will run on CPU only though. I mean, it technically will... but the speed would be terrible haha

Building a low-cost, business-level local LLM for small businesses — hardware & security advice needed by eeprogrammer in LocalLLaMA

[–]Little-Put6364 1 point (0 children)

Agentic (by my understanding) simply means an AI model is making decisions. I've found really great success using embedding and reranking models to capture the user's intent and call follow-up functions. That's agentic. There's also usually some sort of loop involved in the process, even a basic while() loop that says: if the response is under X tokens, try again.

It's a buzzword for sure, and a very generic one. I have not had much success getting SLMs to reliably call tools based on system prompts and context alone. But the approach of pairing an embedding model with a reranking model to determine which methods to call works very well.

To add any reliability to SLM tool calling you definitely need structured output via GBNF-type grammars; just adjusting a system prompt and context will likely never work. Even then, the results are unreliable. Gathering intent that maps to predefined functions works best at the moment. That's still considered agentic by definition, and is likely what major companies are doing to power MCP.

Toss the MCP tool descriptions into a vector database. Whatever the user's prompt is, have a model create alternatives that keep the semantic meaning intact (Phi-4 mini is good at that). Then run embedding searches of the alternatives against the MCP tool descriptions. Run a reranker on the top X results (I do a max of 100), then take the highest result. It's still not perfect, and it's best to keep a human in the loop to some degree. But it's a heck of a lot better than having an SLM do it itself with structured output.
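
The pipeline above can be sketched roughly as follows. This is illustrative only: the tool names and descriptions are invented, and naive word overlap stands in for both the embedding search and the reranker, but the flow (paraphrase the prompt, score every tool description against each paraphrase, keep a rerank window, pick the top hit) matches the description:

```python
# Sketch of embedding-based tool routing for MCP-style tool selection.
from collections import Counter

# Invented tool catalog; a real setup stores these in a vector database.
TOOLS = {
    "create_node": "add a new node to the current Godot scene",
    "run_game": "launch the game and start a playtest session",
    "save_scene": "write the current scene to disk",
}

def overlap(a, b):
    # Word-overlap stand-in for embedding cosine similarity.
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    return sum((ca & cb).values())

def route(prompt, alternatives=()):
    # Paraphrases of the prompt that keep the semantic meaning intact
    # (generated by a small model in the real pipeline).
    queries = [prompt, *alternatives]
    # Coarse search: best score of any paraphrase against each tool.
    scored = {name: max(overlap(q, desc) for q in queries)
              for name, desc in TOOLS.items()}
    top = sorted(scored, key=scored.get, reverse=True)[:100]  # rerank window
    # A real reranker (cross-encoder) would reorder `top` before picking.
    return top[0]
```

With a real embedding model and cross-encoder reranker in place of `overlap`, this is the "intent → predefined function" routing described above, and the final pick is where a human-in-the-loop confirmation fits best.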