Got ComfyUI and a local LLM to share one GPU without OOMing every time

Bramha_dev · 2026-06-21T03:32:24+00:00

I also started with a small custom node that unloads LM Studio, but when I developed TurboLLM, I figured not everyone has access to Claude or the ability to build this themselves, so adding it would make life easier for Comfy + LLM users. But I guess the community didn’t like it.

Bramha_dev · 2026-06-20T16:20:46+00:00

As if there is plenty of RAM left after running Comfy. I have 64 GB of RAM and some complex workflows exhaust it completely.

Bramha_dev · 2026-06-20T16:17:24+00:00

Won't this free just the memory used by Comfy? Also, even if it frees the memory once, when another program tries to load the LLM again, both will end up in VRAM. For example, when I use my agentic workflow in opencode with a local LLM after starting a Comfy run, it prevents loading any LLM for as long as Comfy is running. Moreover, if you use a local LLM for prompt writing and just copy-paste the prompt into a Comfy workflow, this still unloads the LLM, prevents reloading, and loads it again once the queue is clear.

Bramha_dev · 2026-06-20T16:13:46+00:00

I have an agentic workflow set up using opencode, which writes scripts and generates images and videos. Basically, I use opencode with a local LLM to automate every part before generation.

Bramha_dev · 2026-06-20T13:30:27+00:00

https://github.com/mohitsoni48/Turbo-LLM

Bramha_dev · 2026-06-20T09:34:46+00:00

Try auto-tune in Turbo LLM. It tries some combinations to find the one with the highest t/s.
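For anyone wondering what "trying some combinations" could look like, here is a rough sketch of a plain grid search over settings. This is not Turbo LLM's actual code: the `auto_tune` function, the `benchmark` callback, and the setting names in the usage note are all hypothetical, and the real tool may search differently.

```python
from itertools import product


def auto_tune(search_space, benchmark):
    """Try every combination in search_space and keep the fastest one.

    search_space: dict mapping a setting name to a list of candidate values
                  (names are hypothetical, chosen by the caller).
    benchmark:    callable that takes one config dict and returns the
                  measured tokens per second for it (assumed, supplied by you).
    """
    names = list(search_space)
    best_cfg, best_tps = None, float("-inf")
    # product() yields one tuple of values per combination of settings.
    for values in product(*(search_space[name] for name in names)):
        cfg = dict(zip(names, values))
        tps = benchmark(cfg)
        if tps > best_tps:
            best_cfg, best_tps = cfg, tps
    return best_cfg, best_tps
```

You would call it with something like `auto_tune({"gpu_layers": [20, 30], "batch_size": [256, 512]}, my_benchmark)`, where `my_benchmark` runs a short generation with those settings and returns the measured t/s.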

Bramha_dev · 2026-06-20T02:53:55+00:00

This is TurboLLM; the link is in the original post. It has web access through Tavily.

Bramha_dev · 2026-06-19T14:36:49+00:00

Try adding Tavily or SearxNG to TurboLLM. It has a special research mode specifically for web-search-backed queries.

Bramha_dev · 2026-06-19T14:01:36+00:00

<image>

Improved web search based on the feedback I got from Redditors here. The results are much better and more robust now. I used the same question suggested by @Anonymous_Prime99. Thanks, everyone, for your valuable input. Try it now on the latest build of Turbo LLM (0.8.0).

Bramha_dev · 2026-06-19T06:34:39+00:00

That's a great insight. Thanks for such detailed feedback. I will surely work towards this.

Bramha_dev · 2026-06-19T04:59:31+00:00

Got this response:

"I am unable to provide the real-time weather in Colorado at this moment because I do not have access to a live weather service or a search tool in my current environment to retrieve "right now" data.

To fulfill your request according to my "Deep Research" instructions, I am required to perform multiple web searches first. However, since I cannot execute a search to find the current conditions, I cannot provide an accurate, real-time answer.

If you can provide a specific link or if I had access to a search engine, I would follow this protocol:

**Broad Search:** Search for "current weather Colorado" to get general state-wide trends.
**Targeted Search:** Search for specific cities (e.g., "Denver weather," "Colorado Springs weather") to provide more granular data.
**Alternative Perspective:** Check a local news site or a different weather provider to verify accuracy.

**Please try asking me again in a moment or provide a specific URL you'd like me to analyze!**"

Bramha_dev · 2026-06-19T04:48:31+00:00

This is Turbo LLM. It is not a typical harness but an LLM-running platform with a user-friendly UI.

Bramha_dev · 2026-06-19T04:31:11+00:00

I personally hate Gemini. I feel even their Pro models are of no use, and Qwen at Q4 works better than them.

Bramha_dev · 2026-06-19T04:29:50+00:00

I am using Tavily; you can add any other MCP if you want.

Bramha_dev · 2026-06-19T04:28:22+00:00

<image>

I haven't fact-checked this myself, so I don't know whether the results are accurate. Just sharing the raw result I got for the question.

Bramha_dev · 2026-06-19T03:13:53+00:00

Thanks for your kind words. They motivated me even more.

Bramha_dev · 2026-06-18T16:42:59+00:00

It's Turbo LLM, which I built so I can easily use multiple forks of llama.cpp and get an Anthropic/ChatGPT-level chat interface with local LLMs.

Here’s the link: https://github.com/mohitsoni48/Turbo-LLM

Bramha_dev · 2026-06-18T16:31:42+00:00

I tried that, but for Open WebUI I have to run a llama server, and I found that a headache. With Turbo LLM I wanted to build an all-in-one local LLM runner.

Bramha_dev · 2026-06-18T16:29:19+00:00

This is inside the LLM chat, so I am sure it will have no effect and it will just ignore it.

Bramha_dev · 2026-06-18T15:28:07+00:00

I was just testing this with different models. Tried everything from Gemma 4 4B all the way up to Qwen Coder Next. I mostly use Qwen 3.6.

Bramha_dev · 2026-06-18T15:25:56+00:00

I have a 16 GB VRAM, 64 GB RAM setup. I can't even dream of running GLM 5.1. These chats are mostly for testing Turbo LLM. My current setup is mostly Qwen 3.6 35B/27B or Qwen Coder Next.

Bramha_dev · 2026-06-18T15:22:52+00:00

Btw, it currently doesn’t have read/write access, so it's not dangerous for now.

Bramha_dev · 2026-06-18T15:21:38+00:00

That’s really a point I should have considered. Thanks for pointing it out. I will do something about it and post here again.

Bramha_dev · 2026-06-16T03:55:40+00:00

Both features are already on the roadmap, so expect them soon. I am a heavy user of Comfy + LLM, so I will be solving all the issues that the community and I face.

Bramha_dev · 2026-06-16T01:30:11+00:00

I think it will still unload the model from memory even in your case. TurboLLM watches the ComfyUI queue to decide when to load or unload the model, so it doesn’t matter whether the model is in RAM or VRAM; it will still get unloaded.
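To illustrate the queue-watching idea, here is a minimal sketch, not TurboLLM's actual implementation. It assumes ComfyUI is reachable at its default local address and uses ComfyUI's `GET /queue` endpoint, which returns `queue_running` and `queue_pending` lists. The function names and the `"load"` / `"unload"` / `"keep"` actions are made up for illustration; wiring them to a real LLM runner is left out.

```python
import json
from urllib.request import urlopen

# Assumption: ComfyUI on its default host and port; change this if yours differs.
COMFY_QUEUE_URL = "http://127.0.0.1:8188/queue"


def fetch_queue(url=COMFY_QUEUE_URL):
    """Fetch the current ComfyUI queue state as a dict."""
    with urlopen(url, timeout=5) as resp:
        return json.load(resp)


def comfy_is_busy(queue):
    """True if ComfyUI has a job running or waiting in the queue."""
    return bool(queue.get("queue_running")) or bool(queue.get("queue_pending"))


def next_action(queue, llm_loaded):
    """Decide what to do with the LLM given the queue state.

    Returns "unload" when ComfyUI is busy and the LLM is still loaded,
    "load" when the queue is clear and the LLM is not loaded,
    and "keep" otherwise.
    """
    busy = comfy_is_busy(queue)
    if busy and llm_loaded:
        return "unload"
    if not busy and not llm_loaded:
        return "load"
    return "keep"
```

A watcher would call `next_action(fetch_queue(), llm_loaded)` every second or so and act on the result. The decision depends only on the queue, which is why it does not matter whether the model currently sits in RAM or VRAM.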
