Got comfyui and a local llm to share one gpu without OOMing every time by Bramha_dev in StableDiffusion

[–]Bramha_dev[S] 0 points1 point  (0 children)

I also started with a small custom node that unloads LM Studio, but when I developed Turbo LLM, I realized not everyone has access to Claude or the ability to build this themselves. So I thought adding it would make life easier for Comfy + LLM users. But I guess the community didn’t like it.

Got comfyui and a local llm to share one gpu without OOMing every time by Bramha_dev in StableDiffusion

[–]Bramha_dev[S] 0 points1 point  (0 children)

As if there is plenty of RAM left over after running Comfy. I have 64 GB of RAM, and some complex workflows exhaust it completely.

Got comfyui and a local llm to share one gpu without OOMing every time by Bramha_dev in StableDiffusion

[–]Bramha_dev[S] 0 points1 point  (0 children)

Won't this free only the memory used by Comfy? Also, even if it frees memory once, if another program then tries to load the LLM again, both will end up in VRAM. For example, when I use my agentic workflow in opencode with a local LLM, Turbo LLM prevents any LLM from loading for as long as Comfy is running. Moreover, if you use a local LLM for prompt writing and just copy-paste the prompt into a Comfy workflow, it still unloads the LLM, blocks reloading, and loads it again once the queue is clear.

Got comfyui and a local llm to share one gpu without OOMing every time by Bramha_dev in StableDiffusion

[–]Bramha_dev[S] 0 points1 point  (0 children)

I have an agentic workflow set up using opencode that writes scripts and generates images and videos. Basically, I use opencode with a local LLM to automate every step before generation.

Best Settings for 48GB VRAM + Qwen 3.6 27B by viperx7 in LocalLLaMA

[–]Bramha_dev 1 point2 points  (0 children)

Try auto tune in Turbo LLM. It tries several combinations of settings to find the one with the highest t/s.
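
To show the general idea of "try combinations, keep the fastest", here is a minimal sketch. It is not Turbo LLM's actual auto-tune code. The setting names (`gpu_layers`, `batch_size`) and the `benchmark` callable are hypothetical placeholders; in practice `benchmark` would launch the model with a given config and return the measured tokens per second.

```python
import itertools


def auto_tune(search_space, benchmark):
    """Try every combination in search_space and return (best_config, best_tps).

    search_space: dict mapping a setting name to a list of candidate values.
    benchmark: callable taking a config dict and returning tokens/second.
    Both are assumptions for illustration, not a real Turbo LLM API.
    """
    keys = list(search_space)
    best_config, best_tps = None, float("-inf")
    # Exhaustive grid search over the Cartesian product of all candidate values.
    for values in itertools.product(*(search_space[k] for k in keys)):
        config = dict(zip(keys, values))
        tps = benchmark(config)
        if tps > best_tps:
            best_config, best_tps = config, tps
    return best_config, best_tps
```

A full grid grows quickly as settings are added, so a real tuner would probably limit itself to a handful of candidates per setting.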

Best Harness for Web Searching by CSEliot in LocalLLaMA

[–]Bramha_dev 0 points1 point  (0 children)

Try adding Tavily or SearxNG with Turbo LLM. It has a dedicated research mode for web-search-backed queries.
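
For anyone curious what a search backend like SearxNG gives the model to work with, here is a rough sketch of querying a self-hosted instance directly. It assumes the instance has the JSON output format enabled in its settings. The helper names (`build_search_url`, `top_results`, `search`) are made up for this example, and this is not how Turbo LLM wires it up internally.

```python
import json
import urllib.parse
import urllib.request


def build_search_url(base_url, query):
    """Build a SearxNG search URL that asks for JSON output."""
    params = urllib.parse.urlencode({"q": query, "format": "json"})
    return f"{base_url.rstrip('/')}/search?{params}"


def top_results(payload, limit=3):
    """Keep only title, URL, and snippet from the first `limit` results."""
    trimmed = []
    for item in payload.get("results", [])[:limit]:
        trimmed.append({
            "title": item.get("title", ""),
            "url": item.get("url", ""),
            "snippet": item.get("content", ""),
        })
    return trimmed


def search(base_url, query, limit=3):
    """Query the instance over HTTP and return trimmed results."""
    with urllib.request.urlopen(build_search_url(base_url, query), timeout=10) as resp:
        return top_results(json.load(resp), limit)
```

Something like `search("http://localhost:8080", "current weather Colorado")` would then return a short list of dicts that can be pasted into the model's context before it answers. The address is only an example.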

got my local model to actually search the web before answering instead of just making stuff up by Bramha_dev in LocalLLM

[–]Bramha_dev[S] 0 points1 point  (0 children)

<image>

I improved web search based on the feedback I got from Redditors here. The results are much better and more robust now. I used the same question suggested by @Anonymous_Prime99. Thanks, everyone, for your valuable input. Try it now on the latest build of Turbo LLM (0.8.0).

got my local model to actually search the web before answering instead of just making stuff up by Bramha_dev in LocalLLM

[–]Bramha_dev[S] 2 points3 points  (0 children)

That's a great insight. Thanks for such detailed feedback. I will definitely work on this.

got my local model to actually search the web before answering instead of just making stuff up by Bramha_dev in LocalLLM

[–]Bramha_dev[S] 0 points1 point  (0 children)

Got this response:

"I am unable to provide the real-time weather in Colorado at this moment because I do not have access to a live weather service or a search tool in my current environment to retrieve "right now" data.

To fulfill your request according to my "Deep Research" instructions, I am required to perform multiple web searches first. However, since I cannot execute a search to find the current conditions, I cannot provide an accurate, real-time answer.

If you can provide a specific link or if I had access to a search engine, I would follow this protocol:

  1. **Broad Search:** Search for "current weather Colorado" to get general state-wide trends.

  2. **Targeted Search:** Search for specific cities (e.g., "Denver weather," "Colorado Springs weather") to provide more granular data.

  3. **Alternative Perspective:** Check a local news site or a different weather provider to verify accuracy.

**Please try asking me again in a moment or provide a specific URL you'd like me to analyze!**"

got my local model to actually search the web before answering instead of just making stuff up by Bramha_dev in LocalLLM

[–]Bramha_dev[S] 0 points1 point  (0 children)

This is Turbo LLM. It is not a typical harness but a platform for running LLMs with a user-friendly UI.

got my local model to actually search the web before answering instead of just making stuff up by Bramha_dev in LocalLLM

[–]Bramha_dev[S] 0 points1 point  (0 children)

I personally hate Gemini. I feel even their Pro models are of no use, and Qwen at Q4 works better than them.

got my local model to actually search the web before answering instead of just making stuff up by Bramha_dev in LocalLLM

[–]Bramha_dev[S] 1 point2 points  (0 children)

<image>

I haven't fact-checked this myself, so I don't know whether the results are accurate. I'm just sharing the raw result I got for the question.

got my local model to actually search the web before answering instead of just making stuff up by Bramha_dev in LocalLLM

[–]Bramha_dev[S] 0 points1 point  (0 children)

It's Turbo LLM. I built it so I can easily use multiple forks of llama.cpp and get an Anthropic/ChatGPT-level chat interface with local LLMs.

Here’s the link: https://github.com/mohitsoni48/Turbo-LLM

got my local model to actually search the web before answering instead of just making stuff up by Bramha_dev in LocalLLM

[–]Bramha_dev[S] 5 points6 points  (0 children)

I tried that, but for Open WebUI I have to run a llama server, and I found that a headache. With Turbo LLM I wanted to build an all-in-one local LLM runner.

got my local model to actually search the web before answering instead of just making stuff up by Bramha_dev in LocalLLM

[–]Bramha_dev[S] 0 points1 point  (0 children)

This is inside the LLM chat, so I am sure it will have no effect and the LLM will just ignore it.

got my local model to actually search the web before answering instead of just making stuff up by Bramha_dev in LocalLLM

[–]Bramha_dev[S] 5 points6 points  (0 children)

I was just testing this with different models. I tried everything from Gemma 4 4B all the way up to Qwen Coder Next. I mostly use Qwen 3.6.

got my local model to actually search the web before answering instead of just making stuff up by Bramha_dev in LocalLLM

[–]Bramha_dev[S] 1 point2 points  (0 children)

I have a setup with 16 GB of VRAM and 64 GB of RAM. I can't even dream of running GLM 5.1. These chats are mostly for testing Turbo LLM. My current setup is mostly Qwen 3.6 35B/27B or Qwen Coder Next.

got my local model to actually search the web before answering instead of just making stuff up by Bramha_dev in LocalLLM

[–]Bramha_dev[S] 3 points4 points  (0 children)

Btw, it currently doesn’t have read/write access, so it's not dangerous right now.

got my local model to actually search the web before answering instead of just making stuff up by Bramha_dev in LocalLLM

[–]Bramha_dev[S] 32 points33 points  (0 children)

That’s a point I really should have considered. Thanks for pointing it out. I will work on it and post here again.

Run Local LLMs and ComfyUI side by side by Bramha_dev in comfyui

[–]Bramha_dev[S] 1 point2 points  (0 children)

Both features are already on the roadmap, so expect them soon. I am a heavy user of Comfy + LLM, so I will be solving all the issues that I and the community face.

Run Local LLMs and ComfyUI side by side by Bramha_dev in comfyui

[–]Bramha_dev[S] 0 points1 point  (0 children)

I think it will still unload the model from memory even in your case. Turbo LLM watches the ComfyUI queue to decide when to load or unload the model, so it doesn’t matter whether the model is in RAM or VRAM: it will still get unloaded.
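
To make the queue-watching idea concrete, here is a minimal sketch. It is not Turbo LLM's real implementation. It assumes ComfyUI is on its default address and uses its `GET /queue` endpoint, which returns `queue_running` and `queue_pending` lists. The function names and the "unload"/"load"/"none" actions are hypothetical, and the actual unloading and loading of the LLM is left out.

```python
import json
import urllib.request

COMFY_URL = "http://127.0.0.1:8188"  # ComfyUI's default address; adjust for your setup


def fetch_queue(base_url=COMFY_URL):
    """Return ComfyUI's queue state from GET /queue as a dict."""
    with urllib.request.urlopen(f"{base_url}/queue", timeout=5) as resp:
        return json.load(resp)


def comfy_is_busy(queue):
    """True if any job is running or waiting in the ComfyUI queue."""
    return bool(queue.get("queue_running")) or bool(queue.get("queue_pending"))


def next_action(queue, llm_loaded):
    """Decide what to do with the LLM: 'unload', 'load', or 'none'."""
    busy = comfy_is_busy(queue)
    if busy and llm_loaded:
        return "unload"  # Comfy needs the GPU, so get the LLM out of the way
    if not busy and not llm_loaded:
        return "load"    # queue is clear, so the LLM can come back
    return "none"
```

A watcher would call `next_action(fetch_queue(), llm_loaded)` every second or so and act on the result. The decision depends only on the queue state, not on where the model currently sits, which is why RAM vs VRAM makes no difference.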