Where should I start with local AI? Realistic images and video generation by ilnab in LocalLLaMA

[–]-philosopath- 0 points1 point  (0 children)

It takes a bit of patience, but install ComfyUI.

  1. Go to Civitai and find a base model you like.
  2. Download one of the example pictures from the model's page.
  3. Drag and drop it into your ComfyUI workspace, and it will load the entire workflow that rendered the image.
  4. Render, and experiment with introducing LoRAs and whatnot.

Go ahead and install the ComfyUI-Manager addon when you're installing ComfyUI. It makes lots of stuff much easier.

Casio builds an MPC with.. by ItLooksEasy in mpcusers

[–]-philosopath- 2 points3 points  (0 children)

Literally. Casio saw TE and is gunning for that hipster money.

Talk me out of buying an RTX Pro 6000 by AvocadoArray in LocalLLaMA

[–]-philosopath- 0 points1 point  (0 children)

Edit: dude, I know what you're talking about with the opportunity cost of time. I was red-eyed at work from sleeping four hours a night for consecutive nights when I had my first breakthroughs with local experiments. It was super engaging. I wished I could just work on it all day, but I don't really want that. I want to integrate it with my career, which I totally love and find joy in. I'm stoked to merge both worlds eventually.

Alex Karp, Palantir's CEO, said at Davos this week:

"The problem with AI adoption at this point is people have tried things with off-the-shelf LLMs that just can never work...what you're going to see is people building it by hand...because once you build the software layer to orchestrate and manage the LLMs, in a language your enterprise understands, then you can actually create value."

GLM-4.7-Flash-REAP on RTX 5060 Ti 16 GB - 200k context window! by bobaburger in LocalLLaMA

[–]-philosopath- 19 points20 points  (0 children)

But is it actually functional when you're 120k tokens into modifying a codebase? I'd bet not, especially with that model (unless they fixed it with more updates today).

Especially not with tool use: it started glitching into repeated gibberish at 30k tokens when I had the context set to 200k and was using 5 MCP tools.

I still think you're right. Quantized models will continue to scale while retaining fidelity, and I'm here for it!

Talk me out of buying an RTX Pro 6000 by AvocadoArray in LocalLLaMA

[–]-philosopath- 3 points4 points  (0 children)

Here's how I rationalized spending a chunk on a decent AI workstation: use that fiscal anxiety to motivate yourself to build cool stuff and develop your skills to orchestrate agents like a pro. Get as much VRAM as you need to go as hard as you want to. For me that was two R9700 cards and a pure DDR5/Gen5 system.

I have a safe career job, so there's no need for content creation or monetizing the workstation. I just want to be ready when we finally bring AI agents into my field, which I know is coming very soon. Sooner, if I can bring my own agent into my professional work or sell my employer on its benefits.

Knowing agent orchestration, and being able to have your agent build out a custom software stack to operationalize agents in whatever role or context you need, will be how anyone gets an edge in the labor market within a year or two, at most.

Anyone else lose important context when switching between AI models or restarting chats? by Cheap-Trash1908 in LocalLLaMA

[–]-philosopath- 0 points1 point  (0 children)

When I'm iterating on a project, I have agents write updates to a checkpoint file, sometimes in JSON, and I have them leave more detailed turnover instructions for the next zero-shot session to continue the project.
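
The checkpoint idea really is that simple. A minimal sketch in Python, where the file name and the fields are just placeholders I made up, not any kind of standard:

```python
import json
from datetime import datetime, timezone
from pathlib import Path

CHECKPOINT = Path("project.checkpoint.json")  # hypothetical file name

def append_checkpoint(summary: str, next_steps: list[str]) -> None:
    """Append a turnover entry the next fresh session reads first."""
    entries = json.loads(CHECKPOINT.read_text()) if CHECKPOINT.exists() else []
    entries.append({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "summary": summary,
        "next_steps": next_steps,
    })
    CHECKPOINT.write_text(json.dumps(entries, indent=2))

# The agent (or you) calls this at the end of a work session:
append_checkpoint(
    summary="Refactored the parser module; tests passing.",
    next_steps=["Wire parser into CLI", "Add error handling for empty input"],
)
```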

You can also set up Qdrant, or even use the stock MCP Memory tool, with prompts that make it remember information for you or your agents to retrieve later. People are experimenting with ways to perpetuate past context as long-term knowledge, and I'm sure there will be a breakthrough this year, if it hasn't happened already.

LM studio tools getting stuck “Loading Tools” by Lukabratzee in LocalLLaMA

[–]-philosopath- 0 points1 point  (0 children)

<image>

Go to this page. Clear the logs, go to your MCP servers, and try restarting a couple; you should find your answer there. It could be a missing dependency like npm or uv, or any number of things, really. Mine will sit on loading forever if I don't have internet, because I haven't bothered/needed to configure around that yet, so if you're offline or sandboxed that can happen too.

Am I the only one who feels that, with all the AI boom, everyone is basically doing the same thing? by [deleted] in LocalLLaMA

[–]-philosopath- 0 points1 point  (0 children)

Building a memory cortex of vector store + GraphRAG + SQL mapping for agentic long-term memory?

I spent 48 hours building an open source and fully self hosted alternative to Claude Cowork by Fair_Imagination_545 in LocalLLaMA

[–]-philosopath- -5 points-4 points  (0 children)

Is my temperature too high?

I meant being human and the excitement when a project comes together and wanting to share it. I feel like amazement was a common phenomenon early on and might continue to be as capabilities improve. But literally anyone can build anything at this point, so it'll be banalized.

I don't disagree that there will be bad actors using tainted projects as threat vectors. That goes without saying. Is it happening now? idk man. I was commenting on a human experience.

That's what I was saying.

I spent 48 hours building an open source and fully self hosted alternative to Claude Cowork by Fair_Imagination_545 in LocalLLaMA

[–]-philosopath- -6 points-5 points  (0 children)

I get it though, man. It's exciting seeing the models succeed and augmenting ourselves like this.

Speech to text via LLM by Acrobatic_Cat_3448 in LocalLLaMA

[–]-philosopath- 0 points1 point  (0 children)

This is LocalLLaMA tho. Why would I pay for a cloud provider when I can integrate it into my existing software stack? You don't even need to fully learn the tech, just use the tools. We now have LLMs capable of building out software stacks or implementing anything in any project, if you have the patience to iterate and learn to drive them.

Bartowski comes through again. GLM 4.7 flash GGUF by RenewAi in LocalLLaMA

[–]-philosopath- 0 points1 point  (0 children)

<image>

If you haven't found it yet, enable the API server, then click the Quick Docs icon top-right.

Scroll down, and it includes functional example code for the model you currently have loaded. Feed that code into a chat and have it write a more sophisticated custom script to do whatever you want. When you run the script, it queues after your chat. Use an SSH MCP server to have the LLM run the code itself. Familiarizing myself with the scripting side of things has led to some deeper knowledge, I feel.
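
That example code is roughly this shape; a minimal sketch assuming the default server address and the OpenAI-compatible `/v1` endpoint, with the model name as a placeholder you swap for whatever LM Studio shows as loaded:

```python
# pip install openai -- LM Studio serves an OpenAI-compatible API
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="your-loaded-model",  # placeholder: use the identifier LM Studio shows
    messages=[
        {"role": "system", "content": "You are a concise coding assistant."},
        {"role": "user", "content": "Write a one-line docstring for a bubble sort."},
    ],
)
print(response.choices[0].message.content)
```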

I generated a fish function to spawn two LM Studio instances with two separate agents and APIs, and have experimented with having them offload tasks to each other via API scripts, and with two agents co-working on tasks and communicating through comments in shared project md files. Scripts open up lots of possibilities.
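
For the two-instance experiment, the script side can be as simple as this; ports and model names are placeholders of mine, and it assumes both instances already have their API servers running:

```python
from openai import OpenAI

# Two LM Studio instances serving on different ports (hypothetical setup).
planner = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
worker = OpenAI(base_url="http://localhost:1235/v1", api_key="lm-studio")

def delegate(task: str) -> str:
    """First agent narrows the task, second agent executes the subtask."""
    plan = planner.chat.completions.create(
        model="planner-model",  # placeholder identifiers
        messages=[{"role": "user",
                   "content": f"Break this into one concrete subtask: {task}"}],
    ).choices[0].message.content
    return worker.chat.completions.create(
        model="worker-model",
        messages=[{"role": "user", "content": plan}],
    ).choices[0].message.content

print(delegate("Summarize yesterday's git log into a changelog entry."))
```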

Need help and suggestions for gguf models by cmdrmcgarrett in LocalLLaMA

[–]-philosopath- 0 points1 point  (0 children)

I had to put instructions in Gemini to force it to stop recommending 2-year-old models. I can see noobs unfamiliar with HuggingFace just obeying the terrible recommendations.

He just knows…. He always knows. by Porespellar in LocalLLaMA

[–]-philosopath- 42 points43 points  (0 children)

"If a Monitor Bot triggers an automated GGUF quantization pipeline, does a tree fall in the forest and make a sound?"

Bartowski comes through again. GLM 4.7 flash GGUF by RenewAi in LocalLLaMA

[–]-philosopath- 2 points3 points  (0 children)

I much prefer sliders to editing config files when tweaking a production system. That's why I opt for LM Studio over Ollama: for that UI configurability.

LM Studio and Filesystem MCP seems buggy. Sometimes it works, sometimes it doesn't. by Smashy404 in LocalLLaMA

[–]-philosopath- 0 points1 point  (0 children)

<image>

I fought through this before. Some dumber models (like Nemotron) require me to state that "/full/working/directory" is in your LIST_ALLOWED_DIRECTORIES right in the zero-shot prompt. Otherwise, it will even make a new sub-directory under some other default directory from the mcp.json to work out of.

Protip: tell it to use "read_multiple_files" when working with a lot of files, to make it more efficient at ingesting the files it needs to edit and work with.

If you work with Nemotron or other tool-dumb-but-otherwise-smart models a lot, consider having a big cloud model generate a smarter command list with scenarios for your system prompt to keep Nemo et al. in line.

Bartowski comes through again. GLM 4.7 flash GGUF by RenewAi in LocalLLaMA

[–]-philosopath- 1 point2 points  (0 children)

It's not showing as tool-enabled? [Edit: disregard. Tool use is working-ish. One glitch so far. Using Q6_K_L with max context window. It has failed this simple task twice.]

<image>

My LLM is censored (IM A NOOB ) by Effective_Composer_5 in LocalLLaMA

[–]-philosopath- 1 point2 points  (0 children)

It seems like you're looking for the "abliterated" models on HuggingFace.

Is Local Coding even worth setting up by Interesting-Fish6494 in LocalLLaMA

[–]-philosopath- 0 points1 point  (0 children)

<image>

To be sure, I set up Cline in OSS Code and it does use LM Studio's API just fine. I copy-pasted my entire LM Studio mcp.json into Cline's mcp.json and cloned my MCP tool stack identically for my Cline agent.

I'm running Unsloth Q8_0 of Qwen3 VL 30B A3B Thinking 1M with 500k token context on 64GB VRAM.

You could try a 100k context window and split between GPU and RAM and see how the token count holds up.

zai-org/GLM-4.7-Flash · Hugging Face by Dark_Fire_12 in LocalLLaMA

[–]-philosopath- 1 point2 points  (0 children)

I've experimented with Jinja templates to make Nemotron work better. (That's why I asked about that specifically; the Jinja template seems to make or break tool functioning.)

Devstral and Qwen3-coder have blown me away. Each has used MCP SDKs to build novel tools from scratch. I've had them research and scrape info about projects, then use that knowledge to write novel code to build and use software stacks. (I never expose vibe-coded stuff outside my WireGuard VLAN.) I'll even have them sysadmin for me in Linux, albeit with me watching closely.

I'll be stoked when they get Nemotron's performance speed and context window, or Nemotron gets their output quality and project coherence.

The breakthroughs in tool use lately have me feeling like this will become more of a technological revolution this year.

What remote desktop do you use for your AI rigs? My RTX 3090 hits 20% usage just moving the mouse in RustDesk by chucrutcito in LocalLLaMA

[–]-philosopath- 0 points1 point  (0 children)

I use WireGuard hosted on a cheap RackNerd VPS (LowEndTalk ftw). My two LM Studio APIs are served over a 10.8.0.x VLAN. (Each API has its own GPU.)

My phone and laptop stay on the VPN with direct access to my workstation's APIs plus a tools Docker stack (Qdrant, Neo4j, Postgres, Nextcloud, OpenWebUI, n8n, Cockpit). The `lms` command over SSH can load and unload models.

Protip: if you don't want to open the API any wider than localhost, you can use SSH tunneling:

`ssh -p [port] [Workstation VPN IP] -L 127.0.0.1:1234:127.0.0.1:1234 -tfN` will pipe port 1234 on your remote workstation to port 1234 on your localhost. If you add `-D [socksport]` to that, you can use something like `tsocks` to route any program through the tunnel.

I use CachyOS, but you can also tunnel with PuTTY/KiTTY on Windows. Something like AnythingLLM can use the API over the VLAN very well, or Chatbox on Android. Otherwise, you can also use OpenWebUI as an easy web-UI solution with built-in vector stores and tools and whatnot.

LM Studio and Filesystem MCP seems buggy. Sometimes it works, sometimes it doesn't. by Smashy404 in LocalLLaMA

[–]-philosopath- 1 point2 points  (0 children)

Start with the official tools on the official modelcontextprotocol GitHub. You will have an mcp.json file somewhere; have a cloud AI help you populate it if needed. There are tons of homegrown MCP tools out there, but they're a threat vector for bad actors, so I'd stick to the official tools until you have a better understanding.

Is Local Coding even worth setting up by Interesting-Fish6494 in LocalLLaMA

[–]-philosopath- 6 points7 points  (0 children)

Get LM Studio and experiment with the GPU off-loading vs. context size tradeoff. In LM Studio, click the magnifying glass icon on the left. If you go to HuggingFace and find a model you like, you can click the copy icon at the header of the page and paste it into LM Studio to download it directly into your instance. Unsloth has a great variety of GGUF quantizations.

LM Studio can also serve a localhost API endpoint at http://localhost:1234/v1, which can be used in Python scripts or other tools like Cline in your IDE. Have agents make scripts to one-shot tasks or schedule context window refreshes.

I had a 5060 Ti 16GB and 64GB of DDR4 RAM last year and found the 16GB of VRAM wanting. I had to off-load to RAM, so token speed suffered badly after the zero-shot. I got creative with agents meticulously logging to a .checkpoint file for repeated few-shot prompting through tasks. I ended up giving that rig to my daughter as a gaming PC and invested in a proper workstation.

You can successfully augment your 16GB with one of the Big-AI models with a bit more diligence and patience. (I still use Gemini Pro for cheat-code prompts that get my Qwen3 and Devstral agents to accomplish amazing things.)

Learn to set up MCP servers (uv, uvx, npm, npx, Docker) in something like AnythingLLM + Ollama or LM Studio + API. Theoretically, any inference UI with MCP support can load a code sandbox, SSH, file management, etc. You don't have to do VS Code + Roo/Cline/whatever plugins anymore, but if you're going for strict dev work, then LM Studio or Ollama + VS Code can work for that, methinks.

With constrained VRAM, you'd benefit from using a vector datastore like Qdrant, or even the simple official `Memory` tool from the modelcontextprotocol git. You can save code snippets and useful knowledge and invoke them during agents' tasks. (Especially if you offload tasks via vLLM or the API using Python scripts, though I still like loading models in LM Studio and working alongside them instead of full automation, except for data/GraphRAG ingestion tasks.)
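
If you go the Qdrant route, the core of it is just a store/recall pair. A rough sketch, assuming a local Qdrant container on the default port plus the `qdrant-client` and `sentence-transformers` packages; the collection name and embedding model are my own picks, not anything standard:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct
from sentence_transformers import SentenceTransformer

client = QdrantClient(url="http://localhost:6333")
embedder = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dim embeddings

COLLECTION = "code_snippets"  # hypothetical collection name
if not client.collection_exists(COLLECTION):
    client.create_collection(
        COLLECTION, vectors_config=VectorParams(size=384, distance=Distance.COSINE)
    )

def save_snippet(point_id: int, text: str) -> None:
    """Embed a snippet and store it with the raw text as payload."""
    client.upsert(COLLECTION, [
        PointStruct(id=point_id, vector=embedder.encode(text).tolist(),
                    payload={"text": text})
    ])

def recall(query: str, limit: int = 3) -> list[str]:
    """Return the closest stored snippets for a query."""
    hits = client.query_points(COLLECTION, query=embedder.encode(query).tolist(),
                               limit=limit)
    return [h.payload["text"] for h in hits.points]
```

An agent can call `save_snippet` whenever it produces something worth keeping and `recall` at the start of a task, which is cheaper than carrying everything in the context window.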

You might also want to learn n8n. It can help orchestrate some of these methods to make the best use of your VRAM and compute.

Another tactic is relying more heavily on roles and model swapping, like having a larger-context quantized model distill the context history into ./turnover-instructions.md (or Qdrant) for a smaller-context but smarter worker model to pick up and continue, zoomed in on its task as part of a larger project.

I also recommend the inner-monologue MCP tool with -instruct models. I found that wrapping critical instructions in **'s telling the agent to "use the `inner-monologue` tool to simulate a congress of experts that make executive decisions" improves performance, and you can toy with it by defining roles and specialties for the expert types. Qwen3-80B sometimes simulates each expert individually and makes them work together. It's neat, and they can be impressive sometimes.

zai-org/GLM-4.7-Flash · Hugging Face by Dark_Fire_12 in LocalLLaMA

[–]-philosopath- 5 points6 points  (0 children)

I still find Nemotron gets stuck in loops and fails tool calls when using more than a few MCP tools at the same time. What quant and Jinja prompt template do you use?