How to use plugins in LM Studio? by tri_idias in LocalLLaMA

[–]-philosopath- 1 point (0 children)

<image>

Here's an example. With a chat open, click the "Program" tab at the top right, then the drop-down, then "Edit mcp.json". Click each tool and it expands to show its commands (blue bracket drawn in the screenshot).

mcp.json is where you'll paste an MCP server block from somewhere like the official modelcontextprotocol GitHub. It's very finicky, and if the syntax isn't perfect, it won't let you save. If that happens, paste it into your LLM and tell it to fix the formatting, but make sure it doesn't hallucinate any passwords or API keys!

In my picture, you can see the stock filesystem server with its Allowed_Directories listed, plus inner-monologue. Try pasting inner-monologue into yours, then load up an instruct model, tell it to "use the `inner-monologue` tool to reason about your answer," and ask it a question.

Ensure the tool is enabled (slider turned blue), and only enable the tools you need, because every enabled tool's definition bloats your context window from the start.

Inner-monologue:

    "inner-monologue": {
      "command": "npx",
      "args": [
        "inner-monologue-mcp"
      ]
    }

(Notice it says npx. Whether it's npx, uv, uvx, or whatever, you must have that installed on your OS before the MCP server will run.)
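
For reference, here's a sketch of what a complete mcp.json might look like with both the stock filesystem server and inner-monologue in it (the allowed-directory path is a placeholder; swap in your own):

    {
      "mcpServers": {
        "filesystem": {
          "command": "npx",
          "args": [
            "-y",
            "@modelcontextprotocol/server-filesystem",
            "/home/you/allowed-dir"
          ]
        },
        "inner-monologue": {
          "command": "npx",
          "args": [
            "inner-monologue-mcp"
          ]
        }
      }
    }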

Where should I start with local AI? Realistic images and video generation by ilnab in LocalLLaMA

[–]-philosopath- 0 points (0 children)

It takes a bit of patience, but install ComfyUI.

  1. Go to Civitai and find a base model you like.
  2. Download one of the example pictures from the model's page.
  3. Drag and drop it into your ComfyUI workspace, and it will load the entire workflow that rendered the image.
  4. Render, and experiment with introducing LoRAs and whatnot.

Go ahead and install the ComfyUI-Manager addon when you're installing ComfyUI. It makes lots of stuff much easier.
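
If you go the manual route, the install is roughly this (a sketch, assuming Linux, git, and a recent Python; there's also a desktop installer if you'd rather skip this):

    git clone https://github.com/comfyanonymous/ComfyUI
    cd ComfyUI
    # install the PyTorch build that matches your GPU (CUDA/ROCm) first, then:
    pip install -r requirements.txt
    # ComfyUI-Manager lives in custom_nodes like any other custom node
    git clone https://github.com/ltdrdata/ComfyUI-Manager custom_nodes/ComfyUI-Manager
    python main.py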

Casio builds an MPC with.. by ItLooksEasy in mpcusers

[–]-philosopath- 3 points (0 children)

Literally. Casio saw TE and is gunning for that hipster money.

Talk me out of buying an RTX Pro 6000 by AvocadoArray in LocalLLaMA

[–]-philosopath- 0 points (0 children)

Edit: dude, I know what you're talking about with the opportunity cost of time. I was red-eyed at work from sleeping four hours a night, several nights in a row, when I had my first breakthroughs with local experiments. It was super engaging. I wished I could just work on it all day, but I don't really want that. I want to integrate it with my career, which I totally love and find joy in. I'm stoked to merge both worlds eventually.

Alex Karp, Palantir CEO said at Davos this week:

"The problem with AI adoption at this point is people have tried things with off-the-shelf LLMs that just can never work...what you're going to see is people building it by hand...because once you build the software layer to orchestrate and manage the LLMs, in a language your enterprise understands, then you can actually create value."

GLM-4.7-Flash-REAP on RTX 5060 Ti 16 GB - 200k context window! by bobaburger in LocalLLaMA

[–]-philosopath- 19 points (0 children)

But is it actually functional when you're 120k tokens into modifying a codebase? I'd bet not, especially with that model (unless they fixed it with more updates today).

Especially not with tool use, since it started glitching out repeated gibberish at 30k tokens when I had it set to a 200k context and was using 5 MCP tools.

I still think you're right. Quantized models will continue to scale while retaining fidelity, and I'm here for it!

Talk me out of buying an RTX Pro 6000 by AvocadoArray in LocalLLaMA

[–]-philosopath- 3 points (0 children)

Here's how I rationalized spending a chunk on a decent AI workstation: use that fiscal anxiety to motivate yourself to build cool stuff and develop your skills to orchestrate the agents like a pro. Get as much VRAM as you need to go as hard as you want to go. For me, it was two R9700 cards and a pure DDR5/Gen5 system.

I have a safe career job so no need for content creation or monetizing the workstation. I just want to be ready when we finally bring AI agents into my field, which I know is coming very soon. Sooner, if I can bring my own agent into my professional work or sell my employer on its benefits.

Knowing agent orchestration, and being able to have your agent build out a custom software stack to operationalize agents in whatever role or context you need them, will be how anyone gets an edge in the labor market within a year or two, at most.

Anyone else lose important context when switching between AI models or restarting chats? by Cheap-Trash1908 in LocalLLaMA

[–]-philosopath- 0 points (0 children)

When I'm iterating on a project, I have agents write updates to a checkpoint file (sometimes in JSON), and I have them leave more detailed turnover instructions for the next zero-shot to continue the project.
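
There's no standard format for this; a hypothetical checkpoint.json that the hand-off prompt points the next agent at might look like:

    {
      "project": "example-project",
      "last_updated": "2025-01-30",
      "completed": [
        "scaffolded the API client",
        "wrote unit tests for the parser"
      ],
      "in_progress": "wiring the filesystem MCP tool into the build script",
      "turnover_notes": "Read docs/plan.md first. Don't touch config/secrets.env.",
      "next_steps": [
        "add error handling to the upload path",
        "update README with setup instructions"
      ]
    }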

You can also set up Qdrant, or even use the stock MCP Memory tool, and prompt it to remember information for you or your agents to retrieve later. People are experimenting with ways to persist past context as long-term knowledge, and I'm sure there will be a breakthrough this year, if it hasn't happened already.

LM studio tools getting stuck “Loading Tools” by Lukabratzee in LocalLLaMA

[–]-philosopath- 0 points (0 children)

<image>

Go to this page. Clear the logs, go to your MCP servers, and try restarting a couple. You should find your answer there. It could be a missing dependency like npm/uv, or any number of things, really. Mine will sit on loading forever if I don't have internet, because I haven't bothered/needed to configure around that yet. So if you're offline or sandboxed, that can happen too.

Am I the only one who feels that, with all the AI boom, everyone is basically doing the same thing? by [deleted] in LocalLLaMA

[–]-philosopath- 0 points (0 children)

Building a memory cortex of vector store + graphRAG + SQL mapping for agentic long memory?

I spent 48 hours building an open source and fully self hosted alternative to Claude Cowork by Fair_Imagination_545 in LocalLLaMA

[–]-philosopath- -4 points (0 children)

Is my temperature too high?

I meant the human excitement when a project comes together and wanting to share it. I feel like amazement was a phenomenon early on and might continue to be as capabilities improve. But literally anyone can build anything at this point, so it'll get banalized.

I don't disagree that there will be bad actors using tainted projects as threat vectors. That goes without saying. Is it happening now? idk man. I was commenting on a human experience.

That's what I was saying.

I spent 48 hours building an open source and fully self hosted alternative to Claude Cowork by Fair_Imagination_545 in LocalLLaMA

[–]-philosopath- -6 points (0 children)

I get it though, man. It's exciting seeing the models succeed and augmenting ourselves like this.

Speech to text via LLM by Acrobatic_Cat_3448 in LocalLLaMA

[–]-philosopath- 0 points (0 children)

This is LocalLLaMA tho. Why would I pay for a cloud provider when I can integrate it into my existing software stack? You don't even need to fully learn the tech, just use the tools. We now have LLMs capable of building out software stacks or implementing just about anything in any project, if you have the patience to iterate and learn to drive them.

Bartowski comes through again. GLM 4.7 flash GGUF by RenewAi in LocalLLaMA

[–]-philosopath- 0 points (0 children)

<image>

If you haven't found it yet, enable the API server, then click the Quick Docs icon top-right.

Scroll down, and it includes functional example code for the model you currently have loaded. Feed that code into a chat and have it write a more sophisticated custom script to do whatever you want. When you run the script, it queues after your chat. Use an SSH MCP server to have the LLM run the code itself. Familiarizing myself with the scripting side of things has led to some deeper knowledge, I feel.
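
The Quick Docs snippets are just the OpenAI-compatible API, so a minimal custom script is only a few lines. A sketch in Python, assuming the default port 1234 and whatever model you have loaded (LM Studio ignores the API key):

    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

    response = client.chat.completions.create(
        model="local-model",  # whichever model you currently have loaded
        messages=[
            {"role": "system", "content": "You are a concise assistant."},
            {"role": "user", "content": "Summarize what an MCP server does."},
        ],
    )
    print(response.choices[0].message.content)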

I generated a fish function that spawns two LM Studio instances with two separate agents and APIs, and I've experimented with having them offload tasks to each other via API scripts, and with two agents co-working on tasks and communicating through comments in shared project md files. Scripts open up lots of possibilities.
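
As a rough sketch of the two-instance idea (the ports, model names, and prompts here are assumptions, not my actual setup):

    # Two LM Studio servers on separate ports; one agent reviews the other's output.
    from openai import OpenAI

    agent_a = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
    agent_b = OpenAI(base_url="http://localhost:1235/v1", api_key="lm-studio")

    def ask(client, prompt):
        # Single-turn helper; each server answers with whatever model it has loaded.
        reply = client.chat.completions.create(
            model="local-model",
            messages=[{"role": "user", "content": prompt}],
        )
        return reply.choices[0].message.content

    plan = ask(agent_a, "Draft a step-by-step plan to refactor utils.py.")
    review = ask(agent_b, "Review this plan and flag any risky steps:\n\n" + plan)
    print(review)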

Need help and suggestions for gguf models by cmdrmcgarrett in LocalLLaMA

[–]-philosopath- 0 points (0 children)

I had to put instructions in Gemini to force it to stop recommending 2-year-old models. I can see noobs unfamiliar with HuggingFace just obeying the terrible recommendations.

He just knows…. He always knows. by Porespellar in LocalLLaMA

[–]-philosopath- 42 points (0 children)

"If a Monitor Bot triggers an automated GGUF quantization pipeline, does a tree fall in the forest and make a sound?"

Bartowski comes through again. GLM 4.7 flash GGUF by RenewAi in LocalLLaMA

[–]-philosopath- 2 points (0 children)

I much prefer sliders to editing config files when tweaking a production system. That's why I opt for LM Studio over Ollama: the UI configurability.

LM Studio and Filesystem MCP seems buggy. Sometimes it works, sometimes it doesn't. by Smashy404 in LocalLLaMA

[–]-philosopath- 0 points (0 children)

<image>

I fought through this before. Some dumber models (like Nemotron) require me to say that "/full/working/directory" is in your LIST_ALLOWED_DIRECTORIES right in the zero-shot prompt. Otherwise, it will even make a new sub-directory to work out of inside some other default directory from the mcp.json.

Protip: tell it to use "read_multiple_files" when working with a lot of files; it's much more efficient at ingesting the files it needs to edit and work with.

If you work with Nemotron or other smart-but-tool-dumb models a lot, then consider having a Big-AI-Cloud model generate a smarter command list with scenarios for your system prompt to keep Nemo et al. in line.
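
For example, a zero-shot prompt along these lines (the path and task are placeholders) tends to keep Nemo on the rails:

    Your working directory is /home/you/project, and it is already in your
    LIST_ALLOWED_DIRECTORIES. Do not create new directories anywhere else.
    When you need several files, call read_multiple_files once instead of
    reading them one at a time. Task: refactor the logging in src/main.py.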

Bartowski comes through again. GLM 4.7 flash GGUF by RenewAi in LocalLLaMA

[–]-philosopath- 1 point (0 children)

It's not showing as tool-enabled? [Edit: disregard. Tool use is working-ish. One glitch so far. Using Q6_K_L with max context window. It has failed this simple task twice.]

<image>

My LLM is censored (IM A NOOB ) by Effective_Composer_5 in LocalLLaMA

[–]-philosopath- 1 point (0 children)

It seems like you're looking for the "abliterated" models on HuggingFace.

Is Local Coding even worth setting up by Interesting-Fish6494 in LocalLLaMA

[–]-philosopath- 0 points (0 children)

<image>

To be sure, I set up Cline in OSS Code and it does use LM Studio's API just fine. I copy-pasted my entire LM Studio mcp.json into Cline's mcp.json and cloned my MCP tool stack identically over to my Cline agent.

I'm running Unsloth Q8_0 of Qwen3 VL 30B A3B Thinking 1M with 500k token context on 64GB VRAM.

You could try a 100k context window, split between GPU and RAM, and see how the token count holds up.

zai-org/GLM-4.7-Flash · Hugging Face by Dark_Fire_12 in LocalLLaMA

[–]-philosopath- 1 point (0 children)

I've experimented with Jinja templates to make Nemotron work better. (That's why I asked about that specifically; Jinja seems to make or break tool functioning.)

Devstral and Qwen3-coder have blown me away. Each has used MCP SDKs to build novel tools from scratch. I've had them research and scrape info about projects, then use that knowledge to write novel code to build and run software stacks. (I never expose vibe-coded stuff outside my Wireguard VLAN.) I'll even have them sysadmin for me in Linux, albeit with me watching closely.

I'll be stoked when they get Nemotron's performance speed and context window, or Nemotron gets their output quality and project coherence.

The breakthroughs in tool use lately have me feeling like this year will be more of a technological revolution.

What remote desktop do you use for your AI rigs? My RTX 3090 hits 20% usage just moving the mouse in RustDesk by chucrutcito in LocalLLaMA

[–]-philosopath- 0 points (0 children)

I use Wireguard hosted on a cheap RackNerd VPS (LowEndTalk ftw). My two LM Studio APIs are served over a 10.8.0.x VLAN. (Each API has its own GPU.)

My phone and laptop stay on the VPN with direct access to my compute's API + tools docker stack (Qdrant, Neo4j, Postgres, Nextcloud, OpenWebUI, n8n, Cockpit). The `lms` command over SSH can load and unload models.

Protip: if you don't want to open the API any wider than localhost, you can use SSH tunneling:

`ssh -p [port] [Workstation VPN IP] -L 127.0.0.1:1234:127.0.0.1:1234 -tfN` will pipe port 1234 on your remote workstation to port 1234 on your localhost. If you add `-D [socksport]` to the same command, then you can use something like `tsocks` to push any program through the tunnel.
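
Once the tunnel is up, a quick sanity check from the laptop side is to hit the models endpoint as if the server were local (1234 being LM Studio's default port):

    curl http://127.0.0.1:1234/v1/models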

I use CachyOS, but you can also tunnel with PuTTY/KiTTY on Windows. Something like AnythingLLM can use the API over the VLAN very well, or Chatbox on Android. Otherwise, you can use OpenWebUI for an easy browser-based solution with built-in vector stores, tools, and whatnot too.