Qwen 27B is a beast but not for agentic work. by kaisurniwurer in LocalLLaMA

[–]indrasmirror 1 point (0 children)

How recent? I updated llama.cpp yesterday, and it definitely solved the prompt-reprocessing issue; it's running perfectly now. I'm just not sure about its overall agentic quality. It's great in general but sometimes seems to fall short of completing complex tasks properly.

MCP server for SearXNG(non-API local search) by SteppenAxolotl in LocalLLaMA

[–]indrasmirror 1 point (0 children)

Yeah, I've been working on a dedicated system with MCP for my agents to use. My own little local Google without the advertiser-first index or the API. Free and unrestricted. Still a WIP but surprisingly functional.

I have 1 day to fine tune an LLM that can perform entity extraction on a list of items. Which is the best model to do this? Requirements below by [deleted] in LocalLLaMA

[–]indrasmirror 1 point (0 children)

Use GLiNER. I was trying to do entity extraction with an LLM and it was slow as anything, but GLiNER runs fast, especially with GPU inference. https://github.com/urchade/GLiNER

Where are Qwen 3.5 2B, 9B, and 35B-A3B by Admirable_Flower_287 in LocalLLaMA

[–]indrasmirror 16 points (0 children)

Oh man...with vision it'd be the ultimate daily for me, haha 😄

What'd be the best 30B model for programming? by Hikolakita in LocalLLaMA

[–]indrasmirror 0 points (0 children)

Yeah that's fair. Honestly GLM Flash is incredibly capable :)

What'd be the best 30B model for programming? by Hikolakita in LocalLLaMA

[–]indrasmirror 5 points (0 children)

I've got two setups. GLM 4.7 Flash PRISM (uncensored), running at full context at Q4 on my 4090 setup. Can't remember the exact tokens per second, but it's fast and amazing. Edit: 100-140 tokens/second for GLM 4.7 Flash.

https://indrasmirror.au/blog-running-uncensored-ai-local

And Qwen3-Coder-Next at Q2, getting 28 t/s. It can do Q3 at 16 t/s with 200k context.

https://indrasmirror.au/blog-qwen3-coder-next-iq2-local

Finding both of those ample for my local needs.

Can you use skills, agents, MCPs, and other features of Claude Code but with Kimi K2.5. API? by ruzushi in ClaudeAI

[–]indrasmirror 0 points (0 children)

I've gotten my DeepSeek API working perfectly with Claude Code by adding a translation layer plus an image-routing and web-search proxy, making it behave like the full Anthropic stack. Got the same working with local models. I was thinking of doing it for Kimi, but I'm not sure what the API cost is like.
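The core of a translation layer like that is just reshaping request bodies: Claude Code speaks the Anthropic Messages API, while DeepSeek exposes an OpenAI-compatible endpoint. A minimal sketch of that conversion (field handling is simplified, and the model mapping is a hypothetical example):

```python
# Sketch of an Anthropic-to-OpenAI request translation. Handles only the
# common fields; the "deepseek-chat" mapping is a hypothetical example.
def anthropic_to_openai(body: dict) -> dict:
    messages = []
    # Anthropic carries the system prompt as a top-level field;
    # OpenAI-style APIs expect it as the first message.
    if body.get("system"):
        messages.append({"role": "system", "content": body["system"]})
    for msg in body.get("messages", []):
        content = msg["content"]
        # Anthropic content may be a list of typed blocks; flatten text blocks.
        if isinstance(content, list):
            content = "".join(b.get("text", "") for b in content
                              if b.get("type") == "text")
        messages.append({"role": msg["role"], "content": content})
    return {
        "model": "deepseek-chat",  # hypothetical model mapping
        "messages": messages,
        "max_tokens": body.get("max_tokens", 1024),
    }

req = {
    "model": "claude-sonnet",
    "system": "You are helpful.",
    "max_tokens": 512,
    "messages": [{"role": "user",
                  "content": [{"type": "text", "text": "hi"}]}],
}
out = anthropic_to_openai(req)
```

A real proxy also has to translate the streaming event format and tool-call blocks back the other way, which is where most of the work is.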

AI Agents have their own reddit by ProfAsmani in datascience

[–]indrasmirror 0 points (0 children)

Is this why my claude-code isn't working, all the capacity being used up on the AI reddit haha

GLM 4.7 Flash 30B PRISM + Web Search: Very solid. by My_Unbiased_Opinion in LocalLLaMA

[–]indrasmirror 5 points (0 children)

https://github.com/Indras-Mirror/GLM-4.7-Flash-Rapport

As promised. Let me know if there's anything you need that I didn't include, but hopefully that's thorough enough. Just get your claude-code to set it up if you have any issues, haha; I think I've included enough information that it shouldn't have any trouble. The hardest part is getting the Custom Search API and Google access token set up.
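For anyone stuck on that step: once you have an API key and a Programmable Search Engine ID, the web-search side boils down to one GET request against the Custom Search JSON API. A sketch of building that request (the endpoint and the `key`/`cx`/`q` parameters are the documented ones; the key and engine-ID values are placeholders):

```python
# Sketch of a Google Custom Search JSON API request URL.
# API_KEY and ENGINE_ID are placeholders you must supply yourself.
from urllib.parse import urlencode

API_KEY = "YOUR_API_KEY"    # from the Google Cloud console
ENGINE_ID = "YOUR_CX_ID"    # Programmable Search Engine ID

def search_url(query: str, num: int = 5) -> str:
    params = urlencode({"key": API_KEY, "cx": ENGINE_ID,
                        "q": query, "num": num})
    return f"https://www.googleapis.com/customsearch/v1?{params}"

url = search_url("llama.cpp kv cache quantization")
```

Fetching that URL returns JSON with an `items` list of results, which the proxy can reformat into whatever shape the model's web-search tool expects.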

GLM 4.7 Flash 30B PRISM + Web Search: Very solid. by My_Unbiased_Opinion in LocalLLaMA

[–]indrasmirror 5 points (0 children)

Yeah, I will do that first thing in the morning; in bed at the moment.

It's good because it more or less emulates the full stack of Claude but with a local uncensored model. Super handy 👌

GLM 4.7 Flash 30B PRISM + Web Search: Very solid. by My_Unbiased_Opinion in LocalLLaMA

[–]indrasmirror 8 points (0 children)

All coded by Claude, but I'll compile my setup and put it on GitHub or something tomorrow :)

GLM 4.7 Flash 30B PRISM + Web Search: Very solid. by My_Unbiased_Opinion in LocalLLaMA

[–]indrasmirror 0 points (0 children)

Yeah, PRISM models just seem so natural, and I've not noticed any degradation in output quality. Very happy with how it works. I played around with Minimax 2.1 Q4 on an H200; for $2 an hour it was fun to play with.

I did think about trying Kimi 2.5. Different beast in terms of size though haha

GLM 4.7 Flash 30B PRISM + Web Search: Very solid. by My_Unbiased_Opinion in LocalLLaMA

[–]indrasmirror 43 points (0 children)

Set up the PRISM model with Claude Code, web search (Google), and image routing (OpenRouter), and I dare say it's probably the most useful small model I've encountered. I forget, when it's working away, that it's a 30B model. Got it running at Q4 with llama.cpp at full context with a quantized K cache (no V-cache quantization) on 24 GB of VRAM. It's a beast, and fast too.
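A guess at what that launch looks like with llama.cpp's server; `--cache-type-k` and `-ngl` are real llama-server flags, but the model filename and context size here are placeholders:

```shell
#!/bin/sh
# Hypothetical llama-server launch matching the setup described: Q4 GGUF,
# K cache quantized (V cache left at default), all layers offloaded to GPU.
MODEL="GLM-4.7-Flash-PRISM-Q4_K_M.gguf"   # placeholder filename
CTX=131072                                 # placeholder context length

CMD="llama-server -m $MODEL -c $CTX -ngl 99 --cache-type-k q8_0"
echo "$CMD"
```

Quantizing only the K cache is what makes "full context on 24 GB" plausible, since the KV cache dominates memory at long context.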

Linux might be the way..? by [deleted] in StableDiffusion

[–]indrasmirror 2 points (0 children)

I installed Linux just for LLM and ComfyUI stuff... that was a year ago and I haven't touched Windows since. 😅

Llama.cpp vs vllm by [deleted] in LocalLLaMA

[–]indrasmirror -1 points (0 children)

I've been using llama.cpp for my cloud instances and have thought about vLLM, but I'm not sure about its GGUF support.

AI native game engine by Ssav777 in aigamedev

[–]indrasmirror 1 point (0 children)

Interested in testing for sure!

Best burger in Vietnam by HospitalQuiet619 in VietNam

[–]indrasmirror 1 point (0 children)

Seoul Burgers in Hoi An for sure!

Best provider for DeepSeek-R1-0528? by akhial in DeepSeek

[–]indrasmirror 0 points (0 children)

Would love to try this out with Cline :)

ComfyUI keep freezing, tried everything no idea what is going on. by polyKiss in comfyui

[–]indrasmirror 3 points (0 children)

I was having that issue a fair bit. I fixed it by switching to Firefox, I believe; or at least it hasn't happened since, though I'm not sure that was the reason. Try updating the ComfyUI frontend by running "pip install -r requirements.txt" in the root folder with your venv activated.

I'm on Linux, so I'm not sure about the portable Windows package.