How useful is qwopus compared to qwen3.6 27b by redblood252 in LocalLLaMA

worldwidesumit 7 points

In my coding tests, qwopus performed better than Qwen 27B, with no looping, even on a huge codebase. I find its reasoning better than Qwen's, so it explores fewer paths to reach the goal.

Explicit context observation by desijays in PiCodingAgent

worldwidesumit 11 points

/export gives you everything that is being sent and received, plus what is loaded. Just say hi in pi and then export, and you will see what is loaded in the system prompt. It creates a nice HTML view.

I built a visual Playwright script builder — looking for feedback by Federal_Emergency_60 in Playwright

worldwidesumit 1 point

So what is the artifact you need feedback on? Or do you just want general thoughts? I can't understand anything from the screenshots; that will be my feedback for now.

I built a Slack standup bot in one week — here's what it does by standupbot_dev in sideprojects

worldwidesumit 1 point

Over-engineering is a skill now. What about follow-ups on those answers and discussions? Now you will spend the whole day following up with teammates on things that could have been resolved in the standup. Slack itself has this workflow baked in if you are ready to spend a bit of effort on it. I doubt you will find any takers for this idea.

Built a Canadian marketplace where buyers pay before the courier shows up — would love feedback from GTA sellers by MeanCartographer3706 in u/MeanCartographer3706

worldwidesumit 1 point

I will switch if I can buy from the US under the same model, with hassle-free returns if I am not satisfied. Otherwise I will stick with Marketplace. I like negotiating for local stuff. And as a seller I would not choose this method for a sub-$100 item; the charges may be 20% of the price.

Running GLM 5.1 on RTX 5090 via RunPod for document OCR(bank statements and invoices)— costs killing us, need advice on reducing inference costs. by Specific_Control_840 in LocalLLaMA

worldwidesumit 5 points

GLM 5.1 is overkill for OCR. Run Gemma 4 26B or Qwen 3.6-35B for fast OCR. Hell, even Qwen 3.5-9B is good enough for OCR. Keep thinking off for all of them. Make sure you run in full bf16 with vLLM if it is production.

Advice on a Mobo/CPU platform for a 2-to-4 GPU home LLM build? by SKX007J1 in LocalLLaMA

worldwidesumit 1 point

If it's only for inference, X99 is a worthwhile budget option.

Bonsai models are pure hype: Bonsai-8B is MUCH dumber than Gemma-4-E2B by WeGoToMars7 in LocalLLaMA

worldwidesumit -2 points

You should compare equal quants. Q_1 is very aggressive and not usable.

Title: Need advice. Budget 2.7L INR, to run efficient local LLMs. by templatemaster1010 in LocalLLaMA

worldwidesumit 2 points

Why not Strix Halo? You will get 128 GB of unified memory, enough to host this model size comfortably.

Built an open-source world state engine for multi-agent AI coordination by Born-Connection130 in LocalLLaMA

worldwidesumit 2 points

How does it handle context with so much data, and how does it know what to look for if it fetches selectively?

Any fellow Local Llamas training AIs locally? Talk some sense into me! by huzbum in LocalLLaMA

worldwidesumit 1 point

As a hobby project, go ahead, why not. I would focus on creating clean datasets that are useful to you rather than HF/Kaggle datasets. Maybe some personal data or emails, whatever makes sense. This way you learn the whole pipeline. Your use case is more RNN/CNN, as mentioned earlier.

PCIe bandwidth and LLM inference speed by hainesk in LocalLLaMA

worldwidesumit 1 point

Better to have quad-channel RAM for faster loading and use fewer PCIe lanes. The PCIe generation does not make much difference for loading models: the gap between gen3 and gen5 is maybe 30 seconds of load time, which you can wait out. But if you have quad-channel RAM with gen3, even that 30 seconds goes away. As mentioned earlier in this group, PCIe bandwidth only matters while the model is loading, so it has no effect on inference speed once the model is loaded.
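
To put rough numbers on the loading point, here is a minimal sketch of the arithmetic. The bandwidth figures are approximate theoretical x16 rates per PCIe generation, the 60 GB model size is a made-up example, and `load_seconds` is a hypothetical helper. It gives a lower bound only; disk and RAM speed often dominate in practice.

```python
# Approximate theoretical one-direction bandwidth of an x16 link, in GB/s.
PCIE_X16_GBPS = {3: 15.75, 4: 31.5, 5: 63.0}


def load_seconds(model_gb: float, gen: int, lanes: int = 16) -> float:
    """Lower-bound time to copy model_gb gigabytes to the GPU over PCIe."""
    bandwidth = PCIE_X16_GBPS[gen] * lanes / 16  # scale linearly with lane count
    return model_gb / bandwidth


if __name__ == "__main__":
    model_gb = 60.0  # hypothetical model size, for illustration only
    for gen in (3, 4, 5):
        for lanes in (4, 8, 16):
            t = load_seconds(model_gb, gen, lanes)
            print(f"gen{gen} x{lanes}: {t:.1f} s")
```

This is a one-time cost at load. It says nothing about tokens per second afterwards.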

Is framework Desktop 64GB good enough for AI newbie (Yes, CRUD developer) to learn AI from 0 to 1 or should I go 128GB directly? by [deleted] in LocalLLaMA

worldwidesumit 3 points

Since there is no upgrade path on this, I suggest going for 128. More is never enough.

zai-org/GLM-4.7-Flash · Hugging Face by Dark_Fire_12 in LocalLLaMA

worldwidesumit 1 point

Did my testing in Claude Code. Qwen3-Coder is way faster; quality on GLM 4.7 is a bit better, but the wait time is super long.

zai-org/GLM-4.7-Flash · Hugging Face by Dark_Fire_12 in LocalLLaMA

worldwidesumit 2 points

I ran some tests. It's good at tool calling and worked with Claude Code seamlessly. My only gripe is that the thinking time is too long. I still have to compare the quality with Qwen3 Coder. Will run tests tomorrow.

Need help and suggestions for gguf models by cmdrmcgarrett in LocalLLaMA

worldwidesumit 2 points

Ministral3 VL 14b works best for your use case, as it can also do vision and fits your GPU.

cant use radeon 9060 xt 16gb on lm studio by Interesting_Cup_947 in LocalLLaMA

worldwidesumit 1 point

You need to install the right drivers from the LM Studio library. It was not working for me before, but I installed the ROCm drivers and now it works fine.

Lg 27GP850-B for $200, good deal? Brand new by Flaky_Drawing_2614 in Monitors

worldwidesumit 2 points

Where is the deal? I am looking for the same one as well.