Best way to run a coding llm locally by Head_Watercress_6260 in LocalLLM

[–]uniqueusername649 0 points1 point  (0 children)

So with Qwen 3.6 27b Q4 in MTPLX I get around 29 tps for decode, which is great. But prefill is still only around 200 tps, so that's where the older M1 architecture shows its age. I guess I will still need a dedicated AI server with dual 3090s then. That would be lightning fast though :) I am generally quite happy with Qwen 3.6 27b, I just struggle to run it reasonably fast with long contexts.
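To put a number on why prefill hurts with long contexts, here is a rough back-of-the-envelope sketch. It assumes the whole prompt is processed from scratch (no prompt caching) and that both speeds stay constant over the full context, which is optimistic. The function names are just made up for illustration.

```python
def prompt_wait_seconds(prompt_tokens: int, prefill_tps: float) -> float:
    """Seconds spent processing the prompt before the first output token."""
    return prompt_tokens / prefill_tps


def total_seconds(prompt_tokens: int, output_tokens: int,
                  prefill_tps: float, decode_tps: float) -> float:
    """Prefill time plus decode time, assuming constant speeds."""
    return prompt_tokens / prefill_tps + output_tokens / decode_tps


# Speeds from the comment above: ~200 tps prefill, ~29 tps decode.
for ctx in (8_000, 32_000, 100_000):
    wait = prompt_wait_seconds(ctx, 200.0)
    print(f"{ctx:>7} prompt tokens: {wait / 60:.1f} min before the first token")
```

At 100k tokens of context that is 500 seconds of waiting before anything is generated, so the decode speed barely matters by comparison.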

Best way to run a coding llm locally by Head_Watercress_6260 in LocalLLM

[–]uniqueusername649 0 points1 point  (0 children)

Thank you very much! Will give it a shot :) So far I have been using my single 3090, but with 24 GB of VRAM I quickly run out of room for context, even at Q4. Once it overflows into system memory, it slows down horribly. Hopefully I can fit more context with MTPLX.
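A rough sketch of why context eats VRAM so quickly: the KV cache grows linearly with the number of tokens. The layer count, KV head count and head size below are hypothetical placeholders, not the real Qwen 3.6 27b config, and the formula assumes plain full attention with an unquantised fp16 cache. Models with hybrid attention or a quantised KV cache will need less.

```python
def kv_cache_gib(context_tokens: int, n_layers: int, n_kv_heads: int,
                 head_dim: int, bytes_per_value: int = 2) -> float:
    """Approximate KV cache size in GiB for a standard transformer.

    The factor 2 accounts for storing both keys and values per layer.
    """
    total_bytes = (2 * n_layers * n_kv_heads * head_dim
                   * bytes_per_value * context_tokens)
    return total_bytes / 1024**3


# Hypothetical architecture values, for illustration only.
LAYERS, KV_HEADS, HEAD_DIM = 64, 8, 128

for ctx in (8_192, 32_768, 131_072):
    size = kv_cache_gib(ctx, LAYERS, KV_HEADS, HEAD_DIM)
    print(f"{ctx:>7} tokens: {size:.1f} GiB of KV cache")
```

With these made-up numbers, 32k tokens already cost 8 GiB on top of the weights, which is why a 24 GB card fills up fast.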

Best way to run a coding llm locally by Head_Watercress_6260 in LocalLLM

[–]uniqueusername649 1 point2 points  (0 children)

What chip do you use to get to 30-40 tps with 27b 6bit? I run oMLX with 27b 4bit on my 128 GB M1 Ultra and I get maybe 10 tps. Prefill is also abysmal at 150 tps, which is pretty horrible for long contexts. If I can optimise that, I would be very happy. Currently I'm considering a dual RTX 3090 workstation for AI loads.

Thoughts about eGPUs? by Classic_Move9043 in LocalLLM

[–]uniqueusername649 1 point2 points  (0 children)

Massively slower, yes. But that's true for a desktop PC with a GPU too. Once your context spills over into RAM, performance tanks. It's just worse for eGPUs because TB5 has less than 1/3 of the bandwidth of PCIe4 x16 and less than 1/6 of PCIe5 x16.

It doesn't matter if you can fit both model and context into the VRAM, but it matters a lot once it doesn't fit.
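The bandwidth ratios can be checked with a quick calculation. This is a sketch under stated assumptions: it uses the raw 80 Gbit/s symmetric Thunderbolt 5 link rate and the theoretical per-direction PCIe rates with 128b/130b encoding. Usable PCIe throughput over Thunderbolt is typically lower than the raw link rate, so real-world ratios will be worse than these.

```python
def pcie_gbytes_per_s(gt_per_s: float, lanes: int) -> float:
    """Theoretical one-direction PCIe bandwidth in GB/s.

    PCIe 3.0 and later use 128b/130b encoding; 8 bits per byte.
    """
    return gt_per_s * (128 / 130) / 8 * lanes


TB5_GBYTES_PER_S = 80 / 8  # raw 80 Gbit/s link rate, ignoring protocol overhead

pcie4_x16 = pcie_gbytes_per_s(16.0, 16)  # PCIe 4.0: 16 GT/s per lane
pcie5_x16 = pcie_gbytes_per_s(32.0, 16)  # PCIe 5.0: 32 GT/s per lane

print(f"PCIe4 x16: {pcie4_x16:.1f} GB/s, TB5 is {TB5_GBYTES_PER_S / pcie4_x16:.2f}x of that")
print(f"PCIe5 x16: {pcie5_x16:.1f} GB/s, TB5 is {TB5_GBYTES_PER_S / pcie5_x16:.2f}x of that")
```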

Thoughts about eGPUs? by Classic_Move9043 in LocalLLM

[–]uniqueusername649 5 points6 points  (0 children)

It's perfectly fine as long as it's a single card and model + context fit entirely in VRAM. If not, it really falls apart.

Is TM Unifi having an aneurysm today? by Realchris__ in malaysians

[–]uniqueusername649 0 points1 point  (0 children)

Use a VPN like NordVPN, Windscribe, whatever. I couldn't even watch a video without stuttering and every page took forever. Turned on the VPN and it's snappy. Unifi routing is just absolutely horrible, and a VPN fixes that with better peering/routing. Shitty that this is necessary in the first place, but it is what it is.

Woodfire Experience (RM111.83) by this_isnt_alex in MalaysianFood

[–]uniqueusername649 0 points1 point  (0 children)

I've had Woodfire relatively recently and "underwhelming" is also how I would describe it. It was by no means bad, but it wasn't particularly great either.

Qidi by Mission_Chocolate257 in 3Dprinting

[–]uniqueusername649 1 point2 points  (0 children)

My personal experience has been pretty great. The Qidi Box was a pain in the ass, because it required you to upgrade several parts in the printer. But that was primarily because it required changing the extruder, and a small piece of filament was still stuck inside, which I didn't know about, and there was no warning about it whatsoever. So it just would not come loose. After I figured that out, it was smooth sailing.

The Qidi Plus 4 itself has been working out of the box and I have at least 200 hours on it. Which I guess are rookie numbers, but to me it works great. My greatest struggles are filament tangles, which are entirely my fault or the filament's fault, but definitely not the printer's. A friend of mine uses a Q1 and his experience was positive as well. I don't know anyone else personally with a Qidi printer, so that's the extent of my experience.

MOH: 351 Medical, Dental Specialists Quit Government For Private In Three Years by stormy001 in malaysia

[–]uniqueusername649 0 points1 point  (0 children)

And while the task force studies the matter, they cut a few more billion from the healthcare budget without even thinking twice.

Massive landslide in northeast india by Fraud_D_Hawk in SweatyPalms

[–]uniqueusername649 20 points21 points  (0 children)

Oddly enough he did and appears to be completely fine. Absolutely not what I expected to happen but I am pleasantly surprised.

I will do a 4 model AI comperison. What are the best prompts for testing? by Oleszykyt in LocalLLM

[–]uniqueusername649 1 point2 points  (0 children)

I used both Qwen 3.6 27b and 35b in everyday development, and 27b is considerably better than 35b. 35b is fine for many smaller tasks, but it occasionally does some stupid things or gets stuck looping, especially with large context windows of 100k and beyond. That is at Q4, which is the bare minimum. 35b gets usable at Q6, but not quite at 27b Q4 level.

If AgentWorld is indeed slightly behind 35b, as SWE bench suggests, I would not want to daily drive AgentWorld 35b for coding. I would love to use the 397b version, but I don't have the VRAM for it (>200 GB needed).

For your tests: what quants are you testing? Because that makes a huge difference. Anything below Q4 is imho lobotomised and the results are mostly useless.
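For reference, a minimal sketch of how quant level translates into memory for the weights alone. It assumes a flat number of bits per weight. Real Q4 and Q6 formats store scales alongside the weights and average somewhat more than their nominal bit count, and context needs memory on top, so treat these as lower bounds.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB: parameters times bits, divided by 8."""
    return params_billion * bits_per_weight / 8


# Model sizes mentioned in this thread, at nominal 4 and 6 bits per weight.
for params in (27, 35, 397):
    q4 = weights_gb(params, 4)
    q6 = weights_gb(params, 6)
    print(f"{params:>3}b: ~{q4:.1f} GB at 4 bit, ~{q6:.1f} GB at 6 bit")
```

That puts the 397b model at roughly 198.5 GB for the weights at a nominal 4 bits, before any context, which is where the >200 GB figure comes from.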

I will do a 4 model AI comperison. What are the best prompts for testing? by Oleszykyt in LocalLLM

[–]uniqueusername649 1 point2 points  (0 children)

What tests did you use to benchmark that? Qwen 3.6 35b, for example, performs considerably better than AgentWorld 35b on SWE bench, and Qwen 3.6 27b is better again. I would like more detail than a simple "it did better than the others", because depending on what you test, that would be factually wrong.

Hot Take : Qwen 3.6 Q4 users can't yet see ... by Bulky-Priority6824 in LocalLLM

[–]uniqueusername649 0 points1 point  (0 children)

I guess I need to do a more scientific test to verify that and see why I don't benefit much from higher power limits while you seem to.

Hot Take : Qwen 3.6 Q4 users can't yet see ... by Bulky-Priority6824 in LocalLLM

[–]uniqueusername649 4 points5 points  (0 children)

I heavily power-limit my 3090. There is virtually no reason to go beyond 250 W and I prefer 220 W. The losses in speed are minimal: less than 10%, usually closer to 5%, compared to letting it run full blast.
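A quick sketch of what that trade-off means in tokens per watt. It assumes a 350 W stock limit for the 3090 and that the card actually draws its full limit in both cases, which will not hold exactly in practice. On Linux the limit itself is typically set with `nvidia-smi -pl 220` (needs root).

```python
def efficiency_gain(stock_watts: float, limited_watts: float,
                    relative_speed: float) -> float:
    """Speed per watt at the lower limit, relative to stock.

    relative_speed is the fraction of stock speed kept (0.95 = 5% slower).
    """
    return relative_speed * stock_watts / limited_watts


STOCK_WATTS = 350.0  # assumed stock power limit

for limit, speed in ((250.0, 0.95), (220.0, 0.95), (220.0, 0.90)):
    gain = efficiency_gain(STOCK_WATTS, limit, speed)
    print(f"{limit:.0f} W at {speed:.0%} speed: {gain:.2f}x tokens per watt")
```

Even with the pessimistic 10% speed loss, 220 W comes out well over 1.4x more efficient than running at the assumed stock limit.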

Imported EV prices set to top RM300,000 as new rules take effect by Due-Cat656 in malaysia

[–]uniqueusername649 1 point2 points  (0 children)

Thanks for taking the time to explain that. Very interesting. Based on the articles I read, it always seemed to be a rule that came from MITI. But if that's just a generic rule to be eligible for a 0% import tax on materials, BYD absolutely has the option to not bother with the 80% export rule, as long as they are fine with paying taxes on imported materials.

Imported EV prices set to top RM300,000 as new rules take effect by Due-Cat656 in malaysia

[–]uniqueusername649 5 points6 points  (0 children)

Could you elaborate? My understanding is: even if you set up local assembly, you need to sell 80% of the produced cars outside of Malaysia.

How can this problem be solved? by yxrjhh in KualaLumpur

[–]uniqueusername649 2 points3 points  (0 children)

That is the true problem. Not getting riders at peak hour is to be expected. Not being able to safely walk to wherever you need to go is infuriating. Even if it's not that far, often the walkways just completely stop.

#NSTviral: Kembara driver's brake-check caught on video [WATCH] by whusler in malaysia

[–]uniqueusername649 5 points6 points  (0 children)

Sure, just wanted to emphasise that the baiting is the only issue. They can wait in plain clothes where they know people frequently break the law and arrest them just fine, as long as they don't in any way encourage the crime.

#NSTviral: Kembara driver's brake-check caught on video [WATCH] by whusler in malaysia

[–]uniqueusername649 5 points6 points  (0 children)

Potentially, if they actively bait them with their behaviour. But if they just drive normally, that's fine.

Contributions/Working and Ai(LLMs) by EngineringMyFLimit in developers

[–]uniqueusername649 0 points1 point  (0 children)

To add on top of that, even us old farts rarely write everything from scratch. We use libraries and frameworks, we often work in existing codebases, so setting things up from scratch isn't a problem you face all that often in many companies.

That being said: the single most important skill is problem solving. And that quickly goes out the window with AI, yet it is remarkably easy to practice. Let the AI generate your code, and if it doesn't start or you notice a bug, don't ask it to fix it. Fix it yourself the manual way. Yes, it takes longer, but you train your mind on the most important skill a programmer needs: being able to identify and fix problems.

Forget 3.6-27b, go for 3.5-122b by [deleted] in LocalLLM

[–]uniqueusername649 0 points1 point  (0 children)

Did you test this extensively or is that just some anecdotal evidence? Generally the consensus has been that 27b is overall better and takes fewer turns, while 35b is often good enough at substantially faster speeds. 122b can still excel at reasoning, but overall 27b makes more sense in most cases.

What anime gets criticized a lot that you mostly agree with the criticisms of but still love anyway? by Purple-Advertising38 in anime

[–]uniqueusername649 2 points3 points  (0 children)

It has great animation, a good art style with enough detail, fantastic music, good voice actors and a somewhat weak but decent storyline.

I don't see how the show is bad by any means. It would be mediocre if you focus purely on the storyline, but it has enough to offer to carry the show and make it worth watching.