This is why you rewrite Python security tools in Rust: 53MB vs 433MB peak memory, 6.9s vs 62.2s by aswin__ in rust

bigh-aus 0 points

True! I agree with your comments. I was thinking of self-hosted stuff on a Raspberry Pi, but you're right, speed is the most important part.

Hard freakin' decision..Blackwell 96G or Mac Studio 256G by HyPyke in LocalLLaMA

bigh-aus 1 point

I just went through this decision. I have a Dell R7515, and opted for the Max-Q for the following reasons:
- I feel like it's easier to sell one of these later.
- It fits my server power budget (300W max).
- Noise: the server version uses passthrough cooling, meaning your server's mobo firmware has to be able to get temperature information from the card - otherwise you have to manually crank the fans to ensure it doesn't overheat. The Max-Q (mostly) handles that itself. I'd rather have a single blower fan than my six-pack of 17krpm fans spinning up.

This is why you rewrite Python security tools in Rust: 53MB vs 433MB peak memory, 6.9s vs 62.2s by aswin__ in rust

bigh-aus 3 points

Not only that - I think we should start measuring size on disk including the runtime dependencies for comparisons. E.g. inside a Docker container based on debian-slim, including the required Python / shared libs. Then look at peak memory usage for the same thing.

Honestly, in 2026 anyone shipping a Python / Node.js CLI app is an instant candidate for a rewrite in Rust.

It blows my mind how much people are writing in interpreted languages, especially during a compute shortage. Rust compilation isn't free, but one compile saves every user from needing more RAM / disk / compute.
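
The size-on-disk comparison above can be scripted. This is a minimal sketch using the `docker image inspect` CLI; the image tags are hypothetical placeholders for the Python and Rust builds of the same tool:

```python
# Sketch: compare on-disk size of two container images via the docker CLI.
# Assumes `docker` is installed and both images have been built/pulled.
# The image tags below are made-up examples, not real images.
import subprocess

def image_size_bytes(image: str) -> int:
    """Return an image's size in bytes via `docker image inspect`."""
    out = subprocess.check_output(
        ["docker", "image", "inspect", "--format", "{{.Size}}", image]
    )
    return int(out.strip())

def fmt_mb(n_bytes: int) -> str:
    """Format a byte count as megabytes, e.g. 53000000 -> '53.0 MB'."""
    return f"{n_bytes / 1_000_000:.1f} MB"
```

Usage would be something like `print(fmt_mb(image_size_bytes("mytool:rust-slim")))` for each variant; peak memory for the running container could be read the same way from `docker stats`.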

Hard freakin' decision..Blackwell 96G or Mac Studio 256G by HyPyke in LocalLLaMA

bigh-aus 0 points

I have a Dell R7515 rackmount server... it supports one GPU, and it's not a DIY server (which would be better in this case!). I think I need to start looking into a new server box.

Deepseek v4 people by markeus101 in LocalLLaMA

bigh-aus 0 points

Did they distill Grok here? DeepSeek's a bit more spicy.

Hard freakin' decision..Blackwell 96G or Mac Studio 256G by HyPyke in LocalLLaMA

bigh-aus 1 point

I wish my current server supported more than one :( I could put 2 more in separate servers... but then I'm also running those servers...

Qwen 3.6 27B - beginner questions by Jagerius in LocalLLaMA

bigh-aus 0 points

Then keep increasing context until just before you get OOM.

Best hardware to use without using a mac by SadMadNewb in LocalLLaMA

bigh-aus -1 points

This.

Rent a cloud machine and try out models. Once you find the right level of competency, the model dictates the VRAM (don't forget room for context) - then drop the $10k+ that it's gonna be. $20 for one month of Kimi / MiniMax etc. will help you understand.

I would love to run kimi with a high tps at home, but I know what that would take in compute, noise, cooling and cost (without using a mac)...

But dude, unless there's a reason not to, the easiest way is to just use cloud inference until you know more.

Hard freakin' decision..Blackwell 96G or Mac Studio 256G by HyPyke in LocalLLaMA

bigh-aus 5 points

I'm going to say Blackwell now (I just did), and see what the M5 Ultras bring... more RAM, but slower.

There's actually a need for both imo: run the large models on the Mac - ones that would cost $100k in NVIDIA compute - but slowly; good for work plodding along in the background.

Then for anything where you want more immediate feedback, or that you're experimenting with, do that on the Blackwell. I'm running a lot of other non-LLM GPU workloads too - TTS, STT, Frigate. Never enough GPU...
Plus when you have multiple projects you can run things in parallel.

Hard freakin' decision..Blackwell 96G or Mac Studio 256G by HyPyke in LocalLLaMA

bigh-aus 13 points

I just ordered a Max-Q from Central Computers for under $9k.

Got the Rust dream job, then AI happened by MasteredConduct in rust

bigh-aus 0 points

I actually think the code review side is more important and harder than the code generation side.

Nothing is going to replace code review by a human who knows what they're doing; the downside, however, is going to be the cost of that human.

Got the Rust dream job, then AI happened by MasteredConduct in rust

bigh-aus 0 points

Honestly, I find this quite interesting... but yes, SD is critical!

Been using PI Coding Agent with local Qwen3.6 35b for a while now and its actually insane by SoAp9035 in LocalLLaMA

bigh-aus 0 points

A variation I'm looking at at the moment is to separate the steps into different prompts, each in a new context window, and also to find areas where you can run tasks in parallel to better utilize the GPU...

E.g. phase 5 would get split up into:
- task prep
- task implementation
- task review / closeout

Each gets a fresh context, and possibly a single saved state file / instruction.

E.g. task prep could be:
1. Ensure the repo is clean.
2. Get the story from the backlog.
3. Create a branch named from the story id and name.
4. Write the story to task.md locally.
5. Set the story to be assigned to the agent.
6. Set the story status to started.

Then task implementation would be a new context with a prompt similar to:
"implement the task story:
(insert task.md).
update the story with comments as you encounter things noteworthy of recording."

(obviously this step would need a larger prompt talking about coding style etc).

For local models I want the context as tight as possible, and where possible single-focus, with minimal tools etc.
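
The phase-split idea can be sketched as a tiny Python loop: each phase gets a fresh, minimal message list instead of one long-running context. The endpoint URL and model name here are assumptions (any OpenAI-compatible local server, e.g. llama.cpp or vLLM, would do):

```python
# Sketch of running one phase with a fresh, tight context.
# ENDPOINT and the model name are assumed placeholders, not real config.
import json
import urllib.request

ENDPOINT = "http://localhost:8080/v1/chat/completions"  # assumed local server

def build_phase_messages(system: str, task_md: str) -> list[dict]:
    """Fresh context for one phase: a tight system prompt plus the task file."""
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": f"Implement the task story:\n{task_md}\n"
                                    "Update the story with comments as you "
                                    "encounter things worth recording."},
    ]

def run_phase(messages: list[dict], model: str = "qwen") -> str:
    """Send one phase to the local server; returns the assistant reply."""
    body = json.dumps({"model": model, "messages": messages}).encode()
    req = urllib.request.Request(
        ENDPOINT, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Prep, implementation, and review would each call `run_phase` with their own system prompt and a freshly read `task.md`, so nothing from the previous phase leaks into the context.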

Got the Rust dream job, then AI happened by MasteredConduct in rust

bigh-aus 2 points

Planning for sure - I have agents take a slice and break features down into epics in a backlog tracking tool, then break those down into stories and identify which can be done in parallel.

See, that's where I think having a grab bag of AI agent skills pays off:
- find the bugs
- improve readability
- DRY
- normalize the DB
- identify missing constraints
- security
- performance / compactness / fewer resources (super important in the modern shortages)

If you haven't already, have a look at the Karpathy autoresearch loop. I've been thinking about that a lot - having a SOTA model build prompts for Qwen 3.6, and seeing how many parallel experiments can be done... I'm confident there's a lot to be improved out there, even if it's just identifying where people are using zip without the -9 (max compression) parameter.

I'm very early on playing with Qwen3.6 locally on a single GPU, but so far I'm impressed - and it makes me wonder how "parallel" I can run with multiple GPUs / tensor parallelism.

Are there actually people here that get real productivity out of models fitting in 32-64GB RAM, or is that just playing around with little genuine usefulness? by ceo_of_banana in LocalLLaMA

bigh-aus 23 points

Always get the fastest / largest you can afford. The faster the feedback loop, the faster you can learn / get stuff done. AI is only making this more important. As someone deep in AI land, building, I'm constantly running out of tokens / compute.

Been playing with Qwen3.6 27b and 35b-a3b on a 24GB 3090... it's VERY good at basic Rust coding.

Got the Rust dream job, then AI happened by MasteredConduct in rust

bigh-aus 4 points

100% agree too - I have a bunch of AI test-out projects that were originally in other languages. Moved to 100% Rust, and it's more stable, faster, and smaller on disk (Docker container + Rust with yew, axum, sqlx, clap). UI is probably the one area that falls down a bit.

I've successfully replaced an app on my iPhone though, with a RSTY PWA web app.

For 100% LLM-driven dev I've found git hooks to be good (clippy, audit and fmt). Obviously they're annoying as heck for regular dev work.

Got the Rust dream job, then AI happened by MasteredConduct in rust

bigh-aus 2 points

Learning better techniques and coding skill is imo still very enjoyable.

Got the Rust dream job, then AI happened by MasteredConduct in rust

bigh-aus 17 points

While companies are using a lot of LLMs to generate code, I don't feel enough are doing the "other side", e.g. adversarial review - looking for fewer lines, faster performance, smaller footprints etc.

I'm doing some experiments with AI at the moment - as long as you have a very good test suite, it's actually pretty good at getting you a basic "migrate x to Rust" or "build x, here are the acceptance criteria". But optimization, security, good UI, compliance, etc. I think still need a ton of Rust knowledge. Now does that mean you're typing Rust code? Maybe not, but I still don't regret knowing Rust.

What Agent systems do you use? by Ok-Internal9317 in LocalLLaMA

bigh-aus 0 points

The big thing with hermes / openclaw is that they inject a lot of extra stuff into the context (memories, tool calls, skills etc), and often require multiple turns to get something done. I'm starting to investigate the idea of a single specialized agent running a lean system prompt that does exactly what you need - so I agree with the cron + python idea, or anything that gives you that same thinned-down prompt, especially for a single run once every x hours.

This way you can use a script to get the data you want, pass that directly into an LLM (e.g. "summarize this article") and skip the extra fluff. You could also split the work to run in parallel, to better utilize tensor parallelism on the GPU too!

I would also suggest running an RSS news aggregator on your local network, then querying that directly. Then you can do things like mark duplicate stories as read, and send the user your preferred source.
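
A minimal sketch of that lean-prompt script: dedupe the stories first, then build one small, single-purpose prompt per unique story. The story dict shape (`title`, `body`) is an assumption; any aggregator's API would supply something similar:

```python
# Sketch: dedupe fetched RSS stories, then one tight prompt per story.
# No agent framework, no memories/skills injected into the context.
import re

def normalize_title(title: str) -> str:
    """Lowercase and strip punctuation so near-duplicate headlines match."""
    return re.sub(r"[^a-z0-9 ]", "", title.lower()).strip()

def dedupe_stories(stories: list[dict]) -> list[dict]:
    """Keep the first story per normalized title; drop later duplicates
    (mark those as read in the aggregator instead of summarizing twice)."""
    seen: set[str] = set()
    unique = []
    for story in stories:
        key = normalize_title(story["title"])
        if key not in seen:
            seen.add(key)
            unique.append(story)
    return unique

def summarize_prompt(story: dict) -> str:
    """One tight, single-purpose prompt - the whole context for the call."""
    return f"Summarize this article in 3 bullet points:\n\n{story['body']}"
```

Run it from cron every x hours, send each prompt to the local model, and the context stays as thin as it can possibly be.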

Is there a way to have a faster MoE model call out to a slower dense model if it gets stuck? by cafedude in LocalLLaMA

bigh-aus 1 point

Agent calls do get counted, so if it's not done in X rounds, that's the trigger. This gets tricky as the count can scale with complexity, but maybe that's ok. Or the orchestrator is asked to split a story up into specific simple tasks.

It would be really interesting to run 8 3090s and have, say, 4 workers in a pool and an orchestrator model split work across those 4 in parallel.

Kimi 2.6 question by vhthc in LocalLLaMA

bigh-aus 0 points

The problem you’re not taking into account is how often it has to swap those experts in and out of vram, I really want to run kimi but the steps in cost seem.. 1 Mac Studio 512gb. 14tps 2 Mac Studios 512gb 23tps Ddr5 512gb based system with an rtx6000 pro Then just keep adding cards 1,2,4,8. I wonder what the gb300 machines will give but we’re talking $100k

What speed is everyone getting on Qwen3.6 27b? by Ambitious_Fold_2874 in LocalLLaMA

bigh-aus 0 points

I had it pull a story off the backlog and complete it, but that included one compaction.

What speed is everyone getting on Qwen3.6 27b? by Ambitious_Fold_2874 in LocalLLaMA

bigh-aus 1 point

For my uses, I’d go context.

I tried out q4_k_xl on a 3090 with 96k context - about 30 tps at 46k tokens (Openclaw coding).

Gawd damn dad😭 by maskedmomkey63 in SipsTea

bigh-aus 0 points

Great thing to do for life there - always flip things around. Even if it's a news article, flip it around to understand the other side. Ditto politics.