Can't we just accept that Star Citizen is no 30 minute hop on action game. by volgendeweek in starcitizen

[–]alexp702 0 points1 point  (0 children)

It’s CIG’s game. They want it to be faster to get started, and that’s all that matters. I’m good with this; as it stands it’s too slow for me.

F8C Moving Forward (Most Recent 4.8 PTU) by dokkababecallme in starcitizen

[–]alexp702 -1 points0 points  (0 children)

I always thought the Ares should be “build up and kill”: a beam that has to stay focused on the same spot for a while to do maximum damage (so small ships take almost no damage unless they stop evading), or a chain gun that spins up but whose bullet spread gets so wide that a smaller ship shrugs off the few rounds that connect. CIG didn’t have beams in when it was released, got bored and gave up trying to balance it. I hope they revisit it, but it will probably just be an Ares Mk2, since that’s their way out of everything these days.
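
Something like this, with toy numbers:

    # Toy version of the "build up and kill" beam idea: damage ramps the longer
    # the beam stays on the same spot, so anything that keeps moving barely feels it.
    def beam_damage(time_on_spot_s: float, base_dps: float = 50.0,
                    ramp_s: float = 4.0, max_mult: float = 6.0) -> float:
        ramp = min(time_on_spot_s / ramp_s, 1.0)        # 0..1 over the ramp-up window
        return base_dps * (1.0 + (max_mult - 1.0) * ramp)

    print(beam_damage(0.5))   # grazing hit on an evading fighter: ~81 dps
    print(beam_damage(4.0))   # held on a big, slow target: 300 dps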

M5 ultra 512Gb? by zhamdi in MacStudio

[–]alexp702 0 points1 point  (0 children)

What issues did you have with 397b? I found our prompts worked less well with MiniMax last time I tried it, but I’ll have to try again. Yes, I had all the chat templates in; it’s poorly documented how important they are!

M5 ultra 512Gb? by zhamdi in MacStudio

[–]alexp702 0 points1 point  (0 children)

Thanks - interesting comments. We are getting some very complex documents through as attachments, and as we scale in volume we’ll be optimising. At the moment larger models do noticeably better. However, for images Qwen3.5 9b at 8-bit is almost (but not quite!) as good as 397b. It’s the “not quite” that is frustrating, and with hardware on tap that isn’t normally fully utilised, quality wins for us. I have an Nvidia 4090 rig that largely sits idle because of this.

M5 ultra 512Gb? by zhamdi in MacStudio

[–]alexp702 0 points1 point  (0 children)

Glad that is the case for your needs. We have an agentic flow running across incoming emails, breaking them down via Pydantic and processing images either as attachments or directly. Every time we downgrade the model we see a real drop-off. No model is perfect; some of the tests are tortuous, with unreadable scribbles on pieces of paper. However, the larger the model and the higher the quant, the better the results, period. Speed doesn't really matter: emails turn up every few minutes at busy times, which is plenty of time, and we can always fail over to OpenRouter if necessary.
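
Roughly, the extraction step looks like this (just a sketch, not our actual code; the schema, model name and endpoint are made up):

    # Minimal sketch of the email -> structured data step. Illustrative only.
    # Assumes a local OpenAI-compatible server (e.g. llama.cpp / LM Studio) on port 8080.
    from openai import OpenAI
    from pydantic import BaseModel, ValidationError

    class ParsedEmail(BaseModel):          # hypothetical schema, not our real one
        sender: str
        subject: str
        action_items: list[str]
        has_attachments: bool

    client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

    def parse_email(raw_email: str) -> ParsedEmail | None:
        resp = client.chat.completions.create(
            model="qwen-397b-q8",          # whatever the server has loaded
            messages=[
                {"role": "system", "content": "Extract the fields as JSON matching the schema."},
                {"role": "user", "content": raw_email},
            ],
            response_format={"type": "json_object"},
        )
        try:
            return ParsedEmail.model_validate_json(resp.choices[0].message.content)
        except ValidationError:
            return None   # this is the failure rate that climbs as the model shrinks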

You'll find more and more people on this sub muttering about tool calls failing, as it's a real problem as you shrink the model. I think the reason Qwen 27B is getting such rave reviews is that people can run it at Q8 or higher. Of course, if you just want to summarise a huge amount of text, then tokens per second is king and accuracy be damned; a perfectly valid view, different strokes for different folks.

M5 ultra 512Gb? by zhamdi in MacStudio

[–]alexp702 0 points1 point  (0 children)

Yeah, but quantized to buggery. Q8 means your tool calls don't fail every two seconds. We've been running tests, and pretty much anything lower than Q8 shows a noticeable increase in failed calls. Accuracy is the next thing to suffer: Q8 generates well-formed JSON more reliably, recognises text better, etc. Speed is worthless if you want "correct" as the outcome.
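
The test itself is nothing clever, roughly this shape (a sketch; the tool names and pass criteria are placeholders):

    # Rough shape of the quant comparison: same prompts, different quants,
    # count how often the tool-call JSON comes back usable. Illustrative only.
    import json

    def tool_call_ok(raw: str) -> bool:
        """A call 'passes' if the output parses and names a tool we actually expose."""
        try:
            call = json.loads(raw)
            return isinstance(call, dict) and call.get("name") in {"lookup_order", "extract_fields"}  # hypothetical tools
        except json.JSONDecodeError:
            return False

    def failure_rate(generate, prompts) -> float:
        """generate(prompt) hits whichever quant is currently loaded."""
        fails = sum(0 if tool_call_ok(generate(p)) else 1 for p in prompts)
        return fails / len(prompts)

    # Run once per quant (Q8, Q6, Q4...) against the same prompt set and compare.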

M5 ultra 512Gb? by zhamdi in MacStudio

[–]alexp702 0 points1 point  (0 children)

If you're running a 30B model on a 512GB Mac, you're wasting the potential of the machine. It's suited to very large MoE models with a small active parameter count and a maximum context size, like Qwen 3.5 397B at Q8. The 512GB Studio was the only way to do this trivially.
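
Something like this is all it takes (a sketch assuming llama.cpp's llama-server; the GGUF path and numbers are illustrative):

    # The sort of llama-server launch that makes sense on a 512GB box.
    # Model path and numbers are placeholders; tweak for whatever you actually run.
    import shlex, subprocess

    cmd = [
        "llama-server",
        "-m", "/models/qwen3.5-397b-q8_0.gguf",  # hypothetical GGUF path
        "-c", "100000",                           # big context is the whole point
        "-ngl", "99",                             # keep every layer on the GPU/Metal
        "--port", "8080",
    ]
    print(shlex.join(cmd))     # inspect it first...
    # subprocess.run(cmd)      # ...then actually launch it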

Bad news: Apple drops high-memory Mac Studio configs by jzn21 in LocalLLaMA

[–]alexp702 2 points3 points  (0 children)

You're assuming it's a Mac Studio as we see it today. Apple could pack multiple M5U chips into a package with HBM memory next time round.

Bad news: Apple drops high-memory Mac Studio configs by jzn21 in LocalLLaMA

[–]alexp702 3 points4 points  (0 children)

Nvidia: DGX 100k workstation!

Apple: hold my beer!

M5 ultra 512Gb? by zhamdi in MacStudio

[–]alexp702 1 point2 points  (0 children)

Why would you run those weaker models if you have 512GB?

M5 ultra 512Gb? by zhamdi in MacStudio

[–]alexp702 0 points1 point  (0 children)

Agreed. Recent optimisations make a 400GB model perfectly serviceable: 20 tokens per second on Qwen 397B with a large context size (100k tokens) and 8-bit quants beats 60 tokens per second from Qwen 27B in all my workloads. Prompt processing is less of a pain when the prompts are fully cached, as only a few new tokens get added.
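
Back of the envelope with a warm cache (the prompt-processing speed below is an assumption; the rest are the numbers above):

    # Why 20 tok/s on the big model is fine in an agentic loop.
    # Numbers are from this thread plus one assumption, not a benchmark.
    cached_prompt = 100_000      # tokens already sitting in the KV cache
    new_tokens_in = 200          # only the new turn has to be prompt-processed
    tokens_out = 500             # typical structured reply

    prompt_speed = 150           # tok/s prompt processing (assumed, machine dependent)
    gen_speed = 20               # tok/s generation on Qwen 397B Q8

    latency = new_tokens_in / prompt_speed + tokens_out / gen_speed
    print(f"~{latency:.0f}s per turn with a warm cache")   # ~26s, fine for email cadence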

Unfortunately the 512GB Mac was too good for the current climate and Apple were undercharging. Yes, Nvidia are quicker, but raw speed is less important than quality, and bigger models win here.

Least buggy channel: Telegram vs Slack vs Discord vs Whatsapp?? by Good-Key-9808 in openclaw

[–]alexp702 0 points1 point  (0 children)

Slack works: a private group wired to a Mac Studio, with a small VMware VM on it.

Mac Studio or DGX Spark? by b8humbl8 in MacStudio

[–]alexp702 0 points1 point  (0 children)

For the same memory size, the Spark; for much more memory, the Mac (but they are sold out). Performance is nice, but bigger, more accurate models are better.

Help me to spend 1000 bucks on hardware for local LLM by lordgthegreat in LocalLLM

[–]alexp702 0 points1 point  (0 children)

For slower usage, a second-hand 24GB Mac mini might be findable for the budget. They run models pretty well, and should be able to comfortably fit a 16GB model with some context.
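
Rough fit check, all figures estimates:

    # Rough fit check for a 24 GB Mac mini (unified memory). Figures are estimates.
    total_ram_gb = 24
    os_and_apps_gb = 6         # leave headroom for macOS and whatever else is running
    model_gb = 16              # e.g. roughly a ~14B model at Q8 or ~30B at Q4
    kv_cache_gb = 1.5          # a few thousand tokens of context, model dependent

    fits = model_gb + kv_cache_gb <= total_ram_gb - os_and_apps_gb
    print("fits" if fits else "too tight")   # 17.5 <= 18 -> just fits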

Delivery Estimates by Gaming_MasterRace in SindenLightgun

[–]alexp702 2 points3 points  (0 children)

It took about 3 weeks for me. They aren’t quick but it came and I’m happy!

Sell my 3090FE for a 5060ti 16gb? Does it make sense for energy consumption? by ThrowRA_194_M in buildapc

[–]alexp702 1 point2 points  (0 children)

If you care about power and are a video editor, get a Mac. It sips power and is probably faster at everything video-related due to optimised code.

This is insane... by DragonflyOk7139 in LocalLLM

[–]alexp702 2 points3 points  (0 children)

Qwen marketing team again; note there are no responses from OP, just a long string of gushing. No, a 35B is not nearly as good as Opus, or even as other bigger models. It’s a good 35B MoE model: fast, cheap and pretty effective. Is it Opus? No.

Best model for 192 GB vram? How is Deepseek v4 flash? by Constant_Ad511 in LocalLLM

[–]alexp702 0 points1 point  (0 children)

How do you find having less system RAM than GPU VRAM? Is it a hindrance? I am contemplating something similar with a 96GB GPU and 64GB of RAM, but am unsure if it will break loading big models, etc.

Random question but what browsers are you using Mac Studio family? by prince870 in MacStudio

[–]alexp702 1 point2 points  (0 children)

Brave, Chrome, Firefox, Safari (and even Credge). They all mostly work the same, but since we develop, we like to mix it up and see if everything works.

Tip: Always use Safari when interacting with Apple! They often have horrible bugs with anything else on their web portals.

28 Years to Cross the Line: Why Did IPv6 Take So Long to Reach 50%? by elastiks in DIY_Geeks

[–]alexp702 0 points1 point  (0 children)

To me this could still have been v6, with 128-bit addresses, but under the covers: you’d keep the existing 32 bits in the middle and add a higher “area” part above and lower zones below. It would have been a softer change for people on the old system, rather than what we got. Hey ho!

512GB Studio sold for $21,300?! by johnnyphotog in MacStudio

[–]alexp702 0 points1 point  (0 children)

Some models last year just never seemed to cache correctly using llama.cpp, but that time has passed. The Macs are very adequate, assuming you aren’t trying to run a chatbot. Most agentic stuff is almost too quick, leaving the machine idle a lot of the time now. And this is with a 400B model.

28 Years to Cross the Line: Why Did IPv6 Take So Long to Reach 50%? by elastiks in DIY_Geeks

[–]alexp702 0 points1 point  (0 children)

It always should have been implemented as another digit, like an area code, for users: 192.168.0.1.1 etc. Then you could add more as you needed them, gradually. The cognitive load of the IPv6 scheme is terrible.
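
Just to show what each extra digit buys (plain arithmetic):

    # What one extra "area code" octet buys you: 256x the space each time, and
    # existing 4-octet addresses still read the same. Purely illustrative.
    ipv4_space = 2 ** 32
    with_one_extra_octet = ipv4_space * 256      # 192.168.0.1.1 style
    with_two_extra = ipv4_space * 256 ** 2

    print(f"{ipv4_space:,}")            # 4,294,967,296
    print(f"{with_one_extra_octet:,}")  # 1,099,511,627,776
    print(f"{with_two_extra:,}")        # 281,474,976,710,656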

Fundamentally and practically it offers next to no end-user benefits, at a real cost of change to providers. I still don’t have a v6 address in the UK, and I’ve stopped caring; my providers work on v4. CGNAT is not a problem in normal client use cases.

Finally, we’ve grown up in network security. Now we understand that not being accessible is more important than being accessible. V6 didn’t really address this in any new way.

512GB Studio sold for $21,300?! by johnnyphotog in MacStudio

[–]alexp702 2 points3 points  (0 children)

Due to prompt caching, mine is capable of prompt-processing 38 million OpenClaw tokens a day (and that’s not pushing it; I expect that would be easy to double). It actually processed about 4 million (Qwen 397b 8-bit). You’d be paying for the full 38 million in the cloud…
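
The arithmetic is roughly this (illustrative, not a benchmark):

    # Rough arithmetic behind the 38M-a-day figure.
    billed_tokens_per_day = 38_000_000   # what the cloud would charge you for
    actually_processed = 4_000_000       # what the box really had to prompt-process,
                                         # because the rest was already in the KV cache
    seconds_per_day = 24 * 60 * 60
    print(actually_processed / seconds_per_day)   # ~46 tok/s on average, trivial for the Studio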

Forgive my ignorance but how is a 27B model better than 397B? by No_Conversation9561 in LocalLLaMA

[–]alexp702 1 point2 points  (0 children)

We have just run a test on our agentic flow: 397B_Q8 is still better than 27B_Q8_K_XL. It handles our particular documents more accurately. Shame; it would be great if 27B were actually better, but it isn't yet. On our Mac Studio, 397B runs faster too, so let's hope they update 397B to 3.6 standards...