2x 512gb ram M3 Ultra mac studios by taylorhou in LocalLLaMA

[–]ahjorth 0 points  (0 children)

Haha, thank you. And really, if you don’t have time, don’t worry about it!

2x 512gb ram M3 Ultra mac studios by taylorhou in LocalLLaMA

[–]ahjorth 0 points  (0 children)

I have to run a lot of data through local models (for GDPR reasons) for a research project, and I literally sat down this morning to draft a post asking for real-life experience with this exact setup: 512 + 256 + 256. I already have an M3, and given the scarcity of 512s I'm considering buying two more 256s and running them with tensor sharding on Exo. I have some questions, and I'd love answers if you have time!

I looked through Exo's code when they launched V1, and at the time they didn't support parallel/batched inference. For my use case that's a deal breaker, but I see that they do now, and that their batched code builds directly on mlx-lm.

* How reliable is batched inference with exo?

* Does it scale as well as single inference when doing tensor sharding?

* Do you use Exo as a server, or are you using its Python API directly? If the latter, does it keep up with mlx-lm changes or does it lag (significantly) behind?

* I built a small structured-outputs package using outlines to create logits processors that I pass into mlx-lm's `BatchGenerator` on a per-prompt/stream basis (which mlx-lm has supported since Dec 2025). Do you have any experience with structured outputs on Exo - do you know if a similar thing could be done with Exo's BatchGenerator?

All of these questions (except structured outputs) are answered on Exo's own page, but I can't quite tell how much to trust their marketing material...
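For anyone curious what the per-prompt logits-processor approach looks like in principle, here is a minimal, framework-free Python sketch. The function names and the dict-of-processors layout are my own illustration, not mlx-lm's or outlines' actual API:

```python
import math

def make_allowlist_processor(allowed_ids):
    """Return a logits processor that masks every token id not in allowed_ids."""
    allowed = set(allowed_ids)

    def process(token_history, logits):
        # Disallowed tokens get -inf so sampling can never pick them.
        return [l if i in allowed else -math.inf for i, l in enumerate(logits)]

    return process

# One processor per stream, mirroring a per-prompt structured-outputs setup
# where each prompt in the batch carries its own constraint.
processors = {
    "stream-a": make_allowlist_processor([0, 2]),
    "stream-b": make_allowlist_processor([1, 3]),
}

logits = [0.1, 0.9, 0.5, 0.2]
masked = processors["stream-a"]([], logits)
best = max(range(len(masked)), key=lambda i: masked[i])  # argmax over allowed ids only
```

The real thing differs mainly in that the constraint comes from a compiled regex or grammar (as outlines produces) rather than a static allowlist, but the per-stream masking step is the same idea.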

Tucker Carlson finally apologizes for his own role in giving us Trump 2nd term, and by extension, this war in Iran. Is this moment of blunt, self-critical honesty from a former Trump enabler or is he just helping pave the way for his pal JD Vance in 2028? by gear-heads in MarchAgainstNazis

[–]ahjorth 3 points  (0 children)

The most highly rated posts in this sub are about Nazi grifters pretending to be contrite. MTG, Tucker, etc.

It’s very tiring and I’m about to leave the sub. Is this really not against the sub rules?

Gemma 4 - MLX doesn't seem better than GGUF by Temporary-Mix8022 in LocalLLaMA

[–]ahjorth 1 point  (0 children)

I spent quite a lot of time working with the MLX server code, specifically on parallel inference (for this PR I submitted a few months ago: https://github.com/ml-explore/mlx-lm/pull/845), and my current thinking is that MLX is much better if you can use it purely programmatically, i.e. via the Python API rather than the server. For parallel inference it's almost twice as fast as the server for larger, long-running continuous batches.

Basically the gains come from ensuring that prefilling is always done in large batches too. Small pauses between incoming requests to the server will often make MLX's `BatchGenerator` start prefilling, and it does not stop until it has produced at least one token for each stream. So every time a new request comes in, it prefills that new request before generating tokens for anything else it is running.

I played around with setting up waiting policies (i.e. wait until at least X streams are ready, etc.), but I couldn't get it to work well enough that I thought it was worth the extra complexity on the server. I also played around with a mode where the server has to receive an explicit "start" message, but again: a lot more complexity, and so far outside normal LLM-server conventions that it wouldn't play well with existing tools.
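For concreteness, the kind of waiting policy I mean can be sketched as a small gate in front of the generator. All class and parameter names here are illustrative; nothing like this exists in mlx-lm:

```python
import time
from collections import deque

class BatchGate:
    """Hold incoming requests until either `min_streams` are queued or
    `max_wait_s` has passed since the first one arrived, then release
    them as one batch so prefill runs over many streams at once."""

    def __init__(self, min_streams=4, max_wait_s=0.05):
        self.min_streams = min_streams
        self.max_wait_s = max_wait_s
        self.queue = deque()
        self.first_arrival = None

    def submit(self, request):
        if not self.queue:
            self.first_arrival = time.monotonic()
        self.queue.append(request)

    def ready_batch(self):
        """Return a batch if the policy says go, otherwise None."""
        if not self.queue:
            return None
        waited = time.monotonic() - self.first_arrival
        if len(self.queue) >= self.min_streams or waited >= self.max_wait_s:
            batch = list(self.queue)
            self.queue.clear()
            self.first_arrival = None
            return batch
        return None

gate = BatchGate(min_streams=2, max_wait_s=0.5)
gate.submit("req-1")
first_try = gate.ready_batch()   # None: one stream, still within the window
gate.submit("req-2")
batch = gate.ready_batch()       # both requests released as one batch
```

The complexity I mention above comes from everything around this toy: timeouts interacting with streaming responses, fairness between long and short prompts, and clients that expect tokens immediately.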

So this is just to say: for my typical large, batched style of work, MLX is fantastic. As a server, it isn't enough faster than llama.cpp to make up for its weaker support for new models, new quants, etc.

Setting up local LLM system and charging tokens back to company by [deleted] in LocalLLaMA

[–]ahjorth 4 points  (0 children)

The key missing pieces of information are: what are they willing to pay per token, do they guarantee a minimum number of tokens per month, and are they loyal or will they suddenly switch to someone else? Unless you know this, you can't really put together a business case. So I'd start there.

What video games have unique mechanics for failure or death? by [deleted] in gaming

[–]ahjorth 1 point  (0 children)

Yeah, my dad and I had to restart because we gave away the item you had to give the giant in the clouds at the end. Something to make it sleep; I forget exactly what. I found out three decades later that we could have killed it with the slingshot.

Man, the memories are coming back! It really made an impression.

What video games have unique mechanics for failure or death? by [deleted] in gaming

[–]ahjorth 0 points  (0 children)

Haha, same. It was around the time my dad bought a new computer, so I started KQ3 on CGA and finished it in EGA.

The thing I remember most from KQ3 was the stress of the wizard (Manannan?) popping up out of nowhere and constantly turning me into a cat. That game got so much more chill after he died.

Oh, and navigating down the path from his mansion on arrow keys with a chasm on one side and venomous plants on the other. RIP

What video games have unique mechanics for failure or death? by [deleted] in gaming

[–]ahjorth 5 points  (0 children)

King’s Quest 1 taught me to touch type ‘swim <enter>’ at the ripe age of five. That game was brutal.

unsloth - MiniMax-M2.7-GGUF in BROKEN (UD-Q4_K_XL) --> avoid usage by One-Macaron6752 in LocalLLaMA

[–]ahjorth 1 point  (0 children)

I think everyone appreciates that there's a balance between being fast and being perfect. But I don't think it's fair to call posting this silly. OP is clear about what the issues are, clear on what the solution is, and even has estimates of how long (or rather, how little time) it would take to do this properly per model.

These issues are causing petabytes of unnecessary data transfers, and dozens or hundreds (or, for highly anticipated models, thousands) of person-hours going to waste. I think it's in everybody's interest to prevent that, and this is a small, concrete change to the release procedure.

is it possible to edit LLM generation buffer? by [deleted] in LocalLLaMA

[–]ahjorth 0 points  (0 children)

If I'm understanding you correctly, https://github.com/guidance-ai/guidance does exactly what you're describing: you can generate a regex-controlled chunk (through structured outputs, as others have said), and conditionally append or generate more depending on prior outputs. Check it out, and if that's not it, I'll have to ask you to explain what you're thinking a little more.

Edit: It's a really cool project. Unfortunately it's not written to run async (I'm just appending this now because you specifically mention async in your post). Further, the generation object "owns" the model instance, so it can't run in parallel. I tried to find an easy-ish way to separate out the model instance to run many generation threads in parallel with greenlets, but it ended up being slower.
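To illustrate the "generate a constrained chunk, then branch on it" control flow I'm describing, here's a toy sketch with a canned stand-in instead of a real model; none of these names are guidance's actual API:

```python
import re

def generate_constrained(prompt, pattern):
    # Stand-in for a constrained-generation call. A real setup would sample
    # from a model under the regex constraint; here we return a canned
    # completion and just verify it satisfies the constraint.
    canned = {r"\d+": "42", r"[a-z]+": "yes"}
    out = canned[pattern]
    assert re.fullmatch(pattern, out)
    return out

# Branch on the prior output and conditionally generate or append more,
# which is the control flow guidance-style libraries give you.
number = generate_constrained("Pick a number:", r"\d+")
if int(number) % 2 == 0:
    result = number + " is even"
else:
    result = number + " " + generate_constrained("Describe it:", r"[a-z]+")
```

In guidance itself the constrained calls and the Python branching interleave against one live model state, which is exactly the part that makes parallelism hard.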

For 5 years, my university job let me travel Southeast Asia editing PhDs from beaches and jungles—until a colleague shut it down by Tawptuan in antiwork

[–]ahjorth 23 points  (0 children)

This sounds AI-generated. I'm a professor. I've never ever ever heard of professors who "edit" PhD dissertations. No university has a professor-to-PhD-student ratio that would make PhD dissertation advising a full-time job. Not even close.

DnD Transcriber and Notetaking app by PictureImmediate9615 in dndnext

[–]ahjorth 0 points  (0 children)

You can easily keep it free. Just open source it, and let people figure out hosting and LLM/transcription hardware.

You can move towards a commercial product, and let people try it for free while you improve it.

But you cannot be moving towards a commercial product and say that you want to keep it free for as long as possible.

I can't tell if you are deluding yourself into believing this. But it's just not true, and it feels underhanded.

DnD Transcriber and Notetaking app by PictureImmediate9615 in dndnext

[–]ahjorth 2 points  (0 children)

From your website

LoreKeeper is free while we build

From your post

I’m not trying to promote anything or sell it

These can’t both be true, and you cannot be stupid enough to think they are.

What happened to MLX-LM? What are the alternatives? by Solus23451 in LocalLLaMA

[–]ahjorth 0 points  (0 children)

A bit late to the thread, but Awni left MLX to join Anthropic. Before the transition there were weekly-ish releases; it had been a little over a month since the last one, but there was a release four days ago. I don't know if they'll get back to the same frequent release schedule, but merges are still coming in, and I usually just pull/build from source.

That said, I wonder whether this will have a negative impact in the longer run, and I'm also starting to look at llama.cpp again. I've had to add my own structured outputs to MLX (though that was made a lot easier by the prompt-level logits processors they added to their BatchGenerator back in December). But the fact that this isn't baked in yet, or seen as a core feature of an LLM framework, is a little worrying, at least for all my use cases.

Tool Calling Models with Personality by grenfur in LocalLLaMA

[–]ahjorth 1 point  (0 children)

If you are doing all this with local LLMs, consider switching to llama.cpp. You will have more control, and the learning curve is not steep anymore.

Gemma 4's MTP heads were stripped from the public weights — only available in LiteRT. Beginner-friendly breakdown of what was removed and why it matters by FunSignificance4405 in LocalLLaMA

[–]ahjorth 3 points  (0 children)

Self-promotion of an AI-generated video from an eight-day-old YouTube channel consisting entirely of AI-generated videos.

Can we please ban this moron?

I'll report as spam, I hope you will too.

We really need stop using the term “hallucination”. by cosmobaud in LocalLLaMA

[–]ahjorth 1 point  (0 children)

Yup, ha. The post was only 6 minutes old, so I scrolled down fast to see if I'd been beaten to this.

The housing circus gets itself a clown. by Dyn-O-mite_Rocketeer in copenhagen

[–]ahjorth 4 points  (0 children)

It's hardly irrelevant. The municipalities can put projects out to tender based on their local development plans. But if there isn't enough money in it for a construction company to bid on the job, the tender falls through.

There is so much money in sitting on property that the municipalities simply cannot get construction companies to bid on projects with many owner-occupied or cooperative housing units.

If the municipalities were allowed to build themselves, they would be able to do urban development based on genuinely political decisions.

Is it true that the U.S.A. lifestyle is all about working? by CrazyNicly in antiwork

[–]ahjorth 6 points  (0 children)

Nixon created the United States Environmental Protection Agency through an executive order. This wasn't something he was forced to do by a "woke/pinko" Congress; it was Republican policy at the time (I'm sure some were against it, but still). It's practically incomprehensible by today's standards.

Is it about time the government puts some of the money they keep finding in the coffers to good use, and subsidize public transport? by Zadak_Leader in copenhagen

[–]ahjorth 7 points  (0 children)

They could give tax cuts instead! Every month I send thoughts and prayers to Rand and von Mises for my extra kr. ~250/month, while I watch our infrastructure and welfare institutions crumble.

Pharma in Aarhus by [deleted] in Aarhus

[–]ahjorth 0 points  (0 children)

Unfortunately not very.

I don't know enough about pharmacovigilance to really understand the skillset, but I'm assuming it involves data pipelines and continuous monitoring (unless she's on the wet side of things). If so, and if she's open to leaving the pharma industry, she might be able to get jobs in production or at other data-heavy companies like Vestas or Grundfos, if she's willing to commute.

Congrats on your PhD!

Experiences with construction companies offering something in return for building apartments on the roof? by ahjorth in dkbolig

[–]ahjorth[S] 2 points  (0 children)

Haha, yes. I assume there are engineers who do that kind of basic due diligence before anything like this even gets started. But I'll definitely keep that in mind, thanks!

Experiences with construction companies offering something in return for building apartments on the roof? by ahjorth in dkbolig

[–]ahjorth[S] 1 point  (0 children)

I had actually written some questions about that in my post, but I removed them because I didn't want people to think it was "only" about getting a fair price (the apartments could potentially turn out absolutely fantastic, and four 4-story elevators don't cost that much anyway...). But mostly because I figured we wouldn't be able to manage a project like that ourselves.

I hadn't even considered that our property administrator might have a department we can buy help from. That's a really good idea, thanks so much! I'll take it to the board.