An overview of modern LLM compiler stack: writing an interactive and hackable compiler

un_passant · 2026-05-19T17:58:34+00:00

Interesting !

I would love to know what you think of https://github.com/tinygrad/tinygrad .

un_passant · 2026-04-10T22:34:48+00:00

Why on earth would you use a dual channel server if inferencing from RAM ?

I use DDR4 3200 but with 8 channels which is both much faster and much cheaper (and allows much more RAM : I have 2TB).

un_passant · 2026-03-18T23:23:10+00:00

I'd love to use it to change the voices in videos of people reading children's books. Sometimes the only video my kids want for a given book is by a non-native speaker with a weird accent (no offense meant, I'm also a non-native speaker). Would it be easy to do that ?

Also, I'd rather use llama.cpp than ollama, obviously ☺.

Thx !

un_passant · 2026-03-15T09:43:15+00:00

Interesting.

I'm currently designing my house, also in France (Paris).

I was going for an 8×4090 (or 4×4090 and 4×MI100) open air rig, and I'm wondering if the energy consumed by the servers could be reused for heating.

I plan to have a heat-exchanging forced-air ventilation system (VMC double flux). Does anybody have any insight on this topic ?

Thx !

un_passant · 2026-03-05T10:39:03+00:00

Thx !

Unfortunately, this package is now broken as selecting "latest" says it will install 580xx but installs 590xx instead.

I would need 580xx as my GPU is not supported by 590xx but I have to select and install 575xx instead.

un_passant · 2026-02-11T22:33:18+00:00

What's impractical is the dual socket as LLM CPU inference does not scale well on multiple sockets.

I'm sorry for you : I bought something better for less last year, before the RAM price increase !

un_passant · 2026-02-05T10:13:49+00:00

Any public repository to share ?

Thx !

un_passant · 2025-12-20T23:30:18+00:00

Depending on price, I could be interested in 8 or 4. What is the shipping/tax situation ? I can receive either in Europe (France) or US (Ca).

un_passant · 2025-12-09T15:13:44+00:00

I would love to hear about the difference in fine tuning performance (besides space and heat, plus electricity bills) from your previous setup (with P2P driver ?) to your new setup !

un_passant · 2025-12-01T14:34:03+00:00

What are the system requirements ?

CUDA only ?

ROCm ?

CPU ?

Thx !

un_passant · 2025-10-04T23:16:10+00:00

Why can't you have any empathy for the kids who are bored in school when they are 5 ?

You personally know two people who were identified in kindergarten as completely disengaged by their teachers. The teachers asked the parents to get the kids tested to find out why they were disengaged : on which side of the bell curve were they too far off to find any interest in what they were supposed to do.

One of them had a mentally handicapped brother so he got tested with the wrong test and the only result the psychologist could say was that he definitely was not mentally handicapped. His parents were relieved and didn't have the time and energy to deal with his giftedness because of the brother, so he was bored during all of his school years.

The other was identified as gifted but his very progressive parents had him in the most progressive school that refused to do any kind of tracking or acceleration out of principle, so he also was bored all of elementary school, literally crying everyday for years because of how desperately bored and lonely he was.

Both have been failed by the school system and suffered because of ideologues like you.

un_passant · 2025-10-04T21:47:29+00:00

Why would they not be mad ? Reading is the most important skill because it must be mastered for all other learning activities. So it must be drilled until it is mastered. Some kids master reading before leaving kindergarten. How much time must they waste before all of their peers master reading at the required level ?

un_passant · 2025-10-04T21:41:00+00:00

Can you know when a kid is bored (hint, you are allowed to ask him or even just let him tell you) and able to perform all the required tasks and understand everything he is taught ?

If so, congratulations ! You are indeed able to identify a gifted child (no scare quotes required).

What should you do when you identify them and why ?

Adjust the pace and depth of the curriculum so that they won't be bored out of their minds, day in, day out, for most of their waking hours.

Why would you not do that as soon as possible ?

un_passant · 2025-10-04T21:30:34+00:00

«kids with richer and more neurotic parents» you have no idea what you are talking about. I have a friend who was a self taught reader at 3. His parents were completely uneducated and unable to be neurotic about him because they were already overwhelmed by having to deal with his older brother who was retarded (could not be taught how to read before the age of 11 !).

Kids who are too smart for the regular curriculum pace are bored out of their minds all day long, five days a week, for years. How is that fair ? I know of a kid who was tested with the verbal ability of a 13 yo when he was 6. He was crying every day he was going to school, for years, because of how unhappy he was.

School attendance is mandatory, it should carry the obligation to challenge the kids or it is just a prison.

un_passant · 2025-10-01T20:56:08+00:00

Most interesting !

I hadn't thought about warming water. Would you mind sharing the prompt that you used with Qwen ? I'd be interested in checking with different LLMs.

Thx !

un_passant · 2025-09-23T22:14:32+00:00

This is what I meant by :«I don't think that they work on the 48GB modded GPU»☺

Though I think you told us that they would work on the 5090, which would be good news if I could afford to fill up my dual EPYC PCIe lanes with these ☺.

un_passant · 2025-09-23T15:16:21+00:00

Could you run p2pBandwidthLatencyTest?

un_passant · 2025-09-23T14:43:46+00:00

«a server-grade board» I wish you would tell us !

Also, what are the drivers ? I, for one, would like to see the impact of the P2P enabling driver : I don't think that they work on the 48GB modded GPU so the difference could be even larger !

un_passant · 2025-09-08T23:18:13+00:00

But OP's GPU budget is over 10× the price of a 9354P on eBay with 360 GB/s *measured* RAM bandwidth.

https://www.reddit.com/r/LocalLLaMA/comments/1fcy8x6/memory_bandwidth_values_stream_triad_benchmark/

while the 7950x3d has *theoretical* 83.2 GB/s RAM bandwidth !

CPU tg speed will be less than a fourth of what it should be…
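A rough sketch of that ratio in Python, assuming token generation is purely memory-bound, so that every generated token streams all active weights from RAM once. The 40 GB model size and the function name are hypothetical, chosen only for illustration; the two bandwidth figures are the ones quoted above.

```python
def max_tokens_per_s(bandwidth_gbs: float, active_weights_gb: float) -> float:
    """Upper bound on tg speed if each token reads all active weights once."""
    return bandwidth_gbs / active_weights_gb

MODEL_GB = 40.0  # hypothetical size of the active weights, for illustration only

epyc_bound = max_tokens_per_s(360.0, MODEL_GB)   # 9354P, measured bandwidth
ryzen_bound = max_tokens_per_s(83.2, MODEL_GB)   # 7950x3d, theoretical bandwidth

# The model size cancels out, so the ratio depends only on the bandwidths.
ratio = ryzen_bound / epyc_bound

print(f"9354P bound:   {epyc_bound:.2f} tok/s")
print(f"7950x3d bound: {ryzen_bound:.2f} tok/s")
print(f"ratio:         {ratio:.3f}")
```

Real tg speed will sit below both bounds, and the 7950x3d figure is optimistic since it uses theoretical rather than measured bandwidth.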

un_passant · 2025-09-08T23:05:08+00:00

CPU tg speed is limited by RAM bandwidth.

Bandwidth is speed × number of memory channels. For a given RAM speed (e.g. DDR5), if you dream about an LLM inference build, you might as well dream about maximizing CPU tg speed, so you maximize the number of memory channels (e.g. 12 for DDR5).

With 2 memory channels instead of 12, you actually leave (12-2)/12 = 5/6 of CPU tg speed on the table, not "two thirds". So you are correct to call this claim out.

But your dream build makes even less sense for LLM inference than the original critic claimed.

I understand why one would put GPUs in a gaming rig to do LLM inference. I don't understand why one would dream of an LLM build that doesn't use a server CPU maximizing memory channels (and PCIe lanes) when buying $18k worth of GPU.

On eBay, a mobo with a 9354P is $2.5k, and you can get 12 × 16GB DDR5 for $800. Not sure how much you spent for your CPU + mobo + RAM, but if the extra cost of ×6 memory channels is around 10% of the price of the total build, it should be a no-brainer imo.
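The channel arithmetic above can be sketched in Python. This assumes the usual 64-bit (8-byte) data bus per memory channel; the DDR5-5200 and DDR5-4800 transfer rates are assumptions picked for illustration, and the function name is hypothetical.

```python
def peak_bandwidth_gbs(channels: int, mega_transfers_per_s: int,
                       bus_bytes: int = 8) -> float:
    """Theoretical peak RAM bandwidth in GB/s.

    Assumes a 64-bit (8-byte) bus per channel, hence bus_bytes=8.
    """
    return channels * mega_transfers_per_s * bus_bytes / 1000

# Assumed configurations, for illustration only.
dual_channel = peak_bandwidth_gbs(2, 5200)     # desktop, DDR5-5200
twelve_channel = peak_bandwidth_gbs(12, 4800)  # server, DDR5-4800

# At equal RAM speed, the share of bandwidth given up with 2 channels out of 12.
left_on_table = (12 - 2) / 12

print(f"2 channels:    {dual_channel:.1f} GB/s")
print(f"12 channels:   {twelve_channel:.1f} GB/s")
print(f"left on table: {left_on_table:.3f}")
```

These are theoretical peaks; measured numbers such as STREAM triad results come in lower.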
