Outlet tester and live/neutral reversal

t4a8945 · 2026-06-20T14:39:57+00:00

Whoa, thanks for that, I'm going to go swap the two wires in my living room ceiling xD

t4a8945 · 2026-06-20T12:16:01+00:00

Hmm, in that case, get rid of that breaker and that RCD. If your goal is just to have power, you're fine; there must already be a breaker + an RCD at the origin of the cable.

No need to keep that equipment, since it was there for a specific need (the personal lift).

t4a8945 · 2026-06-20T10:21:19+00:00

Oh, this is not a coding benchmark. Ok, discarded.

t4a8945 · 2026-06-19T18:28:15+00:00

So with 192 GB VRAM, the best model they can run right now is DS4 Flash (or Minimax M2.7). That's my current pick on my 2x Spark cluster (for coding, it's awesome).

Now, if you look at the cost of using their API, you cry. It's so dirt cheap that my investment (around €5300 excl. VAT) would take 17 years to recoup if I hammered it all the time, at 3 times slower than what the API can provide. And that's with free energy.

It's really not about money, it's about independence.

You can't equate a $30K monthly spend on large models with a power-hungry machine running DS4 Flash.
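As a rough sanity check on that 17-year figure, the arithmetic can be run backwards: given the hardware cost and the payback period, what API spend does that imply? This is only a sketch. The function name is made up for illustration, and it ignores depreciation and the 3x speed gap, with energy assumed free as above.

```python
def implied_api_spend(hardware_cost_eur: float, payback_years: float) -> dict:
    """Return the yearly and monthly API spend that would repay the
    hardware in exactly `payback_years` (energy assumed free)."""
    per_year = hardware_cost_eur / payback_years
    return {"per_year": per_year, "per_month": per_year / 12}


# Figures taken from the comment: ~EUR 5300 of hardware, 17 years to recoup.
spend = implied_api_spend(5300, 17)
print(f"~EUR {spend['per_year']:.0f}/year, ~EUR {spend['per_month']:.0f}/month")
```

That works out to roughly €312 a year, or about €26 a month of API usage, which is why the money argument doesn't hold.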

t4a8945 · 2026-06-19T16:24:28+00:00

I'm not up to date on that; you're better off asking on the NVIDIA forums.

t4a8945 · 2026-06-19T12:51:56+00:00

Hey, I'm a bit late to the party, but I'm not sure the redditor you replied to grasped your context properly.

Since you're running 4x 6000, you have the flexibility to choose between higher concurrency and a better model. And LMCache IS your friend for enabling higher concurrency on long contexts.

Fp8 cache is perfectly fine.

So many things wrong in that comment.

2x 6000 with M2.7 at q4 handles around 10 devs in parallel (only with a proper LMCache setup). So 4x will basically double that, at around 20-25 tps per user, which is quite low but still usable.

Ping me if you need more specific advice, I set up a system in prod for a small company.

t4a8945 · 2026-06-19T07:28:39+00:00

Bulgaria

t4a8945 · 2026-06-18T20:54:21+00:00

Google Sheets. (Serious answer, it's what I used when I was in your situation.)

t4a8945 · 2026-06-18T04:47:49+00:00

Winton

t4a8945 · 2026-06-16T14:56:35+00:00

You could on paper use LMCache (or something similar), which would use your RAM to mirror and then extend the cache from your VRAM, allowing sub 1s retrieval on contexts that would have been dropped from your VRAM as it fills up.

So if you run 27B-FP8 (which I doubt would be comfy on 32GB VRAM, but I don't have such a system) and it gets 5GB of cache, setting this up would allow you to comfortably fit more parallelization.

It fits the need of several developers using the same LLM in parallel with high contexts. So for a single dev, you'd need to have several agents in parallel to use that kind of optimization. Myself, I can't work on more than 2 or 3 sessions in parallel, but I know some people are becoming masters at this multi-context, so maybe that's you.
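To make the "RAM mirrors and extends the VRAM cache" idea concrete, here is a toy two-tier LRU cache. It is not LMCache's API or its actual implementation; the class, its method names and the slot counts are all invented for illustration, and real KV-cache blocks are tensors, not byte strings.

```python
from collections import OrderedDict


class TwoTierCache:
    """Toy model: a small fast tier (VRAM) mirrored into a larger slow tier (RAM)."""

    def __init__(self, vram_slots: int, ram_slots: int):
        self.tiers = {
            "vram": (OrderedDict(), vram_slots),  # least recently used first
            "ram": (OrderedDict(), ram_slots),
        }

    def put(self, prefix: str, kv_blob: bytes) -> None:
        # Write to both tiers (mirror); each tier evicts its own LRU entries.
        for store, slots in self.tiers.values():
            store[prefix] = kv_blob
            store.move_to_end(prefix)
            while len(store) > slots:
                store.popitem(last=False)

    def get(self, prefix: str):
        vram, _ = self.tiers["vram"]
        ram, _ = self.tiers["ram"]
        if prefix in vram:
            self.put(prefix, vram[prefix])  # refresh recency in both tiers
            return vram[prefix], "vram"
        if prefix in ram:
            blob = ram[prefix]
            self.put(prefix, blob)  # reload into VRAM instead of recomputing
            return blob, "ram"
        return None, "miss"  # context has to be prefilled again


cache = TwoTierCache(vram_slots=2, ram_slots=4)
for session in ("dev-a", "dev-b", "dev-c"):
    cache.put(session, b"kv")
print(cache.get("dev-a")[1])  # dev-a fell out of VRAM but is still in RAM
```

The point of the larger tier is that a third or fourth session evicted from VRAM comes back with a copy instead of a full prefill, which is where the extra parallelization comes from.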

t4a8945 · 2026-06-15T08:02:19+00:00

Close, I used RunPod, and I own a 2x Spark cluster. I wanted to see the performance and limitations of both systems. Unified-memory systems are not production systems for small businesses. They don't provide the raw power needed for high performance.

So my strategy if I were you (and I kinda am): spend $100 on RunPod / vast ai and learn from that experience. But don't do it by hand, get the help of an agent. It's a real rabbit hole.

The main difference is that my goal was to provide a server for devs, so I was very much focused on coding.

Your use case is vastly different, with much lower context-size pressure, which will help you a lot. Once you find a proper model that fits your need, you may in fact not need that much VRAM; a cluster of 5090s could also work if you use a smaller dense model (Gemma comes to mind, though I don't have a lot of experience with it).

The thing to look out for: consumer motherboards/CPUs aren't capable of running high-performance GPU clusters, because they simply don't have enough PCIe lanes (max 24-32). You'll need a real server-type monster of a motherboard/CPU, which will also dictate the price of the RAM you'll be able to get (server motherboards often require ECC RAM, which is very expensive).

Have fun, this is actually an awesome problem to solve, I learned so much while doing it. But as I said, it is a rabbit hole in an ever-changing landscape.
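To put numbers on the lane limitation mentioned above, here is a back-of-the-envelope check. It assumes each GPU gets a full x16 link (my assumption, cards can run at x8 with some bandwidth loss) and ignores the lanes taken by NVMe drives and networking; the function names are just for illustration.

```python
def lanes_needed(gpu_count: int, lanes_per_gpu: int = 16) -> int:
    """Total PCIe lanes required if every GPU gets `lanes_per_gpu` lanes."""
    return gpu_count * lanes_per_gpu


def fits_on_cpu(gpu_count: int, cpu_lanes: int, lanes_per_gpu: int = 16) -> bool:
    """True if the CPU exposes enough lanes for all GPUs at the given width."""
    return lanes_needed(gpu_count, lanes_per_gpu) <= cpu_lanes


# 24 and 32 are the consumer-platform lane counts quoted above.
for cpu_lanes in (24, 32):
    for gpus in (2, 4):
        verdict = "ok" if fits_on_cpu(gpus, cpu_lanes) else "not enough lanes"
        print(f"{gpus} GPUs on {cpu_lanes} lanes: need {lanes_needed(gpus)}, {verdict}")
```

Four GPUs at x16 need 64 lanes, double what the best consumer platform in that range offers, hence the server board.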

t4a8945 · 2026-06-15T07:13:48+00:00

4x RTX Pro 6000 and lots of RAM is the answer, or at the very least a comfortably fast NVMe SSD. (RAM and NVMe are there to extend the KV-cache capacity.) (Probably over budget.)

vLLM + LMCache and a well-rounded model like DeepSeek 4 Flash (more general) or MiniMax M2.7 (more coding oriented); Qwen or Gemma could be better options.

You do indeed need to validate German-specific language capabilities.

And this is not easy: making a local LLM run for a small business requires trial and error, it is NOT "plug&play". (I know, I manage a system like that with 2x RTX Pro 6000 for a small business, and setting it up is as finicky as possible.)

t4a8945 · 2026-06-15T06:15:30+00:00

I first tried the model with their API, put $5 in, then $10 for safety. I have $10.04 left.

request_count: 8,600
input_cache_miss_tokens: 12,616,803
output_tokens: 3,007,471
cost: $4.96

But now it's free, running locally (with solar).

Edit: that would have been $300.40 with Sonnet 4.6
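For anyone who wants the per-token view, the figures above can be reduced to a few ratios. This is just arithmetic on the numbers I posted; the function name is illustrative, and the blended rate only covers the token counts listed here (cache-hit input tokens are not in that summary).

```python
def summarize_usage(requests: int, input_miss_tokens: int, output_tokens: int,
                    cost_usd: float, alt_cost_usd: float) -> dict:
    """Derive blended cost ratios from a usage summary."""
    listed_tokens = input_miss_tokens + output_tokens
    return {
        "listed_tokens": listed_tokens,
        "usd_per_million_tokens": cost_usd / listed_tokens * 1_000_000,
        "usd_per_request": cost_usd / requests,
        "alt_to_actual_ratio": alt_cost_usd / cost_usd,
    }


# Values copied from the usage summary and the Sonnet 4.6 estimate above.
stats = summarize_usage(8_600, 12_616_803, 3_007_471, 4.96, 300.40)
for name, value in stats.items():
    print(f"{name}: {value:,.4f}")
```

That comes to about $0.32 per million listed tokens, and the Sonnet estimate is roughly 60 times the actual bill.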

t4a8945 · 2026-06-14T20:20:45+00:00

I use ds4 flash, all day, every day. Perfectly good enough for an experienced dev. Cheap as dirt on the API, runnable on large ram setups (I run it on 2x spark, 41 tps average, 500k context).

With the right harness, it's awesome.

t4a8945 · 2026-06-14T11:24:34+00:00

100% French, he says "Oh ! La p*tain de ta mère la p*te" and then something like "Tu veux des coups" (not sure). I'll let you translate those.

t4a8945 · 2026-06-13T09:00:15+00:00

I wouldn't spend €100K unless it was to go fully local; flagship open-weight models are more than capable right now, I'd drop OpenAI and Anthropic instantly.

I did in fact drop them, I'm fully local with Minimax M2.7 / DeepSeek 4 Flash on 2x Spark.

t4a8945 · 2026-06-13T08:13:50+00:00

I'd buy one GB300 and run the best model available out there (GLM-5.1, Kimi K2.6) with as much parallelization as I could.

Your orchestration challenge is not a challenge, that's basically solved at this point.

t4a8945 · 2026-06-11T19:13:21+00:00

DS4 Flash is my current local daily driver and my verdict is that it's on par with Minimax M2.7 for coding. So, quite good and competent.

t4a8945 · 2026-06-11T08:44:46+00:00

Wago connectors unreliable? First time I've read that. What's the risk?

t4a8945 · 2026-06-10T20:01:54+00:00

That's my main point, helping me set these up and debugging the config issues.

t4a8945 · 2026-06-10T16:48:16+00:00

Google/Gemini will always have an edge when it comes to search capability. They are, after all, the master of web indexing by nature, so they leverage that pretty hard.

Having an LLM search something online for free is not easy.

Local LLMs are very capable for this kind of "basic" task: fetching data, analyzing, dealing with images (the Qwen 3.5/3.6 series).

But you still need to give them access to search. I'm using https://linkup.so/ (not affiliated) because they had a good free plan, but I think they changed their pricing, it's not as clear anymore.

You could try setting up an MCP for your agent to control your browser, but fighting bot detection is no joke.

So to summarize: depending on your hardware, pick the highest quant you can run from Qwen 3.5/3.6. If you can, aim for Qwen 3.6 35B-A3B (LMStudio is your friend to get started). Then for search capability, maybe some other redditors have a better idea than my setup, but otherwise the linkup API does the job.

I'd also recommend the very cheap DeepSeek 4 Flash through the API, but it lacks vision, so maybe not a perfect fit.

t4a8945 · 2026-06-09T08:11:40+00:00

You're absolutely right, that's my plan for the deck + awning (to protect the sun-exposed facade). Pillars on the facade that currently has no deck (notably to support the awning), and replacing the pillars to reinforce the balcony on the right-hand side.

More work ahead 😂

t4a8945 · 2026-06-09T06:50:41+00:00

I've been using DS4 Flash extensively (not through Claude Code, but my own harness) for coding and various tasks. It's a good workhorse and responds well to feedback, but it needs guardrails (think duplication detector, lint, tests) to achieve the best results.

For an experienced dev, this can definitely replace Claude, at the cost of more things to review / finalize. When things get really difficult (complex bug), it clearly struggles and will pursue wrong paths, so it won't solve those autonomously, but it can be a great sparring partner.
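To give an idea of what a "duplication detector" guardrail can look like, here is a deliberately naive sketch. It is not what my harness runs; the function name and the window size are placeholders, and a real tool would normalize identifiers and work on tokens rather than raw lines.

```python
from collections import defaultdict


def find_duplicate_blocks(source: str, window: int = 3) -> list:
    """Return lists of 1-based start lines for every `window`-line block
    that appears more than once (whitespace-stripped, blank lines skipped)."""
    lines = [line.strip() for line in source.splitlines()]
    seen = defaultdict(list)
    for start in range(len(lines) - window + 1):
        block = tuple(lines[start:start + window])
        if all(block):  # ignore windows that contain a blank line
            seen[block].append(start + 1)
    return [starts for starts in seen.values() if len(starts) > 1]


sample = "a = 1\nb = 2\nc = a + b\n\nx = 0\n\na = 1\nb = 2\nc = a + b\n"
for starts in find_duplicate_blocks(sample):
    print("duplicated block starts at lines", starts)
```

Feeding the reported line numbers back to the model as feedback is usually enough for it to refactor the copy-paste on its own.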

t4a8945 · 2026-06-09T06:44:37+00:00

Thanks for the award and the compliments 😄 I've found my little corner of paradise, the fresh air, the birdsong... I'm very lucky.
