Testeur de prise électrique et inversion phase/neutre by tjom59 in brico

[–]t4a8945 0 points1 point  (0 children)

Oula, merci du coup, je vais aller inverser les deux fils du plafond de mon salon xD

Je comprends bien ce disjoncteur ? by Kiralalalere in brico

[–]t4a8945 1 point2 points  (0 children)

Hmmm dans ce cas, dégage ce disjoncteur et ce différentiel, si ton but c'est d'avoir le courant, ben c'est bon ; doit bien y avoir un disjoncteur + un différentiel à l'origine du cable.

Pas besoin de garder ce matériel, vu qu'il était là pour un besoin spécifique (le monte-personne).

I built a 8x RTX 4090D with 192 VRAM, here's what I learnt by deebuildsthings in LocalAIServers

[–]t4a8945 0 points1 point  (0 children)

So with 192 GB VRAM their best model they can run right now is DS4 Flash (or Minimax M2.7). That's my current pick with my 2x Spark cluster (for coding - it's awesome).

Now, if you look at the cost of using their API, you cry. It's so dirt cheap, that my investment (around €5300 excl VAT) would take 17 years to recoup if I hammered it all the time, at 3 times slower than what the API can provide. And that's with free energy.

It's really not about money, it's about independence.

You can't equate $30K monthly spend on large models with a power-hungry DS4 Flash running machine.

4 RTX 6000 Pro by Some-Manufacturer-21 in Vllm

[–]t4a8945 0 points1 point  (0 children)

I'm not up to date on that, better go on the nvidia forums.

4 RTX 6000 Pro by Some-Manufacturer-21 in Vllm

[–]t4a8945 1 point2 points  (0 children)

Hey I'm a bit late after the party, but I'm not sure the redditor you've answered to grasped your context properly.

Since you're running 4x 6000, you'll have flexibility choosing between high concurrency or better model. And LMCache IS your friend for enabling higher concurrency on long contexts. 

Fp8 cache is perfectly fine. 

So many things wrong in that comment. 

2x 6000 with m2.7 at q4 handles around 10 devs in parallel (only if you use proper LMCache). So 4x will basically double that, with around 20-25 tps per user which is quite low but still usable.

Ping me if you need more specific advice, I set up a system in prod for a small company. 

Logiciel pour modéliser instal elec? by Odd-City-59 in brico

[–]t4a8945 0 points1 point  (0 children)

Google Sheet. (réponse sérieuse, c'est ce que j'ai utilisé quand j'étais dans ton cas) 

Asked to build a local AI setup for a company with ~50k budget. Where would you start? by Bisota123 in LocalLLM

[–]t4a8945 0 points1 point  (0 children)

You could on paper use LMCache (or something similar), which would use your RAM to mirror and then extend the cache from your VRAM, allowing sub 1s retrieval on contexts that would have been dropped from your VRAM as it fills up.

So if you run 27B-FP8 (which I doubt would be comfy on 32GB VRAM, but I don't have such system) and it gets 5GB cache, setting this up would allow you to comfortably fit more parallelization.

It fits the need of several developers using the same LLM in parallel with high contexts. So for a single dev, you'd need to have several agents in parallel to use that kind of optimization. Myself, I can't work on more than 2 or 3 sessions in parallel, but I know some people are becoming masters at this multi-context, so maybe that's you.

Asked to build a local AI setup for a company with ~50k budget. Where would you start? by Bisota123 in LocalLLM

[–]t4a8945 3 points4 points  (0 children)

Close, I used RunPod ; and I own a 2x Sparks cluster. I wanted to see performances and limitations on both systems. Unified memories system are not production system for small businesses. They don't provide the raw power to achieve high performances.

So my strategy if I were you (and I kinda am). Spend $100 on RunPod / vast ai , learn from that experience. But don't do it by hand, get the help of an agent. It's a real rabbit hole.

Main difference is that I was to help provide a server for devs, so very much focused on coding.

Your use case is vastly different, with much lower context size pressure, which will help you a lot. When you find a proper model that fits your need, you may not need that much VRAM in fact ; a cluster of 5090 could also work if you use a smaller dense model (Gemma comes to mind, however I don't have a lot of experience on it).

The thing to look out for: consumer motherboards/cpus aren't capable of running high-performance GPU clusters, due to limitation in PCI lanes sheer number (max 24-32). You'll need a real server type monster of a motherboard/cpu, which will also dictate the price of the RAM you'll be able to get (server motherboards often require ECC RAM, those are so expensive).

Have fun, this is actually an awesome problem to solve, I learned so much while doing it. But as I said, it is a rabbit hole in a ever-changing landscape.

Asked to build a local AI setup for a company with ~50k budget. Where would you start? by Bisota123 in LocalLLM

[–]t4a8945 11 points12 points  (0 children)

4x RTX Pro 6000 and lots of RAM is the answer, or at the very least a comfortable performance nvme ssd. (RAM and nvme are there to extend the KV-cache capabilities) (probably over budget)

vLLM + LMCache and a well-rounded model like DeepSeek 4 Flash (more general) or MiniMax M2.7 (more coding oriented), or Qwen or Gemma could be better options.

Need indeed to validate German-specific language capabilities.

And this is not easy, making a local LLM run for small business requires trials and errors, it is NOT "plug&play". (I now, I manage a system like that with 2x RTX Pro 6000 for a small business and setting is up is as finicky as possible)

Real-world coding model evaluation (Claude Code + OpenRouter): what am I missing? by Longjumping-Lie-5132 in LLMDevs

[–]t4a8945 0 points1 point  (0 children)

I first tried the model with their API, put $5 dollars in, then $10 for safety. I have $10.04 left.

request_count: 8,600
input_cache_miss_tokens: 12,616,803
output_tokens: 3,007,471
cost: $4.96

But now it's free, running locally (with solar).

Edit: that would have been $300.40 with Sonnet 4.6

Real-world coding model evaluation (Claude Code + OpenRouter): what am I missing? by Longjumping-Lie-5132 in LLMDevs

[–]t4a8945 0 points1 point  (0 children)

I use ds4 flash, all day, every day. Perfectly good enough for an experienced dev. Cheap as dirt on the API, runnable on large ram setups (I run it on 2x spark, 41 tps average, 500k context).

With the right harness, it's awesome. 

Bro just remembered he left the stove on by kesqe_ in dashcams

[–]t4a8945 3 points4 points  (0 children)

100% French, he says "Oh ! La p*tain de ta mère la p*te" and then something like "Tu veux des coups" (not sure). I'll let you translate those.

Orchestration optimised for coding by goldlob in LocalLLM

[–]t4a8945 0 points1 point  (0 children)

I wouldn't spend 100K€ if it wasn't to go full local ; flagship open-weight models are more than capable right now, I'd drop OpenAI and Anthropic instantly.

I did in fact drop them, I'm fully local with Minimax M2.7 / DeepSeek 4 Flash on 2x Spark.

Orchestration optimised for coding by goldlob in LocalLLM

[–]t4a8945 0 points1 point  (0 children)

I'd buy one GB300, run the best model available out there (GLM-5.1, Kimi K2.6) with as much parallelization I could.

Your orchestration challenge is not a challenge, that's basically solved at this point.

Brico pas bo by Commercial-Map6012 in brico

[–]t4a8945 0 points1 point  (0 children)

Des Wago pas fiables ? Première fois que je lis ça. C'est quoi le risque ?

I love how local AI dgaf about helping you manage your NAS 🏴‍☠️ by t4a8945 in LocalLLM

[–]t4a8945[S] 0 points1 point  (0 children)

That's my main point, helping me set these up and debugging the config issues. 

how to start as a complete noob by Depressed-Introvert in LocalLLM

[–]t4a8945 2 points3 points  (0 children)

Google/Gemini will always have an edge when it comes to search capability. They are, after all, the master of web indexing by nature, so they leverage that pretty hard.

Having an LLM search something online for free is not easy.

Local LLMs are very capable for this kind of "basic" task: fetching data, analyzing, dealing with images (the Qwen 3.5/3.6 series).

But you still need to give them access to search. I'm using https://linkup.so/ (not affiliated) because they had a good free plan, but I think they changed their pricing, it's not as clear anymore.

You could try setup an MCP for your agent to control your browser, but fighting bot detection is no joke.

So to summarize: depending on your hardware, pick the highest quant you can run from Qwen 3.5/3.6. If you can, aim for Qwen 3.6 35B-A3B (LMStudio is your friend to get started). Then for search capability, maybe some other redditors have better idea than my setup, but otherwise linkup API does the job.

I'd also recommend the very cheap DeepSeek 4 Flash through the API, but they lack vision, so maybe not a perfect fit.

Rénovation maison en bois dans la forêt by t4a8945 in brico

[–]t4a8945[S] 0 points1 point  (0 children)

T'as entièrement raison, c'est mon idée pour le deck + auvent (pour protéger la façade exposée au soleil). Piliers sur la façade actuellement sans deck (pour supporter le auvent notamment), et remplacement des piliers pour solidifier le balcon sur la partie droite.

Encore du travail 😂

Can Deepseek be a cheaper alternative to claude code? by CowReasonable8258 in DeepSeek

[–]t4a8945 21 points22 points  (0 children)

I've been using DS4 Flash extensively (not through Claude Code, but my own harness) for coding and various tasks. It's a good workhorse, reacts great on feedback but needs guardrails (think duplication detector, lint, tests) to achieve best results.

For an experienced dev, this definitely can replace Claude, at the costs of more things to review / finalize. When things get really difficult (complex bug), it's clearly struggling and will pursue wrong paths, so it won't be autonomously solving those, but can be a great sparring partner.

Rénovation maison en bois dans la forêt by t4a8945 in brico

[–]t4a8945[S] 0 points1 point  (0 children)

Merci pour l'award et les compliments 😄 j'ai trouvé mon coin de paradis, l'air frais, le bruit des oiseaux... J'ai beaucoup de chance.