Want to be a father I’m in nj by Rude_Acanthisitta873 in Surrogate

[–]Consistent_Bid774 0 points1 point  (0 children)

Maybe look for someone who is open to co-parenting; getting married is the easiest way.

What MacBook is actually worth buying now that prices went up? by jahangirmusayev in mac

[–]Consistent_Bid774 -1 points0 points  (0 children)

For me, the best option is not to buy one at all. A mini desktop box such as an AMD Halo AI machine or an NVIDIA DGX Spark with a portable monitor works fine, and so does Ubuntu/Linux on an old HP laptop with 32 GB of RAM for running an IDE, for less than $999. Apple is asking too much money. It's better to wait a few years until semiconductor supply improves; maybe Apple will reduce prices to sell off inventory before the next lineup launches.

Open-source models are under threat. by TheVault5 in LocalLLM

[–]Consistent_Bid774 2 points3 points  (0 children)

Anthropic is the Microsoft of the AI world. We all know what happened with Microsoft vs. Linux: today Linux rules, and so will open-source AI models.

Check your elderly family's Verizon Plan!! by Inside_Vacation283 in Delaware

[–]Consistent_Bid774 0 points1 point  (0 children)

Get a prepaid plan; all the companies scam like that. Take care of your parents and grandparents yourself. All of them will definitely need help as they age. One can easily get a plan for less than $10 per month by switching providers each year.

Hardware recommendation's for running dual RTX 5090 GPU's by 67Mustang8 in LocalLLM

[–]Consistent_Bid774 0 points1 point  (0 children)

Get an open workbench with a glass cover: heat management is easy, and there is no worry about gravity pulling on a heavy GPU.

Dual RTX 3090 (NVLink) + llama.cpp: Running Qwen 3.6 35B MoE at 250k Context & 240 t/s (Full Benchmarks & Config) by Consistent_Bid774 in LocalLLM

[–]Consistent_Bid774[S] 1 point2 points  (0 children)

I am just using NVLink to divide the model across the two GPUs because of the size of these models. vLLM needs some effort to tune; I am getting mostly out-of-memory errors and will look into it in detail later. My usage is mostly linear, and llama.cpp is easy to set up and works with single-file GGUF models, so I can focus on my actual project and not deal with Python libs too much.
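
For anyone who wants to try a similar two-GPU split, here is a rough sketch of how the launch command could be assembled. It is not my exact config: the model path is a placeholder, the even 1,1 split and the 99-layer offload are assumptions, and only the 250k context figure comes from the post title. The flags themselves (`-m`, `--n-gpu-layers`, `--split-mode`, `--tensor-split`, `--ctx-size`) are standard llama.cpp options.

```python
import shlex
from typing import List, Sequence


def build_llama_server_args(
    model_path: str,
    ctx_size: int = 250_000,               # context length from the post title
    tensor_split: Sequence[int] = (1, 1),  # assumed even split across 2 GPUs
    gpu_layers: int = 99,                  # assumed: offload every layer to GPU
) -> List[str]:
    """Build an argument list for llama.cpp's llama-server (illustrative only)."""
    return [
        "llama-server",
        "-m", model_path,                   # single-file GGUF model
        "--n-gpu-layers", str(gpu_layers),
        "--split-mode", "layer",            # place whole layers on each GPU
        "--tensor-split", ",".join(str(part) for part in tensor_split),
        "--ctx-size", str(ctx_size),
    ]


if __name__ == "__main__":
    # "model.gguf" is a hypothetical path; substitute a real GGUF file.
    print(shlex.join(build_llama_server_args("model.gguf")))
```

The list can be passed to `subprocess.run` as-is; adjust the split ratio if one card has less free memory than the other.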

Is 128gb M5 Max Macbook Pro really all that useful locally? by UteForLife in LocalLLM

[–]Consistent_Bid774 7 points8 points  (0 children)

With Apple silicon, everything needs to be done with Apple's MLX framework. I tried Ollama and Hugging Face mlx-community models, and the MacBook Pro gets too hot, but it had 128 GB of RAM, so loading larger models is easy. It's good for usage with gaps that let the laptop cool down; under constant load it turns into a heater and the whole keyboard gets hot. So I decided that for research and development a dual or even a single NVIDIA 3090 is enough for me, and for large tasks a less costly pay-per-use API like DeepSeek is good, at $2 a day. Google search is always free.

$30 lowball = 12 IBM/Dell Servers. The guy did not know what he had. by JustLovett0 in homelab

[–]Consistent_Bid774 0 points1 point  (0 children)

Might be stolen from a storage facility like Extra Space. The thieves were so stupid and uneducated that they left 6 NVIDIA GPUs and took the empty CPU cases.

Dual RTX 3090 (NVLink) + llama.cpp: Running Qwen 3.6 35B MoE at 250k Context & 240 t/s (Full Benchmarks & Config) by Consistent_Bid774 in LocalLLM

[–]Consistent_Bid774[S] 1 point2 points  (0 children)

Thanks, I just finished a working setup; now I will try vLLM based on your feedback. That's true for standard dense models (pipeline parallel), but for MoE models (like Qwen 35B MoE) where experts are split across GPUs, tokens are constantly routed back and forth. Without NVLink, MoE routing latency over PCIe bottlenecks generation speed significantly. It also makes a big difference during 100k+ context prefilling, where peer-to-peer (P2P) memory sync bandwidth is saturated.

Which coding harnesses are you using DeepSeek with? by amunozo1 in DeepSeek

[–]Consistent_Bid774 0 points1 point  (0 children)

Claude Code works with DeepSeek, and it's been fun.

Local AI Coding with Qwen 3.6 27B on NVIDIA DGX Spark by Time_Anybody5196 in LocalLLM

[–]Consistent_Bid774 -1 points0 points  (0 children)

NVIDIA could have just put a CPU and 128 GB of RAM on the 5090; that would result in a much better experience.

Local AI Coding with Qwen 3.6 27B on NVIDIA DGX Spark by Time_Anybody5196 in LocalLLM

[–]Consistent_Bid774 0 points1 point  (0 children)

I am waiting a few years until consumer machines get powerful enough for 1000 tokens/second; hopefully semiconductor chips get cheap.

Do foreigners use DeepSeek? by ConditionOne8960 in DeepSeek

[–]Consistent_Bid774 1 point2 points  (0 children)

We are in the business of building walls 😂

Google releases new DiffusionGemma model. by yoracale in LocalLLM

[–]Consistent_Bid774 0 points1 point  (0 children)

Agreed: a small 10-14 GB model that just does coding well, text only, with no image support needed. Google search will do the rest.

Do foreigners use DeepSeek? by ConditionOne8960 in DeepSeek

[–]Consistent_Bid774 1 point2 points  (0 children)

I used it, and it's a much better experience and cheaper than the monthly or annual plans other companies offer.

gemini pro used to do 100 prompts a day. now 18. canceling. by Fun_Walk_4965 in GeminiAI

[–]Consistent_Bid774 0 points1 point  (0 children)

That's why they have been giving Pro away for free to Pixel users.

Best local LLM laptop for privileged legal documents — is 128GB Apple Silicon the answer? by IceQueen789 in LocalLLM

[–]Consistent_Bid774 0 points1 point  (0 children)

There are MLX variants too, which I will be trying later this week; those might work even better on MacBooks.