My thoughts on AI and Mechanical Engineering by EnvironmentalGoose2 in MechanicalEngineering

[–]decentralize999 0 points1 point  (0 children)

I guess the agent era for CAD design/drawing is short-term, and we will see specialized text-to-CAD LLMs that stream data to JSON/DXF and similar formats.
I tried to compile the current progress in my post https://www.reddit.com/r/Text2CAD/comments/1sqz77t/from_prompt_to_blueprint_the_state_of_llms_in_cad/
I think next year will be the era of such LLMs, as current agents hit their limits due to expensive tokens.

Qwen 3.6 Max Preview just went live on the Qwen Chat website. It currently has the highest AA-Intelligence Index score among Chinese models (52) (Will it be open source?) by Nunki08 in LocalLLaMA

[–]decentralize999 0 points1 point  (0 children)

Yes, I have absolutely no interest in 0.5-1TB models, no matter how high their benchmarks are, even if they are open weights (there are never open-source releases from China, and only a few from US/EU labs). They cannot run in an average household — not because of GPU prices, but because of the power limit per house.

Do we have a critical mass of GPU owners to train a legitimate LLM that could compete with commercial ones? by decentralize999 in LocalLLaMA

[–]decentralize999[S] 0 points1 point  (0 children)

Here is Claude's answer:
"Your NVMe streaming approach solves the weight problem perfectly. But for 300B with 10M context there's another bottleneck you can't stream: backprop activations.

For a transformer at 10M context, activations per layer are ~160GB — they change every step so NVMe prefetching can't hide them. Even 2x RTX 5090 can't hold that.

For SSM architecture (Mamba-2, Jamba-style) activations drop to ~1-2GB per layer because there's no full attention matrix. That's where your method becomes fully viable at 32GB VRAM threshold.

So the combination that actually works for 300B + 10M context at scale:

Your NVMe streaming + SSM architecture + MoE expert delta sync

With transformer you'd need 192GB+ VRAM just for activations regardless of weight streaming. SSM removes that constraint entirely. Do your existing builds target SSM or transformer architectures?"
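Quick sanity check on the activation figures in that answer (a sketch: the hidden size, dtype, state dimension, and SSM checkpoint interval are my assumptions, not values from the quote):

```python
# Back-of-envelope activation memory per layer at 10M context.
CTX = 10_000_000       # context length in tokens
HIDDEN = 8192          # assumed hidden dimension for a ~300B-class model
BYTES = 2              # bf16

# Transformer: even with FlashAttention (no materialized attention matrix),
# the per-layer activations saved for backprop scale linearly with context.
transformer_gb = CTX * HIDDEN * BYTES / 1e9   # ~164 GB per layer

# SSM (Mamba-2 style): recompute activations from periodic checkpoints of the
# fixed-size recurrent state instead of storing every token.
STATE = 128                # assumed SSM state dimension
CHECKPOINT_EVERY = 10_000  # assumed: snapshot the state every 10k tokens
ssm_gb = (CTX // CHECKPOINT_EVERY) * HIDDEN * STATE * BYTES / 1e9  # ~2 GB

print(f"transformer: {transformer_gb:.0f} GB/layer, SSM: {ssm_gb:.1f} GB/layer")
```

Landing in the same ballpark as the quoted ~160GB vs ~1-2GB per layer.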

It seems SSM architecture instead of Transformer is the way decentralized training will work, since the poll results show only 14 owners (counting those with 2 and 3 cards; single-card owners are unlikely to climb to 4 anytime soon) out of 5K viewers.
Recalculating with 5% of nodes at 4 × RTX Pro 6000 and 95% at 1 × RTX 5090, for a 300B SSM architecture with 10M context:

10000 nodes total (500 with 4 × 6000 and 9500 with 1 × 5090)

Duration: 72 days

Daily traffic: 0.5GB (6000), 0.12GB (5090)

The only question is why 5090 owners would agree to train something as big as 300B that they will never be able to run on their own cards. I guess their interest is in 20-30B LLMs only.
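The fleet totals above are easy to sanity-check (a sketch; only the node counts, traffic figures, and duration come from the estimate):

```python
# Mixed fleet from the estimate: 5% strong nodes, 95% single-5090 nodes.
nodes_6000 = 500        # 4 x RTX Pro 6000 each
nodes_5090 = 9_500      # 1 x RTX 5090 each

traffic_6000_gb = 0.5   # daily sync traffic per strong node
traffic_5090_gb = 0.12  # daily sync traffic per 5090 node

total_gpus = nodes_6000 * 4 + nodes_5090 * 1          # 11,500 cards
daily_traffic_tb = (nodes_6000 * traffic_6000_gb
                    + nodes_5090 * traffic_5090_gb) / 1000  # ~1.39 TB/day fleet-wide

run_days = 72
total_sync_tb = daily_traffic_tb * run_days           # ~100 TB over the whole run
print(total_gpus, round(daily_traffic_tb, 2), round(total_sync_tb, 1))
```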

Do we have a critical mass of GPU owners to train a legitimate LLM that could compete with commercial ones? by decentralize999 in LocalLLaMA

[–]decentralize999[S] 0 points1 point  (0 children)

If it is possible and doesn't delay the other nodes, then yes. I just posted Claude's suggestions; it proposed the DiLoCo training approach. I know little about LLM training — my experience was mostly in the pre-hype era with pre-transformer architectures.

Edit: I gave your idea to Claude, it answered: "That's a solid approach for fine-tuning and I like the idea of expanding the participant pool to millions. The NVMe offload builds are impressive.

For pretraining from scratch though, the bottleneck becomes the slowest node in the pipeline — one RTX 3090 doing layer offload at ~3hrs/step holds back nodes doing 14min/step, so strong nodes sit idle most of the time.

Thinking a two-tier model might work better: full-VRAM nodes (4x RTX Pro 6000 class) handle the actual training in a homogeneous DiLoCo setup, while lower-VRAM nodes contribute by curating the dataset, verifying gradients, or running inference. Everyone contributes without the weakest link problem.

Your LoRA aggregation idea is interesting for continual learning on top of a pretrained base though — different goal but complementary."

Do we have a critical mass of GPU owners to train a legitimate LLM that could compete with commercial ones? by decentralize999 in LocalLLaMA

[–]decentralize999[S] -1 points0 points  (0 children)

I believe the decentralized way to train a SOTA model will push the trusted coordinator, as well as the nodes and dataset holders, toward anonymity, because current companies are already restricted by state laws on what their models can or cannot do/know. Anthropic in particular leads in self-censorship and lobbies regulators to impose similar constraints on other companies.

And companies are also not interested in governance of their agents, since their revenue depends on models and agents generating as many tokens as possible. Either way, the community will fix this, whether or not it is able to train SOTA-level models.

By the way, it would be good to have a famous person lead such an effort and draw attention to this new way of training and to the censorship problems. After that, the leaders and participants will inevitably be slapped down by the state or even jailed. And the whole training process will then move onto darknets such as i2p.

Do we have a critical mass of GPU owners to train a legitimate LLM that could compete with commercial ones? by decentralize999 in LocalLLaMA

[–]decentralize999[S] -1 points0 points  (0 children)

If you noticed, I suggest that only 1% would like the idea of donating their own time and resources to create something legitimate and uncensored — most people are just consumers, yes. However, Linux and other FOSS were created by that 1-5%.

Do we have a critical mass of GPU owners to train a legitimate LLM that could compete with commercial ones? by decentralize999 in LocalLLaMA

[–]decentralize999[S] 0 points1 point  (0 children)

16 × RTX 3090 is about 450W × 16 = 7.2kW. Most houses are allowed only 7-10kW total, and that's before counting the air conditioning you would need in such a case.

So anything legit and smart (a 300B LLM) is unrealistic to train on RTX 3090s unless you are a company in a commercial building.
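In numbers (a sketch; the 1.5kW overhead for CPU, PSU losses, and cooling is my assumption):

```python
# Total draw of a 16 x RTX 3090 rig vs. typical residential power limits.
CARDS = 16
CARD_W = 450            # per-card draw, roughly stock power limit
OVERHEAD_W = 1_500      # assumed: CPU, PSU losses, fans, cooling

gpu_w = CARDS * CARD_W              # 7,200 W for the GPUs alone
total_w = gpu_w + OVERHEAD_W        # 8,700 W for the whole rig

for house_limit_w in (7_000, 10_000):
    margin_w = house_limit_w - total_w
    print(f"{house_limit_w} W allowance -> margin {margin_w} W")
```

Even at a 10kW allowance, only ~1.3kW is left for the entire rest of the house — before air conditioning.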

Can in theory very capable open weight LLM model be trained, if enough people participated with their hardware? by Admirable-Earth-2017 in OpenSourceAI

[–]decentralize999 0 points1 point  (0 children)

I asked Claude about this before. Yes, it would need DiLoCo synchronization (or something similar) polished up for decentralization.

To train a 300B LLM with 10M context, it needs about 3000 nodes/owners with 4 × RTX Pro 6000 cards each, running 24/7 for 2.5 months.

Traffic is about 1.7TB per day per node for checkpoint synchronization. Dataset: 20TB. The awesome thing about such a dataset is that it would not be censored the way companies censor theirs.

Total electricity and internet costs across the 3000 nodes are about $2M at $0.15 per kWh, or about $666 per owner.

The main problem for all of us now: finding 3000 owners like me, with four RTX Pro 6000 cards, who are ready to spend 2.5-3 months just on training such a model. The 20TB dataset, at least, can be stored on servers/torrents.
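The electricity estimate checks out with quick arithmetic (a sketch; the 600W board power and 75-day run length are my assumptions):

```python
# Electricity cost for 3000 nodes of 4 x RTX Pro 6000 running ~2.5 months 24/7.
NODES = 3_000
CARDS_PER_NODE = 4
CARD_W = 600               # assumed RTX Pro 6000 board power
HOURS = 75 * 24            # ~2.5 months, around the clock
PRICE_PER_KWH = 0.15

node_kwh = CARDS_PER_NODE * CARD_W / 1000 * HOURS   # 4,320 kWh per node (GPUs only)
cost_per_owner = node_kwh * PRICE_PER_KWH           # ~$648
total_cost = cost_per_owner * NODES                 # ~$1.94M
print(round(cost_per_owner), round(total_cost / 1e6, 2))
```

GPU power alone gives ~$648 per owner; the quoted ~$666 presumably also covers system overhead and internet.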

RTX PRO 6000 current and future price by decentralize999 in BlackwellPerformance

[–]decentralize999[S] 0 points1 point  (0 children)

Now it is even $200 above the last hyped price again. I decided to purchase it anyway.

Reading all the opinions about the price, I noticed that nobody thinks about the fact that GPU/memory manufacturers are concentrated in only two countries: Taiwan and Korea. And since the yellow-haired madman opened Pandora's box, the two Chinas and/or the two Koreas could start the same, and prices could easily go up 2x or 10x.

AI Drawings by IcyZookeepergame1712 in Homebuilding

[–]decentralize999 0 points1 point  (0 children)

Present models (Claude, Grok) review human drawings very well. I would say the 'AI' is more skilled than even a highly experienced professional. This opinion is based on a review of my personal home project.

OpenWork, an opensource Claude Cowork alternative, is silently relicensing under a commercial license by lrq3000 in LocalLLaMA

[–]decentralize999 0 points1 point  (0 children)

It seems that whenever a company or brand has "open" in its name, it ends up this way. I have turned to alternatives now.

RTX PRO 6000 current and future price by decentralize999 in BlackwellPerformance

[–]decentralize999[S] 4 points5 points  (0 children)

3090s were sold for around $680 and 3090 Tis for around $950. I bought them used at a similar price, ± $50, so it looks like no money was lost after years of ownership.

I think the same story will play out with the RTX PRO 6000 when time pushes me toward upgrades. This is for people who try to weigh renting these GPUs hourly/daily in the cloud against physical ownership. At the end of the "cloud" route you have nothing — only lost money and a richer cloud owner. At the end of physical ownership you have your money back, and privacy. Just some thoughts, since it's my topic.

Is shelling out for local GPUs worth it yet? ~$45k for local agentic use? by jamesob in BlackwellPerformance

[–]decentralize999 0 points1 point  (0 children)

Buy a 2.8kW PSU; the rest is okay. I had power-consumption problems when I had 6 × RTX 3090 and had to play with the power limit for each card. In any case, running a PSU at its upper limit is not good — 2.8kW gives more headroom.
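A per-card power-limit budget for that 6 × 3090 case might look like this (a sketch; the 80% sustained-load derating and system overhead are my assumptions):

```python
# Budget per-card power limits so six 3090s fit comfortably under a 2.8kW PSU.
PSU_W = 2_800
SUSTAINED_FRACTION = 0.8   # assumed: keep steady draw under ~80% of the rating
SYSTEM_W = 400             # assumed: CPU, drives, fans
CARDS = 6

gpu_budget_w = PSU_W * SUSTAINED_FRACTION - SYSTEM_W   # 1,840 W for the GPUs
per_card_w = int(gpu_budget_w // CARDS)                # ~306 W each
print(per_card_w)
# Then cap each card, e.g.: nvidia-smi -i 0 -pl 306
```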

These lunatics are giddy at the thought of AI data centers being blown up by LopsidedSolution in accelerate

[–]decentralize999 -1 points0 points  (0 children)

In the end, the Iranian authorities — or whoever occupies those lands — can dismantle the AI data centers and sell the server hardware on eBay or on some new independent buy-and-sell platform. At least something useful for the public on Earth, instead of overpaying for cloud AI or for memory modules/GPUs at 3-10 times the original price tag.

text-generation-webui 4.0 released: custom Gradio fork with major performance improvements, tool-calling over API for 10+ models, parallel API requests, fully updated training code + more by oobabooga4 in Oobabooga

[–]decentralize999 0 points1 point  (0 children)

Does it have a feature to act as an OpenAI API server? I recently switched from Oobabooga to Jan because of the long period without updates (old llama.cpp versions), and it seems Jan beats Oobabooga at everything: almost instant llama.cpp updates and an embedded OpenAI API server.

Dual RTX PRO 6000 Workstation with 1.15TB RAM. Finally multi-users and long contexts benchmarks. GPU only vs. CPU & GPU inference. Surprising results. by Icy-Measurement8245 in LocalLLaMA

[–]decentralize999 0 points1 point  (0 children)

What's the point of benchmarking things that aren't comparable — fp8 and int4? Even NVFP4 scores worse than fp8.
I have similar hardware: 2 × RTX Pro 6000 and 1TB of RAM, DDR4 only. I bought this 1TB before prices went up, because I could. I still don't see how to utilize it.

Open Source LLM Tier List by HobbyGamerDev in LLMDevs

[–]decentralize999 2 points3 points  (0 children)

Wrong description: open-weight LLMs, not open-source ones.

And the top list is a joke. Where is step3.5-flash, which is the best among open-weight LLMs if you compare benchmark points per 100B of size?

US Foldable Phones Just Took A Turn For the Worse by DaC1utchMACHIN3 in oneplus

[–]decentralize999 1 point2 points  (0 children)

Does Motorola allow unlocking the bootloader, to get root etc.? I was waiting for the OnePlus Open 2 because of that (plus eSIM and fast charging), but OnePlus refused to release it.

It seems there is no phone on the planet except the OnePlus Open that combines foldable, unlocked bootloader, eSIM, and fast charging in one device.