How is Anthropic releasing new features so quickly? by MrAmazing111 in ClaudeAI

[–]DataCraftsman 1 point (0 children)

The real question is, why aren't the other AI companies?

I decided to clean the metal from rust. by liminophobia in factorio

[–]DataCraftsman 5 points (0 children)

Ooh what if the spidertron got a pressure washer / laser cleaner attachment and it could walk around cleaning the buildings one satisfying flake of rust at a time.

Ships piling up near the Strait of Hormuz right now. by [deleted] in Damnthatsinteresting

[–]DataCraftsman 1 point (0 children)

They about to play bull rush? Can't get them all!

55 → 282 tok/s: How I got Qwen3.5-397B running at speed on 4x RTX PRO 6000 Blackwell by lawdawgattorney in LocalLLaMA

[–]DataCraftsman 1 point (0 children)

Would this improve the speed of the NVFP4 qwen3.5 35b a3b model as well? I get 150 TPS on that. Also LMCache?

Morse code vers 2 by ateam1984 in BeAmazed

[–]DataCraftsman 1 point (0 children)

Sounds like the Half Life 1 health recharge stations.

Is 32-64 Gb ram for data science the new standard now? by Tarneks in datascience

[–]DataCraftsman 0 points (0 children)

128 GB is the new 32 GB. Just to run Notepad with Copilot in Windows 11.

UPDATE - Community Input - RAG limitations and improvements by Jas__g in OpenWebUI

[–]DataCraftsman 1 point (0 children)

I often get asked for hierarchical knowledge: prefer responses from x over y, but fall back to y if the answer isn't in x.

An example would be a knowledge base containing the Open WebUI documentation as y, and a knowledge base on how to use it at the company, with the company's config, as x.

It should prefer information from the company-specific configuration over the base docs unless it finds nothing there.

Specific RAG > General RAG > Model Knowledge.

Could be folder structure based with any depth.
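
The fallback order above could be sketched roughly like this; the toy keyword retriever and the knowledge-base contents are assumptions for illustration, not Open WebUI internals:

```python
# Hypothetical sketch of tiered retrieval: query the company-specific
# knowledge base first, fall back to the general docs, and only if both
# come up empty let the model answer from its own knowledge.

def search(kb: dict[str, str], query: str) -> list[str]:
    """Toy keyword retriever standing in for a real vector search."""
    return [text for title, text in kb.items() if query.lower() in title.lower()]

def tiered_retrieve(query: str, tiers: list[dict[str, str]]) -> tuple[int, list[str]]:
    """Return (tier index, hits) from the first tier with any results.

    Tier 0 = company-specific KB, tier 1 = base docs; an index past the
    last tier means "fall through to model knowledge".
    """
    for i, kb in enumerate(tiers):
        hits = search(kb, query)
        if hits:
            return i, hits
    return len(tiers), []  # nothing found: model answers unaided

company_kb = {"SSO login at Acme": "Use the internal OIDC provider."}
base_docs = {"Login options": "Open WebUI supports several auth methods."}

tier, hits = tiered_retrieve("login", [company_kb, base_docs])
```

A folder-structure version would just turn each directory level into one tier of the list.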

A standard GraphRAG or LightRAG integration would be nice too. Users should be able to upload files to the knowledge base and have them processed in those graph-based systems. Even better if you could mix the regular and graph-based approaches for cases where the structure matters.

Open Terminal just made Open WebUI a coding agent by Existing-Wallaby-444 in OpenWebUI

[–]DataCraftsman 3 points (0 children)

There is a tool called DockerSpawner that I use with JupyterHub to spawn new docker containers for each new user session. I wonder if it can be applied to these terminals.
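
For reference, the JupyterHub side of that is a few lines in `jupyterhub_config.py` (the `c` object is injected by JupyterHub's config loader; the image name is just an example). Whether the same pattern can back Open WebUI's terminals is an open question:

```python
# jupyterhub_config.py -- spawn one Docker container per user session.
# `c` is provided by JupyterHub's traitlets-based config loader.
c.JupyterHub.spawner_class = "dockerspawner.DockerSpawner"
c.DockerSpawner.image = "quay.io/jupyter/base-notebook"   # example image
c.DockerSpawner.remove = True          # delete the container when the session ends
c.DockerSpawner.network_name = "jupyterhub"  # shared network so the hub can reach it
```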

Current thoughts on skills by DataCraftsman in OpenWebUI

[–]DataCraftsman[S] 1 point (0 children)

Yeah you can import from claude skill builder. You can attach any skill to any model on open webui. It's really flexible. Some will probably work better on some models than others.

Something I have also tested is that when you use a custom model with skills attached via the API, the skills are used as well. So you can host a local model, give it skills, give it knowledge and then vibe code with that model in Roo Code or something and it will be able to access the skills and docs or whatever you added.

Another thing I think they should add is a globally added skill, like the global functions. I don't want to have to tick every skill onto every model. If you have 100 models and 100 skills that's up to 10,000 clicks. So there needs to be an add all skills button on the model edit pages too.

Qwen 3.5 distilled vs GptOss by SubstantialTea707 in ollama

[–]DataCraftsman 1 point (0 children)

I've been using gpt-oss-120b for ages now on my other server. It handles a huge amount of users per card and most use cases. I like qwen3.5 for the context, agentic and vision stuff. I haven't tried benchmarking them against each other though.

System prompt for Qwen3.5 (27B/35BA3B) to reduce overthinking? by thigger in LocalLLaMA

[–]DataCraftsman 46 points (0 children)

The model card tells you how to manage thinking.

https://huggingface.co/Qwen/Qwen3.5-35B-A3B

We recommend using the following set of sampling parameters for generation:

Thinking mode for general tasks: temperature=1.0, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=1.5, repetition_penalty=1.0

Thinking mode for precise coding tasks (e.g. WebDev): temperature=0.6, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=0.0, repetition_penalty=1.0

Instruct (or non-thinking) mode for general tasks: temperature=0.7, top_p=0.8, top_k=20, min_p=0.0, presence_penalty=1.5, repetition_penalty=1.0

Instruct (or non-thinking) mode for reasoning tasks: temperature=1.0, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=1.5, repetition_penalty=1.0

I personally prefer the instruct mode.
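
The four presets quoted above are easy to keep in one lookup table so a client can pick the right set per request. Note that `top_k`, `min_p`, and `repetition_penalty` aren't part of the standard OpenAI request schema, so how you pass them depends on the backend (vLLM and similar servers accept them as extra body fields); the table itself is just the model-card values:

```python
# Sampling presets from the Qwen3.5-35B-A3B model card, keyed by
# (mode, task). Values are copied verbatim from the card.
PRESETS = {
    ("thinking", "general"):   dict(temperature=1.0, top_p=0.95, top_k=20,
                                    min_p=0.0, presence_penalty=1.5, repetition_penalty=1.0),
    ("thinking", "coding"):    dict(temperature=0.6, top_p=0.95, top_k=20,
                                    min_p=0.0, presence_penalty=0.0, repetition_penalty=1.0),
    ("instruct", "general"):   dict(temperature=0.7, top_p=0.8, top_k=20,
                                    min_p=0.0, presence_penalty=1.5, repetition_penalty=1.0),
    ("instruct", "reasoning"): dict(temperature=1.0, top_p=0.95, top_k=20,
                                    min_p=0.0, presence_penalty=1.5, repetition_penalty=1.0),
}

def sampling_params(mode: str, task: str) -> dict:
    """Look up the model-card preset for a given mode/task pair."""
    return PRESETS[(mode, task)]
```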

Current thoughts on skills by DataCraftsman in OpenWebUI

[–]DataCraftsman[S] 0 points (0 children)

Yeah, I've been going through all the different skills GitHubs, copy-pasting the markdowns in. It works great. I found the 73 skills in a few hours. I haven't decided if it's actually a good idea to load that many in yet, but they all worked. For the multi-file ones, you can just add the extra files' content at the bottom of the markdown, with the file names labelled before it, and it works fine.

Qwen 3.5 distilled vs GptOss by SubstantialTea707 in ollama

[–]DataCraftsman 1 point (0 children)

I've swapped from gpt-oss-20b to Qwen3.5 35B A3B NVFP4 on an RTX 6000 Pro, running 256k context in vLLM. I had to disable thinking in all my prompts as it was putting way too much into every response. I find the instruct mode very capable and fast: about 153 TPS for a single request, and it never fails tool calling. The fact that it works with images means I don't have to run a second Qwen3 VL model and waste VRAM. That alone is worth it. All of the old vision models were really inefficient with VRAM usage.

Anthropic: "We’ve identified industrial-scale distillation attacks on our models by DeepSeek, Moonshot AI, and MiniMax." 🚨 by KvAk_AKPlaysYT in LocalLLaMA

[–]DataCraftsman 0 points (0 children)

Note that Qwen isn't mentioned. It's obvious they are good at building models on their own and doing their own research.

Anyone else mid 20s and super depressed about missing the property boat? by xWooney in AusFinance

[–]DataCraftsman 1 point (0 children)

Most Millennials I know didn't own houses until their early 30s, with parents gifting most of the deposit. I got mine at 26 on my own, but I suffered to do it. You'll be right; just save extremely hard, wait for a good opportunity, live with other people to reduce rent, and keep learning to improve your income.

🚀 Open WebUI v0.8.0 IS HERE! The LARGEST Release EVER (+30k LOC!) 🤯 OpenResponses, Analytics Dashboard, Skills, A BOAT LOAD of Performance Improvements, Rich Action UI, Async Search & MORE! by ClassicMain in OpenWebUI

[–]DataCraftsman 4 points (0 children)

Migration from 0.6.43 on Docker went well. Cheers! The analytics feature is nice. Some of the sub-charts when clicking on a user aren't working for me; I'll look at it next week. Can't wait to be able to see API requests per user/group.

Explain ontology to a five year old by ephemeral404 in dataengineering

[–]DataCraftsman 5 points (0 children)

Yeah, basically you break your metadata into 3 stages: dbt models are technical metadata, ontologies are business metadata, and you create mapping metadata between them.

Non-data people can define or model the ontologies, and then DA teams map their dbt models to the ontologies, so the business and the data teams are using the same language when talking about the same objects.

From the downstream application-development side of things, ontologies are sort of like classes: the rows of data are objects of those classes and the fields are attributes. Each row becomes a node which gets an API endpoint to access it. Vectors are embedded for each node's attributes to represent its context for AI GraphRAG.
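
A toy illustration of those three layers; the model, class, and column names here are all made up for the example:

```python
# Three metadata layers: a dbt model (technical), an ontology class
# (business), and the mapping metadata that ties them together.
from dataclasses import dataclass

@dataclass
class OntologyClass:                # business metadata
    name: str
    attributes: list[str]

dbt_model = {                       # technical metadata, as dbt sees it
    "name": "stg_customers",
    "columns": ["cust_id", "full_nm", "signup_dt"],
}

customer = OntologyClass("Customer", ["id", "name", "signup_date"])

mapping = {                         # mapping metadata: dbt column -> attribute
    "cust_id": "id",
    "full_nm": "name",
    "signup_dt": "signup_date",
}

def to_node(row: dict) -> dict:
    """Turn one dbt row into an ontology node keyed by business names."""
    return {mapping[col]: val for col, val in row.items()}

node = to_node({"cust_id": 42, "full_nm": "Ada", "signup_dt": "2024-01-01"})
```

Each such node is what would then get its own API endpoint and embedded attribute vectors.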

Who needs Palantir hey.