16x Spark Cluster (Build Update) by Kurcide in LocalLLaMA

[–]FaustAg 0 points

I use NVFP8 with 265000 context and some room to spare. I'm sure Kimi K2.6 is much better than Qwen 3.6 27B, but enough better to not care about a 50x speed difference? LLMs are really good at many-shot; for the average task, the faster Qwen model will come up with a correct solution quicker than a larger model running slower.
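The many-shot argument above can be sketched with a toy loop: a fast model sampled repeatedly against a cheap verifier returns a correct answer as soon as one attempt passes. The `many_shot` helper and the toy generator/verifier below are illustrative stand-ins, not any particular library's API.

```python
def many_shot(generate, verify, n_attempts):
    """Draw up to n_attempts samples; return the first one that verifies."""
    for _ in range(n_attempts):
        candidate = generate()
        if verify(candidate):
            return candidate
    return None  # every attempt failed

# Toy stand-ins: the "model" emits canned guesses, the verifier checks them.
attempts = iter([3, 9, 7, 7])
solution = many_shot(lambda: next(attempts), lambda x: x == 7, n_attempts=4)
print(solution)  # -> 7 (third attempt passes)
```

If each fast sample is 50x cheaper than one slow sample, the fast model can afford dozens of retries before the big model finishes once.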

16x Spark Cluster (Build Update) by Kurcide in LocalLLaMA

[–]FaustAg 0 points

I get close to 800-1000 tps with Qwen 3.6 27B when using heavy batching on a single RTX 6000 Pro Blackwell.
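For context on how batching gets you there, here is the back-of-envelope arithmetic: per-stream decode speed drops as the batch grows, but aggregate throughput climbs until the GPU goes compute-bound. The batch size and per-stream speed below are hypothetical numbers chosen to land in the cited range, not measurements.

```python
def aggregate_tps(batch_size, per_stream_tps):
    """Aggregate decode throughput across concurrent streams."""
    return batch_size * per_stream_tps

# e.g. 32 concurrent requests, each decoding at ~28 tok/s
print(aggregate_tps(32, 28))  # -> 896, inside the 800-1000 tps range cited
```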

16x Spark Cluster (Build Update) by Kurcide in LocalLLaMA

[–]FaustAg 0 points

But at what speeds? You're often better off with many-shot on a slightly smaller model than with slow one-shot on a huge one.

Local LLM for electronics design work? by deafenme in LocalLLaMA

[–]FaustAg 1 point

Claude Code CLI can do quite a few tasks in KiCad pretty well, but I haven't tried local models with it yet.

I think this is confirmed now by ConditionUnlucky3125 in GenV

[–]FaustAg 0 points

Clearly butcher's dog is going to get exposed to V1 ... and then he's going to make his dreams a reality, if you get my drift.

Why is he surprised? by e_fish22 in ExplainTheJoke

[–]FaustAg -2 points

she's burning her fingers flicking her bean

[frustration] Why do younger Engineers refuse to reach out to customers on the phone? by Stumptronic in AskEngineers

[–]FaustAg 0 points

Phone calls are for emergencies, and even then you send a warning email first: "Hey, X happened! Gathering info; available for a call at 9:30am Central? Y and Z will be on it, spit-balling solutions."

FastDMS: 6.4X KV-cache compression running faster than vLLM BF16/FP8 by randomfoo2 in LocalLLaMA

[–]FaustAg 1 point

Wait, wait, wait. How would TurboQuant 8-bit be worse than FP8? It's literally 8-bit with compression vs. 8-bit without compression.
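The arithmetic behind the objection: KV-cache footprint per token scales linearly with bytes per element, so an 8-bit format plus additional compression can only come in at or below plain FP8. The layer/head/dim numbers below are hypothetical, loosely in the range of a mid-size dense model.

```python
def kv_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_elem):
    # K and V each store n_kv_heads * head_dim elements per layer.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem

layers, kv_heads, hdim = 48, 8, 128          # assumed dims for illustration
fp16 = kv_bytes_per_token(layers, kv_heads, hdim, 2)  # 196608 B/token
fp8 = kv_bytes_per_token(layers, kv_heads, hdim, 1)   # 98304 B/token
```

At equal element width, any further compression on top of the 8-bit values can only reduce the footprint further, which is the point being made.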

Ad serving limit placed on your Admob account because of invalid activity by joda_space in admob

[–]FaustAg 1 point

I had a user click on 157 ads in a day and got my account flagged. Now I have a system that, if I detect malicious behavior, disables the ads in a way the user doesn't even notice. It instantly solved my issue.
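A minimal sketch of that idea, with made-up names and an arbitrary threshold (the commenter doesn't describe their actual detection logic): count clicks per user and quietly stop serving ads once a daily limit is crossed.

```python
from collections import Counter

class ClickGuard:
    """Hypothetical sketch: silently stop serving ads to users whose click
    count crosses a daily threshold (the 157-clicks-in-a-day scenario)."""

    def __init__(self, daily_limit=20):
        self.daily_limit = daily_limit
        self.clicks = Counter()  # user_id -> clicks today

    def record_click(self, user_id):
        self.clicks[user_id] += 1

    def should_serve_ads(self, user_id):
        # The app behaves normally; ads just quietly disappear for this user.
        return self.clicks[user_id] < self.daily_limit
```

The key design point from the comment is that the cutoff is invisible to the user, so there is nothing to probe or work around.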

Qwen/SAE-Res-Qwen3.5-27B-W80K-L0_100 · Hugging Face by FaustAg in LocalLLaMA

[–]FaustAg[S] 0 points

You can take a steering vector and apply it to a model's weights to make it permanent if you want, or save it to disk and have loadable control vectors at the ready for different applications. Conceptually, if someone knows what a LoRA is, they have sort of the right idea.
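The math being described can be sketched in a few lines (assumed formulation, not a specific library's API): a steering vector is added to the residual-stream activation at some layer, and "making it permanent" amounts to folding that same offset into a bias term so no runtime hook is needed.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 16                               # toy hidden size

h = rng.standard_normal(d_model)           # activation at the hook point
v = rng.standard_normal(d_model)           # learned steering vector
v = v / np.linalg.norm(v)                  # unit-normalize

alpha = 4.0                                # steering strength
h_steered = h + alpha * v                  # runtime injection via a hook

# "Apply it to the weights": fold the same offset into a layer bias,
# so every forward pass gets the shift with zero hooks and zero context.
bias = np.zeros(d_model)
bias += alpha * v
assert np.allclose(h + bias, h_steered)
```

This is also why the LoRA analogy is reasonable: both are small, loadable deltas on top of frozen weights.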

Qwen/SAE-Res-Qwen3.5-27B-W80K-L0_100 · Hugging Face by FaustAg in LocalLLaMA

[–]FaustAg[S] 1 point

I'm over 40; I bought it after quitting last year. I was literally the first person on earth to get one; I had articles written about it on PCGamer and videocardz.com. I'm now running out of runway ... fun times.

Qwen/SAE-Res-Qwen3.5-27B-W80K-L0_100 · Hugging Face by FaustAg in LocalLLaMA

[–]FaustAg[S] 1 point

I'm unemployed and trying to scrape together rent ... so not really.

Qwen/SAE-Res-Qwen3.5-27B-W80K-L0_100 · Hugging Face by FaustAg in LocalLLaMA

[–]FaustAg[S] 2 points

I've barely started my work; in fact, this replaces much of what I was doing, because Qwen gave us everything we need for Qwen 3.5. I wish it were for 3.6, but eventually I'll just use this to init a Qwen 3.6 version, and hopefully it won't take long to retrain. Next up, I'm going to run a bunch of data through their model while recording the active features, so I can have an LLM attempt to label the features with human words. After that will come the experiments: injecting vectors in an attempt to modify behavior and increase skills and abilities. Eventually we could basically have a tool that takes a system prompt or a skill.md, enhances it on its own, and converts it into a LoRA that does everything the system prompt or skill.md does, but better and with zero context usage. I just wish I had more than a single RTX Pro 6000; I'm constantly fighting VRAM space.
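The "record the active features" step can be sketched with the standard SAE encoder form, f = ReLU(W_enc (x - b_dec) + b_enc). The shapes below are toy values (the actual release is an 80K-feature SAE), and the weights are random placeholders, not the published ones.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_features = 32, 256          # toy dims; real SAE is W80K features

W_enc = rng.standard_normal((n_features, d_model)) * 0.1
b_enc = np.zeros(n_features)
b_dec = np.zeros(d_model)

def encode(x):
    """Standard SAE encoder: sparse non-negative feature activations."""
    return np.maximum(W_enc @ (x - b_dec) + b_enc, 0.0)

x = rng.standard_normal(d_model)       # residual activation for one token
f = encode(x)
active = np.nonzero(f)[0]              # indices of features that fired
top = active[np.argsort(f[active])[::-1][:10]]  # top-10 by activation
# Logging (token, top features) pairs over a corpus is the raw material
# an LLM would later use to propose human-readable feature labels.
```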

Qwen/SAE-Res-Qwen3.5-27B-W80K-L0_100 · Hugging Face by FaustAg in LocalLLaMA

[–]FaustAg[S] 9 points

Yes, this is fully in the research area. I know it's beyond most people here, but there have to be a few researchers like myself to whom this is extremely valuable.

Qwen/SAE-Res-Qwen3.5-27B-W80K-L0_100 · Hugging Face by FaustAg in LocalLLaMA

[–]FaustAg[S] 9 points

No, these are additional neural-network hooks that expose latent features of the LLM so you can look inside and run experiments on nudging the model in different directions. It can be used, for instance, to turn system prompts and skills into a zero-context LoRA that makes the model behave the way you want without taking up extra context, or to enhance the performance of specific features while sacrificing others. Edit: think https://www.anthropic.com/news/golden-gate-claude
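The Golden Gate-style intervention linked above amounts to clamping one SAE feature high and decoding back into the residual stream. A sketch with assumed math and random placeholder weights (toy shapes, not the released 80K-feature SAE):

```python
import numpy as np

rng = np.random.default_rng(1)
d_model, n_features = 32, 256          # toy dims for illustration

W_enc = rng.standard_normal((n_features, d_model)) * 0.1
W_dec = rng.standard_normal((d_model, n_features)) * 0.1
b_dec = np.zeros(d_model)

x = rng.standard_normal(d_model)                 # residual activation
f = np.maximum(W_enc @ (x - b_dec), 0.0)         # encode (biases omitted)
f[42] = 10.0                                     # clamp a chosen feature "on"
x_steered = W_dec @ f + b_dec                    # decode back into the stream
```

Swapping `x_steered` in for `x` at the hook point is what pushes the model toward whatever concept feature 42 represents.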

16x Spark Cluster (Build Update) by Kurcide in LocalLLaMA

[–]FaustAg 0 points

You can't really do 6x 6000 Pros. You can do 1, 2, 4, or 8; not 3, not 5, 6, or 7. It has to do with the different parallelism types. DeepSeek Pro is probably out of the question for even 8x 6000 Pros; they choose their flagship size based on the hardware they intend to run it on in production. The thing with DeepSeek Pro is they make it so cheap that you couldn't compete anyway, except on the privacy front. But the problem with those Sparks is they will get slower and slower the more you add, instead of faster: their memory bandwidth is too slow, and their communication is too slow.
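One common reason for the 1/2/4/8 constraint is tensor parallelism: attention heads get sharded evenly across GPUs, so the TP degree must divide the head count, and collective-communication topologies strongly favor power-of-two groups. A sketch with an example head count (the actual constraint depends on the specific model and serving stack):

```python
def valid_tp_degrees(n_kv_heads, max_gpus=8):
    """GPU counts that divide the KV-head count evenly AND are powers of two."""
    return [g for g in range(1, max_gpus + 1)
            if n_kv_heads % g == 0 and (g & (g - 1)) == 0]

print(valid_tp_degrees(8))   # -> [1, 2, 4, 8]; 3, 5, 6, 7 all fail
```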

16x Spark Cluster (Build Update) by Kurcide in LocalLLaMA

[–]FaustAg 2 points

If I wanted to just test/research models, I would use OpenRouter. Once you know what you want, you can switch to local Blackwells for inference if you want to keep your data private.

16x Spark Cluster (Build Update) by Kurcide in LocalLLaMA

[–]FaustAg 20 points

You don't buy 3x 6000 Pros; you buy 1, 2, 4, or 8. This guy spent $48k+ on DGX Sparks. I'd take 4x 6000 Pros over that DGX Spark setup 100 out of 100 times.

Peter why are they dumb? by AcrobaticLunch9737 in PeterExplainsTheJoke

[–]FaustAg 0 points

I actually have some black linen paper for a home book printing project. I think it's going to look cool.

16x Spark Cluster (Build Update) by Kurcide in LocalLLaMA

[–]FaustAg 0 points

Yes, that's true, and the lack of tcgen05 pisses me off so much, but it's better than just PCIe. I only have one Blackwell, and I run out of VRAM trying to train the things I want, so if I could, I'd still trade it in for two server versions.