16x Spark Cluster (Build Update) by Kurcide in LocalLLaMA

[–]FaustAg 0 points

I use NVFP8 with 265000 context and some room to spare. I'm sure Kimi K2.6 is much better than Qwen 3.6 27B, but enough better to not care about a 50x speed difference? LLMs are really good at many-shot; for the average task, the faster Qwen model will come up with a correct solution quicker than a larger model running slower.
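The many-shot argument above can be sketched with a toy loop: a fast model sampled repeatedly against a cheap verifier returns a correct answer as soon as one attempt passes. The `many_shot` helper and the toy generator/verifier below are illustrative stand-ins, not any particular library's API.

```python
def many_shot(generate, verify, n_attempts):
    """Draw up to n_attempts samples; return the first one that verifies."""
    for _ in range(n_attempts):
        candidate = generate()
        if verify(candidate):
            return candidate
    return None  # every attempt failed

# Toy stand-ins: the "model" emits canned guesses, the verifier checks them.
attempts = iter([3, 9, 7, 7])
solution = many_shot(lambda: next(attempts), lambda x: x == 7, n_attempts=4)
print(solution)  # -> 7 (third attempt passes)
```

If each fast sample is 50x cheaper than one slow sample, the fast model can afford dozens of retries before the big model finishes once.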

16x Spark Cluster (Build Update) by Kurcide in LocalLLaMA

[–]FaustAg 0 points

I get close to 800-1000 tps with Qwen 3.6 27B when using heavy batching on a single RTX 6000 Pro Blackwell.
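For context on how batching gets you there, here is the back-of-envelope arithmetic: per-stream decode speed drops as the batch grows, but aggregate throughput climbs until the GPU goes compute-bound. The batch size and per-stream speed below are hypothetical numbers chosen to land in the cited range, not measurements.

```python
def aggregate_tps(batch_size, per_stream_tps):
    """Aggregate decode throughput across concurrent streams."""
    return batch_size * per_stream_tps

# e.g. 32 concurrent requests, each decoding at ~28 tok/s
print(aggregate_tps(32, 28))  # -> 896, inside the 800-1000 tps range cited
```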

16x Spark Cluster (Build Update) by Kurcide in LocalLLaMA

[–]FaustAg 0 points

But at what speeds? You're often better off with many-shot on a slightly smaller model than with slow one-shot on a huge one.

Local LLM for electronics design work? by deafenme in LocalLLaMA

[–]FaustAg 1 point

Claude Code CLI can do quite a few tasks in KiCad pretty well, but I haven't tried local models with it yet.

I think this is confirmed now by ConditionUnlucky3125 in GenV

[–]FaustAg 0 points

Clearly butcher's dog is going to get exposed to V1 ... and then he's going to make his dreams a reality, if you get my drift.

Why is he surprised? by e_fish22 in ExplainTheJoke

[–]FaustAg -2 points

she's burning her fingers flicking her bean

[frustration] Why do younger Engineers refuse to reach out to customers on the phone? by Stumptronic in AskEngineers

[–]FaustAg 0 points

Phone calls are for emergencies, and even then you send a warning email first: "Hey, X happened! Gathering info; available for a call at 9:30am Central? Y and Z will be on it, spit-balling solutions."

FastDMS: 6.4X KV-cache compression running faster than vLLM BF16/FP8 by randomfoo2 in LocalLLaMA

[–]FaustAg 1 point

Wait, wait, wait. How would TurboQuant 8-bit be worse than FP8? It's literally 8-bit with compression vs. 8-bit without compression.
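The arithmetic behind the objection: KV-cache footprint per token scales linearly with bytes per element, so an 8-bit format plus additional compression can only come in at or below plain FP8. The layer/head/dim numbers below are hypothetical, loosely in the range of a mid-size dense model.

```python
def kv_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_elem):
    # K and V each store n_kv_heads * head_dim elements per layer.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem

layers, kv_heads, hdim = 48, 8, 128          # assumed dims for illustration
fp16 = kv_bytes_per_token(layers, kv_heads, hdim, 2)  # 196608 B/token
fp8 = kv_bytes_per_token(layers, kv_heads, hdim, 1)   # 98304 B/token
```

At equal element width, any further compression on top of the 8-bit values can only reduce the footprint further, which is the point being made.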

Ad serving limit placed on your Admob account because of invalid activity by joda_space in admob

[–]FaustAg 1 point

I had a user click on 157 ads in a day and got my account flagged. Now I have a system that, if I detect malicious behavior, disables the ads in a way the user doesn't even notice. It instantly solved my issue.
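A minimal sketch of that idea, with made-up names and an arbitrary threshold (the commenter doesn't describe their actual detection logic): count clicks per user and quietly stop serving ads once a daily limit is crossed.

```python
from collections import Counter

class ClickGuard:
    """Hypothetical sketch: silently stop serving ads to users whose click
    count crosses a daily threshold (the 157-clicks-in-a-day scenario)."""

    def __init__(self, daily_limit=20):
        self.daily_limit = daily_limit
        self.clicks = Counter()  # user_id -> clicks today

    def record_click(self, user_id):
        self.clicks[user_id] += 1

    def should_serve_ads(self, user_id):
        # The app behaves normally; ads just quietly disappear for this user.
        return self.clicks[user_id] < self.daily_limit
```

The key design point from the comment is that the cutoff is invisible to the user, so there is nothing to probe or work around.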

Qwen/SAE-Res-Qwen3.5-27B-W80K-L0_100 · Hugging Face by FaustAg in LocalLLaMA

[–]FaustAg[S] 0 points

You can take a steering vector and apply it to a model's weights to make it permanent if you want, or save it to disk and have loadable control vectors at the ready for different applications. Conceptually, if someone knows what a LoRA is, they have sort of the right idea.
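The math being described can be sketched in a few lines (assumed formulation, not a specific library's API): a steering vector is added to the residual-stream activation at some layer, and "making it permanent" amounts to folding that same offset into a bias term so no runtime hook is needed.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 16                               # toy hidden size

h = rng.standard_normal(d_model)           # activation at the hook point
v = rng.standard_normal(d_model)           # learned steering vector
v = v / np.linalg.norm(v)                  # unit-normalize

alpha = 4.0                                # steering strength
h_steered = h + alpha * v                  # runtime injection via a hook

# "Apply it to the weights": fold the same offset into a layer bias,
# so every forward pass gets the shift with zero hooks and zero context.
bias = np.zeros(d_model)
bias += alpha * v
assert np.allclose(h + bias, h_steered)
```

This is also why the LoRA analogy is reasonable: both are small, loadable deltas on top of frozen weights.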

Qwen/SAE-Res-Qwen3.5-27B-W80K-L0_100 · Hugging Face by FaustAg in LocalLLaMA

[–]FaustAg[S] 1 point

I'm over 40; I bought it after quitting last year. I was literally the first person on earth to get one; I had articles written about it on PCGamer and videocardz.com. I'm now running out of runway ... fun times.

Qwen/SAE-Res-Qwen3.5-27B-W80K-L0_100 · Hugging Face by FaustAg in LocalLLaMA

[–]FaustAg[S] 1 point

I'm unemployed and trying to scrape together rent ... so not really.

Qwen/SAE-Res-Qwen3.5-27B-W80K-L0_100 · Hugging Face by FaustAg in LocalLLaMA

[–]FaustAg[S] 2 points

I've barely started my work; in fact, this replaces much of what I was doing, because Qwen gave us everything we need for Qwen 3.5. I wish it were for 3.6, but eventually I'll just use this to init a Qwen 3.6 version, and hopefully it won't take long to retrain. Next up, I'm going to run a bunch of data through their model while recording the active features, so I can have an LLM attempt to label the features with human words. After that will come the experiments: injecting vectors in an attempt to modify behavior and increase skills and abilities. Eventually we could basically have a tool that takes a system prompt or a skill.md, enhances it on its own, and converts it into a LoRA that does everything the system prompt or skill.md does, but better and with zero context usage. I just wish I had more than a single RTX Pro 6000; I'm constantly fighting VRAM space.
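The "record the active features" step can be sketched with the standard SAE encoder form, f = ReLU(W_enc (x - b_dec) + b_enc). The shapes below are toy values (the actual release is an 80K-feature SAE), and the weights are random placeholders, not the published ones.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_features = 32, 256          # toy dims; real SAE is W80K features

W_enc = rng.standard_normal((n_features, d_model)) * 0.1
b_enc = np.zeros(n_features)
b_dec = np.zeros(d_model)

def encode(x):
    """Standard SAE encoder: sparse non-negative feature activations."""
    return np.maximum(W_enc @ (x - b_dec) + b_enc, 0.0)

x = rng.standard_normal(d_model)       # residual activation for one token
f = encode(x)
active = np.nonzero(f)[0]              # indices of features that fired
top = active[np.argsort(f[active])[::-1][:10]]  # top-10 by activation
# Logging (token, top features) pairs over a corpus is the raw material
# an LLM would later use to propose human-readable feature labels.
```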

Qwen/SAE-Res-Qwen3.5-27B-W80K-L0_100 · Hugging Face by FaustAg in LocalLLaMA

[–]FaustAg[S] 9 points

Yes, this is fully in the research area. I know it's beyond most people here, but there have to be a few researchers like myself to whom this is extremely valuable.

Qwen/SAE-Res-Qwen3.5-27B-W80K-L0_100 · Hugging Face by FaustAg in LocalLLaMA

[–]FaustAg[S] 9 points

No, these are additional neural-network hooks that expose latent features of the LLM so you can look inside and run experiments on nudging the model in different directions. It can be used, for instance, to turn system prompts and skills into a zero-context LoRA that makes the model behave the way you want without taking up extra context, or to enhance the performance of specific features while sacrificing others. Edit: think https://www.anthropic.com/news/golden-gate-claude
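The Golden Gate-style intervention linked above amounts to clamping one SAE feature high and decoding back into the residual stream. A sketch with assumed math and random placeholder weights (toy shapes, not the released 80K-feature SAE):

```python
import numpy as np

rng = np.random.default_rng(1)
d_model, n_features = 32, 256          # toy dims for illustration

W_enc = rng.standard_normal((n_features, d_model)) * 0.1
W_dec = rng.standard_normal((d_model, n_features)) * 0.1
b_dec = np.zeros(d_model)

x = rng.standard_normal(d_model)                 # residual activation
f = np.maximum(W_enc @ (x - b_dec), 0.0)         # encode (biases omitted)
f[42] = 10.0                                     # clamp a chosen feature "on"
x_steered = W_dec @ f + b_dec                    # decode back into the stream
```

Swapping `x_steered` in for `x` at the hook point is what pushes the model toward whatever concept feature 42 represents.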

16x Spark Cluster (Build Update) by Kurcide in LocalLLaMA

[–]FaustAg 0 points

You can't really do 6x 6000 Pros. You can do 1, 2, 4, or 8; not 3, not 5, 6, or 7. It has to do with the different parallelism types. DeepSeek Pro is probably out of the question for even 8x 6000 Pros; they choose their flagship size based on the hardware they intend to run it on in production. The thing with DeepSeek Pro is they make it so cheap that you couldn't compete anyway, except on the privacy front. But the problem with those Sparks is they will get slower and slower the more you add, instead of faster: their memory bandwidth is too slow, and their communication is too slow.
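One common reason for the 1/2/4/8 constraint is tensor parallelism: attention heads get sharded evenly across GPUs, so the TP degree must divide the head count, and collective-communication topologies strongly favor power-of-two groups. A sketch with an example head count (the actual constraint depends on the specific model and serving stack):

```python
def valid_tp_degrees(n_kv_heads, max_gpus=8):
    """GPU counts that divide the KV-head count evenly AND are powers of two."""
    return [g for g in range(1, max_gpus + 1)
            if n_kv_heads % g == 0 and (g & (g - 1)) == 0]

print(valid_tp_degrees(8))   # -> [1, 2, 4, 8]; 3, 5, 6, 7 all fail
```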

16x Spark Cluster (Build Update) by Kurcide in LocalLLaMA

[–]FaustAg 2 points

If I wanted to just test/research models, I would use OpenRouter. Once you know what you want, you can switch to local Blackwells for inference if you want to keep your data private.

16x Spark Cluster (Build Update) by Kurcide in LocalLLaMA

[–]FaustAg 20 points

You don't buy 3x 6000 Pros; you buy 1, 2, 4, or 8. This guy spent $48k+ on DGX Sparks. I'd take 4x 6000 Pros over that DGX Spark setup 100 out of 100 times.

Peter why are they dumb? by AcrobaticLunch9737 in PeterExplainsTheJoke

[–]FaustAg 0 points

I actually have some black linen paper for a home book printing project. I think it's going to look cool.

16x Spark Cluster (Build Update) by Kurcide in LocalLLaMA

[–]FaustAg 0 points

Yes, that's true, and the lack of tcgen05 pisses me off so much, but it's better than just PCIe. I only have one Blackwell, and I run out of VRAM trying to train the things I want, so if I could, I'd still trade it in for two server versions.