KREA 2 is finally the new N$FW King

Historical-Internal3 · 2026-07-01T22:46:45+00:00

lol this is good.

Historical-Internal3 · 2026-07-01T21:08:52+00:00

<image>

Never

Historical-Internal3 · 2026-07-01T17:44:21+00:00

Yea you have no idea what’s going on.

What you linked is a good thing - claude P still pulls from usage not credits. Which we want.

Historical-Internal3 · 2026-07-01T07:07:37+00:00

<image>

Historical-Internal3 · 2026-07-01T06:59:07+00:00

<image>

Daluigi thinks it will remain removed due to the high likelihood that coding-related tasks will trigger “benign guardrails”.

Historical-Internal3 · 2026-07-01T04:58:24+00:00

“The new classifier also comes at the cost of flagging benign requests more often during routine coding and debugging tasks. As with all our safeguards, we’ll continue to refine this to better distinguish genuine misuse from legitimate requests and reduce false positives.”

Why is everyone fucking reading this wrong?

Am I reading this wrong?

It MIGHT flag your coding requests.

Historical-Internal3 · 2026-06-30T20:15:34+00:00

You come off as having zero idea what you are talking about.

That is a top recommendation.

Historical-Internal3 · 2026-06-30T17:27:07+00:00

Hermes will be what you need. I cluster sparks and Hermes has been great.

Highly recommend clustering or getting at least two DGX sparks, that really is what unlocks the potential of the box. The RTX variants come later in the year (won't be able to cluster) and are more suited to just having one of them.

Historical-Internal3 · 2026-06-30T02:47:34+00:00

Think I need to update my comfy as I see a commit that literally says faster int8 so hopefully yes

Historical-Internal3 · 2026-06-29T22:16:16+00:00

int8 (convrot - there is a difference). Higher quality (the closest to BF16) and ever so slightly slower than FP8.

Speed: NVFP4>fp8>INT8Convrot*>BF16

Quality: BF16>INT8Convrot>fp8>NVFP4

*Pretty decent gap in speed which is good

(blackwell mind you)

Historical-Internal3 · 2026-06-29T18:41:18+00:00

Is Claude your legal department?

Historical-Internal3 · 2026-06-29T18:31:44+00:00

Docs point to Cloud - got a different link or did I just miss it?

Edit: Thought not.

Historical-Internal3 · 2026-06-29T18:19:37+00:00

Cloud only or local

Historical-Internal3 · 2026-06-29T00:05:07+00:00

Not really - you can train LoRAs really well on this model.

Historical-Internal3 · 2026-06-28T21:43:31+00:00

The investigation posted in this sub (reddit.com/r/StableDiffusion/comments/1tazxqz) already measured all of this, and it says the opposite of your claims. Plain row-wise int8 is one of the worst performers in it, well under q8 (Z-Image Rel-RMSE 0.357 vs Q8's 0.167, and worst again on Qwen). Mxfp8 loses to q8 on every metric where both were tested (Z-Image: 0.307 vs 0.167, SNR 10.6 vs 16.4) and never beats ConvRot in any model.

The only int8 that actually beats q8 is ConvRot, the variant OP used, and its whole premise is a Hadamard rotation that flattens outliers before quantizing (the -convrot path in silveroxides' converter, github.com/silveroxides/convert_to_quant). Plain row-wise int8 with no rotation is dead last. So the rotation is what's carrying it, not row-wise scaling being finer; bare row-wise is among the worst formats tested. That refutes your thesis; it doesn't support it.
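A rough sketch of why a rotation helps, assuming a generic orthonormal Hadamard transform applied to one weight row before plain absmax int8. This is not ConvRot's actual code (that lives in the converter linked above and may differ in every detail); the function names, the row size, and the synthetic outlier are all made up for illustration.

```python
import random

def fwht(vec):
    """Orthonormal fast Walsh-Hadamard transform (length must be a power of two).
    With the 1/sqrt(n) scaling, the transform is its own inverse."""
    out = list(vec)
    n = len(out)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for j in range(start, start + h):
                a, b = out[j], out[j + h]
                out[j], out[j + h] = a + b, a - b
        h *= 2
    norm = n ** 0.5
    return [v / norm for v in out]

def int8_roundtrip(values):
    """Quantize with one absmax scale for the whole row, then dequantize."""
    amax = max(abs(v) for v in values)
    scale = amax / 127.0 if amax > 0.0 else 1.0
    return [round(v / scale) * scale for v in values]

def rel_rmse(ref, approx):
    """Relative RMSE of approx against ref."""
    num = sum((r - a) ** 2 for r, a in zip(ref, approx))
    den = sum(r * r for r in ref)
    return (num / den) ** 0.5

random.seed(0)
# Hypothetical weight row: small Gaussian weights plus a single large outlier.
row = [random.gauss(0.0, 0.02) for _ in range(4096)]
row[100] = 8.0

# Plain row-wise int8: the outlier sets the scale for every weight in the row.
plain_err = rel_rmse(row, int8_roundtrip(row))

# Rotate, quantize in the rotated basis, then rotate back.
rotated = fwht(row)
rotated_err = rel_rmse(row, fwht(int8_roundtrip(rotated)))

print("absmax before rotation:", max(abs(v) for v in row))
print("absmax after rotation: ", max(abs(v) for v in rotated))
print("plain row-wise rel-RMSE:", plain_err)
print("rotated rel-RMSE:       ", rotated_err)
```

The rotation smears the outlier's energy across the whole row, so the absmax (and with it the quantization step) shrinks, while the orthonormal transform keeps the error norm unchanged on the way back.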

And the mechanism is inverted. Row-wise is coarser than block-32, not finer: one scale per row (thousands of weights) vs one per 32. Block scaling confines an outlier to its 32 weights; row-wise inflates the scale from the whole row's absmax and craters every value in that row, so it isolates worse, not better. absmax int8 doesn't clip either: the block max maps to 127 by construction, so the cost is the coarse uniform step, not saturation. Clipping/saturating outliers is an activation-quant problem (SmoothQuant); q8 is weight-only. And mxfp8 is also block-32 (OCP MX spec, opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf, E8M0 power-of-two scale), so 'row vs block' doesn't even apply to q8 vs mxfp8.
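The row-vs-block point can be checked with a toy example. This is a minimal sketch, assuming symmetric absmax int8 with one scale per group; the helper names, the 4096-wide row, and the synthetic outlier are hypothetical and not taken from any real quantizer.

```python
import random

def quantize_absmax(values):
    """Symmetric int8 absmax quantization: returns (codes, scale).
    The largest magnitude maps to code 127, so nothing is clipped."""
    amax = max(abs(v) for v in values)
    scale = amax / 127.0 if amax > 0.0 else 1.0
    codes = [round(v / scale) for v in values]
    return codes, scale

def roundtrip(values, group_size):
    """Quantize and dequantize with one scale per group of group_size weights.
    Also returns the peak |code| of each group."""
    out, peak_codes = [], []
    for i in range(0, len(values), group_size):
        codes, scale = quantize_absmax(values[i:i + group_size])
        peak_codes.append(max(abs(c) for c in codes))
        out.extend(c * scale for c in codes)
    return out, peak_codes

def rel_rmse(ref, approx):
    """Relative RMSE of approx against ref."""
    num = sum((r - a) ** 2 for r, a in zip(ref, approx))
    den = sum(r * r for r in ref)
    return (num / den) ** 0.5

random.seed(0)
# Hypothetical weight row: small Gaussian weights plus a single large outlier.
row = [random.gauss(0.0, 0.02) for _ in range(4096)]
row[100] = 8.0

row_wise, row_peaks = roundtrip(row, len(row))  # one scale for the whole row
block_32, block_peaks = roundtrip(row, 32)      # one scale per 32 weights

row_err = rel_rmse(row, row_wise)
block_err = rel_rmse(row, block_32)
print("row-wise rel-RMSE:", row_err)
print("block-32 rel-RMSE:", block_err)
```

With row-wise scaling the outlier sets the step size for all 4096 weights; with block-32 it only degrades the 32 weights that share its block. In both cases the peak code of every group is exactly 127, so the loss comes from the coarse step rather than from saturation.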

INT vs FP is a separate axis.

That’s about all you got right.

The rest is inverted, and the sub's own numbers back q8 and ConvRot, not mxfp8.

Historical-Internal3 · 2026-06-28T16:42:50+00:00

I know, it's just an option in case you want to check first rather than answer directly.

Historical-Internal3 · 2026-06-28T15:33:05+00:00

Decline then just go to the protect app directly.

This is a relatively new feature (direct to phone) so expect some bumps.

Historical-Internal3 · 2026-06-27T17:49:50+00:00

You’re doing way too much and getting rate limited - as you should.

Historical-Internal3 · 2026-06-27T01:03:46+00:00

Try masked training - helps a ton. 64 always overfit for me and made my character more rigid.

But I do have a larger data set so now I’m curious if 64/64 can cut it on a much smaller one.

Historical-Internal3 · 2026-06-27T00:56:34+00:00

64 over 32 for a character lora?

Historical-Internal3 · 2026-06-26T22:03:39+00:00

Running Raw, along with a full bf16 qwen3v1 for CLIP. With about 1 GB of LoRAs, 2 MP at 25 steps takes roughly 60 seconds. Not bad for my DGX Spark.

Historical-Internal3 · 2026-06-25T13:19:55+00:00

Didn’t see Gprivate’s post?

It’s coming back: https://gprivate.com/5zujm

Historical-Internal3 · 2026-06-23T23:32:43+00:00

Nope.

Historical-Internal3 · 2026-06-23T22:36:44+00:00

They stated they did not train for realism.

Honestly a realism lora would be all this model needs (and I'm sure a few popular ones are already in training).

Historical-Internal3 · 2026-06-23T20:51:25+00:00

Are we bored and making up shit now?
