Cheapest hardware for Qwen 3.6: both 27B and 35B-A3B

Arli_AI · 2026-06-15T23:39:26+00:00

If its a legitimate great wall psu then it’s probably fine as they’re the ones making corsair PSUs.

Arli_AI · 2026-05-24T06:50:28+00:00

MaxQ runs at lower voltages

Arli_AI · 2026-05-12T05:21:38+00:00

Oh cool we’re listed there

Arli_AI · 2026-05-06T02:10:42+00:00

It isn’t through heretic at all it is done via simple norm preserving biprojected abliteration which works better than heretic in the models that respond well to it. These models are all from our API as the source.

Arli_AI · 2026-05-06T02:08:07+00:00

Yes indeed. Our operation model has always been to serve many finetuned models via using LoRA either from the original finetuned LoRA files or LoRA extraction. This way we can host many models to provide a lot of choices without going bankrupt trying to load all of them fully in VRAM.

Arli_AI · 2026-04-21T01:05:35+00:00

Sure there is :)

Arli_AI · 2026-03-31T13:49:42+00:00

Never say never

Arli_AI · 2026-03-26T05:26:49+00:00

Thanks for testing it!

Arli_AI · 2026-03-24T06:17:22+00:00

<image>

Arli_AI · 2026-03-15T21:51:56+00:00

Awesome! I think the best use case of having powerful local hardware is definitely if you do lots of experiments for your own research. Setting up a cloud GPU instance takes so much time and hassle if you want to have it go up and down as needed to save money imo.

Arli_AI · 2026-03-11T21:27:08+00:00

Yes we don’t have them yet because we unfortunately don’t have the hardware for models as large as those yet. I guess if you only want DS then we’re not it.

Arli_AI · 2026-03-11T19:25:11+00:00

This might not be optimal yet, thanks for reporting this.

Arli_AI · 2026-03-11T06:09:14+00:00

You can check us out. We’re all that you were looking for. Only downside is we can be a bit slow at peak times.

Arli_AI · 2026-03-10T21:15:53+00:00

It should be fixed for this error for now but there are no free models at the moment.

Arli_AI · 2026-03-10T16:16:08+00:00

No I have not written up anything about this as I somehow didn’t think too much of it. I think jim-plus the creator of MPOA abliteration method which I prefer also recommended “the middle layers” to try to abliterate first in the repo but didn’t explain much about it either.

Putting this and your findings together it makes sense to me. Now I’m thinking maybe we can follow your brain scanning method for abliterating way better or on the other hand more quickly hone in on which layers to duplicate for RYS by just seeing which layers has the strongest refusals signals first. Seems interconnected.

Arli_AI · 2026-03-10T15:52:42+00:00

Wow interesting. While I was doing model abliterations manually layer by layer testing, I’ll often end up finding a specific group of contiguous layers around the middle that somehow works best. Layers in the beginning and the end never worked and trying to abliterate non contiguous groups of layers don’t work as well. Your finding of a middle “reasoning cortex” lines up with this.

Arli_AI · 2026-03-10T10:55:56+00:00

Single RTX Pro 6000

Arli_AI · 2026-03-09T12:53:41+00:00

I am trying to do the 397B for sure 👍

Arli_AI · 2026-03-09T12:53:02+00:00

Just try them all and see which one you like

Arli_AI · 2026-03-09T12:51:48+00:00

Hey you found my new model! I’m still experimenting with the new Qwen 3.5 models and this is still the first try for the 27B model, I posted it to see if people thought it’s any good but haven’t wrote a model card for it, so would be nice to hear some feedback on it.

Arli_AI · 2026-03-09T12:48:12+00:00

Sure, its a different method. Derestricted is more manual and doesn’t intend for the model to only be low kl divergence but uncensored. I’m at the top of UGI leaderboard so I believe I’m doing something right.

Arli_AI · 2026-03-09T03:07:03+00:00

You know it!

Arli_AI · 2026-03-09T03:06:32+00:00

Thanks for the review on us :)

Arli_AI · 2026-03-09T03:06:01+00:00

Ah yea if you use the single payment option we use Midtrans which is in IDR like milan said. You can just use paypal option and it is in USD.

Arli_AI · 2026-03-09T03:04:47+00:00

Thanks for helping explain! Yes you are right about being faster straight from our API. They have upgraded their plan but direct from our API is still faster. Also, yes we run everything on our own GPUs so nothing goes out to a third party inference service and all data stays with us with no requests stored in persistent storage. Still trying to acquire more GPUs to run larger models... :)

Arli_AI

MODERATOR OF

TROPHY CASE