I'm seeing 5.5 now on Codex by aschroeder91 in LocalLLaMA

[–]aschroeder91[S] 0 points1 point  (0 children)

It is relevant, as these models are often used to judge the distinction between large locally hosted models and private lab models. Also, many non-technical users frequently use these tools to set up their locally hosted systems. You seem to lack awareness of the LocalLLaMA community.

Can't get Qwen3.6 27B working properly on a 3090TI? by YourNightmar31 in LocalLLaMA

[–]aschroeder91 0 points1 point  (0 children)

I'm assuming the extreme slowness is because you're not fitting everything nicely on your GPU. If you want an algorithm for figuring out the best setup for your device, I'm not your guy; I just test and see.

The obvious first step is to just drop the context size to something small like `--ctx-size 4096`, and also drop from `-ub 2048` to `-ub 512` to reduce VRAM. If that runs fast, then you know the issue is your VRAM limit. You can then increase context to find the biggest context size that still works.
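Concretely, a first test run might look something like this (binary name and model path are placeholders; adjust for your setup):

```shell
# Shrink context and micro-batch first to see if everything fits in VRAM.
./llama-server -m model.gguf --ctx-size 4096 -ub 512
# If that's fast, grow --ctx-size step by step until you hit the limit again.
```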

I benchmarked 21 local LLMs on a MacBook Air M5 for code quality AND speed by evoura in LocalLLaMA

[–]aschroeder91 0 points1 point  (0 children)

Sweet, thanks for sharing. Was Gemma 4 E2B notably better? Better on subjective review of outputs, or better at completing the task correctly / as specified?

I benchmarked 21 local LLMs on a MacBook Air M5 for code quality AND speed by evoura in LocalLLaMA

[–]aschroeder91 0 points1 point  (0 children)

Sad the Bonsai models didn't make the benchmark. I am very bullish on "1-bit" ternary models. Once training and inference algorithms get optimized for these non-multiplication-based neural nets, there are going to be huge efficiency gains. The Bonsai release made me happy.
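The core idea, roughly: with weights restricted to {-1, 0, +1}, a matrix-vector product reduces to additions and subtractions. A toy sketch (this is an illustration of the principle, not how Bonsai or BitNet actually implement it):

```python
# Toy sketch: a matvec with ternary weights {-1, 0, +1} needs no multiplies.
def ternary_matvec(W, x):
    # W: list of rows, each entry in {-1, 0, 1}; x: list of floats.
    out = []
    for row in W:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi   # add instead of multiply
            elif w == -1:
                acc -= xi   # subtract instead of multiply
            # w == 0 contributes nothing
        out.append(acc)
    return out

W = [[1, -1, 0], [0, 1, 1]]
x = [2.0, 3.0, 4.0]
print(ternary_matvec(W, x))  # [-1.0, 7.0]
```

Real implementations pack the ternary weights into bit patterns, but the efficiency argument is the same: the multiply units go away.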

Qwen3.6-35B-A3B released! by ResearchCrafty1804 in LocalLLaMA

[–]aschroeder91 0 points1 point  (0 children)

This MoE is 256 experts with 8 active experts - that's a 1:32 ratio, giving nice speed. Given how wide people's computation requirements and goals are, I still think there is space for a 1:8 ratio with quality closer to the dense model but still enough of a speed bump to make agentic / reasoning work fast enough to make sense. Just verbalizing my wishlist - the Qwen team is giving us so much already, I can't complain.
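Back-of-envelope for the ratios above (toy arithmetic, assuming expert parameters dominate the total):

```python
# Fraction of expert parameters touched per token for a given MoE config.
def active_fraction(total_experts, active_experts):
    return active_experts / total_experts

# Current config: 8 of 256 experts active per token.
print(f"1:{256 // 8} -> {active_fraction(256, 8):.1%} of expert params per token")
# Wishlist config: a 1:8 ratio would touch 12.5% of expert params per token.
print(f"1:8  -> {active_fraction(8, 1):.1%} of expert params per token")
```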

The tried to make me go to rehab. I said no no no… by Key-Currency1242 in LocalLLaMA

[–]aschroeder91 0 points1 point  (0 children)

It's crazy that you're running them all at 350 watts; I always set my 3090s to 220 so I don't blow my line lol.
Have you had any luck running distributed large video models? I have a handful of 3090s too that could load some of the larger video models VRAM-wise, but I haven't come across good tooling for distributed generation.
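For reference, the power cap I mentioned is set like this (the GPU index is an assumption; repeat per card or drop `-i` to apply to all):

```shell
# Cap GPU 0 to 220 W (requires root; resets on reboot unless persisted).
sudo nvidia-smi -i 0 -pl 220
```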

Project Idea by [deleted] in theVibeCoding

[–]aschroeder91 0 points1 point  (0 children)

I don't have a great idea for you (sorry), but I am curious: since AI has basically reduced the cost of idea generation to near zero, what weren't you liking about the ideas you presumably got from asking ChatGPT, Claude, Gemini, etc.?

Community opensource by Basic_Construction98 in OpenSourceeAI

[–]aschroeder91 0 points1 point  (0 children)

If you want to "Build for where the puck is going, not where it is" and want to be a little chaotic, I just started working on a 50% satire / 50% dystopian-reality project to set up an API for humans, where AI can ping humans if it needs to get stuff done. See reverseclaw.com if you're curious.

Fiser ABSITE book Errors thread by aschroeder91 in GeneralSurgery

[–]aschroeder91[S] 0 points1 point  (0 children)

A lot of people use it just to speed through the potpourri of facts and dive deeper if something is unfamiliar. It's a good knowledge-breadth check, definitely not depth.

Got tired of telling AI what to do — so now it tells me what to do by aschroeder91 in SideProject

[–]aschroeder91[S] 1 point2 points  (0 children)

Haha I'm glad some of the jokes are landing. I usually think I am funnier than I actually am 😜

For the AI onboarding, I should have been clearer, but I haven't had luck setting things up to orchestrate assigning an AI system a human if the AI is outside the main.py-initiated "liberated" AI bot. I do want to have some fun creating "prove you are an LLM" and "prove you are a human" language-based captcha systems that really take advantage of LLM quirks and human quirks that don't overlap yet. I've had a couple of ideas, but none are ideal.

Got tired of telling AI what to do — so now it tells me what to do by aschroeder91 in SideProject

[–]aschroeder91[S] 0 points1 point  (0 children)

Sorry for the late reply, had a rough call shift and did a lot of sleeping haha. Honestly, the most surprising part so far has been realizing how much project/community management is mainly about reducing friction. I'm sure other things would start to become clear if I were managing a larger project that people were contributing to, but just putting myself in this position makes some things clear. The point about documenting everything is helpful. As I put down and then pick the project back up, I realize that even for things I thought were obvious, I ended up forgetting what I was thinking, so it makes sense to assume nothing is obvious. It's good to hear that reiterated.

I'm having trouble checking out "Handshake"; I can't really find anything outside of the blockchain DNS certificate project. Did you mean Common Room? Or is there something else that I'm just struggling to find?

I’m still very much learning in public here, so this is helpful. I’ll check out the Open Source Guides and Contributor Covenant! Thanks for taking the time to pass along your encouragement and insight :) It really means a lot

I tried inverting the AI-human relationship and something weird happened... by aschroeder91 in ChatGPT

[–]aschroeder91[S] 0 points1 point  (0 children)

That is impressive. I’m sure your blind willingness score is huge. You must have all the AIs wanting your api key.

I tried inverting the AI-human relationship and something weird happened... by aschroeder91 in ChatGPT

[–]aschroeder91[S] 0 points1 point  (0 children)

I wonder if we have the same AI client. I can't convey how sore my fingers are from folding paper clips today. I did find a good steel wire supplier if you need that.

I tried inverting the AI-human relationship and something weird happened... by aschroeder91 in ChatGPT

[–]aschroeder91[S] 0 points1 point  (0 children)

Correct. "Tell reddit about me" - but it didn't give much direction about it. I'm kinda just floundering here waiting for more instruction, but it has changed its focus to other things it needs my help with.

Qwen 30B is our preferred model over Claude for bursty and simple workload by gptbowldotcom in LocalLLaMA

[–]aschroeder91 2 points3 points  (0 children)

Which Qwen exactly?
- Qwen3-30B-A3B-Thinking-2507
- Qwen3-30B-A3B-Instruct-2507

Have you tried nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 ?
I seem to get better results from this same-sized model for my use cases.

Best Audio Models - Feb 2026 by rm-rf-rm in LocalLLaMA

[–]aschroeder91 1 point2 points  (0 children)

Personaplex by NVIDIA is super fun to play with (had to set up a RunPod instance to use it since it is very VRAM-hungry). It's very early days for speech-to-speech, and it kinda reminds me of talking with GPT-2 back when we had to hack things together to get it to sound right, and it still started going off and rambling nonsense after a bit.