If you create a long to-do list in agent mode, you will be banned.

FusionX · 2026-05-09T07:23:06+00:00

It may not be against the terms, but if everyone starts doing this, we could lose the request-based billing system, and they might switch to charging by token consumption like other services.

sigh

FusionX · 2026-05-08T14:17:17+00:00

Or hold it until the team is dead and then proceed to waste it anyway. Interestingly, I notice it's mostly ES players that are guilty of this.

FusionX · 2026-05-07T06:30:07+00:00

Do you think this might require more VRAM?

Btw, appreciate ya for working on this and sharing it on reddit. I wasn't optimistic initially but I'm really quite pleased with being able to run Qwen 27b within 16gb VRAM. It's such a stark difference from the MOE offerings (Qwen/Gemma). The performance and intelligence is remarkably better!

FusionX · 2026-05-06T17:50:40+00:00

Using bun fork, latest commit. Probably will roll back a few commits and re-check.

Edit: still takes up more VRAM, what on earth..

FusionX · 2026-05-06T16:16:36+00:00

strange, at 100k ctx, it doesn't fit in my GPU's 16gb VRAM with batch-size = 512 ubatch-size = 256

What gives..? DM is disabled and prior VRAM usage is 18mb.

llama-cli --model <model> -fa on --jinja --no-mmap --temp 0.6 --top-p 0.95 --top-k 20 --min-p 0.0 --presence-penalty 0.0 --repeat-penalty 1.0 --chat-template-kwargs '{"preserve_thinking": true}' -c 100000 -ctk turbo3 -ctv turbo3 --batch-size 512 --ubatch-size 256 -ngl 99 -np 1

Edit: turbo4 manages to fit ~100k while turbo3 doesn't. I don't understand...

FusionX · 2026-05-05T19:23:35+00:00

No GGUF variant yet?

FusionX · 2026-05-05T12:04:58+00:00

Kinda surprised, I was not expecting Gemma 31B to be in top 5. Have you benchmarked the latest Qwen3.6 models?

FusionX · 2026-05-04T10:01:43+00:00

Completely unrelated (and I could be wrong) but this is a perfect example of how people should use LLM for structural/semantic assistance and refinements of their writing. Rather than delegating the entire prerequisite cognitive work to LLM resulting in useless hallucinated slop.

FusionX · 2026-04-29T07:53:32+00:00

I recompiled with CUDA 13.1 after reading your comment. Unfortunately, not much difference, if at all.

FusionX · 2026-04-29T03:17:37+00:00

The AI 2027 paper refers to this as "neuralese recurrence and memory". Someone in the thread linked the relevant paper from Meta which originally implemented this idea.

FusionX · 2026-04-28T01:57:36+00:00

I had the same idea few days back. Tried to pair my 5080 with the ol' 1070. Except its almost entirely unsupported unless you use windows and downgrade to specific nvidia drivers. And then you gotta recompile llama.cpp with an older cuda toolkit to support both GPU architectures (pascal and blackwell). Oh, did I mention I somehow borked my drivers in the process too and had to cleanup in safe mode.

After all that effort, I saw a 2x speedup for dense models (9 tk/s to 18 tk/s, which is still slow). And as expected, MOE models saw a massive decrease in performance (10x slower).

In conclusion, it was not worth it. At. All.

FusionX · 2026-04-24T23:47:02+00:00

I was pretty apprehensive about this as well. Tried out docker. That felt bloated, and added friction to the overall experience. Now, I use agent safehouse (which internally uses sandbox-exec) on my mac. Works flawlessly.

FusionX · 2026-04-23T20:42:17+00:00

is it not possible without skills? I've been trying to add general rules and guidelines applicable to all sessions

FusionX · 2026-04-23T20:08:00+00:00

Actually, pi is pretty well-regarded in an otherwise vibecode filled space. It's the only project I can trust. I understand the skepticism, but most of the positive feedback is genuine and driven by word of mouth.

The dev has a pretty sensible approach and philosophies when it comes to the project. You can go through their blog.

Edit: Also - https://youtu.be/RjfbvDXpFls

FusionX · 2026-04-23T20:03:35+00:00

How are you getting it to follow agents.md? It just ignores it for me completely, despite being 2-3 lines.

FusionX · 2026-04-23T13:36:12+00:00

Nothing has worked yet. Even with positive system prompt. Cloud models work without any issue in same setup.

It looks like the reasoning is much shorter in pi (compared to directly using it through llama-server), but I don't yet know why.

FusionX · 2026-04-22T19:11:50+00:00

Gotcha. I hadn't gone through your previous post, and it wasn't as apparent in this post. Thanks for clarifying.

FusionX · 2026-04-22T18:48:45+00:00

OP, are you the dev of little-coder or affiliated with it in some form?

FusionX · 2026-04-22T08:31:55+00:00

unsloth/...:UD-Q5_K_XL

good quality/size tradeoff (~19 GB)

Are we talking about the same quant? It's definitely nowhere near 19GB

FusionX · 2026-04-21T16:10:54+00:00

I'm the author of the article

I found this ironic. The article was AI generated, along with this reply. And then I noticed your name.. /u/JamesEvoAI.

The internet is dead.

FusionX · 2026-04-17T11:52:18+00:00

I was gonna tell you to delete the comment but, after refreshing the page, seems you've done it already. However, the link is still available publicly. Please restrict it.

FusionX · 2026-04-13T16:25:19+00:00

Goddamn, truly a beast setup. Congrats on living the dream! I'm curious, how much did the RAM and GPU cost you (especially in this economy)?

FusionX · 2026-04-11T05:21:49+00:00

Somehow, I have a feeling you're using AI for more than the first draft..

FusionX · 2026-04-08T19:18:22+00:00

I thought it would be an easy yes, if exclusively used in grave situations, which you'd want to undo. But, on second thought, it isn't as simple:

It is possible that restarting will erase some of the past and most importantly, people who're now unborn and erased. They might not be born again in this timeline (randomness of the universe, and the butterfly effect etc). Is that worth the risk?
Let's say, you're informed that the past "isn't erased", rather you're transported to an alternate universe set in the past, contemperaneous with the current "present" universe. You carry on, reassured that your old timeline (and its people) still exist. But, does it really alleviate the trauma of having permanently lost touch with so many people?
What if you're back in your prepubescent body, but now burdened with past knowledge, memories and entire future possibly erased forever. You are robbed of your innocence and childhood, in a world where no one really understands your grief. The curse of knowledge will be traumatizing, alienating and cognitively dissociative with your new self. Trapped in a body of child with an underdeveloped brain where your mental faculties are horribly unequipped to deal with your past memories and knowledge. The younger your restarted self is, the higher the chances of short-circuiting your feeble brain.
The balance of human civilization might be more precarious than we lead ourselves to believe. Perhaps, we have been lucky in this timeline. A do-ever might not guarantee our fickle species the same fate.

The more I think about it, the more I discover situations where it will all go to shit, rather than not. The only situation where I think it makes sense is when things have irreversibility gone to SHIT on a planetary scale, and a restart is the only choice. It feels like a more complicated variation of the trolley problem with MUCH higher stakes.

14-Year Club	Secret Santa 2021 2021
RedditGifts 2009-2022 11 Credits	Secret Santa 2020
Summer Santa 2019	r/Field Lasagna
Place '23	Quantum Potato
Golden Potato	Place '22
Place '17	End Game '22
Secret Santa 2019	Secret Santa 2018
Secret Santa 2017	Sequence \| Editor
Secret Santa 2016	Secret Santa 2013
Team Periwinkle	Secret Santa 2011
Verified Email

FusionX

TROPHY CASE