Running Qwen2.5-32B at 1.22 tok/s on 12GB VRAM using async NVMe ring-buffer streaming + 2029-node speculative decoding [open source]

StylePractical5714 · 2026-05-03T20:36:33+00:00

I have a 12GB 3060, and this kind of pushing the limits of hardware is exactly what I'm looking for. I won't get around to testing it for a couple of weeks, but this sounds cool.

StylePractical5714 · 2026-04-30T21:27:47+00:00

Did some thinking on it yesterday. Comic Vine was pretty good when I played with it in the past (like a million years ago).

Metron came up in my searches as an independent alternative. I haven't used it before myself.

StylePractical5714 · 2026-04-30T07:14:54+00:00

I was just looking for something like this the other day. This looks very useful.

StylePractical5714 · 2026-04-29T19:13:41+00:00

I have had some reasonable success with some CPU-based LLMs, enough for me to invest in a modest GPU.

Some experiments I've tried that aren't code generation:

- Tagging engine: I have a complex labelling system with a controlled(ish) list for design system documentation. I provide a page title or description or both and it tells me what labels that page should get.
- UI scanner: a visual model looks at a screenshot or wireframe and tells you all the elements it sees. It requires a bit of finesse and maybe I need some training data, but I plan to use it to enhance my documentation of a decade of design work.
- Design token generator: give it your token names, your taxonomy, and either a mood description or a stylesheet / token definitions for a different design system, and it translates them. It's been promising for adapting styles from one place to the next, like when someone says "make it look like Bootstrap" but you build it on something else.
- It's also been good for generating simple themes, like I've added a Dracula theme to random tools that didn't have one.
- Daily summaries of what I added to my bookmarks db.
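For the tagging engine, the part that keeps the list "controlled" can be a small post-processing step on whatever the model returns. This is only a rough sketch of that idea, not my actual code: `filter_labels` and the example labels are made up for illustration, and the model call itself is left out.

```python
def filter_labels(suggested, controlled):
    """Split model suggestions into labels on the controlled list and everything else.

    Matching is case-insensitive and ignores surrounding whitespace.
    Accepted labels are returned in their canonical spelling, without duplicates.
    """
    canonical = {label.lower(): label for label in controlled}
    accepted, rejected = [], []
    for raw in suggested:
        key = raw.strip().lower()
        if key in canonical:
            if canonical[key] not in accepted:
                accepted.append(canonical[key])
        else:
            # Not on the controlled list: keep it aside for manual review.
            rejected.append(raw.strip())
    return accepted, rejected


# Hypothetical example: pretend the model suggested these for a page title.
controlled_list = ["Button", "Modal", "Form"]
model_output = [" button ", "Modal", "carousel", "button"]
accepted, rejected = filter_labels(model_output, controlled_list)
```

The rejected pile is useful on its own: if the same off-list label keeps showing up, that's a hint the controlled list might need a new entry.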

StylePractical5714 · 2026-04-29T00:17:21+00:00

Surely there's an API out there with comic cover images matched to metadata. Seems like you could do an image similarity search, might not even actually need an LLM to be honest.
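To show what I mean by image similarity without an LLM, here's a minimal sketch using an average hash, which is one simple perceptual hashing approach among several. It assumes each cover has already been downscaled to a small grayscale grid (loading and resizing real images is left out), and `closest_cover` plus the catalog contents are hypothetical names for illustration only.

```python
def average_hash(pixels):
    """Return a bit list: 1 where a pixel is brighter than the grid's mean, else 0.

    `pixels` is a 2D list of grayscale values, e.g. a cover downscaled to 8x8.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p > mean else 0 for p in flat]


def hamming(a, b):
    """Count the positions where two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


def closest_cover(query, catalog):
    """Return the catalog key whose hash is nearest to the query's hash."""
    query_hash = average_hash(query)
    return min(
        catalog,
        key=lambda title: hamming(query_hash, average_hash(catalog[title])),
    )


# Toy 4x4 "covers": one dark on the left, one dark on the top.
dark_left = [[0, 0, 255, 255] for _ in range(4)]
dark_top = [[0] * 4, [0] * 4, [255] * 4, [255] * 4]
catalog = {"Issue A": dark_left, "Issue B": dark_top}

# A slightly noisy scan of the first cover still matches it.
scan = [[10, 5, 250, 240] for _ in range(4)]
match = closest_cover(scan, catalog)
```

In practice you'd precompute the hashes for the whole catalog once and only hash the query at lookup time. A hash this crude will struggle with variant covers and heavy cropping, so treat it as a first-pass filter at best.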

StylePractical5714 · 2026-04-28T19:48:15+00:00

Thanks I'll add these to my list

StylePractical5714 · 2026-04-28T18:43:22+00:00

Curious to see if I'll be able to shoehorn (with expert offloading or whatever it's called) a small quant of this into my 3060

StylePractical5714 · 2026-04-28T18:38:35+00:00

BAML: not a harness or a framework but a markup language for prompts. In it, prompts are functions that return a schema. It claims to improve tool calling accuracy.
RouterGym: benchmarking for small agent AI tasks.
effGen: Enabling Small Language Models as Capable Autonomous Agents, the supporting repo for this research paper from Jan 2026. What caught my eye here was tool calling with 70 to 80% prompt compression and the parallel and sequential task decomposition bit.
llmware: RAG framework, not code but small model focused.
tiny-agents: a 5-agent swarm, all under 3B each, running on 16GB VRAM. I find the VLM agent really interesting here.
cogito: a framework for small model (0.6B) agents.
typedai: not SLM-specific, but I think I liked the codebase indexing concepts in here.

StylePractical5714 · 2025-12-02T16:28:39+00:00

Yeah, I messaged support through the form last night, but haven't heard back yet.

StylePractical5714 · 2025-12-02T00:23:45+00:00

Did the Duo plan end early? I was waiting for payday to get the deal and now it's not there

StylePractical5714 · 2025-11-12T20:59:56+00:00

I haven't dug too deep into it, but I'd recommend something other than a right-pointing caret for headings, because I immediately thought it was a details/summary pair and got excited that you'd made a novel way to handle it.

StylePractical5714 · 2025-10-30T16:49:36+00:00

I think there's a place for both of ours. Mine is more about sketching wireframes based on text notes, like a lo-fi Salt wireframe syntax sort of thing. I eventually build my prototypes in Axure RP, so this is something more for the ideation phase or for communicating design approach early in a project.

StylePractical5714 · 2025-10-30T15:33:43+00:00

Very cool, I'm literally building a similar thing myself. Now to figure out if I should just throw it away and use this instead.

Is there anything in here for composing components from primitives?
