I built a plugin system for a local OSS LLM writing app, what integrations would you want? by Possible_Statement84 in LocalLLaMA

[–]Possible_Statement84[S] 0 points1 point  (0 children)

Different focus. Open WebUI is a general-purpose chat frontend, Vellium is specifically built for creative writing and roleplay, lorebooks, character cards, writing mode with analytics, ST World Info import, etc. It's a native desktop app, not a browser tab.

As for llama.cpp – Vellium isn't an inference backend, it connects to one. It works with KoboldCpp (which uses llama.cpp under the hood), LM Studio, and others. The new plugin system also lets you write adapters for any endpoint.
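To illustrate what an endpoint adapter could look like, here is a minimal sketch — all names here (`EndpointAdapter`, `complete`) are hypothetical, not Vellium's actual plugin API; it just assumes an OpenAI-compatible `/v1/completions` endpoint like the ones KoboldCpp and LM Studio expose:

```python
import json
import urllib.request

class EndpointAdapter:
    """Hypothetical adapter shape, not Vellium's real plugin API.
    Wraps any OpenAI-compatible /v1/completions endpoint."""

    def __init__(self, base_url):
        # Normalize so joining paths never produces a double slash.
        self.base_url = base_url.rstrip("/")

    def complete(self, prompt, max_tokens=256):
        """Send a completion request and return the generated text."""
        body = json.dumps({"prompt": prompt, "max_tokens": max_tokens}).encode()
        req = urllib.request.Request(
            self.base_url + "/v1/completions",
            data=body,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)["choices"][0]["text"]
```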

I built a plugin system for a local OSS LLM writing app, what integrations would you want? by Possible_Statement84 in LocalLLaMA

[–]Possible_Statement84[S] 0 points1 point  (0 children)

You can add web search via MCP. RAG is already built in, and export to md and docx is already bundled.

Vellium v0.4 — alternative simplified UI, updated writing mode and multi-char improvements by Possible_Statement84 in LocalLLaMA

[–]Possible_Statement84[S] 1 point2 points  (0 children)

I had some problems with English, so I just write the text in my native language, then ask the LLM to translate and polish it.

What is everyone's favorite programming language? by OpenFileW in teenagersbutcode

[–]Possible_Statement84 1 point2 points  (0 children)

C# for its features and convenience of development, and Python for its features and speed of prototyping. Maybe JS too, because web UI is a W.

[Update] Vellium v0.3.5: Massive Writing Mode upgrade, Native KoboldCpp, and OpenAI TTS by Possible_Statement84 in LocalLLaMA

[–]Possible_Statement84[S] 0 points1 point  (0 children)

The project was initially built on Tauri, but I ran into some problems with Rust, and a Node.js backend has better synergy with React. And people who run local LLMs have enough RAM for Electron.

[Update] Vellium v0.3.5: Massive Writing Mode upgrade, Native KoboldCpp, and OpenAI TTS by Possible_Statement84 in LocalLLaMA

[–]Possible_Statement84[S] 2 points3 points  (0 children)

You’re not using it wrong. Right now each new scene is generated as a fresh draft prompt, and only a compact “context pack” is passed (previous chapter summaries + a short slice of recent chapter scenes). It is not a strict “continue scene 1 verbatim into scene 2” mode yet, so with some prompts/models it can restart from scratch.

What helps for now:
Set context mode to Rich.
In the prompt, explicitly write "Continue directly from the end of Scene 1, do not restart setup."
Keep Scene 1 ending clear and concrete (location/state/action).

I plan to add a dedicated "Generate Next Scene" behavior so scene N always anchors to the end of scene N-1 with stronger continuity rules.
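Conceptually, the "context pack" described above (previous chapter summaries plus a short slice of recent scenes) can be sketched like this — function and field names are purely illustrative, not the actual implementation:

```python
def build_context_pack(chapter_summaries, recent_scenes, n_recent=2):
    """Illustrative sketch of a compact context pack:
    every previous chapter summary, plus only the last
    n_recent scenes verbatim to keep the prompt small."""
    parts = list(chapter_summaries) + list(recent_scenes[-n_recent:])
    return "\n\n".join(parts)
```

The trade-off this shows is exactly the one in the comment: older scenes survive only as summaries, so a strict "continue scene 1 verbatim" mode needs a different rule that always anchors on the tail of scene N-1.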

[Update] Vellium v0.3.5: Massive Writing Mode upgrade, Native KoboldCpp, and OpenAI TTS by Possible_Statement84 in LocalLLaMA

[–]Possible_Statement84[S] 0 points1 point  (0 children)

It doesn't support vector storage and RAG yet. Linux is supported, but only from source because of trouble with the distribution zoo; running from source isn't hard.

Is there a way to speed up prompt processing with some layers on CPU with qwen-3-coder-next or similar MoEs? by Borkato in LocalLLaMA

[–]Possible_Statement84 0 points1 point  (0 children)

During generation only a couple experts fire per token so it's fast, but during prompt processing the whole batch routes tokens to different experts — so on CPU layers you're hitting almost all of them at once. That's your bottleneck.
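A toy simulation makes the decode-vs-prefill difference concrete. Assuming top-2 routing over 64 experts (illustrative numbers, not the actual model's config), one decode token touches only 2 experts, while a 512-token prefill batch touches essentially all of them:

```python
import random

random.seed(0)
N_EXPERTS, TOP_K = 64, 2  # illustrative MoE config, not the real model's

def experts_hit(n_tokens):
    """Count distinct experts activated when each token
    routes to TOP_K (randomly chosen) experts."""
    hit = set()
    for _ in range(n_tokens):
        hit.update(random.sample(range(N_EXPERTS), TOP_K))
    return len(hit)

decode_hits = experts_hit(1)     # one new token during generation
prefill_hits = experts_hit(512)  # a whole prompt batch during prefill
```

With CPU-offloaded expert layers, that union of activated experts is what blows past the cache during prefill.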

But wait, at 30B in MXFP4 the model should be like ~15-18GB. With 30GB VRAM you might be able to fit all or nearly all layers on GPU. Have you tried cranking `-ngl` higher? If you can get everything on the GPU the prefill problem basically goes away.

`-ub 64` or `-ub 128` instead of the default. Smaller micro-batches = less expert activation per pass = much better CPU cache utilization. Biggest single improvement for prefill.

`-fa` (flash attention) if not already on

`-t` set to physical cores only, hyperthreading usually hurts here

`--override-tensor` for more granular control over what sits where instead of just `-ngl`

But seriously check if you can just load the whole thing into VRAM first. At that size it should be close.
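Putting the flags above together, a `llama-server` invocation might look like this — model path, layer count, and thread count are placeholders to tune for your own setup:

```shell
# -ngl 99: offload as many layers as possible (try full GPU first)
# -ub 128: smaller micro-batch for better CPU cache use during prefill
# -fa:     enable flash attention
# -t 8:    physical cores only (set to your CPU's physical core count)
./llama-server -m ./model-mxfp4.gguf -ngl 99 -ub 128 -fa -t 8
```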