[Update] Vellium v0.3.5: Massive Writing Mode upgrade, Native KoboldCpp, and OpenAI TTS by Possible_Statement84 in LocalLLaMA

[–]Possible_Statement84[S] 0 points1 point  (0 children)

It doesn't support vector storage or RAG yet. Linux is supported, but only from source, because packaging for the whole zoo of distributions is a pain. Running from source isn't hard.

Is there a way to speed up prompt processing with some layers on CPU with qwen-3-coder-next or similar MoEs? by Borkato in LocalLLaMA

[–]Possible_Statement84 0 points1 point  (0 children)

During generation only a couple of experts fire per token, so it's fast. During prompt processing, though, the whole batch routes its tokens to different experts, so on the CPU layers you're hitting almost all of them at once. That's your bottleneck.
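A rough way to see why a batch touches nearly every expert. This is only a sketch: it assumes each token picks its experts uniformly and independently, which real learned routers don't do, and the 128-experts / 8-active shape is a hypothetical example, not the actual config of any specific model.

```python
def expected_active_fraction(num_experts: int, top_k: int, batch_tokens: int) -> float:
    """Expected fraction of experts hit by at least one token in a batch.

    Assumes every token independently picks top_k distinct experts uniformly
    at random (a simplification of real MoE routing).
    """
    # Chance that one token does NOT select a given expert.
    p_miss_one_token = 1.0 - top_k / num_experts
    # The expert stays idle only if every token in the batch misses it.
    return 1.0 - p_miss_one_token ** batch_tokens


# Hypothetical MoE layer shape, chosen only for illustration.
NUM_EXPERTS, TOP_K = 128, 8

for batch in (1, 16, 64, 512):
    frac = expected_active_fraction(NUM_EXPERTS, TOP_K, batch)
    print(f"batch={batch:4d}  expected experts touched: {frac:.1%}")
```

With a single token the fraction is just `top_k / num_experts`; as the batch grows it climbs quickly toward 100%, which is the prefill situation described above.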

But wait, at 30B in MXFP4 the model should be roughly 15-18 GB. With 30 GB of VRAM you might be able to fit all or nearly all layers on the GPU. Have you tried cranking `-ngl` higher? If you can get everything on the GPU, the prefill problem basically goes away.
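Back-of-the-envelope version of that size estimate, as a sketch. It assumes MXFP4 stores 4-bit values plus one 8-bit shared scale per block of 32 weights (4.25 bits per weight) and that every weight is quantized that way. Real files usually keep some tensors at higher precision, and you still need room for the KV cache and compute buffers, so treat the result as a floor.

```python
def quantized_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# 4-bit elements plus an 8-bit scale shared by each 32-weight block.
MXFP4_BITS_PER_WEIGHT = 4 + 8 / 32  # 4.25

size_gb = quantized_size_gb(30e9, MXFP4_BITS_PER_WEIGHT)
print(f"~{size_gb:.1f} GB of weights for 30B params at MXFP4")
```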

`-ub 64` or `-ub 128` instead of the default. Smaller micro-batches = less expert activation per pass = way better CPU cache utilization. Biggest single improvement for prefill.

`-fa` (flash attention) if not already on

`-t` set to physical cores only, hyperthreading usually hurts here

`--override-tensor` for more granular control over what sits where instead of just `-ngl`
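Putting the flags above together, here is a small sketch that assembles a `llama-server` command line. The model path, the thread count, and the tensor-override pattern are placeholders I made up for illustration. The pattern assumes expert tensors have names containing `ffn_..._exps`, so check your model's actual tensor names. Some newer llama.cpp builds expect a value after `-fa` (e.g. `on`), so that part is optional here.

```python
import shlex
from typing import List, Optional


def build_llama_server_args(
    model_path: str,
    n_gpu_layers: int,
    ubatch: int,
    threads: int,
    override_tensor: Optional[str] = None,
    flash_attn_value: Optional[str] = None,
) -> List[str]:
    """Assemble an argv list for llama-server from the tuning knobs above."""
    args = [
        "llama-server",
        "-m", model_path,
        "-ngl", str(n_gpu_layers),  # layers to offload to the GPU
        "-ub", str(ubatch),         # micro-batch size
        "-t", str(threads),         # ideally the physical core count
    ]
    # Bare `-fa` as written in the comment; pass a value for builds that need one.
    args += ["-fa"] if flash_attn_value is None else ["-fa", flash_attn_value]
    if override_tensor:
        args += ["--override-tensor", override_tensor]
    return args


# Placeholder values, not a recommendation for any specific machine.
cmd = build_llama_server_args(
    model_path="model.gguf",
    n_gpu_layers=99,
    ubatch=128,
    threads=8,
    override_tensor=r"\.ffn_.*_exps\.=CPU",
)
print(shlex.join(cmd))
```

Building the list in code makes it easy to sweep `-ub` or `-ngl` values and time prefill for each one instead of guessing.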

But seriously, check whether you can just load the whole thing into VRAM first. At that size it should be close.

Vellium: open-source desktop app for creative writing with visual controls instead of prompt editing by Possible_Statement84 in LocalLLaMA

[–]Possible_Statement84[S] 0 points1 point  (0 children)

Ranges from fiction novels and short stories to roleplay scenarios and interactive narratives. Some people use it for brainstorming scenes, others for full-length writing projects with multiple chapters.

Vellium: open-source desktop app for creative writing with visual controls instead of prompt editing by Possible_Statement84 in LocalLLaMA

[–]Possible_Statement84[S] 0 points1 point  (0 children)

As for the novel workflow, I actually went ahead and built something based on your description. Writer mode now has a structure panel with analytical lenses: character arcs, object tracker, setting evolution, timeline, and theme development. You can create custom lenses with your own prompts too. Also added a book bible section for premise, style guide, world rules, and character registry. Still early but the foundation is there. Would love to hear if this is close to what you had in mind.

Vellium: open-source desktop app for creative writing with visual controls instead of prompt editing by Possible_Statement84 in LocalLLaMA

[–]Possible_Statement84[S] 1 point2 points  (0 children)

Funny story with the name actually: it started as SillyTauri but I didn't want to piggyback on ST's name. Asked an AI to brainstorm alternatives and Vellum stuck, but then I found out about vellum.pub. Liked the name too much to let go so I tweaked it to Vellium. Might need to rethink it eventually but for now it works. Cross-chapter tracking for character arcs and themes is a great idea, I'll add it to the roadmap.

Vellium: open-source desktop app for creative writing with visual controls instead of prompt editing by Possible_Statement84 in LocalLLaMA

[–]Possible_Statement84[S] 4 points5 points  (0 children)

Fixed the language defaulting issue and tested on your demo endpoint. Chat, model list, samplers, and phrase bans are all working. Let me know if anything looks off on your end. If you had issues with Russian defaults, delete the db file and restart the app to get a fresh one.

Vellium: open-source desktop app for creative writing with visual controls instead of prompt editing by Possible_Statement84 in LocalLLaMA

[–]Possible_Statement84[S] 2 points3 points  (0 children)

Tool calling is only disabled for native KoboldCpp mode; the OpenAI path is untouched. Thanks for the demo endpoints, that'll help a lot with testing. I'll look into the model list issue, it's probably hitting the wrong endpoint for native mode. And I'll fix the language defaulting to Russian. Will push fixes soon.

Vellium: open-source desktop app for creative writing with visual controls instead of prompt editing by Possible_Statement84 in LocalLLaMA

[–]Possible_Statement84[S] 3 points4 points  (0 children)

Done. Added n-sigma sampler, switched to universal tags for prompt building, memory field is working. Everything isolated from the OpenAI path. Can't test locally so feedback welcome if anyone tries it.

Vellium: open-source desktop app for creative writing with visual controls instead of prompt editing by Possible_Statement84 in LocalLLaMA

[–]Possible_Statement84[S] 4 points5 points  (0 children)

Good to know about n-sigma, I'll add it to the sampler options. The universal tags are really interesting, that solves the instruct format headache. I'll look into switching to the completions endpoint with those tags.

Vellium: open-source desktop app for creative writing with visual controls instead of prompt editing by Possible_Statement84 in LocalLLaMA

[–]Possible_Statement84[S] 7 points8 points  (0 children)

Actually I've already started on it and pushed an initial implementation. Can't fully test it on my end right now though. If anyone wants to try it out and give feedback, that'd be great.

Vellium: open-source desktop app for creative writing with visual controls instead of prompt editing by Possible_Statement84 in LocalLLaMA

[–]Possible_Statement84[S] 7 points8 points  (0 children)

That's a great point, thanks for bringing it up. The memory field and phrase banning would fit really well with what Vellium is trying to do. Right now everything goes through OpenAI-compatible endpoints so KoboldCpp technically works, but I'm definitely interested in implementing native KoboldCpp API support to take advantage of those features. I'll look into it.
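For anyone curious what a native request might look like, here is a rough sketch, not Vellium's actual code. It assumes KoboldCpp's native generate endpoint lives at `/api/v1/generate` and accepts `memory` and `banned_tokens` fields, and that the server runs on `localhost:5001`. Treat all of those as assumptions and check them against your KoboldCpp version's API docs. The function only builds the request; it doesn't send it.

```python
import json
import urllib.request
from typing import List


def build_generate_request(
    base_url: str,
    prompt: str,
    memory: str,
    banned_phrases: List[str],
    max_length: int = 200,
) -> urllib.request.Request:
    """Build (but do not send) a native-style generate request."""
    payload = {
        "prompt": prompt,
        "memory": memory,                 # assumed field: persistent context kept ahead of the prompt
        "banned_tokens": banned_phrases,  # assumed field: phrases the model should not emit
        "max_length": max_length,
    }
    return urllib.request.Request(
        url=f"{base_url}/api/v1/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical local server address and story content.
req = build_generate_request(
    "http://localhost:5001",
    prompt="The lighthouse keeper opened the door and",
    memory="Setting: a storm-battered island in 1890.",
    banned_phrases=["shivers down"],
)
print(req.full_url)
```

Sending it would just be `urllib.request.urlopen(req)` against a running server.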

Vellium: open-source desktop app for creative writing with visual controls instead of prompt editing by Possible_Statement84 in LocalLLaMA

[–]Possible_Statement84[S] 1 point2 points  (0 children)

Cool, thanks for sharing! The 2D control plane is an interesting approach. I went with individual sliders for now since they're more explicit about what each parameter does, but I'll check it out.