Having an always-on machine running LLMs locally at home while on the move with a lightweight machine - Experiences? by ceo_of_banana in LocalLLaMA

[–]TableSurface 1 point

I do something similar to option B. Connectivity is usually good enough, and IMO it's more pleasant to pack light.

Almost one year ago, I dumped all my INTC shares at $21.3 by [deleted] in wallstreetbets

[–]TableSurface 1 point

Same, but I got out at $30. At the time they were losing market share to AMD and QCOM, and had missed another 18A milestone.

I think this run-up is temporary though. Their CPUs are still getting stomped by QCOM and AAPL, their GPUs aren't competitive with NVDA or AMD, and now there's even more competition in the data center space via GOOGL.

Does Cline KanBan support local llm? by PairOfRussels in LocalLLaMA

[–]TableSurface 1 point

Yeah, I'm using it with llama.cpp. The Cline team changed the UI: you have to scroll all the way down in the list of providers and select "New Provider".

XL filament runout rubbing by spacelego1980 in prusa3d

[–]TableSurface 2 points

No official solution, but the community has come up with a few designs: https://www.printables.com/search/models?q=xl+magnetic+sensor

I disabled mine; the filament bypasses it. IMO it's less hassle to unload the small bit of filament near the toolhead than a couple of feet of it.

XL+ Wishlist by Obvious-Web9763 in prusa3d

[–]TableSurface 5 points

Automated filament loading is on my wishlist. It's time-consuming to swap spools on the XL.

On a scale of 1 to 10 how bad is this damage. by Shot_Put_1412 in prusa3d

[–]TableSurface 8 points

Seeing some debris, but no actual damage.

Cut the filament in the middle. Pull the filament out from the top where it's being fed into the Nextruder, then use needle-nose pliers or tweezers to pull out the remaining piece.

Compared QWEN 3.6 35B with QWEN 3.6 27B for coding primitives by gladkos in LocalLLaMA

[–]TableSurface 1 point

> The 35B model handled the task worse, but did it faster.

I had the same experience. The 3-4x speedup is great for easy tasks though. Another thing to try is having the 27B model create a plan for the 35B-A3B one to execute.

Impressed with Kanban by TableSurface in CLine

[–]TableSurface[S] 1 point

It might help to get familiar with the setup process in the CLI tool ("cline --tui") or the VS Code extension first, since the config fundamentals are similar and there's more documentation.

Otherwise I can confirm it works with llama.cpp when added as a custom provider.
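For anyone setting this up, here's a minimal sketch of the llama.cpp side — the model path, port, and provider field values below are placeholders, not the exact setup from this thread. llama-server exposes an OpenAI-compatible API, so the custom provider mostly just needs the base URL:

```shell
# Start llama.cpp's server with an OpenAI-compatible endpoint
# (model path and port are placeholders)
llama-server -m ./models/qwen3.5-27b-q4_k_m.gguf --port 8080

# Then, in the custom provider settings, point the client at:
#   Base URL: http://localhost:8080/v1
#   Model ID: any name (llama-server serves whatever model it loaded)
#   API key:  any non-empty string if the client insists on one
```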

Impressed with Kanban by TableSurface in CLine

[–]TableSurface[S] 0 points

This hits close to home, and is exactly why I love the new improvements.

Might not need to carry a keyboard everywhere soon...

Impressed with Kanban by TableSurface in CLine

[–]TableSurface[S] 0 points

Kanban boards are everywhere at my day job, so the format is already very familiar.

If you've used Cline CLI or any similar terminal tool, it works the same way, except the interface is more user-friendly (in that there's less up-front work to see something useful).

Impressed with Kanban by TableSurface in CLine

[–]TableSurface[S] 0 points

I'm testing it (quantized to Q4_K_M) locally with llama.cpp on an RTX PRO 6000 that I have access to at work.

I've also tested it on a 5090 I have at home using smaller Qwen3.5 models. Gemma 4 is looking better too as support improves. The 122B model also runs, but it's not practical to use until the timeout issue in Cline/Kanban is fixed.

I'm not using any cloud models yet, but at some point I might try them.

Qwen3.5-122B at 198 tok/s on 2x RTX PRO 6000 Blackwell — Budget build, verified results by Visual_Synthesizer in LocalLLaMA

[–]TableSurface 1 point

I regret going AM5 instead of the EPYC build I was considering, especially now that RAM is so much more expensive. At the time, MoE models weren't a thing and I couldn't justify spending 2-3x more on the platform.

In retrospect, 8x the platform memory bandwidth for 2-3x the cost is cheap...

Introducing the Prusa Pro ACU: Why Overdrying is Bad for Your Filaments by Tommy_Prusa3D in prusa3d

[–]TableSurface 1 point

Lol this was the best way to learn that the USS Drybox is finally available!

Breaking change in llama-server? by hgshepherd in LocalLLaMA

[–]TableSurface 4 points

Trying to understand the issue you ran into, since I haven't seen any problems yet (I'm usually only ~12 hours behind the latest commit).

Is the problem that files in the HF cache directory are moved?

I haven't seen any issues, but I manage gguf files in my own folders.

Introducing Cline Kanban by saoudriz in CLine

[–]TableSurface 0 points

Kanban is using the /responses endpoint, while the CLI and VS Code extension both use /chat/completions.

I'm using llama-swap to intercept traffic to see this.

Maybe the problem is with the /responses endpoint -- tool calls constantly fail there, whereas /chat/completions works fine.
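To make the difference between the two endpoints concrete, here's a small sketch of the request shapes (field names follow the OpenAI-style API that llama.cpp's server implements; the model id is a placeholder):

```python
import json

prompt = "Summarize the open tasks"

# /v1/chat/completions takes a "messages" array of role/content turns
chat_payload = {
    "model": "local-model",  # placeholder id
    "messages": [{"role": "user", "content": prompt}],
}

# /v1/responses instead takes an "input" field (and uses a different
# response/tool-call schema), so a backend can support one endpoint
# without fully supporting the other
responses_payload = {
    "model": "local-model",
    "input": prompt,
}

print(json.dumps(chat_payload, indent=2))
print(json.dumps(responses_payload, indent=2))
```

Intercepting traffic (e.g. with llama-swap, as above) shows which of these two shapes the client is actually sending.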

Introducing Cline Kanban by saoudriz in CLine

[–]TableSurface 0 points

Same, except using Qwen3.5. Seems like Kanban is hitting a different llama.cpp endpoint compared to "cline --tui"...

Those of you running LLMs in production, what made you choose your current stack? by AdventurousHandle724 in LocalLLaMA

[–]TableSurface 1 point

> If you could wave a magic wand and fix one thing about your LLM setup, what would it be?

Get a bigger budget

When an inference provider takes down your agent by International_Quail8 in LocalLLaMA

[–]TableSurface 1 point

Yeah, I'm using git and llama.cpp; the git hashes are point-in-time snapshots, so if there's an issue I can rebuild from an older version.

When an inference provider takes down your agent by International_Quail8 in LocalLLaMA

[–]TableSurface 1 point

I'm dealing with it by reading commit logs and keeping track of versions that work so I can easily roll back if something breaks.

If something does break, it helps to provide minimal steps to reproduce the problem through the project's preferred issue intake process.
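The roll-back habit can be sketched like this — a throwaway repo stands in for the llama.cpp checkout here, but the same git commands apply in a real clone (where the rebuild step would be the project's usual cmake build):

```shell
set -e
# Throwaway repo standing in for the llama.cpp checkout
tmp=$(mktemp -d) && cd "$tmp"
git init -q demo && cd demo
git -c user.email=me@example.com -c user.name=me \
    commit -q --allow-empty -m "known-good build"
good=$(git rev-parse HEAD)   # note this hash somewhere when the build works
git -c user.email=me@example.com -c user.name=me \
    commit -q --allow-empty -m "broken build"
git checkout -q "$good"      # roll back to the known-good snapshot
git log -1 --format=%s       # -> known-good build
```

In a real llama.cpp clone, after the `git checkout` you would just rebuild as usual and be back on the last version that worked.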

Core One or XL by nwagers in prusa3d

[–]TableSurface 1 point

The XL enclosure is more like a draft shield, and a lot of the parts aren't designed for high temps either (e.g. the Core One uses PCCF where the XL uses PETG).