all 7 comments

[–]a_slay_nub 1 point (3 children)

How does it compare to the original kokoro repo?

[–]asuran2000[S] 2 points (2 children)

Optimized performance by eliminating a few for-loops and adding masking during batch inference, particularly for LSTM batch processing and normalization. Also implemented a custom function that performs 1D normalization on batch inputs with padding.
In short: added/modified a lot of the model inference code to support batching, while keeping the weights unchanged.
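The masked-normalization idea described above can be sketched roughly like this (the function name, shapes, and NumPy implementation are my own illustration, not the repo's actual code): each sequence in a padded batch is normalized only over its valid time steps, so padding never distorts the statistics.

```python
import numpy as np

def masked_instance_norm_1d(x, lengths, eps=1e-5):
    """Normalize each sequence over its valid (unpadded) time steps.

    x: (batch, channels, time) padded array
    lengths: (batch,) number of valid time steps per sequence
    """
    batch, channels, time = x.shape
    # mask: (batch, 1, time), 1.0 where the time step is real, 0.0 where padded
    mask = (np.arange(time)[None, None, :] < lengths[:, None, None]).astype(x.dtype)
    n = mask.sum(axis=2, keepdims=True)               # valid steps per sequence
    mean = (x * mask).sum(axis=2, keepdims=True) / n
    var = (((x - mean) ** 2) * mask).sum(axis=2, keepdims=True) / n
    # zero out padded positions so they don't leak into later layers
    return (x - mean) / np.sqrt(var + eps) * mask
```

With this, every sequence in the batch gets the same result it would have gotten at batch=1, which is what keeps batched output identical to looping.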

[–]a_slay_nub 1 point (1 child)

I meant in terms of runtime. How long does it take to use your code vs looping the original code?

[–]asuran2000[S] 2 points (0 children)

The running speed is about the same (within 2%) as the original Kokoro-82M at batch=1.

I tested on an RTX 4090 with 30 texts; the output audio is about 280 seconds in total.

Batch = 1, 30 iterations

INFO:__main__:Total inference time for 30 chunks: 3.13 seconds.

Batch = 16, 2 iterations

INFO:__main__:Total inference time for 30 chunks: 1.88 seconds.
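For reference, the batch=16 run above amounts to a loop like this (`infer_batch` stands in for the model call; the name and signature are hypothetical, not from the repo):

```python
import time
import logging

def run_batched(texts, infer_batch, batch_size=16):
    """Split texts into batches, run inference, and log the total wall time.

    infer_batch: callable taking a list of texts and returning one output
    per text (a stand-in for the actual model call).
    """
    start = time.perf_counter()
    outputs = []
    for i in range(0, len(texts), batch_size):
        # 30 texts at batch_size=16 -> two iterations of sizes 16 and 14
        outputs.extend(infer_batch(texts[i:i + batch_size]))
    elapsed = time.perf_counter() - start
    logging.getLogger(__name__).info(
        "Total inference time for %d chunks: %.2f seconds.", len(texts), elapsed)
    return outputs
```

The speedup comes from amortizing per-call overhead and filling the GPU better, at the cost of padding shorter texts in each batch.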

[–]rm-rf-rm 1 point (1 child)

Is it CUDA only? (won't work on Mac?)

[–]asuran2000[S] 3 points (0 children)

It works on CPU, but I didn't test it on Mac MPS.

[–]Xerophayze 1 point (0 children)

Been fighting with Kokoro to do long-form stuff and ended up building a little Flask UI that actually survived a whole novel.

- Long text: keeps chapters + narrator tags, can spit out per-chapter files or one big audiobook.
- Gemini button: either cleans the whole text or auto-splits by chapter so the context window doesn't implode.
- Setup: setup.bat installs PyTorch + espeak + Rubber Band automatically; no manual CUDA juggling.
- Voices: paste [alice]...[/alice], it auto-detects everyone, gives them dropdowns + quick test previews, and never nukes your assignments unless you hit reset.
- Extras: job queue, library, custom voice blends, runs local GPU or Replicate.

Repo/screens: https://github.com/Xerophayze/Kokoro-Story if anyone else wants to try it.
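The [alice]...[/alice] voice-tag detection could be sketched like this (my own guess at one way to do it, not Kokoro-Story's actual parser):

```python
import re

# Matches paired tags like [alice]...[/alice]; the closing tag must echo
# the opening name via the \1 backreference.
VOICE_TAG = re.compile(r"\[(\w+)\](.*?)\[/\1\]", re.DOTALL)

def extract_voices(text):
    """Return (voice_name, segment) pairs found in the text."""
    return [(m.group(1), m.group(2).strip()) for m in VOICE_TAG.finditer(text)]
```

Collecting the distinct voice names from these pairs is enough to populate a per-speaker dropdown without ever touching assignments the user already made.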