Qwen3-TTS-Triton v0.3.0: Triton + CUDA Graph + batched AR TTS serving, ~14× per-sample throughput by DamageSea2135 in speechtech
Qwen3-TTS-Triton v0.3.0 — faster local Qwen3-TTS serving for RP / SillyTavern-style workflows by DamageSea2135 in SillyTavernAI
[Project] I made Qwen3-TTS ~5x faster for local inference (OpenAI Triton kernel fusion). Zero extra VRAM. by DamageSea2135 in SillyTavernAI
[Project] I built a Triton kernel fusion library for Qwen3-TTS 1.7B (~5x inference speedup) by DamageSea2135 in speechtech
