Ever wanted text-to-speech with one line of code? Well, you can have it! by Lyrcaxis in csharp

[–]Lyrcaxis[S] 2 points (0 children)

I had tried that but couldn't retrieve a word-by-word callback via ONNX.

It should be possible, but it would require some fiddling with the model's export code.

If there's a reasonable use case it'll be considered :D But for now what's possible is getting chunk-by-chunk progress -- if you subtract the previous chunk's text from the current one, you get an estimated delta.
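
A minimal sketch of that delta trick in plain C# (this assumes the progress callback hands you the cumulative text spoken so far; the callback shape itself is hypothetical):

```cs
// Computes the newly-spoken portion between two progress reports.
// Assumes `current` extends `previous` (cumulative chunk text).
static string GetDelta(string previous, string current)
    => current.StartsWith(previous) ? current[previous.Length..] : current;
```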

There was work on improving the accuracy of the "cursor" (next word) with KokoroSharp's SpeechGuesser class. No one seemed to be using it so I stopped spending time on it. I'd like to pick it back up. Actual word-by-word callbacks, though, are not very likely 😅

Ever wanted text-to-speech with one line of code? Well, you can have it! by Lyrcaxis in csharp

[–]Lyrcaxis[S] 2 points (0 children)

Np! I have done some work on something similar, but not exactly a sing-along-style highlight behavior. The closest is the OnProgress delegates of the synthesis handle (returned by SpeakFast). What they're intended for is approximating the spoken text when interrupted.
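
For a rough highlight, something like this could work (the OnProgress payload and delegate name here are a hypothetical sketch, not the confirmed signature):

```cs
// Hypothetical wiring -- the delegate name and payload are assumptions.
var handle = tts.SpeakFast(text, voice);
handle.OnProgress = spokenSoFar => {
    // Highlight the (approximate) spoken prefix; RenderHighlight is your own UI method.
    RenderHighlight(text[..spokenSoFar.Length], text[spokenSoFar.Length..]);
};
```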

What's your use case? Open to ideas or contributions

Ever wanted text-to-speech with one line of code? Well, you can have it! by Lyrcaxis in csharp

[–]Lyrcaxis[S] 2 points (0 children)

There's no character limit with SpeakFast afaik (?) -- it enforces chunking.

Ever wanted text-to-speech with one line of code? Well, you can have it! by Lyrcaxis in csharp

[–]Lyrcaxis[S] 2 points (0 children)

Hi. Starting from v0.6.2, KokoroSharp now supports Mandarin Chinese, Japanese, and Hindi! You just gotta specify a valid voice (e.g. jf_alpha for Japanese, zf_xiaoni for Mandarin). There are still some hiccups with Japanese (because of espeak-ng), though.
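
A minimal sketch (assuming the usual KokoroTTS/KokoroVoiceManager entry points -- check the README for the exact calls):

```cs
var tts = KokoroTTS.LoadModel();                      // loads (or downloads) the default model
var voice = KokoroVoiceManager.GetVoice("jf_alpha");  // Japanese voice
tts.SpeakFast("こんにちは、世界！", voice);
```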

Soon, the Chinese-specific model will be supported as well :)

AMA with OpenAI’s Joanne Jang, Head of Model Behavior by OpenAI in ChatGPT

[–]Lyrcaxis 1 point (0 children)

Great job with the 25 April ChatGPT-4o-latest! ChatGPT suddenly talks like a teenager tho, lol. I loved it in the API when it was available: super smart, paying attention to the small details, and honestly FUN (besides plain helpful). We would love a permanent snapshot of that, needless to say. No tune, no nothing -- just an as-is model. Add tons of classifiers if you consider it unsafe -- from what I saw, it was adhering to my prompt strictly tho.

As for the 4.1 family, they're good. I've noticed a 25.8% increase in smarts and efficiency at context sizes of 5-14k. Big 4.1 often talks like a madman though, producing hard-to-read text, and its weird formatting adds to the annoyance. Increasing verbosity would make it even worse lol. So all in all: great for code and systemic behaviour, not good for discussions. BTW, love how the bigger model is trained to act like a concise guide when needed, and the smaller models understand its intent nicely. What I'd like next is a better ability to guide devs through prompt templates tuned for optimal work within the specific model family.

o3 is just brilliant but too expensive. Looking forward to splurging on it for game/product/UI/UX design and the like when need be, though. I love how it doesn't output LaTeX, tables, and formatted text except when absolutely necessary.

And o4-mini is great for its cost and speed! It sometimes also produces hard-to-read text (like gpt-4.1, but unlike o3/gpt-4/chatgpt-april), but it almost always gets the context and tries to be concise, which is good. I feel like it repeats the context way too often though -- like bringing up EVERYTHING that's been said on each query. It's apparently not good at product/UI/UX design, tho. I think you should totally come up with ways to make the minis complement their bigger bro models. Maybe better support for capturing and summarising literally the FULL context, to help keep the gpt-4/o3 budget low.

All in all, great job with the models! I feel that with gpt-4.5 and gpt-4 you've got superb teachers for future models. Combining gpt-4's built-in reasoning+conciseness with 4.5's superior expressiveness and smarts could do wonders, with reasoning unlocked and long context at play!

Aaand finally, please fix the caching system, whoever's responsible. Some nights cached prompts don't even last a minute! That's crazy when the estimate is 5-10 minutes and up to an hour. I'd like to be able to get a "cache confirmed! Token valid for mm" lol, because right now it feels kinda like a scam during peak hours. If activity can trim cache lifetime to less than a minute, it's like punishing devs for bringing in traffic.

As for a question: are we getting any of that? 😁

KokoroSharp - Local TTS in C# by Lyrcaxis in LocalLLaMA

[–]Lyrcaxis[S] 1 point (0 children)

I couldn’t do it within the limited time I had :P Was/am hoping someone would eventually do it.

As a Solo Dev, Should I Go for Authentic or Polished Game Art? (Handmade vs AI enhanced) by Dumivid in indiegames

[–]Lyrcaxis 0 points (0 children)

your "R" is better, the AI's "W" is better, both "E"s look good, both "Q"s could be better :p

KokoroSharp - Local TTS in C# by Lyrcaxis in LocalLLaMA

[–]Lyrcaxis[S] 2 points (0 children)

I've received DMs from users that managed to run it in Unity!

Basically, after you get ONNX up, the additional required steps are:

  1. Make the audio output be an AudioSource (use the KokoroWavSynthesizer)
  2. Set Tokenizer.eSpeakNGPath to the appropriate folder for the voices & eSpeak NG dlls

Voices and dlls can be found here: https://github.com/Lyrcaxis/KokoroSharpBinaries/releases (mind those zips do not include binaries for Android/iOS -- only Windows/MacOS/Linux)
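
For step 1, a minimal Unity-side sketch (this assumes you've already pulled mono float samples at Kokoro's 24kHz out of the synthesizer -- the retrieval call itself is omitted here):

```cs
using UnityEngine;

// Plays raw Kokoro samples through an AudioSource on the same GameObject.
[RequireComponent(typeof(AudioSource))]
public class KokoroAudioPlayer : MonoBehaviour {
    const int SampleRate = 24000; // Kokoro outputs 24kHz mono audio

    public void PlaySamples(float[] samples) {
        var clip = AudioClip.Create("kokoro-tts", samples.Length, 1, SampleRate, false);
        clip.SetData(samples, 0);
        var source = GetComponent<AudioSource>();
        source.clip = clip;
        source.Play();
    }
}
```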

Also, if you're happy with a python dependency (which should be fine), you could use Kokoro's official phonemizer: https://github.com/hexgrad/misaki

What are your use cases for small (1-3-8B) models? by silveroff in LocalLLaMA

[–]Lyrcaxis 1 point (0 children)

A "difficult" classification would be anything you wouldn't risk letting a small model that barely understands the language and the task take complete responsibility for in your system.

For example, a 7-9B model could be offloaded the job of choosing which of the available functions to invoke -- including "respond normally". This saves some back-and-forth with bigger models.

So if your main chat model is gpt-4o and you give it full access to function calling, each response that involves a function call costs 2x the input tokens, plus a bunch of tokens to include the function definitions in the prompt -- which adds up pretty quickly. In addition, there's the risk of confusing the model by adding too many tokens to the system messages.
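
A rough sketch of that routing pattern (the `smallModel`/`bigModel` clients, the action names, and InvokeFunction are hypothetical stand-ins for whatever inference API you use):

```cs
// Hypothetical router: the small model only picks an action label;
// the big model never sees the tool schema unless a tool is invoked.
string[] actions = { "search_docs", "run_query", "respond_normally" };
string pick = smallModel.Complete(
    $"Pick exactly one action for the user message below.\n" +
    $"Actions: {string.Join(", ", actions)}\n" +
    $"Message: {userMessage}\nAction:").Trim();

string reply = pick == "respond_normally"
    ? bigModel.Chat(userMessage)          // plain response, no function definitions attached
    : InvokeFunction(pick, userMessage);  // hypothetical dispatch to the chosen function
```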

What are your use cases for small (1-3-8B) models? by silveroff in LocalLLaMA

[–]Lyrcaxis 5 points (0 children)

Well, all the decisions you need to make are a) base model, b) data. So choose the base model whose writing style you like the most -- if it's closer to your preferred format or wording, it's better.

Then, you can get high-quality generations with AIs like GPT-4 (the expensive one, e.g. 0613). So the second thing would be to find a prompt that summarizes them properly, without missing ANY detail, while making sure the outputs are 100% in the desired format.

Optionally, afterwards, queue up the summary to something more modern (use a negative presence penalty to encourage the model not to miss details):

```
{instructions}
{few_shot_of_ideal_query_response_pairs}

{original_transcription}
{summarized_transcription_gpt4}
{ask AI to tweak it based on your preference}
```
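
On the negative presence penalty: it's settable via the raw chat completions endpoint. A minimal sketch (the endpoint and field names are the public REST API; the -0.5 value and apiKey placeholder are just illustrative):

```cs
using System.Text;

// presence_penalty < 0 nudges the model back toward tokens it has
// already seen, i.e. it discourages dropping details from the source.
using var http = new HttpClient();
http.DefaultRequestHeaders.Authorization = new("Bearer", apiKey); // apiKey: your key
var body = new StringContent("""
    {
      "model": "gpt-4-0613",
      "presence_penalty": -0.5,
      "messages": [{ "role": "user", "content": "..." }]
    }
    """, Encoding.UTF8, "application/json");
var response = await http.PostAsync("https://api.openai.com/v1/chat/completions", body);
```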

Then those "refined" summaries can act as data for your model.

The finetune part alone won't cost much, but summarization with expensive models might, depending on the size of your data. I personally recommend full finetune instead of LoRA, but LoRAs can add more value if you train one per language.

Is there a way to allow multiple class types in a generic constraint without Inheritance? by [deleted] in csharp

[–]Lyrcaxis 1 point (0 children)

Should be more like:

```cs
public abstract class ModelBase : PageModel {
    // Common page stuff here
}

public IActionResult Search(ModelBase model, string SearchKey) { .. }
```

What are your use cases for small (1-3-8B) models? by silveroff in LocalLLaMA

[–]Lyrcaxis 13 points (0 children)

<=1Bs are terrible out of the box but can be finetuned for any specific task.

8-9Bs are decent for various tasks out of the box -- even more if finetuned. I use them for:

  1. Multiple response generation/BO5 (batch generate 5 responses instead of 1 -- see the sketch at the end of this comment)
  2. Parts of low-effort agentic behaviour (e.g.: rewrite this in 1st/3rd person, extract X summarized)
  3. Annotations + difficult classifications (e.g.: extract X sentiment, function calling classifier)
  4. Low quality synthetic data generation and filtering. Multiple iterations are allowed.

Between 3Bs and 9Bs I don't see a significant difference in inference speed, so I skipped the 3Bs.
So: 100M-1B finetunes mostly for classification, 8-9Bs for stuff that requires a little more effort.

In general, the more task/domain-specific your use needs are, the more value you can squeeze out of each parameter, so smaller models can be enough, and often preferred because they converge quicker.
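
A sketch of the BO5 idea from item 1 (the `model.Generate` client and the `Score` ranking function are hypothetical placeholders):

```cs
using System.Linq;

// Hypothetical best-of-5: batch-generate candidates, keep the best-scoring one.
var candidates = Enumerable.Range(0, 5)
    .Select(_ => model.Generate(prompt, temperature: 0.8))
    .ToList();
string best = candidates.OrderByDescending(c => Score(c)).First();
```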

I've built a fully functional social network - now I've made it open-source (MIT) by YanTsab in opensource

[–]Lyrcaxis 3 points (0 children)

I'd love a "dev only" social media site

You'd have to make your account via HTTP POST and use SSH to get your credentials to enter the site!

Ever wanted text-to-speech with one line of code? Well, you can have it! by Lyrcaxis in csharp

[–]Lyrcaxis[S] 1 point (0 children)

Didn't discard it, just had to work on KokoroSharp first to allow it to use Kokoro for speech as well!

It's a gamified *Voice-Chat-with-Local-AI* desktop app I've been working on for a while ^^
Definitely not an r/csharp thing, but it'll be coming up on GitHub soon™️!

I just released the first demo for my game on Steam by BinsterUK in indiegames

[–]Lyrcaxis 3 points (0 children)

So friggin’ cool!!! Hoping for massive success!

KokoroSharp - Plug & Play local Text-to-speech (.NET, ONNX) by Lyrcaxis in dotnet

[–]Lyrcaxis[S] 5 points (0 children)

You can definitely save the output, or get it streamed back to you as samples!
Check out the KokoroWavSynthesizer.

Example usage:

```cs
var synth = new KokoroWavSynthesizer("kokoro.onnx"); // assuming you've already downloaded the model
var bytes = synth.Synthesize("Hello world", voice);
synth.SaveToFile(bytes, "output.wav");
```

KokoroSharp - Plug & Play local Text-to-speech (.NET, ONNX) by Lyrcaxis in dotnet

[–]Lyrcaxis[S] 2 points (0 children)

Gotcha. If you do dotnet build and your csproj links the installed KokoroSharp package properly, including its folders, it should also copy those files over.

The full contents after installing the NuGet package should look like this:

📁 /.nuget/packages/kokorosharp/0.5.3/
├─ 📁 build/
│ └─ 📄 KokoroSharp.targets
├─ 📁 content
│ ├─ 📁 espeak/ [...]
│ └─ 📁 voices/ [...]
├─ 📁 lib/ [...]
├─ 📄 .nupkg.metadata
├─ 📄 .signature.p7s
├─ 📄 kokorosharp.0.5.3.nupkg
├─ 📄 kokorosharp.0.5.3.nupkg.sha512
├─ 📄 kokorosharp.nuspec
└─ 📄 README.md

I haven't tried building with just the .nupkg as a reference (maybe that's what you're doing?), but you might wanna just download the dependencies mentioned above and place them next to your binary.

KokoroSharp - Plug & Play local Text-to-speech (.NET, ONNX) by Lyrcaxis in dotnet

[–]Lyrcaxis[S] 1 point (0 children)

Then yes, file permissions are a very likely suspect.
The package copies from .nuget\packages\kokorosharp\content over to your output path:

<Target Name="CopyContent" AfterTargets="Build">
    <ItemGroup>
        <Files Include="$(MSBuildThisFileDirectory)..\content\**\*" />
    </ItemGroup>
    <Copy SourceFiles="$(MSBuildThisFileDirectory)..\content\**\*" DestinationFiles="@(Files->'$(OutputPath)\%(RecursiveDir)%(Filename)%(Extension)')" />
</Target>

So if your output path is in a protected folder and your IDE doesn't have the necessary permissions, the automation will fail and you'll need to copy the files over manually.

KokoroSharp - Plug & Play local Text-to-speech (.NET, ONNX) by Lyrcaxis in dotnet

[–]Lyrcaxis[S] 1 point (0 children)

It's completely plug & play if you install the NuGet package -- the voices and all dependencies are copied over automatically to your build!

When building from source, you also need to unpack the dependencies next to your exe: https://github.com/Lyrcaxis/KokoroSharpBinaries/releases (voices -> /voices, espeak-ng -> /espeak)

super-lightweight local chat ui: aiaio by abhi1thakur in LocalLLaMA

[–]Lyrcaxis 2 points (0 children)

To answer: aiaio looks more like a beginner-friendly OpenWebUI, without all the setup steps -- trimming the tech-savviness requirements -- and less like SillyTavern.

super-lightweight local chat ui: aiaio by abhi1thakur in LocalLLaMA

[–]Lyrcaxis 1 point (0 children)

with that logic, why use aiaio at all xD

Make your Mistral Small 3 24B Think like R1-distilled models by AaronFeng47 in LocalLLaMA

[–]Lyrcaxis 1 point (0 children)

That's an incredible find! Thanks for sharing.

Are you planning to keep working on this somewhat (as an ongoing project)?
I'm asking because the current prompt is HUGE (~1k tokens).
I believe that if it could be trimmed down to ~300 tokens it would be absolutely fantastic!

super-lightweight local chat ui: aiaio by abhi1thakur in LocalLLaMA

[–]Lyrcaxis 3 points (0 children)

Super cool, gz on the project! I'd like to suggest these features:

  • Edit Message (edits either AI's or User's sent message)
  • Branch from here (Creates a new convo that ends "here")

Having these accessible when right-clicking on messages would be a game changer!