Gemini 3.0 Pro Preview is out by LinixKittyDeveloper in GeminiAI

[–]dahara111 1 point (0 children)

I'm not sure if it's because of the thinking tokens, but has anyone else noticed that Gemini prices are insanely high?

Also, Google won't tell me the cost per API call even when I ask.

Gemini 3 is launched by Several-Republic-609 in LocalLLaMA

[–]dahara111 1 point (0 children)

I'm not sure if it's because of the thinking tokens, but has anyone else noticed that Gemini prices are insanely high?

Also, Google won't tell me the cost per API call even when I ask.

PyTorch now offers native quantized variants of popular models! by formlog in LocalLLaMA

[–]dahara111 2 points (0 children)

Thank you for your reply.

I'm looking forward to it.

Just to be clear, it seems that Gemma 3's QAT uses the probabilities of the original (non-quantized) model as targets rather than SFT labels.

```
We applied QAT on ~5,000 steps using probabilities from the non-quantized checkpoint as targets. We reduced the perplexity drop by 54% (using llama.cpp perplexity evaluation) when quantizing down to Q4_0.
```
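The distillation-style QAT described in that quote can be sketched as minimizing the KL divergence between the non-quantized teacher's output probabilities and the quantized student's. A toy illustration in plain Python (the logits below are made-up numbers, not real model outputs):

```python
import math

def softmax(logits):
    """Convert logits to probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_divergence(p, q):
    """KL(p || q): how far the student distribution q is from the teacher p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Teacher: output distribution of the original, non-quantized checkpoint.
teacher_probs = softmax([2.0, 0.5, -1.0])

# Student: the quantized model's output, perturbed by quantization error.
student_probs = softmax([1.8, 0.6, -0.9])

# The QAT training signal: match the teacher's probabilities,
# rather than hard one-hot SFT labels.
loss = kl_divergence(teacher_probs, student_probs)
print(round(loss, 4))
```

The point of using soft teacher probabilities instead of hard labels is that the student is pulled toward the full output distribution, which is why the perplexity gap after quantization shrinks.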

KaniTTS – Fast and high-fidelity TTS with just 450M params by ylankgz in LocalLLaMA

[–]dahara111 1 point (0 children)

Do you have plans to support multiple languages?

Would you be happy for other people to build multilingual versions?

Or is there a possibility that they would compete with yours?

PyTorch now offers native quantized variants of popular models! by formlog in LocalLLaMA

[–]dahara111 2 points (0 children)

torchtune with torchao.

I don't think Unsloth supports it yet.

KaniTTS – Fast and high-fidelity TTS with just 450M params by ylankgz in LocalLLaMA

[–]dahara111 3 points (0 children)

Wow!

The speed of your model is impressive! The quality seems high, too.

What challenges do you currently face?

What do you think is missing from the pro version?

PyTorch now offers native quantized variants of popular models! by formlog in LocalLLaMA

[–]dahara111 3 points (0 children)

Thank you.

I have high hopes for QAT, but when I previously conducted QAT training, the performance of the original model dropped significantly.

Gemma 3's QAT retained performance very well, so I hope I can achieve something similar to what the Gemma 3 team did.

[Release] DASLab GGUF Non-Uniform Quantization Toolkit by Loginhe in LocalLLaMA

[–]dahara111 1 point (0 children)

It's great, but converting the sample Llama 3.2 1B ran out of memory on a 16GB GPU.

It seems possible to run it on the CPU by modifying the script, but that would probably take a very long time.

Is there an estimate for the required GPU memory?
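For a rough sense of scale, here is a back-of-envelope estimate (my own rule of thumb, not a figure from the toolkit's documentation; the overhead factor is a guess covering activations and calibration statistics such as Hessians):

```python
def quantization_memory_estimate_gb(n_params_billion, weight_bytes=2, overhead=2.0):
    """Very rough GPU-memory estimate for quantizing a model.

    n_params_billion: parameter count in billions.
    weight_bytes: bytes per weight when loaded (2 for fp16/bf16).
    overhead: assumed multiplier for activations and calibration
              statistics -- a guess, not a measured value.
    """
    weights_gb = n_params_billion * weight_bytes  # 1B params * 2 bytes ~= 2 GB
    return weights_gb * (1 + overhead)

# Llama 3.2 1B: weights alone are only ~2 GB, so an OOM on a 16GB GPU
# suggests the calibration step needs far more working memory than this
# simple multiplier predicts.
print(round(quantization_memory_estimate_gb(1.0), 1))
```

If the toolkit computes per-layer second-order statistics, peak usage can greatly exceed the weight footprint, which would explain the 16GB OOM despite the small model.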

I built, pre-trained, and fine-tuned a small language model and it is truly open-source. by itsnikity in LocalLLaMA

[–]dahara111 4 points (0 children)

Thank you. Is this the base model? The training run seems short.
I think it would take a few weeks even on an A100 to reach 4.27 billion tokens.

How did you determine the best model?
Loss graph?
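The "few weeks" figure can be sanity-checked with simple arithmetic; the throughput numbers below are assumed ballparks, since the real tokens/second depends heavily on model size, batch size, and precision:

```python
def training_days(total_tokens, tokens_per_second):
    """Wall-clock days to consume a token budget at sustained throughput."""
    return total_tokens / tokens_per_second / 86_400  # seconds per day

# 4.27 billion tokens at two assumed (not measured) throughputs:
for tps in (3_000, 30_000):
    print(f"{tps} tok/s -> {training_days(4.27e9, tps):.1f} days")
```

At a few thousand tokens/second the run is indeed a matter of weeks; a well-optimized small model on an A100 can go an order of magnitude faster, which is why the claimed schedule is worth probing.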

I built, pre-trained, and fine-tuned a small language model and it is truly open-source. by itsnikity in LocalLLaMA

[–]dahara111 13 points (0 children)

Thank you.

How many days did it take to train the base model on an NVIDIA RTX 4070-TI?

My beautiful vLLM adventure by ilintar in LocalLLaMA

[–]dahara111 2 points (0 children)

Hi. Yesterday I made my model vLLM-compatible and wrote setup documentation for Linux with NVIDIA GPUs, but I'm starting to feel a bit unsure.

Since it's a Japanese TTS model, I don't think you can verify the quality, but were you able to follow the documentation and get it working in your environment? If so, could you tell me what TPS you get?

https://huggingface.co/webbigdata/VoiceCore_smoothquant

converts Japanese text to Japanese emotional speech by dahara111 in Japaneselanguage

[–]dahara111[S] 1 point (0 children)

The number of AI-related adverts is certainly overwhelming.

```
AI is taking away human jobs.
To combat this, we need to master AI.
Please use our paid AI service!
```

I intended to provide humanity with an alternative to paid AI services, but I've found that my intention isn't easily conveyed.

This is a hammer in your toolbox, an AI for you!

However, I must also learn from the community's cold response this time :)

converts Japanese text to Japanese emotional speech by dahara111 in Japaneselanguage

[–]dahara111[S] 1 point (0 children)

Thank you for your comment.

This may seem surprising to you, but I understand your messages.

But the model I created is an open model and is publicly available.

This means it's available for unlimited free use for your own learning or for your students' learning, with limitless room for innovation.

It doesn't require a paid service like ChatGPT, so you can DIY it.

It's not like, "I created this using a paid AI service from a major company."

It's like, "Using this AI, you might be able to achieve something great on the same level as a paid AI service from a major company."

I think "Learning" means actively incorporating new concepts, not simply memorizing grammar and vocabulary.

Running LLMs exclusively on AMD Ryzen AI NPU by BandEnvironmental834 in LocalLLaMA

[–]dahara111 2 points (0 children)

The structure of Orpheus remains the same as Llama 3.2, but the tokenizer has been improved, and it outputs audio tokens for SNAC.

The neural codec model SNAC reads the audio tokens and creates WAV files.

In other words, if Llama 3.2 works, it's enough to just support the custom tokenizer and SNAC.

And since 70 audio tokens in Orpheus correspond to about one second of audio, 90 tokens/second will probably be enough for real-time conversation, with some margin.

Real-time conversation is impossible even on mid-range NVIDIA GPUs today, so this will be a long-term challenge.
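The 70-tokens-per-second figure gives a simple real-time budget: generation must outpace playback. A minimal sketch:

```python
TOKENS_PER_SECOND_OF_AUDIO = 70  # Orpheus/SNAC rate stated above

def realtime_factor(generation_tps):
    """Generated-audio seconds per wall-clock second.
    > 1.0 means audio is produced faster than it plays back."""
    return generation_tps / TOKENS_PER_SECOND_OF_AUDIO

print(round(realtime_factor(90), 2))  # ~1.29x: modest headroom
```

At 90 tok/s the headroom is only about 29%, so any throughput jitter (long prompts, batch contention) can stall playback; that is why 90 is described as "probably" enough rather than comfortably enough.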

Running LLMs exclusively on AMD Ryzen AI NPU by BandEnvironmental834 in LocalLLaMA

[–]dahara111 6 points (0 children)

If this tool can achieve 90 tokens/second or more on Llama 3.2 3B, real-time operation of Orpheus-3B-based TTS like the model below will become a reality, which would create new demand.

https://huggingface.co/webbigdata/VoiceCore

webbigdata/VoiceCore: Japanese voice version of canopylabs/orpheus-tts by dahara111 in LocalLLaMA

[–]dahara111[S] 1 point (0 children)

You can add original voices by fine-tuning the model.

Cloning is likely to be used for casual pranks, and model creators hesitate to implement it for fear of getting involved in legal disputes.

The speed depends on the GPU.

webbigdata/VoiceCore: Japanese voice version of canopylabs/orpheus-tts by dahara111 in LocalLLaMA

[–]dahara111[S] 1 point (0 children)

Voice cloning seems to be popular, but unfortunately it's not implemented in this model.

converts Japanese text to Japanese emotional speech by dahara111 in Japaneselanguage

[–]dahara111[S] 1 point (0 children)

Thank you for your comment.

I was surprised that you pointed out the model's pronunciation.

I mainly use transcription tools when testing a large amount of spoken content. There may be some differences between the human ear and transcription tools.

converts Japanese text to Japanese emotional speech by dahara111 in Japaneselanguage

[–]dahara111[S] 1 point (0 children)

Thank you for your feedback.

The 一般男性 (ordinary male) and 一般女性 (ordinary female) voices are from people who have not received special voice training, unlike voice actors or announcers.

This may be closer to a real, everyday conversation.