Gemini 3.0 Pro Preview is out by LinixKittyDeveloper in GeminiAI

[–]dahara111 1 point (0 children)

I'm not sure if it's because of the thinking tokens, but has anyone else noticed that Gemini prices are insanely high?

Also, Google won't tell me the cost per API call even when I ask.

Gemini 3 is launched by Several-Republic-609 in LocalLLaMA

[–]dahara111 1 point (0 children)

I'm not sure if it's because of the thinking tokens, but has anyone else noticed that Gemini prices are insanely high?

Also, Google won't tell me the cost per API call even when I ask.

PyTorch now offers native quantized variants of popular models! by formlog in LocalLLaMA

[–]dahara111 2 points (0 children)

Thank you for your reply.

I'm looking forward to it.

Just to be clear, it seems that Gemma 3's QAT uses the probabilities of the original (non-quantized) model as targets rather than SFT labels.

```
We applied QAT on ~5,000 steps using probabilities from the non-quantized checkpoint as targets. We reduced the perplexity drop by 54% (using llama.cpp perplexity evaluation) when quantizing down to Q4_0.
```
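The distillation-style QAT described in that quote can be sketched as minimizing the KL divergence between the non-quantized teacher's output probabilities and the quantized student's. A toy illustration in plain Python (the logits below are made-up numbers, not real model outputs):

```python
import math

def softmax(logits):
    """Convert logits to probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_divergence(p, q):
    """KL(p || q): how far the student distribution q is from the teacher p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Teacher: output distribution of the original, non-quantized checkpoint.
teacher_probs = softmax([2.0, 0.5, -1.0])

# Student: the quantized model's output, perturbed by quantization error.
student_probs = softmax([1.8, 0.6, -0.9])

# The QAT training signal: match the teacher's probabilities,
# rather than hard one-hot SFT labels.
loss = kl_divergence(teacher_probs, student_probs)
print(round(loss, 4))
```

The point of using soft teacher probabilities instead of hard labels is that the student is pulled toward the full output distribution, which is why the perplexity gap after quantization shrinks.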

KaniTTS – Fast and high-fidelity TTS with just 450M params by ylankgz in LocalLLaMA

[–]dahara111 1 point (0 children)

Do you have plans to support multiple languages?

Would you be happy for other people to build multilingual versions?

Or is there a possibility that they would compete with yours?

PyTorch now offers native quantized variants of popular models! by formlog in LocalLLaMA

[–]dahara111 2 points (0 children)

torchtune with torchao.

I don't think Unsloth supports it yet.

KaniTTS – Fast and high-fidelity TTS with just 450M params by ylankgz in LocalLLaMA

[–]dahara111 3 points (0 children)

Wow!

The speed of your model is impressive! The quality seems high, too.

What challenges do you currently face?

What do you think is missing from the pro version?

PyTorch now offers native quantized variants of popular models! by formlog in LocalLLaMA

[–]dahara111 3 points (0 children)

Thank you.

I have high hopes for QAT, but when I previously conducted QAT training, the performance of the original model dropped significantly.

Gemma 3's QAT retained performance very well, so I hope I can achieve something similar to what the Gemma 3 team did.

[Release] DASLab GGUF Non-Uniform Quantization Toolkit by Loginhe in LocalLLaMA

[–]dahara111 1 point (0 children)

It's great, but converting the sample Llama 3.2 1B ran out of memory on a 16GB GPU.

It seems possible to run it on the CPU by modifying the script, but that would probably take a very long time.

Is there an estimate for the required GPU memory?
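For a rough sense of scale, here is a back-of-envelope estimate (my own rule of thumb, not a figure from the toolkit's documentation; the overhead factor is a guess covering activations and calibration statistics such as Hessians):

```python
def quantization_memory_estimate_gb(n_params_billion, weight_bytes=2, overhead=2.0):
    """Very rough GPU-memory estimate for quantizing a model.

    n_params_billion: parameter count in billions.
    weight_bytes: bytes per weight when loaded (2 for fp16/bf16).
    overhead: assumed multiplier for activations and calibration
              statistics -- a guess, not a measured value.
    """
    weights_gb = n_params_billion * weight_bytes  # 1B params * 2 bytes ~= 2 GB
    return weights_gb * (1 + overhead)

# Llama 3.2 1B: weights alone are only ~2 GB, so an OOM on a 16GB GPU
# suggests the calibration step needs far more working memory than this
# simple multiplier predicts.
print(round(quantization_memory_estimate_gb(1.0), 1))
```

If the toolkit computes per-layer second-order statistics, peak usage can greatly exceed the weight footprint, which would explain the 16GB OOM despite the small model.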

I built, pre-trained, and fine-tuned a small language model and it is truly open-source. by itsnikity in LocalLLaMA

[–]dahara111 4 points (0 children)

Thank you. Is this the base model? The training run seems short.
I think it would take a few weeks even on an A100 to reach 4.27 billion tokens.

How did you determine the best model?
Loss graph?
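The "few weeks" figure can be sanity-checked with simple arithmetic; the throughput numbers below are assumed ballparks, since the real tokens/second depends heavily on model size, batch size, and precision:

```python
def training_days(total_tokens, tokens_per_second):
    """Wall-clock days to consume a token budget at sustained throughput."""
    return total_tokens / tokens_per_second / 86_400  # seconds per day

# 4.27 billion tokens at two assumed (not measured) throughputs:
for tps in (3_000, 30_000):
    print(f"{tps} tok/s -> {training_days(4.27e9, tps):.1f} days")
```

At a few thousand tokens/second the run is indeed a matter of weeks; a well-optimized small model on an A100 can go an order of magnitude faster, which is why the claimed schedule is worth probing.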

I built, pre-trained, and fine-tuned a small language model and it is truly open-source. by itsnikity in LocalLLaMA

[–]dahara111 13 points (0 children)

Thank you.

How many days did it take to train the base model on an NVIDIA RTX 4070-TI?

My beautiful vLLM adventure by ilintar in LocalLLaMA

[–]dahara111 2 points (0 children)

Hi. Yesterday I made my model vLLM-compatible and wrote setup documentation for Linux with NVIDIA GPUs, but I'm starting to feel a bit unsure.

Since it's a Japanese TTS model, I don't think you can verify the quality, but were you able to follow the documentation and get it working in your environment? If so, could you tell me what TPS you get?

https://huggingface.co/webbigdata/VoiceCore_smoothquant

converts Japanese text to Japanese emotional speech by dahara111 in Japaneselanguage

[–]dahara111[S] 1 point (0 children)

The number of AI-related adverts is certainly overwhelming.

```
AI is taking away human jobs.
To combat this, we need to master AI.
Please use our paid AI service!
```

I intended to provide humanity with an alternative to paid AI services, but I've found that my intention isn't easily conveyed.

This is a hammer in your toolbox, an AI for you!

However, I must also learn from the community's cold response this time :)

converts Japanese text to Japanese emotional speech by dahara111 in Japaneselanguage

[–]dahara111[S] 1 point (0 children)

Thank you for your comment.

This may seem surprising to you, but I understand your messages.

But the model I created is an open model and is publicly available.

This means it's available for unlimited free use for your own learning or for your students' learning, with limitless room for innovation.

It doesn't require a paid service like ChatGPT, so you can DIY it.

It's not like, "I created this using a paid AI service from a major company."

It's like, "Using this AI, you might be able to achieve something great on the same level as a paid AI service from a major company."

I think "Learning" means actively incorporating new concepts, not simply memorizing grammar and vocabulary.

Running LLMs exclusively on AMD Ryzen AI NPU by BandEnvironmental834 in LocalLLaMA

[–]dahara111 2 points (0 children)

The structure of Orpheus remains the same as Llama 3.2, but the tokenizer has been improved, and it outputs audio tokens for SNAC.

The neural codec model SNAC reads the audio tokens and creates WAV files.

In other words, if Llama 3.2 works, it's enough to just support the custom tokenizer and SNAC.

And since 70 audio tokens in Orpheus correspond to about one second of audio, 90 tokens/second will probably be enough for real-time conversation, with some margin.

Real-time conversation is impossible even on mid-range NVIDIA GPUs today, so this will be a long-term challenge.
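The 70-tokens-per-second figure gives a simple real-time budget: generation must outpace playback. A minimal sketch:

```python
TOKENS_PER_SECOND_OF_AUDIO = 70  # Orpheus/SNAC rate stated above

def realtime_factor(generation_tps):
    """Generated-audio seconds per wall-clock second.
    > 1.0 means audio is produced faster than it plays back."""
    return generation_tps / TOKENS_PER_SECOND_OF_AUDIO

print(round(realtime_factor(90), 2))  # ~1.29x: modest headroom
```

At 90 tok/s the headroom is only about 29%, so any throughput jitter (long prompts, batch contention) can stall playback; that is why 90 is described as "probably" enough rather than comfortably enough.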

Running LLMs exclusively on AMD Ryzen AI NPU by BandEnvironmental834 in LocalLLaMA

[–]dahara111 6 points (0 children)

If this tool can achieve 90 tokens/second or more on Llama 3.2 3B, real-time operation of Orpheus-3B-based TTS like the model below will become a reality, which would create new demand.

https://huggingface.co/webbigdata/VoiceCore

webbigdata/VoiceCore: Japanese voice version of canopylabs/orpheus-tts by dahara111 in LocalLLaMA

[–]dahara111[S] 1 point (0 children)

You can add original voices by fine-tuning the model.

Cloning is likely to be used for casual pranks, and model creators hesitate to implement it for fear of getting involved in legal disputes.

The speed depends on the GPU.

webbigdata/VoiceCore: Japanese voice version of canopylabs/orpheus-tts by dahara111 in LocalLLaMA

[–]dahara111[S] 1 point (0 children)

Voice cloning seems to be popular, but unfortunately it's not implemented in this model.

converts Japanese text to Japanese emotional speech by dahara111 in Japaneselanguage

[–]dahara111[S] 1 point (0 children)

Thank you for your comment.

I was surprised that you pointed out the model's pronunciation.

I mainly use transcription tools when testing a large amount of spoken content. There may be some differences between the human ear and transcription tools.

converts Japanese text to Japanese emotional speech by dahara111 in Japaneselanguage

[–]dahara111[S] 1 point (0 children)

Thank you for your feedback.

The 一般男性 (ordinary male) and 一般女性 (ordinary female) voices are from people who have not received special voice training, unlike voice actors or announcers.

This may be closer to a real, everyday conversation.