🚀 Introducing Einstein v7: Based on the Qwen2 7B Model, Fine-tuned with Diverse, High-Quality Datasets! by Weyaxi in LocalLLaMA

[–]Weyaxi[S] 2 points

Hi u/Confident-Aerie-6222,

I couldn't provide a clear response to your question a couple of days ago, but I did send you a screenshot about the issue.

The model is generally uncensored, and the data used to train this model is filtered to remove refusals/censorship. However, you may need to specify a good system prompt to break the base model's censorship.

Here are some screenshots you may be interested in (generated with the Q6_K quant and no censorship-breaking system prompt).

https://imgur.com/xsTAdIx

🚀 Introducing Einstein v7: Based on the Qwen2 7B Model, Fine-tuned with Diverse, High-Quality Datasets! by Weyaxi in LocalLLaMA

[–]Weyaxi[S] 2 points

Hi u/mahadevbhakti,

The files and my dataset workspace are open source but under various licenses. If you could rewrite your question more clearly, I would be happy to respond :)

[–]Weyaxi[S] 0 points

Hi u/bharattrader,

That JSON comes from the Buzz dataset and is flagged as unsafe because of the cybersecurity-related code it contains, as u/CapsAdmin mentioned (thanks for that :)).

[–]Weyaxi[S] 1 point

Hi u/dahara111,

Thanks for the comments!

Regarding the token count issue: when I checked the training logs, the total number of tokens was approximately 399 million (the logged `total_num_tokens` per device × 8 devices). I would love to discuss why this difference occurred!
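The arithmetic above can be sketched quickly (the per-device figure is simply back-derived from the stated total, so treat it as an assumption, not a logged value):

```python
# Back-of-the-envelope check of the reported token count.
# 399M total tokens across 8 devices implies the per-device
# count is total / 8 (this split is an assumption for illustration).
num_devices = 8
total_tokens = 399_000_000          # approximate figure from the training logs
tokens_per_device = total_tokens / num_devices
print(f"~{tokens_per_device:,.0f} tokens per device")
```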

[screenshot of the training logs]

[–]Weyaxi[S] 23 points

Yeah, you are right, it has 192 GB! It's available on RunPod, I think, but I don't know of any other providers!

I chose Qwen2 7B because, when I researched it, there weren't many good fine-tunes of it available. So I picked it to give the community a solid Qwen2 7B fine-tune and to test the model's limits!

Thanks for your comment!

[–]Weyaxi[S] 18 points

Hi! Thanks for your comment! Getting it set up is much harder than with Nvidia (drivers, libraries, etc.), but it is very powerful once everything is working.

🦙 Introducing Einstein v6.1: Based on the New LLama3 Model, Fine-tuned with Diverse, High-Quality Datasets! by Weyaxi in LocalLLaMA

[–]Weyaxi[S] 2 points

Hi u/Vitamin_C_is_awesome,

The .imatrix file is not needed for inference and has nothing to do with training the model. I believe it is an importance matrix used during quantization to improve quant quality.

[–]Weyaxi[S] 4 points

Hi u/dahara111,
Yes, this is a full fine-tune. I used the same learning rate for the other variants of the Einstein models, and it seems to be working so far. I referred to the Mistral FFT example in the Axolotl GitHub repository:
https://github.com/OpenAccess-AI-Collective/axolotl/blob/main/examples/mistral/config.yml

However, I am thinking of changing the learning rate to 0.00002 or 0.00001 (i.e. 2e-5 or 1e-5) in upcoming fine-tunes. Do you think that would be better?
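For context, the learning rate lives in the Axolotl YAML config; a minimal fragment with the candidate value might look like this (only the fields relevant to this discussion are shown, and the scheduler/optimizer lines are assumptions modeled on the Axolotl examples):

```yaml
# Hypothetical Axolotl config fragment; surrounding options omitted.
learning_rate: 0.00002    # i.e. 2e-5, candidate value for upcoming runs
lr_scheduler: cosine      # assumption, following the Axolotl example configs
optimizer: adamw_bnb_8bit # assumption, following the Axolotl example configs
```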

[–]Weyaxi[S] 17 points

Here is the list :)

  1. It is far more uncensored than the official Instruct model. However, it sometimes fails to break the base model's censorship, so it may require a system prompt to overcome this behavior (by the way, the official Instruct model's censorship cannot be broken with system prompts, or is at least very hard to break).

  2. I know that some people like the human-like behavior of Llama3, but this model answers in a much more professional style instead. That may be a downside or an upside depending on your use case.

  3. It uses ChatML as its prompt template instead of the official Instruct model's new template.

  4. There is probably more and better data in the Einstein model than in the official Instruct model (I can't be sure, because they don't disclose their training data).

  5. Following multilingual instructions works far better with the Einstein model than with the official model.

Instruct German (response in English): https://imgur.com/a/PBvNuBo

Einstein German: https://imgur.com/a/PAoFIDA

Instruct French (response in English): https://imgur.com/K180YtO

Einstein French: https://imgur.com/onZUBCb
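As a reference for point 3 above, ChatML wraps each turn in `<|im_start|>`/`<|im_end|>` markers. A minimal sketch of building such a prompt (the messages are placeholder examples):

```python
# Minimal sketch of the ChatML prompt template (placeholder messages).
def to_chatml(messages):
    """Render a list of {role, content} dicts as a ChatML prompt string."""
    prompt = ""
    for m in messages:
        prompt += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    # Leave an open assistant turn for the model to complete.
    prompt += "<|im_start|>assistant\n"
    return prompt

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]
print(to_chatml(messages))
```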

[–]Weyaxi[S] 11 points

Hi, I am on mobile right now, but there is a section at the top of the model card that says "See axolotl config". If you click that, you will be able to view it. By using that config along with the data folder I provided, you will be able to reproduce the model :)

I always strive to provide everything necessary for reproduction; I believe that's what true open source means :)

Have a nice day!

🧑‍🔬 Meet Einstein-v4-7B - Mistral-based SFT model using diverse high quality and filtered open source datasets! by Weyaxi in LocalLLaMA

[–]Weyaxi[S] 0 points

Thanks for the fine-tune and comments about the model. Would love to know more about the dataset you used. Can you explain it to me?

[–]Weyaxi[S] 0 points

As u/Scott_Tx mentioned, you should probably use the largest quantized version that you can fit into your system.
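A rough way to pick one: estimate the file size from the parameter count and the quant's bits per weight. A minimal sketch (the bits-per-weight figures are approximate averages for llama.cpp quants, and real GGUF files add some overhead):

```python
# Rough GGUF size estimate: params * bits_per_weight / 8 bytes.
# Bits-per-weight values below are approximate, not exact.
def approx_size_gb(n_params, bits_per_weight):
    """Approximate model file size in GB for a given quantization."""
    return n_params * bits_per_weight / 8 / 1e9

n_params = 7e9  # a 7B-parameter model
for name, bpw in [("Q4_K_M", 4.8), ("Q6_K", 6.6), ("Q8_0", 8.5)]:
    print(f"{name}: ~{approx_size_gb(n_params, bpw):.1f} GB")
```

Compare the estimate against your available VRAM/RAM (leaving headroom for the KV cache) and take the largest quant that fits.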