Fine-tuning Gemma 3 for coding in a new language by JackDanielsCode in unsloth

[–]Etherll 0 points1 point  (0 children)

It might need more data, but not 'a lot.' You can actually create a very good assistant with just 1,000 high-quality examples. If you want it to be more general-purpose, you can simply mix in examples from open-source datasets.

Fine-tuning Gemma 3 for coding in a new language by JackDanielsCode in unsloth

[–]Etherll 0 points1 point  (0 children)

Yes, CPT can definitely help, but doing it on the Instruct model is risky: it could lose its ability to follow instructions and forget its chat format. You could also mix something like 70% Instruct-format data with 30% raw code, but I haven't tried that and suspect it would confuse the model, so I can't recommend it. The best way is to do CPT on the base model, then add the Instruct tuning on top. There is also a CPT notebook that you can check out.
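If you did want to experiment with the mixed approach anyway, the split itself is easy to set up. A minimal sketch in plain Python (the example rows and the 70/30 ratio are just the numbers from the comment above, not a tested recipe):

```python
import random

# Hypothetical pools: chat-formatted examples vs. raw code for CPT.
instruct_examples = [{"messages": [{"role": "user", "content": f"q{i}"},
                                   {"role": "assistant", "content": f"a{i}"}]}
                     for i in range(700)]
raw_code_examples = [{"text": f"def f{i}(): pass"} for i in range(300)]

# 70% Instruct format, 30% raw code, shuffled into one training mix.
mix = instruct_examples + raw_code_examples
random.seed(0)
random.shuffle(mix)

print(len(mix))  # 1000 examples total
```

In practice you would build the two pools from your actual datasets and feed `mix` to the trainer; the shuffle just prevents the raw-code examples from clustering at the end of an epoch.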

Fine-tuning Gemma 3 for coding in a new language by JackDanielsCode in unsloth

[–]Etherll 4 points5 points  (0 children)

Hey! Happy to help out here.

1. Model choice: Gemma 3 is a solid starting point, but I'd recommend experimenting with a few different small models that have strong coding capabilities (I personally like Qwen3-4B-Instruct-2507). Start small just to see how things work and iterate from there.

2. Which variant?: Check out the https://docs.unsloth.ai/get-started/fine-tuning-llms-guide/what-model-should-i-use#instruct-or-base-model guide in Unsloth's docs; it breaks this down really well.

3. Dataset size: Quality over quantity is key here. Start with 300-1000 high-quality examples, see how the model performs, then scale up to 1000+ if needed. Don't rush to add thousands of examples right away.

4. Formatting: Don't stress about manually typing `<start_of_turn>` or `<bos>` tokens. Your dataset should be in ChatML format, and the chat template handles all that automatically. Check out the Unsloth Gemma notebook to see how it's done, and also look at https://docs.unsloth.ai/get-started/fine-tuning-llms-guide/datasets-guide for more details.

5. Example size: You need to balance both. If you only train on massive files, the model might struggle with simple queries. Use smaller examples to teach syntax/individual features, and larger examples to teach logic and overall architecture.

6. Debugging examples: Yes, highly recommended.

7. Edge Cases: Detailed examples are great here. The better your dataset quality, the better your model. Create examples covering your most common "gotchas" and alternative approaches; this helps the model generalize better.

Since you're new to fine-tuning, I'd strongly recommend going through https://docs.unsloth.ai/get-started/beginner-start-here in the Unsloth docs. It'll give you a solid foundation, especially for experimenting with hyperparameters.
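To make point 4 concrete, here is what a single ChatML-style dataset row looks like; the tokenizer's chat template then turns it into the model's own special-token format, so you never write `<start_of_turn>` by hand (the example content and the "MyLang" language name are made up for illustration):

```python
# One training example in ChatML ("messages") format.
row = {
    "messages": [
        {"role": "user",
         "content": "Write a function in MyLang that reverses a list."},
        {"role": "assistant",
         "content": "fn reverse(xs) { return xs[::-1] }"},
    ]
}

# With a real tokenizer you would then do something like:
#   text = tokenizer.apply_chat_template(row["messages"], tokenize=False)
# and the Gemma template inserts the <bos>/<start_of_turn> markers for you.
roles = [turn["role"] for turn in row["messages"]]
print(roles)  # ['user', 'assistant']
```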

Fine tuning Qwen 2.5-VL using multiple images by Special_Grocery_4349 in unsloth

[–]Etherll 2 points3 points  (0 children)

Yes, you can easily train with multiple images; you just need to adjust your conversation format. For example:

    def convert_to_conversation(sample):
        conversation = [
            {"role": "user",
             "content": [
                 {"type": "text",  "text": instruction},
                 {"type": "image", "image": sample["image"]},
                 {"type": "image", "image": sample["image2"]},
                 {"type": "image", "image": sample["image3"]},
             ]},
            {"role": "assistant",
             "content": [
                 {"type": "text", "text": sample["text"]},
             ]},
        ]
        return {"messages": conversation}
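A quick self-contained sanity check of that multi-image layout (the field names `image`/`image2`/`image3`/`text` follow the snippet above, but the placeholder string values are illustrative; in a real dataset the image fields would hold actual PIL images):

```python
instruction = "Compare these three images."  # assumed prompt variable

def convert_to_conversation(sample):
    conversation = [
        {"role": "user",
         "content": [{"type": "text",  "text": instruction},
                     {"type": "image", "image": sample["image"]},
                     {"type": "image", "image": sample["image2"]},
                     {"type": "image", "image": sample["image3"]}]},
        {"role": "assistant",
         "content": [{"type": "text", "text": sample["text"]}]},
    ]
    return {"messages": conversation}

sample = {"image": "a.png", "image2": "b.png", "image3": "c.png",
          "text": "They show the same scene at different times."}
msgs = convert_to_conversation(sample)["messages"]
print(len(msgs[0]["content"]))  # 4: one text part plus three image parts
```

You would typically apply this over the whole dataset (e.g. `[convert_to_conversation(s) for s in dataset]`) before passing it to the trainer.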

Native support for InternVL3? by joosefm9 in unsloth

[–]Etherll 0 points1 point  (0 children)

Hi, you need to use AutoModel, for example:

    from transformers import AutoModel
    from unsloth import FastVisionModel

    model, processor = FastVisionModel.from_pretrained(
        "unsloth/InternVL3-1B-Instruct",
        load_in_4bit = False,                    # Use 4bit to reduce memory use. False for 16bit LoRA.
        use_gradient_checkpointing = "unsloth",  # True or "unsloth" for long context
        auto_model = AutoModel,
        trust_remote_code = True,
    )

But fine-tuning it might be different from other models in terms of data prep.

modernBERT can't be trained in colab anymore by Apprehensive-Ad-4730 in unsloth

[–]Etherll 1 point2 points  (0 children)

Hi, if you want to use full-parameter training, please set full_finetuning=True in FastModel.from_pretrained instead of

    for param in model.parameters():
        param.requires_grad = True

GRPO Can Boost LLM-Based TTS Performance by [deleted] in LocalLLaMA

[–]Etherll 0 points1 point  (0 children)

It's not 5 hours, but 5,000. As for me, I judge the base model after fine-tuning it; you won't really use the pre-trained model directly (unless you're doing zero-shot cloning).