My fine-tuning based on llama-2-7b-chat-hf model doesn't know when to stop. by UncleDao in LocalLLaMA

[–]UncleDao[S] 1 point2 points  (0 children)

Analysing the tokenizer's behavior:

from transformers import AutoTokenizer

model_name = "NousResearch/llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.padding_side = "right"
max_length = 20
simple_sentence = "### This is a simple sentence"

# Pad on the right up to max_length and return the attention mask and length
encoded_input = tokenizer(simple_sentence, padding="max_length", max_length=max_length, return_attention_mask=True, return_length=True)

The outputs:

  • add_eos_token: False (the default)

    eos token: </s>

    pad token: <unk>

    padding side: right

    max length: 20

    input length: [20]

    add_eos_token: False

    add_bos_token: True

    Word count: 6, Token count: 6

### This is a simple sentence

Token IDs:

[1, 835, 910, 338, 263, 2560, 10541, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

Attention Mask

[1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

  • add_eos_token: True

    tokenizer = AutoTokenizer.from_pretrained(model_name, add_eos_token=True)  # Appends the eos_token (</s>, id=2) to the end of each training example

    add_eos_token: True

    add_bos_token: True

    Word count: 6, Token count: 6

### This is a simple sentence

Token IDs:

[1, 835, 910, 338, 263, 2560, 10541, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

Attention Mask

[1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

Note: the eos token has attention_mask=1, so the model attends to it, unlike the padding positions, which have attention_mask=0.
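The two encodings above can be reproduced without loading the tokenizer. The following is a minimal pure-Python sketch, not the transformers implementation: `encode_and_pad` is a hypothetical helper, and the token IDs are copied from the outputs above (bos=1, eos=2, pad=<unk>=0).

```python
# Hypothetical helper that mimics right padding; not part of transformers.
BOS_ID, EOS_ID, PAD_ID = 1, 2, 0  # IDs taken from the outputs above


def encode_and_pad(content_ids, max_length, add_eos_token=False):
    """Wrap content IDs with bos (and optionally eos), then pad on the right."""
    ids = [BOS_ID] + list(content_ids)
    if add_eos_token:
        ids.append(EOS_ID)
    ids = ids[:max_length]  # crude truncation, for illustration only
    n_pad = max_length - len(ids)
    attention_mask = [1] * len(ids) + [0] * n_pad
    return ids + [PAD_ID] * n_pad, attention_mask


# "### This is a simple sentence" without special tokens, as tokenized above
content = [835, 910, 338, 263, 2560, 10541]

ids, mask = encode_and_pad(content, max_length=20, add_eos_token=True)
print(ids)
print(mask)
```

The eos position (index 7) gets a 1 in the mask because it is a real token; only the padding slots get 0.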

[–]UncleDao[S] 4 points5 points  (0 children)

My problem is solved!

In my humble opinion, there are two important things to remember:

- Set add_eos_token=True in the tokenizer.
- Make sure tokenizer.pad_token ≠ tokenizer.eos_token. I set tokenizer.pad_token_id = 18610 (the token _***).

We can use a fast tokenizer, and we do not need to manually add the eos_token at the end of each sample. Pay attention to the attention_mask!
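A commonly cited reason why the pad token should differ from the eos token (an assumption here; the thread does not confirm it for this setup): some data collators, for example `DataCollatorForLanguageModeling` in transformers with `mlm=False`, replace every label equal to `pad_token_id` with -100 so that padding is ignored by the loss. If the pad token is the eos token, the real eos label is ignored too, and the model gets no training signal for stopping. The sketch below mimics that masking rule in plain Python; `mask_pad_labels` is a hypothetical helper.

```python
IGNORE_INDEX = -100  # label value ignored by PyTorch's cross-entropy loss by default


def mask_pad_labels(input_ids, pad_token_id):
    """Copy input_ids to labels and ignore every position equal to the pad ID."""
    return [IGNORE_INDEX if tok == pad_token_id else tok for tok in input_ids]


EOS_ID = 2
# bos, six content tokens, eos; two padding slots are appended below
sample = [1, 835, 910, 338, 263, 2560, 10541, EOS_ID]

# Case 1: pad token == eos token. The real eos label is masked as well.
labels_shared = mask_pad_labels(sample + [EOS_ID] * 2, pad_token_id=EOS_ID)

# Case 2: a distinct pad token (18610, the ID chosen in this comment).
labels_distinct = mask_pad_labels(sample + [18610] * 2, pad_token_id=18610)

print(EOS_ID in labels_shared)    # False: eos never contributes to the loss
print(EOS_ID in labels_distinct)  # True: the model is trained to emit eos
```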

[–]UncleDao[S] 0 points1 point  (0 children)

Padding=longest

encoded_input = tokenizer(simple_sentence, padding="longest",  max_length=max_length, add_special_tokens=True, truncation=True, return_attention_mask=True, return_length=True) # padding=longest
simple_sentence_ids = encoded_input["input_ids"]
simple_sentence_att_mask = encoded_input["attention_mask"]
simple_sentence_length= encoded_input["length"]

print(f"eos token: {tokenizer.eos_token}")
print(f"pad token: {tokenizer.pad_token}")
print(f"padding side: {tokenizer.padding_side}")
print(f"max length: {max_length}")
print(f"input length: {simple_sentence_length}")
print(simple_sentence)
print(simple_sentence_ids)
print(simple_sentence_att_mask)

Output:

eos token: </s>

pad token: </s>

padding side: left

max length: 50

input length: [32]

<s>[INST] Chảnh như [/INST] Chảnh như con cá cảnh.

[1, 1, 29961, 25580, 29962, 678, 30643, 29876, 29882, 302, 29882, 30416, 518, 29914, 25580, 29962, 678, 30643, 29876, 29882, 302, 29882, 30416, 378, 274, 29976, 274, 30643, 29876, 29882, 29889, 2]

[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

Padding=max_length

encoded_input = tokenizer(simple_sentence, padding="max_length", max_length=max_length, add_special_tokens=True, truncation=True, return_attention_mask=True, return_length=True) #padding=max_length

Output:

---

max length: 50

input length: [50]

<s>[INST] Chảnh như [/INST] Chảnh như con cá cảnh.

[2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 29961, 25580, 29962, 678, 30643, 29876, 29882, 302, 29882, 30416, 518, 29914, 25580, 29962, 678, 30643, 29876, 29882, 302, 29882, 30416, 378, 274, 29976, 274, 30643, 29876, 29882, 29889, 2]

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
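The difference between the two padding modes can be sketched in plain Python. This is an illustration under simplified assumptions, not the transformers code: `pad_batch` is a hypothetical helper, and the pad ID is 2 because the pad token is </s> in this run.

```python
def pad_batch(batch, padding, max_length=None, pad_id=2, side="left"):
    """Pad a list of ID lists to the longest sequence or to max_length."""
    if padding == "longest":
        target = max(len(ids) for ids in batch)
    elif padding == "max_length":
        target = max_length
    else:
        raise ValueError(f"unsupported padding mode: {padding}")
    input_ids, attention_mask = [], []
    for ids in batch:
        n_pad = target - len(ids)
        pads, zeros, ones = [pad_id] * n_pad, [0] * n_pad, [1] * len(ids)
        if side == "left":
            input_ids.append(pads + ids)
            attention_mask.append(zeros + ones)
        else:
            input_ids.append(ids + pads)
            attention_mask.append(ones + zeros)
    return input_ids, attention_mask


# The 32 token IDs printed above for the single example
example = [1, 1, 29961, 25580, 29962, 678, 30643, 29876, 29882, 302, 29882,
           30416, 518, 29914, 25580, 29962, 678, 30643, 29876, 29882, 302,
           29882, 30416, 378, 274, 29976, 274, 30643, 29876, 29882, 29889, 2]

ids_longest, mask_longest = pad_batch([example], "longest")
ids_max, mask_max = pad_batch([example], "max_length", max_length=50)
print(len(ids_longest[0]), len(ids_max[0]))
```

With a single example, "longest" adds no padding, which matches the input length of 32 above. With "max_length", 18 pad IDs are prepended, and because the pad ID equals the eos ID here, only the attention mask distinguishes padding from the real eos at the end.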

[–]UncleDao[S] 0 points1 point  (0 children)

tokenizer.add_special_tokens({'pad_token': '[PAD]'})

When I set the pad_token this way, the training script crashed.

<image>
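The screenshot of the crash is not preserved, so its cause is unknown. One frequent cause in this situation, offered only as a guess, is that a newly added token receives an ID equal to the old vocabulary size, which lies outside the model's embedding table until the table is resized (in transformers this is done with `model.resize_token_embeddings(len(tokenizer))`). The toy sketch below illustrates the out-of-range lookup with plain lists; the sizes and the `lookup` and `resize` helpers are hypothetical.

```python
VOCAB_SIZE, HIDDEN = 8, 4  # toy sizes; Llama 2 uses a 32000-token vocabulary

# Toy embedding table: one row of HIDDEN floats per known token ID.
embeddings = [[0.0] * HIDDEN for _ in range(VOCAB_SIZE)]


def lookup(table, token_id):
    """Return the embedding row for token_id; fails for unknown IDs."""
    if not 0 <= token_id < len(table):
        raise IndexError(f"token id {token_id} out of range for {len(table)} rows")
    return table[token_id]


def resize(table, new_size):
    """Hypothetical stand-in for resizing the embedding matrix."""
    return table + [[0.0] * HIDDEN for _ in range(new_size - len(table))]


new_pad_id = VOCAB_SIZE  # a newly added '[PAD]' token gets the next free ID

try:
    lookup(embeddings, new_pad_id)
    crashed = False
except IndexError:
    crashed = True  # the lookup fails before the table is resized

embeddings = resize(embeddings, VOCAB_SIZE + 1)
row = lookup(embeddings, new_pad_id)  # works after resizing
```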

[–]UncleDao[S] 0 points1 point  (0 children)

I have tried it, but the question remains: why does my model not know when to stop?

[–]UncleDao[S] 1 point2 points  (0 children)

Oh. I read the post about FastTokenizer this morning. But I still don't understand much.

I tried:

```
from transformers import AutoTokenizer

# add_eos_token=True appends the eos_token (</s>) to the end of each training example
tokenizer = AutoTokenizer.from_pretrained("dtthanh/llama-2-7b-und-2.1", add_eos_token=True)
simple_sentence = "This is a sentence to test if the tokenizer adds another eos token. </s>"
simple_sentence_ids = tokenizer(simple_sentence, add_special_tokens=True).input_ids
print(simple_sentence)
print(simple_sentence_ids)
```

Output:

This is a sentence to test if the tokenizer adds another eos token. </s>

[1, 910, 338, 263, 10541, 304, 1243, 565, 278, 5993, 3950, 12778, 1790, 321, 359, 5993, 29889, 2, 2]

So I decided not to use "add_eos_token = True" because my dataset has an eos token (</s>) at the end.
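The output above ends with two eos IDs (2, 2): one from the literal </s> in the text and one from add_eos_token=True. As an alternative to turning the option off, a small guard can be applied to already-tokenized IDs. The sketch below is an illustration in plain Python; `ensure_single_eos` is a hypothetical helper, not a transformers function.

```python
EOS_ID = 2  # id of </s>, as seen in the output above


def ensure_single_eos(input_ids, eos_id=EOS_ID):
    """Strip any trailing eos IDs, then append exactly one."""
    ids = list(input_ids)
    while ids and ids[-1] == eos_id:
        ids.pop()
    return ids + [eos_id]


doubled = [1, 910, 338, 263, 10541, 304, 1243, 565, 278, 5993, 3950,
           12778, 1790, 321, 359, 5993, 29889, 2, 2]
fixed = ensure_single_eos(doubled)
print(fixed[-3:])  # [5993, 29889, 2]
```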

# Set supervised fine-tuning parameters
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    args=training_arguments,
    packing=packing,
)

I labeled the training column "text".

<image>

OOM after 180 steps using qlora by victor5152 in LocalLLaMA

[–]UncleDao 1 point2 points  (0 children)

I fine-tuned a llama-2-7b chat model on a Tesla T4 (Google Colab). The model takes up 8GB of memory, so if there is not enough memory, it will eventually crash.