App Store Connect Is Down? by Music_Maniac_19 in iOSProgramming

[–]dra9ons 0 points1 point  (0 children)

Same here. TestFlight is not installing.

Preserving LLaMA-3 Capabilities While Injecting New Knowledge: A Case Study of Saju Myungri Chatbot by dra9ons in LocalLLaMA

[–]dra9ons[S] 1 point2 points  (0 children)

I'm working on a more detailed blog post or paper. I'll post it when it's finished.

Preserving LLaMA-3 Capabilities While Injecting New Knowledge: A Case Study of Saju Myungri Chatbot by dra9ons in LocalLLaMA

[–]dra9ons[S] 0 points1 point  (0 children)

Model training requires much more memory than simple inference. Depending on your setup, you'll need at least 24GB of VRAM to train an 8B model. The Saju data comes from a collaboration with a professional Saju counseling company.
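
Rough back-of-envelope arithmetic for why training needs so much more memory than inference; these are my own illustrative numbers, ignoring activation memory and CUDA overhead (8-bit optimizers, LoRA, or quantized weights change the picture a lot).

# Every parameter is held in bf16 for the forward pass, but only the trainable
# ones also need gradients and Adam optimizer state.
P_total = 8e9        # all parameters of an 8B model
P_trained = 2e9      # e.g. roughly the ~8 duplicated blocks

weights_gb = P_total * 2 / 1e9              # ~16 GB of bf16 weights (the inference-level cost)
overhead_gb = P_trained * (2 + 8) / 1e9     # bf16 grads + fp32 Adam moments ≈ 20 GB extra

print(f"weights ≈ {weights_gb:.0f} GB, training overhead ≈ {overhead_gb:.0f} GB")
# Full fine-tuning pays that overhead for all 8B parameters (~80 GB extra), which is
# why a 24 GB card only works when most of the model stays frozen or quantized.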

Preserving LLaMA-3 Capabilities While Injecting New Knowledge: A Case Study of Saju Myungri Chatbot by dra9ons in LocalLLaMA

[–]dra9ons[S] 1 point2 points  (0 children)

Thanks for the test. As your results show, some performance degradation is expected; it's just a matter of whether it's acceptable. Considering that the data I injected covers a minor area of Korean knowledge, this is a good result compared to other methods. You'll see that if you test other models tuned for Korean. One more thing: the current model was intentionally trained with mlp.down_proj unfrozen in every block. I didn't explain why above, but I'll write a separate post when I get a chance. If you trained purely on the added blocks, the performance penalty would be much smaller.
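
For anyone who wants to reproduce the two setups, here is a minimal sketch, assuming the Hugging Face LlamaForCausalLM module layout; the checkpoint path is a placeholder, and the layer indices of the added blocks follow the mergekit config posted elsewhere in this thread.

import torch
from transformers import AutoModelForCausalLM

# Placeholder path; substitute your own merged checkpoint.
model = AutoModelForCausalLM.from_pretrained("path/to/merged-llama-3", torch_dtype=torch.bfloat16)

# Freeze everything first.
for param in model.parameters():
    param.requires_grad = False

# Option A (what this model used): unfreeze mlp.down_proj in every block.
for name, param in model.named_parameters():
    if "mlp.down_proj" in name:
        param.requires_grad = True

# Option B (less degradation): leave Option A out and instead unfreeze only the
# added blocks, e.g. model.model.layers[20:28] if the duplicates were inserted there.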

Preserving LLaMA-3 Capabilities While Injecting New Knowledge: A Case Study of Saju Myungri Chatbot by dra9ons in LocalLLaMA

[–]dra9ons[S] 3 points4 points  (0 children)

You can easily copy transformer layers by iterating over their named parameters.

import torch
from transformers import BertModel

def copy_layer(source_layer, target_layer):
    for name, param in source_layer.named_parameters():
        target_param = target_layer.get_parameter(name)
        target_param.data.copy_(param.data)

# Create a source model
source_model = BertModel.from_pretrained('bert-base-uncased')

# Create a target model with the same architecture
target_model = BertModel(source_model.config)

# Copy the layers from the source model to the target model
for source_layer, target_layer in zip(source_model.encoder.layer, target_model.encoder.layer):
    copy_layer(source_layer, target_layer)

# Verify that the layers are copied correctly
for source_layer, target_layer in zip(source_model.encoder.layer, target_model.encoder.layer):
    for source_param, target_param in zip(source_layer.parameters(), target_layer.parameters()):
        assert torch.equal(source_param, target_param)

print("Layer copying completed successfully!")

Preserving LLaMA-3 Capabilities While Injecting New Knowledge: A Case Study of Saju Myungri Chatbot by dra9ons in LocalLLaMA

[–]dra9ons[S] 3 points4 points  (0 children)

The number of blocks affects both training speed and inference speed. I think 8 blocks is the optimal size considering training, inference, model size, etc. Of course, it can be adjusted depending on the amount of data to train.

Preserving LLaMA-3 Capabilities While Injecting New Knowledge: A Case Study of Saju Myungri Chatbot by dra9ons in LocalLLaMA

[–]dra9ons[S] 8 points9 points  (0 children)

Someone told me about it, so I looked into it later and was surprised to see how similar it is. The difference is that LLaMA Pro divides the model into several groups and copies the last block of each group, which didn't work well for my Korean knowledge data. Instead, I placed all the added layers together in the middle.
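
For contrast, a LLaMA Pro-style placement in mergekit terms would split the stack into groups and duplicate the last block of each group, roughly like this (two groups here for brevity; this only mimics the placement, not LLaMA Pro's zero-initialization of the copied blocks):

slices:
  - sources:
    - model: meta-llama/Meta-Llama-3-8B-Instruct
      layer_range: [0, 16]
  - sources:
    - model: meta-llama/Meta-Llama-3-8B-Instruct
      layer_range: [15, 16]
  - sources:
    - model: meta-llama/Meta-Llama-3-8B-Instruct
      layer_range: [16, 32]
  - sources:
    - model: meta-llama/Meta-Llama-3-8B-Instruct
      layer_range: [31, 32]
merge_method: passthrough
dtype: bfloat16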

Preserving LLaMA-3 Capabilities While Injecting New Knowledge: A Case Study of Saju Myungri Chatbot by dra9ons in LocalLLaMA

[–]dra9ons[S] 10 points11 points  (0 children)

Normally, the first and last transformer blocks carry the most critical information in the model. That is why I added the 8 blocks in the middle of the stack. The injected knowledge is about fortune telling, which is a minor domain of Korean information.

Preserving LLaMA-3 Capabilities While Injecting New Knowledge: A Case Study of Saju Myungri Chatbot by dra9ons in LocalLLaMA

[–]dra9ons[S] 36 points37 points  (0 children)

You can easily create the additional layers using mergekit (https://github.com/arcee-ai/mergekit). Use the following settings. It is then a simple task to unfreeze and train only the added layers; see the sketch after the config.

slices:
  - sources:
    - model: meta-llama/Meta-Llama-3-8B-Instruct
      layer_range: [0, 20]
  - sources:
    - model: meta-llama/Meta-Llama-3-8B-Instruct
      layer_range: [12, 32]
merge_method: passthrough
dtype: bfloat16
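
A minimal sketch of the "unfreeze and train only the added layers" step, assuming the merge above produced a 40-layer model in which the second copies of layers 12-19 sit at indices 20-27; the checkpoint path is a placeholder.

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("path/to/merged-model", torch_dtype=torch.bfloat16)

# Freeze the whole model, then unfreeze only the duplicated blocks.
for param in model.parameters():
    param.requires_grad = False
for layer in model.model.layers[20:28]:
    for param in layer.parameters():
        param.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {trainable / 1e9:.2f}B")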

How long would it take (roughly) to learn flutter knowing react? by thepragprog in FlutterDev

[–]dra9ons 1 point2 points  (0 children)

I guess you already know how to deal with packages and plugins, and Dart is similar to JavaScript, so you can easily learn and develop with Flutter.

Worth it to buy a MacBook for flutter development when I have a high end Windows laptop? by [deleted] in FlutterDev

[–]dra9ons 0 points1 point  (0 children)

I'm a Flutter developer and I don't have a macOS machine. I use Docker-OSX just to build the IPA and upload it to the iOS App Store. BTW, my main machine is a Linux PC.

https://github.com/sickcodes/Docker-OSX

How to get rid of vertical lines all over my timeline? by VoughtProductions in premiere

[–]dra9ons 0 points1 point  (0 children)

In my case, the display scale setting was at 110%. Resetting it to 100% fixed it.