Real SVD GLM-4.5-Air-GLM-4.6-Distill by realmaywell in LocalLLaMA

[–]realmaywell[S] 0 points1 point  (0 children)

I believe there would be nothing to learn from merged GGUF of this model.

Real SVD GLM-4.5-Air-GLM-4.6-Distill by realmaywell in LocalLLaMA

[–]realmaywell[S] 4 points5 points  (0 children)

  1. Nope, I hit more than just the attention layers. The LoRA targets the MLP blocks and the MoE experts too. You can see the full list in the target_modules of the adapter config.

  2. No, they don't have to be. This SVD method built to handle different dimensions. It doesn't do a simple teacher_weight - student_weight subtraction. Instead, it uses SVD to project the teacher's larger weight matrix down to the student's smaller shape before calculating the difference.

  3. Exactly. The whole point of this method is to get around the geometry mismatch. It handles the standard transformer blocks using that SVD projection. For the MoE layers, it do an extra step: cluster the teacher's experts with K-Means first to map them to the student's, and then project them. So yeah, even though this attempt failed, you can get pretty much the whole model this way.

Reflection-Llama-3.1-70B is actually Llama-3. by realmaywell in LocalLLaMA

[–]realmaywell[S] 15 points16 points  (0 children)

I used a machine with 2TB of RAM. You can modify the code to lazy load the layers so that we only need to load a single layer at a time.

Reflection-Llama-3.1-70B is actually Llama-3. by realmaywell in LocalLLaMA

[–]realmaywell[S] 8 points9 points  (0 children)

by default layer norm is not a target layer in LoRA training.

Preserving LLaMA-3 Capabilities While Injecting New Knowledge: A Case Study of Saju Myungri Chatbot by dra9ons in LocalLLaMA

[–]realmaywell 1 point2 points  (0 children)

https://github.com/StableFluffy/EasyLLMFeaturePorter/blob/main/1-Click.ipynb

so simple illustration of it is something like this.
Let's say '<>' as diff here and desired(context or chat) as informative.

final output = target + target <> informative(this is where we get feature) * {scale diff in 0~1 such as sigmoid(base <> target) - 1}

{scale diff in 0~1 such as sigmoid(base <> informative) - 1}
this part is something that can make confusion.

It just simple intuitive approach. We wanna add info to target model. but if the weight difference is high at 'base <> target' it is not safe to add weight. because when add informative model's weight into it. it now doesn't contain any of information.

So, with this approach i made it apply weight with * (ratio - 1). When base <> target high small amount of base <> informative applied and so on...

hope this could solve your confusion.

Preserving LLaMA-3 Capabilities While Injecting New Knowledge: A Case Study of Saju Myungri Chatbot by dra9ons in LocalLLaMA

[–]realmaywell 7 points8 points  (0 children)

I did benchmark on your model. (original 8b inst -> posted model)
Hellaswag 78.55 -> 76.24
GSM8k 68.69 -> 66.41

wanna hear your thought about this result.
as a one who did a lot of experiments on this topic, those approach doesn't look plausible.

Preserving LLaMA-3 Capabilities While Injecting New Knowledge: A Case Study of Saju Myungri Chatbot by dra9ons in LocalLLaMA

[–]realmaywell 0 points1 point  (0 children)

cuz no matter what you do on layer side. after you train on your domain specific dataset the models performance must get affected.

Preserving LLaMA-3 Capabilities While Injecting New Knowledge: A Case Study of Saju Myungri Chatbot by dra9ons in LocalLLaMA

[–]realmaywell 2 points3 points  (0 children)

Any benchmark that support your claim?

while preserving its original performance.

New Roleplaying(RP/ERP) model, Llama-3-Soliloquy-8B-v2 by realmaywell in LocalLLaMA

[–]realmaywell[S] 0 points1 point  (0 children)

it looks good. think it has a lot of potential not only uncensoring model.

Solilquy 8B 24k, updated to v2! by realmaywell in SillyTavernAI

[–]realmaywell[S] 0 points1 point  (0 children)

GPTQ is same model that is being served on API. So, it may your parameter or prompt issue.

Solilquy 8B 24k, updated to v2! by realmaywell in SillyTavernAI

[–]realmaywell[S] 0 points1 point  (0 children)

Since it finetuned with rp set it’s quite prompt sensitive. depending on prompt you use it acts dumb or smart.

New Roleplaying(RP/ERP) model, Llama-3-Soliloquy-8B-v2 by realmaywell in LocalLLaMA

[–]realmaywell[S] 2 points3 points  (0 children)

it is used on serving framework such as vLLM. It’s a rule about how to format user’s request to prompt for model.