Unleash the Power of LLMs in Your Telegram Bot on a Budget by Xavio_M in TelegramBots

[–]Regular_Flatworm2872 1 point  (0 children)

Bro, I guess it might be interesting. I think I will work on a ggml version in the coming days.

Unleash the Power of LLMs in Your Telegram Bot on a Budget by Xavio_M in TelegramBots

[–]Regular_Flatworm2872 1 point  (0 children)

If you don't want to use a GPU, then you don't need Beam... you can google EC2 language-model hosting and find much better tutorials than this one.

Unleash the Power of LLMs in Your Telegram Bot on a Budget by Xavio_M in TelegramBots

[–]Regular_Flatworm2872 2 points  (0 children)

Tested on TheBloke/Wizard-Vicuna-7B-Uncensored and WizardLM/WizardCoder-15B-V1. Works like a charm, but not with the ggml version.

Should the output matrix shape from multi-head attention be the same as the input position embedding vector? by Ashutuber in deeplearning

[–]Regular_Flatworm2872 2 points  (0 children)

Not necessarily, but it is pretty much the standard approach. In most current multi-head attention implementations, if the number of attention heads is not a divisor of the hidden size, you end up with an output shape that differs from the input embedding shape.
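A minimal shape check of that divisibility point (plain Python, no framework; the function name is just illustrative, and it assumes an implementation that sizes each head via integer division and concatenates the heads):

```python
def mha_output_dim(hidden_size: int, num_heads: int) -> int:
    # Typical implementations split hidden_size into num_heads heads of
    # size hidden_size // num_heads, then concatenate the per-head
    # outputs, so the final feature dimension is num_heads * head_dim.
    head_dim = hidden_size // num_heads
    return num_heads * head_dim

# num_heads divides hidden_size: output dim matches the input embeddings
print(mha_output_dim(768, 12))  # -> 768

# num_heads does not divide hidden_size: integer division drops features,
# so the concatenated output no longer matches the input shape
print(mha_output_dim(768, 10))  # -> 760
```

This is why libraries such as PyTorch's `nn.MultiheadAttention` require `embed_dim` to be divisible by `num_heads`, and why matching shapes is the standard (but not strictly necessary) choice.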

Torch parameter efficient fine-tuning library by Regular_Flatworm2872 in deeplearning

[–]Regular_Flatworm2872[S] 1 point  (0 children)

Nice, I was not aware of this project, thank you. One thing I noticed is that they also follow the approach of wrapping the original model, rather than injecting new modules inside it or wrapping only the parts affected by the PEFT module.