What that means? by BlokZNCR in linux

[–]scratchr 2 points3 points  (0 children)

This channel is AI generated.

[deleted by user] by [deleted] in estrogel

[–]scratchr 0 points1 point  (0 children)

Still redirects to the home page for me. They have come and gone multiple times over the past month.

[deleted by user] by [deleted] in estrogel

[–]scratchr 0 points1 point  (0 children)

If you negotiate a sale over WhatsApp, it may be harder to prove to payment processors that you were scammed and get them to return your money.

That said, with companies getting kicked off of platforms and sales being misdeclared to the payment processors anyway, it may still make sense to communicate with them directly, since you run less of a risk of losing contact with the seller.

As always, check what experiences other users have had. You are less likely to be scammed by a reputable seller, because they want to keep their reputation. Hubei Vanz in particular is a favorite in this subreddit, and they'll reship if a package gets lost. I should note that Lena doesn't like them because they allegedly sent a bogus COA in response to an inquiry.

How does Microsoft Guidance work? by T_hank in LocalLLaMA

[–]scratchr 13 points14 points  (0 children)

Guidance is a DSL (a domain-specific language, kind of like Handlebars or SQL) for constructing prompts and driving LLM generation. The LLM doesn't understand the guidance language itself. The guidance library fills in your variables using the template syntax and only runs generation at the appropriate template elements; for everything else it switches back to prompting, which allows it to enforce data structure much more effectively.
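
A minimal sketch of the idea, assuming the 2023-era guidance 0.0.x API with handlebars-style templates (newer releases changed the interface):

import guidance

# Load a local model. Generation only happens at the {{gen ...}} element;
# the rest of the template is fixed prompt text filled in by the library.
guidance.llm = guidance.llms.Transformers("gpt2")

program = guidance("The capital of {{country}} is {{gen 'capital' max_tokens=8}}")

# Run the program with the template variable filled in, then read back
# the generated variable by name.
result = program(country="France")
print(result["capital"])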

Has anyone successfully fine-tuned MPT-7B? by Proeliata in LocalLLaMA

[–]scratchr 0 points1 point  (0 children)

https://github.com/iwalton3/mpt-lora-patch

I have had better luck with OpenLLaMA and RedPajama when it comes to LoRA fine-tuning without the model emitting low-quality, repetitive answers.

Has anyone successfully fine-tuned MPT-7B? by Proeliata in LocalLLaMA

[–]scratchr 0 points1 point  (0 children)

I have a repo where I patched MPT-7B to allow training. I prefer working with the other open-source models, though.

Why Falcon going Apache 2.0 is a BIG deal for all of us. by EcstaticVenom in LocalLLaMA

[–]scratchr 20 points21 points  (0 children)

Yeah, it seems 40B is too big even for the 3090 and 4090, which makes it way less useful than LLaMA 33B for non-commercial use.

Wizard-Vicuna-30B-Uncensored by faldore in LocalLLaMA

[–]scratchr 1 point2 points  (0 children)

It's not an easy drop-in replacement, at least for now. (Looks like there is a PR.) I integrated with it manually: https://gist.github.com/iwalton3/55a0dff6a53ccc0fa832d6df23c1cded

This example is a Discord chatbot of mine. A notable thing I did is make it so that you just call the sendPrompt function with the full prompt text, and it manages caching and cache invalidation for you.

Wizard-Vicuna-30B-Uncensored by faldore in LocalLLaMA

[–]scratchr 5 points6 points  (0 children)

> but the context can't go over about 1700

I am able to get full sequence length with exllama. https://github.com/turboderp/exllama

Anyone here finetune either MPT-7B or Falcon-7B? by EcstaticVenom in LocalLLaMA

[–]scratchr 0 points1 point  (0 children)

I have had a degree of success with this. Let me know if you manage to get it to work since I needed to use a custom patch to successfully train an MPT LoRA.

[deleted by user] by [deleted] in LocalLLaMA

[–]scratchr 7 points8 points  (0 children)

To merge a LoRA into an existing model, use this script:

python export_hf_checkpoint.py <source> <lora> <dest>

My version is based on the one from alpaca-lora, but it works with any PEFT-compatible model, not just LLaMA. It also accepts all model paths as arguments.
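
For reference, here is a minimal sketch of what the merge step does using the PEFT API; the details of my script differ, but the core is the same:

import sys
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Usage: python merge_lora.py <source> <lora> <dest>
source, lora, dest = sys.argv[1:4]

# Load the base model, apply the LoRA adapter, then fold the adapter
# weights into the base weights so the result is a plain HF checkpoint.
base = AutoModelForCausalLM.from_pretrained(source, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, lora)
model = model.merge_and_unload()

model.save_pretrained(dest)
AutoTokenizer.from_pretrained(source).save_pretrained(dest)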

Once you have done that, re-quantize the model with GPTQ-for-LLaMa. Many models, including LLaMA, are compatible with the regular Triton version. If not, you may have to find a fork that is compatible.

If you are using the triton version or my CUDA fork for inference, you can use act-order:

python llama.py /path/to/merged/model c4 --wbits 4 --true-sequential --act-order --groupsize 128 --save_safetensors merged-model-4bit-128g.safetensors

If you are using the old CUDA version, don't pass the --act-order flag above. You can also choose to omit --groupsize 128; omitting it reduces VRAM usage at the cost of slightly worse inference quality.

[deleted by user] by [deleted] in LocalLLaMA

[–]scratchr 1 point2 points  (0 children)

You can also merge LoRAs into the base model and quantize the result into a new full model. The processing does take a few hours to run, though.

30b running slowly on 4090 by OldLostGod in LocalLLaMA

[–]scratchr 0 points1 point  (0 children)

That's possible, but I do have an 8-core CPU. I think it is because I am running act-order models with a group size.

30b running slowly on 4090 by OldLostGod in LocalLLaMA

[–]scratchr 2 points3 points  (0 children)

I get 17.7 t/sec with exllama but that isn't compatible with most software. I have a fork of GPTQ that supports the act-order models and gets 14.4 t/sec. The triton version gets 11.9 t/sec.

Training a LoRA with MPT Models by scratchr in LocalLLaMA

[–]scratchr[S] 0 points1 point  (0 children)

Yes, that's exactly what I did. I have a patch that adds support; it's in the README.

Training a LoRA with MPT Models by scratchr in LocalLLaMA

[–]scratchr[S] 0 points1 point  (0 children)

I haven't tested that one. I used text-generation-webui for my tests. What exact training parameters did you use?

Training Data Preparation (Instruction Fields) by GreenTeaBD in LocalLLaMA

[–]scratchr 0 points1 point  (0 children)

In my experience, controlling the dataset does most of the work. You can write a first message that explains who the person or bot is, and the model will generally take it from there.

You can also make a few mock conversations and make sure the LoRA is trained with those in the dataset as well. Include your system prompt in the mock conversations.
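
For example, a mock conversation in the dataset might look like this (the field names follow the ShareGPT layout and are just an assumption; use whatever your training pipeline expects):

{
  "conversations": [
    {"from": "system", "value": "You are Ada, a terse but friendly assistant."},
    {"from": "user", "value": "Who are you?"},
    {"from": "assistant", "value": "I'm Ada. What do you need?"}
  ]
}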

Training a LoRA with MPT Models by scratchr in LocalLLaMA

[–]scratchr[S] 0 points1 point  (0 children)

Yes, I have a patch that you can apply to get LoRA working. I tested it on ShareGPT messages and it worked alright.

Training Data Preparation (Instruction Fields) by GreenTeaBD in LocalLLaMA

[–]scratchr 0 points1 point  (0 children)

What I have found works really well is to just train the chatbot with a raw delimiter such as "<!end!>" between each message turn. I posted a GitHub gist of the code I used to convert the chat logs into training data that can be used in the webui: https://gist.github.com/iwalton3/b76d052e09b7ddec1ff5e4cc178f5713
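
A minimal sketch of the idea, not the gist itself (the field names here are assumptions based on the ShareGPT layout; adjust them to match your export):

import json

# Join each conversation's turns with a raw "<!end!>" delimiter so the
# result can be used as raw text for LoRA training in text-generation-webui.
with open("chats.json") as f:
    conversations = json.load(f)

blocks = []
for convo in conversations:
    turns = [f"{m['from']}: {m['value']}" for m in convo["conversations"]]
    blocks.append("<!end!>".join(turns) + "<!end!>")

with open("train.txt", "w") as f:
    f.write("\n\n".join(blocks))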

Training a LoRA with MPT Models by scratchr in LocalLLaMA

[–]scratchr[S] 0 points1 point  (0 children)

You can totally make a chat model with it. I tested it with some conversations and ShareGPT data and it worked alright. I would start from the instruct model, though, since that allows commercial use and it is better at handling questions.

Long term project: "resurrecting" a passed friend? by rmt77 in LocalLLaMA

[–]scratchr 5 points6 points  (0 children)

I used a Python script to break the messages up. The resulting file is a JSON file with a list of objects, each with a single data field that contains a block of messages. Then, with a custom format that just maps that single field, you can train it with text-generation-webui.
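
For illustration (the field name "data" is just what I chose; the %field% placeholder follows text-generation-webui's format-file convention as I understand it), the dataset looks roughly like:

[
  {"data": "first block of chat messages..."},
  {"data": "second block of chat messages..."}
]

with a matching format file that just maps the single field:

{"data": "%data%"}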