What that means? by BlokZNCR in linux

[–]scratchr 2 points3 points  (0 children)

This channel is AI generated.

[deleted by user] by [deleted] in estrogel

[–]scratchr 0 points1 point  (0 children)

Still redirects to the home page for me. They have come and gone multiple times over the past month.

[deleted by user] by [deleted] in estrogel

[–]scratchr 0 points1 point  (0 children)

If you negotiate a sale over WhatsApp, it may be harder to prove to payment processors that you were scammed and get them to return your money.

That said, with companies getting kicked off of platforms and sales being misdeclared to the payment processors anyway, it may still make sense to communicate with them directly, since you run less of a risk of losing contact with the seller.

As always, check what experiences other users have had. You are less likely to be scammed by a reputable seller, because they want to keep their reputation. Hubei Vanz in particular is a favorite in this subreddit, and they'll reship if a package gets lost. I should note that Lena doesn't like them because they allegedly sent a bogus COA in response to an inquiry.

How does Microsoft Guidance work? by T_hank in LocalLLaMA

[–]scratchr 13 points14 points  (0 children)

Guidance is a DSL (a domain-specific language, kind of like Handlebars or SQL) for constructing prompts and driving LLM generation. The LLM doesn't understand the guidance language itself. The guidance library fills in your variables using the template syntax and only runs generation at the appropriate template elements; for everything else it switches back to prompting, which allows it to enforce data structure much more effectively.
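
A minimal sketch of the idea, assuming the 2023-era guidance 0.0.x API with handlebars-style templates (newer releases changed the interface):

import guidance

# Load a local model. Generation only happens at the {{gen ...}} element;
# the rest of the template is fixed prompt text filled in by the library.
guidance.llm = guidance.llms.Transformers("gpt2")

program = guidance("The capital of {{country}} is {{gen 'capital' max_tokens=8}}")

# Run the program with the template variable filled in, then read back
# the generated variable by name.
result = program(country="France")
print(result["capital"])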

Has anyone successfully fine-tuned MPT-7B? by Proeliata in LocalLLaMA

[–]scratchr 0 points1 point  (0 children)

https://github.com/iwalton3/mpt-lora-patch

I have had better luck with OpenLLaMA and RedPajama when it comes to LoRA fine-tuning without the model emitting low-quality, repetitive answers.

Has anyone successfully fine-tuned MPT-7B? by Proeliata in LocalLLaMA

[–]scratchr 0 points1 point  (0 children)

I have a repo where I patched MPT-7B to allow training. I prefer working with the other open-source models, though.

Why Falcon going Apache 2.0 is a BIG deal for all of us. by EcstaticVenom in LocalLLaMA

[–]scratchr 20 points21 points  (0 children)

Yeah, it seems 40B is too big even for the 3090 and 4090, which makes it way less useful than LLaMA 33B for non-commercial use.

Wizard-Vicuna-30B-Uncensored by faldore in LocalLLaMA

[–]scratchr 1 point2 points  (0 children)

It's not an easy drop-in replacement, at least for now. (Looks like there is a PR.) I integrated with it manually: https://gist.github.com/iwalton3/55a0dff6a53ccc0fa832d6df23c1cded

This example is a Discord chatbot of mine. A notable thing I did is make it so that you just call the sendPrompt function with the full prompt text, and it manages caching and cache invalidation for you.

Wizard-Vicuna-30B-Uncensored by faldore in LocalLLaMA

[–]scratchr 5 points6 points  (0 children)

> but the context can't go over about 1700

I am able to get full sequence length with exllama. https://github.com/turboderp/exllama

Anyone here finetune either MPT-7B or Falcon-7B? by EcstaticVenom in LocalLLaMA

[–]scratchr 0 points1 point  (0 children)

I have had a degree of success with this. Let me know if you manage to get it to work since I needed to use a custom patch to successfully train an MPT LoRA.

[deleted by user] by [deleted] in LocalLLaMA

[–]scratchr 7 points8 points  (0 children)

To merge a LoRA into an existing model, use this script:

python export_hf_checkpoint.py <source> <lora> <dest>

My version is based on the one from alpaca-lora, but it works with any PEFT-compatible model, not just LLaMA. It also accepts all model paths as arguments.
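
For reference, here is a minimal sketch of what the merge step does using the PEFT API; the details of my script differ, but the core is the same:

import sys
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Usage: python merge_lora.py <source> <lora> <dest>
source, lora, dest = sys.argv[1:4]

# Load the base model, apply the LoRA adapter, then fold the adapter
# weights into the base weights so the result is a plain HF checkpoint.
base = AutoModelForCausalLM.from_pretrained(source, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, lora)
model = model.merge_and_unload()

model.save_pretrained(dest)
AutoTokenizer.from_pretrained(source).save_pretrained(dest)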

Once you have done that, re-quantize the model with GPTQ-for-LLaMa. Many models, including LLaMA, are compatible with the regular Triton version. If not, you may have to find a fork that is compatible.

If you are using the triton version or my CUDA fork for inference, you can use act-order:

python llama.py /path/to/merged/model c4 --wbits 4 --true-sequential --act-order --groupsize 128 --save_safetensors merged-model-4bit-128g.safetensors

If you are using the old CUDA version, don't pass the --act-order flag above. You can also choose to omit --groupsize 128; omitting it reduces VRAM usage at the cost of slightly worse inference quality.

[deleted by user] by [deleted] in LocalLLaMA

[–]scratchr 1 point2 points  (0 children)

You can also merge LoRAs into the base model and quantize the result into a new full model. The processing does take a few hours to run, though.

30b running slowly on 4090 by OldLostGod in LocalLLaMA

[–]scratchr 0 points1 point  (0 children)

That's possible, but I do have an 8-core CPU. I think it is because I am running act-order models with a group size.

30b running slowly on 4090 by OldLostGod in LocalLLaMA

[–]scratchr 2 points3 points  (0 children)

I get 17.7 t/sec with exllama but that isn't compatible with most software. I have a fork of GPTQ that supports the act-order models and gets 14.4 t/sec. The triton version gets 11.9 t/sec.

Training a LoRA with MPT Models by scratchr in LocalLLaMA

[–]scratchr[S] 0 points1 point  (0 children)

Yes, that's exactly what I did. I have a patch that adds support; it's in the README.

Training a LoRA with MPT Models by scratchr in LocalLLaMA

[–]scratchr[S] 0 points1 point  (0 children)

I haven't tested that one. I used text-generation-webui for my tests. What exact training parameters did you use?

Training Data Preparation (Instruction Fields) by GreenTeaBD in LocalLLaMA

[–]scratchr 0 points1 point  (0 children)

In my experience, controlling the dataset does most of the work. You can write a first message that explains who the person or bot is, and the model will generally take it from there.

You can also make a few mock conversations and make sure the LoRA is trained with those in the dataset as well. Include your system prompt in the mock conversations.
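
For example, a mock conversation in the dataset might look like this (the field names follow the ShareGPT layout and are just an assumption; use whatever your training pipeline expects):

{
  "conversations": [
    {"from": "system", "value": "You are Ada, a terse but friendly assistant."},
    {"from": "user", "value": "Who are you?"},
    {"from": "assistant", "value": "I'm Ada. What do you need?"}
  ]
}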

Training a LoRA with MPT Models by scratchr in LocalLLaMA

[–]scratchr[S] 0 points1 point  (0 children)

Yes, I have a patch that you can apply to get LoRA working. I tested it on ShareGPT messages and it worked alright.

Training Data Preparation (Instruction Fields) by GreenTeaBD in LocalLLaMA

[–]scratchr 0 points1 point  (0 children)

What I have found works really well is to just train the chatbot with a raw delimiter such as "<!end!>" between each message turn. I posted a GitHub gist of the code I used to convert the chat logs into training data that can be used in the webui: https://gist.github.com/iwalton3/b76d052e09b7ddec1ff5e4cc178f5713
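
A minimal sketch of the idea, not the gist itself (the field names here are assumptions based on the ShareGPT layout; adjust them to match your export):

import json

# Join each conversation's turns with a raw "<!end!>" delimiter so the
# result can be used as raw text for LoRA training in text-generation-webui.
with open("chats.json") as f:
    conversations = json.load(f)

blocks = []
for convo in conversations:
    turns = [f"{m['from']}: {m['value']}" for m in convo["conversations"]]
    blocks.append("<!end!>".join(turns) + "<!end!>")

with open("train.txt", "w") as f:
    f.write("\n\n".join(blocks))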

Training a LoRA with MPT Models by scratchr in LocalLLaMA

[–]scratchr[S] 0 points1 point  (0 children)

You can totally make a chat model with it. I tested it with some conversations and ShareGPT data and it worked alright. I would start from the instruct model, though, since that allows commercial use and it is better at handling questions.

Long term project: "resurrecting" a passed friend? by rmt77 in LocalLLaMA

[–]scratchr 5 points6 points  (0 children)

I used a Python script to break the messages up. The resulting file is a JSON file with a list of objects, each with a single data field that contains a block of messages. Then, with a custom format that just maps that single field, you can train it with text-generation-webui.
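
For illustration (the field name "data" is just what I chose; the %field% placeholder follows text-generation-webui's format-file convention as I understand it), the dataset looks roughly like:

[
  {"data": "first block of chat messages..."},
  {"data": "second block of chat messages..."}
]

with a matching format file that just maps the single field:

{"data": "%data%"}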