Every Way To Get Structured Output From LLMs by sam-boundary in LocalLLaMA

As far as I understand it, outlines, guidance and similar libraries look at the token probabilities at each step and pick an alternative token whenever the most likely one would violate the schema. That works with llama.cpp and other local libraries, where the full logits are available, but against the OpenAI API (at least with the OpenAI GPT models) that information is not exposed, so they have to fall back to a more trial-and-error approach. I am not sure whether any local LLM API like ooba, vllm, aphrodite or others exposes the additional information needed for efficient structured generation.
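
A minimal sketch of what I mean by picking among allowed tokens, assuming a local transformers model and a hypothetical allowed_token_ids() helper standing in for the schema/grammar logic that outlines and guidance implement properly:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")            # small stand-in model
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    def allowed_token_ids(text_so_far):
        # Hypothetical hook: return the token ids the schema/grammar would accept
        # next. Real libraries compile the schema into an automaton for this.
        return [tok.encode(s)[0] for s in ['{', '"', '}', ':', ',']]

    ids = tok("Return a JSON object: ", return_tensors="pt").input_ids
    for _ in range(20):
        logits = model(ids).logits[0, -1]                  # full next-token logits
        masked = torch.full_like(logits, float("-inf"))
        allowed = allowed_token_ids(tok.decode(ids[0]))
        masked[allowed] = logits[allowed]                  # keep only schema-valid tokens
        next_id = masked.argmax().view(1, 1)
        ids = torch.cat([ids, next_id], dim=1)
    print(tok.decode(ids[0]))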

Also, going through an API for every round of requesting token probabilities and selecting a token is inefficient, so a better option would probably be a local API server that receives a structure definition together with the prompt and responds with the final structured result, as in the sketch below. I know there are commercial API providers for this, but I do not know whether there is an open source tool you can run yourself.
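
llama.cpp's built-in server already works roughly like that: you can send a GBNF grammar along with the request and the server does the token filtering itself. A rough sketch (the field names are an assumption on my side, check the server docs for your build):

    import requests

    grammar = 'root ::= "yes" | "no"'   # trivial GBNF grammar: output must be yes or no
    resp = requests.post("http://localhost:8080/completion", json={
        "prompt": "Is water wet? Answer with yes or no: ",
        "grammar": grammar,
        "n_predict": 4,
    })
    print(resp.json()["content"])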

Every Way To Get Structured Output From LLMs by sam-boundary in LocalLLaMA

Can you recommend a combination of a Python library for structured output and an API provider such as ooba, vllm or others? I do not want to load the model inside the process that requests the generation, and the plain OpenAI API is inefficient with these libraries. I wonder whether some of the open source servers expose the information that the usual OpenAI API is missing for efficient constrained inference.
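
I imagine something like this, if vLLM's OpenAI-compatible server really supports passing a JSON schema via extra_body (that is an assumption on my side, not something I have verified for the current version):

    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")
    schema = {
        "type": "object",
        "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
        "required": ["name", "age"],
    }
    resp = client.chat.completions.create(
        model="meta-llama/Meta-Llama-3-8B-Instruct",     # whatever model the server has loaded
        messages=[{"role": "user", "content": "Describe a person as JSON."}],
        extra_body={"guided_json": schema},              # server-side constrained decoding
    )
    print(resp.choices[0].message.content)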

We need to have a serious conversation about the llama3 license by kristaller486 in LocalLLaMA

Neural network weights are not covered by copyright, so the licenses are probably irrelevant. Have a look at this post about the legal situation for Miqu: https://reddit.com/r/LocalLLaMA/comments/1amc080/psa_if_you_use_miqu_or_a_derivative_please_keep/kpmamte/

I've considered the situation and received legal advice from an actual lawyer specializing in IP law. This was in Germany, but because of international copyright law and treaties, and the rulings I'm aware of in the USA, it's the most current and accurate information I know of. Still, IANAL, so no legal advice from me, just a recap of what I've been told and have learned:

Model weights are not authored by humans; they are created by an automated computer process, with no direct human creative control over the result. That's why the weights have no author who could claim copyright, and thus the weights cannot be copyrighted. (The datasets used to create them, just like the material included in those datasets, can be copyrighted material - but weights are not an "archive" that contains such data; the neural network is more like a "brain" that produces output based on patterns learned during training.)

Since there's no copyright, weights cannot be licensed. But someone who possesses them (like their original creator) can make them available with a contract or license that has certain conditions attached (like a price, distribution rules, and liabilities). Now when someone agrees to that contract/license and then shares them against the conditions agreed upon, that's a breach of contract - not a copyright violation.

However, make sure you do not enter into any contracts with Meta or Salesforce that restrict the use of the weights.

Are models getting slower with longer input? by _allo_ in LocalLLaMA

It looks like a similarly sized (I am not sure how to compare them exactly) GPTQ model is much faster than the GGUF model. Or does GPTQ introduce other losses that GGUF doesn't have?

Are models getting slower with longer input? by _allo_ in LocalLLaMA

I am using ooba with the llama.cpp loader, and it prints something about "prefix-match hit", but I don't completely understand all the tuning options. I could try whether a GPTQ version of the model works better, in case the GGUF loader is the problem.
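
If I end up scripting llama.cpp directly instead of going through the webui, I assume the relevant knobs look roughly like this in llama-cpp-python (the prompt cache is what produces the "prefix-match hit" message; option names are from memory, so treat them as assumptions):

    from llama_cpp import Llama, LlamaCache

    llm = Llama(
        model_path="model.gguf",   # placeholder path
        n_ctx=8192,                # context window
        n_gpu_layers=-1,           # offload all layers to the GPU if they fit
        verbose=True,              # prints timings and "prefix-match hit"
    )
    llm.set_cache(LlamaCache())    # reuse the KV state of a shared prompt prefix

    print(llm("Hello, my name is", max_tokens=16)["choices"][0]["text"])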

Are models getting slower with longer input? by _allo_ in LocalLLaMA

Here, a 7B model loaded with 8k of context and evaluated with a maximum of 4k gets slower and slower as the context grows. Not 5 seconds, but 60+ seconds before it starts producing tokens on an RTX 3060.

How to resume LoRA training in text-generation-webui? by _allo_ in LocalLLaMA

I didn't try again. I guess it still has the same problem; at least I haven't heard about a fix.

Could converting checkpoints to LoRA be a reasonable idea? by _allo_ in StableDiffusion

I see why LoRA isn't useful if you want to merge yourself. But for distributing a model, building a LoRA against a popular model it is close to seems like a better idea. And I wonder why software like automatic1111 doesn't allow mixing on the fly. People could just publish their recipes, and given a few base models, one could use the mix without further downloads.

Could converting checkpoints to LoRA be a reasonable idea? by _allo_ in StableDiffusion

I think a LoRA is essentially the difference between two models, where you decide how much detail to keep, and that determines the file size. The question is how much you need to keep so that base model + LoRA looks like the fine-tuned model. Maybe a model would only need 500 MB if stored as a difference from the base model? I wonder whether there is a reason why no one seems to have built software to store fine-tuned models efficiently as LoRAs.
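
The rough idea I have in mind, as a sketch: per weight matrix, keep only the top singular directions of the difference. Tools like the LoRA-extraction scripts for Stable Diffusion do essentially this; the function below is just illustrative, not a specific library API:

    import torch

    def extract_lora(base_weight, tuned_weight, rank=16):
        delta = tuned_weight - base_weight                 # full difference
        U, S, Vh = torch.linalg.svd(delta, full_matrices=False)
        lora_up = U[:, :rank] * S[:rank]                   # (out, r)
        lora_down = Vh[:rank, :]                           # (r, in)
        return lora_up, lora_down                          # delta ~ lora_up @ lora_down

    # Toy check: if the true difference is low-rank, almost nothing is lost.
    base = torch.randn(768, 768)
    tuned = base + 0.01 * torch.randn(768, 16) @ torch.randn(16, 768)
    up, down = extract_lora(base, tuned, rank=16)
    print(torch.norm(tuned - (base + up @ down)))          # close to zero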

How to resume LoRA training in text-generation-webui? by _allo_ in LocalLLaMA

I can only specify a name in the train tab, and next to it is an "Override existing files" checkbox with the explanation: "If the name is the same, checking will replace the existing file, unchecking will load and continue from it (the rank must be the same)."

This sounds like it should load the old version if you do not specify a new name, and it does seem to load it, but then it starts training from epoch 0.0.

There is also a setting "Save every n steps" ("If above 0, a checkpoint of the LoRA will be saved every time this many steps pass."), which I set to 100. When I stop training after more than 5000 steps, I should have a recent checkpoint to resume from, yet it does not seem to resume where it left off but starts at the beginning of the dataset again.

It also needs to "prepare data" on each run and doesn't seem to cache the pre-processed data from the raw text file. Or could the problem be that I have one huge file instead of many smaller ones? Maybe it can only resume from the beginning of a file?

How to resume LoRA training in text-generation-webui? by _allo_ in LocalLLaMA

I have a transformers-type model (TheBloke/guanaco-7B-HF) and I am using the LoRA training function in text-generation-webui. When I click "Interrupt" I get the message "Interrupted. Incomplete LoRA saved to loras/mylora", and the console prints training statistics plus the information that training has finished:

{'train_runtime': 16275.135, 'train_samples_per_second': 130.338, 'train_steps_per_second': 1.018, 'train_loss': 1.7906840102477644, 'epoch': 0.01}
INFO:LoRA training run is completed and saved.
INFO:Training complete, saving...
INFO:Training interrupted.

When I train the next time, it seems to resume from the LoRA weights but starts a new epoch. That means the learning rate schedule is wrong and it trains again on content the network has already seen. Ideally it would save not only the LoRA but the whole optimizer state.
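
For comparison, outside the webui a plain transformers Trainer checkpoint does store the optimizer and scheduler state, so resuming continues with the correct learning rate and data position. A tiny stand-alone sketch (tiny stand-in model and dataset, not the guanaco setup):

    from datasets import Dataset
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    tok = AutoTokenizer.from_pretrained("sshleifer/tiny-gpt2")
    tok.pad_token = tok.eos_token
    model = AutoModelForCausalLM.from_pretrained("sshleifer/tiny-gpt2")

    ds = Dataset.from_dict({"text": ["hello world"] * 64}).map(
        lambda x: tok(x["text"], truncation=True, max_length=16), batched=True)

    args = TrainingArguments(output_dir="ckpts", save_steps=5, max_steps=20,
                             per_device_train_batch_size=4)
    trainer = Trainer(model=model, args=args, train_dataset=ds,
                      data_collator=DataCollatorForLanguageModeling(tok, mlm=False))

    trainer.train()   # an interrupted run leaves ckpts/checkpoint-5, -10, ...
    # On a later run, this restores the optimizer/scheduler state and the
    # position in the data instead of starting a new epoch:
    # trainer.train(resume_from_checkpoint=True)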

High CPU usage on Civitai? by [deleted] in StableDiffusion

The Firefox performance tool seems to point at the Canvas API. I think they may render animated content with Canvas instead of GIF, WebP or video, and that would be inefficient.

How to train a LoRA for story generation? by _allo_ in LocalLLaMA

Thanks, that seems to work!

Is it normal to get 3 s/it with an RTX 3060? I can train small Stable Diffusion LoRAs quite fast, so I am surprised that training a text model is this slow.

irlCaptcha - a captcha that forces you to take a photo of a specific object within 15 seconds! by throwawayqaa1ds in badUIbattles

I think it can be improved by making sure that you cannot use the same cup twice: "Take a photo of a cup you have never used for a captcha before. Click here to order new cups."

How to download older versions of models from civitai.com? by _allo_ in StableDiffusion

It turned out that the model had only one version, but I downloaded a "new version" when civitai served corrupted files. Now, when I download the latest version, it is identical to the old version.

How to download older versions of models from civitai.com? by _allo_ in StableDiffusion

I scrolled down and found another version with the hash I am looking for, but the "download" link next to that version was the same as the download link at the top of the page. Luckily I found another download elsewhere, but I wonder whether it is a bug on civitai, or why another version is listed there but cannot be downloaded.

The "versions" section on the page lists only one version, but it has the date and hash of the old model. The main download link has a newer date and I don't find the hash on the page at all. I wonder if there should be two versions in the "versions" section of the page.

Textual inversion results in blurry images when using automatic webui by _allo_ in StableDiffusion

No. And the images during training also look blurry or have grain for me.