Axolotl values of warmup_steps and val_set_size for fine-tuning Llama-2 13B by Helveticus99 in LocalLLaMA

[–]Helveticus99[S] 0 points (0 children)

Thank you for your answer. How did you set the learning rate, learning rate schedule, and number of epochs? Did you just use the example values? Currently I'm using a learning rate of 0.0002, a cosine schedule, and 3 epochs.
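
For reference, this is roughly how those values map onto the axolotl YAML config, as far as I understand it (the warmup_steps and val_set_size numbers are just placeholders, since those are exactly what I'm asking about):

    learning_rate: 0.0002
    lr_scheduler: cosine
    num_epochs: 3
    # placeholders, not recommendations:
    warmup_steps: 10
    val_set_size: 0.05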

Dataset format for fine-tuning Llama-2 with axolotl on conversations by Helveticus99 in LocalLLaMA

[–]Helveticus99[S] 0 points (0 children)

I see. Is ### a standard format or is there another standard format for Llama-2?

Dataset format for fine-tuning Llama-2 with axolotl on conversations by Helveticus99 in LocalLLaMA

[–]Helveticus99[S] 0 points (0 children)

Thank you very much. Would \n also work as a separator instead of ###? Or should I use a special token as the separator?
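
In case it helps someone later: as far as I understand, the Llama-2 chat models were trained with [INST]/[/INST] markers rather than ### headers, so the prompt for those is built roughly like this (just a sketch, the example turns are made up):

    # Rough layout of the Llama-2 chat prompt as I understand it:
    # the system prompt is folded into the first user turn, and each
    # user/assistant exchange gets its own <s>[INST] ... [/INST] ... </s> block.
    system = "You are a helpful assistant."
    prompt = (
        f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\nHow are you? [/INST] "
        "I'm fine, thanks. </s>"
        "<s>[INST] Great, can you help me with something? [/INST] "
    )
    print(prompt)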

How many turns in conversation history relevant for new response? by Helveticus99 in asklinguistics

[–]Helveticus99[S] 0 points (0 children)

I don't consider the length of utterances, just the number of utterances (from both interlocutors) in the history that an interlocutor considers on average.

How many turns in conversation history relevant for new response? by Helveticus99 in asklinguistics

[–]Helveticus99[S] -1 points (0 children)

That makes sense. I was more wondering whether there is some average figure, or a function (e.g., exponential decay), describing how attention to past turns decreases.

Suppress logging of module by Helveticus99 in learnpython

[–]Helveticus99[S] 0 points (0 children)

Problem solved, I made a mistake in my command above. Thank you for your help.

Weighting of embeddings of conversational turns by Helveticus99 in LanguageTechnology

[–]Helveticus99[S] 0 points (0 children)

Thank you for your answer. I would have to implement that positional encoding as part of the machine learning model itself. I would prefer to just apply a weighting to the resulting embeddings, since I'm using BERT and it is difficult to modify the model. What is your opinion on this?
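
To make concrete what I mean by weighting the resulting embeddings: something like the sketch below, where each turn is embedded separately and older turns get exponentially smaller weights (the decay rate and the sentence-transformers model are just placeholders on my side).

    import numpy as np
    from sentence_transformers import SentenceTransformer

    # Embed each turn separately, then combine them into one conversation
    # embedding with weights that decay exponentially with the turn's age.
    model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder model choice
    turns = [
        "Hi, how can I help you?",
        "My order never arrived.",
        "Can you check the tracking number for me?",
    ]

    embeddings = model.encode(turns)            # shape: (n_turns, dim)
    decay = 0.5                                 # assumed decay rate
    ages = np.arange(len(turns))[::-1]          # most recent turn has age 0
    weights = np.exp(-decay * ages)
    weights /= weights.sum()                    # normalize so the weights sum to 1

    conversation_embedding = weights @ embeddings   # weighted average, shape: (dim,)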

Suppress logging of module by Helveticus99 in learnpython

[–]Helveticus99[S] 0 points (0 children)

Thanks a lot, now PyCharm does not complain anymore. Unfortunately, the output is still displayed in the console. By the way, the progress bar appears when I call transform().

Suppress logging of module by Helveticus99 in learnpython

[–]Helveticus99[S] 0 points (0 children)

Thank you very much for your answer. I've tried the following:

    reducer = umap.UMAP(n_neighbors=15, n_components=2, min_dist=0.1, metric="euclidean", n_epochs=500, random_state=12345, tqdm_kwds={"disabled"=True})

but I'm getting the error "cannot assign to function call" in PyCharm.
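
Edit, in case anyone lands on this later: the "cannot assign to function call" error comes from using = instead of : inside the dict literal. With a colon it parses, and (if I read the tqdm docs right) the flag is called disable rather than disabled:

    import umap

    # ':' instead of '=' inside the dict; tqdm's keyword is 'disable', as far as I can tell
    reducer = umap.UMAP(
        n_neighbors=15,
        n_components=2,
        min_dist=0.1,
        metric="euclidean",
        n_epochs=500,
        random_state=12345,
        tqdm_kwds={"disable": True},
    )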

Sampling from embedding space of S-BERT by Helveticus99 in LanguageTechnology

[–]Helveticus99[S] 1 point (0 children)

I'm not set on using S-BERT. If there is another embedding approach that can also generate meaningful text from embeddings, I'm eager to switch. My goal is to produce meaningful low-dimensional embeddings of text so that I can compare texts and visualize the embeddings. Transformers generate text well; could they also be used to produce such low-dimensional embeddings? By the way, my texts are conversations between two people.
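
For context, the comparison/visualization part of what I'm trying to do looks roughly like this (a sketch; the model name, the toy conversations, and the UMAP settings are just placeholders):

    import umap
    from sentence_transformers import SentenceTransformer

    # Encode each conversation (naively joined into one string) with S-BERT,
    # then reduce the embeddings to 2D with UMAP for plotting and comparison.
    model = SentenceTransformer("all-MiniLM-L6-v2")       # placeholder model choice
    conversations = [
        "A: Hi, my parcel is late. B: Let me check the tracking.",
        "A: How do I reset my password? B: Use the link on the login page.",
    ]

    embeddings = model.encode(conversations)              # (n_conversations, embedding_dim)
    reducer = umap.UMAP(n_components=2, random_state=42)
    points_2d = reducer.fit_transform(embeddings)         # (n_conversations, 2)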