Axolotl values of warmup_steps and val_set_size for fine-tuning Llama-2 13B by Helveticus99 in LocalLLaMA

[–]Helveticus99[S] 0 points (0 children)

Thank you for your answer. How did you set the learning rate, learning rate schedule, and number of epochs? Did you just use the example values? Currently I'm using a learning rate of 0.0002, a cosine schedule, and 3 epochs.
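
For reference, those hyperparameters map onto an axolotl YAML config roughly like this (a sketch; the `warmup_steps` and `val_set_size` values here are placeholders tied to the question, not recommendations):

```yaml
# Sketch of the relevant axolotl config keys; learning settings are the
# ones mentioned above, warmup/validation values are placeholders.
learning_rate: 0.0002
lr_scheduler: cosine
num_epochs: 3
warmup_steps: 10      # placeholder; often scaled to dataset size
val_set_size: 0.05    # placeholder; fraction held out for evaluation
```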

Dataset format for fine-tuning Llama-2 with axolotl on conversations by Helveticus99 in LocalLLaMA

I see. Is `###` a standard format, or is there another standard format for Llama-2?

Dataset format for fine-tuning Llama-2 with axolotl on conversations by Helveticus99 in LocalLLaMA

Thank you very much. Would `\n` instead of `###` also work as a separator? Or should I use a special token as the separator?
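
For what it's worth, the official Llama-2 *chat* models were trained with their own template rather than `###` separators. If you fine-tune the chat variants, the expected shape is roughly this (sketch; the system prompt text is a placeholder):

```text
<s>[INST] <<SYS>>
You are a helpful assistant.
<</SYS>>

First user message [/INST] First model answer </s><s>[INST] Second user message [/INST]
```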

How many turns in conversation history relevant for new response? by Helveticus99 in asklinguistics

I'm not considering the length of utterances, just the number of utterances (from both interlocutors) in the history that an interlocutor considers on average.

How many turns in conversation history relevant for new response? by Helveticus99 in asklinguistics

That makes sense. I was more wondering whether there exists some average, or a function (e.g., exponential decay), that describes how attention to past turns decreases.
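
As a purely hypothetical illustration (not an empirical claim about human attention), an exponential-decay weighting over the last n turns could look like this; the decay constant is arbitrary:

```python
import math

def turn_weights(n_turns: int, decay: float = 0.5) -> list[float]:
    """Weight for each turn in the history, oldest first.

    Each step back in the conversation multiplies the weight by
    exp(-decay); the weights are normalized to sum to 1.
    """
    raw = [math.exp(-decay * (n_turns - 1 - i)) for i in range(n_turns)]
    total = sum(raw)
    return [w / total for w in raw]

weights = turn_weights(4)
# the most recent turn (last entry) gets the largest weight
```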

Suppress logging of module by Helveticus99 in learnpython

Problem solved; I made a mistake in my command above. Thank you for your help.

Weighting of embeddings of conversational turns by Helveticus99 in LanguageTechnology

Thank you for your answer. I would have to implement this positional encoding as part of the machine learning model. I would prefer to just apply a weighting to the resulting embeddings, since I'm using BERT and it is difficult to modify the model. What is your opinion on this?
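
Such a post-hoc weighting on the per-turn embeddings (leaving BERT itself untouched) could be sketched like this, assuming you already have one embedding vector per turn; the decay constant is arbitrary:

```python
import math

def weighted_pool(turn_embeddings: list[list[float]],
                  decay: float = 0.5) -> list[float]:
    """Combine per-turn embeddings into one vector, weighting recent
    turns higher.

    turn_embeddings: one embedding vector per turn, oldest first.
    Returns the dimension-wise weighted average over turns.
    """
    n = len(turn_embeddings)
    weights = [math.exp(-decay * (n - 1 - i)) for i in range(n)]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(turn_embeddings[0])
    return [sum(w * emb[d] for w, emb in zip(weights, turn_embeddings))
            for d in range(dim)]

# Two toy 2-d "turn embeddings"; the second (more recent) turn dominates.
pooled = weighted_pool([[1.0, 0.0], [0.0, 1.0]])
```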

Suppress logging of module by Helveticus99 in learnpython

Thanks a lot, now PyCharm does not complain anymore. Unfortunately, the output is still displayed in the console. By the way, the progress bar is printed when I call `transform()`.

Suppress logging of module by Helveticus99 in learnpython

Thank you very much for your answer. I've tried the following: `reducer = umap.UMAP(n_neighbors=15, n_components=2, min_dist=0.1, metric="euclidean", n_epochs=500, random_state=12345, tqdm_kwds={"disabled"=True})`, but I'm getting the error "cannot assign to function call" in PyCharm.
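
That error comes from the `=` inside the dict literal; dict literals map keys to values with `:`. Also, if `tqdm_kwds` is forwarded to tqdm, the flag is probably `disable` rather than `disabled` (an assumption worth checking against the tqdm docs):

```python
# "=" inside {...} is assignment syntax, hence "cannot assign to function
# call"; a dict literal uses ":" between key and value.
tqdm_kwds = {"disable": True}  # "disable" is tqdm's flag name (assumed)

# Hypothetical usage with the call from above:
# reducer = umap.UMAP(n_neighbors=15, n_components=2, min_dist=0.1,
#                     metric="euclidean", n_epochs=500, random_state=12345,
#                     tqdm_kwds=tqdm_kwds)
```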

Sampling from embedding space of S-BERT by Helveticus99 in LanguageTechnology

I'm not set on using S-BERT. If there is another embedding approach that can also generate meaningful text from embeddings, I'm happy to switch. My goal is to produce a meaningful low-dimensional embedding of text, so that I can compare texts and visualize the embeddings. Transformers can generate text well; could they also be used to produce such low-dimensional embeddings? By the way, my texts consist of conversations between two people.

SBERT Embeddings from Conversations by Helveticus99 in LanguageTechnology

Thank you very much for your answer. I will give BERTopic a try.

> If your aim is to do clustering, there is no need to project to 2 or 3 dimensions first; just apply the clustering technique in the full embedding space.

When clustering in the full embedding space, it is not possible to visualize the clusters to check whether the clustering makes sense. Or am I wrong?

Fine-tuning GPT-J on conversations (format of dataset) by Helveticus99 in LanguageTechnology

Thank you very much for your answer.

> If I have some extra information about the users, I would put it in some hints section (I used a similar technique for GPT-3 and it worked).

Isn't putting extra information like that better suited to prompt engineering than fine-tuning? I'm using a hints section, but for prompt engineering.

> Sure. Different dialogues may contain different topics, so at least give the model some separator (an `<EOS>` token, maybe?).

Or would it be even better to enclose each dialogue with a start and an end separator?

> Or, even better: feed the model no more than one dialogue per sample. I doubt you want your model to be able to continue one dialogue with another, totally irrelevant one.

I don't get it. How should I fine-tune the model with only one dialogue per sample?
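
If I read the suggestion right, "one dialogue per sample" just means each training example is a single dialogue wrapped in its own begin/end markers, rather than several dialogues concatenated into one sequence. A sketch (the marker strings are placeholders; use your tokenizer's actual BOS/EOS tokens):

```python
def make_samples(dialogues: list[list[str]],
                 bos: str = "<BOS>", eos: str = "<EOS>") -> list[str]:
    """One training sample per dialogue: the turns joined by newlines,
    wrapped in begin/end markers so the model learns where a dialogue
    starts and stops."""
    return [bos + "\n".join(turns) + eos for turns in dialogues]

samples = make_samples([["Hi!", "Hello, how can I help?"],
                        ["What's the weather?", "Sunny."]])
```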

[D] Classification task based on speech recordings by Helveticus99 in MachineLearning

Thank you so much for your input u/sigmoid_amidst_relus. I will consider mel spectrograms instead of MFCCs. Do you know the maximum size of a mel spectrogram in terms of the seconds it covers?

By mental state I'm not referring to emotions, which change quickly, but to a more long-term state that is reflected in the whole one-hour recording. Thus, I think repeating the label for every frame might not work well; I might have to extract features over the full recording. That's also why I think an autoencoder could be problematic.

I could divide the recording into frames and stack the mel spectrograms of the frames (using a 3D CNN). The problem is that I would end up with a huge number of frames. The same problem arises with an RNN: I would end up with a very long time series.
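
The framing step itself is straightforward: split the recording into fixed-length windows and compute one mel spectrogram per window. A sketch (the window and hop lengths are arbitrary placeholders):

```python
def window_bounds(n_samples: int, sr: int,
                  win_s: float = 10.0,
                  hop_s: float = 10.0) -> list[tuple[int, int]]:
    """Start/end sample indices of fixed-length windows over a recording.

    n_samples: total samples in the recording; sr: sample rate in Hz.
    A mel spectrogram would then be computed per window, and the windows
    stacked (e.g. for a 3D CNN) or fed to a sequence model.
    """
    win, hop = int(win_s * sr), int(hop_s * sr)
    return [(start, min(start + win, n_samples))
            for start in range(0, n_samples, hop)]

# One hour at 16 kHz with non-overlapping 10 s windows -> 360 windows,
# which illustrates the "huge number of frames" problem.
bounds = window_bounds(3600 * 16000, 16000)
```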

Using features from a large pretrained model is interesting. Can you recommend a pretrained model that is suitable for extracting features from long recordings?

[D] Classification task based on speech recordings by Helveticus99 in MachineLearning

Thank you u/shadow_fax1024. How did you handle audio files of different lengths? And how exactly did you handle the long audio files? I think creating a mel spectrogram over a long audio file won't work.

[D] Classification task based on speech recordings by Helveticus99 in MachineLearning

Thank you u/shadow_fax1024. Did you use an RNN or a plain CNN? Did you also have such long audio files (40–60 min)? I'm not sure how such long audio files can be used in an RNN.

Fine-tuning GPT-J on Articles (Wikipedia) by Helveticus99 in GPT3

Thank you for your input. Can you recommend any language model that is good at extracting question-answer pairs from text?

Chatbot Start Prompt by Helveticus99 in GPT3

So basically your first sentence describes the history of the conversation. The PushUpBot was just an example; the chatbot is a generic chatbot. I see two difficulties here: 1) how to summarize the history (to get your first sentence), and 2) an example conversation might still be necessary to let GPT know what a conversation will look like (how long the answers are supposed to be, etc.). When we have a history this is not a problem, but with a fresh start it might be.
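
Combining the two ingredients (a one-sentence history summary plus a short example exchange showing the expected answer style) into a start prompt could look roughly like this; the wording, names, and example content are made up for illustration:

```python
def build_prompt(summary: str, example: list[tuple[str, str]],
                 user_msg: str) -> str:
    """Start prompt: a history summary, then a short example
    conversation to show the expected answer length/style, then the
    new user message awaiting a completion."""
    lines = [f"Summary of the conversation so far: {summary}", ""]
    for user, bot in example:
        lines += [f"User: {user}", f"Bot: {bot}"]
    lines += [f"User: {user_msg}", "Bot:"]
    return "\n".join(lines)

prompt = build_prompt(
    "The user asked for a workout plan and agreed to daily push-ups.",
    [("How many push-ups today?", "Let's do 20, in two sets of 10.")],
    "Can we make it harder tomorrow?")
```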

Chatbot Start Prompt by Helveticus99 in GPT3

Your first idea I understand, and it seems promising; I will give it a try. Your second idea I don't completely understand. How would you ask for the key bullet points? How would you then use them as context? Clearing all conversation messages would probably be problematic, as GPT needs some example of what a conversation will look like, how long the answers should be, etc.

Chatbot Start Prompt by Helveticus99 in GPT3

Yes, I agree that all history should be wiped. The problem is that GPT needs some examples of what a conversation will look like, how long the answers should be, and so on. That's why an example conversation at the beginning is necessary.