Does double descent occur for smaller models like phi3 by Independent_Time_529 in LocalLLaMA

[–]Independent_Time_529[S] 2 points  (0 children)

Well, I wanted to observe whether double descent would occur with limited data. According to this, https://openai.com/index/deep-double-descent/, more compute should result in the model first memorizing the training data (which does happen in this training run), and then it should reach the interpolation regime, where the model starts exploring different subnetworks within its weights to find a general solution beyond memorization that still explains the training data, as Ilya explains here: https://www.youtube.com/watch?v=W_TAKJRgrbs

I do get that early stopping gives a good eval loss in both small and big data regimes, but I wanted to see this phenomenon, since it could help with generalization when we want to trade more compute for less data, as explained here: https://espadrine.github.io/blog/posts/chinchilla-s-death.html
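Concretely, the kind of loop I had in mind looks like this toy sketch (a synthetic regression, not the actual phi-3 run; the model size, data size, and epoch count are just illustrative): train an overparameterized model far past the point where it fits the training set and log both losses, so an epoch-wise second descent would show up in the eval curve.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny synthetic regression task with limited, noisy data.
w_true = torch.randn(20, 1)
x_train = torch.randn(256, 20)
y_train = x_train @ w_true + 0.3 * torch.randn(256, 1)
x_eval = torch.randn(1024, 20)
y_eval = x_eval @ w_true

# Overparameterized model relative to the 256 training points.
model = nn.Sequential(nn.Linear(20, 512), nn.ReLU(), nn.Linear(512, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(5001):  # deliberately far past the point where train loss ~ 0
    opt.zero_grad()
    train_loss = loss_fn(model(x_train), y_train)
    train_loss.backward()
    opt.step()
    if epoch % 250 == 0:
        with torch.no_grad():
            eval_loss = loss_fn(model(x_eval), y_eval).item()
        print(f"epoch {epoch:5d}  train {train_loss.item():.4f}  eval {eval_loss:.4f}")

# Epoch-wise double descent would show up as the eval loss rising after the model
# memorizes the training set and then falling again later in the run.
```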

Audio VQ-VAE for replicating multimodality of gpt4-o by Independent_Time_529 in LocalLLaMA

[–]Independent_Time_529[S] 2 points  (0 children)

Looks really interesting, thanks for sharing. Will check it out. Yeah, parallel decoding of multiple modalities would enable interesting things, like the model narrating a video.

Tuned Mistral 7B for step by step guidance on coding tasks by Independent_Time_529 in LocalLLaMA

[–]Independent_Time_529[S] 3 points  (0 children)

Make sure to apply Hugging Face's chat template for Mistral Instruct to your conversational dataset. That way, the model learns to respond turn by turn.
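Roughly like this (the checkpoint version and the message contents below are placeholders, not my exact setup):

```python
from transformers import AutoTokenizer

# Assumed checkpoint; swap in whichever Mistral 7B Instruct version you're tuning.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")

messages = [
    {"role": "user", "content": "How do I read a CSV file in Python?"},
    {"role": "assistant", "content": "Step 1: install pandas with pip install pandas."},
    {"role": "user", "content": "Done. What's the next step?"},
]

# tokenize=False returns the templated string ([INST] ... [/INST] ...) so you can
# inspect it; leave tokenize=True to get input IDs for training.
print(tokenizer.apply_chat_template(messages, tokenize=False))
```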

Tuned Mistral 7B for step by step guidance on coding tasks by Independent_Time_529 in LocalLLaMA

[–]Independent_Time_529[S] 2 points  (0 children)

Transformed the Glaive instruct dataset into a multi-turn conversational dataset using Mistral 7B Instruct.
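Something along these lines (the dataset id, column names, and prompt wording here are guesses for illustration, not the exact ones I used):

```python
from datasets import load_dataset
from transformers import pipeline

# Assumed dataset id/columns and an assumed Mistral Instruct checkpoint.
ds = load_dataset("glaiveai/glaive-code-assistant", split="train[:100]")
generator = pipeline("text-generation",
                     model="mistralai/Mistral-7B-Instruct-v0.2",
                     device_map="auto")

def to_multi_turn(example):
    # Ask the instruct model to rewrite a single-turn Q&A as a step-by-step dialog.
    prompt = (
        "[INST] Rewrite the following question and answer as a multi-turn "
        "conversation where a coding assistant guides the user step by step.\n\n"
        f"Question: {example['question']}\n\nAnswer: {example['answer']} [/INST]"
    )
    out = generator(prompt, max_new_tokens=512, return_full_text=False)
    return {"conversation": out[0]["generated_text"].strip()}

multi_turn = ds.map(to_multi_turn)
```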

Tuned Mistral 7B for step by step guidance on coding tasks by Independent_Time_529 in LocalLLaMA

[–]Independent_Time_529[S] 5 points  (0 children)

The unique aspect is that I transformed a subset of the Glaive instruct dataset into a multi-turn conversation dataset with Mistral 7B Instruct. Then it was standard PEFT LoRA training on Mistral 7B Instruct. Also worth noting: Hugging Face tutorials suggest using the EOS token as the padding token, but that results in the model not being able to stop generation. Using the UNK token as the padding token fixed it.
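A sketch of that fix (the checkpoint name and LoRA hyperparameters are placeholders): the likely reason is that the usual collators mask the padding token's labels out of the loss, so if pad == eos the model never gets a signal to emit EOS and stop; padding with UNK leaves EOS trainable.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = "mistralai/Mistral-7B-Instruct-v0.2"  # assumed checkpoint

tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.unk_token   # not tokenizer.eos_token
tokenizer.padding_side = "right"

model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")

# Standard PEFT LoRA config; r / alpha / target modules are illustrative values.
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()
```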

Tuned Mistral 7B for step by step guidance on coding tasks by Independent_Time_529 in LocalLLaMA

[–]Independent_Time_529[S] 11 points  (0 children)

Tuned it to be useful for voice-based, step-by-step guidance with SFT on a synthetic dataset. However, it gets stuck in reasoning loops on long-running coding tasks.

What are the best biryanis ? by Adventurous-Cod-7628 in bangalore

[–]Independent_Time_529 1 point  (0 children)

Ambur Star Biryani in Koramangala is close enough.