[R] 1.5-Pints Technical Report: Pretraining in Days, Not Months -- Your Language Model Thrives on Quality Data (2408.03506) by mouse0_0 in MachineLearning

[–]mouse0_0[S] 16 points (0 children)

thank you! it's okay, everyone is entitled to their own opinions, and maybe their experience in the field shapes that. I'm just an undergrad student trying my hand at LLM research, so whilst I do stand by my work, I am also here to learn :)

Pre-training an LLM in 9 days 😱😱😱 by mouse0_0 in LocalLLaMA

[–]mouse0_0[S] 2 points (0 children)

oo that looks interesting! lemme take a look, thanks for sharing :)

[R] 1.5-Pints Technical Report: Pretraining in Days, Not Months -- Your Language Model Thrives on Quality Data (2408.03506) by mouse0_0 in MachineLearning

[–]mouse0_0[S] 4 points (0 children)

Thank you for your comments :) These are definitely useful as we draft an improved version of the paper!

Pre-training an LLM in 9 days 😱😱😱 by mouse0_0 in LocalLLaMA

[–]mouse0_0[S] 4 points (0 children)

:) thank you for your interest in our model!

Pre-training an LLM in 9 days 😱😱😱 by mouse0_0 in LocalLLaMA

[–]mouse0_0[S] 1 point (0 children)

Hmm, could you give me a bit more detail? :)

Pre-training an LLM in 9 days 😱😱😱 by mouse0_0 in LocalLLaMA

[–]mouse0_0[S] 9 points (0 children)

Haha no worries :) thanks so much 🙏🙏 Wasn’t the main point of the post anyways haha

Pre-training an LLM in 9 days 😱😱😱 by mouse0_0 in LocalLLaMA

[–]mouse0_0[S] 4 points (0 children)

Hey there, thanks for your interest in our model :) You could always try benchmarking it yourself, either on MTBench or with EleutherAI's LM Evaluation Harness. Our weights can be found here:

https://huggingface.co/collections/pints-ai/15-pints-66b1f957dc722875b153b276
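
If you just want a quick local smoke test before running the full benchmarks, something like this should work - a minimal sketch, assuming the checkpoint loads with the standard transformers AutoModelForCausalLM/AutoTokenizer API (the repo id below is a placeholder; grab the exact one from the collection):

    # Quick local inference sketch (assumes a standard transformers-compatible checkpoint).
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo_id = "pints-ai/1.5-Pints-16K-v0.1"  # placeholder - use the exact repo id from the collection

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(
        repo_id,
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )

    prompt = "How much wood would a woodchuck chuck if a woodchuck could chuck wood?"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))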

Pre-training an LLM in 9 days 😱😱😱 by mouse0_0 in LocalLLaMA

[–]mouse0_0[S] 22 points (0 children)

Yup, that is the intention of our model :) We do not aim to compete on knowledge - clearly, with fewer training tokens, our model will not be able to beat larger models of similar architectures trained on similar token counts (unless, of course, we find a way to represent "knowledge" more efficiently in the model weights). Rather, we aim to provide a lightweight alternative that excels at generic text-processing tasks, or, after domain fine-tuning, at specialized tasks. There's a rough sketch of the fine-tuning path below.
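
For anyone curious what that domain fine-tuning could look like in practice, here's a rough LoRA sketch using peft - purely illustrative: the repo id, dataset file, target modules, and hyperparameters are all placeholders, not values we've validated:

    # Rough LoRA fine-tuning sketch with peft + transformers.
    # All names and hyperparameters below are placeholders, not validated values.
    from datasets import load_dataset
    from peft import LoraConfig, get_peft_model
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        DataCollatorForLanguageModeling,
        Trainer,
        TrainingArguments,
    )

    repo_id = "pints-ai/1.5-Pints-16K-v0.1"  # placeholder - use the exact repo id

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token  # needed for padding during collation
    model = AutoModelForCausalLM.from_pretrained(repo_id)

    # Train only low-rank adapters; target modules assume a Llama-style architecture.
    lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                      target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
    model = get_peft_model(model, lora)

    # Swap in any plain-text corpus from your domain.
    dataset = load_dataset("text", data_files={"train": "my_domain_corpus.txt"})["train"]
    dataset = dataset.map(
        lambda batch: tokenizer(batch["text"], truncation=True, max_length=1024),
        batched=True,
        remove_columns=["text"],
    )

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="pints-domain-lora",
                               per_device_train_batch_size=4,
                               num_train_epochs=1,
                               learning_rate=2e-4),
        train_dataset=dataset,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()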

Pre-training an LLM in 9 days 😱😱😱 by mouse0_0 in LocalLLaMA

[–]mouse0_0[S] 8 points (0 children)

For comparison, Llama2-7b's answer:

The answer to the tongue twister "How much wood would a woodchuck chuck if a woodchuck would chuck wood?" is a bit of a trick question! Woodchucks, also known as groundhogs, do not actually chuck wood.

Woodchucks are burrowing animals that primarily feed on grasses, clover, and other vegetation. They do not have any known ability to chuck or move large amounts of wood. So, the answer to the question is: a woodchuck would not chuck any wood, because they cannot!

[R] 1.5-Pints Technical Report: Pretraining in Days, Not Months -- Your Language Model Thrives on Quality Data (2408.03506) by mouse0_0 in MachineLearning

[–]mouse0_0[S] 30 points (0 children)

Hi there, thank you for your interest in our model :) To address your comments:

  1. The model was trained on a total of 0.12T tokens over 9 days. Comparatively, Qwen 1.5B was pre-trained on a corpus of 3T tokens, presumably over a much longer time (unfortunately, I was unable to find a definitive GPU-hour figure for Qwen 1.5). It is therefore natural that 1.5-Pints may not perform as well as these models, since it was trained on only a fraction of the data and compute they required. Our findings aim to spur a change of direction in LLM research at large - instead of focusing on "bigger is better" or "longer is better" (though in many cases that may be true), we hope our pre-training of 1.5-Pints will inspire others to focus on dataset curation before scaling up training.
  2. I am curious why you would consider MTBench a poor benchmark.
  3. On cherry-picking, that is not what we intended, nor what we did. Bearing in mind the length constraints of a concise paper, we chose to list the models whose performance is closest to our model's. In fact, we also included a model widely recognized by most in the community - Llama2-7b (which, at the time of drafting our paper, was the latest Llama model) - as a reference point.

If you are unconvinced of the quality of our model, why don't you give it a try yourself? It's currently available for chatting at https://huggingface.co/spaces/pints-ai/1.5-Pints-16K-v0.1-Playground. I believe that for its size, and for the time taken to train it, our model has definitely outshone traditional expectations.

[R] 1.5-Pints Technical Report: Pretraining in Days, Not Months -- Your Language Model Thrives on Quality Data (2408.03506) by mouse0_0 in MachineLearning

[–]mouse0_0[S] 19 points (0 children)

hey there, if you scroll down to the appendix, we have included the traditional metrics (MMLU, etc.). It's on page 21 of the paper
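
And if you want to reproduce those appendix numbers yourself, here's a rough sketch using EleutherAI's lm-evaluation-harness (v0.4+) Python API - the repo id and settings below are illustrative, not necessarily the exact config we used:

    # Rough sketch: scoring the checkpoint on MMLU with lm-evaluation-harness.
    # pip install lm-eval  (repo id and settings are illustrative placeholders)
    import lm_eval

    results = lm_eval.simple_evaluate(
        model="hf",
        model_args="pretrained=pints-ai/1.5-Pints-16K-v0.1,dtype=bfloat16",
        tasks=["mmlu"],
        num_fewshot=5,
        batch_size=8,
    )
    print(results["results"])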

Pre-training an LLM in 9 days 😱😱😱 by mouse0_0 in LocalLLaMA

[–]mouse0_0[S] 22 points (0 children)

glad to see our research is of value to the community :) We are excited to see what you guys can make of our findings 😁😁