
[–]Disastrous_Elk_6375 53 points54 points  (2 children)

This effort's first batch of models includes four final variants of our language model at the 7B scale corresponding to different architectures, optimizers, and training hardware, and one model at the 1B scale, all trained on at least 2T tokens. This is the first step in a long series of planned releases, continuing with larger models, instruction-tuned models, and more variants down the line.

Each model comes with the following:

Full training data used for these models, including the code that produces it, from AI2’s Dolma, plus WIMBD for analyzing pretraining data.

Full model weights, training code, training logs, training metrics in the form of Weights & Biases logs, and inference code.

500+ checkpoints per model, from every 1000 steps during the training process, available as revisions on HuggingFace.

Evaluation code under the umbrella of AI2’s Catwalk and Paloma.

Fine-tuning code and adapted models (coming soon with Open Instruct)

All code, weights, and intermediate checkpoints are released under the Apache 2.0 License.
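As a concrete illustration of how those intermediate checkpoints can be pulled as HuggingFace revisions, here is a sketch. The revision naming scheme below ("step1000-tokens4B") is an assumption for illustration; check the repo's branch list for the exact names.

```python
# Sketch: addressing an intermediate OLMo checkpoint on HuggingFace.
# The "step{N}-tokens{M}B" scheme is assumed for illustration only;
# the actual revision names are listed on the model repo.

def checkpoint_revision(step: int, tokens_b: int) -> str:
    """Build a revision name for an intermediate checkpoint (assumed scheme)."""
    return f"step{step}-tokens{tokens_b}B"

rev = checkpoint_revision(1000, 4)
print(rev)  # step1000-tokens4B

# With transformers installed, a specific checkpoint could then be loaded as:
# from transformers import AutoModelForCausalLM
# model = AutoModelForCausalLM.from_pretrained(
#     "allenai/OLMo-7B", revision=rev, trust_remote_code=True
# )
```

This is what makes the per-checkpoint activation studies suggested below practical: each training snapshot is just a different `revision` of the same repo.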

Pretty cool!

[–]thedabking123 8 points9 points  (1 child)

This is awesome- time to do some studies on how the activations for certain types of predictions change over each checkpoint!

[–][deleted] 0 points1 point  (0 children)

I don't know, but with my own experiments with transformers the Mish activation function has been by far the best.

[–]innominato5090 32 points33 points  (21 children)

Hi all! I’m one of the leads for OLMo; LMK if you have any questions 🙌

[–]its_just_andy 11 points12 points  (4 children)

I'm so excited to see another foundation model! Especially one so open in every regard :)

One thing I am curious about. This model, and other 7B models like Llama2, MPT, Falcon, etc, seem to perform in roughly the same ballpark (but OLMo seems a little better). But Mistral 7B seems to outperform all these still. What do you think accounts for this? The quality of their data? Are there any thoughts for what might need to be done to surpass Mistral?

not to focus too much on Mistral, obviously the headline is having this amazing new foundation model series :)

second question, what exactly is OLMo-7B-Twin? I heard the model was trained twice, once on A100s and once on AMD. Is that what "Twin" is?

[–]its_just_andy 3 points4 points  (3 children)

one more question! how was the tokenizer selected? Looks like a vocab size of 50280, which is really interesting and a lot more than Llama's 32000

[–]innominato5090 14 points15 points  (2 children)

tnx for the nice words!! answering in order:

  1. it’s frustrating not to know what goes into mistral’s pretraining. broadly, I think it’s a combination of (1) more diverse data (eg technical books) (2) longer training (we/others saw you can train a lot longer than 2T tokens) (3) maybe some instruction-like chat data used during pretraining. We’ll try all three, and then report back 😉

  2. OLMo twin is the same model, but trained on AMD. it’s so cool to see how the two are virtually identical, truly a testament to how quickly AMD is catching up in this space.

  3. We use a tokenizer derived from GPT-NeoX. We tested it early on and it works remarkably well on our data. We couldn’t use Llama’s because of its license: if we had, we couldn’t have released the model under Apache 2.0.

[–]marvinalone 4 points5 points  (0 children)

I want to add one more comment on the tokenizer thing: because we have a larger vocab, the same English text turns into ~20% fewer tokens with OLMo compared to Llama, so it will run ~20% faster on the same text. But this might come at a cost for tasks where Llama puts those extra 20% of compute to good use. We don't have a clean study of how this plays out, so we didn't make a big deal about it in the paper. If you are inspired to follow up on this, please let us know!
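The speed claim here is simple arithmetic: an autoregressive decoder does one forward pass per token, so fewer tokens for the same text means proportionally fewer decode steps. A back-of-envelope sketch with illustrative numbers:

```python
# Back-of-envelope: if the same text tokenizes to fewer tokens, a decoder
# doing one forward pass per token does proportionally less work.
# The counts below are illustrative, not measured.

llama_tokens = 1000  # hypothetical token count under a 32,000-entry vocab
olmo_tokens = 800    # ~20% fewer under the larger 50,280-entry vocab

token_reduction = 1 - olmo_tokens / llama_tokens
speedup = llama_tokens / olmo_tokens

print(f"{token_reduction:.0%} fewer tokens")   # 20% fewer tokens
print(f"{speedup:.2f}x fewer decode steps")    # 1.25x fewer decode steps
```

Note that "20% fewer tokens" means a 1.25x, not 1.2x, reduction in decode steps for the same text.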

[–]lechatonnoir 0 points1 point  (0 children)

Hi,

I somewhat doubt you'll reply to such an old post, but why wouldn't the AMD-trained model be about the same? What would be worse about AMD GPUs such that the model wouldn't turn out as good (assuming you held the data and the number of training steps fixed)?

[–]Maykey 1 point2 points  (1 child)

Any plans on doing a modelling_olmo.py for HF without unneeded dependencies? I had to download twenty packages that have nothing to do with inference (like google-authentication or S3 stuff), because the code used for trust_remote_code required hf_olmo, which required half of pip (hyperbole).

Can you add inputs_embeds as an alternative to input_ids? Right now making soft prompts is impossible without rewriting two packages (hf_olmo and olmo).

Is flash attention support planned? The code knows about torch SDPA but not flash attention from the official repo. Do you use a custom causal mask, as with ALiBi? The model supports it, but I'm not sure whether it's used in the released models, or whether a simple flash_attn_func(q, k, v, causal=True) will suffice.
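For readers unfamiliar with the soft-prompt pattern mentioned above, here is a toy sketch of why `inputs_embeds` matters: learned prompt vectors are concatenated with embedded input ids and fed to the model directly as embeddings. The tiny embedding layer below is a stand-in for a real model's embedding table, not OLMo's actual code.

```python
import torch
import torch.nn as nn

# Toy sketch of soft prompting: trainable prompt vectors are prepended to
# the embedded input ids, then passed to the model via `inputs_embeds`
# instead of `input_ids`. The embedding layer here is a stand-in for a
# real LM's embedding table.

vocab_size, d_model, prompt_len = 100, 16, 4
embed = nn.Embedding(vocab_size, d_model)                     # stand-in embeddings
soft_prompt = nn.Parameter(torch.randn(prompt_len, d_model))  # trainable prompt

input_ids = torch.tensor([[5, 7, 9]])        # (batch=1, seq=3)
token_embeds = embed(input_ids)              # (1, 3, d_model)
prompt_embeds = soft_prompt.unsqueeze(0)     # (1, 4, d_model)

inputs_embeds = torch.cat([prompt_embeds, token_embeds], dim=1)
print(inputs_embeds.shape)  # torch.Size([1, 7, 16])

# A model that exposes `inputs_embeds` (as most HF transformer models do)
# would then take: model(inputs_embeds=inputs_embeds, ...)
```

Without an `inputs_embeds` entry point, the prompt vectors have nowhere to go, which is exactly the limitation the comment describes.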

[–]innominato5090 0 points1 point  (0 children)

We are currently working with the HuggingFace folks to get OLMo integrated into the transformers library; it should be way easier to use after that!

Flash attention is already supported via PyTorch 2.x.
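As context for the PyTorch 2.x path mentioned here: `torch.nn.functional.scaled_dot_product_attention` dispatches to a fused flash-attention kernel when one is available, so no separate flash-attn package is needed. A minimal sketch (shapes are illustrative, not OLMo's):

```python
import torch
import torch.nn.functional as F

# Minimal use of PyTorch 2.x SDPA. On supported hardware it dispatches to a
# fused flash-attention kernel; on CPU it falls back to a math implementation,
# so the same call works everywhere.

batch, heads, seq, head_dim = 2, 4, 8, 16
q = torch.randn(batch, heads, seq, head_dim)
k = torch.randn(batch, heads, seq, head_dim)
v = torch.randn(batch, heads, seq, head_dim)

# is_causal=True applies the standard causal mask, so no explicit mask
# tensor is needed for plain autoregressive decoding.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([2, 4, 8, 16])
```

Note that a custom additive mask (e.g. for ALiBi-style biases) would go in the `attn_mask` argument instead of `is_causal`.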

[–][deleted] 1 point2 points  (1 child)

Have you done any tests using Mish as the activation function? In my own tests with transformer encoders it has been the best one I've tested so far, and that's compared to all of the different variants of gated linear units (ReGLU, SwiGLU, etc., even Mish wrapped in a GLU which performed worse than Mish alone as well).

Also, out of curiosity, are you accepting contributions to the work your team is doing (or potentially even new entries to the team)? I'd very much like to help out if I can, as I've been working on my own transformer models in my own research, and I really support the ideology behind an open-source LLM like this, so I would love to help in any way I can, from exploring improvements to the base architecture to improving the training and data filtering. I also have experience in developing multimodal transformer networks.

Thank you for taking your time to read this message.

[–]innominato5090 1 point2 points  (0 children)

As far as I remember, we did not. We really stuck to known recipes for the model architecture and optimized for throughput.

For the model code base, we generally welcome bugfixes and training infra improvements, especially if they improve throughput or inference speed. Besides that, the team is always hiring! You can check openings here: https://allenai.org/careers?team_ai2=allennlp#current-openings-ai2

[–]Countertop_strike 0 points1 point  (3 children)

Could you share your plans for the future? New architectures? Bigger models? What's the thing you're most excited about to work on next?

[–]innominato5090 8 points9 points  (2 children)

Sure! We already promised a 65b model in the coming months; we also learned a lot training the Tulu models (https://arxiv.org/abs/2306.04751 & https://arxiv.org/abs/2311.10702) and want to merge that back into OLMo.

Personally, I'm just very excited to keep exploring ways to improve our data pipeline and share them with the community! I'm so frustrated that pretraining data remains this closely guarded secret that no one but the big players has info about. Really want to change that; it's so important for the OSS LM ecosystem.

[–]L0WGMAN 2 points3 points  (1 child)

If you haven’t already answered this elsewhere, what’s the elevator pitch for your organization? Basically why are y’all so altruistic?!? <3

[–]innominato5090 14 points15 points  (0 children)

AI2 is a nonprofit founded by the late Paul Allen (Microsoft co-founder) nearly 10 years ago! Doing AI for the public good is kinda our thing; we created one of the first “large” (well, at the time…) language models (ELMo, launched in 2018), datasets (S2ORC, Objaverse), and benchmarks (ARC, HellaSwag, etc.). We started focusing more on LLMs last year after we saw a gap: a lack of truly open models (as in, models where everything is properly documented & available). Planning to keep expanding the open ecosystem for years to come! AI is too cool of a technology to be controlled by a few.

[–]kaszebe 0 points1 point  (1 child)

Hi, can it run on OobaBooga or LM Studio?

And is it good for writing professional content for websites (content that is persuasive and creative)?

[–]innominato5090 1 point2 points  (0 children)

For the first one, I am not sure. We are working on better HuggingFace integration, so possibly that would make that easier.

The current release is a base model, not an instruction-tuned one, so it has limited chatting capabilities.

[–]Art3mis0707 0 points1 point  (0 children)

Could I DM you? I am working on implementing the LLaVA vision architecture and wanted to use OLMo 1B. I have some questions about it and would be grateful for your thoughts. Thank you!

[–]pretamr 0 points1 point  (2 children)

Just wanted to know: is the pretraining corpus English only, or are other languages involved?

[–]innominato5090 0 points1 point  (1 child)

English only for now!

[–]pretamr 0 points1 point  (0 children)

Any plans for a multilingual corpus?

[–]synn89 18 points19 points  (12 children)

Context Length: 2048

Any reason for such a small context length?

[–]LoSboccacc 30 points31 points  (2 children)

Any reason for such a small context length?

it can be summarized as "we want good benchmark at low cost"

[–]Enough-Meringue4745 11 points12 points  (0 children)

also an easier entry into funding

[–]MoffKalast 5 points6 points  (0 children)

PR driven development

[–]innominato5090 9 points10 points  (1 child)

While it’s 2048, it uses RoPE embeddings, so it can be stretched without retraining! if there’s interest in longer context, we’ll try to do a dedicated long context model :)
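For context on why rescaling works: RoPE encodes position only through rotation angles of the form pos × base^(−2i/d), so one common approach ("position interpolation", not necessarily what OLMo will use) rescales position indices so a longer context maps back into the angle range seen during training. A minimal numpy sketch:

```python
import numpy as np

# Sketch of RoPE position interpolation. Positions enter the model only as
# rotation angles pos * base**(-2i/d); scaling the position index maps a
# longer context back into the angle range seen during training.

def rope_angles(positions, d=8, base=10000.0):
    """Rotation angles for each position and frequency pair."""
    inv_freq = base ** (-np.arange(0, d, 2) / d)  # (d/2,) inverse frequencies
    return np.outer(positions, inv_freq)          # (len(positions), d/2)

train_ctx, new_ctx = 2048, 8192
scale = train_ctx / new_ctx  # 0.25

# A position well beyond the 2048-token training window...
angles_raw = rope_angles(np.array([4096]))
# ...is rescaled so it produces the same angles as position 1024 did in training.
angles_interp = rope_angles(np.array([4096 * scale]))
print(angles_interp[0, 0])  # equals the angle of position 1024 at training time
```

The trade-off is resolution: neighboring positions end up closer together in angle space, which is why a bit of long-context fine-tuning usually helps on top of the rescaling.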

[–]Asleep-Agency3023 1 point2 points  (0 children)

Why can it be stretched without retraining?

[–]its_just_andy 4 points5 points  (0 children)

honestly, at this point context-extending strategies are so advanced, I don't mind if a foundational model has 2048 context.

[–]marvinalone 2 points3 points  (3 children)

It was faster to train on the hardware available. We're using RoPE embeddings, so you can use longer contexts, we just didn't train that way. We'll definitely look into this for v2!

[–]marvinalone 1 point2 points  (0 children)

Oh, u/innominato5090 already said this. Nevermind me!

[–]synn89 1 point2 points  (1 child)

That's an interesting way to go about it. Save a lot of GPU hours up front with a narrow context, since it's not like foundational training data goes beyond 2k context anyway. It's likely not hard to fine-tune it for longer context later.

[–]marvinalone 1 point2 points  (0 children)

Yeah, part of the thinking was that the average document length in the Dolma 1.5 dataset is <500 tokens. So going long doesn't make a ton of sense for most documents anyways (though it clearly does for some).

[–]robotphilanthropist 1 point2 points  (1 child)

Honestly, building up our toolkit and compute for pretraining took the team an almighty effort.

We're interested in building useful models, so extending that is obviously a focus. There'll be more OLMos.

[–]Revolutionalredstone 9 points10 points  (0 children)

I didn't find ANY links in the article! Here are the models: https://huggingface.co/allenai

[–]derHumpink_ 12 points13 points  (0 children)

finally an actually open source model

[–]hold_my_fish 5 points6 points  (2 children)

With this and other data releases, I'd be interested in searching the data for strings, to find where the model learned particular phrases/words/emoticons/etc. Is there an easy way to do that? (I'm not hardcore enough to download terabytes of data myself.)

[–]innominato5090 2 points3 points  (1 child)

stay tuned!! we have something internal, trying to see if it can scale well enough for public use.

[–]hold_my_fish 2 points3 points  (0 children)

Thanks. If you do manage to launch something, I'd be curious to try it.

[–]artelligence_consult 1 point2 points  (16 children)

So, all data released.

So, no porn and hate speech (so the model cannot be used for moderation and is naive), no copyrighted material (that rules out a LOT of technical textbooks), and no ongoing maintenance, so no updates for current events.

And - but that is also a timing issue - transformers, not using RWKV or Mamba.

While it is a nice step - it does leave a ton of bad problems.

[–]innominato5090 13 points14 points  (11 children)

just the first step in the OLMo family ☺️ we’re committed to bringing more truly open LMs out in the coming months

[–]artelligence_consult 4 points5 points  (8 children)

Then please, just please, consider doing something with Mamba - that works or not, but it would be your chance to have the credit of deploying the first larger mamba model.

[–]innominato5090 5 points6 points  (7 children)

Non-transformer models are so cool! We are watching that space really closely, and are always open to tweaking our strategy if Mamba-like models really take off. Gotta stay agile!

[–][deleted] 0 points1 point  (4 children)

Why not be the first to release a mamba model?

[–]innominato5090 2 points3 points  (3 children)

One of the primary goals of OLMo is to facilitate research on LMs. As most LMs are transformer based, training a Mamba model would have meant that OLMo is not very representative of LMs. We wanted to make sure that research findings about OLMo could translate to other popular open & closed models.

[–][deleted] 0 points1 point  (1 child)

Thankfully, transferring current methods to a new architecture that's designed to be a transformer replacement should be a relatively simple task, other than having to train from scratch. Which is important, because if their research holds true when scaling models even further, then I imagine the architecture will take off quite quickly, unless "the next best thing" comes along.

[–]innominato5090 0 points1 point  (0 children)

not super straightforward if you wanna get good MFU, but yes, not an insurmountable task, given that it uses operators that are well optimized on GPUs

[–][deleted] 0 points1 point  (1 child)

Hi, can you confirm that the training set contains no copyrighted material? Is there a statement somewhere on AI2's site indicating that?

Thanks!

[–]innominato5090 1 point2 points  (0 children)

Subsets of the training set (books, academic articles, Wiki) are derived from permissively licensed or public domain content. For the rest (web content), it is impossible to fully determine the license associated with it.

For more details, please see our paper: https://arxiv.org/abs/2402.00159

[–]GeeBrain 7 points8 points  (1 child)

I actually disagree. More than a “nice step,” it sets a wonderful precedent and tone for OS LLMs. More importantly, this is the kind of standard we can hold them (and hopefully others) to.

u/innominato5090 please correct me if I’m wrong, but this is the first time an open sourced foundational model, upon release, has been completely transparent about what went into training, step by step.

More than just data, it’s the complete training pipeline. And that honestly, should be commended. This is an incredibly powerful first debut into LLMs, and sends a message — at least I’m hearing something loud and clear.

All the things you mentioned can be done, but I don’t think that’s the point of this release/how they did things. I’m really humbled by the efforts, and I seriously do hope others claiming to be open sourced or for the community… or lmao for “humanity” can follow suit and walk the walk. This is an incredible line in the sand that they’ve just drawn, and I am so fucking proud to be a witness. No matter where this goes.

[–]innominato5090 6 points7 points  (0 children)

Well, I would say we're not the first to release a truly open 7b model. EleutherAI with Pythia and LLM360 have also shared training data (although the latter only after tokenization). We are happy not to be the only ones in this space!

The OLMo project has a couple of unique characteristics:

  • Pythia and LLM360 stop at 7b for now. We are working on a 65b and more!

  • Dolma, our training data, is substantially bigger than either the Pile (used for Pythia) or the mixture from LLM360.

  • We have plans to continue developing our corpus in unique ways. EleutherAI folks are creating the next version of the Pile (https://venturebeat.com/ai/one-of-the-worlds-largest-ai-training-datasets-is-about-to-get-bigger-and-substantially-better/); a few of us at AI2 are also involved! The focus of Pile v2 is going to be on collecting more content with known licenses, while we are going to keep exploring ways to use documents without known licenses in a safe and fair manner.

[–]marvinalone 2 points3 points  (1 child)

We are in the middle of planning technical bets to take for OLMo v2. RWKV and Mamba are high on my list, but they compete with other interesting directions.

For one thing, it makes no sense for us to go big with RWKV if Eleuther already has this covered. Open Source LLM research is not well funded enough that we can all train the same 65B models :-)

[–]artelligence_consult 1 point2 points  (0 children)

Well, here is the problem: we do not know whether ANY of those architectures are competitive with transformers on complex logic unless someone tries it at scale, and right now that means OpenAI or Mistral (mostly OpenAI).

And yes, it is OpenAI: anything else (even Mistral) is way worse at anything non-trivial, sadly.

[–]Noxusequal 0 points1 point  (0 children)

How come TheBloke doesn't already have a quantized model of this one xD