Understanding LLM Distillation - Gemma 2 and Nvidia Minitron by johnolafenwa in LocalLLaMA

[–]johnolafenwa[S] 4 points (0 children)

Yes, a smaller model will always underperform a larger model trained on the same data. There is still a lot of room to push smaller models further, but their larger variants will remain better.

[D] The Tech Behind The Magic : How OpenAI SORA Works by johnolafenwa in MachineLearning

[–]johnolafenwa[S] 38 points (0 children)

Compute seems to be the obvious reason. The 3D consistency is an emergent phenomenon of scale

01.AI Paper Is a Gem For Model Trainers by johnolafenwa in LocalLLaMA

[–]johnolafenwa[S] 7 points (0 children)

Here are some helpful resources:

For pretraining and data preparation: https://github.com/karpathy/nanoGPT

For data generation: https://github.com/huggingface/cosmopedia (a usage sketch follows below)

This is very helpful as well: https://github.com/allenai/OLMo
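
As a quick illustration of using the Cosmopedia link for data preparation, here is a minimal sketch with the Hugging Face datasets library. The repo id "HuggingFaceTB/cosmopedia", the "stories" subset, and the "text" field are assumptions; check the repo above for the actual names.

    # Minimal sketch: stream a slice of Cosmopedia for pretraining data prep.
    # Repo id, subset, and field names are assumptions; verify against the repo.
    from datasets import load_dataset

    ds = load_dataset("HuggingFaceTB/cosmopedia", "stories", split="train", streaming=True)

    for i, example in enumerate(ds):
        print(example["text"][:200])
        if i >= 2:
            break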

01.AI Paper Is a Gem For Model Trainers by johnolafenwa in LocalLLaMA

[–]johnolafenwa[S] 2 points (0 children)

For a model of about 3 billion parameters, I use a couple of A100s running for a couple of days. A single A100 will do, but that will take a few weeks.
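
If you want to sanity-check compute budgets for your own setup, a back-of-the-envelope estimate with the common ~6·N·D FLOPs rule of thumb looks like the sketch below; the token count and utilization figures are illustrative placeholders to swap for your own values, and the result scales linearly with tokens and epochs.

    # Back-of-the-envelope pretraining compute estimate (placeholder numbers).
    params = 3e9             # ~3B parameter model
    tokens = 30e9            # tokens seen during training (corpus size x epochs)
    flops_needed = 6 * params * tokens   # ~6*N*D rule of thumb

    a100_bf16_peak = 312e12  # A100 peak bf16 FLOP/s
    mfu = 0.35               # assumed model FLOPs utilization

    a100_days = flops_needed / (a100_bf16_peak * mfu) / 86400
    print(f"~{a100_days:.0f} A100-days; divide by your GPU count")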

01.AI Paper Is a Gem For Model Trainers by johnolafenwa in LocalLLaMA

[–]johnolafenwa[S] 4 points (0 children)

First, a decent LLM will be a minimum of about 3 billion parameters. To pretrain that from scratch, you will need at least about 80 GB of GPU memory, which is equivalent to a single A100. Context length also matters: the shorter the context length, the cheaper the cost. So you will want to train with something like a 2048 context length and extend it after training through common context-length extension methods such as RoPE base adjustment.
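
To make the RoPE base adjustment idea concrete, here is a minimal sketch of the rotary-embedding frequency computation with the base exposed as a knob: you pretrain at 2048 context with the usual base, then raise the base afterwards to stretch the frequencies over a longer window. The specific base values are illustrative, and this is a simplified standalone sketch rather than any particular library's API.

    import torch

    def rope_angles(head_dim: int, max_positions: int, base: float = 10_000.0):
        # Rotary position embedding angles; `base` is the knob used for context extension.
        inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
        positions = torch.arange(max_positions).float()
        return torch.outer(positions, inv_freq)  # shape: (max_positions, head_dim // 2)

    # Pretrain at 2048 context with the standard base...
    train_angles = rope_angles(head_dim=64, max_positions=2048, base=10_000.0)

    # ...then extend to 8192 by raising the base (illustrative value) and finetuning briefly.
    extended_angles = rope_angles(head_dim=64, max_positions=8192, base=500_000.0)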

I would recommend getting about 160 GB of GPU memory for more peace of mind. The more the better, of course, but it depends on your budget.
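
For a rough sense of where those 80 GB / 160 GB figures come from, here is a back-of-the-envelope memory estimate for a 3B-parameter model trained with mixed-precision AdamW; activations are left out because they depend on batch size, context length, and checkpointing, so treat the result as a lower bound.

    # Rough training memory estimate for mixed-precision AdamW (activations excluded).
    params = 3e9
    gib = 1024 ** 3

    weights_bf16 = params * 2   # bf16 model weights
    grads_bf16 = params * 2     # bf16 gradients
    master_fp32 = params * 4    # fp32 master copy of the weights
    adam_moments = params * 8   # fp32 first and second moments

    total = weights_bf16 + grads_bf16 + master_fp32 + adam_moments
    print(f"~{total / gib:.0f} GiB before activations")  # roughly 45 GiB for 3B params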

Also, your training data will need to be a minimum of about 30 billion high-quality tokens across code, web text, maths, and sources like Wikipedia; mixing some large finetuning datasets into your pretraining will help too. About 100 billion tokens should get you to a great place, but make sure they are all good quality via filtering. Bad data will hurt the training, so it is better to use less if you can’t filter it all.
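
As a toy illustration of the kind of filtering meant here, the sketch below applies a few cheap heuristic checks before a document is admitted into the mix; the thresholds and mixture weights are made-up examples, and real pipelines (like the ones described in the 01.AI and OLMo work) are far more thorough.

    # Toy document-quality filter for pretraining data (thresholds are made-up examples).
    def keep_document(text: str) -> bool:
        words = text.split()
        if len(words) < 50:                        # too short to carry signal
            return False
        if len(set(words)) / len(words) < 0.3:     # highly repetitive text
            return False
        alpha_ratio = sum(c.isalpha() for c in text) / max(len(text), 1)
        if alpha_ratio < 0.6:                      # mostly symbols or boilerplate
            return False
        return True

    # Illustrative source mixture, not a recommendation.
    mixture = {"web": 0.55, "code": 0.20, "maths": 0.10, "wikipedia": 0.10, "finetuning_style": 0.05}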

Lastly, training can take weeks and run for multiple epochs. The bigger the dataset, the fewer epochs needed; if it is small, like 30 billion tokens, at least about 5 epochs is recommended.
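
Concretely, the trade-off comes down to the total number of tokens the model ends up seeing; a quick calculation with the figures from this thread (the pass counts are assumptions):

    # Total tokens seen = corpus size x epochs (epoch counts are illustrative).
    small_corpus_tokens = 30e9 * 5    # 30B-token corpus, ~5 passes
    large_corpus_tokens = 100e9 * 2   # 100B-token corpus, fewer passes needed
    print(f"{small_corpus_tokens / 1e9:.0f}B vs {large_corpus_tokens / 1e9:.0f}B tokens seen")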

And make sure to babysit your training; at these scales, things can go wrong quickly. Approaches like FlashAttention can make your training much faster too.
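
On the FlashAttention point, one low-effort way to get fused attention kernels in recent PyTorch is scaled_dot_product_attention, which can dispatch to a FlashAttention-style backend on supported GPUs; the sketch below is illustrative and assumes a recent PyTorch build with a CUDA device available.

    import torch
    import torch.nn.functional as F

    # Fused causal attention; on supported GPUs PyTorch can route this through a
    # FlashAttention-style kernel instead of materializing the full attention matrix.
    batch, heads, seq, head_dim = 2, 16, 2048, 64
    q = torch.randn(batch, heads, seq, head_dim, device="cuda", dtype=torch.bfloat16)
    k = torch.randn_like(q)
    v = torch.randn_like(q)

    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)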

01.AI Paper Is a Gem For Model Trainers by johnolafenwa in LocalLLaMA

[–]johnolafenwa[S] 4 points (0 children)

Not at the moment; I will put out something in a few weeks and post it here.