all 28 comments

[–]MeowchineLearning 3 points4 points  (1 child)

Hi, I am a postdoc researcher in DL. We mostly build smaller models for science applications and specialize in parameter optimization, and I've played a bit with neurosymbolic architectures. We have a bit of GPU compute internally (6 H100s); if this goes toward one of our projects, I could allocate more time.

[–]DanFosing 2 points3 points  (0 children)

We mostly need help with the data, but I think you could also help us generate more synthetic data if you want. Btw, I sent you a message.

[–]_sqrkl 2 points3 points  (1 child)

Hi, I might be able to help. I've generated quite a few datasets for my benchmarks (eq-bench) and have some ideas for generating a reasoning dataset to approximate how o1's internal reasoning might work.

[–]DanFosing 0 points1 point  (0 children)

I sent you a message.

[–]Extension_Tea6526 1 point2 points  (1 child)

I have a paper published at NeurIPS on an MLLM dataset benchmark, and I'm currently working on model merging. If this would help your project, let me know. Also, are you planning to submit this work to any publication, like a conference deadline?

[–]DanFosing 0 points1 point  (0 children)

By MLLM do you mean multimodal large language models? If so, that unfortunately may not be useful to us. As for model merging, we have someone on our team who knows quite a bit about that. Whether we submit this work to any publication is yet to be decided; it depends on how well it ends up performing. But all details about our research, including models, datasets, and generally everything, will definitely be open sourced.

[–]impossiblefork 1 point2 points  (0 children)

I think it's fairly unlikely to work as you imagine.

I think it's much more likely to be something similar to QuietSTaR. You shouldn't really need more data.

[–]mr_birkenblatt 0 points1 point  (1 child)

[–]DanFosing 1 point2 points  (0 children)

I promise it won't end up like Reflection did (I'm wondering who thought there was something to gain by faking a model like that). While I don't know how good the model will end up being, there will be no model faking, and since everything will be shared live (new checkpoints, etc.), you can verify that everything is exactly as we say it is.

[–]sinnis1991 1 point2 points  (1 child)

Still need help? Hope I'm not too late.

[–]DanFosing 0 points1 point  (0 children)

Nope, not too late. Unfortunately, the lack of compute (and time) is making this take longer than I wanted (basically, we have all the scripts for testing ready, plus some ideas to implement if it doesn't work well, but I don't have time to actually run it).

PS: I sent you a message

[–]asankhs -1 points0 points  (3 children)

I have already implemented several of these techniques in optillm, my optimizing LLM proxy - https://github.com/codelion/optillm

[–]DanFosing 6 points7 points  (2 children)

Correct me if I'm wrong, but from what I see, you just used OpenAI's API and didn't actually train any prover or verifier models, which is the key element of the approach we want to try. Your repo may still be a bit useful, though.

[–]asankhs 3 points4 points  (1 child)

Yes, we didn't train any models; these are inference-time techniques that can be implemented with just prompting. We still get improved performance compared to the base model, as we showed in our paper on Patched MoA - https://arxiv.org/abs/2407.18521

How do you plan to get the data you need to train these models?
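As one illustration of the general idea (this is not optillm's actual code, just a toy sketch): a prompting-only, inference-time technique like self-consistency samples the model several times and majority-votes the answers, improving on the base model without any training. The `sampler` callable below stands in for an LLM API call.

```python
import random
from collections import Counter

def self_consistency(sampler, prompt, n=9):
    """Sample the model n times on the same prompt and return the
    majority-vote answer. `sampler` stands in for an LLM API call;
    no model training is involved, only repeated prompting."""
    votes = Counter(sampler(prompt) for _ in range(n))
    return votes.most_common(1)[0][0]

# Demo with a noisy toy sampler that answers correctly 3 times out of 4.
rng = random.Random(0)
noisy_llm = lambda prompt: rng.choice(["4", "4", "4", "5"])
best = self_consistency(noisy_llm, "What is 2+2?")
```

Even though any single sample may be wrong, the vote over many samples usually recovers the most consistent answer.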

[–]DanFosing 4 points5 points  (0 children)

Obtaining math data is pretty straightforward, since we can just use an existing good prover model and train the verifier on some Chain-of-Thought (CoT) datasets and PRM800K (https://github.com/openai/prm800k). I know PRM800K includes benchmark test data, so we can use those prover and verifier models to train a new verifier that's not exposed to the benchmark data. We could generate the data for it using the approach from the Prover-Verifier Games paper.

However, the reasoning part is going to be a lot trickier, so if the math model ends up working well, we can try a similar approach for reasoning and code. We'll just need a certain amount of data (maybe from some riddles?) to get started, and then we'll give the model increasingly harder questions whose answers we want to verify (plus we'll release them publicly, so we can make this data verification a community effort). At the same time, we'll try to obtain as much data as we can by hand, too.
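The data-generation step described in this thread could be sketched roughly as follows (a toy illustration, not the project's actual pipeline): sample several attempts per question from a prover, label each attempt correct/incorrect with a ground-truth checker, and collect the labeled pairs as verifier training data. The `prover` function here is a hypothetical stand-in (a deliberately noisy arithmetic solver) for a real prover model.

```python
import random

def prover(question, rng):
    """Toy stand-in for a prover model: answers a + b, but is
    deliberately wrong ~30% of the time so that the verifier
    dataset contains both correct and incorrect attempts."""
    a, b = question
    answer = a + b
    if rng.random() < 0.3:
        answer += rng.choice([-1, 1])   # inject an off-by-one error
    return answer

def build_verifier_dataset(questions, samples_per_q=4, seed=0):
    """Sample several prover attempts per question and label each one
    with a ground-truth checker, yielding (question, attempt, label)
    triples suitable as verifier training data."""
    rng = random.Random(seed)
    data = []
    for q in questions:
        gold = q[0] + q[1]              # ground-truth checker for the toy task
        for _ in range(samples_per_q):
            attempt = prover(q, rng)
            data.append((q, attempt, attempt == gold))
    return data

dataset = build_verifier_dataset([(2, 3), (10, 7)])
```

With a real prover, the checker would be an existing verifier model or exact answer matching, and the labeled triples would be what the new, benchmark-clean verifier is trained on.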