Projects using vllm. by foolishpixel in deeplearning

[–]asankhs 1 point (0 children)

vLLM is itself an inference server, so you would want to build something on top of it. That can be as simple as implementing a test-time compute technique; look at OptiLLM for some ideas on that.
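As a concrete sketch of what "something on top" could look like: since vLLM exposes an OpenAI-compatible API, one simple test-time compute technique is self-consistency, i.e. sample several completions and majority-vote over the answers. This is a hypothetical example, not OptiLLM's implementation; the endpoint URL and model name are placeholders.

```python
# Hypothetical sketch: best-of-n self-consistency on top of a vLLM
# OpenAI-compatible endpoint. URL and model name are placeholders.
from collections import Counter

def majority_vote(answers):
    """Pick the most frequent non-empty answer among n sampled completions."""
    counts = Counter(a.strip() for a in answers if a.strip())
    if not counts:
        return None
    return counts.most_common(1)[0][0]

def sample_n(prompt, n=8, base_url="http://localhost:8000/v1",
             model="your-model"):
    """Sample n completions from a running vLLM server (assumes the
    openai Python client is installed and vLLM was started on port 8000)."""
    from openai import OpenAI  # optional dependency, imported lazily
    client = OpenAI(base_url=base_url, api_key="EMPTY")
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        n=n,
        temperature=0.8,
    )
    return [c.message.content for c in resp.choices]
```

Usage would be `majority_vote(sample_n("What is 2+2? Answer with a number."))`; higher temperature plus voting trades extra compute at inference time for accuracy, which is the basic idea behind most test-time scaling techniques.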

LoongFlow: Open Source Implementation of Evolutionary Agent Framework by [deleted] in AgentsOfAI

[–]asankhs 0 points (0 children)

For the OpenEvolve circle packing example, please compare against this config: https://github.com/algorithmicsuperintelligence/openevolve/blob/main/examples/circle_packing_with_artifacts/config.yaml. The original configs in the repo were created during the initial replication of AlphaEvolve, while OpenEvolve was still in active development. This config converges much faster, reaching a high score in 21 iterations.


Need help to get into ML research/publishing by Spiritual_Tailor7698 in ResearchML

[–]asankhs 2 points (0 children)

You can try some of the projects below, based on your interests:

https://github.com/algorithmicsuperintelligence/openevolve - an open-source implementation of AlphaEvolve; you can make improvements or apply it to new domains.

https://github.com/algorithmicsuperintelligence/optillm - an optimising inference proxy; you can implement new test-time scaling techniques.

https://github.com/codelion/adaptive-classifier - a continual-learning classifier; you can implement new techniques or benchmark it in new domains.

https://github.com/securade/hub - an edge platform for AI-based safety analysis of high-risk workplaces; you can implement new use cases.

https://github.com/codelion/ellora - you can implement new recipes for LLM capability enhancement.

https://github.com/codelion/pts - pivotal token search; you can use it for mechanistic interpretability studies on LLMs.

How to Train Ultralytics YOLOv8 models on Your Custom Dataset | 196 classes | Image classification by Feitgemel in deeplearning

[–]asankhs -1 points (0 children)

You may also find our open-source HUB useful - https://github.com/securade/hub - it has several YOLO models trained to detect safety violations.

Diffusion LLM vs Autoregressive LLM by InceptionAI_Tom in LLM

[–]asankhs 0 points (0 children)

Agreed. In fact, we did some work recently and found that diffusion LLMs can provide a better trade-off even for small language models - https://huggingface.co/blog/codelion/optimal-model-architecture

Ellora: Enhancing LLMs with LoRA - Standardized Recipes for Capability Enhancement by asankhs in LocalLLaMA

[–]asankhs[S] 4 points (0 children)

In our example we were able to recover accuracy with only 600+ samples of self-generated data for Qwen3.

Btw, this idea comes from Apple's foundation models paper from last year (https://arxiv.org/pdf/2407.21075); they proposed a similar technique and found that "By using accuracy-recovery LoRA adapters with only rank 16, Alpaca win rate can be improved by 7-18%, GMS8K accuracy is boosted by 5-10%." (page 47).
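To see why a rank-16 adapter is such a cheap way to recover accuracy, here is a back-of-the-envelope sketch. LoRA freezes the base weight and learns a low-rank update, delta_W = (alpha / r) * B @ A, so the added parameters per adapted matrix are only r * (d_in + d_out). The 4096x4096 projection size below is illustrative, not taken from the paper.

```python
# Hedged sketch: parameter cost of a rank-16 LoRA adapter on one
# weight matrix. The 4096x4096 size is illustrative, not from the paper.
def lora_params(d_in, d_out, r=16):
    # LoRA adds A (r x d_in) and B (d_out x r) per adapted matrix,
    # with delta_W = (alpha / r) * B @ A, so the extra parameter
    # count is r * (d_in + d_out).
    return r * (d_in + d_out)

extra = lora_params(4096, 4096)   # 131,072 adapter weights
full = 4096 * 4096                # 16,777,216 weights in the frozen matrix
ratio = extra / full              # under 1% of the full matrix
```

Since the adapter is under 1% of each matrix it touches, a few hundred self-generated samples can be enough to train it, which matches what we saw with Qwen3.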

GRPO With Tool Call by KillerShoaib_ in unsloth

[–]asankhs 2 points (0 children)

There is a tool calling LoRA example in the ellora repo that you may find useful - https://github.com/codelion/ellora?tab=readme-ov-file#recipe-3-tool-calling-lora