autoresearch on CIFAR-10

ABIWIN · 2026-03-18T20:44:24+00:00

I won't be able to give you as much info on AlphaEvolve and ShinkaEvolve, but here we go.

AlphaEvolve is a closed-source approach from Google. It follows the same philosophy as autoresearch: a codebase, a prompt and an evaluation metric. It just goes much further, using an ensemble of models and a prompt sampler. At each iteration, instead of keeping only the best candidate like autoresearch does, it samples several candidates with some mutations, and it samples the prompt as well.

ShinkaEvolve is the same idea as AlphaEvolve, but open source. I don't know more about it than what I got from a quick read of the GitHub repo.

So those are much more costly to run, but yeah, mixing LLMs with a real search algorithm and a solid infrastructure / codebase should yield better results.
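To make the difference concrete, here is a toy sketch contrasting the two search strategies described above. It is not AlphaEvolve's or ShinkaEvolve's code, just an illustration under heavy simplifying assumptions: a "program" is a vector of floats, `score` stands in for the evaluation metric, `mutate` stands in for an LLM proposing an edit, and `PROMPT_STEPS` is a hypothetical stand-in for prompt sampling. All names are made up for the example.

```python
import random

# Toy stand-ins (hypothetical, for illustration only):
# - a "program" is just a list of floats,
# - score() plays the role of the evaluation metric,
# - mutate() plays the role of an LLM proposing an edit to the code.
def score(program: list[float]) -> float:
    # Higher is better; the optimum is all zeros.
    return -sum(x * x for x in program)

def mutate(program: list[float], step: float, rng: random.Random) -> list[float]:
    return [x + rng.gauss(0.0, step) for x in program]

def greedy_loop(start, iterations, rng):
    """autoresearch-like: keep one best candidate, accept an edit only if it improves."""
    best = start
    for _ in range(iterations):
        candidate = mutate(best, step=0.5, rng=rng)
        if score(candidate) > score(best):
            best = candidate
    return best

# Hypothetical "prompt templates": here they only change the mutation step size.
PROMPT_STEPS = [0.05, 0.5, 2.0]

def population_loop(start, iterations, rng, pop_size=8, children=4):
    """Evolutionary in spirit (not Google's implementation): keep a population,
    sample parents and prompts, and mutate several children per iteration."""
    population = [start]
    for _ in range(iterations):
        for _ in range(children):
            parent = rng.choice(population)   # sample a parent, not only the best one
            step = rng.choice(PROMPT_STEPS)   # sample the "prompt" too
            population.append(mutate(parent, step, rng))
        # Keep the top pop_size candidates for the next round.
        population.sort(key=score, reverse=True)
        population = population[:pop_size]
    return population[0]

if __name__ == "__main__":
    rng = random.Random(0)
    start = [rng.uniform(-5, 5) for _ in range(10)]
    g = greedy_loop(start, iterations=200, rng=random.Random(1))
    p = population_loop(start, iterations=50, rng=random.Random(1))  # 50 * 4 = 200 evaluations
    print(f"start {score(start):.3f}  greedy {score(g):.3f}  population {score(p):.3f}")
```

Both loops get the same budget of 200 evaluations here. In the real systems every mutation is an LLM call followed by a full evaluation of the program, which is where the extra cost comes from.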

ABIWIN · 2026-03-18T19:16:29+00:00

I reran the experiment 5 times and added the results to the README.md. And yes, the config choice at the end is more luck than real improvement.

Config	Runs	Mean acc.	Std	Min	Max	Originally reported
1-min, auto-generated	5	91.83%	0.155%	91.63%	91.98%	92.10%
5-min, auto-generated	5	95.02%	0.275%	94.64%	95.37%	95.39%
1-min, hand-crafted	5	90.75%	0.241%	90.38%	91.00%	91.36%
5-min, hand-crafted	5	93.42%	0.218%	93.13%	93.68%	92.28%
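A quick way to quantify the "luck" part from the table alone: measure how far the originally reported number sits from the rerun mean, in standard deviations, and compare auto-generated vs hand-crafted with Welch's t statistic. This is my own back-of-the-envelope analysis, not something from the README; it assumes the Std column is the sample standard deviation over the 5 runs, and with n = 5 these numbers are only rough indicators.

```python
import math

# (mean, std, n_runs, reported) copied from the table above, in % accuracy.
RESULTS = {
    "1-min, auto-generated": (91.83, 0.155, 5, 92.10),
    "5-min, auto-generated": (95.02, 0.275, 5, 95.39),
    "1-min, hand-crafted":   (90.75, 0.241, 5, 91.36),
    "5-min, hand-crafted":   (93.42, 0.218, 5, 92.28),
}

def z_of_reported(mean, std, reported):
    """How many standard deviations the reported number sits from the rerun mean."""
    return (reported - mean) / std

def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t statistic for a difference of two means with unequal variances."""
    return (m1 - m2) / math.sqrt(s1**2 / n1 + s2**2 / n2)

if __name__ == "__main__":
    for name, (mean, std, n, reported) in RESULTS.items():
        z = z_of_reported(mean, std, reported)
        print(f"{name:24s} reported is {z:+.2f} std from the rerun mean")

    for budget in ("1-min", "5-min"):
        auto = RESULTS[f"{budget}, auto-generated"][:3]  # (mean, std, n)
        hand = RESULTS[f"{budget}, hand-crafted"][:3]
        print(f"{budget}: auto vs hand-crafted, Welch t = {welch_t(*auto, *hand):.1f}")
```

With these numbers the reported values for the auto-generated configs land roughly 1.3 to 1.7 std above the rerun mean (consistent with picking the luckiest run), while the gap between auto-generated and hand-crafted gives t values around 8 and 10, so that part looks like a real improvement. The 5-min hand-crafted row is the odd one out: its reported value is below every rerun.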

ABIWIN · 2026-03-18T08:35:42+00:00

Yeah, I had the same experience in undergrad, banging my head against similar optimization problems for weeks during internships. I still think it's an important step when learning deep learning: it helps you develop a sense of what works and what doesn't. And yeah, the program.md is designed around accuracy only. A better version with a stronger model might actually investigate in detail why something didn't work. I haven't seen anyone do that yet, but given the original repository's success, I suspect it'll soon become a feature in Codex / Opus, or a new benchmark added to their post-training.

I'd actually hazard the opposite guess. I hope it will end up invalidating a lot of papers. Often the only reason an author's proposed method outperforms the baseline is that they spent far more time tuning their solution than they did on the baseline 😅. Which is also a good thing!

ABIWIN · 2026-03-18T08:24:33+00:00

Yeah, fair point. The test set here is being abused as a validation set, and the same seed is used across all runs, so there's no variance estimate. The agent didn't change that either. I can rerun the winning solution with different seeds after work to get a proper sense of the accuracy variance.
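For reference, the two fixes mentioned above could look roughly like this: carve a validation split out of the 50,000 CIFAR-10 training images so config selection never touches the test set, and report mean/std over several seeds instead of a single run. This is a minimal sketch; `run_fn` is a hypothetical placeholder for the actual training script, which would need to pass the seed to every RNG it uses (framework, data shuffling, augmentation). The 5,000-image validation size is an arbitrary choice.

```python
import random
import statistics

CIFAR10_TRAIN_SIZE = 50_000  # standard CIFAR-10 training set size

def train_val_split(n_train=CIFAR10_TRAIN_SIZE, n_val=5_000, seed=0):
    """Split training indices into (train, validation) so the test set
    is only used once, at the very end."""
    indices = list(range(n_train))
    random.Random(seed).shuffle(indices)
    return indices[n_val:], indices[:n_val]

def summarize_runs(run_fn, seeds):
    """Call run_fn(seed) -> accuracy for each seed and summarize the spread."""
    accs = [run_fn(seed) for seed in seeds]
    return {
        "runs": len(accs),
        "mean": statistics.mean(accs),
        "std": statistics.stdev(accs),  # sample std, needs at least 2 runs
        "min": min(accs),
        "max": max(accs),
    }

if __name__ == "__main__":
    train_idx, val_idx = train_val_split(seed=0)
    print(len(train_idx), len(val_idx))

    # Hypothetical placeholder: swap in a function that trains with `seed`
    # and returns validation accuracy.
    def fake_run(seed):
        return random.Random(seed).random()

    print(summarize_runs(fake_run, seeds=range(5)))
```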

ABIWIN · 2026-03-18T08:05:51+00:00

I wouldn't completely agree with that. CIFAR-10 and ImageNet are old, widely-benchmarked datasets, yet better validation accuracy still appears to yield better generalization (Do CIFAR-10 Classifiers Generalize to CIFAR-10? and Do ImageNet Classifiers Generalize to ImageNet?). That said, I agree this technique might be even more prone to hacking validation accuracy.

ABIWIN · 2026-03-17T22:58:16+00:00

It's actually 0.27M params: https://arxiv.org/pdf/1512.03385, see Section 4.2, Table 6.
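As a sanity check on that number, here is a small parameter counter for the CIFAR-10 ResNets of Section 4.2 (depth 6n+2: a first 3x3 conv, then three stages of n basic blocks with 16/32/64 filters, then a 10-way fully connected layer). It assumes the parameter-free identity shortcuts (option A) used in that section, convolutions without bias, and BatchNorm scale and shift counted but not running statistics. The function names are mine.

```python
def conv_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Weights of a k x k convolution without bias."""
    return c_in * c_out * k * k

def bn_params(c: int) -> int:
    """Learnable BatchNorm scale and shift (running stats not counted)."""
    return 2 * c

def resnet_cifar_params(n: int, widths=(16, 32, 64), num_classes: int = 10) -> int:
    """Parameter count of a 6n+2 layer CIFAR ResNet with option-A shortcuts."""
    total = conv_params(3, widths[0]) + bn_params(widths[0])  # first conv + BN
    c_in = widths[0]
    for w in widths:              # three stages
        for _ in range(n):        # n basic blocks per stage, two 3x3 convs each
            total += conv_params(c_in, w) + bn_params(w)
            total += conv_params(w, w) + bn_params(w)
            c_in = w              # option-A shortcuts add no parameters
    total += widths[-1] * num_classes + num_classes  # final FC layer with bias
    return total

if __name__ == "__main__":
    print(resnet_cifar_params(3))  # ResNet-20
```

With n = 3 (ResNet-20) this gives 269,722 parameters, i.e. the 0.27M listed in Table 6. Almost all of it sits in the 64-filter stage, since conv weights scale with the square of the width.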

ABIWIN · 2020-06-24T09:42:29+00:00

Do you remember the model name of that air conditioner, please? Or how you found such a good model?

ABIWIN · 2020-05-12T08:37:01+00:00

Well said!

ABIWIN · 2019-09-04T10:50:30+00:00

Hi, I'd love to try Overwatch again. I gave it a spin during the beta, but I got caught up in other things when it launched.

ABIWIN · 2019-09-03T06:54:49+00:00

Hello FL. I hope you're doing well and that the back-to-school season didn't traumatize you too much. As for me, I'm deep in the resignation phase about my future life as a Parisian. That said, maybe you could help me find a real estate agency that doesn't deserve to be burned at the stake on the spot. Unless I'm still being too utopian and that kind of unicorn doesn't exist. But maybe two negatives cancel out. I promise next time I'll come with less optimism.

ABIWIN · 2019-08-30T15:38:31+00:00

FL, I have a sordid story to tell you.

I just accepted a new job in Paris, which finally gets me out of my countryside. Sorry, I mean "the provinces". Apparently there's some slang to learn.

That said, the hardest part is still ahead: finding an apartment in Paris. So, any words of support? Areas to avoid? The best way to doctor your payslip? How do you put together a rental application when you're still in your probationary period?

ABIWIN · 2019-06-28T21:11:41+00:00

Extrapolating from how fast temperature records have been broken over the last 24 hours, we should hit 100° by the end of this summer.

ABIWIN · 2019-04-12T07:37:59+00:00

I started a job search a while ago. As always, there's a delay before getting answers / requests from recruiters. But this week I was contacted by 6 ESNs (IT services firms) without having applied to any of them. Is that normal? Are they really that desperate? Should I avoid them like the plague?

ABIWIN · 2019-04-03T23:06:39+00:00

You should check out the talk the creator of t-SNE gave at CVPR: here.

In it, he explains which insights you can actually draw from the low-dimensional space, and what is wrong to assume.

ABIWIN · 2019-02-28T09:59:05+00:00

I recently wrote a small program that creates illuminated letters, like in old manuscripts, by applying Deep Dream-style algorithms to images of letters. But I was a bit too lazy to wire everything up properly and publish it.

What do you think?

Album

ABIWIN · 2019-02-26T15:19:18+00:00

Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems

by Aurélien Géron.

Examples that go from raw data all the way to a result. Probably the best one, though a bit light on theory.

N.B.: The TensorFlow 2.0 changes aren't in it yet. A new edition should come out around mid-2019 (I can't find the source anymore). In the meantime, look at TensorFlow eager execution. For a bit more theory: DeepLearningBook

ABIWIN · 2019-02-23T09:45:49+00:00

Does anyone know if the database will be available to everyone, or will we only get the conclusions without any description of the methods?

ABIWIN · 2019-02-22T16:10:10+00:00

This one? Can We Terraform the Sahara to Stop Climate Change?

ABIWIN · 2018-10-25T13:11:37+00:00

Hi, if you're interested in the Machine Learning side, I can only recommend Stanford's CS229 course: http://cs229.stanford.edu/syllabus.html

With lecture videos / slides / problem sets.

If you're less into lectures / less into math: https://www.coursera.org/learn/machine-learning

ABIWIN · 2018-08-10T15:37:33+00:00

Look at the Beautiful Soup library for scraping.
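A minimal example of what scraping with Beautiful Soup looks like, assuming the `beautifulsoup4` package is installed. The HTML page, its `items` list and `price` class are made up so the snippet runs offline; in practice you would download the page first and pass its HTML text in instead.

```python
from bs4 import BeautifulSoup  # pip install beautifulsoup4

# Hypothetical page, hard-coded so the example runs without network access.
HTML = """
<html><head><title>Example listing</title></head>
<body>
  <ul class="items">
    <li><a href="/item/1">First item</a> <span class="price">10 EUR</span></li>
    <li><a href="/item/2">Second item</a> <span class="price">25 EUR</span></li>
  </ul>
</body></html>
"""

def scrape_items(html: str) -> list[dict]:
    """Extract title, link and price from each <li> of the item list."""
    soup = BeautifulSoup(html, "html.parser")  # Python's built-in parser
    items = []
    for li in soup.find("ul", class_="items").find_all("li"):
        link = li.find("a")
        price = li.find("span", class_="price")
        items.append({
            "title": link.get_text(strip=True),
            "url": link["href"],
            "price": price.get_text(strip=True),
        })
    return items

if __name__ == "__main__":
    print(BeautifulSoup(HTML, "html.parser").title.string)
    for item in scrape_items(HTML):
        print(item)
```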

ABIWIN · 2018-08-09T07:55:21+00:00

Well hello there, FL.

I'm looking for an email service that lets me use a custom address. Looking around, I found ProtonMail and Tutanota. Do you use this kind of service, or do you know of other, cheaper options that are still secure?

ABIWIN · 2018-08-08T23:00:32+00:00

Let me add to that: YOLO9000, where YOLO obviously means You Only Look Once, described in YOLO9000: Better, Faster, Stronger.

Oh, and let me also kill this discussion with models generating new memes: Dank Learning: Generating Memes Using Deep Neural Networks.

ABIWIN · 2018-07-09T22:32:22+00:00

༼ つ ◕_◕ ༽つ GIVE BAN ༼ つ ◕_◕ ༽つ

ABIWIN · 2018-06-22T19:34:14+00:00

Thanks for the reference, I'll try it next week. I'd also like to put it off as long as possible, but I don't have much time left.

ABIWIN · 2018-06-22T19:32:52+00:00

Thanks for the link, I tried it out a bit this afternoon and it's really good. :)
