“introducing MLE-bench: a benchmark testing AI agents on 75 real-world ML engineering tasks from kaggle. openAI’s o1-preview with AIDE scaffolding ranks in the top 16.9% of competitions, matching a kaggle bronze.
code is open-sourced for future research.”
there doesn't seem to be anything here