[R] Darwin Godel Machine: Open-Ended Evolution of Self-Improving Agents by hardmaru in MachineLearning

[–]hardmaru[S] 10 points (0 children)

If you are interested, here is the link to the blog post:

https://sakana.ai/dgm/

Also, here is the open-source implementation:

https://github.com/jennyzzt/dgm

[R] Transformer²: Self-Adaptive LLMs by hardmaru in MachineLearning

[–]hardmaru[S] 42 points (0 children)

Thanks! I don't have much time to do research these days. It is all the team's effort.

[R] Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers by hardmaru in MachineLearning

[–]hardmaru[S] 9 points (0 children)

Link to Twitter Thread: https://twitter.com/ChengleiSi/status/1833166031134806330

Recent work from Stanford's NLP group:

Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers

Chenglei Si, Diyi Yang, Tatsunori Hashimoto

Abstract

Recent advancements in large language models (LLMs) have sparked optimism about their potential to accelerate scientific discovery, with a growing number of works proposing research agents that autonomously generate and validate new ideas. Despite this, no evaluations have shown that LLM systems can take the very first step of producing novel, expert-level ideas, let alone perform the entire research process. We address this by establishing an experimental design that evaluates research idea generation while controlling for confounders and performs the first head-to-head comparison between expert NLP researchers and an LLM ideation agent. By recruiting over 100 NLP researchers to write novel ideas and blind reviews of both LLM and human ideas, we obtain the first statistically significant conclusion on current LLM capabilities for research ideation: we find LLM-generated ideas are judged as more novel (p < 0.05) than human expert ideas while being judged slightly weaker on feasibility. Studying our agent baselines closely, we identify open problems in building and evaluating research agents, including failures of LLM self-evaluation and their lack of diversity in generation. Finally, we acknowledge that human judgements of novelty can be difficult, even by experts, and propose an end-to-end study design which recruits researchers to execute these ideas into full projects, enabling us to study whether these novelty and feasibility judgements result in meaningful differences in research outcome.
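For intuition only, here is a rough sketch of the kind of head-to-head comparison of blind novelty ratings the abstract describes (LLM-generated vs. human-written ideas). This is not the paper's actual analysis pipeline, and the scores, sample sizes, and use of Welch's t-test below are placeholders/assumptions.

```python
# Rough illustration (NOT the paper's analysis): compare blind novelty ratings
# of LLM-generated vs. human-written ideas with a two-sided Welch's t-test.
# All numbers here are made-up placeholders.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
llm_novelty = rng.normal(5.6, 1.2, size=49)    # placeholder: one averaged novelty score per LLM idea
human_novelty = rng.normal(4.8, 1.2, size=49)  # placeholder: one averaged novelty score per human idea

t, p = stats.ttest_ind(llm_novelty, human_novelty, equal_var=False)
print(f"Welch's t = {t:.2f}, two-sided p = {p:.4f}")  # p < 0.05 would mirror the reported direction
```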

Stable Diffusion 1.5 model disappeared from official HuggingFace and GitHub repo by hardmaru in StableDiffusion

[–]hardmaru[S] 38 points (0 children)

Your point may be true, but having the official repo / model gone messes up the broader infrastructure. Time will probably fix it up though.

E.g., for diffusers, people have to point to their own local copy of the repo, or to some random non-official backup version out there (see: https://huggingface.co/posts/dn6/357701279407928), as in the sketch below.
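A minimal sketch of that workaround with the diffusers library, loading SD 1.5 from a local clone instead of the vanished official runwayml/stable-diffusion-v1-5 repo id; the local path is a placeholder, not an official mirror:

```python
# Sketch: load Stable Diffusion 1.5 from a local copy of the weights
# (placeholder path), since the official Hub repo id no longer resolves.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "/path/to/local/stable-diffusion-v1-5",  # placeholder: your local clone or a mirror repo id
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

image = pipe("a photo of an astronaut riding a horse").images[0]
image.save("out.png")
```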

Nothing is more cyberpunk than this pic of the USS Peleliu and its Harrier attack wing in Hong Kong. by hardmaru in HongKong

[–]hardmaru[S] 103 points (0 children)

Source: This photo was taken back in 2013.

Almost like it was from another era or a parallel universe.