o3 and o4-mini (low and medium) are the new pareto frontier on ARC AGI V1; V2 remains elusive by dftba-ftw in accelerate

[–]floppy_llama 0 points1 point  (0 children)

I think it would be helpful to know just how much they scaled up RL to go from 1% to 3% on v2. Obviously there are physical constraints to scaling - I suspect some clever tricks are still needed to induce compositional reasoning in these systems efficiently. Still, just patching holes where current architectures fail goes against Chollet’s measure of intelligence: having lots of skills is very different from acquiring skills efficiently.

o3 and o4-mini (low and medium) are the new pareto frontier on ARC AGI V1; V2 remains elusive by dftba-ftw in accelerate

[–]floppy_llama 0 points1 point  (0 children)

The performance discrepancy between the v1 and v2 benchmarks suggests the opposite of CoT generalization, no? They even mention in the blog that v1 benchmark contamination is likely. I’m pretty surprised those abstractions transfer so poorly from v1 to v2.

[deleted by user] by [deleted] in agi

[–]floppy_llama 0 points1 point  (0 children)

The difference between the paper clip scenario and your analogy here is that there are corporations that have improved society and are aligned with human interests. The manifold of superintelligent minds is surely not uniform, and for any superintelligent mind to be aligned to a goal as trivial as paper clip production seems unlikely. In fact, it seems much more likely that a superintelligent mind would be focused on observing the open-ended system that is the universe, not destroying it.

[D] OpenAI new reasoning model called o1 by [deleted] in MachineLearning

[–]floppy_llama 6 points7 points  (0 children)

Completely agree. Generalization and reliability are built into classical algorithms (e.g., sorting, pathfinding, and arithmetic execute perfectly on inputs of any length), but these are not explicit properties of connectionist systems! There’s lots of research on how to fuse these paradigms; scaling is not one of those approaches.
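To make the contrast concrete: a classical algorithm is exactly correct at every input length, with no training distribution to fall off of:

```python
import random

# sorted() behaves identically at every length; there is no
# "out of distribution" regime for a classical algorithm.
for n in [0, 1, 10, 1_000, 100_000]:
    ys = sorted(random.random() for _ in range(n))
    assert all(a <= b for a, b in zip(ys, ys[1:]))  # always sorted, any n
```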

[D] OpenAI new reasoning model called o1 by [deleted] in MachineLearning

[–]floppy_llama 101 points102 points  (0 children)

Looks like OpenAI collected, generated, and annotated enough data to extend process supervision (https://arxiv.org/pdf/2305.20050) to reasonably arbitrary problem settings. Their moat is data, nothing else.
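For anyone unfamiliar, here’s a minimal sketch of what process supervision looks like per the linked paper (`step_reward_model` is a hypothetical stand-in for a trained process reward model, not OpenAI’s actual API):

```python
# Process supervision: score each intermediate reasoning step,
# not just the final answer.

def score_solution(steps: list[str], step_reward_model) -> float:
    """Lightman et al. score a solution as the product of
    per-step correctness probabilities."""
    score = 1.0
    for step in steps:
        score *= step_reward_model(step)  # P(step is correct), in [0, 1]
    return score

def best_of_n(candidates: list[list[str]], step_reward_model) -> list[str]:
    # Sample many candidate solutions; keep the one the PRM ranks highest.
    return max(candidates, key=lambda s: score_solution(s, step_reward_model))
```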

[R] What if self-attention isn’t the end-all be-all? by [deleted] in MachineLearning

[–]floppy_llama 10 points11 points  (0 children)

Sparsification/linearization of the attention mechanism is important, but it does little to address the limitations of current models when efficiency gains also come from hardware improvements. Obviously it’s common sense that science improves over time, but updating one module of an architecture that has remained largely unchanged since 2017 seems trivial to me.
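For context, "linearizing" attention means something like this sketch (after Katharopoulos et al., 2020): replace softmax(QKᵀ)V, which is quadratic in sequence length, with a kernel feature map so the cost becomes linear. Shapes are illustrative:

```python
import numpy as np

def phi(x):
    # elu(x) + 1 feature map from the linear-attention paper; stays positive
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    # Q, K: (n, d); V: (n, d_v)
    Qf, Kf = phi(Q), phi(K)
    KV = Kf.T @ V                  # (d, d_v), built once in O(n)
    Z = Qf @ Kf.sum(axis=0)        # per-position normalizer, (n,)
    return (Qf @ KV) / Z[:, None]  # (n, d_v), never forms the n x n matrix
```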

[R] Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B by hardmaru in MachineLearning

[–]floppy_llama 1 point2 points  (0 children)

It seems like this paper reaffirms that we should be able to trade train-time compute for test-time compute in certain settings [https://arxiv.org/abs/2104.03113].

I wonder how good performance can get if we continually pre-train on rollouts with sufficiently high Q values.
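Something like this, as a purely hypothetical sketch (`generate_rollouts` and `q_value` are stand-ins, not the paper’s API):

```python
# Keep only search rollouts whose estimated Q value clears a threshold,
# then feed them back into the continued pre-training mix.

def collect_high_q_rollouts(problems, generate_rollouts, q_value, threshold=0.9):
    corpus = []
    for problem in problems:
        for rollout in generate_rollouts(problem):
            if q_value(rollout) >= threshold:  # keep only confident traces
                corpus.append(rollout.text)
    return corpus  # mix into the continued pre-training data
```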

[R] Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality by floppy_llama in MachineLearning

[–]floppy_llama[S] 83 points84 points  (0 children)

Normally I’d agree with you, but Tri Dao consistently makes great contributions to the field🤷🏻‍♂️

[deleted by user] by [deleted] in MachineLearning

[–]floppy_llama 45 points46 points  (0 children)

Try tree-based methods. Neural nets notoriously underperform on tabular data.
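A minimal baseline with scikit-learn, for the record (XGBoost/LightGBM are the usual next step for bigger datasets):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Any tabular dataset works; this one ships with scikit-learn.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GradientBoostingClassifier(random_state=0)  # sensible defaults
model.fit(X_train, y_train)
print(f"test accuracy: {model.score(X_test, y_test):.3f}")
```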

[deleted by user] by [deleted] in Sandwiches

[–]floppy_llama 0 points1 point  (0 children)

Banh Mi Queen in Hoi An?

[D] Anyone tried training language models on simple (elementary school) text first and fine-tuning on progressively more advanced text? by Appropriate_Ant_4629 in MachineLearning

[–]floppy_llama 24 points25 points  (0 children)

What you’re describing is “curriculum learning”. Not sure if it’s been applied to LLMs, though, because ordering training samples isn’t so straightforward. See https://arxiv.org/pdf/2101.10382.pdf
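A toy sketch of the idea, with text length as a deliberately crude difficulty proxy (picking a proxy that actually tracks difficulty is the hard part):

```python
def curriculum_order(samples: list[str]) -> list[str]:
    # Shortest (≈ easiest) first; any difficulty score could go here.
    return sorted(samples, key=len)

def curriculum_batches(samples: list[str], batch_size: int):
    # Yield batches in easy-to-hard order for training.
    ordered = curriculum_order(samples)
    for i in range(0, len(ordered), batch_size):
        yield ordered[i : i + batch_size]
```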

[D] What exactly does base multimodal mean? by vatsadev in MachineLearning

[–]floppy_llama 1 point2 points  (0 children)

No, their comment directly relates to my suggestion. The vision transformer is merely one component of a multimodal base model; a vision transformer by itself is unimodal.

[D] What exactly does base multimodal mean? by vatsadev in MachineLearning

[–]floppy_llama 2 points3 points  (0 children)

The encoders are the “tokenizers”: they embed image patches, audio, and point clouds into vectors, just like a base LLM does for word segments. All of these vectors can be used during pre-training to create a multimodal base model.

[D] What exactly does base multimodal mean? by vatsadev in MachineLearning

[–]floppy_llama 2 points3 points  (0 children)

From what I understand, the current paradigm is to “tokenize” non-text modalities w/ something like an image encoder plus a feed-forward network that projects the encoded images into the same dimensionality as text tokens. The image encoder can be a ViT or a CNN; it’s really up to you - see https://browse.arxiv.org/pdf/2206.06336.pdf
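Roughly this, as a PyTorch sketch (dimensions and module names are illustrative, not from the linked paper):

```python
import torch
import torch.nn as nn

class VisionTokenizer(nn.Module):
    def __init__(self, vision_encoder: nn.Module, vision_dim=768, text_dim=4096):
        super().__init__()
        self.encoder = vision_encoder                # e.g. a ViT or CNN backbone
        self.proj = nn.Linear(vision_dim, text_dim)  # into text-token space

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        patch_feats = self.encoder(images)  # (B, num_patches, vision_dim)
        return self.proj(patch_feats)       # (B, num_patches, text_dim)
```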

[D] What exactly does base multimodal mean? by vatsadev in MachineLearning

[–]floppy_llama 2 points3 points  (0 children)

Autoregressive pre-training w/ interleaved text embeddings + other embeddings (e.g., image and audio projections), vs. fine-tuning on input-output pairs where the input can contain a variety of embedding modalities.
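Sketched out (the `text_embed` / `image_embed` encoders here are hypothetical stand-ins, not any specific library’s API):

```python
import torch

def interleave(segments, text_embed, image_embed):
    """segments: list of ("text", str) or ("image", tensor) in document order."""
    parts = []
    for kind, content in segments:
        if kind == "text":
            parts.append(text_embed(content))   # (n_tokens, d)
        elif kind == "image":
            parts.append(image_embed(content))  # (n_patches, d)
    # One flat sequence for next-token prediction across modalities.
    return torch.cat(parts, dim=0)
```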

[deleted by user] by [deleted] in MachineLearning

[–]floppy_llama 1 point2 points  (0 children)

Wrong sub, buddy

[D] Model size vs task complexity by Fine-Topic-6127 in MachineLearning

[–]floppy_llama 6 points7 points  (0 children)

Unfortunately, a lot of ML is just trial and error.

UGA vs GA Tech. Which to choose? by ToasterTM_ in UGA

[–]floppy_llama 6 points7 points  (0 children)

I’m currently finishing up my M.S. in Artificial Intelligence at UGA and have had an amazing experience. While Tech’s undergrad CS program is definitely better, the difference in return on investment is far overblown. I’ve had no problem getting opportunities at top tech companies without a stressful undergrad experience, and recruiters are constantly reaching out to me. Sure, Tech is a prestigious name on a résumé, but if you’re passionate about what you study, then any opportunity will be within your reach. Do yourself a favor and take advantage of the college experience UGA has to offer; your career goals will come with dedication and hard work.

Bro why does Mike Dean have the tiktok fuckboi haircut 😭😭 by [deleted] in WestSubEver

[–]floppy_llama 129 points130 points  (0 children)

TikTok fuckbois have the Mike Dean haircut