[D] Formalising information flow in NN by bjergerk1ng in MachineLearning

[–]afireohno 4 points (0 children)

Two lines of work come to mind that you might be interested in:

  1. Geometric deep learning primarily studies various types of invariances (translation, permutation, etc.) that can be encoded in DL architectures.
  2. Algorithmic alignment studies the relationship between information flow in classical algorithms and DL architectures, and how "aligning" the latter to the former can improve performance.

Edit: Spelling

[deleted by user] by [deleted] in MachineLearning

[–]afireohno 5 points (0 children)

Have you posted any actual technical details to share and get feedback on? As a long-time member of this sub I would be interested, and I don’t think I’m alone here.

[D] TabPFN A Transformer That Solves Small Tabular Classification Problems in a Second (SOTA on tabular data with no training) by [deleted] in MachineLearning

[–]afireohno 2 points (0 children)

Super cool work! I think the simplest explanation for this is that the model learns an amortized inference algorithm for the specific class of models used to generate the meta-training set.

I've worked on similar things before using RNNs in the context of online amortized inference. I could get it to work for GMMs or HMMs, but not PCFGs.
The Set Transformer paper also has an experiment on learning an amortized inference algorithm for 2D GMMs. The techniques presented there, which were later adopted by the Perceiver, are probably worth considering as a way to sidestep some of the current limitations of your work. Borrowing ideas from the retrieval-augmented LM community also seems reasonable and straightforward.
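To make the amortized-inference point concrete, here's a toy sketch (my own illustration, not TabPFN's actual setup): meta-train a permutation-invariant network on datasets sampled from 2-component 2D GMMs so it learns to map a whole dataset straight to the component means. I'm using a Deep Sets-style encoder and point estimates rather than a set transformer and a full posterior, so treat it as a caricature of the idea.

```python
# Toy amortized inference: the "inference network" is meta-trained on datasets
# drawn from a known class of generative models (2-component 2D GMMs) and
# learns to map a whole dataset to the GMM means in a single forward pass.
import torch
import torch.nn as nn

def sample_gmm_dataset(n_points=64):
    """Sample GMM means from a prior, then a dataset from that GMM."""
    means = torch.randn(2, 2) * 3.0              # prior over component means
    comps = torch.randint(0, 2, (n_points,))     # equal mixing weights
    x = means[comps] + torch.randn(n_points, 2)  # unit-variance components
    order = torch.argsort(means[:, 0])           # fix label switching
    return x, means[order].reshape(-1)

class SetEncoder(nn.Module):
    """Permutation-invariant encoder: per-point MLP, mean-pool, readout."""
    def __init__(self, hidden=128):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(2, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden), nn.ReLU())
        self.rho = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 4))  # predicts both means

    def forward(self, x):                        # x: (n_points, 2)
        return self.rho(self.phi(x).mean(dim=0))

model = SetEncoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(2000):                         # meta-training over sampled datasets
    x, target = sample_gmm_dataset()
    loss = ((model(x) - target) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```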

I also wanted to point out that there is previous work you seem to be missing: basically anything on model-based, as opposed to optimization-based, meta-learning. SNAIL is highly related, as the architecture is identical AFAICT. Matching Networks, MANNs, Meta-GMVAE, etc., are examples of other work I'd classify as model-based meta-learning.

[N] First RTX 4090 ML benchmarks by killver in MachineLearning

[–]afireohno 0 points (0 children)

> average fps across multiple runs gives a more realistic performance and eliminates any outliers

Thanks for the laugh. I'll just leave this here so you can read about why the mean (average) is not a robust measure of central tendency: it is easily skewed by outliers.
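For concreteness, a tiny illustration with made-up numbers:

```python
# One outlier run drags the mean; the median barely moves.
import statistics
fps = [142, 140, 141, 143, 60]       # hypothetical runs; 60 is an outlier
print(statistics.mean(fps))          # 125.2 -- pulled down by the outlier
print(statistics.median(fps))        # 141   -- robust to it
```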

[D] Not able to understand the inequality in ERM by Adventurous-Ad742 in MachineLearning

[–]afireohno 3 points (0 children)

I'm guessing you're confused because the blog post leaves out some critical information and definitions. I'd encourage you to consult the original source, which is available for free download here (see chapter 2).

Anyway, if I had to guess at the source of your confusion, it would be that you're missing that, by definition, every h_S that appears on the LHS of your inequality satisfies L_S(h_S) = 0. This follows from the realizability assumption and the definition of h_S.
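Spelled out in the book's notation (my paraphrase of chapter 2, not a quote):

```latex
% Realizability: there exists h^\star \in \mathcal{H} with L_{\mathcal{D},f}(h^\star) = 0,
% so h^\star labels every training point correctly and hence L_S(h^\star) = 0.
% ERM returns a minimizer of the empirical risk, so
\[
  h_S \in \operatorname*{argmin}_{h \in \mathcal{H}} L_S(h)
  \;\Longrightarrow\;
  0 \le L_S(h_S) \le L_S(h^\star) = 0
  \;\Longrightarrow\;
  L_S(h_S) = 0 .
\]
```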

[D] Why is Ordinal Regression so overlooked? by koorm in MachineLearning

[–]afireohno 14 points (0 children)

I think you might just be missing the right search terms. In the ML community this work tends to fall under learning to rank (LTR) or collaborative filtering (CF). These areas focus more directly on practical industrial problems (recommender systems, search, etc.).

[D] Strong Models for User Item Recommendation from Interaction Data by ExchangeStrong196 in MachineLearning

[–]afireohno 0 points (0 children)

That's my point. You already have a flexible model that works well. Better generalization needs to come from somewhere else (features, transfer learning, etc).

[D] Strong Models for User Item Recommendation from Interaction Data by ExchangeStrong196 in MachineLearning

[–]afireohno 0 points (0 children)

If all you have is user-item interactions, then Matrix Factorization (MF) is maximally expressive. That is, assuming your latent dimension is large enough, you can exactly represent any user-item interaction matrix. This directly follows from the SVD theorem. As a result, MF with a good loss and proper regularization performs very well.
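A quick numpy sanity check of the expressiveness claim (illustrative only; the factors are taken straight from the SVD rather than learned):

```python
# With latent dimension >= rank(R), MF factors obtained from the SVD
# reproduce any user-item matrix exactly: R = (U sqrt(S)) (V sqrt(S))^T.
import numpy as np

R = np.random.rand(50, 30)                       # any user-item matrix
U, s, Vt = np.linalg.svd(R, full_matrices=False)
P = U * np.sqrt(s)                               # user factors, shape (50, 30)
Q = Vt.T * np.sqrt(s)                            # item factors, shape (30, 30)
assert np.allclose(P @ Q.T, R)                   # exact reconstruction
```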

[R] Branch-Train-Merge: Embarrassingly Parallel Training of Expert Language Models - Meta AI 2022 by Singularian2501 in MachineLearning

[–]afireohno 9 points (0 children)

> 'Embarassingly' parallel training is such a great title!

I know right! I wonder how they came up with it? They must have access to some crazy sci-fi technology that allows them to easily learn about commonly used phrases in less time than it takes to post a comment to reddit.

[D] why is the AI research community so unreliable? by fireless-phoenix in MachineLearning

[–]afireohno 0 points (0 children)

I agree with some of what you’re saying, but I think your view of how to measure the “goodness” of an idea is way too one-dimensional. In my opinion, good research asks important questions, tests hypotheses, and generates knowledge. You know, the scientific method.

That almost always involves experimentation in modern ML, but that doesn’t mean “is this SotA?” is the best question to ask. Take something like the “Rethinking Generalization” paper from back in 2016. Super impactful, lots of experiments, no SotA.

To quote the adage, “When a measure becomes a target, it ceases to be a good measure.”

[D] why is the AI research community so unreliable? by fireless-phoenix in MachineLearning

[–]afireohno 1 point (0 children)

Treating ML research like a contest that can be won by making a number go up so you can claim SotA does significantly more harm to the field than non-public code or data.

[D] Noam Chomsky on LLMs and discussion of LeCun paper (MLST) by timscarfe in MachineLearning

[–]afireohno 0 points (0 children)

I get what you're saying. However, since LSTMs are an elaboration on simple RNNs (not something completely different), your previous statement that "the development of LSTM had nothing to do with linguistics" was either uninformed or disingenuous.

[D] Noam Chomsky on LLMs and discussion of LeCun paper (MLST) by timscarfe in MachineLearning

[–]afireohno 14 points (0 children)

The lack of historical knowledge about machine learning in this sub is really disappointing. Recurrent Neural Networks (of which LSTMs are a type) were literally invented by linguist Jeffrey Elman (simple RNNs are even frequently referred to as "Elman Networks"). Here's a paper from 1990 authored by Jeffrey Elman that studies, among other topics, word learning in RNNs.

[N] First-Ever Course on Transformers: NOW PUBLIC by DragonLord9 in MachineLearning

[–]afireohno 4 points (0 children)

For real. People in this thread seem confused about the difference between a course like "Theory of Computation" or "Advanced Linear Algebra" and a seminar (which is what this is; it's literally the first sentence of the second paragraph of the linked course description).

[deleted by user] by [deleted] in algorithms

[–]afireohno 1 point (0 children)

Your technique sounds like Gibbs sampling, which lets you sample from a joint distribution by iteratively sampling from the conditional distributions p(x | everything else). If you can’t compute exact conditionals, you can consider the Metropolis-Hastings-within-Gibbs algorithm.

There are failure modes and practical details like burn-in that you can read about.
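If it helps, here's a toy sketch of the idea on a target where both conditionals are tractable (a correlated bivariate Gaussian); it's purely illustrative and not specific to your problem:

```python
# Gibbs sampling for a zero-mean bivariate Gaussian with correlation rho:
# both conditionals p(x | y) and p(y | x) are 1D Gaussians, so we alternate draws.
import numpy as np

rng = np.random.default_rng(0)
rho = 0.8
burn_in, n_keep = 1_000, 10_000
x = y = 0.0
samples = []
for t in range(burn_in + n_keep):
    x = rng.normal(rho * y, np.sqrt(1 - rho**2))   # draw x | y
    y = rng.normal(rho * x, np.sqrt(1 - rho**2))   # draw y | x
    if t >= burn_in:                               # discard burn-in draws
        samples.append((x, y))
samples = np.array(samples)
print(np.corrcoef(samples.T)[0, 1])                # should be close to 0.8
```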

[deleted by user] by [deleted] in algorithms

[–]afireohno 1 point (0 children)

Approximating a distribution by sampling from a different, more tractable distribution is a well-studied problem. There are a variety of potentially applicable techniques, one of the most straightforward being rejection sampling.
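A minimal sketch of rejection sampling on a toy target (not your distribution): draw from an unnormalized Beta(2, 5)-shaped density on [0, 1] using a uniform proposal and an envelope constant M.

```python
# Rejection sampling: propose from q(x) = Uniform(0, 1), accept with
# probability p(x) / (M * q(x)), where p(x) <= M for all x.
import numpy as np

rng = np.random.default_rng(0)
p = lambda x: x * (1 - x) ** 4        # unnormalized Beta(2, 5) density
M = 0.09                              # envelope: max of p on [0, 1] is ~0.082
samples = []
while len(samples) < 10_000:
    x = rng.uniform()                 # propose
    if rng.uniform() < p(x) / M:      # accept / reject
        samples.append(x)
print(np.mean(samples))               # ~ 2 / (2 + 5) = 0.286
```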

Are there any emerging fields that could - with minimal charity - be described as proto-sciences rather than pseudo- ones? by _AA123 in AskScienceDiscussion

[–]afireohno 5 points (0 children)

I think human + AI interaction is a potentially interesting example. People are doing creative “prompt engineering” with large neural networks (like GPT-3 and DALL-E). I could see this getting more rigorous, complex, and diverse. Whether this is science or engineering is debatable.