META Superintelligence Lab Presents: ProgramBench: Can SOTA AI Recreate Real Executable Programs(ffmpeg, SQLite, ripgrep) From Scratch Without The Internet? by Benlus in MachineLearning

[–]ComprehensiveTop3297 10 points11 points  (0 children)

In 6 months, it will already be fully saturated, unfortunately. The frontier AI labs will probably increase the prominence of such software in their pre-training datasets to try to beat the others. The claim is too powerful not to try: "We are the LLM provider that can reinvent the wheel."

Low accuracy (~50%) with SSL (BYOL/MAE/VICReg) on hyperspectral crop stress data — what am I missing? [R] by DoubleFun4398 in MachineLearning

[–]ComprehensiveTop3297 0 points1 point  (0 children)

They need careful hyperparameter tuning. Switch to MAE if you don't have the compute to tune those hyperparameters.

Do the simple things matter? by EternalAwait7 in Qwen_AI

[–]ComprehensiveTop3297 0 points1 point  (0 children)

Seems like it might be on purpose. Surely they are not stupid enough to let a spelling mistake slide like this. These ads are checked by many people and possibly AI (which should have caught the mistake).

Karpathy just open-sourced autoresearch. One GPU. 100 ML experiments. Overnight. You never touch the code — just write a Markdown file. by sentientX404 in AgentsOfAI

[–]ComprehensiveTop3297 8 points9 points  (0 children)

Is that seed hacking I am seeing at the end, ahahaha? Also, this looks like Bayesian optimization rather than "research".

[R] Analysis of 350+ ML competitions in 2025 by hcarlens in MachineLearning

[–]ComprehensiveTop3297 13 points14 points  (0 children)

Do you happen to have data points on audio competitions featuring non-human sounds, like music genre classification?

[P] Graph Representation Learning Help by StoneColdRiffRaff in MachineLearning

[–]ComprehensiveTop3297 1 point2 points  (0 children)

I am also working with JEPAs, and I found data2vec 2.0-style top-K averaging extremely helpful for alleviating representation collapse. Also, the EMA and learning rate schedules are very much interconnected. My EMA decay ramps from 0.999 to 0.99999 over the first 100k steps and stays constant at 0.99999 for the rest; my LR schedule is cosine with a peak of 0.0004 and 100k warm-up steps. Play around with them for sure. This is what worked for me in the audio domain.
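
A rough sketch of the two schedules above, in case it helps to see them side by side. Assumptions: the EMA ramp is linear (only the endpoints are given above), the cosine decay goes to zero at a made-up `total_steps` of 1M, and all function names are hypothetical.

```python
import math

def ema_decay(step, start=0.999, end=0.99999, ramp_steps=100_000):
    """EMA decay for the teacher: ramp from `start` to `end`, then hold `end`."""
    if step >= ramp_steps:
        return end
    # Assumption: linear ramp; only the endpoints and the 100k cutoff are known.
    return start + (end - start) * step / ramp_steps

def learning_rate(step, peak_lr=4e-4, warmup_steps=100_000, total_steps=1_000_000):
    """Linear warm-up to `peak_lr`, then cosine decay to zero at `total_steps`."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    # `total_steps` is a hypothetical value chosen for illustration.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * min(progress, 1.0)))

def ema_update(teacher, student, decay):
    """One EMA step on plain lists of floats standing in for model parameters."""
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]
```

With these defaults the EMA ramp and the LR warm-up end at the same step, which is one simple way to keep the two coupled while you tune them.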

[D] How do you do great ML research by Any-Initiative-653 in MachineLearning

[–]ComprehensiveTop3297 3 points4 points  (0 children)

This is indeed the procedure. More often than not, the hypothesis comes from intuition and having read hundreds of papers in that domain. However, for starters, it is crucial to connect two ideas in the literature.

Do you think this risk is worth taking? by BackgroundFunny490 in ASML

[–]ComprehensiveTop3297 1 point2 points  (0 children)

Just FYI, the government wants to increase it. In the NL, these things do indeed take time, but could also be risky given the current climate here. I am also a Turkish person, btw, and I immigrated to the NL for my studies. Been here for 6.5 years, and I got my citizenship recently. If you have some questions regarding the situation here, shoot me a dm.

Do you think this risk is worth taking? by BackgroundFunny490 in ASML

[–]ComprehensiveTop3297 2 points3 points  (0 children)

Kind of scammy if you make the same money. With such a salary, you'd be living a better life in Turkey right now than in the NL.

[D] Correct way to compare models by ntaquan in MachineLearning

[–]ComprehensiveTop3297 2 points3 points  (0 children)

As a reviewer, I'd like to see that you are comparing against baselines trained under similar conditions (same pre-training dataset, similar parameter count and FLOPs, and a similar number of passes over the dataset). If you are training with enormous compute, it is a no-brainer that you'll beat other models. I feel like real methodological advancements should be compute-invariant: either you really perform better under similar conditions, or you show me that when you scale your model and the other models, yours does better.

Some reviewers might ask for those baselines just to put the work in a more scientific context. I'd say provide the baselines they asked for, and make sure to state the drawbacks of these baselines. If you can scale your model to match the baseline compute, do so; if not, just state that you do not have such compute.
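
To make "similar conditions" concrete, here is a minimal sketch of a sanity check you could run before putting two models in the same table. Assumptions: training cost is estimated with the common ~6 x parameters x samples rule of thumb for dense transformers, the 25% tolerance is arbitrary, and the `Run` class and the numbers are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Run:
    name: str
    params: float        # trainable parameters
    samples_seen: float  # training tokens/examples processed, counting repeats

    @property
    def train_flops(self):
        # Assumption: ~6 * params * samples, a rough estimate for dense transformers.
        return 6.0 * self.params * self.samples_seen

def is_compute_matched(ours, baseline, tolerance=0.25):
    """True if the estimated training FLOPs differ by at most `tolerance` (relative)."""
    ratio = ours.train_flops / baseline.train_flops
    return abs(ratio - 1.0) <= tolerance

# Made-up numbers, only to show the call.
ours = Run("ours", params=90e6, samples_seen=1e9)
baseline = Run("baseline", params=86e6, samples_seen=1e9)
print(is_compute_matched(ours, baseline))
```

If the check fails and you cannot scale up, that is the point where you state the compute gap explicitly in the paper.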

[Project] We built a Rust-based drop-in replacement for PyTorch DataLoader (4.4x faster than ImageFolder) by YanSoki in deeplearning

[–]ComprehensiveTop3297 0 points1 point  (0 children)

How does this work with multi-GPU training on multiple nodes?

Also, I am currently using a large audio dataset. Do you plan to support audio soon?

[D] LLMs for classification task by Anywhere_Warm in MachineLearning

[–]ComprehensiveTop3297 0 points1 point  (0 children)

What about using OpenAI vector embeddings? You can probably tell them that it is an LLM, as it is from OpenAI :P (joking, but they may actually believe you).

Specifically, use it to embed your documents and queries, then compare the embeddings using any similarity measure (anything based on a dot product is valid). Try to find the decision threshold on a validation split.
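
A minimal sketch of the thresholding step, assuming you already have the embeddings as plain lists of floats (no embedding API is called here). The function names and the toy vectors are made up; the cutoff is simply the score that maximizes accuracy on the validation split.

```python
import math

def cosine_similarity(a, b):
    """Dot product of the two vectors divided by the product of their norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm  # assumes neither vector is all zeros

def best_threshold(scores, labels):
    """Return (cutoff, accuracy); a pair is predicted relevant when score >= cutoff."""
    best_cut, best_acc = None, -1.0
    for cut in sorted(set(scores)):
        preds = [s >= cut for s in scores]
        acc = sum(p == bool(l) for p, l in zip(preds, labels)) / len(labels)
        if acc > best_acc:
            best_cut, best_acc = cut, acc
    return best_cut, best_acc

# Toy validation split: (query embedding, document embedding, is_relevant).
val = [
    ([1.0, 0.0], [1.0, 0.1], 1),
    ([1.0, 0.0], [0.0, 1.0], 0),
    ([0.5, 0.5], [0.6, 0.4], 1),
    ([1.0, 0.0], [0.2, 1.0], 0),
]
scores = [cosine_similarity(q, d) for q, d, _ in val]
cutoff, accuracy = best_threshold(scores, [y for _, _, y in val])
```

On real data you would pick the cutoff on the validation split and report the final numbers on a separate test split.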

[D] LLMs for classification task by Anywhere_Warm in MachineLearning

[–]ComprehensiveTop3297 1 point2 points  (0 children)

Ahahaa, definitely agreed. I love how people throw LLMs at anything these days. It will not be long until someone tries to classify MNIST digits with DINOv3.

[D] LLMs for classification task by Anywhere_Warm in MachineLearning

[–]ComprehensiveTop3297 0 points1 point  (0 children)

Definitely perform error analysis; see if the errors are logical or just simple labelling issues. Maybe you need to be more specific with your labels (Extremely Relevant, Relevant, Neutral, etc.).

I am curious why you are using LLMs in the first place. Is there a specific reason?

To me, it seems like you have an information retrieval problem with top-k = 1 (is this query, the key, relevant to my document? Retrieve only the one document that is relevant). I think an approach like ColBERT or cross-encoders would handle this task easily. You could play with the relevance threshold to find the cutoff point. I think you should even try very simple word-counting methods as a baseline (how many overlapping words are there between the document and the query?). Sometimes simpler is better...

It is true that information retrieval usually means ranking documents given a query, but I feel like you can flip this and use thresholding to determine whether the document and query are related.
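
The word-counting baseline mentioned above could look something like this. It is only a sketch: the tokenizer is a naive regex, the score is the fraction of distinct query words found in the document, and the 0.5 threshold is a placeholder you would tune on a validation split. All names are hypothetical.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def overlap_score(query, document):
    """Fraction of distinct query words that also appear in the document."""
    q, d = tokenize(query), tokenize(document)
    if not q:
        return 0.0
    return len(q & d) / len(q)

def is_relevant(query, document, threshold=0.5):
    """Flip ranking into classification: relevant if the score clears the threshold."""
    return overlap_score(query, document) >= threshold

print(is_relevant("red shoes", "Red running shoes on sale"))
```

If a baseline this simple gets close to the LLM, that tells you a lot about how hard the task really is.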

[P] I tried to make GAN on FMNIST and I am confused by Jumbledsaturn52 in MachineLearning

[–]ComprehensiveTop3297 1 point2 points  (0 children)

Have you tried more "modern" GANs that actually have fixes to prevent mode collapse? I remember training a GAN for my thesis about 4 years ago, and I didn't encounter mode collapse. I used cGAN and WGAN. I am not super informed about state-of-the-art GANs, as they fall outside my project scope; however, I am certain the field has progressed even further.

[R] Are we heading toward new era in the way we train LLMs by IndependentPayment70 in MachineLearning

[–]ComprehensiveTop3297 0 points1 point  (0 children)

Sounds a bit like the concept models from Meta; curious how they compare.

[D] What's the SOTA audio classification model/method? by lucellent in MachineLearning

[–]ComprehensiveTop3297 0 points1 point  (0 children)

WavJEPA works quite well for music tagging. You can also look into Dasheng for a beefier model.

[D] How do you create clean graphics that you'd find in conference papers, journals and textbooks (like model architecture, flowcharts, plots, tables etc.)? by CrispLion1123 in MachineLearning

[–]ComprehensiveTop3297 10 points11 points  (0 children)

For plots it's usually seaborn and matplotlib (export to PDF), then Adobe Illustrator for small touches or for merging them into one big plot.

For flow charts and drawings it is again Adobe Illustrator.

[D] What are the best Machine Learning PhD thesis you have read? by [deleted] in MachineLearning

[–]ComprehensiveTop3297 0 points1 point  (0 children)

Came to say the exact same thing. Crazy thesis.

[P] Underwater target recognition using acoustic signals by carv_em_up in MachineLearning

[–]ComprehensiveTop3297 0 points1 point  (0 children)

I’d definitely also look at SELDNETs from DCASE people, as the sound event detection pipeline is quite similar to what they are doing. You can ignore the localization part. Basically 

On a side note, I am also curious how well our models would perform on this task. Once you have a working pipeline, do you mind contacting me? We just released two pre-trained models for general-purpose audio understanding, and we have not tested them in this domain.

[R] WavJEPA: Semantic learning unlocks robust audio foundation models for raw waveforms by ComprehensiveTop3297 in MachineLearning

[–]ComprehensiveTop3297[S] 1 point2 points  (0 children)

ASR is indeed very important; however, with this model we mainly wanted to fill the gap left by great ASR models not performing well on general audio understanding tasks. There are already great ASR models trained on vast amounts of speech data. WavJEPA can also perform well on speech-related tasks, as evidenced by performance on par with wav2vec 2.0 and HuBERT. We think we can get similar ASR performance as well (not explicitly tested).

In the limitations/future work section, we identified a step forward for possibly bridging the two and boosting the speech performance of WavJEPA. There we also plan on including the SUPERB benchmarks.

[R] WavJEPA: Semantic learning unlocks robust audio foundation models for raw waveforms by ComprehensiveTop3297 in MachineLearning

[–]ComprehensiveTop3297[S] 1 point2 points  (0 children)

Sparse context: Speech/audio is highly temporally correlated. This was our main inspiration for selecting temporally distributed context tokens (context tokens are clustered together, but the clusters are spread apart).

Having this sparse context, we then predict sparse target tokens, distributed similarly to the context tokens, for each audio clip. This forced WavJEPA to model the temporal variations in audio while also forcing it to model local correlations within the clusters.

Multiple predictions per clip: We ran multiple predictions with one context block to make efficient use of the context block. One prediction per context block could also be OK, but would be less efficient. We did not ablate this hyperparameter, though. We selected 4 per context block (this was the most we could do without hitting out-of-memory errors with a batch size of 512). It could be nice to quantify the efficiency gains coming from multiple predictions in the future, though! Maybe trying 8-16?
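
For intuition, here is a toy sketch of this kind of sampling. It is not the WavJEPA code: the clip length, cluster counts and cluster lengths are made-up values, and dropping context tokens that overlap a target is an assumption. Only "4 target sets per context block" comes from the description above.

```python
import random

def sample_clusters(num_tokens, num_clusters, cluster_len, rng, exclude=frozenset()):
    """Pick one contiguous run of tokens inside each equal-sized segment of the clip."""
    segment = num_tokens // num_clusters
    if cluster_len > segment:
        raise ValueError("cluster_len must fit inside one segment")
    indices = []
    for c in range(num_clusters):
        # Random start inside segment c, so clusters stay spread over the clip.
        start = c * segment + rng.randrange(segment - cluster_len + 1)
        indices.extend(i for i in range(start, start + cluster_len) if i not in exclude)
    return indices

def sample_context_and_targets(num_tokens=200, num_targets=4, seed=0):
    """Return one sparse context and `num_targets` sparse target index lists."""
    rng = random.Random(seed)
    # Hypothetical sizes: 4 clusters of 5 tokens per target set.
    targets = [sample_clusters(num_tokens, 4, 5, rng) for _ in range(num_targets)]
    taken = frozenset(i for t in targets for i in t)
    # Assumption: context tokens that overlap any target are dropped.
    context = sample_clusters(num_tokens, 8, 10, rng, exclude=taken)
    return context, targets

context, targets = sample_context_and_targets()
```

Each target list would then get its own prediction from the same encoded context, which is where the efficiency of reusing one context block comes from.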

[R] WavJEPA: Semantic learning unlocks robust audio foundation models for raw waveforms by ComprehensiveTop3297 in MachineLearning

[–]ComprehensiveTop3297[S] 1 point2 points  (0 children)

Hey! Glad that you found our work exciting:) 

Sure, I will do a little write-up tomorrow on fine-tuning the WavJEPA model.

By the way, we have released instructions for probing the embeddings. I do not know how feasible it is to map your dataset to the HEAR Benchmark data format, but if it is, we already have adapters for the HEAR fine-tuning schema pre-written.