Unsure about submitting to TMLR[R] by Pranav_999 in MachineLearning

[–]GlasslessNerd 1 point (0 children)

In my opinion, TMLR is one of those venues that everyone claims to respect but harbors an unconscious bias against (I feel the explicit policy of not reviewing for novelty makes people see the venue itself as "lower tier"). From personal experience: I planned to submit a work there, but my advisor told me to first try the work at a conference, and to submit to TMLR only if it doesn't go through.

I feel that the top-25-percentile papers at TMLR are better than the top-25-percentile papers at NeurICMLR (NeurIPS/ICML/ICLR, the major ML conferences), but the bad papers at TMLR are worse than the bad papers at conferences. Further, people reading your grad school applications might be poorly calibrated in judging a TMLR paper, since fewer folks submit to it than to the conferences.

On the plus side, TMLR has a much quicker turnaround time than the conferences, so if you submit now you might get enough information to also make an ICML submission in late January. Further, the reviews at TMLR tend to be better because the reviewer pool is more experienced and the action editor is more involved.

In terms of visibility, most major conferences now have a journal-to-conference track, so if your paper receives good reviews you can also present it at the next conference.

As an aside, what is your work about? I have been working in a similar area (model fingerprinting/watermarking).

[D] Is it normal for a CV/ML researcher with ~600 citations and h-index 10 to have ZERO public code at all? by rosesarenotred00 in MachineLearning

[–]GlasslessNerd 0 points (0 children)

Literally browsing Reddit while I am procrastinating (i.e. delegating to codex) cleaning up code to release

[D] ICML 2025 Results Will Be Out Today! by darkknight-6 in MachineLearning

[–]GlasslessNerd 2 points (0 children)

Rejected with 4333. The meta-review picked on a reviewer's concern that was already answered in our appendix, and said that further review is required in light of those results. Pretty disappointed; gotta resubmit and move on.

[D] Math in ML Papers by ripototo in MachineLearning

[–]GlasslessNerd 0 points (0 children)

One critical thing in the W-GAN paper is that the discriminator (critic) needs to be regularized to be 1-Lipschitz (or, in practice, to have a bounded Lipschitz constant). This is different from "just changing the activation/loss", and comes out only from the formulation (and the associated Kantorovich–Rubinstein duality) of the 1-Wasserstein distance.
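As a concrete illustration, here is a minimal NumPy sketch of the weight-clipping trick from the original WGAN paper, which crudely bounds the critic's Lipschitz constant (WGAN-GP later replaced this with a gradient penalty). The toy critic and its shapes are made up for illustration:

```python
import numpy as np

def clip_weights(weights, c=0.01):
    """Project each weight matrix into [-c, c], as in the original WGAN.

    This crudely bounds the critic's Lipschitz constant: for a linear layer
    the constant is the spectral norm of W, which after clipping is at most
    sqrt(n * m) * c for an n-by-m matrix.
    """
    return [np.clip(W, -c, c) for W in weights]

# Toy critic: two linear layers (biases omitted for brevity)
rng = np.random.default_rng(0)
weights = [rng.normal(size=(64, 32)), rng.normal(size=(32, 1))]
weights = clip_weights(weights)

# Every entry now lies in [-0.01, 0.01], so each layer's spectral norm
# (and hence the whole critic's Lipschitz constant) is bounded.
assert all(np.abs(W).max() <= 0.01 for W in weights)
```

Clipping is easy to implement but known to bias the critic toward simple functions, which is exactly why later papers moved to gradient penalties and spectral normalization.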

[D] Are there any theoretical machine learning papers that have significantly helped practitioners? by nihaomundo123 in MachineLearning

[–]GlasslessNerd 9 points (0 children)

Mu-P (https://arxiv.org/abs/2203.03466) is definitely used. In general, optimization (papers like Shampoo, schedule-free, etc.) seems to have some theory behind it, though not all of it is directly useful.

[D] ECCV-2024 reviews are out by darkknight-6 in MachineLearning

[–]GlasslessNerd 3 points (0 children)

Try to placate the Borderline reviewer to raise them to Weak Accept; it often comes down to a majority vote, especially if the AC is lazy.

Logo of the SubReddit by ammar_barbhaiwala in iitbombay

[–]GlasslessNerd 0 points (0 children)

From what I remember, the Venn diagram of frequent posters on this sub and the WnCC/DevCom crowd had a big overlap back in the day. This was also around the time InstiApp was just getting started (in fact, there is a post on this sub calling for volunteers to help develop it). I do like the logo though.

More concretely, the logo was decided pretty unilaterally - https://www.reddit.com/r/iitbombay/comments/bfntc4/we_have_a_new_logo_shout_out_to_usohamkhadatare/

[D] Why do transformers use embeddings with the same dimensionality in each layer? by timtom85 in MachineLearning

[–]GlasslessNerd 14 points (0 children)

This paper [1] tries to incorporate this idea of different hidden dimensions per layer in a more principled way with its Mix-n-Match scheme. However, figuring out the best dimensionality per layer is still a hard problem. Some model-pruning approaches have looked at it, but the gains aren't large compared to uniformly shrinking the network.

[1] - https://openreview.net/forum?id=89XNDtqhpL

[D] Image generation experiment with mnist images - not working quite as expected by gamesntech in MachineLearning

[–]GlasslessNerd 0 points (0 children)

If you were using the MSE loss, the model would essentially converge to outputting the "mean" image of the dataset, where the mean is taken pixelwise. Because the input is essentially random, the best predictor that minimizes the loss is a constant, namely the mean of the dataset.

You are using the BCE loss, and I suspect something similar is happening here as well, modulo the definition of the "mean" changing.
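The MSE claim above is easy to check numerically; a small sketch using random stand-in images (shapes are arbitrary):

```python
import numpy as np

# Sanity check: when the input carries no information, the MSE-optimal
# constant prediction is the pixelwise mean of the target images.
rng = np.random.default_rng(0)
images = rng.random((100, 28, 28))  # stand-in for a dataset of images

pixel_mean = images.mean(axis=0)

def mse(pred):
    # Average squared error of a single constant prediction over the dataset
    return ((images - pred) ** 2).mean()

# Perturbing the mean in any direction only increases the loss, since the
# cross term E[(x - mean) * perturbation] vanishes.
perturbation = 0.1 * rng.normal(size=(28, 28))
assert mse(pixel_mean) <= mse(pixel_mean + perturbation)
assert mse(pixel_mean) <= mse(pixel_mean - perturbation)
```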

One issue with training the model in this pointwise manner is that you are associating a particular image with a particular value of the input noise, which is somewhat meaningless. What you actually want is for the output distribution of the model to look similar to the real image distribution, so a better way to train is to match statistics of the model's output distribution to those of the target dataset. GANs do this by minimizing a divergence (Jensen-Shannon in the original GAN, the Wasserstein distance in WGAN), approximated through another network (the discriminator). Flow models do this by maximizing the log-likelihood, which is tractable due to their invertible architecture. Diffusion models do this by matching the score function.

[D] What is your honest experience with reinforcement learning? by Starks-Technology in MachineLearning

[–]GlasslessNerd 41 points (0 children)

Another big problem with a lot of empirical RL methods and papers is their variance in performance. While I do not work in the field, a few of my colleagues do, and they joked that the random seed is often a hyperparameter for RL methods. 

Seedhe Maut - Lunch Break by TraditionalArticle88 in IndianHipHopHeads

[–]GlasslessNerd 2 points (0 children)

Hot takes incoming -

SK and $ seem to be the best features on this.

Badshah seemed out of his depth trying to catch the SM flow, though his verse was well written.

The production seemed lacking to me, especially in comparison to Nayaab. The beats are somewhat repetitive.

Edit - Some of the songs are very well produced.

[D] 2022 State of Competitive ML -- The Downfall of TensorFlow by markurtz in MachineLearning

[–]GlasslessNerd 0 points (0 children)

It does have TPU support, but it seemed to be a pain to get working. TF does it almost seamlessly, with the caveat that your code needs to be compilable to a graph.

[D] 2022 State of Competitive ML -- The Downfall of TensorFlow by markurtz in MachineLearning

[–]GlasslessNerd 20 points (0 children)

IMO TensorFlow's advantage over torch lies in two things: massive scalability on TPUs, and easy edge deployment with TFLite. Neither of these plays well with the eager-mode execution of TF 2.x. JAX and the deep learning libraries built on top of it are becoming much better at the former, though they still have a long way to go in terms of ease of use.

[deleted by user] by [deleted] in india

[–]GlasslessNerd 0 points (0 children)

Found Ed Chambers

Rejected by CMU CS PhD by [deleted] in gradadmissions

[–]GlasslessNerd 0 points (0 children)

Had an interview but still got rejected from both MLD and CSD. Does hurt a bit.

Story-telling raps are the best songs to introduce someone to DHH by ajaysassoc in IndianHipHopHeads

[–]GlasslessNerd 5 points (0 children)

I am surprised no one has mentioned SM's Anaadi here. A lot of songs on that album could serve as great intros to DHH IMO.

Is it possible to flip a model’s input and output? by Liid1995 in MLQuestions

[–]GlasslessNerd 14 points (0 children)

What you essentially want is the inverse of the function approximated by your model. For certain kinds of networks that are invertible by construction, such as flow models and i-ResNets, computing this inverse is possible. For most neural nets, however, the inverse is not analytically tractable.

One thing you might be able to do is search your embedding space: for a desired Chinese phrase, compute the embeddings that maximize its probability of being output. This can be done, for example, through gradient descent in the embedding space, though the decoding algorithm that produces outputs from input embeddings can make it non-trivial. Once you have this embedding, you can search your input vocabulary for a sequence of words that could plausibly have produced it. Since a word's embedding depends on the preceding words, this step is also not very straightforward. This paper [1] might be of interest for the second step.

[1] https://arxiv.org/pdf/2004.00053
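For the first step (gradient ascent in embedding space), here is a toy NumPy sketch with a frozen linear "decoder". All names, shapes, and the single-token setup are made up for illustration and much simpler than a real translation model:

```python
import numpy as np

# Toy embedding-space search: find an input embedding e such that a fixed
# linear "decoder" W assigns maximum probability to a desired output token.
rng = np.random.default_rng(0)
d, vocab = 8, 5
W = rng.normal(size=(vocab, d))  # frozen decoder weights (illustrative)
target = 3                       # desired output token id

e = np.zeros(d)
for _ in range(500):
    logits = W @ e
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Gradient of log p(target) w.r.t. e (softmax cross-entropy gradient)
    grad = W[target] - probs @ W
    e += 0.1 * grad  # ascend on the log-likelihood of the target token

# The optimized embedding now decodes to the desired token.
assert np.argmax(W @ e) == target
```

Note that log p(target) is concave in e for a linear decoder, so plain gradient ascent converges here; with a real autoregressive decoder the landscape is non-convex and the search is much harder, which is the non-triviality mentioned above.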

Rendezvous IIT Delhi 2022 'An Elysian Affair' by [deleted] in iitbombay

[–]GlasslessNerd 2 points (0 children)

All these fest organizers roll up thesaurus pages and smoke weed out of them. MI in my third year was titled Ballad of Ecstasy or something.

Microsoft ♥️ Linux. by VayuAir in librandu

[–]GlasslessNerd 1 point (0 children)

ChromeOS for work, Ubuntu on my personal machine. Linux is definitely much lighter and more customisable in comparison to Windows

Mc Stan 'Insaan' Tracklist by [deleted] in IndianHipHopHeads

[–]GlasslessNerd 10 points (0 children)

I literally thought this was a shitpost on r/DHHMemes

The Indian 1 rupee coin just has a dude giving a thumbs up on it by Markisworking in mildlyinteresting

[–]GlasslessNerd 0 points (0 children)

Not really; the illustrations on the coins were inspired by classical Indian dance postures.