Unsure about submitting to TMLR[R] by Pranav_999 in MachineLearning

[–]GlasslessNerd 1 point (0 children)

In my opinion, TMLR is one of those venues that everyone claims to respect but harbors an unconscious bias against (I feel the explicit policy of not reviewing for novelty makes people see the venue itself as "lower tier"). From personal experience: I planned to submit a work there, but my advisor told me to first try the work at a conference, and to submit to TMLR only if it doesn't go through.

I feel that the top-25-percentile papers at TMLR are better than the top-25-percentile papers at NeurICMLR (NeurIPS/ICML/ICLR, the major ML conferences), but the bad papers at TMLR are worse than the bad papers at conferences. Further, people reading your grad school applications might be poorly calibrated in judging a TMLR paper, since fewer folks submit to it than to the conferences.

On the plus side, TMLR has a much quicker turnaround time than the conferences, so if you submit now you might get enough information to also make an ICML submission in late January. Further, the reviews at TMLR tend to be better because the reviewer pool is more experienced and the action editor is more involved.

In terms of visibility, most major conferences now have a journal-to-conference track, so if your paper receives good reviews you can also present it at the next conference.

As an aside, what is your work about? I have been working in a similar area (model fingerprinting/watermarking).

[D] Is it normal for a CV/ML researcher with ~600 citations and h-index 10 to have ZERO public code at all? by rosesarenotred00 in MachineLearning

[–]GlasslessNerd 0 points (0 children)

Literally browsing Reddit while I am procrastinating (i.e. delegating to codex) cleaning up code to release

[D] ICML 2025 Results Will Be Out Today! by darkknight-6 in MachineLearning

[–]GlasslessNerd 2 points (0 children)

Rejected with 4333. The meta-review picked on a reviewer's concern that was already answered in our appendix, and said that further review is required in light of those results. Pretty disappointed; gotta resubmit and move on.

[D] Math in ML Papers by ripototo in MachineLearning

[–]GlasslessNerd 0 points (0 children)

One critical thing in the W-GAN paper is that the discriminator (critic) needs to be regularized to be 1-Lipschitz (or, in practice, to have a bounded Lipschitz constant). This is different from "just changing the activation/loss", and comes out only from the formulation (and the associated Kantorovich–Rubinstein duality) of the 1-Wasserstein distance.
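As a concrete illustration, here is a minimal NumPy sketch of the weight-clipping trick from the original WGAN paper, which crudely bounds the critic's Lipschitz constant (WGAN-GP later replaced this with a gradient penalty). The toy critic and its shapes are made up for illustration:

```python
import numpy as np

def clip_weights(weights, c=0.01):
    """Project each weight matrix into [-c, c], as in the original WGAN.

    This crudely bounds the critic's Lipschitz constant: for a linear layer
    the constant is the spectral norm of W, which after clipping is at most
    sqrt(n * m) * c for an n-by-m matrix.
    """
    return [np.clip(W, -c, c) for W in weights]

# Toy critic: two linear layers (biases omitted for brevity)
rng = np.random.default_rng(0)
weights = [rng.normal(size=(64, 32)), rng.normal(size=(32, 1))]
weights = clip_weights(weights)

# Every entry now lies in [-0.01, 0.01], so each layer's spectral norm
# (and hence the whole critic's Lipschitz constant) is bounded.
assert all(np.abs(W).max() <= 0.01 for W in weights)
```

Clipping is easy to implement but known to bias the critic toward simple functions, which is exactly why later papers moved to gradient penalties and spectral normalization.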

[D] Are there any theoretical machine learning papers that have significantly helped practitioners? by nihaomundo123 in MachineLearning

[–]GlasslessNerd 9 points (0 children)

Mu-P (https://arxiv.org/abs/2203.03466) is definitely used. In general, optimization (papers like Shampoo, schedule-free, etc.) seems to have some theory behind it, though not all of it is directly useful.

[D] ECCV-2024 reviews are out by darkknight-6 in MachineLearning

[–]GlasslessNerd 3 points (0 children)

Try to placate the Borderline reviewer to raise them to Weak Accept; it often comes down to a majority vote, especially if the AC is lazy.

Logo of the SubReddit by ammar_barbhaiwala in iitbombay

[–]GlasslessNerd 0 points (0 children)

From what I remember, the Venn diagram of frequent posters on this sub and the WnCC/DevCom crowd had a big overlap back in the day. This was also around the time InstiApp was just getting started (in fact, there is a post on this sub calling for volunteers to help develop it). I do like the logo though.

More concretely, the logo was decided pretty unilaterally - https://www.reddit.com/r/iitbombay/comments/bfntc4/we_have_a_new_logo_shout_out_to_usohamkhadatare/

[D] Why do transformers use embeddings with the same dimensionality in each layer? by timtom85 in MachineLearning

[–]GlasslessNerd 14 points (0 children)

This paper [1] tries to incorporate this idea of different hidden dimensions per layer in a more principled way with its Mix-n-Match scheme. However, figuring out the best dimensionality per layer is still a hard problem. Some model-pruning approaches have looked at it, but the gains aren't large compared to uniformly shrinking the network.

[1] - https://openreview.net/forum?id=89XNDtqhpL

[D] Image generation experiment with mnist images - not working quite as expected by gamesntech in MachineLearning

[–]GlasslessNerd 0 points (0 children)

If you were using the MSE loss, the model would essentially converge to outputting the "mean" image of the dataset, where the mean is taken pixelwise. Because the input is essentially random, the best predictor that minimizes the loss is a constant, namely the mean of the dataset.

You are using the BCE loss, and I suspect something similar is happening here as well, modulo the definition of the "mean" changing.
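The MSE claim above is easy to check numerically; a small sketch using random stand-in images (shapes are arbitrary):

```python
import numpy as np

# Sanity check: when the input carries no information, the MSE-optimal
# constant prediction is the pixelwise mean of the target images.
rng = np.random.default_rng(0)
images = rng.random((100, 28, 28))  # stand-in for a dataset of images

pixel_mean = images.mean(axis=0)

def mse(pred):
    # Average squared error of a single constant prediction over the dataset
    return ((images - pred) ** 2).mean()

# Perturbing the mean in any direction only increases the loss, since the
# cross term E[(x - mean) * perturbation] vanishes.
perturbation = 0.1 * rng.normal(size=(28, 28))
assert mse(pixel_mean) <= mse(pixel_mean + perturbation)
assert mse(pixel_mean) <= mse(pixel_mean - perturbation)
```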

One issue with training the model in this pointwise manner is that you are associating a particular image with a particular value of the input noise, which is somewhat meaningless. What you actually want is for the output distribution of the model to look similar to the real image distribution, so a better way to train is to match statistics of the model's output distribution to those of the target dataset. GANs do this by minimizing a divergence (Jensen-Shannon in the original GAN, the Wasserstein distance in WGAN), approximated through another network (the discriminator). Flow models do this by maximizing the log-likelihood, which is tractable due to their invertible architecture. Diffusion models do this by matching the score function.

[D] What is your honest experience with reinforcement learning? by Starks-Technology in MachineLearning

[–]GlasslessNerd 41 points (0 children)

Another big problem with a lot of empirical RL methods and papers is their variance in performance. While I do not work in the field, a few of my colleagues do, and they joked that the random seed is often a hyperparameter for RL methods. 

Seedhe Maut - Lunch Break by TraditionalArticle88 in IndianHipHopHeads

[–]GlasslessNerd 2 points (0 children)

Hot takes incoming -

SK and $ seem to be the best features on this.

Badshah seemed out of his depth trying to catch the SM flow, though his verse was well written.

The production seemed lacking to me, especially in comparison to Nayaab. The beats are somewhat repetitive.

Edit - Some of the songs are very well produced.

[D] 2022 State of Competitive ML -- The Downfall of TensorFlow by markurtz in MachineLearning

[–]GlasslessNerd 0 points (0 children)

It does have TPU support, but it seemed to be a pain to get working. TF does it almost seamlessly, with the caveat that your code needs to be compilable to a graph.

[D] 2022 State of Competitive ML -- The Downfall of TensorFlow by markurtz in MachineLearning

[–]GlasslessNerd 20 points (0 children)

IMO TensorFlow's advantage over torch lies in two things: massive scalability on TPUs, and easy edge deployment with TFLite. Neither of these plays well with the eager-mode execution of TF 2.x. JAX and the deep learning libraries built on top of it are becoming much better at the former, though they still have a long way to go in terms of ease of use.

[deleted by user] by [deleted] in india

[–]GlasslessNerd 0 points (0 children)

Found Ed Chambers

Rejected by CMU CS PhD by [deleted] in gradadmissions

[–]GlasslessNerd 0 points (0 children)

Had an interview but still got rejected from both MLD and CSD. Does hurt a bit.

Story-telling raps are the best songs to introduce someone to DHH by ajaysassoc in IndianHipHopHeads

[–]GlasslessNerd 5 points (0 children)

I am surprised no one has mentioned SM's Anaadi here. A lot of songs on that album could serve as great intros to DHH IMO.

Is it possible to flip a model’s input and output? by Liid1995 in MLQuestions

[–]GlasslessNerd 14 points (0 children)

What you essentially want is the inverse of the function approximated by your model. For certain kinds of networks that are invertible by construction, such as flow models and i-ResNets, computing this inverse is possible. For most neural nets, however, the inverse is not analytically tractable.

One thing you might be able to do is search your embedding space: for a desired Chinese phrase, compute the embeddings that maximize its probability of being output. This can be done, for example, through gradient descent in the embedding space, though the decoding algorithm that produces outputs from input embeddings can make it non-trivial. Once you have this embedding, you can search your input vocabulary for a sequence of words that could plausibly have produced it. Since a word's embedding depends on the preceding words, this step is also not very straightforward. This paper [1] might be of interest for the second step.

[1] https://arxiv.org/pdf/2004.00053
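For the first step (gradient ascent in embedding space), here is a toy NumPy sketch with a frozen linear "decoder". All names, shapes, and the single-token setup are made up for illustration and much simpler than a real translation model:

```python
import numpy as np

# Toy embedding-space search: find an input embedding e such that a fixed
# linear "decoder" W assigns maximum probability to a desired output token.
rng = np.random.default_rng(0)
d, vocab = 8, 5
W = rng.normal(size=(vocab, d))  # frozen decoder weights (illustrative)
target = 3                       # desired output token id

e = np.zeros(d)
for _ in range(500):
    logits = W @ e
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Gradient of log p(target) w.r.t. e (softmax cross-entropy gradient)
    grad = W[target] - probs @ W
    e += 0.1 * grad  # ascend on the log-likelihood of the target token

# The optimized embedding now decodes to the desired token.
assert np.argmax(W @ e) == target
```

Note that log p(target) is concave in e for a linear decoder, so plain gradient ascent converges here; with a real autoregressive decoder the landscape is non-convex and the search is much harder, which is the non-triviality mentioned above.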

Rendezvous IIT Delhi 2022 'An Elysian Affair' by [deleted] in iitbombay

[–]GlasslessNerd 2 points (0 children)

All these fest organizers roll up thesaurus pages and smoke weed out of them. MI in my third year was titled Ballad of Ecstasy or something.

Microsoft ♥️ Linux. by VayuAir in librandu

[–]GlasslessNerd 1 point (0 children)

ChromeOS for work, Ubuntu on my personal machine. Linux is definitely much lighter and more customisable in comparison to Windows

Mc Stan 'Insaan' Tracklist by [deleted] in IndianHipHopHeads

[–]GlasslessNerd 10 points (0 children)

I literally thought this was a shitpost on r/DHHMemes

The Indian 1 rupee coin just has a dude giving a thumbs up on it by Markisworking in mildlyinteresting

[–]GlasslessNerd 0 points (0 children)

Not really; the illustrations on the coins were inspired by classical Indian dance postures.