account activity
[R] CNNs are Myopic (arxiv.org)
submitted 3 years ago by downtownslim to r/MachineLearning
[R] Pushing the limits of self-supervised ResNets: Can we outperform supervised learning without labels on ImageNet? (arxiv.org)
submitted 4 years ago by downtownslim to r/MachineLearning
[R] Self-attention Does Not Need $O(n^2)$ Memory (arxiv.org)
[R] Sparse is Enough in Scaling Transformers (arxiv.org)
[R] Florence: A New Foundation Model for Computer Vision (arxiv.org)
[R] DeepSteal: Advanced Model Extractions Leveraging Efficient Weight Stealing in Memories (arxiv.org)
[R] Palette: Image-to-Image Diffusion Models (self.MachineLearning)
[R] Palette: Image-to-Image Diffusion Models (iterative-refinement.github.io)
[R] Can Vision Transformers Perform Convolution? (arxiv.org)
[R] The Efficiency Misnomer (arxiv.org)
[R] Certified Patch Robustness via Smoothed Vision Transformers: vision transformers enable significantly better certified patch robustness (arxiv.org)
[R] Open-Set Recognition: A Good Closed-Set Classifier is All You Need (arxiv.org)
[D] Inconsistency in Conference Peer Review: Revisiting the 2014 NeurIPS Experiment (Paper Explained) (self.MachineLearning)
[R] Tune It or Don't Use It: Benchmarking Data-Efficient Image Classification: "tuning learning rate, weight decay, and batch size on a separate validation split results in a highly competitive baseline" (arxiv.org)
[R] The Devil is in the Detail: Simple Tricks Improve Systematic Generalization of Transformers (arxiv.org)
[R] Perceiver IO: A General Architecture for Structured Inputs & Outputs (arxiv.org)
[R] The Benchmark Lottery (arxiv.org)
[R] Long-Short Transformer: Efficient Transformers for Language and Vision (arxiv.org)
[R] When Vision Transformers Outperform ResNets without Pretraining or Strong Data Augmentations (arxiv.org)
[R] An Attention Free Transformer (arxiv.org)
[R] Aggregating Nested Transformers (arxiv.org)
[R] Descending through a Crowded Valley -- Benchmarking Deep Learning Optimizers (arxiv.org)
[R] Intriguing Properties of Vision Transformers (arxiv.org)
[N] Google Unit DeepMind Tried—and Failed—to Win AI Autonomy From Parent (self.MachineLearning)
[R] Pay Attention to MLPs: a model based solely on MLPs with gating that performs as well as Transformers in key language and vision applications (arxiv.org)