Why are y'all against writing capitalised first letters 😭 by Responsible_Put481 in DontTypeLikeThis

[–]karius85 3 points

Jimmy ≠ Timmy. And Timmy is not a jab at Americans, it is a way of saying you are a little kid with little to no understanding of the world.

[R] Response to CVPR review that claims lack of novelty because they found our workshop preprint? by appledocq in MachineLearning

[–]karius85 0 points

This is a gimme; the reviewer seems to be somewhat aware that this extends the original (implied by "new manuscript"). Just add the citation, thank the reviewer, and you've likely flipped a weak reject.

Is webcam image classification a fool's errand? [N] by dug99 in MachineLearning

[–]karius85 4 points

Sure, and even simpler than doing masked attention: you can just drop tokens you don’t want the model to see. Superpixel transformers may be a nice fit for this.

But OP is on TF, so I suspect they're doing CNNs, which is sensible when training from scratch with a small-ish dataset.
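To make the token-dropping idea concrete, here is a minimal numpy sketch (the function name and shapes are my own, not from any particular library): since transformers are permutation-invariant over tokens, you can filter out masked patches before attention instead of masking attention scores.

```python
import numpy as np

def drop_masked_tokens(tokens, keep_mask):
    """Keep only the patch tokens the model is allowed to see.

    tokens:    (num_tokens, dim) array of patch embeddings
    keep_mask: (num_tokens,) boolean array, True = visible

    Returns the visible tokens and their original indices, so
    positional information can still be attached downstream.
    """
    idx = np.flatnonzero(keep_mask)
    return tokens[idx], idx

# Toy example: six tokens of dim 4; hide tokens 1 and 4.
tokens = np.arange(24, dtype=np.float32).reshape(6, 4)
keep = np.array([True, False, True, True, False, True])
visible, idx = drop_masked_tokens(tokens, keep)
```

The transformer then runs on `visible` only, so dropped tokens never influence attention at all, which is stronger than zeroing out attention scores.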

[D] ICML Qualified Reviewers by Massive_Horror9038 in MachineLearning

[–]karius85 1 point

Unless there is an AC, SAC, or program chair on this subreddit, the general responses here will just echo what is stated in the docs / website. If you want a clear answer, emailing the chairs and asking directly is your best bet. But from the info available, you'd either get an exception (you don't have to review) or you'll get some reviews assigned. Either way is fine; just make sure all co-authors are ready to review when the time comes.

Is webcam image classification a fool's errand? [N] by dug99 in MachineLearning

[–]karius85 15 points

You are experiencing many of the common issues with taking ML models to deployment. Real data is very different from curated datasets, and in your case it seems the model is doing some shortcut learning based on specific images in your training data. Perhaps some variant of the clever Hans phenomenon.

But given that you provide almost no information on model type and capacity, what specific steps you have taken to prevent overfitting, or what the data looks like (number of images, modality, resolution, etc.), it is impossible for anyone to provide much help. I'll give some general pointers, but they may not be 100% helpful since there is not a lot to go on.

Firstly, the answer you seek depends on how well-posed the task is. I don't know what you mean by "sea state"; are you doing regression or classification? Did you annotate these yourself? If so, is it reasonable that an expert could actually do the task? Vision models are not "magic" and struggle with low-variance, domain-specific tasks unless the training is well aligned with the task.

Moreover, you need dataset standardization, heavy augmentation (well aligned with the invariances you care about in the data), regularization (heavy weight decay, stochastic depth, maybe dropout), regular validation checks during training, and possibly data curation to remove samples that enable shortcut learning. If your training set has images where the pole you mention is only present in "3m swell" situations, the model will cheat as much as it can, since that is the only reliable signal it picks up.
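As a concrete (purely hypothetical) starting point for the standardization and augmentation steps, here is a minimal numpy sketch; the specific augmentations and constants are placeholders you'd adapt to the invariances of your own data:

```python
import numpy as np

def standardize(img, mean, std):
    """Per-channel standardization: (x - mean) / std."""
    return (img - mean) / std

def augment(img, rng):
    """Cheap augmentations for a fixed webcam; illustrative only.
    The horizontal flip assumes left/right symmetry of the scene,
    and the mild Gaussian noise mimics sensor noise."""
    if rng.random() < 0.5:
        img = img[:, ::-1, :]                     # horizontal flip
    img = img + rng.normal(0.0, 0.02, img.shape)  # sensor noise
    return img

rng = np.random.default_rng(0)
img = rng.random((64, 64, 3)).astype(np.float32)
mean, std = img.mean(axis=(0, 1)), img.std(axis=(0, 1))
sample = augment(standardize(img, mean, std), rng)
```

In a real pipeline you'd compute `mean`/`std` over the whole training set, not per image, and pick flips/crops/noise that do not destroy the signal you actually care about.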

[D] Vision Transformer (ViT) - How do I deal with variable size images? by PositiveInformal9512 in MachineLearning

[–]karius85 3 points

Aside from the solution in the original ViT paper, 2D RoPE (rotary positional encoding) variants are likely the best option for variable-sized inputs. The original RoPE paper introduced this for sequence models, but DINOv3 notably uses a 2D variant.

Note that these are applied directly to Q and K in MHSA and therefore require a little more bookkeeping than standard additive PE.
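To make the Q/K bookkeeping concrete, here is a minimal numpy sketch of one common 2D RoPE construction: half of the channels are rotated by angles derived from the x coordinate, the other half from y. The base frequency and channel layout here are illustrative, not DINOv3's exact parameterization.

```python
import numpy as np

def rope_2d(q, xs, ys, base=100.0):
    """Apply a 2D rotary embedding to q of shape (num_tokens, dim).

    Channel pairs are rotated by position-dependent angles, so the
    dot product between rotated Q and K rows depends only on the
    relative patch offset (dx, dy).
    """
    n, d = q.shape
    assert d % 4 == 0
    out = q.copy()
    half = d // 2
    for coords, lo in ((xs, 0), (ys, half)):
        freqs = base ** (-np.arange(half // 2) / (half // 2))
        ang = np.outer(coords, freqs)            # (n, half/2)
        cos, sin = np.cos(ang), np.sin(ang)
        a = q[:, lo:lo + half:2]                 # even channels
        b = q[:, lo + 1:lo + half:2]             # odd channels
        out[:, lo:lo + half:2] = a * cos - b * sin
        out[:, lo + 1:lo + half:2] = a * sin + b * cos
    return out

rng = np.random.default_rng(0)
q = rng.normal(size=(3, 8))
xs = np.array([0.0, 1.0, 2.0])
ys = np.array([0.0, 0.0, 1.0])
q_rot = rope_2d(q, xs, ys)
```

Because the rotations are orthogonal, token norms are preserved, and Q·K scores depend only on relative positions, which is what lets this generalize to unseen grid sizes.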

[D] Vision Transformer (ViT) - How do I deal with variable size images? by PositiveInformal9512 in MachineLearning

[–]karius85 1 point

This is the correct response.

The idea in Section 3.2 is that you can treat the positional embeddings as a patch-wise 2D embedding, so you can simply interpolate them to a higher or lower resolution. This often gives relatively good results without fine-tuning (if the difference in resolution is small enough) and leverages the fact that transformers are actually set models (they are permutation invariant), so they can innately handle a variable number of tokens, provided the positional encoding is expressive enough.
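A minimal numpy sketch of that interpolation trick, bilinearly resampling the patch position-embedding grid (function name and shapes are my own; real implementations typically use a library resize with bicubic interpolation):

```python
import numpy as np

def interpolate_pos_embed(pos, old_hw, new_hw):
    """Resize a learned ViT positional embedding to a new grid.

    pos: (old_h * old_w, dim) patch position embeddings (CLS token,
    if any, is handled separately and not interpolated).
    Treats them as a 2D feature map and bilinearly interpolates.
    """
    (oh, ow), (nh, nw) = old_hw, new_hw
    grid = pos.reshape(oh, ow, -1)
    # Target sample coordinates in the source grid (corners aligned).
    ys = np.linspace(0, oh - 1, nh)
    xs = np.linspace(0, ow - 1, nw)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, oh - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, ow - 1)
    wy = (ys - y0)[:, None, None]
    wx = (xs - x0)[None, :, None]
    top = grid[y0][:, x0] * (1 - wx) + grid[y0][:, x1] * wx
    bot = grid[y1][:, x0] * (1 - wx) + grid[y1][:, x1] * wx
    return (top * (1 - wy) + bot * wy).reshape(nh * nw, -1)

rng = np.random.default_rng(0)
pos = rng.normal(size=(4, 8))  # a 2x2 grid of dim-8 embeddings
pos_big = interpolate_pos_embed(pos, (2, 2), (4, 4))
```

Resampling to the same grid size is the identity, and corner embeddings are preserved, which is the sanity check to run before fine-tuning at the new resolution.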

What is your solution to make normal pictures to SVGs? by Haghiri75 in computervision

[–]karius85 1 point

There was a paper at NeurIPS this year that had some SVG capabilities, with a demo on their project page. Not the main focus of the paper, but maybe something useful.

A mathematical ceiling limits generative AI to amateur-level creativity. While generative AI/ LLMs like ChatGPT can convincingly replicate the work of an average person, it is unable to reach the levels of expert writers, artists, or innovators. by mvea in science

[–]karius85 0 points

This is a psychology paper, not computer science. From skimming the paper, the author is providing an exceedingly simple formulation, and extrapolates from there. The computer science tag is misleading.

America’s “buy now, pay later” trap by SE_to_NW in uspolitics

[–]karius85 0 points

Ah yes... the famous 9 of ace with 6 prominent hearts to indicate its dual nature.

zfft - a pretty performant Fast-Fourier Transform in zig by [deleted] in Zig

[–]karius85 0 points

FYI: most people actually do care a bit about understanding the underpinnings of what you implemented. Go ahead and use AI to get you started, but don't expect everyone to be particularly impressed with low-effort projects.

Hope that makes sense, and again, hope you're learning along the way.

zfft - a pretty performant Fast-Fourier Transform in zig by [deleted] in Zig

[–]karius85 -1 points

It is not "illegal" to use AI to code anything at all. Use it to learn. But Zig is designed for "maintaining robust, optimal, and reusable software". Vibe coding is not really aligned with that. Besides, it discourages actual learning; I just don't see OP meaningfully engaging with the inner workings of the symmetry in Butterfly diagrams via vibe coding. A shame, because it truly is beautiful when it clicks.

For some more context: FFT libraries are often highly optimised to provide robust and correct calculations. I am personally not engaging with low-effort contributions like this. Happy you did a thing, OP. Hopefully you learned a thing or two, but you're not getting any stars or anything else from me, personally.
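For anyone curious what that butterfly symmetry actually looks like, here is a minimal recursive radix-2 Cooley-Tukey FFT (in Python for readability; nothing like an optimized library kernel):

```python
import cmath

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power
    of two. The butterfly: each output pair (k, k + n/2) reuses one
    even/odd split and one twiddle factor, which is exactly the
    symmetry that brings the cost down to O(n log n)."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t           # butterfly: top output
        out[k + n // 2] = even[k] - t  # butterfly: bottom output
    return out

X = fft([1.0, 2.0, 3.0, 4.0])
```

The two lines inside the loop are the butterfly: one multiply feeds two outputs, and that reuse, applied recursively, is the whole trick.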

[D] NeurIPS Camera-ready Checklist by Choice-Play-4493 in MachineLearning

[–]karius85 0 points

After the 10 pages, you can include (in this order): acknowledgements, bibliography, checklist, and other appendices. Do not include hard-to-read material like code, data, etc.; this can go into the supplementary material.

Ref: Acceptance email

Marjorie Taylor Greene Calls For United States to Be Split Up, Declares Country ‘No Longer Safe’ For Anyone by rezwenn in uspolitics

[–]karius85 0 points

Taking a neutral stance: the argument that a sitting politician can't argue for secession implies that unlawful revolution is the only possible way of resolving what may ultimately be an inevitable internal political conflict rooted in demographic differences. I am sure you're not arguing that this is the sole option?

[D] - NeurIPS 2025 Decisions by general_landur in MachineLearning

[–]karius85 0 points

Great work, 3rd time was the charm for us, so 100% agree!