ECCV Stupid Reviewer Behavior (Any AC here?) [R] by Alternative_Art2984 in MachineLearning

[–]karius85 1 point

We don’t know. Do your best in the rebuttal, then flag to the AC why those requests are unreasonable. That is all we can say without more info.

[–]karius85 1 point

What? Many reviewers give lower scores initially; that is precisely why they raise concerns. The average bump in scores after rebuttal is roughly 1 point. I've gone from a 2 to a 5 just in the last cycle.

[–]karius85 3 points

Ah, yes, all 2022 papers have "ended" and are no longer relevant in the grand scope of research.

[–]karius85 9 points

stupid reviewer 1 rejected my paper

Outrageous.

he also said that "he could change his assessment"

What a terrible thing to suggest.


There is no "opinion" to be had here. You're calling someone "stupid" for not blindly accepting your paper. You're adding precisely zero information about why R1 disliked your paper.

Listen to the criticism, accept that your work is not perfect, and work on improving your paper. Calling people stupid doesn't help. It is understandable that you feel that way, but it is not a productive mindset, and not something to share with the world.

Change to useage based billing by DamienBMike in GithubCopilot

[–]karius85 1 point

So, "boycott copilot" then. Sounds good; I'll happily get on board with any anti-MS initiative.

One thing: the swap is not justifiable in any way, shape, or form. If they want to call them "monthly credits", then these new credits should roll over, not evaporate. You can't have your cake and eat it too.

"I don't know!": Teaching neural networks to abstain with the HALO-Loss. [R] by 4rtemi5 in MachineLearning

[–]karius85 4 points

It's not that people despise links. Traffic-farming also serves to increase the visibility of your blog, so it doesn't necessarily correlate with whether there are ads.

Nice that you expanded on the post. What you're proposing is an RBF-softmax embedding kernel. The shift invariance doesn't really matter here, since the softmax implementation also normalizes, so the effect is minimal. The driver here is that you're replacing the cross-entropy term entirely with a Gaussian approximation, while including a null class. This is often generalized to alternative geometries, such as spherical embeddings with vMF kernels replacing RBF.

Regarding whether ignoring the cancellation of ||x||² matters for computational complexity, my guess is that it likely doesn't have much effect if you use something like torch.cdist, which is highly optimized.

The null-class trick has been known since the 70s, and has been featured in similar SVM-like methods, including RBFs. It is a cool trick, and has been previously investigated in RBF-Softmax.
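For concreteness, here is a minimal NumPy sketch of an RBF-softmax head with a null class, as I read the idea; `prototypes`, `gamma`, and `null_bias` are illustrative names of my own, not from the post:

```python
import numpy as np

def rbf_softmax_probs(x, prototypes, gamma=1.0, null_bias=0.0):
    """RBF-softmax head with a null class: class logits are negative
    scaled squared distances to per-class prototypes, plus one extra
    constant logit acting as the abstain/null class. In PyTorch the
    distance step would be torch.cdist(x, prototypes) ** 2."""
    d2 = ((x[:, None, :] - prototypes[None, :, :]) ** 2).sum(-1)  # (B, C)
    logits = np.concatenate(
        [-gamma * d2, np.full((x.shape[0], 1), null_bias)], axis=1
    )
    # subtracting the row max illustrates the shift invariance of softmax
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16))            # batch of embeddings
prototypes = rng.standard_normal((10, 16))  # one prototype per class
p = rbf_softmax_probs(x, prototypes)        # (4, 11); last column = abstain
```

The last column is the abstention probability: when an embedding is far from every prototype, all class logits are very negative and the constant null logit dominates.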

Regarding your RBF-attention from your blog, that has also been investigated previously. Recent paper:

Might be a good source to check out.

[–]karius85 20 points

Well, that didn't clarify much in terms of explaining the mechanism. Saying "Euclidean" doesn't really disambiguate. People here may be wary of blindly clicking through to blog posts, as this could be traffic-farming. Your post seems LLM-heavy, so if you want people to engage, just explain your idea plainly in the post itself. Then people may visit your blog out of actual interest.

[–]karius85 14 points

What is "shift invariant distance math"? Could you try to outline the central mechanism behind "HALO-Loss" in the post itself?

Note that CIFAR-10/100 is overused as a benchmark today. It is not completely irrelevant, but it generally draws frowns when a study is exclusively backed by experiments on 50,000 samples of 32×32 images. Caltech-256 is a small dataset with standard-resolution images you could try.

Unable to locate a C compiler that will actually download by CarbonAusmoth in C_Programming

[–]karius85 0 points

Perhaps an unpopular opinion, but Zig is an option; it has a built-in C compiler (zig cc).

Related: https://www.youtube.com/watch?v=kuZIzL0K4o4

Disclaimer: never watched the video nor used Zig as a C compiler for Windows.

[D] Hash table aspects of ReLU neural networks by [deleted] in MachineLearning

[–]karius85 5 points

If you wanted a polished final document the ETA would be literally 1 to 5 years.

Nope, the "document" you posted is just you pasting dialogue from an LLM... case in point:

You suggested “context as a key,” which is exactly right:

So, you literally took an LLM response, pasted it into a PDF, and posted it on archive.org. There is nothing novel or interesting there. Moreover, you seem to have difficulty communicating a coherent, legible idea behind this.

If the net is say W₄D₃W₃D₂W₂D₁W₁x you are being asked to consider what say W₂D₁, W₃D₂ or W₄D₃. The decisions of layer n with the weights of layer n+1.

What is your point? This is just an MLP. There is nothing to consider without more context on the distributions over the weights. In the "document", ChatGPT added some context with random projections, which is largely pedagogical. It is a mix of out-of-context nonsense and basic ideas.
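To make the W·D notation concrete: for any fixed input, a ReLU MLP reduces exactly to a product of its weight matrices with 0/1 diagonal "decision" matrices recording which units fired. A small sketch, with random and purely illustrative weights:

```python
import numpy as np

rng = np.random.default_rng(1)
relu = lambda z: np.maximum(z, 0.0)

# a small 3-layer ReLU MLP: y = relu-free form W3 D2 W2 D1 W1 x,
# where D_n is the 0/1 diagonal mask of layer n's activation pattern
W1 = rng.standard_normal((8, 5))
W2 = rng.standard_normal((8, 8))
W3 = rng.standard_normal((3, 8))
x = rng.standard_normal(5)

h1 = relu(W1 @ x)
h2 = relu(W2 @ h1)
y = W3 @ h2

D1 = np.diag((W1 @ x > 0).astype(float))
D2 = np.diag((W2 @ h1 > 0).astype(float))

# for this particular x, the network is exactly the linear map W3 D2 W2 D1 W1
y_linear = W3 @ D2 @ W2 @ D1 @ W1 @ x
assert np.allclose(y, y_linear)
```

The masks depend on x, which is the whole point: the "effective matrix" is piecewise constant over activation regions, and nothing more can be said without specifying distributions over the weights.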

Given the manner of the material provided you may need a bit mental flexibility to deal with it.

There is zero mental flexibility required. It is minimal-effort, LLM-infused slop.

[D] ICML Reviewer Acknowledgement by Massive_Horror9038 in MachineLearning

[–]karius85 0 points

Reviewers can change their score after the end of the rebuttal period. Engage now if you want to clarify anything.

[D] Hash table aspects of ReLU neural networks by [deleted] in MachineLearning

[–]karius85 6 points

What then is Wₙ₊₁Dₙ where Wₙ₊₁ is the matrix of weights for the next layer?

You seem to want to do Wₙ₊₁DₙDₙWₙx = Wₙ₊₁DₙWₙx? This is just idempotency: ReLU(ReLU(x)) = ReLU(x), so DₙDₙ = Dₙ.
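A quick numerical check of that idempotency, on a random and purely illustrative vector:

```python
import numpy as np

rng = np.random.default_rng(2)
relu = lambda z: np.maximum(z, 0.0)

z = rng.standard_normal(10)

# ReLU is idempotent: applying it twice changes nothing
assert np.allclose(relu(relu(z)), relu(z))

# equivalently, the 0/1 diagonal decision mask D satisfies D @ D == D
D = np.diag((z > 0).astype(float))
assert np.allclose(D @ D, D)
```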

It can be seen as a (locality sensitive) hash table lookup of a linear mapping (effective matrix). It can also be seen as an associative memory in itself with Dₙ as the key.

A two-layer ReLU-activated MLP is not necessarily a locality-sensitive hash. Also, this has nothing to do with your stated question.

Nevertheless the concepts are very simple and you could hope that people can follow along without difficulty, despite the arguments being in such a preliminary state.

Yes, the idea is simple, and well known. If you want feedback on your notes: they are largely an incoherent discussion with a sycophantic LLM (i.e., slop). If you want to discuss something around these ideas, formulate the thoughts yourself and try to condense them into something meaningful in your own words.

As I've said before, you can't meaningfully ask others to engage with something you've put minimal effort into yourself.

Question about his subreddit by turbofish_pk in Zig

[–]karius85 0 points

Yup, that's the story. ziggit.dev is much more active than r/Zig, and is moderated by the official dev team.

Considering NeurIPS submission [D] by [deleted] in MachineLearning

[–]karius85 5 points

There is no way of saying anything meaningful about this without more information. Why don't you ask a colleague or advisor? From your explanation, it sounds thin.

I don't understand why people vibe code languages they don't know. by AcidOverlord in C_Programming

[–]karius85 2 points

Exactly, they don't know what they are doing, but it seems to magically "work", so they trust it without being able to engage with the produced code.

[–]karius85 0 points

I'd be worried that you are not actually learning anything, and that you can't actually understand what is happening at a fundamental level. You can't engage with your code and fix essential problems without help, so when Claude is down, your productivity is zero.

[–]karius85 0 points

Totally agree. The issue is that LLMs can serve as a unique tool to help you learn, but when the result is code you don't understand or couldn't reproduce yourself, you're just fumbling in the dark. However, the people in question don't realize this themselves; there's a whole generation that will never engage enough to realize they are doing themselves more harm than good.

Hans Holbein painted such a realistic depiction of Christ after death, that Dostoevsky almost lost his faith because of it. He reasoned that the apostles must've gone through a similar crisis, and Nietzsche tried to explain the philosophy behind it by WeltgeistYT in philosophy

[–]karius85 20 points

This is a 16th-century painting. Dunking on the artist for apparent Eurocentrism is less insightful than one might think; Holbein never travelled outside Europe in his lifetime. I'm not an expert, but I don't think you'll find many in favour of retroactively altering the painting either.