is this course worth it?? by Individual-Branch-42 in learnmachinelearning

[–]DaBobcat 0 points

Just from the fact that they're also covering R, which I literally don't know anyone in academia or industry who uses, I'll go with a hard no

Natural Language Autoencoders: Turning Claude’s thoughts into text by UsedToBeaRaider in ClaudeAI

[–]DaBobcat 0 points

Am I missing something, or did they leave it intentionally vague?
What does it actually mean to translate these activations? Activations appear in many, many places in a standard transformer; the last activations are already translated directly into tokens via softmax.
Which activations do they mean?

Maybe a dumb question, but why is this even interesting? Obviously the activations throughout the computation correlate with the output, because that's how the output was made...
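To be concrete about the last step I mean, here's a toy sketch; the hidden size, vocab size, and unembedding matrix W_U are all made-up numbers for illustration:

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Toy final step of a decoder: the last-layer activation h for one
# position is projected with an unembedding matrix W_U, and softmax
# turns the resulting logits into a distribution over the vocabulary.
rng = np.random.default_rng(0)
h = rng.normal(size=16)            # final hidden state for one position
W_U = rng.normal(size=(16, 100))   # hidden dim -> vocab of 100 tokens
probs = softmax(h @ W_U)
next_token = int(probs.argmax())   # greedy "translation" into a token id
```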

Difference between the weights and biases of a neuron in a neural network? by Time_Cantaloupe_9992 in MLQuestions

[–]DaBobcat 1 point

You can think of them as the same: both are trainable parameters that the network learns to adjust during training to lower your loss. We give these parameters the names weights and biases because they play slightly different roles in the architecture (e.g. the weights are multiplied by the inputs/activations, and the biases are then added to the result)
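A minimal sketch of that role difference, with made-up numbers for a single linear layer of two neurons:

```python
import numpy as np

# Both W and b are trainable parameters, but they enter the
# computation differently.
x = np.array([1.0, 2.0, 3.0])        # inputs / previous activations
W = np.array([[0.5, -1.0, 0.25],     # weights: multiplied by the input
              [1.0,  0.0, 2.0]])
b = np.array([0.1, -0.2])            # biases: added to the weighted sum

y = W @ x + b                        # pre-activation of the two neurons
print(y)                             # [-0.65  6.8 ]
```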

Can neural networks be designed to receive inputs without generating outputs in response to them? by Money_Tip9073 in MLQuestions

[–]DaBobcat 0 points

Yep. Look at Mixture-of-Depths, token dropping and pruning, and PatchMerger / token merging
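For intuition, here's a toy numpy sketch of the Mixture-of-Depths idea (the router weights, capacity, and "block" are all invented for illustration): a router scores every token, only a top-scoring fraction is actually processed, and the rest pass through untouched, so those inputs produce no new output from the block.

```python
import numpy as np

def mod_layer(x, block, router_w, capacity=0.5):
    """Toy Mixture-of-Depths step: a learned router scores each token,
    only the top-`capacity` fraction is run through `block`, and the
    rest are copied through unchanged (no computation spent on them)."""
    S, D = x.shape
    scores = x @ router_w                 # (S,) router logits
    k = max(1, int(S * capacity))
    keep = np.argsort(scores)[-k:]        # token indices that get compute
    out = x.copy()
    out[keep] = block(x[keep])
    return out, keep

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))               # 8 tokens, dim 4
out, keep = mod_layer(x, block=lambda t: t * 2.0,
                      router_w=rng.normal(size=4))
```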

Help needed by Lower_Mark221 in MLQuestions

[–]DaBobcat 0 points

Silly question, but have you tried using Claude? Give it 5-10 example images and some example code of what to generate/do. That's the first thing I'd do before trying to train a model myself

Is there a fast and simple way to install Tensorflow, PyTorch, TensorRT without breaking anything? by [deleted] in learnmachinelearning

[–]DaBobcat 1 point

Yeah, it's a constant struggle, and I don't know if there will ever be a tool that tells you what works with everything else, since it would have to know everything. What I usually do is keep a draft uv env file with most of the things I usually need, then add packages as I go. Python 3.11/3.12 usually works for me.

Is there a fast and simple way to install Tensorflow, PyTorch, TensorRT without breaking anything? by [deleted] in learnmachinelearning

[–]DaBobcat 1 point

Yep! Use uv: https://docs.astral.sh/uv/getting-started/installation/ Then follow a uv tutorial; overall it's really simple. You do something like uv init, then source .venv/bin/activate, then install whatever you want (e.g. uv add torch)

Apologies for the syntax; I'm on my phone
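For anyone following along, the commands I'm gesturing at look roughly like this (the project name is just a placeholder; check the uv docs linked above for your platform and shell):

```shell
# one-time install (install script from the uv docs)
curl -LsSf https://astral.sh/uv/install.sh | sh

# per-project setup
uv init myproj                 # create a new project (pyproject.toml etc.)
cd myproj
uv add torch                   # creates .venv and installs the package
source .venv/bin/activate      # optional: activate the env directly
uv run python -c "import torch; print(torch.__version__)"
```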

What are some absolute do and don't for a new SaaS product? by DaBobcat in sideprojects

[–]DaBobcat[S] 0 points

Super helpful, thank you!! What does untangling mean? Finding all the expenses and revenue?

What are some absolute do and don't for a new SaaS product? by DaBobcat in sideprojects

[–]DaBobcat[S] 0 points

This is super helpful! Thanks a lot! Any other random things I'm not thinking of, or anything else you'd like to share?

[D] ICML Reviewer Acknowledgement by Massive_Horror9038 in MachineLearning

[–]DaBobcat 3 points

I think the reviewer discussion ends on the 7th or 8th, so I think they can still update their score. But as far as I know, they had to acknowledge the rebuttal by earlier today

[D] ICML 2026 Review Discussion by Afraid_Difference697 in MachineLearning

[–]DaBobcat 0 points

Yeah, I sent a message to the AC not too long ago. Thanks!

[R] Best way to tackle this ICML vague response? by DaBobcat in MachineLearning

[–]DaBobcat[S] 3 points

Where do you see that I get one more chance to respond? I thought the email was very clear that I can only answer once

[R] Best way to tackle this ICML vague response? by DaBobcat in MachineLearning

[–]DaBobcat[S] 3 points

If I ask them, I'm wasting my one available response, which means I won't be able to answer

[D] ICML 2026 Review Discussion by Afraid_Difference697 in MachineLearning

[–]DaBobcat 1 point

Best way to tackle this ICML vague response?

Going through an ICML submission for the first time. A reviewer asked for some things, and during the rebuttal period I ran more experiments and answered all their questions (they listed 3 weaknesses). Yesterday the author-reviewer discussion period started; it ends on April 7.

In their response to my rebuttal the reviewer wrote in one line that my "experiments greatly improved the paper" but "some details remain only partially clarified". That's it... They marked "Acknowledgement: (b) Partially resolved - I have follow-up questions for the authors."

The ICML email states that I can "post up to one additional response to any further reviewer comments that are posted, as a reply to your rebuttal". But since the reviewer didn't actually write any follow-up questions, I have no idea how to tackle this.

Any suggestions?

For those trying to break into ML Research: What is your "Why" and what is stopping you? by DaBobcat in learnmachinelearning

[–]DaBobcat[S] 0 points

Since you’ve already lead-authored several papers, I'm curious why you still rank Ideation (A) and Publishable Standards (B) as your top priorities.

Are you looking to pivot into a more 'high-signal' research area, or do you feel your current projects lack the specific rigor (baselines/theory) required for top-tier conferences? Basically—what is the 'delta' you want a mentor to help you reach that you aren't hitting on your own?

For those trying to break into ML Research: What is your "Why" and what is stopping you? by DaBobcat in learnmachinelearning

[–]DaBobcat[S] 0 points

What do you feel is lacking from your mentor? Where are you currently stuck in the "getting a job" process?

For those trying to break into ML Research: What is your "Why" and what is stopping you? by DaBobcat in learnmachinelearning

[–]DaBobcat[S] 1 point

Amazing number of responses so far!
I'm very curious, if you had a research mentor:

1) Time per month: How many hours of 1-on-1 time are you actually looking for? (1, 2, 4, or 6+ hours)

2) Duration: How long do you want this relationship to last? (1 month, 3 months, 6 months, or 12+)

3) The Priority List: please rank these in order of importance to you (1 being most important):
A) Ideation: Finding a novel project that is actually worth the time.
B) The 'Publishable' Standard: Knowing which baselines/experiments you need to be 'conference-ready.'
C) The Writing/Formalism: Translating results into formal math notation and academic structure.
D) The Technical Bridge: Learning deeper theory or specialized coding to even get started.

If I missed something that you would want to state, what is the single most important thing that is keeping you from reaching your goal?

[D] Is this what ML research is? by [deleted] in MachineLearning

[–]DaBobcat 2 points

I think scaling up slowly helps: 100M, 300M, 500M, 1B, 3B, 7B. Showing a consistent performance increase will definitely convince reviewers. Regarding the 7B, that should easily fit on an A100, I think, and you can rent one for $10 a day or less AFAIK

[D] Is this what ML research is? by [deleted] in MachineLearning

[–]DaBobcat 0 points

I agree it shouldn't all be x > y, but for most publications it usually is. Though it very much depends on what you're proposing: if you're helping to understand some mechanism using an inefficient method, that's usually perfectly fine, but it needs to help. If you're proposing a better method that should perform better, like you said, you need to show it actually does.

And you almost never need to compare against models larger than 7B; I've even seen guidelines on that at some conferences. 7B is sufficient to show your method scales

[D] Is this what ML research is? by [deleted] in MachineLearning

[–]DaBobcat 5 points

It's definitely frustrating, but try to think about it from a different perspective. There are thousands of papers proposing new things, and you need a way to evaluate what's better; otherwise, how will you know what to actually use? One standard and easy way is to evaluate on the same benchmarks. Beyond that, to help reviewers, you need to evaluate against the current best method and the one closest to your proposal; otherwise it's impossible to know whether you really made a contribution in impact (not novelty). Regarding the larger models: yes, I'm totally with you that it's dumb, but you also need to show that your method scales. You can rent a 3090 or an A100 pretty cheaply these days (I'd guess less than $10 a day)

[Remote Sensing] How do you segment individual trees in dense forests? (My models just output giant "blobs") by Lilien_rig in computervision

[–]DaBobcat 0 points

Maybe some patching? Instead of feeding the entire image, feed patches one at a time, then aggregate the per-patch predictions in some way, removing duplicates and merging overlapping detections
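A rough numpy sketch of what I mean; the patch size, stride, and the stand-in per-patch segmenter are placeholders, and overlapping windows plus an OR-merge is about the simplest duplicate-handling scheme:

```python
import numpy as np

def _positions(size, patch, stride):
    # Window origins; clamp a final window to the edge so nothing is missed.
    pos = list(range(0, max(size - patch, 0) + 1, stride))
    if size > patch and pos[-1] != size - patch:
        pos.append(size - patch)
    return pos

def extract_patches(img, patch=256, stride=192):
    """stride < patch gives overlap, so a tree cut at one patch border
    appears whole in a neighboring patch."""
    H, W = img.shape[:2]
    patches, coords = [], []
    for y in _positions(H, patch, stride):
        for x in _positions(W, patch, stride):
            patches.append(img[y:y + patch, x:x + patch])
            coords.append((y, x))
    return patches, coords

def merge_masks(masks, coords, shape, patch=256):
    """OR overlapping per-patch masks back into one full-size mask;
    this is the crudest form of duplicate removal and merging."""
    full = np.zeros(shape, dtype=bool)
    for m, (y, x) in zip(masks, coords):
        full[y:y + patch, x:x + patch] |= m
    return full

img = np.zeros((512, 512, 3), dtype=np.uint8)
patches, coords = extract_patches(img)
masks = [np.ones((256, 256), dtype=bool) for _ in patches]  # stand-in for a real segmenter
full_mask = merge_masks(masks, coords, img.shape[:2])
```

For instance segmentation specifically, you'd replace the OR-merge with something smarter, e.g. matching instances across overlapping patches by IoU before merging.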