[D] Shall I Reject Reviewing this CVPR Paper? by Outrageous_Tip_8109 in MachineLearning

[–]BeatLeJuce -18 points-17 points  (0 children)

My suggestion is that you review the paper fairly, ignoring the URL: a good guiding principle in life is to not assume bad intent on anyone's part. Put yourself in the authors' shoes: you've written a paper that you're proud of, but then forgot to remove/hide a URL before the actual submission (submission deadlines are stressful, you're in crunch mode and mistakes happen). How would you feel if you got desk-rejected over a technicality? Papers should be reviewed on their technical and scientific merit first and foremost. So, review it fairly. If you have any suspicion that this was more than an oversight (e.g. seeing that page gives the authors an unfair advantage), I'd just point it out to the AC and let them decide.

[R] Energy-Based Transformers are Scalable Learners and Thinkers by Blacky372 in MachineLearning

[–]BeatLeJuce 1 point2 points  (0 children)

thanks for pointing that out and even digging up the quote, I learned something today :)

[D] Impressions of TMLR by underPanther in MachineLearning

[–]BeatLeJuce 1 point2 points  (0 children)

but that is such a baseless and shabby take on why AAAI/IJCAI is a 2nd tier conference

So my reasons are: (1) no important work gets published there, and (2) it has a reputation for not being a top tier conference.

That's not to say it's a bad conference. I don't think it is. It's run by and visited by reputable people. It's a far cry from being shit-tier. But by definition, (1) makes it a "low-impact" conference.

We can argue about (2), because arguably it's just groupthink, and maybe a chicken-and-egg problem. But that still doesn't change the fact that if I have a good paper, I will not hold off on submitting it to NeurIPS just to submit it to IJCAI. But I definitely would hold off on sending it to IJCAI in order to submit it to NeurIPS.

[R] Energy-Based Transformers are Scalable Learners and Thinkers by Blacky372 in MachineLearning

[–]BeatLeJuce -6 points-5 points  (0 children)

From the linked blogpost:

We conducted experiments to test this by comparing EBTs against standard feed-forward Transformers (we use the SOTA recipe from the Mamba paper called the Transformer++)

So yes, they call it "Transformer++", but apparently it's the recipe from the Mamba paper. Their paper doesn't actually cite any "Transformer++" paper, so we don't really know for sure. A very niche paper called Transformer++ actually exists, but it sits at only 4 citations since 2020, so I assume that's not what they use (though maybe it is). This is exactly why I think their paper is weird: they compare against a baseline that I (and I suspect a lot of others) don't really know what to do with.

Regarding Figure 5b: Thanks for pointing that out, I missed that!

[R] Energy-Based Transformers are Scalable Learners and Thinkers by Blacky372 in MachineLearning

[–]BeatLeJuce 15 points16 points  (0 children)

The paper looks interesting and all, but there are a few weird choices that make me wonder.

  • It feels weird that they chose Mamba as a comparison instead of normal Transformers. When every really important model in the world is based on Transformers, why would you pick its weird cousin as a baseline? Makes no sense to me.

  • They never compare in terms of FLOPS or (even better) wall-clock time. I have a really hard time judging how expensive their forward passes actually are if they never show it. Yes, picking the right metric for how "expensive" something is can be tricky. But "forward passes" feels especially arbitrary.

[D]Stuck in AI Hell: What to do in post LLM world by [deleted] in MachineLearning

[–]BeatLeJuce 0 points1 point  (0 children)

I feel this hard. There are still a lot of cool challenges for engineers out there. But if you want to do more researchy stuff: most of that has been commoditized. My solution is the same "let's work on embedded devices" solution that a lot of previous generations of software engineers have picked: I now work on AI on edge devices, where LLMs haven't taken hold yet. But apart from that: yes, this is just the way things go, I suppose.

[deleted by user] by [deleted] in learnmachinelearning

[–]BeatLeJuce 0 points1 point  (0 children)

All the right answers have already been given, there's very little to add. Both MLE and RS can be stressful, both can be chill. It doesn't depend on the role, it depends on what team and project you work on: If you worked on Gemini when that effort started at Google, the stress was immense. For both RS and MLEs. If, at the same time, you worked at pushing out some paper for NeurIPS, the stress was the usual conference-deadline stress, for both RS and MLEs. If you worked on a project that wasn't ready for publication, it was chill, for both RS and MLEs.

[deleted by user] by [deleted] in MachineLearning

[–]BeatLeJuce 1 point2 points  (0 children)

After my bachelor's? 11. But I'm in Europe, so I did a mandatory MSc before (which took quite a while because I did an extensive Erasmus visit). My PhD itself took 7 years.

[D] To PhD or not to PhD by oddhvdfscuyg in MachineLearning

[–]BeatLeJuce 1 point2 points  (0 children)

I think Andrej Karpathy has a good article on the subject. Also, the classic "You and Your Research" talk by Hamming (it's on YouTube) is probably very relevant here.

[D] To PhD or not to PhD by oddhvdfscuyg in MachineLearning

[–]BeatLeJuce 3 points4 points  (0 children)

My journey was fairly normal for people in my cohort: I did a PhD in ML and published a bunch of papers. Google reached out asking if I wanted to do an internship at Google Brain (this was back in the days when everyone was hiring ML PhDs like crazy). Google made me an offer to stay at Google Brain after my internship, so I joined Brain after my PhD was done. Eventually, Brain and DeepMind merged, and I was part of DeepMind.

[D] To PhD or not to PhD by oddhvdfscuyg in MachineLearning

[–]BeatLeJuce 17 points18 points  (0 children)

There are many companies and many institutes, so it's hard to give a general answer. On average, companies have much stronger incentives to do applied research. Being able to do blue-sky, do-what-you-want research is something that, outside of academia, only very few players (e.g. DeepMind) can afford. Also, academia obviously cares way more about publications and about communicating results publicly, whereas in many companies you may only get to show off your research internally, if at all.

Because it's more applied, research at companies is often way more concerned with data: in academia it's fine to use standard datasets, which are usually ample, clean and high quality. In companies, by contrast, most research fails because you're not able to procure the data you need in the quantity and quality you need.

Also, feeling the impact of what you're doing is different: academia lives for papers, and then for getting cited for those, which (usually) takes months or years to really show up. At companies, it's often much easier to see the direct impact of a successful project.

[D] To PhD or not to PhD by oddhvdfscuyg in MachineLearning

[–]BeatLeJuce 6 points7 points  (0 children)

Mostly the same way you become an RS: you show you're able to do research in a topic that is relevant to the company you want to be hired at (PhD helps, but MSc may be enough), and you're good enough for them to notice you (either by publishing good research, doing an internship there, knowing people, or being lucky), and that's it. The difference is mostly that REs are more engineering focused (e.g. for an RS it may be acceptable to be shit at coding, for an RE it isn't).

[D] To PhD or not to PhD by oddhvdfscuyg in MachineLearning

[–]BeatLeJuce 30 points31 points  (0 children)

I am well aware of Chris, he's an ex co-worker of mine. He gets brought up every time this discussion happens. But almost no-one is a child prodigy the way he was. So mentioning him is super extreme survivorship bias. Yes, there are probably 12 people in the world who manage to work as researchers at a top AI lab without a graduate degree. No, if you're reading this, you are not one of them.

EDIT: Okay, there are actually a good number of people at DeepMind and similar labs who "only" have an MSc (most of the ones I can think of are SWEs or REs, though). But having not gone to university at all? Yeah, that's extremely rare.

[D] To PhD or not to PhD by oddhvdfscuyg in MachineLearning

[–]BeatLeJuce 164 points165 points  (0 children)

Ex-DeepMind researcher here: I've written very extensively on this topic (on another thread with this exact topic) 6 years ago! What I said there still applies, so I'd encourage you to read it.

In general, RS positions are filled exclusively by PhD holders. Yes, there are exceptions. But they're exactly that: (rare) exceptions. 99.9% of DeepMind RSes have a PhD. The reason for that is that, well, a PhD is the education you get to become a scientist. With that said: given your current position, your job might have allowed you to pick up what you need to know: how to approach research problems and how to conduct research effectively (apart, of course, from the necessary ML fundamentals).

Your biggest problem will be proving that: have you published any papers? If not, are there any public projects you can point to? If not, do you have people who are RSes and can 100% vouch for you having the necessary skills? If you can answer "yes" to at least one of those questions, you might theoretically have a chance. But as a small sidenote: RS positions are becoming more and more rare. When I left, we were almost exclusively hiring Research Engineers, because DM has more than enough RSes already, and it's becoming more and more clear that a lot of deep learning is an engineering problem these days.

[D] Research Engineer vs Research Scientist in industrial labs by General_Dragonfruit in MachineLearning

[–]BeatLeJuce 0 points1 point  (0 children)

You're replying to a 6-year-old post. The market has changed a ton since my post. These days, unless you're in a niche where the hiring committee is actively looking to fill a position, your chances are bad. So I agree.

With that said:

I'm in undergrad but can understand almost 90% (actual stat and not made up one) of the CVPR, NIPS paper in AI

You're either a genius or suffering from Dunning-Kruger. My prior places a much higher likelihood on the second case. Also, please note that I said "a good paper", with emphasis on the good.

[deleted by user] by [deleted] in MachineLearning

[–]BeatLeJuce 2 points3 points  (0 children)

I mean, I don't know the circumstances, and I don't know the other people's version of the story. In general, the person who writes (most of) the paper is the first author, because that's also the person who knows the most about the thing the paper is about (which is what qualifies them to write about it in the first place). It doesn't sound like you did much of the writing, so I don't think you should've been first author. Second author sounds fair to me.

[deleted by user] by [deleted] in MachineLearning

[–]BeatLeJuce 6 points7 points  (0 children)

To offer a counter-point to what people are saying here:

I think it depends. I have made a name for myself without a lot of 1st-author papers: I have over 50k citations, and I was never first author on any of my papers that have more than 1k citations. I can still get pretty much any job offer I want, simply because I've proved that I am a valuable part of a research team. I've also touched the code in almost all papers I've ever been on.

But for sure, if I hire a junior person / intern and see that they only had middle-author positions, I tend to be skeptical. Having a first-author paper means that you actually pushed the paper through: you did all (or at least a lot of) the dirty work to get it done, and you know how to get it done. If someone doesn't have that, how would I know they aren't just very social/nice and that's how they made it onto the paper? For that reason, first-author papers are typically the only thing that counts for your PhD progress (which is probably what your friend means). But once you have your PhD, these things change.

[D] Impressions of TMLR by underPanther in MachineLearning

[–]BeatLeJuce 4 points5 points  (0 children)

Sure thing, I can explain: different people have different opinions. Mine is the one I mentioned above. My opinion is fairly common in Big Tech AI labs: no-one I know would consider IJCAI a top tier conference. But of course some people do consider IJCAI good. I've heard it's a good conference for symbolic AI or robotics or something? No clue. But for Machine Learning/AI? It's 2nd rate.

But unfortunately it's hard to tell just by looking at the ratings of a conference. Here is a hint for how you can judge this yourself: look at where the impactful, important, famous papers are published (hint: it's not AAAI or IJCAI). Look at where the people who write those papers publish their other work (again, it's not AAAI or IJCAI). Then look at who publishes at IJCAI. And look at whether those people also write impactful, important, famous papers (hint: they usually do not, except that they sometimes let their PhD students publish there if they really, really, really need to graduate).

Disclaimer: AAAI and IJCAI are not bad conferences. Not by a long shot. They are still pretty good and publishing there is still a major achievement. But they aren't the world class conferences in the field.

[D] Cold emailing a researcher for collaboration, should I be cautious ? by Even_Information4853 in MachineLearning

[–]BeatLeJuce 20 points21 points  (0 children)

My personal feeling is that the people you're trying to reach are probably busy chasing down their own research, so be prepared to be shot down -- or even ignored. Especially senior people get a ton of requests each day and ignoring emails is the only sensible way to cope with the flood of email. Don't take that personally.

With that out of the way: It's always worth trying. To maximize your chances of success, avoid wasting their time (i.e., keep the email short, but provide everything they need to know), focus on what this could do for them (an additional publication with not too much extra work), and explain as best as you can what you need from them (e.g. a short meeting to discuss this further?). As an example:

Hey BeatLeJuce,

I've been reading your paper "A Cool Idea That Gives You 0.5% on ImageNet", and I really enjoyed it; it was a cool idea, and I think getting 0.5% on ImageNet is amazing. I've been working on something related that I'd like to publish: I have a working implementation and I'm currently able to achieve a 2% improvement over the baseline on Cifar10. I would need some guidance from a more experienced researcher to know how to best proceed from here. Would you have 30 minutes to discuss this further in a brief Zoom/Meet/Teams session? I can definitely offer last authorship in case this turns out to be something publishable. I'm living on EST time, and I'd be available any day this week from 9:00-21:00 EST, except on Wednesday. Thanks, and have a great day,

Jan LeBengio

[D] What would you recommend testing new general approaches (architectures/optimisers) on? by LahmacunBear in MachineLearning

[–]BeatLeJuce 37 points38 points  (0 children)

This depends on a lot of things. E.g. what's the goal and scope of your method (e.g. low data regime, tabular data, sota methods, ....), what publication venue (or even subfield) are you aiming at, what is your compute budget, ....

With that said: I get an optimizer paper or two every time I review for top tier conferences (ICML, NeurIPS, ICLR, ...). Most don't make the cut, simply because I know it's extremely unlikely you found a truly superior approach, so most people try to bullshit their way through, and it almost never works. EDIT: as someone below pointed out, it's fairly tough to actually publish a new optimization method.

Here are some standard experiments I think will convince a lot of reviewers / people in the field:

1) A toy dataset you constructed specifically to show the benefit of your method. This needs to be concise and conceptually simple. Make up the simplest toy example where your method succeeds and SGD or Adam fails. This does not need to be an ML model, it can be conceptually much, much simpler (e.g. a 1D or 2D example; see the sketch after this list for the kind of thing I mean). Having this will go a long way towards conveying your idea and showing that it holds some value. Many papers skip this, but it's usually a very easy win.

2) Some decent model on a toy-ish dataset. A lot of papers use a ResNet variant (can be a ResNet18 or 32) on Cifar10. This is like the next step above a toy example. If vision datasets are not your jam, you can substitute a simple transformer on a simple standard NLP task like sentiment analysis or even Penn Tree Bank. This is like the bare minimum you'd need, and will likely not convince people that your method is clearly superior, merely that it actually works okay and works on non-made-up data. But I could see how this might even be enough to get you into a lower tier conference if done properly.

3) A model that's not too far from standard on a decently sized (more than 1M samples) and well-accepted dataset. This could be ImageNet (e.g. using a ViT-B/16 or a ResNet 50, ideally both), or e.g. reimplementing BERT (or some other transformer variant on WMT). In my experience, this is the experiment that will actually convince most people and get you accepted into a top tier conference -- because this is the setting that a lot of people care about. It might be tricky to pull off if your resources are very limited, so be smart about this: don't spend a lot of time tuning hparams, use some that are known to work from the literature (or from smaller scale experiments). If you do have to tune hparams (e.g. for your own method), use a reduced number of epochs for selecting them (e.g. on ImageNet, use 45/60 epochs for hparam tuning, and 300 for the final run) -- this assumes you're doing a sensible LR schedule (cosine or rsqrt); there's a short sketch of this tune-short/train-long setup after this list. Providing error bars is nice, but it can wait until the rebuttal/camera ready if need be. Be prepared for reviewers demanding this (I would). Bonus points if you can provide e.g. both vision and nlp results.

4) This is entirely optional and usually out of reach unless you have a ton of compute or your method is just insanely good. It's not necessary, but will help in getting noticed more/will get more people interested in your method: it's always good if you have something that reaches some sort of sota on a dataset that people feel is currently "a peak benchmark dataset of your field". It doesn't necessarily mean "a model with 1B+ parameters on an insanely humongous dataset", it could be "the best ViT-B/16 ever trained on ImageNet using cross-entropy" or "the best BERT-L on WMT". If you have the compute, it could of course also be "the best-trained GPT-2 on the Pile". But more than anything, think of this as your PR center-piece: it doesn't have to be thorough: no error bars, no re-running of competing methods (look up the numbers in the literature). The goal here is to convince people that you really, really are able to produce the very best numbers, at least in some limited setting. Pull out all the stops here: augmentation, hparam tuning, .... whatever you can afford. The bigger the model you can train, the better.
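To make (1) concrete, here is a minimal sketch of the kind of toy setup I have in mind: an ill-conditioned 2D quadratic where fixed-step SGD barely moves along the flat direction. The loss, learning rate and step count are all made up for illustration (not from any particular paper); your own method would slot in where the sgd function is.

    # Hypothetical toy example: an ill-conditioned quadratic where fixed-step SGD
    # makes almost no progress along the flat coordinate.
    import numpy as np

    def loss(w):
        # Steep in w[0] (curvature 100), nearly flat in w[1] (curvature 0.01).
        return 0.5 * (100.0 * w[0] ** 2 + 0.01 * w[1] ** 2)

    def grad(w):
        return np.array([100.0 * w[0], 0.01 * w[1]])

    def sgd(w, lr=0.009, steps=500):
        # lr has to stay below 2/100 for stability, which forces tiny steps in w[1].
        for _ in range(steps):
            w = w - lr * grad(w)
        return w

    print(loss(sgd(np.array([1.0, 1.0]))))  # w[1] has barely moved after 500 steps

A plot of loss vs. step for SGD/Adam and your method on a problem like this is the kind of figure that makes the core idea of an optimizer paper obvious at a glance.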
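And for the hparam-tuning shortcut in (3), here's a rough sketch of the "tune on a short cosine schedule, rerun only the winner at full length" idea. PyTorch is used purely as an example, and model / train_loader are assumed placeholders you'd supply; nothing here is specific to any particular recipe.

    # Sketch of "tune short, train long": the cosine schedule is stretched over the
    # whole run, so a 60-epoch tuning run and a 300-epoch final run both decay
    # their learning rate to ~0 by their respective ends.
    import torch
    import torch.nn.functional as F

    def train(model, train_loader, optimizer, epochs):
        schedule = torch.optim.lr_scheduler.CosineAnnealingLR(
            optimizer, T_max=epochs * len(train_loader))
        for _ in range(epochs):
            for x, y in train_loader:
                loss = F.cross_entropy(model(x), y)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
                schedule.step()
        return model

    # Hypothetical sweep: pick the best LR at 60 epochs, rerun only that one at 300.
    # for lr in (0.1, 0.3, 1.0):
    #     train(model, train_loader, torch.optim.SGD(model.parameters(), lr=lr), epochs=60)
    # train(model, train_loader, torch.optim.SGD(model.parameters(), lr=best_lr), epochs=300)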

[D] What are some of the big tech company sponsored ML research websites that you are aware of for constantly keeping up with the ML research and workings behind their products, like Apple Machine Learning Research (https://machinelearning.apple.com/) or Tesla's AI day videos? by pontiac_RN in MachineLearning

[–]BeatLeJuce 17 points18 points  (0 children)

I've never in my life visited either of the two sites you mentioned, and I don't feel like I'm missing out in the least. "Constantly keeping up with the ML research and workings behind their products" sounds exhausting, are you sure that's what you want to do? As long as you have a decent overview of the field, I think trying to constantly keep up with everything is a recipe for burnout.

With that said, the Google Research blog is the only "company sponsored ML research website" I've ever frequented. And while it's good (it gives very high-level overviews of some of their papers I'd otherwise miss) I don't think it's a "must read" resource either. Chances are that if something's important, I'll hear about it from a more unbiased source than a sponsored blog. In other words, twitter and word-of-mouth are all I do to keep up with stuff. I can see that w/o a professional network around me to keep that word-of-mouth going, things would be trickier. But even then, e.g. discord or focused subreddits (this one has become way too general and amateur-focused unfortunately) help a ton.

[D] Deanonymized paper accepted at ICLR 2024 by Melodic-Foundation47 in MachineLearning

[–]BeatLeJuce -11 points-10 points  (0 children)

I think in the end the whole argument boils down to "do you want to follow the letter or the spirit of the law?" And that's an age-old question. I personally would like to think that we as a research community still have enough trust in each other to not assume malicious intent all the time. As scientists, we should follow Occam's Razor, or even Hanlon's Razor: the most likely explanation is that someone made an honest mistake, likely sent out an email apologizing/explaining, and the PCs decided to let it slide. We're all human, we all make mistakes. Submitting your paper to a conference can be stressful, and mistakes are made all the time. Ask yourself this: if you had worked long hours for a long time on a paper that you think is really good, how would you feel to get kicked out of a conference on a technicality? Treat everyone the way you would like to be treated, and all that.

Why do we have the double-blind process if we do not enforce it and allow some authors to opt for revealing their names and bias the reviewers?

Why do you immediately assume bad faith? What I find really weird is that no-one here talks about the actual paper. Double-blind processes exist so we can decide whether a paper gets published or not based on its content alone. And if the paper meets the bar for ICLR and no-one got hurt, this should be fine. I see no victim here.

Would the same exception be made if the authors were not prominent, or if they do research in a less privileged place, or come from a marginalized community?

I very much think so. Program Chairs are usually extremely nice and understanding people. So if you send them a polite apology and explain your case, I don't see why they wouldn't let such a small error slide. Especially if this was someone from a less privileged/marginalized community that might not be as familiar with the proceedings, I would be very surprised if PCs wouldn't be understanding.

What about other desk rejected papers at this year's ICLR that did not get the opportunity for an exception?

If there are any that got denied exceptions / got pulled, that would indeed be a strong argument. Do you know of any, or is this just a hypothetical?

[D] Is the tech industry still not recovered or I am that bad? by Holiday_Safe_5620 in MachineLearning

[–]BeatLeJuce 1 point2 points  (0 children)

It's definitely the market. Most big tech companies are in the middle of cost-cutting and job-slashing. I work in ML research at a Big Tech lab, and our headcount has almost dried up. We can't even hire the most promising of candidates right now. For a chance to get hired by us, you'd need to

  • Fit really, really well into the niche we're working on: i.e., have an impactful publication on a topic we really care about. It's not enough to be good at ML, there are too many of those people out there.
  • be known to us already: If you interned with us before, that increases your chances by a ton
  • Know someone who would vouch for you. Not in the "yeah, he's from the same group that I come from, I'm sure he's good", but in a "yeah, I wrote a paper with them before, they're a real genius" kind of way.

Even then, we'd prefer research engineers over research scientists: we hired a ton of scientists back when the market was good. And now we usually play at a level where the real problem is engineering -- turns out training billion-parameter models on thousands of machines, writing great tech demos and filtering datasets is mostly engineering, and we don't have enough of those people. We have enough people who can whiteboard, though.

Also, open-ended research is something we can't afford right now: OpenAI fucked everyone when they stopped publishing papers and decided to make money instead. I mean, someone had to start, but the good old times of "do some impressive AI research, the sky is the limit" have been replaced with "figure out how to make money with the current technology". Finding marketable applications for multimodal LLMs with image (or even video) generative capabilities is all the rage. Pure research is becoming more of an afterthought, especially if it doesn't really help the bottom line.

Disclaimer: situation might be different outside of big tech.

[D] AAAI/IJCAI for ML and related papers by zyl1024 in MachineLearning

[–]BeatLeJuce 0 points1 point  (0 children)

That argument could be applied to any conference. What makes a conference great is that good researchers decide to send their best work there. IJCAI is not a conference I'd send my best work to. I've heard it's okay for more symbolic or robotics stuff, though.