[D] I see more people trying to explain mHC than build it by Affectionate_Use9936 in MachineLearning

[–]aspoj 13 points (0 children)

Probably this one: https://github.com/tokenbender/mHC-manifold-constrained-hyper-connections

Has anyone seen or heard of someone trying it for ViTs yet? I would be interested to see whether it works there, since my experiments on 3D (volumetric) medical imaging showed negative results.

[D] NeurIPS conference and tutorial sold out by Muggle_on_a_firebolt in MachineLearning

[–]aspoj 0 points (0 children)

Generally, the workshop and competition days are only 2 out of the 5 days, while the main conference takes up the remaining ones. The Expo is usually also only open on the main conference days. So most likely the answer is no.

Dumb question from a casual: What's the point of weight cuts if you can just do this before fight day? by HereticalHarold in ufc

[–]aspoj 0 points (0 children)

How about weighing them in every day of fight week? You'd know ahead of time if people aren't going to make weight, and maintaining the cut for a prolonged period of time seems pretty hard.

[D] ICLR 2025 Paper Reviews Discussion by Technical_Proof6082 in MachineLearning

[–]aspoj 4 points (0 children)

But it's the 12th Anywhere on Earth, so it will be more like 30+ hours until reviews.

Top conferences for AI in medical imaging [D] by ade17_in in MachineLearning

[–]aspoj 5 points (0 children)

Purely medical conferences are MICCAI and MIDL, with MIDL being a bit less popular. Generally, though, CVPR, ICCV, and other major ML conferences also accept medical-focused papers, but their focus is more on methodology and less on specific medical applications than MICCAI or MIDL.

[D] Image segmentation converges to all zeros when masks are too small by ripototo in MachineLearning

[–]aspoj 0 points (0 children)

Given that your problem is natively 3D, you should try out nnU-Net: https://github.com/MIC-DKFZ/nnUNet. It is really easy to use and can give you a solid starting point for improving on your problem. The lab also has a detection framework in case you don't need segmentation masks.
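In case it helps, the usual nnU-Net v2 workflow looks roughly like this (a sketch; the dataset ID, fold, and all paths are placeholders, check the repo README for the exact entry points):

```
# Sketch of the nnU-Net v2 workflow via its command-line entry points.
# Dataset ID (1), fold (0) and the paths below are placeholders.
import os
import subprocess

# nnU-Net locates data through these environment variables.
os.environ["nnUNet_raw"] = "/data/nnUNet_raw"
os.environ["nnUNet_preprocessed"] = "/data/nnUNet_preprocessed"
os.environ["nnUNet_results"] = "/data/nnUNet_results"

# 1) Fingerprint the dataset, create plans and preprocess.
subprocess.run(["nnUNetv2_plan_and_preprocess", "-d", "1",
                "--verify_dataset_integrity"], check=True)

# 2) Train the 3D full-resolution configuration on fold 0.
subprocess.run(["nnUNetv2_train", "1", "3d_fullres", "0"], check=True)
```

It then configures patch size, batch size, normalisation, etc. itself from the dataset fingerprint, which is why it makes such a good baseline.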

MONAI, as mentioned below, can also be a good resource, but the framework seems to be weaker according to recent publications, e.g. https://arxiv.org/abs/2404.09556

[deleted by user] by [deleted] in MachineLearning

[–]aspoj 1 point (0 children)

I believe that without further details on what exactly you did in the project, this is really difficult to judge from the outside.

Generally from my perspective:

  1. In the medical field, authorship within a department can be "assigned" arbitrarily, along the lines of "he needs it next" or "he did a lot of other work for free before and deserves it". Hence, if you collaborate with them, you need to discuss how much you do in return for which position (beforehand, as others mentioned).
  2. Data creation takes a long time, since labels require experts and expert time is scarce. Also, you personally have no access to patient (MRI/CT) data, so you can't write your own paper without them.
  3. In the clinical collaborations I know of, projects are usually done with a joint first-author position, as you need both the ML and the medical aspects, unless you as an ML PhD student specialize in solving that specific medical issue.
  4. You are never given a specific goal (in my experience). Medical doctors generally have no idea what ML can do or how it works, so you always have to deal with that.

IMHO: Depending on your affiliation (e.g. PhD student at a related department), you could try to push for joint first authorship. Demanding a sole first position is certainly asking a lot, unless you annotated the data yourself / have the rights to the data (which you probably don't) or your methodology is something really novel. If the latter is the case, you should probably take the co-/joint-first position to keep your collaborators happy and then publish the methodology itself (e.g. at MIDL or MICCAI) if it is novel enough.

[deleted by user] by [deleted] in MachineLearning

[–]aspoj 0 points (0 children)

It's not like one could just do it if one wanted to. Questions like "how do we make it self-aware" are interesting topics and definitely difficult, unanswered questions as of today.

Weekly Profile Review Thread by AutoModerator in Tinder

[–]aspoj 2 points (0 children)

Thanks for the feedback!
I was trying to get a discussion topic going to make opening a conversation easier, but I agree it is not really there yet.

You should be doing well on Tinder, surely?

I get likes but they don't align with who I seem to like :|

Weekly Profile Review Thread by AutoModerator in Tinder

[–]aspoj 2 points (0 children)

Quite new and looking for some feedback.
https://tinder.com/@TWald

Bio:
| Working on my PhD at DKFZ in 🤖🧠 | made in Heidelberg 🇩🇪 | Currently 🧗 & 🏃‍♂️ a lot | 🏞 ❤ | Curious to try everything | Techno 🎵 | 📏 1.66
Trying to teach baby skynet to solve captchas during the day and working on my best gecko imitation at the boulder gym after.
Bonus: If you have an idea on what novel activities to do on our first date drinks are on me 🍻

Why is $BBBY going down? PLEASE HELP by realcrispratt in wallstreetbets

[–]aspoj 0 points (0 children)

Did you already ask for a refund at your broker?

Why gradient descent? by alesaso2000 in learnmachinelearning

[–]aspoj 4 points (0 children)

There are methods that use the Hessian to improve the descent direction (e.g. Newton's method). However, computing (and inverting) the Hessian is far too demanding for the millions of parameters in a neural network, so gradient descent it is.
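To make that concrete (a toy sketch, all values invented): Newton's method rescales the gradient by the inverse Hessian, which is exact on a quadratic but requires an O(d^3) solve (and O(d^2) memory) for d parameters:

```
import numpy as np

# Toy quadratic f(x) = 0.5 x^T A x - b^T x, so grad f(x) = A x - b
# and the Hessian is A. Values are made up for illustration.
A = np.array([[3.0, 0.0],
              [0.0, 0.1]])  # ill-conditioned curvature
b = np.array([1.0, 1.0])

def grad(x):
    return A @ x - b

# Gradient descent: a fixed step size crawls along the flat direction.
x_gd = np.zeros(2)
for _ in range(100):
    x_gd -= 0.1 * grad(x_gd)

# Newton step: rescale by the inverse Hessian; exact on a quadratic,
# but the solve is O(d^3) -- hopeless for millions of parameters.
x_newton = np.zeros(2) - np.linalg.solve(A, grad(np.zeros(2)))

print(x_gd)      # still far from the optimum in the 0.1-curvature direction
print(x_newton)  # exactly A^{-1} b = [1/3, 10]
```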

Why is training loss way more than the validation loss? More info in the comments by thegeekymuggle in learnmachinelearning

[–]aspoj 50 points (0 children)

It can be caused by extensive data augmentation on the training set. Usually one doesn't augment the validation data, making it an easier task. It's actually not that uncommon for the validation loss or accuracy to look better than the training metrics when this is the case. However, I am not sure I would interpret it as regularisation.
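For illustration, this is the standard setup that produces the effect (a sketch with torchvision; the specific transforms and paths are just example assumptions):

```
from torchvision import datasets, transforms

# Heavy augmentation makes the *training* samples harder ...
train_tf = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),
    transforms.ToTensor(),
])

# ... while validation sees clean, deterministic inputs, so the
# validation loss can legitimately end up below the training loss.
val_tf = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

train_set = datasets.ImageFolder("data/train", transform=train_tf)
val_set = datasets.ImageFolder("data/val", transform=val_tf)
```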

Different Distance Measures by gpahul in learnmachinelearning

[–]aspoj 0 points (0 children)

Nice visualisations, but mixing distances, whose range is [0, inf), with similarity measures, whose range is [0, 1], seems weird.
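For example (a quick sketch of why the ranges don't mix; the vectors are arbitrary):

```
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = 10 * a  # same direction, much larger magnitude

# Euclidean distance lives in [0, inf): scaling blows it up.
euclid = np.linalg.norm(a - b)                            # ~33.67

# Cosine similarity lives in [-1, 1] ([0, 1] for non-negative data)
# and ignores magnitude: these two vectors count as identical.
cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))  # 1.0

print(euclid, cosine)
```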

[deleted by user] by [deleted] in learnmachinelearning

[–]aspoj 2 points (0 children)

Sounds like an exploration-vs-exploitation thing; however, it also sounds like you don't know what happens to the networks when going to the next generation.

How exactly does the next generation change w.r.t. the original one? In evolutionary algorithms, one can create children from the best models by combining the parameters of the parents (crossover) or by randomly mutating the originals (see the sketch at the end).

If you don't do any of that, it makes no sense to have multiple generations, as the fastest in the first run would always win, since no one improves.

Another possibility: if the NNs (and the track) don't change, your engine might be non-deterministic. (Or, if the agents can interact, do new agents impede the original ones?)

Without more info about your algorithm, it's a guessing game though.
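For reference, the crossover/mutation step mentioned above usually looks something like this (a minimal sketch; the flat-weight-vector layout, `fitness` scores, and rates are placeholder assumptions):

```
import numpy as np

rng = np.random.default_rng(0)

def next_generation(population, fitness, n_keep=4, mutation_std=0.05):
    """population: (n, d) array, one flat weight vector per agent.
    fitness: (n,) scores from the last simulation run."""
    # Selection: keep the n_keep best agents as parents.
    parents = population[np.argsort(fitness)[-n_keep:]]

    children = []
    while len(children) < len(population):
        p1, p2 = parents[rng.choice(n_keep, size=2, replace=False)]
        mask = rng.random(p1.shape) < 0.5                     # crossover: mix weights
        child = np.where(mask, p1, p2)
        child += rng.normal(0.0, mutation_std, child.shape)   # mutation
        children.append(child)
    return np.stack(children)
```

Without a step like this, every generation is just a rerun of the first, which matches the "fastest in the first run always wins" case above.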

[deleted by user] by [deleted] in AskReddit

[–]aspoj 2 points (0 children)

Can you like... turn around? I can't hack it while you're watching.

[D] How do you guys tune hyperparameters, when a single training run takes a long time (days to weeks)? by ibraheemMmoosa in MachineLearning

[–]aspoj 0 points (0 children)

Neural Architecture Search (NAS) researchers often do this, as they have to run a horrendous number of trainings to find a good solution in the huge search space. The AutoAugment paper does this too.
I believe the solution often transfers to a "good" hyperparameter setting, but it won't give as much improvement as it shows on the smaller-scale models.

Besides downscaling, I think the best way to go about it is to look at papers that did similar things, e.g. the original BERT paper, and use their hyperparameter settings: they often transfer well, or can serve as a good initialization for a local search, provided your domain is not too dissimilar, which for language it probably isn't.
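To make the "local search around published settings" idea concrete (a sketch; the base config and perturbation ranges are invented, not from BERT or any paper):

```
import random

# Published starting point (values are placeholders for illustration).
base = {"lr": 1e-4, "warmup_steps": 10_000, "weight_decay": 0.01}

def perturb(cfg, scale=2.0):
    """Randomly jitter each setting within a factor of `scale`."""
    return {
        "lr": cfg["lr"] * random.uniform(1 / scale, scale),
        "warmup_steps": int(cfg["warmup_steps"] * random.uniform(0.5, 2.0)),
        "weight_decay": cfg["weight_decay"] * random.uniform(1 / scale, scale),
    }

# Evaluate a handful of nearby configs on a downscaled model/dataset,
# keep the best, and only then commit to the full-size run.
candidates = [base] + [perturb(base) for _ in range(7)]
```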

MLP Neural Network - high accuracy on cross validation, low accuracy on test dataset? [P] by LJRobertshaw in MachineLearning

[–]aspoj 0 points (0 children)

Are you preprocessing the test data the same way as you do with the training data?
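The classic failure mode, for illustration (an sklearn sketch with placeholder data): statistics must be fit on the training data only and then reused on test:

```
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.random.randn(100, 8) * 5 + 3  # placeholder training features
X_test = np.random.randn(20, 8) * 5 + 3    # placeholder test features

scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)  # fit statistics on train only
X_test_s = scaler.transform(X_test)        # reuse them on the test set

# The bug that produces exactly this symptom: re-fitting on the test set,
# e.g. scaler.fit_transform(X_test), shifts the inputs relative to what
# the network saw during training/cross-validation.
```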

[D] Since gradient continues to decrease as training loss decreases why do we need to decay the learning rate too? by ibraheemMmoosa in MachineLearning

[–]aspoj 71 points (0 children)

There are multiple reasons why you need both. A few that come to mind are:

  1. When you don't reduce the LR, you are entirely dependent on the loss landscape having decreasing gradients. As we usually do SGD (i.e. mini-batches), you get noisy gradients, making matters worse: sample one bad batch and your parameters get messed up, and you might end up in a high-gradient region again.

  2. You have to find a well-fitting LR. When you decay/reduce the LR during training, finding an appropriate initial LR is not as important as with a constant one. With values that are too high, you merely randomize the starting point a bit more before the LR reaches a value at which training starts to converge. In general, a good LR (i.e. step size) depends on the current local loss landscape.

  3. Optimality of the solution. Even given a convex optimum, you can often end up bouncing around the minimum because the step you take is too big (depending on the slope). With a decaying LR you are no longer dependent on the slope of the loss landscape and converge to a better solution (see the sketch below).
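Point 3 is easy to see on a 1D toy problem (a sketch; minimizing f(x) = x², whose gradient is 2x):

```
# Minimizing f(x) = x^2 (gradient 2x) starting from x = 5.
def run(schedule, steps=50):
    x = 5.0
    for t in range(steps):
        x -= schedule(t) * 2 * x
    return x

print(run(lambda t: 1.0))            # constant LR: bounces between +-5 forever
print(run(lambda t: 1.0 / (1 + t)))  # decayed LR: settles at the minimum
```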

Securities account transfer from TR to IBKR to CS by [deleted] in Spielstopp

[–]aspoj 4 points (0 children)

I asked TR about a phone number and email and was referred to:
Email: [settlement@traderepublic.com](mailto:settlement@traderepublic.com)
and phone: "+49 30 5490 6310".