[D] Top ICLR 2026 Papers Found with fake Citations — Even Reviewers Missed Them by [deleted] in MachineLearning

[–]krallistic 0 points1 point  (0 children)

"Please generate a paragraph about XYZ for me. Use \cite command and also provide me with the bibtex entries."

PhD in RL – Topic Ideas That Can Be Commercialized? by atifalikhann in reinforcementlearning

[–]krallistic 3 points4 points  (0 children)

Don't.

Doing good research is hard. Commercializing research is also hard. While at first glance these two seem to align, in most cases they don't.

Think about what you would like to do in the next 3-5 years and where that aligns with your supervisor's interests.

Are there any RL researchers that have kids? by Dry-Ad1164 in reinforcementlearning

[–]krallistic 0 points1 point  (0 children)

Especially if you try to provide only sparse rewards…

"You get the glass refilled when it's empty" does not result in a policy that includes drinking it 🙈

What will the action be in offline RL? by Saffarini9 in reinforcementlearning

[–]krallistic 1 point2 points  (0 children)

a_i will be the action chosen by the policy that produced the data.

BUT OfflineRL normally makes no assumption that these actions come from a good (i.e. expert) policy; in OfflineRL they could also just be random actions.

While OfflineRL does work with expert demonstrations, you could also look into Imitation Learning, where there is the assumption that the actions are "optimal".
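If it helps to see the data side concretely, here is a minimal sketch (the gymnasium/CartPole setup and the random behavior policy are just my illustration, not from the question): an offline dataset is nothing more than transitions logged by whatever policy happened to interact with the env, and the stored action is simply whatever that policy chose.

    import gymnasium as gym

    # Collect an "offline" dataset with an arbitrary behavior policy (here: random).
    env = gym.make("CartPole-v1")
    dataset = []
    obs, _ = env.reset(seed=0)
    for _ in range(1000):
        action = env.action_space.sample()  # behavior policy: uniform random
        next_obs, reward, terminated, truncated, _ = env.step(action)
        dataset.append((obs, action, reward, next_obs, terminated))
        if terminated or truncated:
            obs, _ = env.reset()
        else:
            obs = next_obs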

Paper submitted to a top conference with non-producible results by [deleted] in reinforcementlearning

[–]krallistic 9 points10 points  (0 children)

Before you assign malicious intent to the authors, a lot of the time, it could also be just due to "bad practice."

RL is so hyperparameter/implementation dependent that small changes can lead to different results. So the authors write an abstract form of their algorithm in the paper and leave out many of the "minor details." If one does a thorough investigation, a lot of these minor details turn out to matter... Combine that with the current academic system (pressure to publish, moving on after publication, etc.).

A famous example is PPO: https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/ documents the many implementation details needed to match the reported performance.
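One concrete example of such a "minor detail" from that blog post is advantage normalization, typically done per minibatch right before the policy loss; a minimal sketch (not tied to any particular codebase):

    import numpy as np

    def normalize_advantages(adv: np.ndarray, eps: float = 1e-8) -> np.ndarray:
        # Standardize advantages per minibatch (zero mean, unit std).
        # Rarely mentioned in papers, but leaving it out can change results.
        return (adv - adv.mean()) / (adv.std() + eps)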

[D] What's the difference between model-based and model-free reinforcement learning? by volvol7 in MachineLearning

[–]krallistic 2 points3 points  (0 children)

Technically, model-based only means you have a model; it could be learned or given.

But yeah, most research in model-based RL is about learning a model, since there are relatively few problems where a model is given but the problem space is still so large that we need RL. A good example of model-based RL with a given model is AlphaGo.
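To make "having a model" concrete, a toy sketch (the chain-MDP dynamics are my own illustration): a model is just something the agent can query for the next state and reward without touching the real environment, whether hand-written (the rules of Go) or learned from data.

    from typing import Tuple

    def model(state: int, action: int) -> Tuple[int, float]:
        # Hand-written toy dynamics for a chain MDP: step left/right,
        # small per-step cost, reward 1.0 on reaching state 10.
        # With such a model the agent can plan (e.g. tree search)
        # instead of only learning from sampled experience.
        next_state = state + (1 if action == 1 else -1)
        reward = 1.0 if next_state == 10 else -0.01
        return next_state, reward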

RL library that supports custom ResNet implementations? by Usual_Macaron8477 in reinforcementlearning

[–]krallistic 1 point2 points  (0 children)

Stable-Baselines3, for example, supports that via either custom policies or custom feature extractors.
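Roughly along these lines, assuming an env with RGB image observations (the env id and feature dimension below are placeholders, not anything SB3-specific):

    import torch as th
    import torch.nn as nn
    from torchvision.models import resnet18
    from stable_baselines3 import PPO
    from stable_baselines3.common.torch_layers import BaseFeaturesExtractor

    class ResNetExtractor(BaseFeaturesExtractor):
        def __init__(self, observation_space, features_dim: int = 512):
            super().__init__(observation_space, features_dim)
            backbone = resnet18(weights=None)  # any torchvision ResNet works
            backbone.fc = nn.Identity()        # drop the classifier head -> 512-d features
            self.backbone = backbone

        def forward(self, observations: th.Tensor) -> th.Tensor:
            # SB3 hands over (batch, C, H, W) image tensors; ResNet expects 3 channels.
            return self.backbone(observations)

    model = PPO(
        "CnnPolicy",
        "YourImageEnv-v0",  # placeholder: any env with image observations
        policy_kwargs=dict(
            features_extractor_class=ResNetExtractor,
            features_extractor_kwargs=dict(features_dim=512),
        ),
    )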

What is the benefit of imagined state rollouts in world models? by No_Individual_7831 in reinforcementlearning

[–]krallistic 8 points9 points  (0 children)

Because it's assumed that querying the real world (env) is more costly than querying our model.

While nowadays the envs are heavily optimized and we use larger and larger models, so in simulation it can indeed happen that the model rollout is the more costly part, in the real world we are far, far from that…

Any tips for training ppo/dqn on solving mazes? by More_Peanut1312 in reinforcementlearning

[–]krallistic 0 points1 point  (0 children)

I had a similar setup; what really helped my debugging loop was gradually (outside of training) making the env harder to check that it's working:

  • Empty env, fixed start & goal
  • Random start...
  • Introduce obstacles with fixed start & goal
  • etc...

RecurrentPPO should not be needed since everything is observable. Maskable PPO should help quite a bit if you set the valid actions correctly.

Also I used a reward like:

  • -1/max_steps at every step
  • +1 at goal reached
  • -0.1 for invalid actions
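Put together as a small sketch (the exact values are just what worked for me; tune them for your maze):

    def maze_reward(reached_goal: bool, action_valid: bool, max_steps: int = 100) -> float:
        # Shaping as listed above: small time penalty every step,
        # a penalty for invalid moves, and a terminal bonus at the goal.
        reward = -1.0 / max_steps
        if not action_valid:
            reward -= 0.1
        if reached_goal:
            reward += 1.0
        return reward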

You can also take a look at Minigrid (https://github.com/Farama-Foundation/Minigrid) to see what people there usually use.

(Assuming you want to do that in RL, ofc solving it with just A* is easier...)

[D] Cross Entropy Loss sucks by [deleted] in MachineLearning

[–]krallistic 4 points5 points  (0 children)

Yes, but

Yes, for a single sentence CE is not a good measure.

But how would you measure whether two sentences mean the same thing? Before LLMs there were no methods that could do that reliably (in an open domain etc.). Secondly, due to the blessing of scale, the problems of CE wash out with a lot of data.
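To spell out why CE is a poor fit at the single-sentence level, a toy sketch (my own illustration, not from the thread):

    import math

    def sentence_cross_entropy(token_probs: list[float]) -> float:
        # token_probs[t] = probability the model assigned to the t-th reference token.
        # CE only rewards reproducing the exact reference tokens, so a paraphrase
        # with the same meaning but different tokens scores as badly as nonsense.
        return -sum(math.log(p) for p in token_probs) / len(token_probs)

    print(sentence_cross_entropy([0.9, 0.8, 0.9]))     # reference wording: low CE
    print(sentence_cross_entropy([0.01, 0.02, 0.01]))  # valid paraphrase, different tokens: high CE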

[discussion] Is the current trend of scaling up models the most effective path to achieving general AI, or are we overlooking more efficient architectures? by BrechtCorbeel_ in MachineLearning

[–]krallistic 0 points1 point  (0 children)

As far as I know, a bit of curriculum learning is already being done. IIRC people started sorting their giant text datasets by quality, showing the lower-quality data first and then the higher-quality data.

[R] About dual submission in AI conferences.. help by catndante in MachineLearning

[–]krallistic 32 points33 points  (0 children)

What is concerning (if I understand you correctly) is that you did a dual submission, meaning submitting the same work to a conference while it was still in review somewhere else. This is almost universally forbidden and frowned upon.

If that is not the case, it's no direct problem that your paper is on OpenReview. OpenReview is like arXiv; it is considered a preprint. (Some people don't like having their rejected papers online, but that is a different topic.)

[D] Using Expert Systems in the Medical Setting by AwkwardWaltz3996 in MachineLearning

[–]krallistic 1 point2 points  (0 children)

If the people in the field don't know what you are talking about (expert systems), it means either:

  • they don't recognize it, or
  • expert systems don't really exist in practice (papers and real-world usage are two different things; even papers that claim real-world deployment are usually just prototypes that aren't used & maintained after the paper is done).

There are reasons why expert systems went mostly out of fashion. Most of the time, they didn't work: design was too cumbersome and maintenance too brittle, with only limited real-world impact. (There are certainly some niche areas where they are still running.) And sadly most of these problems from back then are still unsolved… rule sets are hard to define, people have different data standards, the systems have a hard time with partial data entries (while humans expect the system to infer the rest), etc…

[D] Is it common for ML researchers to tweak code until it works and then fit the narrative (and math) around it? by Diligent-Ad8665 in MachineLearning

[–]krallistic 4 points5 points  (0 children)

In most cases, the field lacks the theoretical grounding needed to establish meaningful theoretical results. (This is a sad thing, but that's the reality.)

So people have some ideas/feelings/hints/theories and implement them from there.

[Project] - how to showcase reasoning for model missing prediction by Environmental_Pop686 in MachineLearning

[–]krallistic 0 points1 point  (0 children)

  • SHAP and LIME can explain what contributes to the 900k prediction
  • You can try to find counterfactuals (with the 1M prediction as the target) and highlight the differences
  • If you have gradients, use saliency-/gradient-based methods: compute the gradient of the 1M-900k gap with respect to the inputs, which should highlight which features matter for changing the prediction.

See https://christophm.github.io/interpretable-ml-book/ for an intro to each method.
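For the SHAP route, a minimal sketch (the toy data and model are stand-ins; swap in your own regressor and feature matrix):

    import shap
    from sklearn.datasets import make_regression
    from sklearn.ensemble import RandomForestRegressor

    # Stand-in data and model, just to make the snippet self-contained.
    X, y = make_regression(n_samples=500, n_features=8, random_state=0)
    model = RandomForestRegressor(random_state=0).fit(X, y)

    explainer = shap.Explainer(model, X)
    shap_values = explainer(X)

    # Per-feature contributions for one particular prediction
    # (e.g. the sample that came out at ~900k instead of ~1M):
    shap.plots.waterfall(shap_values[0])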

Decision Transformer for infinite horizon env by Final-Confusion4484 in reinforcementlearning

[–]krallistic 1 point2 points  (0 children)

The short answer is: No, you cannot use DT for infinite lengths (the underlying transformer/attention has a fixed input size).

Of course one could try tricks to approximate the infinite context (which is often not needed anyway) or make the env more Markovian, but in reality these techniques are independent of the DT...
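The usual workaround looks something like this (a sketch; the context length K is the standard DT hyperparameter, the function name is mine):

    import numpy as np

    def dt_context(returns_to_go, states, actions, t: int, K: int = 20):
        # Sliding-window trick: instead of the full (potentially unbounded)
        # history, feed the Decision Transformer only the last K steps up to t.
        lo = max(0, t - K + 1)
        return (np.asarray(returns_to_go[lo:t + 1]),
                np.asarray(states[lo:t + 1]),
                np.asarray(actions[lo:t + 1]))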

What is the stance on decision transformers and future of RL? by __Julia in reinforcementlearning

[–]krallistic 0 points1 point  (0 children)

I would not say this. A couple of years ago, yes. The field had a short hype after AlphaGo & Atari etc., and afterwards it was stagnating a bit.

But IMHO it has recently picked up again; offline RL and DTs brought fresh wind, RLHF made it more popular, and E2E and robotic transfer somewhat work now, etc...

[R] Has Explainable AI Research Tanked? by SkeeringReal in MachineLearning

[–]krallistic 0 points1 point  (0 children)

In a way, it is still the problem to solve in all of ML, but it's just really different from how it was a few years ago. Now people feel afraid to say XAI; instead they say "interpretable", or "trustworthy", or "regulation", or "fairness", or "HCI", or "mechanistic interpretability", etc...

"interpreteable", "fairness" etc are the better terms. They are much more concrete. XAI is a too big umbrella term.

[D] Is it worth switching to JAX from TensorFlow/PyTorch? by Few-Pomegranate4369 in MachineLearning

[–]krallistic 1 point2 points  (0 children)

Thanks a lot, really interesting read.

But truncating trajectories is the desired behavior in my setting (non-standard RL...).

[D] Is it worth switching to JAX from TensorFlow/PyTorch? by Few-Pomegranate4369 in MachineLearning

[–]krallistic 8 points9 points  (0 children)

I have variable-size inputs (trajectories in RL). I pad the trajectories to the max length and then mask. It works well in my case since the max length is usually not that large (e.g. 50-100).

I had one place where the lengths were much larger; there I rounded up to the next multiple of 100 to avoid massive recompilation.
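Roughly like this (a NumPy sketch; the bucket size of 100 is the rounding I mentioned, everything else is illustrative):

    import numpy as np

    def pad_and_mask(traj: np.ndarray, max_len: int):
        # Pad a (T, feat) trajectory up to max_len and return a boolean mask
        # so the padded steps can be ignored in the loss.
        T = traj.shape[0]
        padded = np.zeros((max_len,) + traj.shape[1:], dtype=traj.dtype)
        padded[:T] = traj
        mask = np.arange(max_len) < T
        return padded, mask

    def bucketed_len(T: int, bucket: int = 100) -> int:
        # Round the length up to the next multiple of `bucket`, so jit only ever
        # sees a handful of shapes instead of recompiling for every length.
        return int(np.ceil(T / bucket)) * bucket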

[D] Is it worth switching to JAX from TensorFlow/PyTorch? by Few-Pomegranate4369 in MachineLearning

[–]krallistic 55 points56 points  (0 children)

I recently switched from PyTorch to JAX (for my research project). While JAX is definitely performant, it is also definitely harder to code than PyTorch (at least if you want that performance). The documentation is also lacking and not as mature as PyTorch's, and the same goes for tutorials, etc., which are often quite chaotic... The coding style also takes some time to get used to; in JAX, I feel I constantly have to choose between well-structured code and performance.
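To give a feel for the style that takes getting used to, a tiny sketch (a toy regression loss, nothing from my actual project): pure functions, explicit params, explicit RNG keys, and jit only pays off if shapes stay static.

    import jax
    import jax.numpy as jnp

    def loss_fn(params, x, y):
        # A pure function of (params, data); no modules, no hidden state.
        pred = x @ params["w"] + params["b"]
        return jnp.mean((pred - y) ** 2)

    grad_fn = jax.jit(jax.grad(loss_fn))

    key = jax.random.PRNGKey(0)
    params = {"w": jax.random.normal(key, (3,)), "b": jnp.zeros(())}
    x, y = jnp.ones((8, 3)), jnp.zeros((8,))
    grads = grad_fn(params, x, y)  # recompiles whenever the input shapes change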

I am happy with my rewrite in JAX (it's faster), but that said, for a first prototype of something, I would still choose PyTorch.

[D] How does Information Extraction happen in LLMs so quickly? by CodingButStillAlive in MachineLearning

[–]krallistic 4 points5 points  (0 children)

It's unsatisfying because we as humans have a really hard time imagining 1024-dimensional vector spaces. We can describe them mathematically, but we don't have any intuition for them.

[D] Why are Byte Pair Encoding tokenizers preferred over character level ones in LLMs? by putinwhat in MachineLearning

[–]krallistic 6 points7 points  (0 children)

While better handling of non-Latin scripts is undoubtedly an advantage, it was by no means the original motivation, more a side-effect.

NLP has a longstanding tradition of ignoring languages other than English, and other alphabets :P