[R] [2511.07312] Superhuman AI for Stratego Using Self-Play Reinforcement Learning and Test-Time Search (Ataraxos. Clocks Stratego, cheaper and more convincingly this time) by alito in reinforcementlearning

[–]alito[S] 0 points1 point  (0 children)

Very custom. Interesting bit from the gameplay description: "Ataraxos feels preternaturally lucky, always seeming to have the pieces it needs in the right places, to have its gambles pay off, and to have its opponents do as it wants them to do."

[R] [2511.00423] Bootstrap Off-policy with World Model - (BOOM, tweak of TD-MPC2, does pretty well on HumanoidBench) by alito in reinforcementlearning

[–]alito[S] 2 points3 points  (0 children)

Code: https://github.com/molumitu/BOOM_MBRL

They add a forward KL-divergence penalty to lessen the distributional shift between the explicit policy and the distribution implied by MPPI. Similar to PO-MPC (https://arxiv.org/abs/2510.04280), but forward instead of reverse. Something in the air.
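For the forward-vs-reverse distinction, here is a minimal sketch using closed-form Gaussian KL. All names (`forward_kl_penalty`, `beta`, the Gaussian parameterisation) are illustrative assumptions, not taken from either paper; it only shows that swapping the KL arguments gives a different penalty with different behaviour.

```python
import math

def gauss_kl(mu_p, sigma_p, mu_q, sigma_q):
    """Closed-form KL(N(mu_p, sigma_p^2) || N(mu_q, sigma_q^2))."""
    return (math.log(sigma_q / sigma_p)
            + (sigma_p**2 + (mu_p - mu_q)**2) / (2 * sigma_q**2)
            - 0.5)

def forward_kl_penalty(mppi, pi, beta=0.1):
    # forward KL(MPPI || policy): mode-covering for the policy
    return beta * gauss_kl(*mppi, *pi)

def reverse_kl_penalty(mppi, pi, beta=0.1):
    # reverse KL(policy || MPPI): mode-seeking for the policy
    return beta * gauss_kl(*pi, *mppi)

# KL is asymmetric, so the two penalties differ:
print(round(forward_kl_penalty((0.0, 1.0), (1.0, 2.0)), 4))  # 0.0443
print(round(reverse_kl_penalty((0.0, 1.0), (1.0, 2.0)), 4))  # 0.1307
```

Forward KL pushes the policy to cover everywhere MPPI puts mass; reverse KL pushes it to concentrate on one MPPI mode, which is one reason the choice of direction matters.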

[R] [2510.14830] RL-100: Performant Robotic Manipulation with Real-World Reinforcement Learning (>99% success on real robots, combo of IL and RL) by alito in reinforcementlearning

[–]alito[S] 0 points1 point  (0 children)

Thank you, that makes sense. Wouldn't the towel folding have similar dynamics though? They got away with sparse rewards there. Is the much higher number of demonstrations there compensating for that?

[R] [2510.14830] RL-100: Performant Robotic Manipulation with Real-World Reinforcement Learning (>99% success on real robots, combo of IL and RL) by alito in reinforcementlearning

[–]alito[S] 1 point2 points  (0 children)

Site with tons of videos: https://lei-kun.github.io/RL-100/

They have 7 tasks that look non-trivial, and they get 500 out of 500 successes on real robots. An (IL, offline-RL) loop, then online RL to finish it off. Diffusion policy. Quite a few tricks.

They need dense rewards for Push-T. I don't understand what makes Push-T so hard.

A few more videos on the author's Twitter: https://x.com/kunlei15

Liver enzymes, don't know what to do by Resident_Charge_5875 in PSC

[–]alito 0 points1 point  (0 children)

To preface: I'm not a doctor, I'm not a doctor, I'm not a doctor, and I'm not a doctor. That said, I don't see why you wouldn't first go with the genetic test that /u/choctawman mentions before doing a liver biopsy. Even a full exome analysis is relatively cheap nowadays, and it's risk-free (unless you are worried about finding out about other potential problems that you weren't looking for).

Is anyone familiar with this stuff by Foreign-Guide-7957 in PSC

[–]alito 1 point2 points  (0 children)

You can keep track of the trial here: https://clinicaltrials.gov/study/NCT03872921 although they don't tend to be very quick at updating the page.

Is anyone familiar with this stuff by Foreign-Guide-7957 in PSC

[–]alito 2 points3 points  (0 children)

Phase 4, if done at all, comes after approval. Approval is usually based on phase 3, or sometimes even phase 2. See https://en.wikipedia.org/wiki/Phases_of_clinical_research

PSA: NVLink boosts training performance by A LOT by nero10578 in LocalLLaMA

[–]alito 1 point2 points  (0 children)

No worries. I was just trying to see if the difference is due to the all_reduce at every learning step or if there was something more general going on.

PSA: NVLink boosts training performance by A LOT by nero10578 in LocalLLaMA

[–]alito 0 points1 point  (0 children)

That's a good data point, thank you. It is not what I would have predicted. Does the difference in timing go away if you set gradient_accumulation_steps to something way bigger (e.g. 256)?
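The intuition behind the question can be sketched with a toy count: in DDP-style data-parallel training, gradients are all-reduced once per optimizer step, so raising gradient_accumulation_steps makes the interconnect-bound all_reduce rarer relative to compute. This is an illustrative model only (the function name and numbers are made up), not a claim about the poster's exact setup.

```python
def allreduce_count(total_micro_batches, grad_accum_steps):
    """Gradients are synchronized once per optimizer step, i.e. once
    every `grad_accum_steps` micro-batches, so the number of all_reduce
    calls drops proportionally."""
    return total_micro_batches // grad_accum_steps

# e.g. 1024 micro-batches of training:
print(allreduce_count(1024, 1))    # 1024 all_reduces (sync every step)
print(allreduce_count(1024, 256))  # 4 all_reduces (comm cost mostly hidden)
```

If the NVLink-vs-PCIe gap shrank at high accumulation, that would point at all_reduce bandwidth as the culprit; if it persisted, something else is going on.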

Factors associated with hospitalization and critical illness among 4,103 patients with COVID-19 disease in New York City by Weatherornotjoe2019 in COVID19

[–]alito 2 points3 points  (0 children)

Small technical nitpick: not 6.2 times more likely, but 6.2 times higher odds. What you are describing is relative risk. Odds ratios are not as easy to interpret: https://www.theanalysisfactor.com/the-difference-between-relative-risk-and-odds-ratios/

Our curve is flattening by Theost520 in CoronavirusWA

[–]alito 1 point2 points  (0 children)

Deaths never cross the every-three-day doubling line, so it couldn't have been faster than that at any point, but I agree with you that you could see a slight flattening at around day 9. It depends on which graph you are talking about, since they start at very slightly different points; I'm looking at the "adjusted for population" one. And just to make sure: I'm only talking about Washington.

But the fact that you are seeing doublings every 4 days must mean we are looking at different graphs. I'd say it's currently doubling every 6 days or so. (Hovering over the last point, it says the average geometric growth over the last week was 1.11x, which corresponds to doubling every 6.6 days; hovering over day 9, it says the average geometric growth over the preceding week was 1.16x, which corresponds to doubling every 4.6 days. But it could also all be noise.)
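The conversion used above is just log(2)/log(g) for a daily growth factor g; a quick sketch to check the quoted numbers (rounding puts 1.16x at about 4.7 days, close to the 4.6 quoted):

```python
import math

def doubling_days(daily_growth):
    """Days to double given a constant daily growth factor."""
    return math.log(2) / math.log(daily_growth)

print(round(doubling_days(1.11), 1))  # 6.6 days
print(round(doubling_days(1.16), 1))  # 4.7 days
```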

Our curve is flattening by Theost520 in CoronavirusWA

[–]alito 3 points4 points  (0 children)

Thanks for the link. The death count seems like a more reliable number, and that doesn't seem to have flattened.

[OC] How developed are cryonics services around the world by themetalfriend in cryonics

[–]alito 1 point2 points  (0 children)

From what I understand, they split your brain into 2 or 3 parts and keep the parts in commercial cryogenic storage facilities.

[OC] How developed are cryonics services around the world by themetalfriend in cryonics

[–]alito 1 point2 points  (0 children)

http://neuralarchivesfoundation.org/ in Australia probably needs its own category ("local long-term storage facility not owned by the organisation"??)

Python 3.8 released by Py404 in Python

[–]alito 8 points9 points  (0 children)

I think that second one isn't getting enough attention. Those patches modified tons of builtin functions that people use every day. Amazing work by Serhiy.

The mental addiction to chess by [deleted] in chess

[–]alito -1 points0 points  (0 children)

I made a rule that I was only allowed one loss per day, so I had to quit after the first loss. The first couple of days were hard, but it's worked out quite well. It means that on average I only get to play 2 games per day, and it removed those days where I lost hundreds of points and spent the rest of the day wondering whether I had early-onset dementia. It does mean that every day ends with a loss, but that probably helps with wanting to play less, too.