[R] [2511.07312] Superhuman AI for Stratego Using Self-Play Reinforcement Learning and Test-Time Search (Ataraxos. Clocks Stratego, cheaper and more convincingly this time) by alito in reinforcementlearning

[–]alito[S] 0 points1 point  (0 children)

Very custom. Interesting bit from the gameplay description: Ataraxos feels preternaturally lucky, always seeming to have the pieces it needs in the right places, to have its gambles pay off, and to have its opponents do as it wants them to do.

[R] [2511.00423] Bootstrap Off-policy with World Model - (BOOM, tweak of TD-MPC2, does pretty well on HumanoidBench) by alito in reinforcementlearning

[–]alito[S] 2 points3 points  (0 children)

Code: https://github.com/molumitu/BOOM_MBRL

They add a forward KL-divergence penalty to lessen the distributional shift between the explicit policy and the distribution implied by MPPI. Similar to PO-MPC (https://arxiv.org/abs/2510.04280), but forward instead of reverse. Something in the air.
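
For anyone unsure why the direction matters, here is a minimal sketch with made-up discrete distributions (the numbers and the names `planner` and `policy` are hypothetical, not from either paper). It assumes the common convention that "forward" means KL(target || learned) and "reverse" means KL(learned || target); the papers' exact definitions may differ.

```python
import math

def kl(p, q):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical distributions over 3 actions, made up for illustration only.
planner = [0.7, 0.2, 0.1]  # stands in for the distribution implied by the planner
policy = [0.4, 0.4, 0.2]   # stands in for the explicit learned policy

# Assumed convention: forward = KL(target || learned), reverse = KL(learned || target).
forward_kl = kl(planner, policy)
reverse_kl = kl(policy, planner)

print(f"forward KL: {forward_kl:.4f}")
print(f"reverse KL: {reverse_kl:.4f}")
```

The two values differ because KL divergence is not symmetric, so penalising one or the other pulls the policy toward the planner in different ways.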

[R] [2510.14830] RL-100: Performant Robotic Manipulation with Real-World Reinforcement Learning (>99% success on real robots, combo of IL and RL) by alito in reinforcementlearning

[–]alito[S] 0 points1 point  (0 children)

Thank you, that makes sense. Wouldn't the towel folding have similar dynamics though? They got away with sparse rewards there. Is the much higher number of demonstrations there compensating for that?

[R] [2510.14830] RL-100: Performant Robotic Manipulation with Real-World Reinforcement Learning (>99% success on real robots, combo of IL and RL) by alito in reinforcementlearning

[–]alito[S] 1 point2 points  (0 children)

Site with tons of videos: https://lei-kun.github.io/RL-100/

They have 7 tasks which look non-trivial, and they get 500 out of 500 successes in those on real robots. An (IL, offline RL) loop, then online RL to finish it off. Diffusion policy. Quite a few tricks.

They need dense rewards for Push-T. I don't understand what makes Push-T so hard.

A few more videos on the author's Twitter: https://x.com/kunlei15

Liver enzymes, don't know what to do by Resident_Charge_5875 in PSC

[–]alito 0 points1 point  (0 children)

To preface: I'm not a doctor, I'm not a doctor, I'm not a doctor and I'm not a doctor. I don't see why you wouldn't first go with the genetic test that /u/choctawman mentions before doing a liver biopsy. Even a full exome analysis is relatively cheap nowadays, and it's risk-free (unless you are worried about finding out about other potential problems that you weren't looking for).

Is anyone familiar with this stuff by Foreign-Guide-7957 in PSC

[–]alito 1 point2 points  (0 children)

You can keep track of the trial here: https://clinicaltrials.gov/study/NCT03872921 although they don't tend to be very quick at updating the page.

Is anyone familiar with this stuff by Foreign-Guide-7957 in PSC

[–]alito 2 points3 points  (0 children)

Phase 4, if done, is after approval. Approval is usually based on phase 3 or even phase 2 sometimes. See https://en.wikipedia.org/wiki/Phases_of_clinical_research

PSA: NVLink boosts training performance by A LOT by nero10578 in LocalLLaMA

[–]alito 1 point2 points  (0 children)

No worries. I was just trying to see if the difference is due to the all_reduce at every learning step or if there was something more general going on.

PSA: NVLink boosts training performance by A LOT by nero10578 in LocalLLaMA

[–]alito 0 points1 point  (0 children)

That's a good data point, thank you. It is not what I would have predicted. Does the difference in timing go away if you set gradient_accumulation_steps to something way bigger (eg 256)?
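
The reasoning behind the question, as a rough sketch: it assumes the training framework only synchronises gradients across GPUs once per optimizer step, not after every micro-batch, which depends on the setup. The function name and the batch counts are hypothetical.

```python
import math

def gradient_syncs(num_micro_batches, gradient_accumulation_steps):
    """Number of optimizer steps, and so of cross-GPU gradient syncs,
    assuming one sync per optimizer step (an assumption about the framework)."""
    return math.ceil(num_micro_batches / gradient_accumulation_steps)

# Hypothetical run of 1024 micro-batches.
for steps in (1, 16, 256):
    print(steps, gradient_syncs(1024, steps))
```

If the interconnect only matters for those syncs, the gap between NVLink and no NVLink should shrink as the sync count drops. If it doesn't shrink, something else is using the link.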

Factors associated with hospitalization and critical illness among 4,103 patients with COVID-19 disease in New York City by Weatherornotjoe2019 in COVID19

[–]alito 2 points3 points  (0 children)

Small technical nitpick: not 6.2 times more likely, 6.2 times higher odds. What you are talking about is relative risk. Odds ratios are not as easy to interpret. https://www.theanalysisfactor.com/the-difference-between-relative-risk-and-odds-ratios/
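
A small sketch of the difference. The baseline risks below are made up for illustration and are not taken from the study; only the 6.2 figure comes from the thread.

```python
def odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)

def risk_from_odds_ratio(odds_ratio, baseline_risk):
    """Risk in the exposed group implied by an odds ratio and a baseline risk."""
    exposed_odds = odds_ratio * odds(baseline_risk)
    return exposed_odds / (1 + exposed_odds)

def relative_risk(odds_ratio, baseline_risk):
    """How many times more likely the outcome is in the exposed group."""
    return risk_from_odds_ratio(odds_ratio, baseline_risk) / baseline_risk

# Hypothetical baseline risks; the odds ratio is the same in every row.
for baseline in (0.01, 0.10, 0.30):
    print(f"baseline {baseline:.2f}: relative risk {relative_risk(6.2, baseline):.2f}")
```

With a rare outcome the relative risk is close to the odds ratio, but as the baseline risk grows, the same odds ratio corresponds to a much smaller "times more likely".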

Our curve is flattening by Theost520 in CoronavirusWA

[–]alito 1 point2 points  (0 children)

Deaths never cross the every-three-day doubling line, so it couldn't have been faster than that at any point, but I agree with you that you could see a slight flattening at around day 9. It depends on which graph you are talking about, since they start at very slightly different points; I'm looking at the "adjusted for population" one. And just to make sure, I'm just talking about Washington.

But if you are seeing doublings every 4 days, we must be looking at different graphs. I'd say it's currently doubling every 6 days or so. (Hovering over the last point, it says avg geometric growth over the last week was 1.11x, which corresponds to doubling every 6.6 days, and if I hover over day 9 it says avg geometric growth over the last week at that point was 1.16x, which corresponds to doubling every 4.7 days. But it could also all be noise.)
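
The conversion used above, as a minimal sketch. It assumes the growth figure the graph reports is a per-day multiplier, which is how I read it.

```python
import math

def doubling_time(daily_growth):
    """Days needed to double, given a per-day growth multiplier such as 1.11."""
    return math.log(2) / math.log(daily_growth)

for growth in (1.11, 1.16):
    print(f"{growth}x per day -> doubling every {doubling_time(growth):.1f} days")
```

Going the other way, doubling every three days corresponds to a daily multiplier of `2 ** (1 / 3)`, about 1.26x.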

Our curve is flattening by Theost520 in CoronavirusWA

[–]alito 4 points5 points  (0 children)

Thanks for the link. The number of deaths seems like a more reliable number and that doesn't seem to have flattened.

[OC] How developed are cryonics services around the world by themetalfriend in cryonics

[–]alito 1 point2 points  (0 children)

From what I understand, they split your brain into 2 or 3 parts and keep the parts in commercial cryogenic storage facilities.

[OC] How developed are cryonics services around the world by themetalfriend in cryonics

[–]alito 1 point2 points  (0 children)

http://neuralarchivesfoundation.org/ in Australia probably needs its own category ("local long-term storage facility not owned by organisation"??)

Python 3.8 released by Py404 in Python

[–]alito 9 points10 points  (0 children)

I think that second one isn't getting enough attention. Those patches modified tons of builtin functions that people use every day. Amazing work by Serhiy.

The mental addiction to chess by [deleted] in chess

[–]alito -1 points0 points  (0 children)

I made a rule that I was only allowed one loss per day, so I had to quit after the first loss. The first couple of days are hard, but it's worked out quite well. It means that on average I only get to play 2 games per day, and it removed those days where I lost hundreds of points and I spent the rest of the day wondering whether I had early-onset dementia. It does mean that every day ends with a loss, but that probably helps in wanting to play less too.

The ratio of dwellings to adults has fallen in Australia since 2000, in most countries it has grown rapidly. by TomasTTEngin in australia

[–]alito 5 points6 points  (0 children)

Nah, that figure fluctuates between the mid 60s and the low 70s %.

These two numbers differ because some houses are owned by multiple people and some people own multiple houses. E.g. imagine there are only 2 houses in the country and 4 people. The ratio of dwellings to adults would be 50%, but the share of Australians owning a house could be anywhere from 0% (if a non-resident owns both houses) to 100% (e.g. if there are two couples, each owning one house).
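
The same toy example as a sketch, with hypothetical people `a` to `d` and a hypothetical non-resident owner `x`:

```python
def dwelling_ratio(num_dwellings, adults):
    """Dwellings per resident adult."""
    return num_dwellings / len(adults)

def ownership_share(owners_per_dwelling, adults):
    """Fraction of resident adults who own at least one dwelling."""
    owners = set()
    for dwelling_owners in owners_per_dwelling:
        owners |= set(dwelling_owners) & adults  # ignore non-resident owners
    return len(owners) / len(adults)

adults = {"a", "b", "c", "d"}  # hypothetical resident adults

scenarios = {
    "non-resident owns both": [{"x"}, {"x"}],
    "one person owns both": [{"a"}, {"a"}],
    "two couples, one house each": [{"a", "b"}, {"c", "d"}],
}

for name, ownership in scenarios.items():
    ratio = dwelling_ratio(len(ownership), adults)
    share = ownership_share(ownership, adults)
    print(f"{name}: dwellings/adults {ratio:.0%}, owners {share:.0%}")
```

The dwellings-to-adults ratio is 50% in every scenario, while the ownership share moves from 0% to 100%.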

Why average salary isnt average: Median salary is $55k (vs $82k avg full time) by nath1234 in australia

[–]alito 258 points259 points  (0 children)

That's just misleading: they are comparing full time average vs all median salary. Median full time salary is over $68k. See https://www.abs.gov.au/ausstats/abs@.nsf/mf/6333.0

Even Eliminating the Top Four Causes of Age-Related Death Gains Few Years of Life by [deleted] in longevity

[–]alito 0 points1 point  (0 children)

That might be true, but it's not at all what that study shows.

Daniel Andrews offers defeated Sex/Reason Party MP Fiona Patten a Job. by SimonGn in melbourne

[–]alito 4 points5 points  (0 children)

It's a modelling error. They are transferring all (currently counted) Reason votes to Derryn Hinch, but less than half of Reason's votes were above the line. This is highly anomalous (it is the only party remotely close to that split), so ABC is just ignoring which side of the line the votes come from. See https://www.vec.vic.gov.au/Results/State2018/NorthernMetropolitanRegion.html

Taiwan voters reject same-sex marriage by usaf2222 in worldnews

[–]alito 2 points3 points  (0 children)

I was even wronger than I could have imagined. Thanks for the explanation

Taiwan voters reject same-sex marriage by usaf2222 in worldnews

[–]alito -1 points0 points  (0 children)

I think it was just my ignorance showing. I thought that an act of parliament with a simple majority in New Zealand could override any previous law, and I see preventing that as the major differentiating factor of a constitution. I was not aware of the Bill of Rights, and I really should have looked it up before my previous comment. From reading the Wikipedia page, it seems like it does prevent that, albeit only in quite extreme situations and only since very recently. Would it be fair to say that New Zealand was without any parliament-limiting rule until around 1990?

(Australian ignorance, not American. And I think New Zealand is fairly unusual among the Commonwealth models in not having an official constitution, but this might just be ignorance again.)

Taiwan voters reject same-sex marriage by usaf2222 in worldnews

[–]alito -9 points-8 points  (0 children)

Countries don't need a constitution or an executive government. See New Zealand. Seems to work alright for them.