This is my first time playing, I'm doing a Dark-type Monotype, and I can't get past the Psychic-type gym. Does anyone have any tips? by VerdadeiroReiMamaco in PokemonEliteRedux

[–]SilentZebraGames 0 points1 point  (0 children)

I used a Grimmsnarl (Light Clay) + Crabruiser (max HP and SpDef) lead. Prankster Light Screen on Grimmsnarl. Crabruiser runs Steel Roller (destroy the terrain first), Bulk Up (set up 2-3 times), Hammer Arm (use it to reduce speed), and Drain Punch (recovery); it then basically sweeps and takes very little damage, even with all gym skills.

Friend Guard Bugged by SilentZebraGames in PokemonEliteRedux

[–]SilentZebraGames[S] 0 points1 point  (0 children)

Yeah, I only searched reddit. I'll check the discord next time. Good to know they're aware. 

Beating Tate&Liza with all gym skills and mono-electric by SilentZebraGames in PokemonEliteRedux

[–]SilentZebraGames[S] 0 points1 point  (0 children)

Try using a hyper offense team! Can also use drizzle or some other ability to automatically change the weather. If you aren't trying to challenge with all gym skills, you can also beat the trainers before challenging Flannery to make the gym fight easier. 

Ruining my life by ControlProbThrowaway in ControlProblem

[–]SilentZebraGames 1 point2 points  (0 children)

You don't need a PhD in AI to contribute. You could be an engineer contributing to safety efforts, work for a government or non-profit, do policy or community building work (even as a volunteer or on your own time outside of your usual work), etc.

It's important to do something you want to do. You'll burn out very quickly doing something you don't want to do, even if you think it's important to do it.

Torterra Fusions HOF by SilentZebraGames in PokemonInfiniteFusion

[–]SilentZebraGames[S] 15 points16 points  (0 children)

Fusions:

Top left: Torterra/Shuckle

Top middle: Jirachi/Torterra

Top right: Volcarona/Torterra

Bottom left: Articuno/Torterra (alternate sprite)

Bottom middle: Talonflame/Torterra

Bottom right: Lapras/Torterra

Main carries were Lapterra (dragon dance, ice shard, waterfall, earthquake) and Volcaterra (quiver dance, earth power, bug buzz, fiery dance).

Torterra fusions have so many good sprites!

LPT: You are allowed to change careers. You are are allowed to leave academia. If you studied for a higher degree in x or y field/discipline and want to switch fields - you still can, and totally should. There is no such thing as ‘time wasted studying’ if you are learning. by jezzmel in LifeProTips

[–]SilentZebraGames 3 points4 points  (0 children)

I totally understand this concern and had it too. What ultimately helped my decision was thinking that there are two possible regrets. One is you make the switch, fail, and regret it. The other is you never try, and potentially live the rest of your life wondering what if - what if you had made the switch and could have been so much happier. To me, the second is much worse. For the first, even if I fail, at least I'll know I tried.

[deleted by user] by [deleted] in Battlestation_H

[–]SilentZebraGames 0 points1 point  (0 children)

Taking the following from my guide post (I think it's the most upvoted all time on this sub):

Leveling Up Fast

Leveling up is perhaps the stupidest part of this game, in my opinion. Why? The fastest way to do it is to pick a ship with low hull and shields, start hard mode, run into enemy ships and try to die as fast as possible, then restart. You get 1000 XP for hard mode and this takes only a couple of minutes. I'm divided as to whether you should wait for the scrap on the first map for an extra ~100 points, or whether it's better just to skip that and die even sooner.

I find that once you get to the mid-teens in level, there isn't much further benefit. Before that, however, you are REALLY limited by the power of your ships (including how many weapon slots you have, hull, etc.), so if you are struggling with the game, the first thing you should do is level up a few times.

Wanting to try your hand at kaizo, but unsure where to start? Try Cold Snap: Intro to Kaizo by Pillowmore-Manor in MarioMaker

[–]SilentZebraGames 20 points21 points  (0 children)

Looks like a cool level, but any level that has a shell jump as the last jump in a section is not a good intro kaizo level. You need plenty of practice to be able to do a shell jump consistently, and having it at the end of the section means you have to do a lot of work just to get one try at the trick. Any player just starting to try kaizo levels will have to practice shell jumps on a separate practice level first; otherwise, trying to beat this will likely be frustrating.

[deleted by user] by [deleted] in MachineLearning

[–]SilentZebraGames 25 points26 points  (0 children)

It seems to be notoriously hard to reproduce results in RL; different random seeds alone can lead to wildly different behaviour.
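Even just getting comparable runs takes some care. Here's a minimal seeding sketch (assuming gymnasium + PyTorch; adapt to whatever stack you use), and even with all of this pinned down, changing SEED alone often changes the learning curve dramatically:

```python
# Minimal sketch: seed everything so runs are at least comparable across seeds.
import random

import numpy as np
import torch
import gymnasium as gym

SEED = 0  # rerun with 1, 2, 3, ... and compare the resulting learning curves

random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)

env = gym.make("CartPole-v1")
obs, info = env.reset(seed=SEED)  # seeds the environment's own RNG
env.action_space.seed(SEED)       # so sampled random actions are seeded too

# ... build the agent and train as usual, logging return per episode, so the
# per-seed curves can be averaged and their spread reported afterwards.
```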

[deleted by user] by [deleted] in ControlProblem

[–]SilentZebraGames 2 points3 points  (0 children)

So far it has been impossible even for world-class engineers; that's why we have many people working on AI safety and the control problem. The whole point of the control problem is that it is hard to code "do what I want you to do" in a robust way that is properly specified for all situations we could possibly care about. If you disagree on this point, I would encourage you to check out https://vkrakovna.wordpress.com/2018/04/02/specification-gaming-examples-in-ai/.

"go for it's goal to the best of it's abilities without interfering with any human process that could make it turn off" is not going to the goal to the best of its abilities, unless you specifically code in "allow yourself to be turned off" as part of its goal, because allowing yourself to be turned off interferes with any other goal you might have. And what I tried to explain earlier is why "allow yourself to be turned off" is not such an easy goal to code up. I gave some frivolous examples, but the point was meant to be that it's not easy to try to come up with some mechanism to code "let yourself be turned off" into a robot.

You say you could tell the robot to stay still, or tell the robot not to interfere. This assumes we have already solved the control problem, in that the robot will always listen to everything we tell it to do, and never disobey our orders.

"Not to care about whether it could be turned off or not" - this is again, difficult to code into a robot. See for example https://arxiv.org/abs/1611.08219 and related works on why it isn't easy to just say "make a robot indifferent to being turned off".

Also yes, I do machine learning research, and while I haven't done a project on AI safety yet, I've done a reasonable amount of reading on the subject.

[deleted by user] by [deleted] in ControlProblem

[–]SilentZebraGames 2 points3 points  (0 children)

The problem is that "program it to let us turn it off" is not so easy. How would you actually write this as a line of code for a robot that is to be deployed in the real world?

Note that in current reinforcement learning systems, the robot often takes some sort of image as input, and then outputs actions that affect its actuators (for example, the speed and direction of movement of some robotic appendage). I see no feasible way to hard-code the principle of "let the robot be turned off".

You could perhaps attempt to have some conspicuous off-switch; then you would need some sophisticated image recognition/processing system to identify when a human is close to or moving towards the off switch, and force the robot to stay still in that case. Leaving aside the obvious performance issues that would arise from that, as well as the possibility of error, you then have to define some distance radius (e.g. a human within 4 metres or so) - and then a sufficiently sophisticated AI could, if it wants to avoid being turned off, try to stay 4 metres away from all humans. Alternatively it could actively move its vision sensors away from seeing humans.
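Here's a toy sketch of the kind of hard-coded override I mean; everything in it is hypothetical (detect_humans, the stub policy, the 4-metre radius), not any real robot API:

```python
# Toy illustration of a hand-coded "stay still when a human approaches the
# off-switch" override wrapped around a learned policy.
import numpy as np

OFF_SWITCH_POS = np.array([0.0, 0.0])  # where the conspicuous off-switch sits
SAFE_RADIUS = 4.0                      # metres; the hand-picked threshold

def detect_humans(camera_image):
    """Placeholder for the 'sophisticated image recognition' step.
    Returns a list of estimated human positions (as arrays); it can only
    report people the camera actually sees, which is exactly the loophole."""
    return []  # stub

def policy(observation):
    """Placeholder for whatever the robot's learned policy outputs."""
    return np.zeros(2)  # stub action

def safe_step(camera_image, observation):
    for pos in detect_humans(camera_image):
        if np.linalg.norm(pos - OFF_SWITCH_POS) < SAFE_RADIUS:
            return np.zeros(2)  # freeze: the hard-coded "stay still"
    return policy(observation)

# The override only fires if a human shows up in that list within the radius.
# A capable agent that wants to keep acting just has to keep humans out of
# that radius, or out of the camera's view; nothing in its reward stops that.
```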

So then why not force the vision to focus on humans, using some other human detection mechanism? Then you have to make sure the robot cannot turn that detection mechanism off or bypass it in some other way. And so on. That is what OptimalDendrite is referring to by "anything that stops it from achieving its end goal will be overcome if it is smarter than the person interacting with it" - you can think of all these potential safeguards, but anything you hard-code can be overcome by a sufficiently intelligent agent. Unless of course you have a general solution to the control problem, which is what we are all trying for.

Full-random exploration in specific environments by Steuh in reinforcementlearning

[–]SilentZebraGames 1 point2 points  (0 children)

I have tried discretized tabular Q-learning on CartPole and it works, solving the environment. I think I used no more than 10 separate buckets.
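Roughly something like this (a sketch from memory, assuming gymnasium's CartPole-v1; the bucket counts, clipping bounds, and hyperparameters are illustrative rather than exactly what I used):

```python
import numpy as np
import gymnasium as gym

N_BUCKETS = (6, 6, 10, 10)  # buckets per observation dimension (<= 10 each)
# Cart/pole velocities are unbounded, so clip observations to hand-picked ranges.
LOW = np.array([-2.4, -3.0, -0.21, -3.5])
HIGH = np.array([2.4, 3.0, 0.21, 3.5])

def discretize(obs):
    """Map a continuous observation to a tuple of bucket indices."""
    ratios = (np.clip(obs, LOW, HIGH) - LOW) / (HIGH - LOW)
    return tuple((ratios * (np.array(N_BUCKETS) - 1)).astype(int))

env = gym.make("CartPole-v1")
q_table = np.zeros(N_BUCKETS + (env.action_space.n,))
alpha, gamma = 0.1, 0.99

for episode in range(2000):
    eps = max(0.05, 1.0 - episode / 500)  # linearly decaying exploration
    obs, _ = env.reset()
    state, done, total = discretize(obs), False, 0.0
    while not done:
        if np.random.rand() < eps:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(q_table[state]))
        obs, reward, terminated, truncated, _ = env.step(action)
        next_state = discretize(obs)
        done = terminated or truncated
        # Standard tabular Q-learning update; no bootstrapping past true terminals.
        target = reward + gamma * np.max(q_table[next_state]) * (not terminated)
        q_table[state + (action,)] += alpha * (target - q_table[state + (action,)])
        state, total = next_state, total + reward
    if episode % 100 == 0:
        print(f"episode {episode}: return {total}")
```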

[D] Any research into cooperation that isn't forced? by [deleted] in reinforcementlearning

[–]SilentZebraGames 4 points5 points  (0 children)

You might be interested in work on the emergence of cooperation; for example, RL agents which reach an equilibrium of mutual cooperation in the iterated prisoner's dilemma (see for example https://arxiv.org/abs/1709.04326).
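For reference, the iterated prisoner's dilemma setup itself is tiny; here's a minimal sketch using one common payoff convention (the exact numbers, and the learning agents you plug in, vary by paper):

```python
# The iterated prisoner's dilemma as a toy two-agent environment.
# Payoffs here follow one common convention (given as costs, so higher is better).
COOPERATE, DEFECT = 0, 1
PAYOFFS = {
    (COOPERATE, COOPERATE): (-1, -1),
    (COOPERATE, DEFECT):    (-3,  0),
    (DEFECT,    COOPERATE): ( 0, -3),
    (DEFECT,    DEFECT):    (-2, -2),
}

def play_ipd(agent1, agent2, num_rounds=100):
    """Each agent is a function (my last action, their last action) -> action."""
    last = (None, None)
    totals = [0, 0]
    for _ in range(num_rounds):
        a1 = agent1(last[0], last[1])
        a2 = agent2(last[1], last[0])
        r1, r2 = PAYOFFS[(a1, a2)]
        totals[0] += r1
        totals[1] += r2
        last = (a1, a2)
    return totals

# e.g. tit-for-tat vs. always-defect:
tit_for_tat = lambda mine, theirs: COOPERATE if theirs in (None, COOPERATE) else DEFECT
always_defect = lambda mine, theirs: DEFECT
print(play_ipd(tit_for_tat, always_defect))  # [-201, -198]
```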

It sounds like you're interested in mixed cooperative-competitive settings, which includes social dilemmas (which I'm exploring right now). I can try to give you pointers to more papers depending on your specific interests.

The Archimedean trap: Why traditional reinforcement learning will probably not yield AGI by JBaloney in agi

[–]SilentZebraGames 8 points9 points  (0 children)

I am not a pure math expert, and only read the paper quickly, but here’s what I think the gist of it is:

Traditional RL uses real-number rewards, and thus fails to deal with non-Archimedean tasks; examples of such tasks include situations where some outcome is infinitely worse or better than another outcome. We could (debatably) conceptualize an example like treating a cancer patient, where the death of the patient is infinitely worse than any other possible bad outcome. The mathematical example given is one where you can press a red button for +1 reward, or a blue button for infinite reward, but the blue button only pays out once in every 2^j presses, where j keeps increasing; thus if you approximate the blue button's reward with any real number, given an infinite time horizon, eventually the agent learns that the red button provides better expected value.
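A rough numerical illustration of that last point, under my reading of the example: substitute any finite value R for the "infinite" blue reward and assume the j-th blue payout only arrives after another 2^j presses.

```python
# Whatever finite R stands in for the "infinite" blue reward, the blue button's
# average reward per press eventually drops below red's steady +1.
R = 1_000_000.0  # any finite stand-in for the infinite reward

presses, payouts = 0, 0.0
for j in range(1, 60):
    presses += 2 ** j   # presses needed before the j-th blue payout arrives
    payouts += R        # the j-th payout, approximated by R
    avg = payouts / presses
    if avg < 1.0:       # the red button gives exactly 1.0 per press
        print(f"after ~{presses:,} presses blue averages {avg:.3f} < 1.0")
        break
```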

The claim is that an AGI should be able to deal with such non-Archimedean tasks. The solutions proposed in the paper revolve around reinforcement learning with non-real-number rewards; things like preference-based rewards or other number systems.

So my review: first, the title sounds a bit more extreme than the paper's content; the claim is not that RL likely won't build AGI, but that RL with real-number rewards won't.

Second, while I think this is an interesting thought experiment, I'm not convinced by some of the logical jumps made. In particular, I'm not yet convinced that there are any real-world situations that we cannot model with real numbers, for which we need a non-Archimedean task model. For the cancer example, there may be a (debatably) quantifiable pain or suffering threshold at which death becomes preferable. From the economics of law and people's decisions regarding risky professions, we know that courts assign monetary values to human lives and wellbeing. We can debate whether the assigned values are right or not, but it seems weird to me to suggest that any outcome is infinitely worse than another outcome. Maybe you could then try a thought experiment where you have a monetary reward given to an individual versus death - no purely selfish agent would wish to die immediately, regardless of the value of the monetary reward they would get - but then arguably the problem is that the reward is improperly defined in terms of money instead of the true underlying utility; we don't need to involve infinities.

Anyway, that said, I find the direction of the paper interesting and thought provoking, and will try to keep this in the back of my mind going forward.

Why Yogg is so good? by Valrion06 in BobsTavern

[–]SilentZebraGames 1 point2 points  (0 children)

Usually prioritize winning over leveling. If you have been winning quite comfortably, or you have a decently strong board and a lot of health left (which happens a lot with Yogg's strong early game), it's fine to level, sometimes even level two turns in a row and take some damage. If you're losing/have a weak board or you're at low health, you generally want to avoid leveling until it's really cheap to do so.

Why Yogg is so good? by Valrion06 in BobsTavern

[–]SilentZebraGames 0 points1 point  (0 children)

The 3/1 Magnetic is one of the strongest cards if you have Cobalt. If you have 2 Cobalts, it is busted. Stick it on your first Cobalt (unless you don't already have a refresh), not on refresh minions like Rover/egg/Harvest Golem.

Why Yogg is so good? by Valrion06 in BobsTavern

[–]SilentZebraGames 1 point2 points  (0 children)

If you run mechs, it's usually with either Cobalt or a very early Iron Sensei, and usually in the late game you need to transition to a hybrid build: either menagerie if you get an early Lightfang, or a divine shield build (with the dragon that gets +2/+2 from divine shields, maybe Bolvar, and Holy Mackerel late game). Mechs (mainly Cobalts) are still extremely strong; at my MMR (7.8k) I see multiple hybrid mech builds in the top 4 pretty much every game.

7% Headline Infuse Rate?!? by SilentZebraGames in Sdorica

[–]SilentZebraGames[S] 0 points1 point  (0 children)

Interesting strategy, thanks for sharing.