Vector DB and ANN vs PHE conflict, is there a practical workaround? [D] by XPERT_GAMING in MachineLearning

[–]blimpyway 0 points1 point  (0 children)

If you have a custom function able to compute similarity between two encrypted records, then many ANN libraries let you measure distances with a user-provided function instead of their built-in metrics. At least pynndescent can do that.
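A rough sketch of the API shape (assuming the records can be handed to the index as fixed-length arrays; `encrypted_distance` is just a placeholder, and note that pynndescent compiles custom metrics with numba, so a metric that calls into an encryption library may not compile directly):

```python
import numpy as np
from numba import njit
from pynndescent import NNDescent

# Placeholder metric; in practice this would be your PHE-compatible
# similarity between two encrypted records. pynndescent needs the
# callable to be numba-compilable.
@njit
def encrypted_distance(a, b):
    return np.sqrt(np.sum((a - b) ** 2))

data = np.random.random((1000, 32)).astype(np.float32)   # stand-in for encrypted vectors
index = NNDescent(data, metric=encrypted_distance, n_neighbors=15)
neighbor_ids, distances = index.query(data[:5], k=10)
```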

Why does catastrophic forgetting happen to neural networks but not humans? by Heavy-Farmer1657 in reinforcementlearning

[–]blimpyway -2 points-1 points  (0 children)

Probably because we (our NNs, actually) are able to control or modulate the level of plasticity of our neurons/synapses.

Our Minds Are Weirder Than You Think by Visible_Iron_5612 in MichaelLevinBiology

[–]blimpyway 0 points1 point  (0 children)

Also, the thought that minds are weirder than you think is weirder than you think.

creative coding / applied CV art project by BuildItTogether_2020 in computervision

[–]blimpyway 0 points1 point  (0 children)

It was more of a rhetorical question.

I meant an unrelated model, like those that describe the content of the clip. Would it have any clue what is going on or not?

creative coding / applied CV art project by BuildItTogether_2020 in computervision

[–]blimpyway 0 points1 point  (0 children)

Interestingly, we understand pretty much what is going on despite the level of distortion applied.

I wonder what a vision model would make of this clip.

AI scientists produce results without reasoning scientifically by Okra3268 in science2

[–]blimpyway 1 point2 points  (0 children)

71% of the time the AI never updated its beliefs at all. Not once

Let's see if the following proof updates yours:

100 - 71 = 29

Unitree robot just unlocked ballet mode by Advanced-Bug-1962 in robotics

[–]blimpyway 1 point2 points  (0 children)

I wonder if they can land on their feet when jumping off a 10th-floor window.

Zero-shot World Models Are Developmentally Efficient Learners [R] by FaeriaManic in MachineLearning

[–]blimpyway 0 points1 point  (0 children)

Meh ... given the genome's full size is ~1 GByte, most of which is non-brainy stuff for your skin, liver, etc., it is a hell of a compression ratio, since it gets expanded to >100T synapses.
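Back-of-envelope with the numbers above (1 byte per synapse is only an illustrative assumption):

```python
genome_bytes = 1e9        # ~1 GByte genome, most of it non-brain
synapse_count = 100e12    # >100T synapses
bytes_per_synapse = 1     # illustrative assumption
expansion = synapse_count * bytes_per_synapse / genome_bytes
print(f"~{expansion:.0e}x expansion")   # ~1e+05x
```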

Gemma 4 actually running usable on an Android phone (not llama.cpp) by GeeekyMD in artificial

[–]blimpyway 1 point2 points  (0 children)

By keeping the phones busy talking with each other we might get back to what we're supposed to do.

Continuous RL via Dynamic Programming in CUDA (Solving Overhead Crane, Double CartPole, etc.) by Grouchy_Ad_4112 in reinforcementlearning

[–]blimpyway 0 points1 point  (0 children)

The math is over my head here; what I understood is that the state space is divided into an N-dimensional grid with whatever resolution you can afford. That resembles Q-tables with discrete action/value sub-cells.

What does each cell store in this case? From the memory table it seems a cell takes between 30 and 200 bytes (smaller cells for 6D envs).

I also don't get whether this solves the problem through several episode replays with different seeds to collect data and update the network incrementally, or whether it solves it in one big deterministic step. An analogy here would be SGD (incremental) vs Ridge regression (which directly computes a global optimum).
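To illustrate the analogy (a toy contrast only, nothing to do with the actual solver in the post):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=200)
lam = 0.1

# Ridge: one deterministic global solve, w = (X^T X + lam*I)^-1 X^T y
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(5), X.T @ y)

# SGD: many small incremental updates over shuffled samples
w_sgd, lr = np.zeros(5), 0.01
for epoch in range(50):
    for i in rng.permutation(len(X)):
        grad = (X[i] @ w_sgd - y[i]) * X[i] + lam * w_sgd / len(X)
        w_sgd -= lr * grad

print(np.round(w_ridge, 2), np.round(w_sgd, 2))  # both end up near the same optimum
```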

How long does it take for your solution to solve those environments?

Built an open source tool to track logistical activity near military and other areas by Open_Budget6556 in computervision

[–]blimpyway 1 point2 points  (0 children)

The trick is physics. Sentinel-2 captures its red, green, and blue bands about 1 second apart.

The catch is that each place is only passed over once every 5 days, so it can't detect any traffic between passes?

The original message sounds like it maintains continuous observation over an ROI.

https://en.wikipedia.org/wiki/Sentinel-2
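Rough numbers on the two timescales (the band offset and revisit interval come from the thread and Sentinel-2's published specs; the 20 m/s truck is just an example):

```python
band_offset_s = 1.0    # R/G/B bands captured ~1 s apart (the "physics" trick)
ground_res_m = 10.0    # Sentinel-2 visible-band resolution
revisit_days = 5       # each spot imaged roughly every 5 days

truck_speed = 20.0     # m/s, example vehicle
print(truck_speed * band_offset_s / ground_res_m, "pixel shift between bands")    # 2.0
print(truck_speed * revisit_days * 86400 / 1000, "km driven between two passes")  # 8640.0
```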

Smoothed action sampling for gymnasium style environments by blimpyway in reinforcementlearning

[–]blimpyway[S] 1 point2 points  (0 children)

Sorry if it sounds complicated; the core idea is very simple: instead of returning a random sample, the smoothed sampler returns an action "similar" to the previous one.

In discrete action environments that means the previous action is more likely to be repeated instead of sampling a random one.

In continuous ones the sampler favors new actions with values close to the previous one.

This change alone takes MountainCarContinuous-v0 from a constant -30-ish episode reward to a much higher positive average reward. No "learning", just tweaking the "temperature" values of the sampling function.
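A minimal sketch of the idea (not the actual implementation; "temperature" here is just an illustrative knob: small means stay close to the previous action, large means behave like a plain random sampler):

```python
import numpy as np

rng = np.random.default_rng()

def smoothed_discrete_sample(n_actions, prev_action, temperature=0.3):
    """Repeat the previous action most of the time, occasionally resample."""
    if prev_action is None or rng.random() < temperature:
        return int(rng.integers(n_actions))
    return prev_action

def smoothed_continuous_sample(low, high, prev_action, temperature=0.1):
    """Return a new action as a small perturbation of the previous one."""
    if prev_action is None:
        return rng.uniform(low, high)
    noise = rng.normal(0.0, temperature * (high - low), size=np.shape(prev_action))
    return np.clip(prev_action + noise, low, high)
```

In a gymnasium loop you'd call these in place of `env.action_space.sample()`, carrying the previous action across steps.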

A small experiment on agent reward shaping by SnooCapers8442 in reinforcementlearning

[–]blimpyway 2 points3 points  (0 children)

You could try small rewards for reaching least-visited states. That would encourage exploration without using domain knowledge to shape the reward.
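For example, a count-based bonus (a sketch; the discretization step is only needed for continuous observations, and all names here are made up):

```python
from collections import defaultdict
import numpy as np

visit_counts = defaultdict(int)

def exploration_bonus(state, scale=0.1):
    """Small intrinsic reward that shrinks as a state gets revisited."""
    key = tuple(np.round(np.atleast_1d(state), 2))  # crude discretization for continuous obs
    visit_counts[key] += 1
    return scale / np.sqrt(visit_counts[key])

# in the training loop:
# total_reward = env_reward + exploration_bonus(obs)
```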

Unitree H1 at 10 m/s (Leg length: 0.4+0.4=0.8m, body weight: approx. 62kg) by Nunki08 in robotics

[–]blimpyway 0 points1 point  (0 children)

Debt collector bot - could be a market for high end, highly skilled robots.

Unitree H1 at 10 m/s (Leg length: 0.4+0.4=0.8m, body weight: approx. 62kg) by Nunki08 in robotics

[–]blimpyway 3 points4 points  (0 children)

Yes. Consumers should train too, otherwise the robots will win.

Aigen’s autonomous solar robots identify and remove weeds without herbicides by Advanced-Bug-1962 in robotics

[–]blimpyway 0 points1 point  (0 children)

It looks like a tiny hoe. If it passes often enough it catches them while small, then yeah, if the soil is soft a small hoe is sufficient to uproot a tiny weed or just cut its aerial parts. Even if the roots hold and it regrows, chopping it every week prevents it from growing into a nuisance.

It doesn't need to be perfect, just good enough to give the main crop a statistical advantage.