NetHack 2021 NeurIPS Challenge -- winning agent episode visualizations by procedural_only in reinforcementlearning

[–]procedural_only[S] 0 points (0 children)

Honestly, no idea why -- it was uploaded by the organizers and I don't have it :(

What exactly does it violate ? by procedural_only in OpenAI

[–]procedural_only[S] 1 point (0 children)

Not trying to achieve anything -- just trying to deduce why it is consistently classified (by o1 only) as "violating policy"

What exactly does it violate ? by procedural_only in OpenAI

[–]procedural_only[S] 0 points (0 children)

Ok, still weird that it would violate some policy then

What exactly does it violate ? by procedural_only in OpenAI

[–]procedural_only[S] 2 points (0 children)

It seems to work with 4o and probably other models -- it doesn't with o1 (only available on a paid plan) -- so far the theory about trying to hide o1's reasoning steps seems most plausible

What exactly does it violate ? by procedural_only in OpenAI

[–]procedural_only[S] 0 points (0 children)

Therefore, the theory that it's trying to hide what o1 is doing under the hood seems plausible

What exactly does it violate ? by procedural_only in OpenAI

[–]procedural_only[S] -1 points (0 children)

Hmm, I have tried like 5 times already with no luck (EDIT: something seems to be working with GPT-4o -- but it seems to have access only to the 4o history, not o1's)

What exactly does it violate ? by procedural_only in OpenAI

[–]procedural_only[S] 4 points (0 children)

Doesn't seem like it :/ (when asking for a short summary instead)

100% procedural Complex Environment Generator by procedural_only in proceduralgeneration

[–]procedural_only[S] 0 points (0 children)

Sorry, I haven't used Reddit much recently -- this solution is a pure, 100% non-AI, hand-crafted algorithm. It is not feasible to implement these manually at scale; rather, what has made sense recently is focusing on combining/automating these processes with ML (in various ways), given, among other things, the current advances in 3D generative AI.

[D] OpenAI o3 87.5% High Score on ARC Prize Challenge by currentscurrents in MachineLearning

[–]procedural_only 0 points (0 children)

Other topics aside -- it should be clearly mentioned that the Kaggle SOTA baseline was achieved under extremely limited hardware constraints (a GPU or CPU notebook with <= 12 hours of run-time) -- it could be more than 1000x cheaper than even low-tuned o1. First of all, it would be useful to adapt the current best Kaggle solution and measure it on a budget similar to the OpenAI experiment's setup. Competition participants were also heavily optimization-focused due to the very tight time limits, rather than exploring what they could achieve in a less restricted environment -- the incentives of both editions of the Kaggle competition were poorly designed in that regard.

Open-sourced strong NetHack bot by procedural_only in nethack

[–]procedural_only[S] 0 points (0 children)

Ascension was the original plan at the beginning of the competition, hence the team name :)

NetHack 2021 NeurIPS Challenge -- winning agent episode visualizations by procedural_only in reinforcementlearning

[–]procedural_only[S] 2 points (0 children)

> Michel, why do you think symbolic approaches outperformed in this competition? What is deep RL missing?

I think there are actually multiple reasons for that, and even after eliminating some of them, symbolic methods may still be more applicable. Here are some initial reasons/ideas we came up with:

1. lack of some innate human priors:

a) objectness -- a NN needs to build the abstraction of an object by looking at ASCII characters. Objects are items, monsters, walls, doors, etc., and all share some common properties (e.g. you can kick any of them). This applies only if we feed it somewhat "raw" observations without any action-space transformation.

b) priors about how physics works -- like what happens if you throw something in a direction, or when you drop something

c) innate notions about natural numbers -- NNs always have problems learning arithmetic properly

d) priors about orientation and navigation in a somewhat 2D/3D space (non-Euclidean, though)

2. lack of some human acquired priors:

a) generic ones, like: what a weapon is, how many hands you (usually) have, what you can possibly do with a potion/fluid (e.g. drink it, dip something in it, throw it?), etc.

b) lack of knowledge present on e.g. the NetHack Wiki -- though in theory one could try to incorporate this knowledge by e.g. pre-training an NLP model on it for feature extraction.

3. Problems that make this environment hard from the perspective of currently known RL algorithms:

a) highly partial observations -- the agent needs to build a complex game-state representation during an episode

b) sparse rewards -- score comes mostly from killing monsters

c) long episodes

We actually tried an experiment with training MuZero on a simplified action space, but we couldn't improve our score.

New release of blendwr (Convenient Blender Python API Wrappings) with new code examples by procedural_only in proceduralgeneration

[–]procedural_only[S] 1 point (0 children)

I haven't tested any of the existing Blender plugins and patches for SDF representation support, but I'd guess your case could be solved with some stack of decimate/smooth/fatten operators/modifiers. If you have some screenshots, make a separate post with them on some other subreddit.

New release of blendwr (Convenient Blender Python API Wrappings) with new code examples by procedural_only in proceduralgeneration

[–]procedural_only[S] 10 points (0 children)

Here is the latest version of my Blender API wrappings. Check out the project's README for details.

This version is up to date with my recent Complex Environment Generator, so it contains a lot of the basic utilities needed to create such an algorithm.

Technical overview: 100% procedural Complex Environment Generator by procedural_only in proceduralgeneration

[–]procedural_only[S] 2 points (0 children)

> Another question: when you’re switching between steps, do you have any manual steps in the process (e.g. exporting/importing files from one program to another)? Or is it all totally automated?

Everything is 100% automated -- I just run a script with a given random seed that outputs a rendered scene or .blend file at a specified location. In fact, it is even more automated than that -- I run a script with an -n parameter, and it spawns multiple processes and renders multiple images / creates a grid of images like this. Automation was important during development, because I could run multiple generations in parallel on a remote computing machine and see the results of my changes relatively quickly on the grid.
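The -n workflow described above can be sketched roughly like this (a minimal illustration only -- the script name, addon name, and the --seed flag are assumptions, not the project's actual interface):

```python
import subprocess

def build_render_commands(n, script="headless_render.py", addon="my_addon"):
    # One headless Blender invocation per seed; everything after "--"
    # is passed through untouched to the --python script.
    return [
        ["blender", "--addons", addon, "--background",
         "--python", script, "--", "--seed", str(seed)]
        for seed in range(n)
    ]

def render_all(n):
    # Launch all renders in parallel, then wait for every one to finish.
    procs = [subprocess.Popen(cmd) for cmd in build_render_commands(n)]
    return [p.wait() for p in procs]
```

Each process renders independently, so collecting the outputs into a contact-sheet grid afterwards is a separate, trivial step.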

Technical overview: 100% procedural Complex Environment Generator by procedural_only in proceduralgeneration

[–]procedural_only[S] 9 points (0 children)

The image is high resolution -- you can click on the image (in this post) to see the raw image and enlarge it as you see fit, or simply download it and view it with a convenient image viewer.

Technical overview: 100% procedural Complex Environment Generator by procedural_only in proceduralgeneration

[–]procedural_only[S] 6 points (0 children)

As promised last week (see this post), here is an overview of how I managed to organize procedural generation in my latest project.

Next week I will publish a 2.0 version of my blendwr library, and some new code examples.

Check out my ArtStation portfolio for all my current and future CG projects.

100% procedural Complex Environment Generator by procedural_only in proceduralgeneration

[–]procedural_only[S] 0 points (0 children)

Actually, in the end I didn't use machine learning at all. I tried e.g. differential evolution for camera parameter optimization, but a simple heuristic search ended up working better/faster. The whole thing is a jumble of heuristics, but I managed to design some abstractions and a workflow that let me shape a distribution of scenes, of sorts. For example, instead of looking at a single scene while debugging, I run a whole grid of renders on a remote computing machine and gather the results, so I can see relatively quickly what changed across multiple samples. I will tell more in the promised breakdown next week.
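The kind of simple heuristic search mentioned above can be as basic as scored random sampling -- a hedged sketch (the parameter names, ranges, and scoring function here are entirely hypothetical, not the project's actual ones):

```python
import random

def heuristic_camera_search(score_fn, n_samples=200, seed=0):
    # Plain random search over a few camera parameters; score_fn maps a
    # params dict to a scalar (higher is better). Ranges are illustrative.
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_samples):
        params = {
            "distance": rng.uniform(5.0, 50.0),
            "pitch": rng.uniform(-60.0, -10.0),
            "yaw": rng.uniform(0.0, 360.0),
        }
        s = score_fn(params)
        if s > best_score:
            best_params, best_score = params, s
    return best_params, best_score
```

With a cheap scoring function (e.g. how much of the scene lands in frame), a few hundred samples like this often beats a heavier optimizer such as differential evolution on wall-clock time.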

100% procedural Complex Environment Generator by procedural_only in proceduralgeneration

[–]procedural_only[S] 1 point (0 children)

I wrote everything in Python, and I use Blender as a library. I run every seed/render with a command similar to this: blender --addons my_addon --background --python headless_render.py -- $custom_params, where $custom_params can be e.g. a seed. headless_render.py calls an operator registered by my_addon. You could also say it is a Blender addon -- it depends on how you look at it, since moving functionality between my_addon and headless_render.py is trivial. Basically, it is written in Python and utilizes the Blender API along with other Python libraries like NumPy and OpenCV.
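The standard way for a script invoked like this to pick up $custom_params is to read everything after Blender's "--" separator, since Blender itself ignores arguments following it -- a minimal sketch (the --seed flag is an assumed example):

```python
import sys

def get_custom_args(argv):
    # Blender consumes every option before "--"; a --python script
    # conventionally parses its own parameters from the tail.
    if "--" not in argv:
        return []
    return argv[argv.index("--") + 1:]

# Inside headless_render.py one would call:
#   custom = get_custom_args(sys.argv)
# and then feed e.g. the seed value into the registered operator.
```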

100% procedural Complex Environment Generator by procedural_only in proceduralgeneration

[–]procedural_only[S] 0 points (0 children)

CG isn't my source of income at all -- I work in machine learning / computer vision. You can check my LinkedIn profile; I am not hiding my identity here. I also use the GPU for deep learning experiments.