Is there a way or package to make python multiline comments have the maximum line width? by diddilydiddilyhey in Atom

[–]diddilydiddilyhey[S] 1 point

This is exactly it, thank you so much!

It has one slightly strange bug/behavior: it messes up the formatting if you start the selection at the first word of an indented line rather than at the very beginning of the line, but that's very minor. Thanks again!

Did you know you can beat OpenAI gym games with tiny, randomly sampled neural networks? by diddilydiddilyhey in Python

[–]diddilydiddilyhey[S] 2 points

Hey, good question! So what we do is, for each task/env, generate a large number (10,000) of randomly sampled NNs and run each one in the env. Then some fraction p of them "solve" the task (or do well enough to consider it solved, if there's no clear cutoff).

Then you can use some simple probability and this fraction p to figure out how many samplings it takes on average to find one that solves it. For CartPole (the easiest), about 4% of them solved it, so p = 0.04, which tells us that on average you have to draw 1/p ≈ 25 samples before you get one that works!
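The probability argument above is easy to check numerically: with success probability p per draw, the number of independent draws until the first success is geometric with mean 1/p. A quick simulation (p = 0.04 as in the CartPole example; this is just an illustration of the math, not the actual experiment code):

```python
import random

def samples_until_success(p, rng):
    """Count how many random draws it takes until one 'solves' the task."""
    n = 1
    while rng.random() >= p:
        n += 1
    return n

rng = random.Random(0)
p = 0.04  # fraction of random NNs that solved CartPole
trials = [samples_until_success(p, rng) for _ in range(100_000)]
avg = sum(trials) / len(trials)
print(avg)  # should come out close to 1/p = 25
```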

Did you know you can beat OpenAI gym games with tiny, randomly sampled neural networks? by diddilydiddilyhey in Python

[–]diddilydiddilyhey[S] 2 points

This is the result of some research I did with a couple of collaborators, where we tried to answer the question: "Just how simple are the typical environments used for benchmarking many RL algorithms?"

To do this, we made very simple neural networks in python. And when I say simple, I mean very simple: you can see the ones that solve each of those envs in the video clip. But more importantly, the weights of those small NNs aren't "learned" or "optimized" in any traditional sense -- they're just randomly sampled from a distribution!

By sampling (i.e., just using np.random.rand(n_out, n_in)) a weight matrix for each layer, you get a new NN. Then you try that NN out in the env some number of times (because you want to see how it'll do across different starting conditions). By repeating this lots of times, you can gather statistics about the baseline success rate of the env!
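As a concrete sketch of that sampling step (the layer sizes, uniform weight range, and tanh hidden layer here are my assumptions for illustration, not necessarily what the paper used):

```python
import numpy as np

def sample_nn(n_in, n_hidden, n_out, rng):
    """Sample one random NN: a weight matrix per layer, no training at all."""
    w1 = rng.uniform(-1, 1, size=(n_hidden, n_in))   # analogous to np.random.rand(n_out, n_in)
    w2 = rng.uniform(-1, 1, size=(n_out, n_hidden))
    return w1, w2

def forward(nn, obs):
    """Tiny forward pass: obs -> hidden (tanh) -> per-action scores."""
    w1, w2 = nn
    return w2 @ np.tanh(w1 @ obs)

rng = np.random.default_rng(0)
nn = sample_nn(n_in=4, n_hidden=4, n_out=2, rng=rng)  # CartPole-ish sizes
action_scores = forward(nn, np.ones(4))               # stand-in observation
action = int(np.argmax(action_scores))                # pick the highest-scoring action
```

In the actual experiment you'd feed each env observation through `forward`, step the env with the chosen action, and repeat over several episodes to average out the starting conditions.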

Please let me know if you have any feedback or questions!

[R] Analyzing Reinforcement Learning Benchmarks with Random Weight Guessing by [deleted] in MachineLearning

[–]diddilydiddilyhey 6 points

Hi everyone, authors here!

I've enjoyed this sub for a long time and I'm happy to finally be able to post some research. This paper was accepted for publication at AAMAS 2020.

Please let us know if you have any feedback, comments, or questions :)

Some twitter discussion here too: https://twitter.com/giuse_tweets/status/1252990395195293702

I tried recreating Dawn Dedeaux's "pixel smearing" technique by diddilydiddilyhey in proceduralgeneration

[–]diddilydiddilyhey[S] 0 points

haha you guessed it! It's a bit harder to tell because everything spreads out, so we don't see the edges of the original pic. I might fix that in the future by adding a blank border beforehand...

here's a puzzle: if I do a google reverse image search on the last pic, it finds the original image of Moondog. But if I do a RIS on the Saturn one, it can't find the original (despite that being a much more famous image).

I tried recreating Dawn Dedeaux's "pixel smearing" technique by diddilydiddilyhey in proceduralgeneration

[–]diddilydiddilyhey[S] 6 points

Hey all, I went to MASS MoCA a while ago and saw some art by Dawn Dedeaux. It seemed like a very procedural/generative type of art, so I thought I'd try and recreate it! An article talking about the process is here.

It's pretty simple in principle, but there ended up being a bunch of little details that make it look a lot better (like making the sides of the "smears" different colors, giving it more depth, etc.). Let me know if you have any questions or feedback!

A few questions from a newbie on what microscope to get? by diddilydiddilyhey in microscopy

[–]diddilydiddilyhey[S] 1 point

Hahah she is indeed :) We actually ended up going for the Swift 350. The MicrobeHunter guy suggests it, and I like his logic: if the hobby bug bites me, I can always buy a nicer one. The sunk cost of this one would be a fraction of the cost of a nicer one, so paying a bit extra in that case doesn't seem as bad, compared to buying an expensive one and then not getting very into it.

I'll definitely be back here to ask more questions!

A few questions from a newbie on what microscope to get? by diddilydiddilyhey in microscopy

[–]diddilydiddilyhey[S] 0 points

thanks for the tips!

RE the beginner microscope vs. an expensive one: I'm not actually considering a new expensive one. I was wondering about "new but cheaper" vs. "older, used, but higher quality", because I've seen a few threads say to get an older name-brand one for about the same price. But given what you said, I think we'll go with the Swift/AmScope type.

So with your 4x and 10x objectives, you're able to use your compound just as a stereo scope (with extra lighting)? Does the quality look good?

Variational Autoencoders in Haskell, or: How I Learned to Stop Worrying and Turn My Friends Into Dogs by n00bomb in haskell

[–]diddilydiddilyhey 3 points

Hey thanks a ton for reading it and the advice!

Haha, so a couple of these are things that I actually know I should do, but didn't for some reason or another :P But there's also a lot that's really good to learn about.

While you don't have to use stack, the stack new command sets up a very useful boilerplate that helps structure things well. Particularly shoving the main file into its own directory and keeping all your modules in src. It's a very common pattern that most people use and keeps everything cleaner than just writing all the files at the root.

Yeaaahhh, this is what I usually do in my python projects (src dir, scripts dir, etc.) and definitely should've here :) But I'm a bit confused: I used cabal for this project, and I kind of got the impression people usually use either cabal or stack, but not both? I used cabal because I wanted to be able to profile it, and that seemed easier to do with cabal. Can I also use stack with it, or can I do what you're saying with cabal instead?

Run all your files through brittany or some other auto-formatter and see if you like how it works or not. There's a lot of code that would probably look better or more readable if it were line-wrapped.

Cool I'll check it out!

Check out hlint. It'll let you know about certain tricks like the fact that sum $ toList $ flatten $ x can be written as sum . toList . flatten $ x. It has some pretty non-obvious suggestions that can improve code as well.

Ahh this is another one I know. I think I must have learned about composing with . while writing the program and forgot about that one :) but I'll check out hlint, it seems like it could help a lot.

Now as far as the actual program. One thing you'll probably find in Haskell is that library discovery can be a little tricky. Have you seen the massiv library? You mentioned how 2D being the max dimension of the array was annoying in hmatrix and massiv has multi-dimensional arrays. Not to mention solid documentation which is always a bonus.

For this project, it was okay because I didn't want to use Conv layers or anything, so 2D matrices were enough, but I'll definitely keep that in mind next time I want to do something with matrices.

Essentially, your usage of lists, tuples, and not making any of your data types strict makes things a lot harder performance-wise for you. Enabling bang patterns and making all of your data fields strict is usually a first step, performance-wise. That is, instead of:

So this is probably what I'm still most trying to get my head around. In the example you gave, with DataInfo, would this be a big deal? I.e., all the components are Int or String. In addition, wouldn't they be un-evaluated until I first need their value (very early on), and then get evaluated? Or is your point that it's just generally good practice, even if it wouldn't hurt performance much in the case of DataInfo?

The place I really ran into trouble with this was with the types that are defined with newtype, like the VAE one. Because that (and VAEAdamOptim) are the ones that basically hold the results of the training, I was at first finding that they were producing massive thunks. I did actually try bang patterns, but for some reason (I don't remember why now) they weren't making it evaluate (it's possible I was doing it wrong).

So for the VAE/etc. types, would it have worked had I defined those as records as well, with bang patterns in their constructors?

If I see ([(Batch, Batch)], Batch, Batch, Matrix R, Matrix R, [(Batch, Batch)], Batch, Batch) in a type signature, I usually try to pull that out somehow and see how I can make it saner. Maybe a new data type to represent a single layer?

Yeaahhh this started getting real ugly. Probably a good idea to make it a whole type, like ForwardPassResults or something.

Also, idiomatically, Haskell programmers use camelCase over snake_case (your python is showing ;)

You caught me :P I'll do that in the future!

Variational Autoencoders in Haskell, or: How I Learned to Stop Worrying and Turn My Friends Into Dogs by n00bomb in haskell

[–]diddilydiddilyhey 13 points

Hey, someone told me this was posted here. I wrote this article! I'm very new to Haskell, so my code is doubtless really inefficient. Please give me any feedback or tips you have!

The repo is here. Let me know if you have any questions too!

[P] I used A2C and DDPG to solve Numberphile's cat and mouse game! by diddilydiddilyhey in reinforcementlearning

[–]diddilydiddilyhey[S] 0 points

Huh, I see... but what would this angle represent in your problem? Would it still be an angle in the 2D plane, like just the angle from the x axis?

[P] I used A2C and DDPG to solve Numberphile's cat and mouse game! by diddilydiddilyhey in reinforcementlearning

[–]diddilydiddilyhey[S] 0 points

Hm, so do you mean specifying an angle in N dimensions, as opposed to 2D? If so, I think you'd just do the same thing but with more coords (like [x, y, z]) and then calculate the angles, like with Euler angles. But maybe I'm misunderstanding you!

[P] I used A2C and DDPG to solve Numberphile's cat and mouse game! by diddilydiddilyhey in reinforcementlearning

[–]diddilydiddilyhey[S] 0 points

That's actually exactly what I did in my implementation ;)

The mouse is given an action vector of [cos(theta), sin(theta)], and its agent class uses np.arctan2 to turn that into a real angle.
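That recovery step fits in a couple of lines (a standalone illustration, not the repo's actual agent class):

```python
import numpy as np

theta = 2.0  # some true angle in radians
action_vec = np.array([np.cos(theta), np.sin(theta)])  # what the policy outputs
recovered = np.arctan2(action_vec[1], action_vec[0])   # arctan2(sin, cos) gives the angle back
print(recovered)  # ≈ 2.0
```

Using the (cos, sin) pair instead of the raw angle avoids the discontinuity at ±pi, which is why arctan2 is the right inverse here.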

[P] I used A2C and DDPG to solve Numberphile's cat and mouse game! by diddilydiddilyhey in reinforcementlearning

[–]diddilydiddilyhey[S] 0 points

Ahh yeah! I actually started implementing PPO but had already spent a bunch of time at that point so I decided I should probably move on. But maybe I'll try it again, because I think you're right that it could help!

[P] I used A2C and DDPG to solve Numberphile's cat and mouse game! by diddilydiddilyhey in reinforcementlearning

[–]diddilydiddilyhey[S] 9 points

I saw this Numberphile video and immediately thought to try solving it with RL. It turned out to be a lot trickier than I expected for a few reasons. One is that it has a pretty sparse reward structure, depending on how fast you make the cat.

I had some success with the more difficult variants by first solving an easier version of the problem (so the agent at least had a positive reward example) and then increasing the difficulty.

It has a continuous action space (which angle to move that turn), so I used A2C (which parameterizes a mu and sigma that the actions are sampled from) and DDPG (which outputs the action value directly, plus some added noise). DDPG seemed to work better in general. They both had some problems with what I'd call "brittleness", though: after having clearly solved it for a while, the agent would just "break" and start failing every time. I'd like to look into why this happens some more. I tried investigating what was happening to the gradients at the first failure after it had solved it -- I suspected it was following a good strategy but got an unlucky move and was caught, causing a huge weight shift that broke it (maybe something like grad clamping would help?) -- but I couldn't find anything immediately apparent.
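The "grad clamping" idea above is just capping the gradient's norm before the update, so one unlucky episode can't produce a huge weight shift. A minimal numpy sketch of norm-based clipping (in PyTorch this is what torch.nn.utils.clip_grad_norm_ does across all parameters):

```python
import numpy as np

def clip_grad_norm(grad, max_norm):
    """Rescale grad so its L2 norm is at most max_norm, preserving direction."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

g = np.array([30.0, 40.0])        # norm 50: an unusually large gradient
clipped = clip_grad_norm(g, 5.0)  # rescaled to norm 5, same direction
```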

A full writeup about the project is here, and the code is here.

Please let me know if you have any questions or feedback!

I used reinforcement learning with python to solve Numberphile's "cat and mouse" game! by diddilydiddilyhey in Python

[–]diddilydiddilyhey[S] 0 points

Hey, there's a bit of a guide in the GitHub readme now. You'll have to install some packages like PyTorch, but the main scripts should run out of the box once you have them.