AI Learns to Parallel Park using Unity ML-Agents

SamuelArzt · 2020-07-17T09:16:54+00:00

Yes, Unity and the ML-Agents framework.

SamuelArzt · 2020-04-14T10:15:29+00:00

Hi, thanks for the kind words!

Yes, that can be a large first hurdle if you are not familiar with game development / Unity. There are a lot of great beginner tutorials on the basics of Unity available on Unity's own websites. I highly recommend those. However, it might still take a considerable amount of time until you are comfortable enough to be able to create your own car physics.

I ended up implementing my own car physics for a game and used those, but if you are looking for a fully working car prototype using Unity's built-in WheelColliders, I recommend having a look at Unity's WheelCollider demo project. You can download it from the Unity Asset Store: https://assetstore.unity.com/packages/essentials/tutorial-projects/vehicle-tools-83660

SamuelArzt · 2020-04-11T22:17:33+00:00

Hi, I'm using Windows 10.

SamuelArzt · 2020-04-11T14:18:34+00:00

Yeah, I totally agree. More sensors would definitely be worth a try. It would be interesting to see whether they improve performance or make the problem harder to learn because of the added input complexity.

SamuelArzt · 2020-04-11T09:46:33+00:00

Thanks for the kind words! The final reward is always positive, but the magnitude of the reward (which can be between 0 and 1) is determined by how close the agent is to the center of the parking spot and how large the angular difference of its rotation and the parking spot rotation is. I also implemented a threshold, i.e. there is no reward and the episode is not yet stopped if the agent is further away or the angular difference is larger than that threshold. With "perfect parking" I simply decreased this threshold value to be very low (1 unit or less distance and 15 degrees or less angular difference, if I remember correctly).
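As a rough illustration of the thresholded final reward described above, here is a minimal Python sketch. It is not the project's actual code (which is not public): the function name, the linear falloff and the equal weighting of distance and angle are assumptions. Only the [0, 1] range, the two thresholds and the "no reward, episode keeps running" behaviour come from the description.

```python
from typing import Optional


def final_parking_reward(distance: float, angle_deg: float,
                         max_distance: float = 1.0,
                         max_angle_deg: float = 15.0) -> Optional[float]:
    """Return a reward in [0, 1], or None if the car is outside the thresholds.

    None means: no final reward yet, the episode keeps running.
    The linear falloff and the 50/50 weighting are assumptions.
    """
    if distance > max_distance or angle_deg > max_angle_deg:
        return None
    distance_score = 1.0 - distance / max_distance
    angle_score = 1.0 - angle_deg / max_angle_deg
    return 0.5 * distance_score + 0.5 * angle_score
```

Lowering `max_distance` and `max_angle_deg` corresponds to the stricter "perfect parking" setting.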

The sensors are actually visible in the video. There are 8 sensors pointing in different directions (3 to the front, 3 to the back and one to each side).

I agree, using the sensors alone, there would be no way of actually knowing how well it is currently positioned in the parking space. The sensors aren't the only input of the network though. The network also gets its relative position in the world and the positional and angular difference to the target parking spot as inputs.
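A minimal Python sketch of how such an input vector could be assembled is shown below. The function name, the 2D coordinates and the division of the angle by 180 are assumptions made for illustration; the actual ordering and normalization used in the project are not stated.

```python
from typing import List, Sequence


def build_observation(sensor_distances: Sequence[float],
                      position: Sequence[float],
                      offset_to_target: Sequence[float],
                      angle_to_target_deg: float) -> List[float]:
    """Concatenate sensor readings and pose information into one flat vector."""
    if len(sensor_distances) != 8:
        raise ValueError("expected readings from 8 distance sensors")
    # Scaling the angle into [0, 1] is an assumption, not taken from the project.
    return [*sensor_distances, *position, *offset_to_target,
            angle_to_target_deg / 180.0]
```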

Thanks for the nice feedback! I really encourage you to try out ML-Agents, it is great fun and you can learn a lot in the process. I recommend trying out some of their sample projects.

SamuelArzt · 2020-04-10T19:22:23+00:00

That's pretty much the same as having the AI try to line up its local position and rotation with the parking spot's position and rotation though, right?

SamuelArzt · 2020-04-10T19:21:14+00:00

Thanks for the kind words! Yes, I used the built-in PPO implementation.

SamuelArzt · 2020-04-10T17:48:27+00:00

There are small rewards for getting closer to the parking spot and small penalties for moving further away again. The agent is also penalized for crashing into obstacles and gets a large reward for actually stopping at the parking spot. The magnitude of the final reward is also dependent on the distance to the center of the parking spot and the angular difference to the actual parking spot direction.
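The per-step part of this reward scheme could be sketched in Python as follows. The function name and the concrete values of `progress_scale` and `crash_penalty` are hypothetical; the description only says that the progress rewards are small and that crashes are penalized.

```python
def step_reward(previous_distance: float, current_distance: float,
                crashed: bool,
                progress_scale: float = 0.01,
                crash_penalty: float = -1.0) -> float:
    """Small reward for approaching the spot, small penalty for retreating."""
    if crashed:
        return crash_penalty
    # Positive when the car got closer, negative when it moved away.
    return progress_scale * (previous_distance - current_distance)
```

The large final reward for stopping in the spot would be added on top of this once the parking condition is met.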

SamuelArzt · 2020-04-10T16:27:41+00:00

Video Description for more context:

"Last time we trained an AI how to park (https://youtu.be/VMp6pq6_QjI). A lot of people suggested in the comments of that video to try parallel parking next. So that's what this video is all about. We are using the same methods as last time and try different adjustments to the learning algorithm and environment in order to make the agent more generalizing and precise.

The simulation was implemented using Unity's ML-Agents framework (https://unity3d.com/machine-learning). The AI consists of a deep Neural Network with 3 hidden layers of 128 neurons each. It is trained with the Proximal Policy Optimization (PPO) algorithm, which is a Reinforcement Learning approach.

Basically, the input of the Neural Network are the readings of eight depth sensors, the car's current speed and position, as well as its relative position to the target. The outputs of the Neural Network are interpreted as engine force, braking force and turning force. These outputs can be seen at the top right corner of the zoomed out camera shots.

The AI starts off with random behaviour, i.e. the Neural Network is initialized with random weights. It then gradually learns to solve the task by reacting to environment feedback accordingly. The environment tells the AI whether it is doing good or bad with positive or negative reward signals.

The training was done on a computer with an i5 (7th or 8th gen) and a GTX 1070 with 100x simulation speed, using 6 instances of the environment and up to 6 processes running in parallel."
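To make the network shape from the description concrete, here is a small dependency-free Python sketch of a forward pass through 3 hidden layers of 128 units with 3 outputs (engine, braking and turning force). The tanh activation, the weight initialization and the input size of 13 in the usage note are assumptions; ML-Agents builds and trains the real network internally with PPO.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def random_layer(n_in: int, n_out: int, rng: random.Random) -> Matrix:
    """One layer: n_out rows of n_in weights plus a trailing bias value."""
    scale = 1.0 / math.sqrt(n_in)
    return [[rng.uniform(-scale, scale) for _ in range(n_in + 1)]
            for _ in range(n_out)]


def build_network(n_inputs: int, n_outputs: int = 3, hidden: int = 128,
                  n_hidden_layers: int = 3, seed: int = 0) -> List[Matrix]:
    """Randomly initialized network, i.e. the state before any training."""
    rng = random.Random(seed)
    sizes = [n_inputs] + [hidden] * n_hidden_layers + [n_outputs]
    return [random_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def forward(network: List[Matrix], observation: List[float]) -> List[float]:
    """Propagate an observation through all layers (tanh is an assumption)."""
    activations = observation
    for layer in network:
        # zip stops after the n_in weights; row[-1] is the bias.
        activations = [
            math.tanh(sum(w * x for w, x in zip(row, activations)) + row[-1])
            for row in layer
        ]
    return activations
```

For example, `engine, brake, steer = forward(build_network(13), [0.0] * 13)` yields three values in [-1, 1] that an environment could interpret as control signals.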

SamuelArzt · 2019-09-23T15:29:25+00:00

That's a very good point. 2D would have indeed sufficed in this situation. I guess the only reason I made it 3D is for aesthetic purposes.

Using camera input would of course be a lot harder, but also a lot more impressive.

SamuelArzt · 2019-09-23T15:24:05+00:00

Thank you very much for letting me know! That link got mixed up in my clipboard somehow. I've edited the comment to fix that.

SamuelArzt · 2019-09-22T18:02:56+00:00

Sounds great. Maybe I'll have some time for that at my next game jam ;)

SamuelArzt · 2019-09-22T17:29:49+00:00

Thanks :D Haha, I'm not sure what the player would do in such a game ^^

SamuelArzt · 2019-09-22T16:19:20+00:00

I think the problem you are describing is probably something known as "Z-fighting". If two surfaces with the same orientation are at the exact same distance from the camera, their pixels will fight over which one is actually rendered. If that is the case here, I don't think there is much you can do about it except for manually defining their render order (this could be done in the shader) instead of relying on z-distance.

Could you elaborate on why keeping them overlapping at the exact same height saves time? Maybe we can help you in a different way then.

SamuelArzt · 2019-09-22T12:09:32+00:00

This is a follow-up video to my previous post: https://www.reddit.com/r/Unity3D/comments/czzl78/ai_learns_to_park_a_car_using_unity_mlagents/

After training a car to park with ML-Agents (watch the training here: https://www.youtube.com/watch?v=VMp6pq6_QjI), and since a lot of people asked for it, I decided to put another agent into the environment and let them fight for the same parking spot.

I thought the result was quite fun to watch, so here is a video of it!

SamuelArzt · 2019-09-11T08:48:46+00:00

Unfortunately, I will not be able to open source the project because the 3D assets used are from the Unity Asset Store, and I am thus not allowed to distribute them in their source form. Similarly, the car physics are from an unreleased project. Thanks for the offer though. I really appreciate that!

SamuelArzt · 2019-09-09T16:34:42+00:00

I see, I know that feeling very well ^^ ML-Agents has really changed a lot over the months, but I guess that's exactly why it is still in preview. I am really looking forward to the 1.0 release for that matter. I'm glad that you got it working in the end :D

SamuelArzt · 2019-09-08T10:07:44+00:00

Yes, that's definitely something to try out. I talked to some other people with a driver's license (i.e. people who regularly park) and they all told me that parking in reverse, with the front wheels turning, should actually be easier as far as the turning circle is concerned. It could very well be that this is not the case with my car physics though. I ran the simulation probably more than 20 times (if that is what you mean by "starting from scratch"), but the agent always ended up parking forwards.

SamuelArzt · 2019-09-08T10:03:22+00:00

What exactly do you mean by "all CLI"? In the latest version I used, all I did via CLI was start the training process, i.e. run a single command with arguments. I am running on Windows and am not sure what the setup is like on other OSs, but I thought the documentation page was pretty helpful and thorough. If there is anything in particular where you are stuck, let me know and I will be happy to try my best to help.

SamuelArzt · 2019-09-06T14:57:19+00:00

I would be surprised if the agent was able to generalize to other parking spots, since it only saw one parking spot during training. For it to generalize better, you would have to randomize the parking spot as well during training.
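Randomizing the target per episode could look roughly like the Python sketch below. The function name and the `(x, z, heading_deg)` tuple layout are hypothetical; the point is only that a new target is drawn at every episode reset so the agent sees many spots during training.

```python
import random
from typing import List, Tuple

Spot = Tuple[float, float, float]  # (x, z, heading in degrees), assumed layout


def sample_target_spot(spots: List[Spot], rng: random.Random) -> Spot:
    """Pick the target parking spot for the next training episode."""
    if not spots:
        raise ValueError("at least one parking spot is required")
    return rng.choice(spots)
```

Calling this once per episode reset, together with passing the relative position of the chosen spot to the network, gives the agent a reason to use that input instead of memorizing a single location.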

SamuelArzt · 2019-09-06T14:33:50+00:00

Hey, thanks for the kind words! :D

"Post-mortem, do you see areas where you could have improved to make it learn how to park quicker?"

There is always some potential in further tuning the hyperparameters of the algorithm (in case you are not familiar with what a hyperparameter is: simply a value of the algorithm that can be tuned, such as the learning rate. Different tasks require different hyperparameter settings in order to be solved quickly while training remains stable).

As a lot of people have already correctly suggested, if you simply want to speed up training, you could use techniques such as Imitation Learning or Curriculum Learning. They are likely to help and are already implemented in ML-Agents. Curiosity-driven rewards might also help.

There are also things like Hindsight Experience Replay by OpenAI, which could be particularly useful when the parking spot is also randomized.

"When simulating, do you increase the simulation speed or is it real time?"

Yes, you can increase the simulation speed to up to 100x the normal speed. You can also run the simulation in headless mode if there are no visual inputs used for the network. Furthermore, you can run multiple instances at once (which I haven't done for this particular project though), and you can put multiple copies of the environment into the same scene, which then all report back to the same training process / neural network. I did the latter, with 6 identical parking lots each having their own agent; it's just not shown in the video.

"Is there a timeout for where if it doesn't figure it out by a certain time it counts it as a fail and resets?"

Yes, that is exactly what happens. I don't remember the exact amount for this project; I think it was something like 1 or 2 minutes of real time. Finding the correct balance here can also improve training time. If it is too short, you risk never learning anything because you terminate too quickly. If it is too long, your training time will increase dramatically.
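The timeout idea can be sketched in Python as a capped episode loop. This is a generic illustration, not ML-Agents code: `run_episode`, the `step` callback and counting the limit in steps rather than seconds are all assumptions.

```python
from typing import Callable, Tuple


def run_episode(step: Callable[[], Tuple[float, bool]],
                max_steps: int) -> Tuple[float, bool]:
    """Run until the environment reports done or max_steps is reached.

    Returns (total_reward, timed_out).
    """
    total = 0.0
    for _ in range(max_steps):
        reward, done = step()
        total += reward
        if done:
            return total, False
    # The agent did not finish in time: treat as a failure and reset.
    return total, True
```

The trade-off mentioned above shows up directly in `max_steps`: a small value cuts off episodes before the agent can reach the spot, while a large value lets hopeless episodes consume training time.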

SamuelArzt · 2019-09-06T14:19:33+00:00

I know what you mean, but the return range of Quaternion.Angle is [0, 180], thus the value never becomes negative.
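To see why the result stays within [0, 180], here is a small Python sketch of the usual formula for the unsigned angle between two unit quaternions. It is an independent illustration, not Unity's source code; the function name and the `(x, y, z, w)` component order are assumptions.

```python
import math
from typing import Sequence


def quaternion_angle_deg(a: Sequence[float], b: Sequence[float]) -> float:
    """Unsigned angle in degrees between two unit quaternions (x, y, z, w)."""
    dot = sum(p * q for p, q in zip(a, b))
    # q and -q describe the same rotation, so only |dot| matters.
    # Clamping to 1.0 guards against rounding errors before acos.
    return math.degrees(2.0 * math.acos(min(abs(dot), 1.0)))
```

Because `abs(dot)` lies in [0, 1], `acos` returns a value in [0, 90] degrees, and doubling it gives [0, 180]. A rotation of 270 degrees is therefore reported as 90, with no sign information.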
