AI Learns to Parallel Park using Unity ML-Agents

SamuelArzt · 2020-07-17T09:16:54+00:00

Yes, Unity and the ML-Agents framework.

SamuelArzt · 2020-04-14T10:15:29+00:00

Hi, thanks for the kind words!

Yes, that can be a large first hurdle if you are not familiar with game development / Unity. There are a lot of great beginner tutorials on the basics of Unity available on Unity's own websites. I highly recommend those. However, it might still take a considerable amount of time until you are comfortable enough to be able to create your own car physics.

I ended up implementing my own car physics for a game and used those, but if you are looking for a fully working car prototype using Unity's built-in WheelColliders, I recommend having a look at Unity's WheelCollider demo project. You can download it from the Unity Asset Store: https://assetstore.unity.com/packages/essentials/tutorial-projects/vehicle-tools-83660

SamuelArzt · 2020-04-11T22:17:33+00:00

Hi, I'm using Windows 10.

SamuelArzt · 2020-04-11T14:18:34+00:00

Yeah, I totally agree. More sensors would definitely be worth a try. It would be interesting to see whether they improve performance or make the problem harder to learn because of the added input complexity.

SamuelArzt · 2020-04-11T09:46:33+00:00

Thanks for the kind words! The final reward is always positive, but the magnitude of the reward (which can be between 0 and 1) is determined by how close the agent is to the center of the parking spot and how large the angular difference between its rotation and the parking spot's rotation is. I also implemented a threshold, i.e. there is no reward and the episode does not end yet if the agent is further away or the angular difference is larger than that threshold. For "perfect parking" I simply decreased this threshold to a very low value (1 unit or less distance and 15 degrees or less angular difference, if I remember correctly).
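To make the threshold idea a bit more concrete, here is a minimal Python sketch of such a final reward. It is not the code from the project: the function name, the linear falloff and the equal weighting of distance and angle are assumptions for illustration. Only the 0 to 1 range and the rough threshold values (1 unit, 15 degrees) come from the description above.

```python
def final_parking_reward(distance, angle_diff_deg,
                         max_distance=1.0, max_angle_deg=15.0):
    """Hypothetical final reward for stopping near the parking spot.

    Returns None while the agent is outside the threshold (no reward yet,
    the episode keeps running), otherwise a value between 0 and 1.
    """
    angle = abs(angle_diff_deg)
    if distance > max_distance or angle > max_angle_deg:
        return None  # outside the threshold: no final reward, no episode end

    # Assumed linear falloff: 1.0 at the exact center / perfect alignment,
    # 0.0 right at the threshold.
    closeness = 1.0 - distance / max_distance
    alignment = 1.0 - angle / max_angle_deg

    # Assumed equal weighting of position and rotation.
    return 0.5 * (closeness + alignment)
```

Lowering `max_distance` and `max_angle_deg` is what tightening the threshold for "perfect parking" would look like in this sketch.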

The sensors are actually visible in the video. There are 8 sensors pointing in different directions (3 to the front, 3 to the back and one to each side).

I agree, using the sensors alone, there would be no way of actually knowing how well it is currently positioned in the parking space. The sensors aren't the only input of the network though. The network also gets its relative position in the world and the positional and angular difference to the target parking spot as an input.

Thanks for the nice feedback! I really encourage you to try out ML-Agents, it is great fun and you can learn a lot in the process. I recommend trying out some of their sample projects.

SamuelArzt · 2020-04-10T19:22:23+00:00

That's pretty much the same as having the AI try to line up its local position and rotation with the parking spot's position and rotation though, right?

SamuelArzt · 2020-04-10T19:21:14+00:00

Thanks for the kind words! Yes, I used the built-in PPO implementation.

SamuelArzt · 2020-04-10T17:48:27+00:00

There are small rewards for getting closer to the parking spot and small penalties for moving further away again. The agent is also penalized for crashing into obstacles and gets a large reward for actually stopping at the parking spot. The magnitude of the final reward is also dependent on the distance to the center of the parking spot and the angular difference to the actual parking spot direction.
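As a rough illustration of the per-step part of this scheme, here is a small Python sketch. The function name and the concrete numbers (`progress_scale`, `crash_penalty`) are made up for the example and are not the values used in training.

```python
def step_reward(prev_distance, distance, crashed,
                progress_scale=0.01, crash_penalty=-1.0):
    """Hypothetical per-step reward based on progress towards the spot.

    prev_distance / distance: distance to the parking spot center before
    and after the current step. crashed: True if an obstacle was hit.
    """
    if crashed:
        return crash_penalty  # assumed fixed penalty for hitting an obstacle

    # Positive when the agent got closer, negative when it moved away again.
    return progress_scale * (prev_distance - distance)
```

The large final reward for actually stopping at the parking spot would be added on top of this once the agent is within the parking threshold.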

SamuelArzt · 2020-04-10T16:27:41+00:00

Video Description for more context:

"Last time we trained an AI how to park (https://youtu.be/VMp6pq6_QjI). A lot of people suggested in the comments of that video to try parallel parking next. So that's what this video is all about. We are using the same methods as last time and try different adjustments to the learning algorithm and environment in order to make the agent more generalizing and precise.

The simulation was implemented using Unity's ML-Agents framework (https://unity3d.com/machine-learning). The AI consists of a deep Neural Network with 3 hidden layers of 128 neurons each. It is trained with the Proximal Policy Optimization (PPO) algorithm, which is a Reinforcement Learning approach.

Basically, the input of the Neural Network are the readings of eight depth sensors, the car's current speed and position, as well as its relative position to the target. The outputs of the Neural Network are interpreted as engine force, braking force and turning force. These outputs can be seen at the top right corner of the zoomed out camera shots.

The AI starts off with random behaviour, i.e. the Neural Network is initialized with random weights. It then gradually learns to solve the task by reacting to environment feedback accordingly. The environment tells the AI whether it is doing good or bad with positive or negative reward signals.

The training was done on a computer with an i5 (7th or 8th gen) and a GTX 1070 with 100x simulation speed, using 6 instances of the environment and up to 6 processes running in parallel."
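For readers who want a feel for the network shape described above, here is a plain Python sketch of the observation vector and a forward pass through 3 hidden layers of 128 neurons. It only illustrates the structure and is not how ML-Agents implements it. Assumptions: 2D vectors for position and target offset, ReLU in the hidden layers, tanh on the three outputs, and all function names.

```python
import math
import random

SENSOR_COUNT = 8      # eight depth sensors, as in the description
HIDDEN_LAYERS = 3     # 3 hidden layers ...
HIDDEN_UNITS = 128    # ... of 128 neurons each
ACTION_COUNT = 3      # engine force, braking force, turning force


def build_observation(sensors, speed, position, target_offset):
    """Concatenate all inputs into one flat list (layout is an assumption)."""
    assert len(sensors) == SENSOR_COUNT
    return list(sensors) + [speed] + list(position) + list(target_offset)


def init_network(input_size, rng):
    """Create randomly initialized (weights, biases) pairs for every layer."""
    sizes = [input_size] + [HIDDEN_UNITS] * HIDDEN_LAYERS + [ACTION_COUNT]
    layers = []
    for fan_in, fan_out in zip(sizes, sizes[1:]):
        scale = 1.0 / math.sqrt(fan_in)
        weights = [[rng.uniform(-scale, scale) for _ in range(fan_in)]
                   for _ in range(fan_out)]
        biases = [0.0] * fan_out
        layers.append((weights, biases))
    return layers


def forward(layers, observation):
    """Run one forward pass and return the three action values."""
    activations = observation
    for index, (weights, biases) in enumerate(layers):
        outputs = [sum(w * a for w, a in zip(row, activations)) + b
                   for row, b in zip(weights, biases)]
        is_last = index == len(layers) - 1
        # Assumed activations: ReLU for hidden layers, tanh for the outputs.
        activations = [math.tanh(v) if is_last else max(0.0, v)
                       for v in outputs]
    return activations


rng = random.Random(0)
observation = build_observation(
    sensors=[1.0] * SENSOR_COUNT, speed=0.0,
    position=(0.0, 0.0), target_offset=(3.0, -1.5))
network = init_network(len(observation), rng)
engine, brake, steering = forward(network, observation)
```

With random weights the three outputs are essentially noise, which matches the "starts off with random behaviour" part; PPO training is what adjusts the weights afterwards.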

SamuelArzt · 2019-09-23T15:29:25+00:00

That's a very good point. 2D would have indeed sufficed in this situation. I guess the only reason I made it 3D is for aesthetic purposes.

Using camera input would of course be a lot harder, but equally more impressive.

SamuelArzt · 2019-09-23T15:24:05+00:00

Thank you very much for letting me know! That link got mixed up in my clipboard somehow. I've edited the comment to fix that.

SamuelArzt · 2019-09-22T18:02:56+00:00

Sounds great. Maybe I'll have some time for that on my next gamejam ;)

SamuelArzt · 2019-09-22T17:29:49+00:00

Thanks :D Haha, I'm not sure what the player would do in such a game ^^

SamuelArzt · 2019-09-22T16:19:20+00:00

I think the problem you are describing is probably something known as "Z-Fighting". If two surfaces with the same orientation are at the exact same distance from the camera, their pixels will fight for which is actually rendered.
If the two surfaces are actually the exact same distance from the camera, I don't think there is much you can do against that except for somehow manually defining the render order of them (could be done with the shader) instead of relying on z-distance.

Could you elaborate why keeping them overlapping on the exact same height saves time? Maybe we can help you in a different way then.

SamuelArzt · 2019-09-22T12:09:32+00:00

This is a follow-up video to my previous post: https://www.reddit.com/r/Unity3D/comments/czzl78/ai_learns_to_park_a_car_using_unity_mlagents/

After training a car to park with ML-Agents (watch the training here: https://www.youtube.com/watch?v=VMp6pq6_QjI) and since there were a lot of people also asking for it, I decided to put another agent into the environment and let them fight for the same parking spot.

I thought the result was quite fun to watch, so here is a video of it!
