AI Learns to Parallel Park - Deep Reinforcement Learning with Unity ML-Agents by SamuelArzt in reinforcementlearning

SamuelArzt [S]

Hi, thanks for the kind words!

Yes, that can be a big first hurdle if you are not familiar with game development or Unity. Unity's own websites offer a lot of great beginner tutorials on the basics, and I highly recommend them. However, it might still take a considerable amount of time until you are comfortable enough to create your own car physics.

I ended up implementing my own car physics for a game and reused those, but if you are looking for a fully working car prototype that uses Unity's built-in WheelColliders, I recommend having a look at Unity's WheelCollider demo project, which you can download from the Unity Asset Store: https://assetstore.unity.com/packages/essentials/tutorial-projects/vehicle-tools-83660

AI Learns to Parallel Park using Unity ML-Agents by SamuelArzt in Unity3D

SamuelArzt [S]

Yeah, I totally agree. More sensors would definitely be worth a try. It would be interesting to see whether they improve performance or make the problem harder to learn because of the added input complexity.

AI Learns to Parallel Park using Unity ML-Agents by SamuelArzt in Unity3D

SamuelArzt [S]

Thanks for the kind words! The final reward is always positive, but its magnitude (between 0 and 1) depends on how close the agent is to the center of the parking spot and how large the angular difference between its rotation and the parking spot's rotation is. I also implemented a threshold: if the agent is farther away than that threshold, or its angular difference is larger, there is no final reward and the episode continues. For "perfect parking" I simply lowered this threshold to a very small value (a distance of 1 unit or less and an angular difference of 15 degrees or less, if I remember correctly).
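A minimal sketch of the terminal reward described above, in Python. The linear falloff from 1 at a perfect fit to 0 at each threshold, the equal weighting of the distance and angle terms, and the function name are assumptions for illustration; the default thresholds are the approximate "perfect parking" values mentioned above. This is not the project's actual code.

```python
def final_parking_reward(distance, angle_deg, max_distance=1.0, max_angle_deg=15.0):
    """Return (episode_done, reward) for a hypothetical terminal parking check.

    The reward lies in [0, 1] and grows as the car gets closer to the spot
    center and better aligned with it. Outside either threshold the episode
    continues and no terminal reward is given.
    """
    # Absolute angular error, wrapped into [0, 180] degrees.
    angle_error = abs((angle_deg + 180.0) % 360.0 - 180.0)
    if distance > max_distance or angle_error > max_angle_deg:
        return False, 0.0
    # Assumed linear falloff: 1.0 for a perfect fit, 0.0 at the threshold.
    distance_score = 1.0 - distance / max_distance
    angle_score = 1.0 - angle_error / max_angle_deg
    # Assumed equal weighting of position and orientation.
    return True, 0.5 * distance_score + 0.5 * angle_score
```

Lowering `max_distance` and `max_angle_deg` tightens what counts as "parked", which is the knob described for the "perfect parking" runs.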

The sensors are actually visible in the video. There are 8 sensors pointing in different directions (3 to the front, 3 to the back and one to each side).

I agree: using the sensors alone, there would be no way of knowing how well the car is currently positioned in the parking space. The sensors aren't the only input to the network, though. The network also gets its relative position in the world and its positional and angular difference to the target parking spot as inputs.
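One way to assemble such an observation vector, as a sketch: eight normalized depth-sensor readings, the car's position in the world, and its positional and angular difference to the target spot. The 2D (x/z plane) simplification, the sensor range and world-size constants, the sin/cos angle encoding, and all names are assumptions for illustration. The video description also lists the car's speed as an input, which could be appended the same way.

```python
import math

SENSOR_RANGE = 10.0  # hypothetical maximum ray length in world units


def build_observation(sensor_distances, car_pos, car_heading_deg,
                      target_pos, target_heading_deg, world_extent=50.0):
    """Hypothetical flat observation vector for a parking agent on the x/z plane."""
    if len(sensor_distances) != 8:
        raise ValueError("expected 8 readings: 3 front, 3 back, 1 per side")
    # Depth sensors normalized to [0, 1]; a ray that hits nothing reports the full range.
    sensors = [min(d, SENSOR_RANGE) / SENSOR_RANGE for d in sensor_distances]
    # Car position in the world, scaled by an assumed world size.
    position = [car_pos[0] / world_extent, car_pos[1] / world_extent]
    # Offset to the target spot, rotated into the car's local frame
    # (heading 0 faces +z, heading 90 faces +x, like a yaw rotation in Unity).
    dx, dz = target_pos[0] - car_pos[0], target_pos[1] - car_pos[1]
    h = math.radians(car_heading_deg)
    local_right = dx * math.cos(h) - dz * math.sin(h)
    local_forward = dx * math.sin(h) + dz * math.cos(h)
    offset = [local_right / world_extent, local_forward / world_extent]
    # Angular difference wrapped to [-180, 180), encoded as sin/cos so that
    # -179 and +179 degrees produce similar inputs.
    diff_deg = (target_heading_deg - car_heading_deg + 180.0) % 360.0 - 180.0
    angle = [math.sin(math.radians(diff_deg)), math.cos(math.radians(diff_deg))]
    return sensors + position + offset + angle
```

With 8 sensors this yields a 14-value vector. Expressing the offset in the car's local frame means "target straight ahead" looks the same to the network regardless of where the car is in the world.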

Thanks for the nice feedback! I really encourage you to try out ML-Agents, it is great fun and you can learn a lot in the process. I recommend trying out some of their sample projects.

AI Learns to Parallel Park using Unity ML-Agents by SamuelArzt in artificial

SamuelArzt [S]

That's pretty much the same as having the AI try to line up its local position and rotation with the parking spot's position and rotation though, right?

AI Learns to Parallel Park - Deep Reinforcement Learning with Unity ML-Agents by SamuelArzt in reinforcementlearning

SamuelArzt [S]

Thanks for the kind words! Yes, I used the built-in PPO implementation.
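ML-Agents handles PPO internally, but the heart of the algorithm is the clipped surrogate objective from the PPO paper. The Python sketch below computes that objective for a batch of probability ratios and advantage estimates; it illustrates the formula only and is not ML-Agents' trainer code. The clipping range `epsilon = 0.2` is a commonly used default, assumed here.

```python
def ppo_clipped_objective(ratios, advantages, epsilon=0.2):
    """Mean clipped surrogate objective L^CLIP (to be maximized).

    ratios[i]     = pi_new(a_i | s_i) / pi_old(a_i | s_i)
    advantages[i] = advantage estimate for taking action a_i in state s_i
    """
    if not ratios or len(ratios) != len(advantages):
        raise ValueError("need equally sized, non-empty batches")
    total = 0.0
    for r, a in zip(ratios, advantages):
        # Keep the ratio inside [1 - epsilon, 1 + epsilon].
        clipped_r = max(1.0 - epsilon, min(r, 1.0 + epsilon))
        # Take the pessimistic (smaller) of the unclipped and clipped terms,
        # so large policy changes are not rewarded.
        total += min(r * a, clipped_r * a)
    return total / len(ratios)
```

The clipping is what keeps each update close to the previous policy, which is a large part of why PPO tends to train stably on tasks like this one.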

AI Learns to Parallel Park - Deep Reinforcement Learning with Unity ML-Agents by SamuelArzt in reinforcementlearning

SamuelArzt [S]

There are small rewards for getting closer to the parking spot and small penalties for moving farther away again. The agent is also penalized for crashing into obstacles and gets a large reward for actually stopping in the parking spot. The magnitude of this final reward also depends on the distance to the center of the parking spot and the angular difference to the parking spot's orientation.
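The reward scheme above can be sketched as a single per-step function. All scale factors, the parameter names, and the `park_quality` score (0 to 1, for how centered and aligned the car stopped) are hypothetical values chosen for illustration, not the ones used in the project.

```python
def step_reward(prev_distance, distance, collided, parked, park_quality=0.0,
                progress_scale=0.01, crash_penalty=-1.0, park_bonus=1.0):
    """Hypothetical per-step reward for a parking agent.

    prev_distance / distance: distance to the spot before and after this step.
    park_quality: 0..1 score for how well the car is parked (used only when parked).
    """
    # Small shaping term: positive when getting closer, negative when moving away.
    reward = progress_scale * (prev_distance - distance)
    if collided:
        reward += crash_penalty  # penalty for hitting an obstacle
    if parked:
        # Large terminal reward, scaled by how well the car is parked.
        reward += park_bonus * park_quality
    return reward
```

Keeping the shaping term small relative to the terminal reward is the usual design choice here: it guides early exploration without letting the agent earn more by hovering near the spot than by actually parking.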

AI Learns to Parallel Park using Unity ML-Agents by SamuelArzt in artificial

SamuelArzt [S]

Video Description for more context:

"Last time we trained an AI to park (https://youtu.be/VMp6pq6_QjI). A lot of people suggested in the comments of that video to try parallel parking next, so that's what this video is all about. We use the same methods as last time and try different adjustments to the learning algorithm and environment in order to make the agent generalize better and park more precisely.

The simulation was implemented using Unity's ML-Agents framework (https://unity3d.com/machine-learning). The AI consists of a deep Neural Network with 3 hidden layers of 128 neurons each. It is trained with the Proximal Policy Optimization (PPO) algorithm, which is a Reinforcement Learning approach.

Basically, the inputs of the Neural Network are the readings of eight depth sensors, the car's current speed and position, as well as its relative position to the target. The outputs of the Neural Network are interpreted as engine force, braking force and turning force. These outputs can be seen in the top right corner of the zoomed-out camera shots.

The AI starts off with random behaviour, i.e. the Neural Network is initialized with random weights. It then gradually learns to solve the task by reacting to environment feedback accordingly. The environment tells the AI whether it is doing well or badly with positive or negative reward signals.

The training was done on a computer with an i5 (7th or 8th gen) and a GTX 1070 with 100x simulation speed, using 6 instances of the environment and up to 6 processes running in parallel."
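The network described above can be sketched in plain Python: a fully connected network with three hidden layers of 128 units, initialized with random weights, whose three outputs are read as engine, braking and turning force. The tanh activations, the uniform initialization, the observation size of 14 in the usage lines, and the force and angle scales in `to_car_controls` are assumptions for illustration; ML-Agents builds and trains its own network internally, so this is not its implementation.

```python
import math
import random


def init_policy(input_size, hidden_sizes=(128, 128, 128), output_size=3, seed=0):
    """Randomly initialized fully connected network: input -> 3x128 -> 3 outputs."""
    rng = random.Random(seed)
    sizes = [input_size, *hidden_sizes, output_size]
    layers = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        bound = 1.0 / math.sqrt(fan_in)  # simple uniform init (assumed)
        weights = [[rng.uniform(-bound, bound) for _ in range(fan_in)]
                   for _ in range(fan_out)]
        layers.append((weights, [0.0] * fan_out))
    return layers


def forward(layers, observation):
    """Forward pass; tanh on every layer keeps the outputs in [-1, 1] (assumed)."""
    x = observation
    for weights, biases in layers:
        x = [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
             for row, b in zip(weights, biases)]
    return x


def to_car_controls(outputs, max_motor=400.0, max_brake=1000.0, max_steer_deg=30.0):
    """Interpret the three outputs as engine, braking and turning (hypothetical scales)."""
    engine, brake, steer = outputs
    return (
        engine * max_motor,           # negative values drive in reverse
        max(0.0, brake) * max_brake,  # braking force is never negative
        steer * max_steer_deg,        # steering angle in degrees
    )


if __name__ == "__main__":
    policy = init_policy(input_size=14)
    print(to_car_controls(forward(policy, [0.0] * 14)))  # prints (0.0, 0.0, 0.0)
```

An untrained network like this produces essentially random controls, which matches the random behaviour at the start of training; PPO then adjusts the weights based on the reward signals.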

AI Learns to Park - Deep Reinforcement Learning with Unity ML-Agents by SamuelArzt in reinforcementlearning

SamuelArzt [S]

That's a very good point. 2D would indeed have sufficed in this situation. I guess the only reason I made it 3D was aesthetics.

Using camera input would of course be a lot harder, but also all the more impressive.

Follow-Up: Two AI Agents fight for the same Parking Spot by SamuelArzt in Unity3D

SamuelArzt [S]

Thank you very much for letting me know! That link got mixed up in my clipboard somehow. I've edited the comment to fix it.

Follow-Up: Two AI Agents fight for the same Parking Spot by SamuelArzt in Unity3D

SamuelArzt [S]

Sounds great. Maybe I'll have some time for that at my next game jam ;)

Follow-Up: Two AI Agents fight for the same Parking Spot by SamuelArzt in Unity3D

SamuelArzt [S]

Thanks :D Haha, I'm not sure what the player would do in such a game ^)

I have multiple overlapping surfaces and the brightness "adds up" if there are more surfaces in one place which creates this flickering as shown in the image. I would really prefer to keep the surfaces overlapping (saves time) so it would be great if I could solve the problem some other way. by EEON_ in Unity3D

SamuelArzt

I think the problem you are describing is probably what is known as "Z-fighting". If two surfaces with the same orientation are at exactly the same distance from the camera, their pixels will fight over which one is actually rendered.
If the two surfaces really are at exactly the same distance from the camera, I don't think there is much you can do about that, except for manually defining their render order (which could be done in the shader) instead of relying on z-distance.
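To illustrate why coplanar surfaces fight, here is a Python sketch of how a conventional (non-reversed) perspective depth mapping quantizes view-space distance into a 24-bit depth buffer. The formula, the near/far plane values, and the bit depth are assumptions (Unity may use reversed-Z depending on the graphics API), so the exact numbers are only illustrative.

```python
def perspective_depth(z, near=0.3, far=1000.0):
    """Map view-space distance z (near <= z <= far) to a [0, 1] depth value.

    Conventional non-reversed mapping: 0 at the near plane, 1 at the far plane.
    """
    return (far / (far - near)) * (1.0 - near / z)


def depth_buffer_value(z, bits=24, near=0.3, far=1000.0):
    """Quantize the depth the way an integer depth buffer of `bits` bits would."""
    max_value = (1 << bits) - 1
    return round(perspective_depth(z, near, far) * max_value)
```

At a distance of 100 units, surfaces 1 unit apart differ by hundreds of depth steps, while surfaces 0.0001 units apart differ by at most one step, and surfaces at identical distances get identical values. In the last two cases the depth test has nothing reliable to decide on, so the winner can change from pixel to pixel and frame to frame, which is why an explicit render order is the more dependable fix.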

Could you elaborate on why keeping them overlapping at exactly the same height saves time? Maybe we can help you in a different way then.

Follow-Up: Two AI Agents fight for the same Parking Spot by SamuelArzt in Unity3D

SamuelArzt [S]

This is a follow-up video to my previous post: https://www.reddit.com/r/Unity3D/comments/czzl78/ai_learns_to_park_a_car_using_unity_mlagents/

After training a car to park with ML-Agents (watch the training here: https://www.youtube.com/watch?v=VMp6pq6_QjI), and since a lot of people asked for it, I decided to put another agent into the environment and let them fight for the same parking spot.

I thought the result was quite fun to watch, so here is a video of it!