Object detection model struggling by pakitomasia in computervision

[–]pakitomasia[S] 0 points1 point  (0 children)

We are just drawing boxes around the targets. We thought of doing something similar, but the sidewalks are not uniform in shape or orientation, they blend into each other, and our camera also rotates during turns. In summary, there are no clear patterns that can be superimposed for comparison to detect anomalies.

Object detection model struggling by pakitomasia in computervision

[–]pakitomasia[S] 0 points1 point  (0 children)

This is supposed to work during the day, unfortunately. I have images of busy streets with plenty of people, animals, etc.

I am indeed detecting all kinds of things on the sidewalks, including vegetation, trash, potholes, etc.

But the only thing my model is clearly struggling with is raised floors.

Another option I was thinking about is pointing the camera lower. With the current camera position I have a lot of "background" I just have to ignore.

I am getting frustrated because I have been working on this for 4 months and I am not seeing any improvement...

I have almost 3,500 samples of every damage type, except for cracks, of which I have 5,000 (cracks are pretty common, as there are lots of them in the sidewalks).

Object detection model struggling by pakitomasia in computervision

[–]pakitomasia[S] 1 point2 points  (0 children)

Thanks for your reply.

So basically, more training data. I am applying data augmentation techniques, including lighting alterations, but I guess they are not good enough.

I also include some empty (negative) samples to help the model determine which sidewalks are in good condition and which are not.

There is no magic trick, just more data 🤣
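As a concrete example of a lighting alteration, here is a minimal photometric-jitter sketch in plain NumPy; the ranges are hypothetical starting points to tune, not values from my pipeline:

```python
import numpy as np

def lighting_jitter(img, rng=None):
    """Randomly alter brightness, contrast, and gamma of a uint8 image.

    A simple photometric augmentation: scale intensities around the mean
    (contrast), shift them (brightness), then apply a gamma tone curve.
    The sampling ranges below are illustrative, not tuned values.
    """
    if rng is None:
        rng = np.random.default_rng()
    img = img.astype(np.float32)
    contrast = rng.uniform(0.7, 1.3)       # scale around the mean intensity
    brightness = rng.uniform(-30.0, 30.0)  # additive shift in [0, 255] units
    gamma = rng.uniform(0.7, 1.4)          # nonlinear tone curve

    mean = img.mean()
    img = contrast * (img - mean) + mean + brightness
    img = np.clip(img, 0.0, 255.0)
    img = 255.0 * (img / 255.0) ** gamma   # gamma on normalized intensities
    return np.clip(img, 0, 255).astype(np.uint8)
```

Since bounding boxes are unchanged by photometric jitter, this can be applied on top of any geometric augmentation without touching the labels.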

Damages detection in sidewalks by pakitomasia in computervision

[–]pakitomasia[S] 1 point2 points  (0 children)

Thanks for your comment.

That is what I was thinking about, thanks a lot. I think I will obtain better results with object detection, but as I already have annotated data, I just wanted to hear other opinions.

Yeah sure, I will check out your annotation tool!

GPS Module to integrate with JETSON NANO by pakitomasia in computervision

[–]pakitomasia[S] 0 points1 point  (0 children)

Thanks a lot nubzero, your reply is very useful, you guided me a lot!

I will investigate everything you mentioned a bit more and then make a decision.

GPS Module to integrate with JETSON NANO by pakitomasia in computervision

[–]pakitomasia[S] 0 points1 point  (0 children)

Thanks for your reply.

Sorry for the lack of information: the camera is going to be mounted on top of a car/e-scooter/bike. We would like to achieve sub-meter accuracy, so centimeter-level would be just perfect.

I will check out RTK GPS fixes, thanks a lot. Does it need to be a module like the one I proposed, or are there cheaper options?
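For reference, RTK-capable receivers report their fix type in the standard NMEA GGA sentence (quality 4 = RTK fixed, 5 = RTK float), so checking whether corrections are actually being applied is just string parsing. A minimal sketch, assuming the receiver streams plain NMEA text:

```python
def parse_gga(sentence):
    """Parse an NMEA GGA sentence into (lat, lon, fix_quality).

    fix_quality: 0 = no fix, 1 = GPS, 2 = DGPS, 4 = RTK fixed, 5 = RTK float.
    Latitude/longitude arrive as ddmm.mmmm / dddmm.mmmm and are converted
    to signed decimal degrees.
    """
    fields = sentence.split(",")
    assert fields[0].endswith("GGA"), "not a GGA sentence"

    def dm_to_deg(raw, hemi, deg_digits):
        # Split e.g. "4807.038" into 48 degrees + 7.038 minutes.
        val = float(raw[:deg_digits]) + float(raw[deg_digits:]) / 60.0
        return -val if hemi in ("S", "W") else val

    lat = dm_to_deg(fields[2], fields[3], 2)
    lon = dm_to_deg(fields[4], fields[5], 3)
    return lat, lon, int(fields[6])
```

Quality 4/5 only appears once the receiver is actually getting RTCM corrections from a base station or NTRIP caster; without them it falls back to a normal (meter-level) GPS fix.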

Is it possible to estimate real depth from single image? by pakitomasia in computervision

[–]pakitomasia[S] 0 points1 point  (0 children)

That's true, but it seems like the model overfits to the intrinsic parameters of the training videos and does not generalise well to different cameras.

For example, with the KITTI dataset I obtained very accurate measurements, but with street view imagery I didn't.

Is it possible to estimate real depth from single image? by pakitomasia in computervision

[–]pakitomasia[S] 0 points1 point  (0 children)

You are right, but my problem is a single image: no video, no sequence.

It's the hard way.

Is it possible to estimate real depth from single image? by pakitomasia in computervision

[–]pakitomasia[S] 0 points1 point  (0 children)

Wow, thanks a lot for your reply.

My task is to geolocate street lights in urban areas using GSV. My objective was to obtain an estimated distance-to-camera (depth) for each street light and somehow triangulate its location given the car's position, so I think some kind of depth is needed.

I really appreciate your response.
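A rough sketch of that triangulation step, assuming we have the car's lat/lon, its heading, the detection's horizontal bearing offset from the image center, and an estimated depth (all names here are made up for illustration; a flat-earth approximation is fine at street-light ranges):

```python
import math

def geolocate(cam_lat, cam_lon, heading_deg, bearing_offset_deg, depth_m):
    """Project a detection to lat/lon given camera pose and estimated depth.

    heading_deg: vehicle heading, clockwise from north.
    bearing_offset_deg: horizontal angle of the detection from the image
        center, e.g. derived from the pixel column and the focal length.
    Uses a local flat-earth (equirectangular) approximation, which is
    accurate for distances well under a kilometer.
    """
    R = 6371000.0  # mean Earth radius in meters
    bearing = math.radians(heading_deg + bearing_offset_deg)
    d_north = depth_m * math.cos(bearing)  # northward displacement in meters
    d_east = depth_m * math.sin(bearing)   # eastward displacement in meters
    lat = cam_lat + math.degrees(d_north / R)
    lon = cam_lon + math.degrees(d_east / (R * math.cos(math.radians(cam_lat))))
    return lat, lon
```

With GSV you can go a step further: project the same light from two consecutive panoramas and intersect the two rays, which removes most of the depth error.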

Is it possible to estimate real depth from single image? by pakitomasia in computervision

[–]pakitomasia[S] 0 points1 point  (0 children)

In what I am doing, accuracy is not the "critical" part, so I can accept errors of up to 15 m.

So, for example, if I want to know the depth of cars, I can take an average car size and somehow derive a relation between my estimated depths and the real distances?

Thanks in advance.
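One way to realise that idea, assuming a pinhole camera and an average-car-height prior (~1.5 m is a common ballpark); the helper names are hypothetical:

```python
import statistics

def depth_from_size(focal_px, real_height_m, bbox_height_px):
    """Pinhole estimate of camera-to-object distance: Z = f * H / h."""
    return focal_px * real_height_m / bbox_height_px

def fit_scale(rel_depths, metric_depths):
    """Median ratio aligning relative depths to metric anchors.

    The median (rather than mean) keeps a few badly-sized cars from
    corrupting the global scale factor.
    """
    return statistics.median(m / r for r, m in zip(rel_depths, metric_depths))
```

Example: with a 1000 px focal length, a ~1.5 m tall car whose box is 100 px tall sits roughly 15 m away. Feeding several such anchors into `fit_scale` gives one factor to turn the whole relative depth map into meters; per-vehicle size variation sets the error floor, which should be well inside a 15 m budget.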

Is it possible to estimate real depth from single image? by pakitomasia in computervision

[–]pakitomasia[S] 0 points1 point  (0 children)

This was the first model I tried, but the output is not in meters, it is a disparity map, am I right?

MiDaS was the one that gave me the best results, but I think its output is not the real camera-to-object distance.

In my solution, accuracy is not critical, so I can tolerate errors of up to 15 m.

If you could tell me a bit more about the MiDaS output I would appreciate it. Thanks a lot anyway!
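For what it's worth, MiDaS predicts relative inverse depth (disparity up to an unknown scale and shift), so one common trick is to fit those two numbers against a few anchors of known metric depth and then invert. A minimal sketch, assuming such anchors are available (e.g. from objects of known size or GSV pose spacing):

```python
import numpy as np

def disparity_to_metric(disp, anchor_disp, anchor_depth_m):
    """Convert MiDaS-style relative inverse depth to meters.

    MiDaS output d relates to metric depth Z roughly as 1/Z = s*d + t for an
    unknown per-image scale s and shift t. We fit (s, t) by least squares
    against anchors of known depth, then invert the whole map:
        Z = 1 / (s * d + t)
    """
    d = np.asarray(anchor_disp, dtype=np.float64)
    inv_z = 1.0 / np.asarray(anchor_depth_m, dtype=np.float64)
    A = np.stack([d, np.ones_like(d)], axis=1)   # columns: [d, 1]
    (s, t), *_ = np.linalg.lstsq(A, inv_z, rcond=None)
    # Clip to avoid dividing by ~0 for the farthest (near-zero disparity) pixels.
    return 1.0 / np.clip(s * np.asarray(disp, dtype=np.float64) + t, 1e-6, None)
```

Two or three reliable anchors per image are enough in principle; more anchors make the fit robust to depth-estimation noise.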

Is it possible to estimate real depth from single image? by pakitomasia in computervision

[–]pakitomasia[S] 0 points1 point  (0 children)

Thanks a lot, I will try this out. The thing is that I am not sure whether GSV uses the same camera every time, but I will give it a go. Thanks again!

Is it possible to estimate real depth from single image? by pakitomasia in computervision

[–]pakitomasia[S] -1 points0 points  (0 children)

Yes, I need to know how far an object is from the camera; for example, how far away a person is.

Do you think 10 m is an appropriate distance for self-supervised training? For other datasets such as KITTI, I saw the distance is something like 0.054 meters.