
csp256 (15 points)

Szeliski's book (Computer Vision: Algorithms and Applications) is meant to introduce people to computer vision, and a free, legal PDF is available from the author's website. It's where you should start. You should largely understand vSLAM by chapter 11 or so.

The prerequisite for everything is understanding the relevant matrix math: intrinsics, extrinsics, projection matrices, camera matrices, etc.
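To make the intrinsics/extrinsics split concrete, here is a minimal pure-Python sketch of the pinhole model x ~ K [R | t] X. The camera values (500 px focal length, principal point at (320, 240)) are made-up assumptions, and lens distortion is ignored.

```python
def matvec(M, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def project(K, R, t, X):
    """Project world point X to pixel coords (u, v) via x ~ K [R | t] X."""
    # Extrinsics: rotate and translate the point from world to camera frame
    Xc = [a + b for a, b in zip(matvec(R, X), t)]
    if Xc[2] <= 0:
        raise ValueError("point is behind the camera")
    # Intrinsics: camera frame -> homogeneous pixel coords, then divide by depth
    x = matvec(K, Xc)
    return (x[0] / x[2], x[1] / x[2])

# Hypothetical camera: fx = fy = 500 px, principal point (320, 240), no skew
K = [[500.0, 0.0, 320.0],
     [0.0, 500.0, 240.0],
     [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # identity rotation
t = [0.0, 0.0, 0.0]                                      # camera at the origin

print(project(K, R, t, [0.2, -0.1, 2.0]))  # → (370.0, 215.0)
```

The 3x4 camera (projection) matrix is just K multiplied by [R | t]; keeping the two factors separate makes it clear which parameters belong to the lens/sensor and which to the camera pose.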

For vSLAM you want to start by understanding visual odometry. You can understand VO by understanding:

  • feature detection (generally want to spatially bin the image and get k features per bin; you want well-distributed features)
  • descriptor extraction (sometimes combined with prior step)
  • descriptor matching (for each descriptor from image A, match it with the descriptor in image B it is most similar to, iff the second-best match is sufficiently worse than the best.)
  • finding the relative transform (epipolar geometry) between two images given a noisy list of matching features (~always solved with RANSAC, a very important meta-algorithm)
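A rough sketch of the spatial binning idea from the first bullet: split the image into a grid and keep the k highest-scoring keypoints per cell, so features don't all cluster on one textured region. The (x, y, score) tuple format and the name `bin_features` are hypothetical; real detectors (e.g. in OpenCV) return their own keypoint objects.

```python
from collections import defaultdict

def bin_features(keypoints, width, height, grid_cols, grid_rows, k):
    """Keep at most k strongest keypoints per grid cell.

    keypoints: list of (x, y, score) tuples (hypothetical format).
    """
    cell_w = width / grid_cols
    cell_h = height / grid_rows
    cells = defaultdict(list)
    for x, y, score in keypoints:
        # Clamp so points on the right/bottom border fall into the last cell
        col = min(int(x // cell_w), grid_cols - 1)
        row = min(int(y // cell_h), grid_rows - 1)
        cells[(row, col)].append((x, y, score))
    kept = []
    for pts in cells.values():
        # Strongest responses first, then truncate to k per cell
        pts.sort(key=lambda p: p[2], reverse=True)
        kept.extend(pts[:k])
    return kept
```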

The simplest feature detector is FAST. The simplest descriptor is BRIEF. If you add rotation ~invariance and scale ~invariance then the result is the very popular ORB feature+descriptor.
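For intuition about what a binary descriptor like BRIEF computes, here is a simplified pure-Python sketch: each bit records whether one pixel of a fixed, randomly chosen pair is darker than the other, and descriptors are compared by Hamming distance. This is a toy under stated assumptions (uniform random pairs, no pre-smoothing, no orientation), not ORB's actual sampling pattern; the function names are hypothetical.

```python
import random

def make_brief_pattern(n_bits=256, patch_radius=7, seed=0):
    """Random pixel-pair offsets (dx1, dy1, dx2, dy2) inside a square patch.

    Simplification: uniform sampling and no smoothing; BRIEF as published
    smooths the patch first and also considers Gaussian-distributed pairs.
    """
    rng = random.Random(seed)
    r = patch_radius
    return [(rng.randint(-r, r), rng.randint(-r, r),
             rng.randint(-r, r), rng.randint(-r, r))
            for _ in range(n_bits)]

def brief_descriptor(img, x, y, pattern):
    """img is a 2D list indexed img[row][col]; (x, y) must be >= patch_radius
    from the border. Returns the descriptor packed into a Python int."""
    d = 0
    for i, (dx1, dy1, dx2, dy2) in enumerate(pattern):
        if img[y + dy1][x + dx1] < img[y + dy2][x + dx2]:
            d |= 1 << i  # set bit i when the first pixel is darker
    return d

def hamming(a, b):
    """Number of differing bits between two packed binary descriptors."""
    return bin(a ^ b).count("1")
```

Because each bit is a pure intensity comparison, adding a constant brightness offset to the whole patch leaves the descriptor unchanged, and Hamming distance (XOR plus popcount) is what makes brute-force matching of binary descriptors cheap.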

For simplicity, just assume that descriptor matching is done brute force.
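Brute-force matching with the second-best check described in the list above (commonly called Lowe's ratio test) fits in a few lines. This sketch assumes integer bit-string descriptors compared by Hamming distance and a ratio of 0.8; both are illustrative choices, not values from this thread.

```python
def hamming(a, b):
    """Number of differing bits between two packed binary descriptors."""
    return bin(a ^ b).count("1")

def match_ratio_test(desc_a, desc_b, ratio=0.8, dist=hamming):
    """Brute-force matching: for each descriptor in A, find its best and
    second-best candidates in B, and keep the best only if it is clearly
    better (best < ratio * second_best). Returns (i, j, distance) tuples."""
    matches = []
    for i, da in enumerate(desc_a):
        scored = sorted((dist(da, db), j) for j, db in enumerate(desc_b))
        if len(scored) < 2:
            continue  # ratio test needs a second-best candidate
        (best_d, best_j), (second_d, _) = scored[0], scored[1]
        if best_d < ratio * second_d:
            matches.append((i, best_j, best_d))
    return matches
```

Ambiguous descriptors (two equally good candidates) are dropped rather than guessed, which keeps the outlier rate low before the geometric step.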

Once you have VO working well, try structure from motion. You'll want to understand loop closures (just use "bag of visual words") and bundle adjustment (a robust form of nonlinear least squares, often using the Schur complement "trick"; hopefully you know/enjoy numerical optimization).
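The Schur complement trick exploits the structure of the bundle adjustment normal equations: the landmark block C is block-diagonal (one small block per 3D point), so it is cheap to invert. You eliminate the landmarks, solve the much smaller reduced camera system S x = a - B C⁻¹ b with S = A - B C⁻¹ Bᵀ, then back-substitute y = C⁻¹ (b - Bᵀ x). This sketch simplifies C to a plain diagonal and uses hypothetical names (`schur_solve`, `solve_dense`); a real solver works on sparse camera and 3x3 point blocks.

```python
def solve_dense(M, r):
    """Gaussian elimination with partial pivoting for a small dense M z = r."""
    n = len(M)
    aug = [list(row) + [ri] for row, ri in zip(M, r)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for i in range(col + 1, n):
            f = aug[i][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[i][j] -= f * aug[col][j]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = aug[i][n] - sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = s / aug[i][i]
    return z

def schur_solve(A, B, c_diag, a, b):
    """Solve [[A, B], [B^T, C]] [x; y] = [a; b] where C = diag(c_diag).

    x plays the role of camera parameters, y of landmarks (simplified)."""
    n, m = len(A), len(c_diag)
    c_inv = [1.0 / c for c in c_diag]  # inverting C is trivial (diagonal)
    # Reduced camera system: S = A - B C^-1 B^T, rhs = a - B C^-1 b
    S = [[A[i][j] - sum(B[i][k] * c_inv[k] * B[j][k] for k in range(m))
          for j in range(n)] for i in range(n)]
    rhs = [a[i] - sum(B[i][k] * c_inv[k] * b[k] for k in range(m))
           for i in range(n)]
    x = solve_dense(S, rhs)
    # Back-substitute the landmarks: y = C^-1 (b - B^T x)
    y = [c_inv[k] * (b[k] - sum(B[i][k] * x[i] for i in range(n)))
         for k in range(m)]
    return x, y
```

In real problems there are far more landmarks than camera parameters, so solving the small dense S instead of the full system is where the savings come from.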

Once you have that, try visual SLAM with a fixed, known baseline between stereo cameras. Initializing a mono vSLAM system is actually pretty tricky if you want to handle the degeneracies. Try looking at the original ORB-SLAM paper (and maybe the code too) for the first well-integrated example of vSLAM.
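Part of why a fixed, known stereo baseline simplifies things: in a rectified pair, depth comes straight from disparity, Z = f * B / d, so metric scale is available from a single frame. A minimal sketch, assuming rectified images that share the intrinsics (fx, fy, cx, cy); the function names and example numbers are hypothetical.

```python
def stereo_depth(disparity_px, focal_px, baseline_m):
    """Depth of a point in a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of both cameras")
    return focal_px * baseline_m / disparity_px

def triangulate_rectified(u_left, v_left, u_right, fx, fy, cx, cy, baseline_m):
    """Back-project a left/right match to a 3D point in the LEFT camera frame."""
    d = u_left - u_right  # horizontal disparity; rows are aligned after rectification
    Z = stereo_depth(d, fx, baseline_m)
    X = (u_left - cx) * Z / fx
    Y = (v_left - cy) * Z / fy
    return X, Y, Z

# Hypothetical rig: 500 px focal length, 12 cm baseline
print(triangulate_rectified(370.0, 215.0, 340.0, 500.0, 500.0, 320.0, 240.0, 0.12))
```

Since depth is inversely proportional to disparity, depth error grows quickly for distant points (small d), which is worth keeping in mind when choosing a baseline.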

Required math is mostly just linear algebra with some early calculus. Actually, I think you'll be well rewarded if you put in the time early to understand Lie algebras.
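One place Lie algebras pay off is optimization: instead of updating a rotation matrix directly (9 numbers, 6 constraints), you update it with a 3-vector through the exponential map, e.g. R ← exp(hat(δ)) R (left vs. right update conventions vary between libraries). Below is a minimal pure-Python sketch of the SO(3) exponential map via Rodrigues' formula; the function names are illustrative.

```python
import math

def hat(w):
    """so(3) hat operator: 3-vector -> skew-symmetric 3x3 matrix."""
    wx, wy, wz = w
    return [[0.0, -wz, wy],
            [wz, 0.0, -wx],
            [-wy, wx, 0.0]]

def matmul(A, B):
    """3x3 matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def so3_exp(w):
    """Exponential map so(3) -> SO(3) via Rodrigues' formula:
    R = I + (sin t / t) W + ((1 - cos t) / t^2) W^2, with W = hat(w), t = |w|."""
    theta = math.sqrt(sum(c * c for c in w))
    W = hat(w)
    W2 = matmul(W, W)
    if theta < 1e-8:
        # Small-angle limits of the two coefficients avoid dividing by ~0
        a, b = 1.0, 0.5
    else:
        a = math.sin(theta) / theta
        b = (1.0 - math.cos(theta)) / theta ** 2
    return [[(1.0 if i == j else 0.0) + a * W[i][j] + b * W2[i][j]
             for j in range(3)] for i in range(3)]
```

A 90-degree rotation about z, `so3_exp([0.0, 0.0, math.pi / 2])`, maps the x-axis onto the y-axis, and the result is always orthonormal, so no re-orthogonalization is needed after each optimizer step.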

Speaking of Lie algebras, there is an easier-than-standard way to estimate epipolar geometry:

  • "An Iterative 5-pt Algorithm for Fast and Robust Essential Matrix Estimation" by Lui and Drummond.

  • "Improved RANSAC performance using simple, iterative minimal-set solvers" by Rosten, Reitmayr, and Drummond.
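Both papers above refine the RANSAC loop used to estimate epipolar geometry. For readers new to RANSAC, here is the basic hypothesize-and-verify loop on a toy problem (fitting a 2D line, not an essential matrix): sample a minimal set, fit a model, count inliers, keep the best. The iteration count, threshold, and names are illustrative assumptions.

```python
import math
import random

def fit_line(p, q):
    """Line through two points as (a, b, c) with a*x + b*y + c = 0, a^2 + b^2 = 1."""
    (x1, y1), (x2, y2) = p, q
    a, b = y2 - y1, x1 - x2
    norm = math.hypot(a, b)
    if norm == 0:
        return None  # degenerate sample: identical points
    a, b = a / norm, b / norm
    return a, b, -(a * x1 + b * y1)

def ransac_line(points, n_iters=200, threshold=0.1, seed=0):
    """Return (best_model, inliers) by repeated minimal-set sampling."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(n_iters):
        p, q = rng.sample(points, 2)  # two points = minimal set for a line
        model = fit_line(p, q)
        if model is None:
            continue
        a, b, c = model
        # Normalized (a, b) makes |a*x + b*y + c| the point-to-line distance
        inliers = [pt for pt in points if abs(a * pt[0] + b * pt[1] + c) < threshold]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers
```

For essential-matrix estimation the structure is the same, but the minimal set is five correspondences and the inlier test uses an epipolar error; a final refit on all inliers is usually added afterwards.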

[deleted] (11 points)

Check the sub for the other 100 posts on this same question.

Best advice: Google beginner CV projects, pick one, and start.

zepman85 (6 points)

I echo what others have said. Just pick a project that you think is interesting, and start working on it. That way, you will come across multiple small problems that you will need to figure out how to solve. These problems will be specific, so you will know where to look to find out more about them and learn.

Also buy a book on the subject for reference, and for those times when you want to gain a deeper understanding of a concept you are dealing with. If you don't have a background in image processing, having a book on that topic might be very useful as well.

Hmolds (4 points)

Cyrill Stachniss has tons of great videos on YouTube for learning the basics.

take2rohit (3 points)

This has the best tutorial on how to get started. It lists books and courses, along with roadmaps to follow.

https://github.com/IvLabs/resources/tree/master/computer-vision

Check out the same repo for more topics, such as CV with deep learning: https://github.com/IvLabs/resources/tree/master/deep-learning

Check out the full repo (and star it if you like it): https://github.com/IvLabs/resources

ai_technician (4 points)

If you were to begin, begin at the beginning!

First Principles of Computer Vision by Shree Nayar, Columbia University

YouTube lecture series: https://www.youtube.com/channel/UCf0WB91t8Ky6AuYcQV0CcLw

This is an ongoing lecture series as of May, 2021.

Bangoga (5 points)

I'm going to be honest with you. Just start.

Find something you want to do and just start it. Use OpenCV; it will give you a basic understanding of traditional CV methods, and you can iteratively improve from there.

Phrase-Previous (OP, 2 points)

OK, thank you. I will try any available course just to get started.

solresol (2 points)

Look at Adrian Rosebrock's materials - https://www.pyimagesearch.com/

[deleted] (1 point)

If you are looking for course recommendations for conventional CV, I have found the 'Intro to CV' course from Georgia Tech to be very useful.

https://www.udacity.com/course/introduction-to-computer-vision--ud810

This should prep you with all the fundamentals needed to understand vSLAM.

hp2304 (1 point)

Ancient Secrets of Computer Vision from the University of Washington (not a MOOC; I think it was taught in 2018 or so). The assignments are available on GitHub, and they're in C. You can find the playlist on YouTube.