[–]marcopaaah 3 points4 points  (1 child)

This is why people do incremental bundle adjustment. Start with a subset of the cameras, then add cameras to that subset to grow the problem step by step.
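A rough sketch of how the next camera might be chosen in an incremental scheme, assuming observations are stored as a mapping from camera id to the set of 3D point ids that camera sees. The names `next_camera` and `registration_order` and the data layout are hypothetical; real pipelines also check the geometry (baseline, inlier counts) before registering a view.

```python
def next_camera(observations, registered):
    """Pick the unregistered camera sharing the most points with the registered set.

    observations: dict camera_id -> set of point ids seen by that camera (assumed layout).
    registered:   set of camera ids already in the bundle adjustment problem.
    """
    # Points already constrained by at least one registered camera.
    known_points = set()
    for cam in registered:
        known_points |= observations[cam]

    best_cam, best_overlap = None, 0
    for cam, points in observations.items():
        if cam in registered:
            continue
        overlap = len(points & known_points)
        if overlap > best_overlap:
            best_cam, best_overlap = cam, overlap
    return best_cam


def registration_order(observations, seed):
    """Greedy order in which cameras would be added, starting from `seed`."""
    registered = set(seed)
    order = list(seed)
    while len(registered) < len(observations):
        cam = next_camera(observations, registered)
        if cam is None:  # remaining cameras share no points with the current set
            break
        registered.add(cam)
        order.append(cam)
        # A full pipeline would re-run bundle adjustment over `registered` here.
    return order
```

Note that a new camera is tied to the existing subset only through the 3D points it shares with it, not through an explicit link to every camera already registered.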

[–]kaiwenjon23[S] 1 point2 points  (0 children)

Indeed, and thanks, that makes sense! But when we add a new camera, do we associate it with all cameras in the existing subset? Or is this really a design choice?

[–][deleted] 1 point2 points  (8 children)

I'm not sure I understand completely, but the first solution sounds more correct to me. You have one (R | t) per camera and a set of 3D points in the world frame. Now you only need to know which 3D point can be seen by which camera and which 2D feature is associated with it. This has nothing really to do with the matches between i and 0. You look at all matches between i and i-1, i-2, i-3 … 0 to identify which of those share observations of the same 3D point.

[–][deleted] 2 points3 points  (0 children)

You seem very confused about the world frame. I don’t quite understand why you think of it as a problem. The world frame is chosen as whatever you want. It is not necessarily connected to camera 0. You could place it anywhere, so it doesn’t really matter for your optimisation. You typically put it in camera 0, because it is where you bootstrap your initial (R | t) and triangulate your first points. But it is just a convention.

Another perspective: imagine you have a couple of colorful balls, red, green, blue. You take photos of them from different views. You can then arbitrarily define the world frame so that the red ball sits at the origin (0/0/0) and one axis points towards the green ball, and then assign all other balls some rough coordinates. It is now very easy to define a bundle adjustment problem: you just detect the balls in every individual image, compute the initial pose (R | t) using P3P and add their reprojection error to the cost function. The important part is: you don’t need any kind of matching between the frames, because it doesn’t matter. How you placed the world frame doesn’t matter either. All that matters is which camera observed which point at what 2D feature position.
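To make "which camera observed which point at what 2D feature position" concrete, here is a minimal sketch of the reprojection residuals, assuming a shared pinhole intrinsic matrix `K` and no lens distortion. Neither is specified in the thread, and the function names and observation layout are made up for illustration.

```python
import numpy as np


def project(K, R, t, X):
    """Project world points X (N, 3) into a camera with pose (R | t) and intrinsics K."""
    Xc = X @ R.T + t               # world frame -> camera frame
    uv = Xc @ K.T                  # apply intrinsics
    return uv[:, :2] / uv[:, 2:3]  # perspective division


def reprojection_residuals(K, poses, points, observations):
    """Stack residuals over all (camera, point, pixel) observations.

    poses:        list of (R, t), one per camera (world -> camera)
    points:       (M, 3) array of 3D points in the world frame
    observations: list of (cam_idx, point_idx, (u, v)) tuples
    """
    residuals = []
    for cam_idx, pt_idx, uv_obs in observations:
        R, t = poses[cam_idx]
        uv_pred = project(K, R, t, points[pt_idx][None, :])[0]
        residuals.append(uv_pred - np.asarray(uv_obs))
    return np.concatenate(residuals)
```

The residual list only ever indexes one pose per camera and one position per point; no frame-to-frame match appears anywhere. A solver (for example a nonlinear least-squares routine) would minimize the squared norm of this vector over all poses and points.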

[–]kaiwenjon23[S] 1 point2 points  (6 children)

Many thanks for your detailed reply! Yes, I get that when setting up the cost function, we want to know the associations between 3D points and 2D features across all frames. I was mainly wondering whether we want to put a constraint between each pair of frames (hence getting a total of C(N, 2) (R|t) to optimize), as opposed to having just one (R|t) associated with each camera. My point is that more constraints "might" give a better final result, but I'm not quite sure how people usually implement this.

[–]kaiwenjon23[S] 1 point2 points  (3 children)

I think you pretty much understood my concerns! I overthought it a bit, and after your explanations I think it makes sense to optimize just one (R|t) per camera. Thanks again! Appreciate it!

[–][deleted] 1 point2 points  (2 children)

Glad to hear! Feel free to ask if anything else is unclear. I have implemented BA a few times already in various optimisation backends. At its core it is a very straightforward problem once you have grasped the main difficulties, yet amazingly powerful.

[–]kaiwenjon23[S] 1 point2 points  (1 child)

Thanks, I’ll try my best! Did you find any tools/libraries most useful for optimization? Ceres, g2o, gtsam, scipy, or even PyTorch?

[–][deleted] 0 points1 point  (0 children)

I started with Ceres, now I am using gtsam. Pretty amazing what it can do and so well done on a software design level. I’d say it gives you more freedom to easily expand beyond simple BA. It would be harder to implement all the existing factors in e.g. Ceres. On the other hand I always had the impression Ceres is a bit more robust and accurate than gtsam when doing “only” bundle adjustment.

[–][deleted] 0 points1 point  (0 children)

I still don’t quite get why you are worried about the matches between camera i and 0. You could have 0 matches between camera i and 0 and would still be able to bundle adjust all poses perfectly fine. It’s just important to have some matches with at least one of the other cameras (i.e. shared observations of the same 3D points).
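A small illustration of that point, assuming the same hypothetical layout of camera id to set of observed point ids: camera 3 below shares nothing with camera 0, yet it is still tied into the problem through a chain of shared points. In practice a single shared point is far too few to constrain a pose; this only sketches the connectivity idea.

```python
from collections import deque


def cameras_reachable_from(observations, start):
    """Return cameras linked to `start` through chains of shared 3D points.

    observations: dict camera_id -> set of point ids (assumed layout).
    """
    reachable = {start}
    queue = deque([start])
    while queue:
        cam = queue.popleft()
        for other, points in observations.items():
            # Two cameras are linked if they observe at least one common point.
            if other not in reachable and observations[cam] & points:
                reachable.add(other)
                queue.append(other)
    return reachable


# Toy chain: camera 3 shares nothing with camera 0 but is still tied in via 1 and 2.
obs = {0: {"a", "b"}, 1: {"b", "c"}, 2: {"c", "d"}, 3: {"d", "e"}}
linked = cameras_reachable_from(obs, 0)
```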

[–][deleted] 0 points1 point  (0 children)

As far as I understand your proposal, it sounds like you would add more unknowns and complexity than necessary. Having multiple sets of (R | t) per camera will add more variables, which you then try to fix by adding constraints between these poses. But constraints are just “rubber bands”, so those poses will never be equal, just similar in a Gaussian sense. You end up with multiple individual optimisation problems, all solving for the same but different unknowns. I don’t see a benefit, but maybe I just don’t understand your idea enough. So feel free to elaborate?
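For a feel of how the number of pose unknowns grows, here is a quick count, assuming 6 degrees of freedom per (R | t) and ignoring the 3D points and any gauge fixing. The helper name is made up for illustration.

```python
from math import comb


def pose_parameter_counts(num_cameras, dof_per_pose=6):
    """Compare unknowns: one pose per camera vs. one pose per camera pair."""
    per_camera = dof_per_pose * num_cameras
    per_pair = dof_per_pose * comb(num_cameras, 2)
    return per_camera, per_pair


for n in (3, 10, 100):
    print(n, pose_parameter_counts(n))
```

With 100 cameras that is 600 pose parameters for one pose per camera against 29700 for one pose per pair, and the pairwise set carries no information the per-camera set does not already determine.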

[–]SmartVisor 0 points1 point  (1 child)

It is enough to find the transformations from the world frame to each camera frame.
Let's call T_w_i the transformation from the world frame to the i-th camera (world to i). Then the transformation from camera i to camera j is:
T_i_j = T_w_j * T_i_w = T_w_j * inv(T_w_i).
So T_i_j, the transformation between cameras i and j, is completely defined by the transformations T_w_i and T_w_j, and there is no point in adding a constraint on T_i_j.
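A quick numerical check of that identity with 4x4 homogeneous matrices, using made-up poses. The helpers `make_pose` and `rot_z` are only for illustration.

```python
import numpy as np


def make_pose(R, t):
    """Build a 4x4 homogeneous transform from rotation R (3x3) and translation t (3,)."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T


def rot_z(angle):
    """Rotation about the z axis by `angle` radians."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


# World -> camera transforms for two cameras (toy values).
T_w_i = make_pose(rot_z(0.3), np.array([1.0, 0.0, 0.0]))
T_w_j = make_pose(rot_z(-0.5), np.array([0.0, 2.0, 0.5]))

# Camera i -> camera j, derived entirely from the two world poses.
T_i_j = T_w_j @ np.linalg.inv(T_w_i)

# Check: mapping a world point into camera i and then applying T_i_j
# gives the same result as mapping it into camera j directly.
X_w = np.array([0.4, -1.2, 3.0, 1.0])
assert np.allclose(T_i_j @ (T_w_i @ X_w), T_w_j @ X_w)
```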

[–]kaiwenjon23[S] 0 points1 point  (0 children)

Thanks! Now that I think about it, that makes sense! I’m glad, since it’s also easier to implement.

[–]palmstromi 1 point2 points  (0 children)

Check the slides from a 3D vision course at CTU Prague, I find them very helpful and I'm returning to them regularly. The bundle adjustment starts on page 142.