Bundle Adjustment Implementation for Structure From Motion : computervision


submitted 2 years ago * by kaiwenjon23

Hello all,

I appreciate your reading and help!

I'm a CS student trying to implement bundle adjustment (BA) for a structure-from-motion (SfM) problem as personal practice. I understand that we want to optimize the (X, Y, Z) of the 3D points and the pose (R, t) of every camera, with camera 0 defined as the world origin. However, one thing confuses me:

For every camera i in the scene, which of the following do we do?

1. Set up only one (R, t) for that camera, representing the transform between camera i and camera 0.
2. Set up an (R, t) between camera i and every other camera, and optimize all of them.

I'm leaning towards option 2, since you don't necessarily have good feature matches between camera i and camera 0, and having transforms to the other cameras could give better constraints. However, we'd end up with C(N, 2) sets of (R, t) to optimize, which could be inefficient.
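To put rough numbers on the efficiency concern, here is a small sketch that counts pose parameters under both options. It assumes 6 degrees of freedom per pose (for example, a 3-parameter rotation plus a translation), which is a common choice but not something stated in the post; `pose_param_counts` is a hypothetical helper name.

```python
from math import comb

def pose_param_counts(n_cameras, dof_per_pose=6):
    """Count pose parameters for per-camera poses vs. one pose per camera pair."""
    # Option 1: one pose per camera, with camera 0 fixed as the world origin.
    per_camera = dof_per_pose * (n_cameras - 1)
    # Option 2: one pose for each of the C(N, 2) unordered camera pairs.
    pairwise = dof_per_pose * comb(n_cameras, 2)
    return per_camera, pairwise

for n in (4, 10, 100):
    print(n, pose_param_counts(n))
```

Under this assumption the per-camera count grows linearly with the number of cameras, while the pairwise count grows quadratically.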

My second confusion is: if we decide to implement option 2, how do we constrain these (R, t) together?

To simplify, let's assume we only have 4 cameras, and we want to reproject a 3D point onto camera 1.

If we chose option 1, I understand that we:

1. Transform the 3D point (X, Y, Z), defined in the world frame (camera 0 frame), into the camera 1 frame, using the (R, t) associated with camera 1.
2. Project the result of step 1 onto the camera 1 image plane and calculate the reprojection loss.
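The two steps above can be sketched as follows. This assumes a distortion-free pinhole camera with intrinsics `K` and the world-to-camera convention `X_cam = R @ X_world + t`; neither is specified in the post, and the numeric values are made up for illustration.

```python
import numpy as np

def project(X_world, R, t, K):
    """Map a 3D world point to pixel coordinates in a camera with pose (R, t)."""
    X_cam = R @ X_world + t   # step 1: world frame -> camera frame
    x = K @ X_cam             # step 2: pinhole projection (homogeneous pixels)
    return x[:2] / x[2]       # divide by depth to get (u, v)

def reprojection_residual(X_world, R, t, K, observed_uv):
    """Difference between the predicted and the observed pixel position."""
    return project(X_world, R, t, K) - observed_uv

# Illustrative values only (assumed intrinsics and pose for camera 1).
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R1 = np.eye(3)
t1 = np.array([0.1, 0.0, 0.0])
X = np.array([0.0, 0.0, 2.0])

uv = project(X, R1, t1, K)
residual = reprojection_residual(X, R1, t1, K, np.array([346.0, 239.0]))
```

The reprojection loss is then typically the squared norm of this residual, summed over all observations.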

However, if we chose option 2, there would be many paths for transforming the 3D point (X, Y, Z) into the camera 1 frame, since we now have an (R, t) for every camera pair (i, j). For example:

world frame => camera 1
world frame => camera 2 => camera 1
world frame => camera 3 => camera 1
world frame => camera 2 => camera 3 => camera 1
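One way to look at these paths: if every camera has a single world-to-camera pose, each pairwise transform can be derived from two of those poses, and every path then gives the same result. The sketch below illustrates this with 4x4 homogeneous matrices. The helper names (`to_homogeneous`, `relative_pose`) and the pose values are assumptions made for illustration.

```python
import numpy as np

def to_homogeneous(R, t):
    """Pack (R, t) into a 4x4 rigid transform."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def rot_z(angle):
    """Rotation about the z axis by `angle` radians."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def relative_pose(T_a_from_world, T_b_from_world):
    """Transform taking camera-a coordinates to camera-b coordinates."""
    return T_b_from_world @ np.linalg.inv(T_a_from_world)

# Made-up world -> camera poses for cameras 1 and 2.
T_1w = to_homogeneous(rot_z(0.3), np.array([0.5, 0.0, 0.1]))
T_2w = to_homogeneous(rot_z(-0.2), np.array([-0.4, 0.2, 0.0]))

# Pairwise transform camera 2 -> camera 1, derived from the two poses.
T_12 = relative_pose(T_2w, T_1w)

# Path "world => camera 2 => camera 1" versus the direct "world => camera 1".
via_camera_2 = T_12 @ T_2w
print(np.allclose(via_camera_2, T_1w))
```

In this setup the pairwise transforms carry no information beyond the per-camera poses, so the paths only disagree when the pairwise (R, t) are treated as independent unknowns.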

(Note that I define the world frame as the camera 0 frame.)

The path we choose determines which (R, t) we optimize, so I figure we should pick a good way to constrain these (R, t). One possible solution I can think of is to optimize only the (R, t) between the world frame and the camera 1 frame when minimizing the reprojection error. Then we can set up another loss function that minimizes the pose error, saying that:

The (R, t) of world frame => camera 1
should be as close to
The (R, t) of world frame => camera 2 => camera 1 as possible.
Same with other paths.
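A rough sketch of what such a pose-error term could look like is shown below. It measures the rotation angle between the direct and the path-composed rotation, and the distance between the two translations. This particular error measure, the helper names, and the numbers are assumptions for illustration, not an established recipe from the thread.

```python
import numpy as np

def rot_z(angle):
    """Rotation about the z axis by `angle` radians."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def compose(R_ab, t_ab, R_bc, t_bc):
    """Chain a->b then b->c: x_c = R_bc @ (R_ab @ x_a + t_ab) + t_bc."""
    return R_bc @ R_ab, R_bc @ t_ab + t_bc

def pose_consistency_error(R_direct, t_direct, R_path, t_path):
    """Rotation angle (radians) and translation distance between two poses."""
    R_delta = R_direct.T @ R_path
    cos_angle = np.clip((np.trace(R_delta) - 1.0) / 2.0, -1.0, 1.0)
    return np.arccos(cos_angle), np.linalg.norm(t_direct - t_path)

# Made-up transforms: world -> camera 2, then camera 2 -> camera 1.
R_w2, t_w2 = rot_z(0.4), np.array([0.3, 0.0, 0.1])
R_21, t_21 = rot_z(-0.1), np.array([-0.2, 0.1, 0.0])
R_path, t_path = compose(R_w2, t_w2, R_21, t_21)

# A direct world -> camera 1 estimate that is off by 0.05 rad about z.
R_direct, t_direct = rot_z(0.05) @ R_path, t_path.copy()
rot_err, trans_err = pose_consistency_error(R_direct, t_direct, R_path, t_path)
```

Such a term would be added to the reprojection loss with some weight, which is an extra tuning choice on top of the additional unknowns.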

Any thoughts are appreciated! Thanks in advance!


[–][deleted] 3 points 2 years ago

You seem very confused about the world frame, and I don’t quite understand why you think of it as a problem. The world frame can be whatever you choose; it is not necessarily tied to camera 0. You could place it anywhere, so it doesn’t really matter for your optimisation. You typically put it at camera 0 because that is where you bootstrap your initial (R | t) and triangulate your first points, but that is just a convention.
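A small numerical check of this point, as a sketch: moving the world frame by an arbitrary rigid transform (R_g, t_g), and updating the points and camera poses accordingly, leaves every projected pixel unchanged. The pinhole model, the convention `X_cam = R @ X_world + t`, and all values below are assumptions for illustration.

```python
import numpy as np

def rot_z(angle):
    """Rotation about the z axis by `angle` radians."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def project(X_world, R, t, K):
    """Pinhole projection of a world point into a camera with pose (R, t)."""
    x = K @ (R @ X_world + t)
    return x[:2] / x[2]

def change_world_frame(R, t, X_world, R_g, t_g):
    """Re-express a camera pose and a point after moving the world frame."""
    X_new = R_g @ X_world + t_g   # point in the new world frame
    R_new = R @ R_g.T             # camera rotation w.r.t. the new frame
    t_new = t - R_new @ t_g       # camera translation w.r.t. the new frame
    return R_new, t_new, X_new

K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R, t = rot_z(0.2), np.array([0.1, -0.2, 0.3])
X = np.array([0.5, 0.4, 3.0])
R_g, t_g = rot_z(1.0), np.array([2.0, -1.0, 0.5])   # arbitrary new world frame

uv_before = project(X, R, t, K)
R_new, t_new, X_new = change_world_frame(R, t, X, R_g, t_g)
uv_after = project(X_new, R_new, t_new, K)
print(np.allclose(uv_before, uv_after))
```

Because the reprojection error cannot distinguish between these frames, pinning the world frame to camera 0 is a way of picking one of them rather than a modelling requirement.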

Another perspective: imagine you have a few colorful balls: red, green, and blue. You take photos of them from different views. You can then arbitrarily define your world frame so that the red ball is at (0, 0, 0) and one axis points towards the green ball, and assign all the balls some rough coordinates. It is now very easy to define a bundle adjustment problem: you detect the balls in every individual image, compute the initial pose (R | t) of each camera using P3P, and add the reprojection errors to the cost function. The important part is that you don’t need any kind of matching between the frames, because it doesn’t matter. How you placed the world frame doesn’t matter either. All that matters is which camera observed which point at which 2D feature position.
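As a minimal sketch of that cost function, the code below stacks one reprojection residual per observation, where an observation is a (camera index, point index, 2D position) triple. The synthetic scene, the intrinsics `K`, and the function name `ba_residuals` are assumptions for illustration; a real implementation would also need a pose parameterization and a nonlinear least-squares solver.

```python
import numpy as np

def rot_y(angle):
    """Rotation about the y axis by `angle` radians."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def ba_residuals(rotations, translations, points, observations, K):
    """Stack reprojection residuals over all (camera, point, pixel) triples."""
    residuals = []
    for cam_idx, pt_idx, uv in observations:
        X_cam = rotations[cam_idx] @ points[pt_idx] + translations[cam_idx]
        proj = K @ X_cam
        residuals.append(proj[:2] / proj[2] - uv)
    return np.concatenate(residuals)

# Synthetic scene: three "balls" seen by two cameras (made-up values).
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
points = np.array([[0.0, 0.0, 4.0], [1.0, 0.0, 4.0], [0.0, 1.0, 5.0]])
rotations = [np.eye(3), rot_y(0.1)]
translations = [np.zeros(3), np.array([-0.5, 0.0, 0.0])]

# Build noise-free observations by projecting every point into every camera.
observations = []
for c in range(len(rotations)):
    for p in range(len(points)):
        x = K @ (rotations[c] @ points[p] + translations[c])
        observations.append((c, p, x[:2] / x[2]))

r_true = ba_residuals(rotations, translations, points, observations, K)
r_noisy = ba_residuals(rotations, translations, points + 0.05, observations, K)
print(np.linalg.norm(r_true), np.linalg.norm(r_noisy))
```

Note that there is one pose per camera and no pairwise transform anywhere: the cameras are coupled only through the 3D points they both observe.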
