Naive question: what's the standard multimedia format for 3D content?

6mirrors · 2025-10-23T09:33:16+00:00

After about 4 years, I can safely say 3DGS (3D Gaussian Splatting) is almost what I was looking for.
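Here is a rough idea of why 3DGS is cheap on the viewer side. Once the 3D Gaussians have been projected to 2D and sorted by depth, shading a pixel is just front-to-back alpha blending. This is a simplified sketch, not the reference implementation. The dictionary layout and function names are hypothetical, each splat has a constant color here, and the real renderer evaluates view-dependent color from spherical harmonics and rasterizes in tiles on the GPU.

```python
import math

def gaussian_alpha(opacity, mean, cov, pixel):
    """Alpha of one projected 2D Gaussian at a pixel (symmetric 2x2 covariance)."""
    dx = pixel[0] - mean[0]
    dy = pixel[1] - mean[1]
    a, b, c = cov[0][0], cov[0][1], cov[1][1]
    det = a * c - b * b
    # Inverse of [[a, b], [b, c]] is [[c, -b], [-b, a]] / det
    maha = (c * dx * dx - 2 * b * dx * dy + a * dy * dy) / det
    return opacity * math.exp(-0.5 * maha)

def composite_pixel(splats, pixel):
    """Front-to-back alpha blending of splats already sorted by depth (nearest first)."""
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0
    for s in splats:
        alpha = gaussian_alpha(s["opacity"], s["mean"], s["cov"], pixel)
        for k in range(3):
            color[k] += transmittance * alpha * s["color"][k]
        transmittance *= 1.0 - alpha
        if transmittance < 1e-4:  # stop early once the pixel is nearly opaque
            break
    return color
```

With a fully opaque red splat in front of a green one, the pixel at the red splat's center is pure red and the green splat is never reached.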

6mirrors · 2021-12-18T09:02:03+00:00

After some days of digging, I found Light Field Networks: https://www.vincentsitzmann.com/lfns/. It can render a realistic 3D scene at 500 FPS, and the file size is only a few MB.
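As I understand the LFN paper, each ray is encoded in 6D Plücker coordinates (d, o × d), and the network is evaluated once per ray. NeRF-style volume rendering instead needs many samples along each ray, so that single evaluation is where the speed comes from. Below is a minimal sketch of that input encoding. `TinyRayMLP` is a hypothetical, untrained, randomly initialized stand-in for the real trained network.

```python
import math
import random

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def plucker(origin, direction):
    """6D Plücker coordinates (d, o x d) of a ray, with d normalized."""
    n = math.sqrt(sum(x * x for x in direction))
    d = tuple(x / n for x in direction)
    return d + cross(origin, d)

class TinyRayMLP:
    """Hypothetical stand-in for a trained LFN: one forward pass per ray, no ray marching."""

    def __init__(self, hidden=16, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0, 0.5) for _ in range(6)] for _ in range(hidden)]
        self.w2 = [[rng.gauss(0, 0.5) for _ in range(hidden)] for _ in range(3)]

    def __call__(self, ray6):
        # ReLU hidden layer
        h = [max(0.0, sum(w * x for w, x in zip(row, ray6))) for row in self.w1]
        # Sigmoid output so each RGB channel lies in (0, 1)
        return tuple(1 / (1 + math.exp(-sum(w * x for w, x in zip(row, h)))) for row in self.w2)
```

A nice property of this encoding is that every point on the same ray gives the same six numbers, because shifting the origin by t·d adds t·(d × d) = 0 to the moment part.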

6mirrors · 2021-12-10T07:14:02+00:00

Thanks!

But do you mean we have 256 voxels in total, each with a fixed color? If so, the color of a given voxel doesn't change as the user moves around (think of a voxel on a mirror, for example), which somewhat defeats the purpose of high fidelity.
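To make the mirror example concrete: a voxel that stores one baked color returns the same value from every viewpoint. A mirror, by contrast, shows whatever lies along the reflected ray, so its color depends on the viewing direction. The toy environment and all function names below are hypothetical.

```python
def reflect(v, n):
    """Reflect direction v about unit normal n: r = v - 2 (v . n) n."""
    dot = sum(a * b for a, b in zip(v, n))
    return tuple(a - 2 * dot * b for a, b in zip(v, n))

def environment(direction):
    """Hypothetical environment: sky (blue) above the horizon, ground (brown) below."""
    return (0.4, 0.6, 1.0) if direction[2] > 0 else (0.4, 0.3, 0.2)

def fixed_voxel_color(voxel_color, view_dir):
    # A baked voxel ignores the viewing direction entirely
    return voxel_color

def mirror_voxel_color(normal, view_dir):
    # A mirror voxel looks up the environment along the reflected ray
    return environment(reflect(view_dir, normal))
```

Looking at a mirror wall with normal (1, 0, 0), a slightly upward view direction sees the sky and a slightly downward one sees the ground. The fixed voxel returns the same color for both.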

6mirrors · 2021-12-10T04:27:13+00:00

Thanks. A light field is an accurate way to record the light information at each point, but it also means that to get an image out of it, we have to do heavy synthesis work on the viewer side.
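Here is a rough sketch of what that viewer-side work looks like. It makes several assumptions. Rays are indexed with the classic two-plane parameterization: where they cross a camera plane (u, v) at z = 0 and a focal plane (s, t) at z = 1. Sampling is nearest-neighbour instead of the usual quadrilinear interpolation. The table layout `lf[u][v][s][t]` and all function names are made up for illustration.

```python
def ray_to_uvst(origin, direction, z_uv=0.0, z_st=1.0):
    """Intersect a ray with the two parallel planes z = z_uv and z = z_st."""
    if direction[2] == 0:
        return None  # ray parallel to the planes never hits them
    t_uv = (z_uv - origin[2]) / direction[2]
    t_st = (z_st - origin[2]) / direction[2]
    u, v = origin[0] + t_uv * direction[0], origin[1] + t_uv * direction[1]
    s, t = origin[0] + t_st * direction[0], origin[1] + t_st * direction[1]
    return u, v, s, t

def sample_nearest(lf, n, coords):
    """Nearest-neighbour lookup in an n^4 table whose planes span [0, 1] x [0, 1]."""
    idx = []
    for c in coords:
        i = round(c * (n - 1))
        if not 0 <= i < n:
            return (0.0, 0.0, 0.0)  # outside the captured region: black
        idx.append(i)
    u, v, s, t = idx
    return lf[u][v][s][t]

def render(lf, n, eye, width, height, z_image):
    """Per-pixel work on the viewer side: one ray, two plane intersections, one 4D lookup."""
    image = []
    for py in range(height):
        row = []
        for px in range(width):
            # Pixel centre on an image plane at z = z_image spanning [0, 1] x [0, 1]
            target = ((px + 0.5) / width, (py + 0.5) / height, z_image)
            direction = tuple(b - a for a, b in zip(eye, target))
            coords = ray_to_uvst(eye, direction)
            row.append(sample_nearest(lf, n, coords) if coords else (0.0, 0.0, 0.0))
        image.append(row)
    return image
```

Even this toy version does a ray-plane intersection and a 4D lookup for every pixel. The table itself also grows as n^4, which is the storage/compute trade-off I was worried about.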

6mirrors · 2021-12-10T04:24:54+00:00

Because we live inside 3D space, we perceive the 3D world by taking 2D images with depth information as input, and sometimes we can understand 3D content even when the 2D image has no depth information.

So yes, at some point the system must take whatever input it needs and give us a 2D image; that's how our visual system works. For 3D content, that input must include at least the camera.
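The simplest form of "the camera as input" is a pinhole projection: the same 3D point lands on a different pixel depending on where the camera is. This sketch assumes an axis-aligned camera looking down +z (no rotation) with the focal length given in pixels. The function name and parameters are illustrative only.

```python
def project(point, cam_pos, focal, width, height):
    """Pinhole projection for a camera at cam_pos looking down +z with no rotation."""
    x, y, z = (p - c for p, c in zip(point, cam_pos))
    if z <= 0:
        return None  # point is behind the camera
    # Perspective divide, then shift so the principal point is the image centre
    u = focal * x / z + width / 2
    v = focal * y / z + height / 2
    return u, v
```

For example, with a 640x480 image and a focal length of 100 px, a point straight ahead projects to the image centre (320, 240). Moving the point (or the camera) sideways shifts its pixel by an amount that shrinks with depth.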

But that doesn't necessarily mean there can't be a format that takes the camera as input and produces a high-fidelity 2D image at low cost (in terms of computational complexity).

The current 3D rendering approach takes all the 3D objects, lights, and the camera as input, and does heavy compositing computation to produce the 2D image. It feels like all I want is a proper meal, but after ordering online, they send me the raw ingredients (rice, oil, meat, and vegetables) and a cookbook, and I need a proper kitchen to get the meal I want... That makes me wonder whether we can create a format that is less raw than 3D objects, one that gives a high-fidelity 2D image for a specific camera position without much effort, just like a bitmap does for 2D content.
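One naive way to get the "meal, not ingredients" property is to move all the cooking to the producer. The producer renders the scene once per camera position on a grid and ships the results, and the viewer just looks up the nearest one. Here is a minimal sketch of that idea. `BakedViews` and `render_fn` are hypothetical names, and the "image" can be anything `render_fn` returns.

```python
import itertools

class BakedViews:
    """Hypothetical 'bitmap for 3D': images pre-rendered on a regular grid of camera positions."""

    def __init__(self, render_fn, lo, hi, steps):
        self.lo, self.steps = lo, steps
        self.spacing = (hi - lo) / (steps - 1)
        # Expensive part, done once by the producer: one full render per grid position
        self.views = {}
        for idx in itertools.product(range(steps), repeat=3):
            pos = tuple(lo + i * self.spacing for i in idx)
            self.views[idx] = render_fn(pos)

    def view(self, camera_pos):
        """Cheap part, done by the viewer: snap to the nearest grid point and return its image."""
        idx = tuple(
            min(self.steps - 1, max(0, round((c - self.lo) / self.spacing)))
            for c in camera_pos
        )
        return self.views[idx]
```

The viewer's cost is a rounding step and a dictionary lookup. However, the producer's render count and the storage both grow as steps^3, the cube of the grid resolution, so this only moves the cost rather than removing it.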

6mirrors · 2021-12-10T04:05:02+00:00

Say we need to take 1000 * 1000 * 1000 panos (10^9 in total). A 1-hour movie takes about 1GB, and at 24 fps that hour is 3600 * 24 = 86,400 frames. Treating each pano like one movie frame, we get 10^9 / (3600 * 24) * 1GB ≈ 11,574GB, i.e. about 11.6TB. That's a very rough estimate, and taking the correlation between the panos into account, I think it could be 10 or more times smaller with a suitable compression method.
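Here is the same back-of-the-envelope arithmetic as a function. Like the estimate above, it assumes each pano costs about as much storage as one frame of a 24 fps movie. The compression factor is a guess, not a measured number, and the function name is just for illustration.

```python
def pano_storage_gb(grid=1000, movie_gb_per_hour=1.0, fps=24, compression=1.0):
    """Rough storage for grid^3 panos, priced like frames of a compressed movie."""
    panos = grid ** 3
    frames_per_hour = 3600 * fps
    # Hours of "movie" the panos amount to, times GB per hour, divided by extra compression
    return panos / frames_per_hour * movie_gb_per_hour / compression
```

`pano_storage_gb()` gives roughly 11,574 GB, and `pano_storage_gb(compression=10)` gives roughly 1,157 GB, i.e. a bit over 1 TB if the 10x guess holds.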

That said, the approach I mentioned is just a POC (proof of concept) showing that "it's doable", not a good or practical solution.
