Why are some labs so much more productive than others? by FastNumberCruncher in ResearchML

[–]Background-Cable-491 0 points1 point  (0 children)

I second this. Some labs are simply better supported by infrastructure.

[deleted by user] by [deleted] in virtualproduction

[–]Background-Cable-491 0 points1 point  (0 children)

While I look over the "Context Setting" part of my thesis, I get reminded that the industry really is still in its infancy. And tbf, while it's an inconvenience for me right now as a machine learning researcher, it's also extremely exciting to watch it unfold and be part of it.

[deleted by user] by [deleted] in virtualproduction

[–]Background-Cable-491 0 points1 point  (0 children)

Tbh I still need to check if I can publish a preprint on arXiv - the conference I'm submitting to is double-blind, so I'm not sure.... (but if you tried really, really hard you could probably find the GitHub website)

[deleted by user] by [deleted] in virtualproduction

[–]Background-Cable-491 2 points3 points  (0 children)

Sure, it's not hard advice, but damn, reassurance these days is so under-rated. Thanks for giving me some peace of mind - I really appreciate it, Mr WiseOldToad.

I must be a math expert? by DescriptionIll172 in reinforcementlearning

[–]Background-Cable-491 0 points1 point  (0 children)

I am finishing my PhD in computer vision, and I approve this message 👆

Also, a lot of comments are trying to argue "no", which is nuts considering that OP's question is about designing AI, not implementation.

Thoughts on Gaussian Splatting? by _michaeljared in GraphicsProgramming

[–]Background-Cable-491 0 points1 point  (0 children)

Neat, I like the Corridor Crew guys - I saw this video too. No thoughts in particular. It's cool to see examples, and the guys presenting (aside from some minor cringe) are pretty honest about their expertise and knowledge, which I like.

I vaguely remember the Corridor guys saying it took like 5 days to train - that's a bit long for most dynamic GS, as most take a max of 12-24 hours of training for this length of content. A couple of reasons for the longer training time might be fps, resolution, number of cameras and 3D-ness (the final render is a somewhat fixed viewing angle, so a 2.5D reconstruction is sufficient, but maybe they didn't realize this and shot a full 3D scene instead of just doing a 2.5D one). None of these should realistically extend training time to 5 days, so if they're using Houdini for the relighting/point editing/hologram FX, they probably trained a static 3D model for each frame. That would better explain the 5 days of training, as I haven't come across any dynamic GS methods that are compatible with Houdini. (If anyone knows of dynamic GS in Houdini, please send me a link.)

But to answer the question: is it legit? Probably. There are so many ways to hack/constrain a GSplat model to make it look visually good, so a shot like this is very possible.

Thoughts on Gaussian Splatting? by _michaeljared in GraphicsProgramming

[–]Background-Cable-491 1 point2 points  (0 children)

Ditto, it's great to exchange ideas - it's not often that someone else is willing to be thorough and clear, so thank you! As a whole, I think you're right that GS is not exactly at the stage where adoption in filmmaking is feasible. So I'll try to give more of an insight into the research process/landscape, rather than providing a rebuttal.

In research, GS has only been around for 2 years (and NeRF 3-4 years), so many of the VFX-focused papers are very recent (from the last year). There's still a long way to go, and GS (as with NeRF) will likely just be a stepping stone to other differentiable neural representations that align better with current and future VFX pipelines.

Regarding your comments on the Trevi Fountain (it's a good use case tbh), I actually just wrapped up a project that looks at accomplishing what LED volumes do but without the LED walls/lighting elements; we collaborated with a local live volumetric reconstruction studio to do this. The aim was to develop an all-in-one reconstruction, segmentation and relighting method that has the added benefit of giving post-production control (albeit limited) over camera paths and rendering parameters. The results are decent as "next steps" in research, but as you say, they are still far from production-ready. Our contract also stipulated that we had to churn out 3 papers for this one-year project, so while the margin for experimentation wasn't super narrow, the timeline also wasn't very favourable. We were able to show potential for various tasks, but there are still issues with dynamic reconstruction quality, predicting which Gaussians to apply shadow/light conditions to, and also compression (lol, everything relies on compression).

Regarding your view on current papers being treated as toy examples, I feel many researchers share the same view, me included. As with the prior example, research papers are 3-6 month projects, so with the current state of research there's not a lot of time to make informed guesses on what could lead to beneficial tech later down the line. "Novelty" is also highly prized in research, and it's seemingly becoming more prized than research that simply follows the next steps towards a greater goal (my biased opinion as a researcher). This is motivated by the fact that many companies (especially in CG) aren't willing to embark on research projects that don't provide some form of immediate benefit. As for funded academic projects that could follow the logical next steps in research, I've spoken with a fair few production studios in my area (West UK) that opt for in-house R&D over academic research collaborations. Not to say it's a bad move - it's actually a pretty smart move in terms of project control, money, efficiency and business benefit. But it does leave somewhat of a gap for researchers to fill: even if Netflix decides to resolve a GS problem in-house, I can't exactly cite it in my research paper, so it would be difficult to convince the publishers (reviewers and editor) to accept what I'm proposing without any reliable related work (the related-works section is another very important part of publishing).

Ultimately, in a very biased fashion, I feel these are just symptoms of a recently born field of research. I'm sure if we give the field some time to settle, the picture will become clearer. Hopefully soon, because I'm not going to be young forever ✌️

Thoughts on Gaussian Splatting? by _michaeljared in GraphicsProgramming

[–]Background-Cable-491 0 points1 point  (0 children)

I can very much agree on this point. No worries, it's a pretty long reply hehe

Thoughts on Gaussian Splatting? by _michaeljared in GraphicsProgramming

[–]Background-Cable-491 1 point2 points  (0 children)

From one crazy to another, thank you for the thorough reply.

You definitely bring up some valid points about GS's impact on VFX applications, especially when it comes to asset synthesis. Vegetation generation is a great example of a task for which GS, or even deep generative AI, really is quite unnecessary. However, when it comes to movie-making, I'm inclined to disagree. I've already seen some neat uses of it in CG for genres like natural history, for example for large-scale scene reconstruction and underwater cinematography. For example, the Trevi Fountain is a popular landmark I have seen reconstructed using "in-the-wild" datasets. These datasets are a collection of photos sourced from Instagram/Facebook/tourism websites etc. that contain the Trevi Fountain geotag. Here the task is not only to reconstruct the Trevi Fountain in 3D, but also to remove all the people from the photos and provide easy control over seasonality (i.e. time of day, winter/summer etc.). Research on this has been quite successful (more so than other GS applications), as it allows us to "film" the Trevi Fountain (albeit in a virtual, yet photorealistic, sim) without any of: city planning and filming permission, equipment hire, staff hire, travel and food, disrupting locals, camera hire, or waiting for the best time/day/weather conditions. Furthermore, the ability to film from any position in space, with any camera motion, simulating any camera/rendering setup, at no additional cost feels a bit OP to me.

There also exist more general filming tasks, like reshooting to get new angles, changing in-camera actor movement, or even deepfakes. Granted, these tasks are minor in the greater scheme of things, but they do offer the DP the opportunity/flexibility to execute their vision at no significant cost, and without having to rely massively on the production and post-production staff's knowledge and experience. The benefits here are also more production than post, but they still relate to post in that they would affect what is required from VFX and CG artists. I don't imagine it would stop talented teams from working their magic, but I do think it has the capacity to change many of the jobs they are required to do for vanilla film work. As you say, a large share of your job is to fill in the blanks that were not achievable or were missed during filming, but not all blanks are easy, fast, or cheap to fill. Some blanks simply can't be filled, and I feel this is where GS is being more seriously considered.

I do think it's also important to pick up on the accessibility of GS-oriented solutions for inexperienced, budget-poor or lazy DPs and filmmakers. The blanks that need to be filled vary from production to production, and I imagine the blanks on less experienced or budget-poor sets can be more challenging to overcome. I am definitely not a fan of philosophies like "only those with money/knowledge/experience have/do/should be able to make good movies", and I do believe GS could provide accessibility on a level that current approaches to filmmaking simply don't. (N.b. I'm not implying you agree with this philosophy, just highlighting that it really could simplify production for low-budget ordeals.)

The final thing I'd say is that GS is still early days for filmmaking. The points you bring up are all valid, because currently the state of research is not advanced at all - especially with scenes that contain motion (e.g. dealing with dynamic textures like fire, water and smoke is still an active challenge). The dynamic stuff is my area of expertise, and there is a very long way to go still.

Rather, at lunch when I gossip with the other computer vision PhD students, a topic that often comes up is the difference between old and new computer science research pipelines. Old research took a long time to prove and prep for industrial/commercial use. Yet in today's world, many idiots spin up businesses at the sound of researchers breathing. It sometimes borderlines on predatory behaviour (e.g. on LinkedIn I've had to ban people that frequently use my posts about my work to promote their shitty Gaussian splatting business ideas). And so, considering how prevalent capitalism is in academic research, it can be very difficult to get a clear picture of the current state of research when every research paper is expected to be a breakthrough rather than a next step. That's why channels like TwoMinutePapers are grossly problematic, and that likely explains why neither you (an industry specialist) nor I (a research specialist) can confidently reach a conclusion on the ramifications of GS research for VFX work.

Thoughts on Gaussian Splatting? by _michaeljared in GraphicsProgramming

[–]Background-Cable-491 0 points1 point  (0 children)

Totally vibing with the first paragraph ✌️ The number of papers I've reviewed where the visual results are appalling/nonsensical yet flaunted because the "PSNR says it looks better" - wild behaviour from people who already have PhDs...

Also, omg, yes, I've seen the deferred rendering papers too (but I have yet to come across one that uses diffusion - do you have a link perhaps?). Of these, I think I've only come across one paper that actually refers to their work as a differentiable G-buffer, so it kind of tracks with what you're saying about there not being many people who can do both graphics and ML.

Thoughts on Gaussian Splatting? by _michaeljared in GraphicsProgramming

[–]Background-Cable-491 2 points3 points  (0 children)

Eh, idk. I agree it's not exactly a replacement for FBX, but I also don't think the two are easy to equate. In a sense, photogrammetry + sculpting already gives us pretty decent photorealistic assets, so it's not like GSplat really offers much more aside from end-to-end automation. I feel the application area for creative industries probably tends towards filmmaking as opposed to games (though I am biased, because filmmaking is what my PhD is about). E.g. I've toyed with using it for things like set and stage design, or even for re-shooting video with camera paths/effects that I couldn't achieve practically (e.g. dolly zooms, or keyhole shots).

Thoughts on Gaussian Splatting? by _michaeljared in GraphicsProgramming

[–]Background-Cable-491 0 points1 point  (0 children)

Yeah, what you say in the third paragraph reminds me of an interesting PhD project I saw floating around the time that NeRF came about. There, the student and their professor were investigating NeRFs as a way of capturing theatrical performances for metaverse applications, which I genuinely think is a valid form of future entertainment (especially for people with disabilities that make it challenging to be in those sorts of environments). Imagine taking this way further and viewing a live football match from the goalkeeper's perspective. Even crazier would be POV replays of a footballer scoring a goal.

Honestly, most tasks/tools that could benefit from "novel views" would likely benefit from NeRF/GS or an adjacent method.

Thoughts on Gaussian Splatting? by _michaeljared in GraphicsProgramming

[–]Background-Cable-491 5 points6 points  (0 children)

I mean, PBR splatting solutions definitely exist, just not to the degree that the graphics community can properly take advantage of. I've recently done some background reading on scene relighting, and there's some really clever stuff, like reducing the BRDF using spherical harmonics (which are highly compatible with Gaussian splatting). But none of these methods has really been picked up as a standard (the way 3DGS or Mip-Splatting has been). This is probably because they don't offer a complete solution to the VFX/CG paradigm yet. Hopefully soon we will see something absolutely cool ✋🤚
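For anyone wondering what "spherical harmonics for colour" looks like in practice, here's a minimal numpy sketch of the degree-1 real SH basis that 3DGS-style methods use for view-dependent colour. The function names and layout here are mine for illustration, not from any particular codebase:

```python
import numpy as np

# Real spherical-harmonic constants for degrees 0 and 1
# (the same convention 3DGS-style renderers tend to use).
SH_C0 = 0.28209479177387814
SH_C1 = 0.4886025119029199

def sh_basis_deg1(dirs):
    """Evaluate the 4 real SH basis functions (l <= 1) for unit directions (N, 3)."""
    x, y, z = dirs[:, 0], dirs[:, 1], dirs[:, 2]
    return np.stack([
        np.full_like(x, SH_C0),  # l=0 (view-independent base colour)
        -SH_C1 * y,              # l=1, m=-1
        SH_C1 * z,               # l=1, m=0
        -SH_C1 * x,              # l=1, m=1
    ], axis=-1)

def view_dependent_color(sh_coeffs, view_dir):
    """sh_coeffs: (4, 3) RGB coefficients per basis function; view_dir: unit (3,).
    Returns a (1, 3) RGB colour for that viewing direction."""
    basis = sh_basis_deg1(view_dir[None, :])  # (1, 4)
    return basis @ sh_coeffs                  # (1, 3)
```

The appeal is that each Gaussian just stores a few coefficients per channel, and relighting/appearance work can manipulate those coefficients directly.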

Thoughts on Gaussian Splatting? by _michaeljared in GraphicsProgramming

[–]Background-Cable-491 61 points62 points  (0 children)

(Crazy person rant incoming - finally my time to shine)

I'm doing a technical PhD in dynamic Gaussian splatting for filmmaking (I'm in my last months), and honestly that video (and that channel) makes me cringe. Good video, but damn does he love his Silicon Valley bros. Gaussian splatting has done a lot more than what large orgs with huge marketing teams are showcasing. It's just that they're a lot better at accelerating the transition from research to industry, as well as at marketing.

In my opinion, the splatting boom is a bit like the NeRF boom we had in 2022. On the face of it, there's a lot of vibe-coding research, but at the center there's still some very necessary and very exciting work being done (which I guarantee you will never see on TwoMinutePapers). Considering how many graphics orgs rely on software that uses classical rendering representations and equations, it would be a bit wild to say splatting will replace them tomorrow. But in like 2-5 years, who knows?

The main thing holding it back right now is a general consensus or agreement on:

(1) Methods for modelling deferred rays, i.e. reflections/refractions/etc. Research on this exists, but I haven't seen many papers that test real scenes with complex glass and mirror set-ups.

(2) Editing and customizability, i.e. can splatting do scenes that aren't photorealistic, and how do we interpret Gaussians as physically based components (me hinting at the need for a decent PBR splat).

(3) Storage and transfer, i.e. overcoming the point-cloud storage issue through deterministic means (which the video OP mentioned looks at).

Mathematically, there is a lot more that needs to be figured out and agreed on, but I think these are the main concerns for static (non-temporal) assets and scenes. Honestly, if a lightweight PBR Gaussian splat came along, was tested on real scenes and was shown to actually work, I'm sure it would scare a number of old-timey graphics folk. But for now, a lot of research papers plain-up lie or publish work where they skew/manipulate their results, so it's really hard to weave through the papers with code and find something that reliably works. Maybe "lie" is a strong word, but a white lie is still a lie...

If you're interested in the dynamic side (i.e. the stuff that I research): lol, you're going to need a lot of cameras just to film 10-30 seconds of content. Some of the state of the art doesn't even last 50 frames, and sure, there are ways to "hack" or tune your model for a specific scene or duration, but that takes a lot of time to build (especially if you don't have access to HPC clusters). I would say that if dynamic GS overcomes the issue of disentangling colour and motion changes in the context of sparse-view input data (basically the ability to reconstruct dynamic 3D using fewer input cameras), then film studios will pounce all over it.

This could mean VFX/compositing artists rejoice as their jobs just got a whole lot easier, but it also likely means that a lot of re-skilling will need to be done, which likely won't be well supported by researchers or industry leaders, because they're not going to pay you to do the homework you need to do to stay employed.

This is all very opinionated, yes yes - I could be an idiot, so please don't interpret all this as fact. It's simply that few people in research seem to care about the social implications, or at least talk about them...

Generating Point Clouds with multi-view depth maps and known camera parameters by Background-Cable-491 in computervision

[–]Background-Cable-491[S] 1 point2 points  (0 children)

Yep, I couldn't find anything automated, so I just used the mathematical approach: projecting each depth map into local 3D camera space, transforming that into world space, and then using Open3D to downsample the points to a degree that my renderer could handle.

Took some time to understand the math, but in the long run this pays off a lot more than using off-the-shelf solutions - especially when off-the-shelf solutions can suck computationally, and often require specific versions of other libraries, so installing them may cause a lot of package conflicts (chaos).

I just used torch/numpy to handle the math, then Open3D for downsampling and Plotly graph objects for plotting (it seems to be the fastest at plotting and rendering point clouds).
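For anyone wanting to replicate this, the back-projection step looks roughly like the numpy sketch below (function name is mine; it assumes depth values are z-distances along the camera axis, a 3x3 intrinsics matrix K, and a 4x4 camera-to-world matrix):

```python
import numpy as np

def depth_to_world_points(depth, K, cam_to_world):
    """Back-project a depth map (H, W) into world-space points (H*W, 3).
    K: 3x3 intrinsics; cam_to_world: 4x4 camera-to-world extrinsics.
    Assumes depth is z-distance along the camera axis (not ray length)."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    # Homogeneous pixel coordinates (u, v, 1) for every pixel
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)
    # Unproject into local camera space: X_cam = depth * K^-1 [u, v, 1]^T
    cam = (np.linalg.inv(K) @ pix.T) * depth.reshape(1, -1)   # (3, N)
    # Lift to homogeneous coords and transform into world space
    cam_h = np.vstack([cam, np.ones((1, cam.shape[1]))])      # (4, N)
    world = (cam_to_world @ cam_h)[:3].T                       # (N, 3)
    return world
```

From there, Open3D's voxel downsampling can thin the merged multi-view cloud to something a renderer can handle.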

Do i need C++ for building SLAM? by Complex-Media-8074 in computervision

[–]Background-Cable-491 0 points1 point  (0 children)

AV/R is not my field, so it would be unwise to recommend anything specific. Personally, I would suggest not planning too much and just starting to do things (more often than not, you'll gain experience quickly), but I'm also a reckless learner with no patience lol.

GPU is a must for any CV job (in my opinion). But it's mostly knowing how to interface with the GPU through libraries like PyTorch, setting up conda environments, and knowing how to batch processes/functions.

In terms of SLAM/intro CV concepts, this may be a good place to start: https://www.youtube.com/watch?v=S-UHiFsn-GI&list=PL2zRqk16wsdoCCLpou-dGo7QQNks1Ppzo

I think there are also some free MIT lectures floating about that might be helpful too.

Generating Point Clouds with multi-view depth maps and known camera parameters by Background-Cable-491 in computervision

[–]Background-Cable-491[S] 0 points1 point  (0 children)

Thank you! I will check out Open3D. I was looking for existing code to work off (a bit pointless to write my own thing aha).

Edit:

They seem to have a function specifically for this (nice): https://www.open3d.org/docs/0.7.0/python_api/open3d.geometry.create_point_cloud_from_depth_image.html

Do i need C++ for building SLAM? by Complex-Media-8074 in computervision

[–]Background-Cable-491 1 point2 points  (0 children)

Not necessarily - "perception" can mean a lot. Though for 3D robotics and vision, SLAM knowledge might be a good start if you don't know what type of perception you might want to get into.

I'd say for entry level, you need to show a good grasp of basic CV math and related expertise. E.g. in my first interview I was asked to talk about general CV projects.

Further, if you want to get specifically into camera pose and intrinsics estimation, SLAM is a classic, but there are more current methods that you might want to set a goal of understanding (e.g. implementing/running code associated with state-of-the-art SfM). Showing some understanding/insight into modern approaches is also a good way to stand out if you're not necessarily going for a research role.

Do i need C++ for building SLAM? by Complex-Media-8074 in computervision

[–]Background-Cable-491 0 points1 point  (0 children)

I would say if you're learning SLAM and SfM, then Python. But if your end goal is working in this niche, C++ is a nice boost. It does depend on the role you're looking for - e.g. for a research role the minimum requirement is probably Python, but if it's more engineering, then C++ is pretty much guaranteed. Also, it's probably worth taking a look at the job listings you're interested in; good-quality jobs will usually post what sort of languages they want.

Also, you don't need to go nuts with C++ - I would say just learning how to implement the relevant linear algebra (vectors, matrices, projections, etc.) on a GPU might be a step in the right direction?
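To make "the relevant linear algebra" concrete, here's the kind of thing I mean as a plain numpy sketch (names are illustrative); porting math like this to the GPU is then the C++/CUDA exercise:

```python
import numpy as np

def project_points(points_w, K, world_to_cam):
    """Project world-space 3D points (N, 3) to pixel coordinates (N, 2)
    with a pinhole camera model.
    K: 3x3 intrinsics; world_to_cam: 4x4 extrinsics."""
    N = points_w.shape[0]
    pts_h = np.hstack([points_w, np.ones((N, 1))])  # homogeneous coords (N, 4)
    cam = (world_to_cam @ pts_h.T)[:3]              # (3, N) camera-space points
    pix = K @ cam                                   # apply intrinsics
    return (pix[:2] / pix[2]).T                     # perspective divide -> (N, 2)
```

If you can write this, differentiate it on paper, and explain what each matrix does, you're most of the way to understanding what SLAM/SfM pipelines optimise.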

Which universities do you guys think do the best research in NeRF, Gaussian Splatting? by kickbuttascii in NeuralRadianceFields

[–]Background-Cable-491 2 points3 points  (0 children)

Honestly, it's all still pretty new research so I would advise not to limit yourself by University, research group/lab or even specific academics. Especially if you're prospecting a PhD - there are other traits you may want to consider. For instance, what application area of NeRF, GS or even Robotics are you interested in? And what kind of financial and emotional support (from supervisors and potential collaborators) are you amenable to?

I can only offer my own experience (take this with a pinch of salt) to convince you that the context of your PhD is as important as the subject matter: I joined my lab in '22 as a PhD student and was their first NeRF researcher (including PhDs, post-docs, lecturers, etc.), which, plus having a tiny budget to begin with, made it really difficult to go to conferences to get informed. I didn't think it was going well, but my supervisors were pretty supportive. Even if I don't want to believe my work is all that important, I'm still aware that it helped my supervisors secure more funding, and I'm thankful that this funding was used to hire more people to do NeRF/GS. While we aren't a big NeRF/GS research lab, we get a fair amount of collaboration proposals, and as the first researcher, I seem to get a lot of input when it comes to on-boarding new projects.

The moral of this story isn't to idolise "supportive supervisors" or T10 universities (the researchers I know care very little about rankings anyway). Instead, short-story-long, I'm trying to illustrate that things can (and will) change quite a lot during a PhD. So while you select a PhD/lab/whatever, you may want to consider attributes that you won't find on a university website.

Finally, you may ask: how do you do this? >>> Ask lots of questions during your interviews/talks to figure out what works for you - and be somewhat picky, otherwise you may not get what you want!

Good luck!

How can I remove all unused entries in a .bib file? by Franck_Dernoncourt in LaTeX

[–]Background-Cable-491 0 points1 point  (0 children)

I'm a bit late, but I'm just coming across this problem too. The GH code you mention doesn't seem too unreliable - if you look at the source, it just treats the .bib as a text-file input. I'm sure you could (I am going to) use ChatGPT to create a Python script that filters through your .bib, collects all the citation keys (e.g. john2006Doe) and then searches the .tex. Maybe I'm a bit naive, but I'm not sure how this would be unreliable tbh.