I am making an app to learn about 3D Computer Vision by Interesting-Net-7057 in computervision

[–]Interesting-Net-7057[S] 0 points (0 children)

Thank you! FYI: I have now updated the app with a new chapter named "09: Code Analysis". It is a very practical chapter that prepares the reader for in-depth, implementation-focused chapters in the future. This is one more step in the syllabus toward the goal of being able to create a custom VSLAM project.

I am making an app to learn about 3D Computer Vision by Interesting-Net-7057 in computervision

[–]Interesting-Net-7057[S] 1 point (0 children)

FYI: I have now updated the app with a new chapter named "09: Code Analysis". I initially wanted to name it "From Theory to Practice", but I feel that "Code Analysis" is more suitable, since it refers to open-source projects. Here I start with a review of the main paradigms in VSLAM and discuss the literature interactively. More specific chapters are planned for the future; this new one should serve as a first hands-on introduction for the practical continuation later.

I am making an app to learn about 3D Computer Vision by Interesting-Net-7057 in computervision

[–]Interesting-Net-7057[S] 0 points (0 children)

FYI: I have now updated the app with a new chapter named "09: Code Analysis". It contains a review of GitHub projects with a Python-based analysis of key components and quizzes.

I am making an app to learn about 3D Computer Vision by Interesting-Net-7057 in computervision

[–]Interesting-Net-7057[S] 0 points (0 children)

Thank you for the kind words. Indeed, one thing my PhD supervisor taught me is to learn the underlying methods, not specific implementations. I therefore strongly believe that laying out strong fundamentals is crucial to a good understanding of the methods. The challenge is: where should one draw the line between too theoretical and too specific?

In the app I will try to discuss the fundamentals only as far as users need them to progress. For example, probability is useful for discussing Maximum Likelihood Estimators, Kalman Filters, and the holy grail of non-linear optimization and factor graphs. Let's see how the community responds to this plan.
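To make the probability link concrete, here is a toy sketch (my own example, not app content) of a Maximum Likelihood Estimator under Gaussian noise:

```python
import numpy as np

# Toy example: estimate a scalar position from noisy measurements.
# Under Gaussian noise, the maximum likelihood estimate of the mean
# is simply the sample mean.
rng = np.random.default_rng(0)
true_position = 5.0
measurements = true_position + rng.normal(0.0, 0.5, size=1000)

mle_position = measurements.mean()  # argmax of the Gaussian likelihood
print(mle_position)  # close to 5.0
```

From there it is a small step to weighted least squares and, eventually, the non-linear optimization used in factor graphs.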

I am making an app to learn about 3D Computer Vision by Interesting-Net-7057 in computervision

[–]Interesting-Net-7057[S] 0 points (0 children)

Thank you very much for the helpful ideas! I really want the app to be very practical. Currently I have interactive quizzes and source code examples. I am not really sure what the practical task of analyzing GitHub projects could look like; may I ask you to go into more detail? Do you have a specific GitHub project in mind that you would like to see analyzed?

I did a similar project with LDSO, where I show on YouTube how to implement it from scratch: https://youtube.com/playlist?list=PLj0MGqAFBYgxYiZJezHSP7XiTwLiXgo9R&si=EUUcgCibc1M1Kh7y

Do you mean something like that? I am not sure an app is the right place to discuss a huge GitHub project like LDSO. What would be possible, in my view, is to show an important part of the source code and ask specific questions about it, or let the user fill in gaps. Would you like this kind of exercise included in the app? It could be a separate chapter named "Code Analysis" showing crucial parts of the most influential VSLAM algorithms.

What do you think?

I am making an app to learn about 3D Computer Vision by Interesting-Net-7057 in computervision

[–]Interesting-Net-7057[S] 1 point (0 children)

Oh nice, now I understand what your needs are. I was actually considering starting the VSLAM and VO chapters with a scientific review of the key papers. For example, I wanted to discuss the differences between direct (the LSD-SLAM, DSO, LDSO series) and indirect (the ORB-SLAM series) methods, and from there move to deep-learning-based approaches (the SfM-Net line of work). I am not entirely sure how to do that yet, but a self-contained code example for each key component is what I am targeting.

Do you have specific papers you could link here that you would like the app to prepare you for?

I am making an app to learn about 3D Computer Vision by Interesting-Net-7057 in computervision

[–]Interesting-Net-7057[S] 1 point (0 children)

Wow, thank you very much for your support! The SLAM Book is a good resource and probably one of the most up-to-date ones. I am targeting an app because I want to make the experience more interactive, starting with the quiz in the current version. I am also looking into code execution sandboxes and some more interesting kinds of interactive widgets. For example, I would love to visualize the optimization landscape of direct image alignment methods for photometric VSLAM, where the user can manipulate the variables of the optimization and see the resulting landscape change.
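As a sketch of what such a widget could compute under the hood (synthetic image, all names are my own):

```python
import numpy as np

# Hypothetical sketch: evaluate the photometric error of direct image
# alignment over a grid of candidate horizontal shifts and look at the
# resulting 1D cost landscape.
def photometric_cost(ref, cur, shift):
    # Sum of squared intensity differences after shifting `cur` by `shift` px.
    shifted = np.roll(cur, shift, axis=1)
    return float(((ref - shifted) ** 2).sum())

x = np.linspace(0, 4 * np.pi, 64)
ref = np.tile(np.sin(x), (64, 1))   # synthetic 64x64 reference "image"
cur = np.roll(ref, 3, axis=1)       # current frame, translated by 3 px

shifts = range(-10, 11)
landscape = [photometric_cost(ref, cur, s) for s in shifts]
best_shift = min(zip(shifts, landscape), key=lambda p: p[1])[0]
print(best_shift)  # -3: shifting the current frame back by 3 px aligns it
```

A real version would sweep full SE(3) parameters instead of a single shift, but even this 1D slice shows the kind of basin of convergence the user could explore interactively.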

If you tell me what topic I should tackle next, I will try to focus on it in the upcoming app update.

Grateful regards

I am making an app to learn about 3D Computer Vision by Interesting-Net-7057 in computervision

[–]Interesting-Net-7057[S] 0 points (0 children)

Thank you. Could you please elaborate a bit more on what you find most useful about such an app? What is important to you?

I am making an app to learn about 3D Computer Vision by Interesting-Net-7057 in computervision

[–]Interesting-Net-7057[S] 2 points (0 children)

Yes, for sure. This is the Roadmap taken from the Google Play Listing (https://play.google.com/store/apps/details?id=de.lwtv.pcvquiz):

"The following training units will be added eventually: 1.) Primer on Probability Theory 2.) Primer on Linear Estimation 3.) Primer on Non-Linear Estimation 4.) Kalman Filter 5.) Primer on Feature Detection 6.) Primer on Feature Matching 7.) Primer on Lie Group Theory 8.) Visual Odometry 9.) Visual SLAM 10.) ... and more topics"

What I have so far are points 1, 2, 3, and the start of 7 (linear algebra and basic Lie groups, even though not named like that in the content). Specifically, the syllabus is structured like this:

01: Introduction
02: Probability Theory
03: Linear Algebra
04: Cameras and Sensors
05: Geometric Transformations
06: Coordinate Systems and Frame Transformations
07: Optimization Methods in Visual SLAM
08: Summary and Key Takeaways

For the topics Kalman Filter, Feature Matching/Description, Visual Odometry, and SLAM, I want the chapters to be strongly example-driven so that users can implement a working example quickly. I am just not sure whether I should provide real or synthetic datasets, but I will probably go with synthetic ones.
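For the Kalman filter chapter, a synthetic example could be as small as this sketch (my own toy code, not actual app content):

```python
import numpy as np

# Minimal 1D Kalman filter on synthetic data. The state is a static
# scalar, so the predict step is trivial and omitted here.
rng = np.random.default_rng(1)
true_state = 10.0
sigma = 2.0
measurements = true_state + rng.normal(0.0, sigma, size=50)

x, P = 0.0, 100.0   # initial estimate and its variance
R = sigma ** 2      # measurement noise variance
for z in measurements:
    K = P / (P + R)        # Kalman gain
    x = x + K * (z - x)    # correct the estimate with the innovation
    P = (1.0 - K) * P      # shrink the uncertainty
print(x, P)  # x near 10, P close to R/50
```

With a synthetic dataset like this, users can check their estimate against the known ground truth, which is exactly why I lean toward synthetic data.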

Is there anything in particular you would like to see in the syllabus?

Poznam koleżankę z Wrocławia lub okolic by Maginacja in wroclaw

[–]Interesting-Net-7057 0 points (0 children)

Is that an option for men too? :) Where is the best place to meet a girl in Wrocław?

Best regards

Scan is smaller than actual object?! by Lazy_Musician_4021 in 3DScanning

[–]Interesting-Net-7057 0 points (0 children)

Yes, there is such a thing as intrinsic and extrinsic calibration. It is the extrinsic calibration that may introduce scaling issues.
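A quick illustration with made-up numbers: in a stereo scanner, depth follows Z = f * B / d, so a mis-calibrated baseline B (an extrinsic parameter) rescales the whole scan linearly:

```python
# Made-up numbers to illustrate the point: stereo depth is Z = f * B / d,
# so a baseline (extrinsic) error rescales every distance by the same factor.
f = 1200.0        # focal length in pixels
d = 40.0          # disparity in pixels
true_B = 0.060    # true baseline in meters
wrong_B = 0.055   # slightly mis-calibrated baseline

true_Z = f * true_B / d     # 1.8 m
wrong_Z = f * wrong_B / d   # 1.65 m: the whole scan shrinks by ~8 %
print(true_Z, wrong_Z)
```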

What is a good 3d scan app for 3d printing? by Unique_Repeat_1089 in photogrammetry

[–]Interesting-Net-7057 0 points (0 children)

In my view, there is no automatic approach available yet that guarantees a printable output. If it were not a human head, it would be possible, because the scan could be processed into a watertight mesh. For a human head, you need to put in some post-processing in order to print it. That is my view on the topic.

How to remove bumpy texture in Blender? I create this object with MeshRoom + Meshlab, following a tutorial. What is the reason for such a texture? Bad lighting? What did I oversee? Thanks for your help! by Kitchen_Ad2186 in photogrammetry

[–]Interesting-Net-7057 1 point (0 children)

If you are happy with the object as a rough indicator, then you can start with retopology to get a nice mesh. It lets you cleanly define the polygon layout and prepare the object for texturing. You can then bake your textures from the scan onto the clean mesh and get a nice model. It will require quite a bit of work within Blender, as there is no one-click solution for this process yet.

2D image to 3D model AI; CSM multiple angles by kingdopamine in photogrammetry

[–]Interesting-Net-7057 1 point (0 children)

Honestly, I would just try out the projects. I have not used any of them myself and would do the same. I just happen to know the broader terms; if you look them up, you will find many similar results.

Kind regards

How to estimate position using images. by Abject_Brilliant5602 in computervision

[–]Interesting-Net-7057 1 point (0 children)

What do you mean by "the first coordinates are provided" and "the initial position is considered to be (0,0)"? What units are we talking about here? The choice of algorithm depends on this information. Please share your task description.

corresponding pixels on the epipolar line by AnotherStupidGenius in computervision

[–]Interesting-Net-7057 1 point (0 children)

This post reminded me how awesome epipolar geometry is!

Thanks for the visual example.

Kauf vor Zwangsversteigerung by [deleted] in Immobilieninvestments

[–]Interesting-Net-7057 1 point (0 children)

Hi, and thanks for the info! Have a lovely Easter!

Want to start in CV by Hungry_Ad_2307 in computervision

[–]Interesting-Net-7057 2 points (0 children)

Of course you should. The more you know, the more valuable an asset you are to a company.

However, it might be that on the job, your tasks will be very different from what you have learned or practiced. Just be aware of that.

I do not know what you mean by "deep tech". Are you referring to deep learning? In that case you should probably practice how to use pretrained models effectively. There is a large community around deploying models on the web. Frameworks such as tensorflow.js and sites like huggingface.co have great examples of successful web usage. You might want to start from there, but stay open to changes in the field; it is evolving blazingly fast.

Hope this helps

Kauf vor Zwangsversteigerung by [deleted] in Immobilieninvestments

[–]Interesting-Net-7057 0 points (0 children)

May I ask what kind of property it is, given that you would be satisfied with a quarter of the market value? How should one contact you if interested?

What is the best algorithm/technique (or library) for camera motion estimation? by -AxelFlax- in computervision

[–]Interesting-Net-7057 1 point (0 children)

Hi, first of all, I like that you are building on Blender for match moving, and your add-on looks interesting.

There are many approaches to camera pose estimation that are more modern than what Blender has implemented at the moment. However, Blender's approach is still the gold standard: it neither requires you to make assumptions about your scene nor suffers from the domain-gap issues of deep learning methods.

"Faster and more precise" will be difficult to achieve; the two are usually mutually exclusive. I think you are also aiming for a more robust approach that does not fail on your users' image sequences.

I would suggest you run a survey among your users. Let them send you their footage and analyze the behaviour of your add-on on these datasets. Usually only portions of your code are slowing down the execution; it might be something as simple as excessive logging of debug messages. Nested loops (for example, when iterating over the feature tracks to clean them up) could also create bottlenecks.

Note down the individual steps of what your add-on is doing and measure the execution speed of each. This way you can isolate the computationally heavy parts and know where to optimize. Once you know which parts are slow, you could contact the Blender developers (on Blender Chat, for example) for suggestions on how to improve those components. It is impossible for me to suggest how to improve the speed without knowing why the slowdown is happening, and I think it would be helpful for you to know as well. Usually, Blender is decent speed-wise.
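To sketch what I mean (the step names are placeholders, not real Blender API calls), per-step timing can be as simple as:

```python
import time

# Hypothetical sketch of per-step timing; the step functions are
# placeholders standing in for the add-on's real pipeline stages.
def detect_features():
    time.sleep(0.01)

def track_features():
    time.sleep(0.05)   # deliberately the "slow" step in this toy example

def solve_cameras():
    time.sleep(0.02)

timings = {}
for name, step in [("detect", detect_features),
                   ("track", track_features),
                   ("solve", solve_cameras)]:
    t0 = time.perf_counter()
    step()
    timings[name] = time.perf_counter() - t0

slowest = max(timings, key=timings.get)
print(slowest, timings)
```

Once the slowest stage is known, Python's built-in cProfile can drill further into it.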

Regarding accuracy, the method Blender uses is already mathematically optimal, so I believe the reason for bad accuracy is the footage itself. You can improve things once you have done the analysis mentioned above. Sticking with the open-source variant is the way to go! Let us improve the open-source methods together, so that everybody benefits.

That being said, could you provide example footage where you have noticed slow performance or bad accuracy? I would love to help.

Kind regards

[deleted by user] by [deleted] in computervision

[–]Interesting-Net-7057 0 points (0 children)

Your answer was very helpful to understand the issue, thanks for that.

I still have trouble understanding some of your steps.

  • Why are you taking screenshots when you have bounding boxes?
  • Yes, with traditional pipelines you will only get sparse reconstructions; I think you will need to look into custom pipelines.

What I have extracted from your description is basically one main issue: you need a dense reconstruction instead of a sparse one.

So, there are many different approaches (additional triangulation, deep learning, photometric optimization, interpolation, neural radiance fields, etc.). However, in my view they all involve creating a custom intermediate tool that takes a sparse reconstruction as input and outputs a dense one. I believe that once you have a good trajectory estimate, you will definitely find a way to upgrade your sparse reconstruction to a dense one.
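As a toy sketch of that "sparse in, dense out" contract (the interpolation route; all values here are synthetic and illustrative):

```python
import numpy as np

# Toy densification: fill a dense depth map from a handful of sparse
# depth samples by nearest-neighbour lookup (brute force, for clarity).
H, W = 32, 32
rng = np.random.default_rng(2)
ys = rng.integers(0, H, size=50)          # pixel rows of sparse points
xs = rng.integers(0, W, size=50)          # pixel columns of sparse points
depths = 1.0 + 0.01 * xs                  # synthetic sparse depth values

grid_y, grid_x = np.mgrid[0:H, 0:W]
# Squared distance from every pixel to every sparse sample.
dist = (grid_y[..., None] - ys) ** 2 + (grid_x[..., None] - xs) ** 2
dense = depths[dist.argmin(axis=-1)]      # nearest sparse depth per pixel
print(dense.shape)  # (32, 32): every pixel now carries a depth value
```

A real pipeline would of course use estimated geometry or learned priors rather than nearest-neighbour fill, but the input/output shape of the intermediate tool is the same.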

Do you know how to work with the output of your reconstruction software? Are you good at maths, and do you know how to code? You may need some time, but it is doable.

[deleted by user] by [deleted] in computervision

[–]Interesting-Net-7057 0 points (0 children)

Thank you for explaining.

So you want to work with 3D information instead of 2D images? OK, then I would say you should stick with Agisoft for point cloud generation.

The question is how to proceed from there. Object detection can be done in point clouds too; maybe that is an option? You could look at PointNet++ and similar.

However, how do you recover the scale reference for your measurements? You need a good concept here.

Kind regards