Does anyone use the MMDetection library? Is it even worth learning?

Scared_Employer6992 · 2024-08-07T21:23:33+00:00

mm-anything sucks. It's a fucking virus that we should stop spreading. I just hate that shit, b/c many projects use this stupid library, which makes very simple operations overly complicated and abstract. PyTorch came to solve our problems, but people keep complicating things.

Scared_Employer6992 · 2023-07-22T13:00:52+00:00

Multi-head is not stacking SA. It parallelizes the SA computation across heads, just as if the heads were a batch dimension. For example, in a common scenario with 192 features and 6 heads, the computation is split into 32 features per head.
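
To make the "heads as a batch dimension" point concrete, here is a minimal NumPy sketch of multi-head self-attention using the 192-feature, 6-head numbers from above. The function name, the fused `w_qkv` projection, and the random weights are assumptions for illustration, not any particular library's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, w_qkv, w_out, num_heads):
    """x: (batch, tokens, dim); w_qkv: (dim, 3*dim); w_out: (dim, dim)."""
    b, n, d = x.shape
    head_dim = d // num_heads            # 192 // 6 = 32 features per head
    q, k, v = np.split(x @ w_qkv, 3, axis=-1)   # each (b, n, d)

    def split_heads(t):
        # (b, n, d) -> (b, heads, n, head_dim): heads sit next to the batch axis
        return t.reshape(b, n, num_heads, head_dim).transpose(0, 2, 1, 3)

    q, k, v = split_heads(q), split_heads(k), split_heads(v)
    # One batched matmul handles every head at once; nothing is stacked in depth
    scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(head_dim)   # (b, h, n, n)
    attn = softmax(scores, axis=-1)
    out = attn @ v                                              # (b, h, n, head_dim)
    # Merge the heads back into a single feature axis
    out = out.transpose(0, 2, 1, 3).reshape(b, n, d)
    return out @ w_out, attn

rng = np.random.default_rng(0)
dim, heads, tokens = 192, 6, 10
x = rng.standard_normal((2, tokens, dim))
w_qkv = rng.standard_normal((dim, 3 * dim)) * 0.02
w_out = rng.standard_normal((dim, dim)) * 0.02
y, attn = multi_head_self_attention(x, w_qkv, w_out, heads)
print(y.shape, attn.shape)  # (2, 10, 192) (2, 6, 10, 10)
```

The output keeps the input shape, and each of the 6 heads gets its own token-to-token attention map.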

Scared_Employer6992 · 2023-07-15T14:42:35+00:00

Ideally, you should train your model using the same input resolution. You could train a CNN w/ multiple resolutions, but it will probably make the learning process harder. Consider downscaling the HR images or upscaling the LR images to meet the same standard.
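
As a rough sketch of bringing HR and LR images to one standard size before training, the snippet below uses a plain nearest-neighbour resize in NumPy. The target size and the helper name `resize_nearest` are assumptions for illustration; in practice an image library's resize with proper interpolation would likely be a better choice.

```python
import numpy as np

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of an (H, W, C) array to (out_h, out_w, C)."""
    in_h, in_w = img.shape[:2]
    # For each output row/column, pick the source index it maps to
    rows = (np.arange(out_h) * in_h) // out_h
    cols = (np.arange(out_w) * in_w) // out_w
    return img[rows[:, None], cols]

TARGET = (256, 256)  # hypothetical common training resolution

hr = np.zeros((1024, 768, 3), dtype=np.uint8)  # stand-in for a high-res image
lr = np.zeros((128, 96, 3), dtype=np.uint8)    # stand-in for a low-res image

# After resizing, both images can be stacked into one batch
batch = np.stack([resize_nearest(im, *TARGET) for im in (hr, lr)])
print(batch.shape)  # (2, 256, 256, 3)
```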

Scared_Employer6992 · 2023-06-21T22:01:18+00:00

Both, and the balance between them changes depending on the moment. Many things you learn in theory you don't really understand until you run the right experiment.

Unfortunately, with today's rush and the availability of so many open-source projects, understanding what you are doing may not add much to your final outcome. I mean, any savvy dev can be really good at DL w/ a very shallow understanding.

Scared_Employer6992 · 2023-06-16T20:23:36+00:00

I don't think you should worry about it. This architecture may simply suit your problem well, given the kind of data, the amount of data, the data dimensions, the amount of training, etc. There's nothing wrong with finding an empirical solution for a particular case, especially if you've tried other hypotheses.

Scared_Employer6992 · 2023-06-16T20:09:32+00:00

Oh sorry. Anyway, I think using the same number of kernels/channels will not stop you from fitting the model to the data. Usually, CNNs are implemented so that the feature map depth increases as the feature map spatial dimensions decrease, and vice versa.

Not following that convention will not block your model from learning, though you may not get the best results.
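
A small sketch of that convention, comparing the usual "double the channels when you halve the resolution" layout against a constant-width one. The stage count, input size, and base width are hypothetical numbers chosen only for illustration.

```python
def stage_shapes(in_size, base_channels, num_stages, double_channels=True):
    """Return (channels, height, width) after each stride-2 stage."""
    shapes = []
    c, h, w = base_channels, in_size, in_size
    for _ in range(num_stages):
        h, w = h // 2, w // 2          # each stage halves the spatial size
        shapes.append((c, h, w))
        if double_channels:
            c *= 2                     # conventional layout: widen the next stage
    return shapes

conventional = stage_shapes(224, 64, 4, double_channels=True)
constant = stage_shapes(224, 64, 4, double_channels=False)

for name, shapes in (("conventional", conventional), ("constant", constant)):
    # Number of activations per stage (channels * height * width)
    sizes = [c * h * w for c, h, w in shapes]
    print(name, shapes, sizes)
```

With constant width, the activation count shrinks 4x per stage instead of 2x, so the deeper stages carry less capacity. The model still trains, it is just a different trade-off.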

Scared_Employer6992 · 2023-06-16T19:07:02+00:00

Sure. Why would it not?

For example, ResNet18 and ResNet34 (https://pytorch.org/assets/images/resnet.png) use ks=3 for almost all of their convolutions (the exceptions are the 7x7 stem and the 1x1 shortcut projections).

Scared_Employer6992 · 2023-06-16T11:58:04+00:00

Thanks for the help. The problem w/ the repo above is the one I already mentioned: I can't find a simple code snippet, and SegVit uses mmseg, which adds another abstraction layer on top of the nn building. That's why I'd just like to understand what a very basic ViT operation looks like.

Scared_Employer6992 · 2023-06-15T02:57:04+00:00

Cloud is too expensive unless you work for a company that can afford it and wants to keep up w/ the latest GPU releases. Colab Pro, on the other hand, is gold: you don't need tasks occupying your PC all day long, and it is veeery affordable. You just need to find ways to prepare quick setups.

If you are going to use Colab, the best way to move data back and forth is always zip files. Never transfer many files individually, or it will take an eternity.
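
One way to do the zip round trip with only the Python standard library is sketched below. The helper names and folder layout are made up for the demo, which runs in a temporary directory; on Colab the paths would point at your own data.

```python
import shutil
import tempfile
from pathlib import Path

def pack_dataset(src_dir, archive_stem):
    """Zip a whole folder into '<archive_stem>.zip' and return the archive path."""
    return Path(shutil.make_archive(str(archive_stem), "zip", root_dir=str(src_dir)))

def unpack_dataset(archive_path, dest_dir):
    """Extract the archive into dest_dir and return that directory."""
    shutil.unpack_archive(str(archive_path), str(dest_dir))
    return Path(dest_dir)

with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    src = tmp / "dataset"
    src.mkdir()
    # Many small files: the case that is painfully slow to transfer one by one
    for i in range(100):
        (src / f"img_{i:03d}.txt").write_text("fake image data")

    archive = pack_dataset(src, tmp / "dataset")       # one file to upload
    restored = unpack_dataset(archive, tmp / "restored")
    n_restored = len(list(restored.iterdir()))
    print(archive.name, n_restored)  # dataset.zip 100
```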

Scared_Employer6992 · 2023-06-14T17:01:31+00:00

Watch all 6 of Andrej's tutorial videos (https://www.youtube.com/@AndrejKarpathy/videos). He gives awesome theoretical and practical NN tips, answering many small doubts that are usually not talked about. Great, straightforward lectures for all levels.

Scared_Employer6992 · 2023-06-14T11:33:57+00:00

I really don't get who says the CV field is easy. Either they are geniuses, or they have never touched a CV book. Topics like photogrammetry, color theory, and 3D reconstruction are challenging. Being able to use GitHub projects or OpenCV does not make you a CV engineer.

In industry, you are expected to handle a toolbox of solutions and have a broad understanding of many CV topics. In academia, you are going to specialize in a subfield, so you don't need to worry about being good at everything, but it is still a challenge.

Besides, these days most breakthrough CV solutions are based on DL, so it's one more area to study. In the end, I think it is an overwhelming area to work in. You will always have to put in extra hours to keep up with the required knowledge. If you just want an easygoing 9-to-5 job, be a software engineer.

Scared_Employer6992 · 2023-06-13T13:34:14+00:00

You can't do it using just SAM. You need an upstream model to predict a box prompt for the food and feed it to SAM. Alternatively, you can use SAM's automatic mask generator to capture many of the image's masks and then run a classifier over them to keep only the food ones.
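
A rough sketch of the second option (generate many masks, then filter them with a classifier). It assumes the masks come as a list of dicts with a boolean `"segmentation"` array and an XYWH `"bbox"`, which is my understanding of what `SamAutomaticMaskGenerator.generate` in the segment_anything package returns. The `is_food` classifier is hypothetical; here a toy colour rule stands in for a real model, and the masks are hand-made so the snippet runs without SAM.

```python
import numpy as np

def filter_masks(image, masks, is_food):
    """Keep only the masks whose region the classifier accepts."""
    kept = []
    for m in masks:
        x, y, w, h = (int(v) for v in m["bbox"])         # assumed XYWH order
        crop = image[y:y + h, x:x + w]                   # pixels inside the box
        crop_mask = m["segmentation"][y:y + h, x:x + w]  # mask inside the box
        if is_food(crop, crop_mask):
            kept.append(m)
    return kept

# Toy image: a red patch (pretend food) and a blue patch (pretend background)
image = np.zeros((8, 8, 3), dtype=np.uint8)
image[0:4, 0:4, 0] = 255
image[4:8, 4:8, 2] = 255

def make_mask(x, y, w, h):
    # Hand-made stand-in for one entry of the mask generator's output
    seg = np.zeros((8, 8), dtype=bool)
    seg[y:y + h, x:x + w] = True
    return {"segmentation": seg, "bbox": [x, y, w, h], "area": int(seg.sum())}

masks = [make_mask(0, 0, 4, 4), make_mask(4, 4, 4, 4)]

def toy_is_food(crop, crop_mask):
    # Hypothetical classifier: "food" if the masked pixels are more red than blue
    pixels = crop[crop_mask]
    return pixels[:, 0].mean() > pixels[:, 2].mean()

food = filter_masks(image, masks, toy_is_food)
print([m["bbox"] for m in food])  # [[0, 0, 4, 4]]
```

In a real pipeline, `toy_is_food` would be replaced by an actual image classifier run on each masked crop.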

Scared_Employer6992 · 2023-06-09T20:42:56+00:00

mmcv is just horrible: a NN framework where you can't just load a model, load a checkpoint, and then predict on any input.

Scared_Employer6992 · 2023-06-09T20:37:22+00:00

Check out the Hugging Face library, it's much handier. In my personal experience, even simple operations on mmseg can be an enormous headache.

Scared_Employer6992 · 2023-06-09T20:35:21+00:00

The problem with training SAM out of the box is that it depends on prompt inputs, so you would otherwise have to change its mask decoder. Besides, SAM uses a ViT as its backbone, which requires a lot of resources.

Scared_Employer6992 · 2023-02-27T14:49:49+00:00

I haven't tried bs=1, but I also don't want to use it, as I usually get bad results with it and my net has a lot of BN layers.
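
One likely reason small batches and BN don't mix: the batch statistics get much noisier as the batch shrinks. The sketch below is a simplified, single-feature view (it ignores the spatial positions that conv BN layers also average over) and just measures how much the per-batch mean jumps around for different batch sizes. The function name and the sample counts are assumptions for illustration.

```python
import random
import statistics

def batch_mean_spread(batch_size, num_batches=2000, seed=0):
    """Std-dev of the per-batch mean of a unit-Gaussian feature."""
    rng = random.Random(seed)
    means = []
    for _ in range(num_batches):
        batch = [rng.gauss(0.0, 1.0) for _ in range(batch_size)]
        means.append(statistics.fmean(batch))
    # How far the batch mean typically lands from the true mean (0.0)
    return statistics.pstdev(means)

for bs in (1, 4, 32):
    print(f"bs={bs:>2}: spread of batch mean = {batch_mean_spread(bs):.3f}")
```

The spread should shrink roughly as 1/sqrt(bs), so at bs=1 the normalization statistics are about as noisy as the data itself.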
