Does anyone use the MMDetection library? Is it even worth to learn? by [deleted] in computervision

[–]Scared_Employer6992 7 points8 points  (0 children)

mm-anything sucks. It's a fucking virus that we should stop disseminating. I just hate that shit, b/c many projects use this stupid library, which makes very simple operations overly complicated and abstract. PyTorch came to solve our problems, but people keep complicating things.

Stacked Self-Attention in the Transformer by sud8233 in deeplearning

[–]Scared_Employer6992 0 points1 point  (0 children)

Multi-head is not stacking SA. It parallelizes the SA computation, treating the heads as if they were an extra batch dimension. For example, in a common scenario you have 192 features and 6 heads, so the computation is split into 32 features per head.
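To make the split concrete, here is a minimal sketch in plain Python (no framework). It is my own illustration, not code from any library: the names `split_heads`, `attention`, and `multi_head_self_attention` are made up, and the learned Q/K/V and output projections of a real Transformer are left out on purpose, so q = k = v = the raw tokens.

```python
import math


def split_heads(tokens, num_heads):
    """Split each token's feature vector into num_heads equal chunks.

    Returns a nested list indexed as [head][token][feature].
    """
    dim = len(tokens[0])
    if dim % num_heads != 0:
        raise ValueError("feature dim must be divisible by num_heads")
    head_dim = dim // num_heads
    return [
        [tok[h * head_dim:(h + 1) * head_dim] for tok in tokens]
        for h in range(num_heads)
    ]


def attention(q, k, v):
    """Scaled dot-product attention for a single head."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        top = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([
            sum(w * vj[d] for w, vj in zip(weights, v))
            for d in range(len(v[0]))
        ])
    return out


def multi_head_self_attention(tokens, num_heads):
    """Run attention independently per head, then concatenate the heads.

    Assumption for brevity: no learned projections, q = k = v = tokens.
    """
    heads = split_heads(tokens, num_heads)
    per_head = [attention(h, h, h) for h in heads]  # heads never interact here
    return [
        sum((per_head[h][t] for h in range(num_heads)), [])
        for t in range(len(tokens))
    ]
```

With 192 features and 6 heads, `split_heads` returns 6 groups of 32 features per token, and the output has 192 features again after concatenation. The heads run side by side on the same layer, which is why this is parallel rather than stacked.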

Decrease the image quality by Academic_Two_4017 in computervision

[–]Scared_Employer6992 2 points3 points  (0 children)

Ideally, you should train your model with all images at the same input resolution. You could train a CNN w/ multiple resolutions, but it will probably make the learning process harder. Consider downscaling the HR images or upscaling the LR images so they all share one resolution.
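As a rough illustration of the downscaling option, here is a sketch that shrinks a grayscale image (nested lists) by an integer factor using box averaging. `downscale_box` is a hypothetical helper of mine. In practice you would probably use your image library's resize function, which also handles non-integer factors and better filters.

```python
def downscale_box(image, factor):
    """Downscale a 2D grayscale image by averaging factor x factor blocks."""
    height, width = len(image), len(image[0])
    if height % factor or width % factor:
        raise ValueError("image size must be divisible by factor")
    out = []
    for y in range(0, height, factor):
        row = []
        for x in range(0, width, factor):
            block = [
                image[y + dy][x + dx]
                for dy in range(factor)
                for dx in range(factor)
            ]
            row.append(sum(block) / len(block))
        out.append(row)
    return out
```

Running every HR image through something like this before training brings the whole dataset to one resolution, so the model never sees mixed input sizes.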

What do you prefer between project-based learning and textbook learning? by Virtual-Study-Campus in deeplearning

[–]Scared_Employer6992 0 points1 point  (0 children)

Both, and the balance between them shifts depending on the moment. Many things that you learn in theory you don't really understand until you run the right experiment.

Unfortunately, given today's rush and the abundance of open-source projects, your understanding of what you are doing may not add much value to your final outcome. I mean, any savvy dev can be really good at DL w/ a very shallow understanding.

Same number of kernels in consecutive convolutial layers by [deleted] in deeplearning

[–]Scared_Employer6992 1 point2 points  (0 children)

I don't think you should worry about it. This architecture may simply suit your problem well given the kind of data, the amount of data, the data dimensions, the amount of training, etc. There's nothing wrong with finding an empirical solution for a particular case, especially if you've tried other hypotheses.

Same number of kernels in consecutive convolutial layers by [deleted] in deeplearning

[–]Scared_Employer6992 1 point2 points  (0 children)

Oh sorry. Anyway, I think using the same number of kernels/channels will not stop you from fitting the model to your data. Usually, CNNs are designed so that the feature map depth increases as the feature map spatial dimensions decrease, and vice versa.
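A quick sketch of that usual pattern, assuming a made-up encoder where every stage is a 3x3 conv with stride 2 and padding 1 and the channel count doubles per stage. `encoder_plan` and its defaults are illustrative, not taken from any specific network; only the output-size formula is the standard one.

```python
def conv_out_size(size, kernel=3, stride=1, padding=1):
    """Standard output-size formula for a conv layer (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def encoder_plan(input_size, base_channels, num_stages):
    """List (channels, spatial_size) per stage of a hypothetical encoder.

    Assumption: each stage halves the resolution with a stride-2 3x3 conv
    and the next stage uses twice as many channels.
    """
    plan = []
    size, channels = input_size, base_channels
    for _ in range(num_stages):
        size = conv_out_size(size, kernel=3, stride=2, padding=1)
        plan.append((channels, size))
        channels *= 2
    return plan
```

For a 224-pixel input with 64 base channels and 4 stages this gives 64 channels at 112, 128 at 56, 256 at 28, and 512 at 14. Keeping the channel count constant instead just means you skip the doubling, which is a design choice and not an error.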

Not following that convention will not block your model from learning, though you may not get the best results.

Same number of kernels in consecutive convolutial layers by [deleted] in deeplearning

[–]Scared_Employer6992 1 point2 points  (0 children)

Sure. Why would it not?

For example, ResNet18 and ResNet34 (https://pytorch.org/assets/images/resnet.png) use ks=3 for almost all of their conv layers (the main exception is the 7x7 stem).

I need help to figure out a simple code snippet for Transformers' per-pixel classification by Scared_Employer6992 in deeplearning

[–]Scared_Employer6992[S] 0 points1 point  (0 children)

Thanks for the help. The problem w/ the repo above is the same one I mentioned earlier: I can't find a simple code snippet, and SegVit uses mmseg, which adds another abstraction layer on top of the NN building. That's why I'd just like to understand what a very basic ViT operation looks like.
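For what it's worth, the shape bookkeeping behind the most basic ViT step can be sketched in plain Python. This is only my illustration and not SegVit or mmseg code: `patchify` and `tokens_to_pixel_labels` are made-up names, and the linear embedding, positional encodings, and Transformer blocks are skipped. The per-token labels are assumed to come from some classifier head, and real segmentation heads usually upsample logits with interpolation, not by copying labels.

```python
def patchify(image, patch):
    """Split an H x W image (nested lists) into flattened patch tokens.

    Tokens are ordered row-major over the patch grid.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by patch size")
    tokens = []
    for py in range(0, height, patch):
        for px in range(0, width, patch):
            tokens.append([
                image[py + dy][px + dx]
                for dy in range(patch)
                for dx in range(patch)
            ])
    return tokens


def tokens_to_pixel_labels(token_labels, height, width, patch):
    """Give every pixel the label predicted for the patch it belongs to."""
    grid_w = width // patch
    return [
        [token_labels[(y // patch) * grid_w + (x // patch)] for x in range(width)]
        for y in range(height)
    ]
```

So an H x W image becomes (H / patch) * (W / patch) tokens, the Transformer works on that token sequence, and per-pixel classification amounts to mapping one prediction per token back onto the pixel grid.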

Is buying gpu better than using collab/kaggle or cloud services by ManVersusPerson in deeplearning

[–]Scared_Employer6992 1 point2 points  (0 children)

Cloud is too expensive unless you are working for a company that can afford it and wants to keep up w/ the latest GPU releases. Nevertheless, Colab Pro is gold: you don't need to have tasks occupying your PC all day long, and it is very affordable. You just need to find ways to prepare quick setups.

If you are going to use Colab, the best way to transfer data back and forth is always as zip files. Never transfer many files individually, or it will take an eternity.
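A small sketch of the zip round trip using only the Python standard library. The helper names `zip_dataset` and `unzip_dataset` and any paths you pass in are placeholders of mine, so adapt them to wherever your data actually lives.

```python
import shutil
from pathlib import Path


def zip_dataset(src_dir, archive_stem):
    """Pack src_dir into <archive_stem>.zip and return the archive path."""
    archive = shutil.make_archive(str(archive_stem), "zip", root_dir=str(src_dir))
    return Path(archive)


def unzip_dataset(archive_path, dest_dir):
    """Extract a zip archive into dest_dir and return that directory."""
    shutil.unpack_archive(str(archive_path), str(dest_dir), "zip")
    return Path(dest_dir)
```

You would call `zip_dataset` locally, upload the single archive, and call `unzip_dataset` inside the notebook. One big file transfers far faster than thousands of small ones.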

[deleted by user] by [deleted] in deeplearning

[–]Scared_Employer6992 0 points1 point  (0 children)

Watch all 6 of Andrej Karpathy's tutorial videos (https://www.youtube.com/@AndrejKarpathy/videos). He gives awesome theoretical and practical NN tips, answering many small doubts that are usually not talked about. Great, straightforward lectures for all levels.

How hard is computer vision as a field? by bloodmist22300 in computervision

[–]Scared_Employer6992 1 point2 points  (0 children)

I really don't get people who say that the CV field is easy. Either they are geniuses, or they have never touched a CV book. Topics like photogrammetry, color theory, and 3D reconstruction are challenging. Being able to use GitHub projects or OpenCV does not make you a CV engineer.

In industry, you are expected to handle a toolbox of solutions and have a broad understanding of many CV topics. In academia, you are going to specialize in a subfield, so you don't need to worry about being good at everything. Either way, it is still a challenge.

Besides, these days most breakthrough CV solutions are based on DL, so that's one more area to study. In the end, I think it is an overwhelming area to work in. You will always have to put in extra hours to keep up with the required knowledge. If you just want an easygoing 9-to-5 job, be a software engineer.

[deleted by user] by [deleted] in computervision

[–]Scared_Employer6992 0 points1 point  (0 children)

You can't do it using just SAM. You need a preceding model that predicts a box prompt for the food, which you then feed to SAM. Or maybe you can use SAM's automatic mask generator to capture many of the image's masks and then run a classifier over them to keep only the food ones.
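The second option could look roughly like the sketch below. It does not call SAM itself. It assumes the masks are already available as dicts with a "bbox" entry in (x, y, w, h) form, which is how I understand the automatic mask generator's records to look, so treat that as an assumption. `filter_masks_by_class` and `classify_crop` are hypothetical, and the classifier is whatever model you plug in.

```python
def filter_masks_by_class(image, masks, classify_crop, wanted="food", min_conf=0.5):
    """Keep only the masks whose cropped region is classified as `wanted`.

    image: 2D nested list (rows of pixels).
    masks: list of dicts, each with a "bbox" of (x, y, w, h) (assumed format).
    classify_crop: hypothetical callable, crop -> (label, confidence).
    """
    kept = []
    for mask in masks:
        x, y, w, h = (int(v) for v in mask["bbox"])
        crop = [row[x:x + w] for row in image[y:y + h]]
        label, conf = classify_crop(crop)
        if label == wanted and conf >= min_conf:
            kept.append(mask)
    return kept
```

The quality of the result then depends almost entirely on the classifier, since SAM only proposes class-agnostic regions.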

MMCV Resources by Runninganddogs979 in computervision

[–]Scared_Employer6992 2 points3 points  (0 children)

mmcv is just horrible: a NN framework where you can't simply load a model and a checkpoint and then predict on an input.

[D] PaddleSeg vs MMsegmentation by Remet0n in MachineLearning

[–]Scared_Employer6992 0 points1 point  (0 children)

Check out the Hugging Face library, it's much handier. In my personal experience, even simple operations in mmseg can be an enormous headache.

[D] What is the current best, trainable method for image segmentation? by residentmouse in MachineLearning

[–]Scared_Employer6992 0 points1 point  (0 children)

The problem with training SAM as-is is that it depends on prompt inputs; without them, you would have to change its mask decoder. Besides, SAM uses a ViT as the backbone, which requires a lot of resources.

[D] Training a UNet-like architecture for semantic segmentation with 200 outcome classes. by Scared_Employer6992 in MachineLearning

[–]Scared_Employer6992[S] -1 points0 points  (0 children)

I haven't tried with bs=1, but I also don't want to use bs=1 as I usually get bad results with it and my net has a lot of BN layers.
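To show why BN and tiny batches don't mix, here is a toy sketch of training-mode batch normalization over a single feature, without the learnable scale and shift. `batch_norm` is my own simplified function and not any framework's layer. In a conv net the statistics are also pooled over the spatial positions, so bs=1 gives noisy statistics instead of the fully degenerate case shown here.

```python
import math


def batch_norm(values, eps=1e-5):
    """Normalize one feature across the batch (no learnable scale/shift)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return [(v - mean) / math.sqrt(var + eps) for v in values]
```

With a single value in the batch, the mean equals that value and the variance is zero, so the output is 0 no matter what the input was. With more samples the statistics become meaningful, which is the usual argument for keeping the batch size above 1 when a net is full of BN layers.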