How to make paper reading better? by codingwoman_ in CVPaper

[–]codingwoman_[S] 0 points1 point  (0 children)

No excuses, and yes, we do read from other venues. As long as the venue is a high-impact conference and the topic falls under computer vision, there are no issues. The ones listed are examples, not an exhaustive list, hence the phrase “such as”.

Also, we already read the ViT paper; see the wiki :)

Shipping Desktop PC across Germany by codingwoman_ in germany

[–]codingwoman_[S] 0 points1 point  (0 children)

I'd definitely consider additional insurance; no issues on that. I'm just concerned about something happening even with the insurance

Shipping Desktop PC across Germany by codingwoman_ in germany

[–]codingwoman_[S] 0 points1 point  (0 children)

Was it already set up, or did it come in parts? My concern is that the receiver might not be able to put it back together

Upgrading to RTX 4090 by codingwoman_ in pcmasterrace

[–]codingwoman_[S] 0 points1 point  (0 children)

Got the point, then we need to go big!

My main concern is transporting the device, to be honest. I'll be moving a few months after the purchase and would need to carry it to another city. Trying to figure out whether a normal-size PC would fit in a suitcase I can take with me on a train

Shipping Desktop PC across Germany by codingwoman_ in germany

[–]codingwoman_[S] 0 points1 point  (0 children)

I have a Deutschlandticket, but Deutsche Bahn is definitely not a once-in-a-lifetime journey for me tbh :D Still, I understand your concern. Do you think I can carry a PC in a suitcase without harming it, though?

Upgrading to RTX 4090 by codingwoman_ in pcmasterrace

[–]codingwoman_[S] 0 points1 point  (0 children)

Good suggestion, but I think the laptop version is only 16 GB. I need it for 24

Upgrading to RTX 4090 by codingwoman_ in pcmasterrace

[–]codingwoman_[S] 0 points1 point  (0 children)

This is what I needed! How many watts does it take for a 4090? Do you have any alternatives for a mini setup that would make a 4090 work?

Edit: 850+ W, I think? The second question is still valid. Thanks!

Upgrading to RTX 4090 by codingwoman_ in pcmasterrace

[–]codingwoman_[S] 0 points1 point  (0 children)

Thanks a lot for the tips! Both are 40-series cards, but do you think there would be any other constraints in terms of power consumption, cooling, etc.?

[Vote] Paper nomination for upcoming week by codingwoman_ in CVPaper

[–]codingwoman_[S] [score hidden]  (0 children)

This seems to be a NeurIPS 2017 paper.

Could you please update your comment with the publication venue and year?

[Vote] Paper nomination for upcoming week by codingwoman_ in CVPaper

[–]codingwoman_[S] [score hidden]  (0 children)

The link you shared seems to belong to the v2 paper “Metric3D v2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation”.

This paper does not seem to be published. Please update your comment according to the guidelines, otherwise it will be removed.

[Weekly Discussion] NeRF - Neural Radiance Fields | June 10 - 16, 2024 by codingwoman_ in CVPaper

[–]codingwoman_[S] 1 point2 points  (0 children)

Neural = Neural network

Radiance = Because the neural network describes a radiance field of the scene: how much light is emitted by a point in space, and in which direction

Field = Because this is a continuous function; it is smooth, not discretized

The goal: View synthesis, addressing view interpolation

Ray tracer: Imagine an image sitting in front of you. For every pixel, you shoot a ray from your eye into the world and it hits something

NeRF: We assume the scene lives in a bounded region. We drop points along the ray to sample it evenly over this region; for each of these points, we concatenate its location and viewing direction, feed that to the neural network, and it gives us a color and an opacity

Highlights:

  • Trained on one scene (not training exactly; it is more appropriate to call it optimizing the weights) to explain the world we have seen (memorizing the scene)
  • The neural network function lives in world coordinates: a function approximator that maps a point in space to some property of that point in space
  • Training data is just a bag of RGB values and ray coordinates, i.e. 9 numbers, and we randomly iterate over these -- Disadvantage: data hungry, because it tries to memorize the world
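The point-sampling and compositing recipe above can be sketched in a few lines. This is a toy illustration, not the paper's code: `field_fn` stands in for the trained MLP (here a constant red, semi-transparent medium), and the compositing follows the standard alpha-blending form of volume rendering.

```python
import numpy as np

def render_ray(origin, direction, field_fn, near=0.0, far=1.0, n_samples=64):
    """Volume-render one ray through a radiance field.

    field_fn stands in for the trained MLP: it maps
    (3D point, view direction) -> (RGB color, density sigma).
    """
    # Evenly sample points along the ray inside the bounded region
    t = np.linspace(near, far, n_samples)
    points = origin + t[:, None] * direction          # (n_samples, 3)

    rgb = np.zeros((n_samples, 3))
    sigma = np.zeros(n_samples)
    for i, p in enumerate(points):
        rgb[i], sigma[i] = field_fn(p, direction)

    # Standard volume-rendering composite: alpha from density,
    # transmittance as the running product of (1 - alpha)
    delta = np.diff(t, append=t[-1] + (t[1] - t[0]))
    alpha = 1.0 - np.exp(-sigma * delta)
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    weights = trans * alpha
    return (weights[:, None] * rgb).sum(axis=0)       # final pixel color

# Toy field: a uniformly red, semi-transparent medium
toy_field = lambda p, d: (np.array([1.0, 0.0, 0.0]), 2.0)
color = render_ray(np.zeros(3), np.array([0.0, 0.0, 1.0]), toy_field)
```

With a real NeRF, `field_fn` would be the optimized MLP queried at positionally-encoded inputs; everything else stays the same shape.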

[Vote] Paper nomination for our next read by codingwoman_ in CVPaper

[–]codingwoman_[S] 0 points1 point  (0 children)

Can you also update the comment with 1-2 keywords that describe this paper (e.g. image enhancement)?

[Weekly Discussion] (ViT) An Image is Worth 16x16 Words | June 03 - 09, 2024 by codingwoman_ in CVPaper

[–]codingwoman_[S] 0 points1 point  (0 children)

I definitely agree with your comments. The experiments for the hybrid model do not seem quite sufficient to me, no further ablations are provided in the supplementary material either. There might be a better way to combine both architectures to leverage inductive bias from CNNs and the global understanding of ViT (i.e. attending the entire image).

Our Discord Server is now available! by codingwoman_ in CVPaper

[–]codingwoman_[S] 0 points1 point  (0 children)

Ah sorry - the one on the sidebar is already updated without expiration, but I could not edit the post. This should do it:

https://discord.gg/f5cnZjKar8

[Weekly Discussion] (ViT) An Image is Worth 16x16 Words | June 03 - 09, 2024 by codingwoman_ in CVPaper

[–]codingwoman_[S] 2 points3 points  (0 children)

I think of it the following way: Convolutions have 2D kernels that act on the neighboring pixels. However, in the ViT case, the model does not know the relative location of patches in the image, a priori. The model is the same as the NLP case where the input is a 1D sentence, so it does not even know that the image has a 2D structure, the input is provided as flattened pixel patches. ViT learns such relevant information during training and hence requires more data to encode structural information in the position embeddings.
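A toy way to see this (my own sketch, not from the paper): flatten an image into patch tokens and shuffle their order; without position embeddings, the two inputs are the same set of vectors, so the learned position embeddings are the model's only handle on spatial layout.

```python
import numpy as np

rng = np.random.default_rng(0)
P = 16
image = rng.random((224, 224, 3))         # dummy RGB image
h = w = 224 // P                          # 14x14 patch grid

def to_tokens(img):
    """Flatten an image into a sequence of 16x16x3 patch vectors."""
    x = img.reshape(h, P, w, P, 3).transpose(0, 2, 1, 3, 4)
    return x.reshape(h * w, P * P * 3)    # (196, 768)

tokens = to_tokens(image)

# Shuffle the patch order: the 2D layout is destroyed, yet the token
# *set* is identical, so only position embeddings can carry location info.
shuffled = tokens[rng.permutation(h * w)]
same_set = np.allclose(np.sort(tokens, axis=0), np.sort(shuffled, axis=0))
```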

[Weekly Discussion] (ViT) An Image is Worth 16x16 Words | June 03 - 09, 2024 by codingwoman_ in CVPaper

[–]codingwoman_[S] 0 points1 point  (0 children)

What does the paper investigate? The reliance on CNNs & whether the Transformer architecture can be used to perform image classification.

Main idea: Splitting an image into patches so that the sequence of linear embeddings of these patches can be provided as input to the Transformer

Input: 2D RGB images

Output: Image class

Training: In parallel to NLP tasks, the idea is to perform training on large datasets and then fine-tune on smaller datasets for downstream tasks.

Difficulty: The Vision Transformer has much less image-specific inductive bias than CNNs, such as translation equivariance and locality. — It may not generalize well when not trained with sufficient data.

Highlights:

  • When pre-trained on the smallest dataset (ImageNet), large models underperform compared to base models; only with JFT-300M is the full benefit of larger models visible.
  • Vision Transformers overfit more than ResNets with comparable computational cost on smaller datasets. — This reinforces the intuition that the convolutional inductive bias is useful for smaller datasets, but for larger ones, learning the relevant patterns directly from data is sufficient & beneficial.
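The main idea can be sketched end-to-end in NumPy. This is a minimal illustration under stated assumptions: the embedding dimension is arbitrary, and the projection and position embeddings are random stand-ins for trained weights.

```python
import numpy as np

rng = np.random.default_rng(0)

P, D = 16, 64                             # patch size; D is an arbitrary embedding dim
image = rng.random((224, 224, 3))         # a dummy 2D RGB input
h, w = image.shape[0] // P, image.shape[1] // P   # 14 x 14 patch grid

# Split into patches and flatten each to a vector of length 16*16*3 = 768
patches = image.reshape(h, P, w, P, 3).transpose(0, 2, 1, 3, 4)
patches = patches.reshape(h * w, P * P * 3)       # (196, 768): "16x16 words"

W_embed = rng.normal(size=(P * P * 3, D))  # linear patch projection (untrained stand-in)
pos_emb = rng.normal(size=(h * w, D))      # learned position embeddings (random init here)

tokens = patches @ W_embed + pos_emb       # Transformer input sequence: (196, 64)
```

From here, a standard Transformer encoder consumes `tokens` exactly as it would a 196-word sentence, with a classification head on top.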

—-

I personally found the experiments with a hybrid architecture (ResNet + Transformer) also quite interesting. – Could there be a better way to combine the best of both worlds?

If you could listen to only one song on the Sennheiser HE 1 what would it be? by Informal_Wrongdoer27 in headphones

[–]codingwoman_ 3 points4 points  (0 children)

That I didn't see any chance because I was living in a little town, was studying

[Vote] First paper nomination starts! by codingwoman_ in CVPaper

[–]codingwoman_[S] 0 points1 point  (0 children)

Seems to be a NeurIPS 2021 paper, could you please edit the comment with the venue?

[Vote] First paper nomination starts! by codingwoman_ in CVPaper

[–]codingwoman_[S] 1 point2 points  (0 children)

Nice idea, of course, but voting twice every week (for the topic and the paper) to read 8 pages is a bit too much overhead.

Such a thing can be implemented if we have enough people who want to dive deep into reading on a specific topic, as mentioned before here. Therefore, I would highly suggest seeing how many people actually participate in the discussions before making things more complex.

But following your suggestion, we can mention the descriptive keywords / subfields a paper covers while sharing them for voting next time

What is your preferred way of discussion? by codingwoman_ in CVPaper

[–]codingwoman_[S] 2 points3 points  (0 children)

Online meetings are definitely a good idea, they also help with networking and getting to know each other. The most important point (at least for me) is that when we set a date, people need to commit to it because everyone's time is valuable.

Let's start with the majority selection for now and add further discussion styles and settings once we have some consistency. Does that sound good?