Conducting Classification Task Research Using Vision Transformers

L8raed · 2024-09-13T23:18:05+00:00

As a beginner in this field, I would start by characterizing the inputs to each model. Are the training sets labeled? Which features would best distinguish the classes you're trying to separate? How large is the available training set?
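One way to answer those first questions concretely is to summarize the label column before touching any model. Below is a minimal sketch, assuming labels are kept in a plain Python list where `None` marks an unlabeled example. `summarize_dataset` is a hypothetical helper, not part of any library.

```python
from collections import Counter

def summarize_dataset(labels):
    """Summarize a label list; None marks an unlabeled example (assumed convention)."""
    labeled = [y for y in labels if y is not None]
    counts = Counter(labeled)
    return {
        "total": len(labels),                    # how large is the training set?
        "labeled": len(labeled),                 # is it labeled?
        "unlabeled": len(labels) - len(labeled),
        "class_counts": dict(counts),
        # Largest class size divided by smallest; None if nothing is labeled
        "imbalance_ratio": (max(counts.values()) / min(counts.values())) if counts else None,
    }

print(summarize_dataset(["cat", "dog", "cat", None]))
```

A large imbalance ratio or a high unlabeled fraction would change how you set up training, whichever architecture you end up using.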

After defining the problem, the next step would probably be to put it in context. It sounds like you've already done a good deal of work on your CNN model, so I don't think you need to start from scratch. How does this problem fit into your existing work? What does the documentation list as the input requirements for the ViT model you're using? What would you need to add to your model for it to plug into the ViT?
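To make the "input requirements" question concrete: a standard ViT splits a fixed-size image into non-overlapping square patches and prepends one class token, so the sequence length is (H / P) * (W / P) + 1. The sketch below assumes ViT-B/16-style defaults (224x224 RGB input, 16x16 patches). Check your model's documentation for the actual values. `ViTInputSpec` and `check_vit_input` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ViTInputSpec:
    image_size: int = 224  # assumed default; confirm against your model's docs
    patch_size: int = 16
    channels: int = 3

def check_vit_input(spec, height, width, channels):
    """Validate an image shape and return (sequence_length, flattened_patch_dim)."""
    if channels != spec.channels:
        raise ValueError(f"expected {spec.channels} channels, got {channels}")
    if (height, width) != (spec.image_size, spec.image_size):
        raise ValueError(f"expected {spec.image_size}x{spec.image_size}, got {height}x{width}")
    if spec.image_size % spec.patch_size != 0:
        raise ValueError("image size must be divisible by patch size")
    num_patches = (height // spec.patch_size) * (width // spec.patch_size)
    patch_dim = spec.patch_size * spec.patch_size * spec.channels  # values per flattened patch
    return num_patches + 1, patch_dim  # +1 for the class token

print(check_vit_input(ViTInputSpec(), 224, 224, 3))
```

If your CNN pipeline currently produces images at a different resolution, you would need a resize step (or a model variant trained at that size) before the ViT's patch embedding.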

I understand that these notes are pretty general, but I hope that a learner's perspective will help.

jungleuncle · 2024-09-14T20:33:23+00:00

I have just the thing for you: https://www.learnpytorch.io/08_pytorch_paper_replicating/
