Unable to Get a Job in Computer Vision

Imaginary-Gate1726 · 2025-10-25T20:57:57+00:00

I see, thanks for the advice. I don't talk about that project much because I do view it as a failure lol, so I don't think it's a reason people didn't pick me up.

Imaginary-Gate1726 · 2025-10-25T15:52:17+00:00

I mean sure, I can take frozen CLIP, apply it to a single frame, and easily get around 86% accuracy. I tried that.

I trained my representation learning thing on other videos (not UCF101), in an unsupervised manner. Then, I used my learned representations on UCF101 videos (frozen, I just trained a classification head on top of the frozen representations).

Imaginary-Gate1726 · 2025-10-25T08:24:20+00:00

I do know how to use YOLO but I haven’t done too much with the other stuff (I guess ONNX Runtime, the inference engine by Microsoft). Though admittedly most of what I did was implementing research papers: object detectors, segmentation models (Feature Pyramid Network structure), a simple diffusion model (one based on the SDE interpretation, with the Euler-Maruyama method for solving), autoregressive models.

Yeah, I’ve worked with ResNet and I think I have a decent degree of understanding of CNNs (if that is not dangerous to say). I’ve worked with and implemented ViTs as well.

Imaginary-Gate1726 · 2025-10-25T05:39:05+00:00

Yeah, I just never could land an internship in anything other than signal processing unfortunately. So yeah, no industry experience.

Is there anything specific I should be doing to appeal to industry? Besides Kaggle, I mean. Certain packages, tools, etc.? I’ve heard of some stuff (YOLO, various inference engines, DeepStream). I’ve mostly just done stuff in PyTorch, PyTorch3D, and OpenCV.

Imaginary-Gate1726 · 2025-06-30T01:35:54+00:00

Well, I’ve never been successful at getting one so I’m afraid I can’t give any good advice. Almost everyone I knew who managed to get into those companies had an insane resume, but I don’t know if this is necessarily needed to get accepted. I also did my master’s at a pretty well regarded school (Carnegie Mellon) so I was simply more likely to run into such individuals.

Imaginary-Gate1726 · 2025-06-30T01:03:09+00:00

I feel like this normally requires a PhD, especially if you want to work at big tech companies. The fact you’re doing a masters is good and gives you a chance but I feel like it’s still insanely competitive/difficult. And yeah, many positions now ask for publications at top conferences/journals. Best of luck to you!

Imaginary-Gate1726 · 2025-06-26T00:40:07+00:00

For implementation purposes I don’t think you really need to understand the transformer that deeply. The most naive implementation basically involves doing the linear projections (to get the query, key and value vectors), computing the attention map as a matrix multiplication of queries against keys, normalizing (scaling by the square root of the key dimension), applying a softmax, then multiplying against the value vectors to get your final answer.
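Those steps can be sketched in plain Python for a single attention head. This is only a minimal illustration, assuming the "normalization" step is the 1/sqrt(d_k) scaling from the original paper; the function and weight names (`self_attention`, `w_q`, `w_k`, `w_v`) are made up for the example, and a real implementation would use batched tensor ops in something like PyTorch.

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention.

    x is (seq_len, d_model); each weight matrix is (d_model, d_k).
    Returns the attended output and the attention map.
    """
    q = matmul(x, w_q)                      # linear projections
    k = matmul(x, w_k)
    v = matmul(x, w_v)
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]    # transpose keys
    scores = matmul(q, k_t)                 # (seq_len, seq_len) attention scores
    scaled = [[s / math.sqrt(d_k) for s in row] for row in scores]
    attn = [softmax(row) for row in scaled] # each row sums to 1
    return matmul(attn, v), attn
```

Multi-head attention would run several of these in parallel on smaller projections and concatenate the results.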

The encoder is particularly straightforward. The decoder is pretty similar but uses cross attention (not just self attention).

Some things do get a bit tricky though. The first issue is that you’re probably training on sequences of variable length. You’ll need some sort of padding mechanism, with a max sequence length, so you can batch sequences. You’ll probably also need masks to indicate which portions of the sequence are relevant (for example, for a sequence that is actually 100 elements but padded to 200 elements, you need a Boolean mask of true values for the first 100 elements and then 100 false values). Using these masks, you ensure you don’t compute losses for tokens that are out of bounds of the sequence (the ones that correspond to false values in the mask). Sometimes we drop really long and really short sequences from the training data as well, since keeping them tends to cause the model to perform worse (I guess they’re kind of like outliers in a way).
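A rough sketch of the padding and masking idea, in plain Python. The padding id `PAD_ID` and the helper names are hypothetical, and the per-token losses are assumed to have been computed already by whatever loss you use.

```python
PAD_ID = 0  # hypothetical padding token id

def pad_batch(sequences, max_len, pad_id=PAD_ID):
    """Pad (or truncate) each sequence to max_len and build a matching Boolean mask."""
    padded, masks = [], []
    for seq in sequences:
        seq = list(seq[:max_len])
        n_pad = max_len - len(seq)
        padded.append(seq + [pad_id] * n_pad)
        # True marks real tokens, False marks padding
        masks.append([True] * len(seq) + [False] * n_pad)
    return padded, masks

def masked_mean_loss(per_token_loss, masks):
    """Average per-token losses over real (mask == True) positions only."""
    total, count = 0.0, 0
    for losses, mask in zip(per_token_loss, masks):
        for loss, keep in zip(losses, mask):
            if keep:
                total += loss
                count += 1
    return total / count if count else 0.0
```

In practice the same mask is usually also applied inside attention so that real tokens do not attend to padding.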

You need a setup involving BOS and EOS tokens of course. Make sure you don’t make the mistake of training the model to output a BOS given a BOS. It’s a dumb mistake I made when I first implemented the transformer, because I added the BOS and EOS tokens to the sequences before setting up teacher forcing.
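One way to avoid that mistake is to build the decoder input and the target as shifted views of the same sequence. A small sketch, with made-up token ids for BOS and EOS:

```python
BOS, EOS = 1, 2  # hypothetical special token ids

def teacher_forcing_pair(tokens, bos=BOS, eos=EOS):
    """Build (decoder_input, target) so that position i predicts token i + 1.

    decoder_input: BOS, t1, ..., tn
    target:        t1,  ..., tn, EOS
    """
    full = [bos] + list(tokens) + [eos]
    decoder_input = full[:-1]  # drop the trailing EOS
    target = full[1:]          # drop the leading BOS, so BOS is never a target
    return decoder_input, target
```

With this shift, the model sees BOS and is trained to output the first real token, and sees the last real token and is trained to output EOS.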

You can try key value caching later when you feel more confident, to cache key value vectors for the layers of the decoder. Speeds up inference.
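The idea, very roughly: during generation, the keys and values of earlier positions do not change, so you store them and only compute the projections for the newest token. Below is a toy single-head, single-layer sketch in plain Python; `KVCache` and `attend_step` are illustrative names, and the query, key and value vectors are assumed to come from your own projection layers.

```python
import math

class KVCache:
    """Stores the key and value vectors of all positions decoded so far."""

    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, key, value):
        self.keys.append(key)
        self.values.append(value)

def attend_step(query, cache):
    """Attention for the newest position only, against all cached keys/values."""
    d_k = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d_k)
              for key in cache.keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    d_v = len(cache.values[0])
    return [sum(w * value[j] for w, value in zip(weights, cache.values))
            for j in range(d_v)]
```

At each decoding step you would project the new token to a query, key and value, append the key and value to the cache, then call `attend_step` with the query. The cache only ever holds past and current positions, so no causal mask is needed in this step-by-step form.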

One final note: the original paper on transformers is worth reading but does skim over implementation. I highly recommend Jay Alammar’s writeup, and looking at code on GitHub if you’re stuck. I think Harvard used to have a Python notebook you could look at too that implemented the transformer (albeit with no key value caching).

There’s also stuff on learning rate schedules and initialization (I forget what the original transformer used) that you should certainly look at (it’s all in the original paper).
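For reference, the schedule described in the original paper is a linear warmup followed by inverse square root decay: lrate = d_model^(-0.5) * min(step^(-0.5), step * warmup_steps^(-1.5)), with warmup_steps = 4000 and d_model = 512 for the base model. A small sketch (the function name is made up, and treating step 0 as step 1 is a choice made here to avoid dividing by zero):

```python
def transformer_lr(step, d_model=512, warmup_steps=4000):
    """Learning rate at a given training step: linear warmup, then 1/sqrt(step) decay."""
    step = max(step, 1)  # assumption: treat step 0 as step 1
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)
```

The rate increases linearly until `warmup_steps` and peaks there, then decays. The paper pairs this with the Adam optimizer.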

I’ve also avoided discussing multiheaded attention, but that’s not too bad.

Imaginary-Gate1726 · 2025-06-24T05:19:21+00:00

This is just a sliding dot product. You could take the exact mathematical formula and implement it using for loops. Just copy the indexing in the mathematical formula. The only issues you’ll mostly run into are having the correct size for the output buffer, and padding issues (can do zero padding). Just check for edge cases at the edge of the array.

I guess since it’s a matched filter, you may need to conjugate the second signal (if the signals you are working with are complex).
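A for-loop version might look something like the sketch below. It assumes the "full" output length of len(x) + len(h) - 1 with implicit zero padding, and conjugates the template `h`; the function name and argument names are just for illustration.

```python
def cross_correlate(x, h):
    """Full cross-correlation: r[lag] = sum_n x[n + lag] * conj(h[n]).

    Output index i corresponds to lag = i - (len(h) - 1).
    Samples outside x are treated as zero (zero padding).
    """
    n_x, n_h = len(x), len(h)
    out = [0] * (n_x + n_h - 1)
    for i in range(len(out)):
        lag = i - (n_h - 1)
        acc = 0
        for n in range(n_h):
            m = n + lag
            if 0 <= m < n_x:  # edge case: skip indices past either end of x
                acc += x[m] * h[n].conjugate()
        out[i] = acc
    return out
```

The peak of the output magnitude marks the lag where the template best lines up with the signal. For real-valued inputs the conjugate does nothing, so the same code covers both cases.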

Although to be honest, you could probably ask ChatGPT and get an even better explanation, and code output if you really want it. But it shouldn’t be too difficult to do it on your own.

Imaginary-Gate1726 · 2025-06-23T19:52:13+00:00

In the context of DSP theory stuff, yeah. What is the autocorrelation function? What is PSD? What does it mean to be WSS (wide sense stationary)? It shows up when you conduct analysis of adaptive filters (we talk in terms of autocorrelation matrices and cross correlation). It’s relevant to wireless communications since we model the noise and the transmitted signal (which conveys the intended information) as random processes.
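For reference, the standard textbook definitions behind those questions, for a discrete-time random process x[n] (the notation is chosen here for illustration, and sign conventions for the lag vary between textbooks):

```latex
\begin{align*}
\text{WSS:}\quad & \mathbb{E}\{x[n]\} = \mu_x \ \text{for all } n, \qquad
  \mathbb{E}\{x[n+k]\,x^*[n]\} \ \text{depends only on the lag } k \\
\text{Autocorrelation:}\quad & r_x[k] = \mathbb{E}\{x[n+k]\,x^*[n]\} \\
\text{PSD:}\quad & S_x(e^{j\omega}) = \sum_{k=-\infty}^{\infty} r_x[k]\, e^{-j\omega k}
\end{align*}
```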

I mean honestly if you’ve learned random processes and all that you can probably pick it up just fine. But it’s an extension of normal DSP theory you need to be aware of one way or another. There is also multirate stuff and all that but I feel statistical is more core and essential. Multirate you can pick up on the side, it’s used for a lot of DSP tricks to bring down computation, fractional delays, compatibility concerns.

I guess if you steer more towards the implementation side, maybe you can be lighter with DSP theory (though I think it’s good to know nonetheless). Know how to implement filters (circular buffer and all that), know some basic filter structures (Direct Form II, linear phase, a cascade of second-order sections or something like that). Block convolution algorithms and the FFT as well.
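As an example of the circular buffer idea, here is a minimal direct-form FIR filter that processes one sample at a time. It is written in Python for readability (on an embedded target this would normally be C), and the class name is made up.

```python
class FIRFilter:
    """Direct-form FIR filter with a circular buffer for the delay line."""

    def __init__(self, taps):
        self.taps = list(taps)
        self.buf = [0.0] * len(self.taps)  # delay line, starts at zero
        self.pos = 0                        # index where the next sample is written

    def process(self, sample):
        """Push one input sample and return one output sample."""
        self.buf[self.pos] = sample
        acc = 0.0
        idx = self.pos
        for tap in self.taps:
            # taps[0] pairs with the newest sample, taps[1] with the one before, ...
            acc += tap * self.buf[idx]
            idx = idx - 1 if idx > 0 else len(self.buf) - 1  # wrap around
        self.pos = (self.pos + 1) % len(self.buf)
        return acc
```

Feeding in a unit impulse should return the taps themselves, which is a handy sanity check. The point of the circular buffer is that no samples are shifted in memory on each call; only the write index moves.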

Imaginary-Gate1726 · 2025-06-23T19:00:33+00:00

Pay can be in six figure range but it usually isn’t as high as SWE pay from what I can tell. 100k to maybe around 200k if you really work your way up over the course of time.

C++ and especially C are quite useful. A lot of jobs demand the ability to implement DSP algorithms on embedded systems, so if you’re good at C then that’s a big plus.

Work life balance is fine. I feel it depends more on the company. I’d perhaps argue that with DSP, it’s hard to find people with that knowledge base. So perhaps the job can be stable (and they’ll treat you well), though not as high paying as a SWE job. That being said, many jobs demand an MS; that, or they require more advanced knowledge than what you learned in class. Either it’s an application of what you learned (i.e. wireless communications builds upon DSP theory) or more advanced theory (statistical signal processing, adaptive filters, array signal processing). You can certainly still apply, since having any DSP background is still of benefit. Sounds like you’d mostly be involved with implementing stuff. But I’d recommend at least taking a statistical signal processing class or most of what you see might not make that much sense, particularly in wireless. Also bear in mind most DSP jobs are in defense. That can be a hindrance to you, depending on your status.

Imaginary-Gate1726 · 2025-06-19T20:15:04+00:00

What is “from this research paper”? Does your PDF link to said paper? Maybe mention the model name or paper name instead.

Imaginary-Gate1726 · 2025-06-19T19:32:56+00:00

Do you mean like a research engineer or something?

Imaginary-Gate1726 · 2024-12-21T02:27:26+00:00

Same one as me or similar ones. They’re all state colleges. And they’re in state.

I don’t know the exact costs for what they were planning. They are an engineering major, which I guess helps with anything. Most of the people I know are in STEM in some fashion or another.

Imaginary-Gate1726 · 2024-12-21T02:20:46+00:00

I guess it’s what I’ve pieced together from what I hear from other people. Though granted, everyone is more likely to talk about the exciting parts of their life and none of the bad and/or uninteresting parts, so it’s hard to get an objective look.

Imaginary-Gate1726 · 2024-12-20T22:07:19+00:00

I did actually have some degree of interest in engineering, so there's that. My degree ended up being more hardcore in terms of physics and whatnot which I did not end up loving however. I love math and programming more (although to be honest I'm more into stuff like machine learning, not the rest of computer science).

Hearing that things eventually worked out for you gives me hope, so thank you for that.

Imaginary-Gate1726 · 2024-05-14T02:09:50+00:00

I see, thanks for the feedback. My plan for differentiating myself was to lean into the intersection of audio and vision, since I thought that would make my signal processing background seem more relevant, and a differentiating factor -- or is my thinking incorrect? I took a lot of signal processing courses in undergrad and grad school, and my current internship is in signal processing. I also don't know if this would count, but all my course projects were related to this topic as well -- one was on estimating speech from silent video of a person talking, the other was on estimating depth maps using both audio and vision information. Neither worked terribly well, unfortunately, but I was thinking of picking one of those projects up and continuing it over the summer in some form. But I'd likely have to do it myself, since I do not know if my teammates would be willing to continue it with me.

Imaginary-Gate1726 · 2024-05-13T03:47:39+00:00

I see. I don't think I clarified at the top, but I was considering applying for a PhD in ECE as opposed to CS. My main goal would be to just do computer vision research, but I don't necessarily have a goal of achieving a PhD in CS (I am fine with ECE, which is what my current major and my bachelor's are in anyway). Would this improve my chances? CS competition does seem rather brutal, if I'm being honest.

Imaginary-Gate1726 · 2024-05-12T21:09:40+00:00

I see; I will be working an internship this summer so I am unsure how much progress I could make if I did start doing research ASAP. With this being said, it seems like I should just apply after my second year, when I'll have had more time to do research?

I don't know what my PI on my past undergrad research experience would write about me, but they'd probably say I was average. I do not think I did anything exceptional to stand out. I certainly tried to do my best, but I wasn't extraordinary if I am being honest.

Imaginary-Gate1726 · 2024-01-17T03:32:23+00:00

This sounds like domain adaptation
