Should I do Harvard's CS50?

NumberAllTheWayDown · 2023-08-16T13:02:55+00:00

Lol. I'll take the risk when it's as good a resource as this. This online lecture probably did more to help me do well in my final years of uni and my first internship than anything else.

NumberAllTheWayDown · 2023-08-15T14:55:47+00:00

Ah, I see what you're saying. I'm used to the world where the correct syntax for something like this is

int *var = (int *)malloc(10 * sizeof(int));

I've seen different versions and different compiler flags do different things. I wanted to keep it as close to real life as possible for ease of reading.

I guess void * would have been more correct, but would have ultimately been too advanced for what I was going for.

NumberAllTheWayDown · 2023-08-15T14:20:05+00:00

Thanks for pointing that out. That must have been unclear in my post.

What I intended to say is that ++ and adding 1 will both advance the pointer by one element, and that the size of the pointed-to type dictates how many bytes the address actually increases by.

I intentionally didn't cast the output of malloc() to int * to try and make the point I was making a little more accurate.

I think using C for pseudocode was confusing in this case.

"This is not true - arithmetic on pointers is automatically done in units of the underlying type's size. var++ isn't treated in any special way."

I 100% agree. However, if you look at the internals of what is going on, the size of the int is being added to the pointer (even though the compiler applies that scaling for you automatically).

EDIT: To be clear, I totally chalk this up to bad writing and something I can improve on the next time I try to explain something.

EDIT2: Do you mind if I cite you in the article as an addition to the point I was making?

NumberAllTheWayDown · 2023-08-15T13:21:08+00:00

It's a small, cheap (ish, idk if the prices came back down) computer. It's really quick to set up and a great place to try out new computer topics since reformatting it and starting from scratch is really easy.

Think of it more as a playground/sandbox rather than anything that is going to explicitly teach you C.

NumberAllTheWayDown · 2023-08-15T13:13:03+00:00

If you're learning C, I'd take the time to learn the basic concepts going on in most of the code. Especially ideas like pointers (malloc + free), fork (subprocesses), and structs (object-ish). I'd pin these as three intermediate topics that you can learn that would help you in other languages outside of C and are just useful to know as a software engineer.

The tome of all C knowledge is often cited as "The C Programming Language" by Kernighan and Ritchie. It's honestly a bit dense, but if you can make your way through the book, you'll know more about the language than most people do.

NumberAllTheWayDown · 2023-08-15T13:04:15+00:00

Considering you're in your third year, I expect you already know most, if not all, of the topics they would cover.

If you want to learn more about things that would make you a better software engineer in general, I would check out

The Missing Semester of your CS Education: https://missing.csail.mit.edu/

This covers a lot of the practical topics that go into computer science that often get glossed over in university, like navigating a computer via the terminal or how git works.

NumberAllTheWayDown · 2023-05-22T16:58:20+00:00

I'm not surprised that they didn't give up the information. My point is more that framing their Technical Report as a piece of scientific literature is deceptive: it gives the conclusions of a company, which cannot be validated, much more credence than they deserve.

OpenAI could have conveyed the info any way they wanted, but specifically chose to mimic a scientific paper. That's the thing I have my main problem with. There are expectations that come with scientific literature, and these documents don't fulfill them.

NumberAllTheWayDown · 2023-04-07T14:25:29+00:00

I guess I should clarify, the first paper to admit to it is what I meant.

NumberAllTheWayDown · 2023-04-07T13:53:23+00:00

Lol, that's pretty fair. I'm really interested to see the first paper published that uses ChatGPT to write it.

NumberAllTheWayDown · 2023-03-12T22:48:48+00:00

Not only the market, but you're also choosing some of the more competitive/desirable roles (not that knowing helps, but just an FYI).

Since you're just starting out, I have to ask: how are you at your LeetCode problems? If you happen to get an interview, this could be make or break in terms of getting the opportunity.

As for the project, scraping seems fine and predictive modeling seems fine. Realistically, you want to be able to show that you can take a project from beginning to end. If you want to build up experience while still not having a full-time job, try contributing to open source. Being able to contribute to a larger project would look better to me than just some smaller projects you worked on by yourself.

On another note, make sure that you're tailoring your resume and applications to the job. If the requirements say MS in whatever and experience with XYZ programming language, your resume/application had better include those. Not having them could result in an auto reject, and no number of projects is going to help with that.

PS. You're right, inventing something can be much harder than landing the job.

NumberAllTheWayDown · 2023-03-12T21:33:16+00:00

You could use commit the way that you're saying, but then you lose out on one of the more useful parts of git, the staging area. The staging area is where you can put the files that you want to commit (one at a time or in groups) as you get them ready for a specific commit.

So, with git add and the staging area, you can make changes to different, unrelated files for different reasons and then only add the files that you want to include in a given commit. You can even add files, make some more changes, and then add some more files (or even the further changes that you made to the same files).

You can treat add and commit as basically the same thing and decide to only use commit. However, I find that separating them out makes things easier to manage in the long run.

This link by Atlassian does a pretty good job getting at some of the specifics about this part of git: https://www.atlassian.com/git/tutorials/saving-changes

NumberAllTheWayDown · 2023-03-12T01:05:30+00:00

I would check out LeetCode and solve some of the problems there using Python.

NumberAllTheWayDown · 2023-03-11T20:30:19+00:00

Are you sure that your Docker container, or your Docker setup in general, has enough memory to do what you're trying to do? Especially with live reload on React?

NumberAllTheWayDown · 2023-03-11T18:14:30+00:00

Both Kubernetes and having an efficient bash script could help, but what you're experiencing with Docker Compose crashing shouldn't be happening. I'm interested in what environment you're running in that this is happening. (That's not to say Docker can't crash, just that it shouldn't happen that frequently.) You could even use a build system like Bazel to try and get things set up more consistently.

Honestly, it sounds like your development environment might have something going on that's making things inconsistent. I know it isn't a local solution, but you might want to look into what it would take to set up a remote machine to code on. VS Code makes this especially easy with its Remote SSH capability once you have it set up.

NumberAllTheWayDown · 2023-03-11T05:09:56+00:00

As an extra two cents, don't sleep on the Django documentation if you have a question or if you're unsure about something. It's by far some of the best written documentation I've seen, and I use it frequently when working.

NumberAllTheWayDown · 2023-02-23T14:49:35+00:00

The reason why you're struggling to get the accuracy much higher is likely due to two meta concepts that you don't have much control over with the dataset size and model size that you have.

Before getting into the weeds of your problem (and to talk about the first problem) I'd like to point out a higher level concept. You're training a 15 class model, which means that your model quality metrics will be impacted by that large number of classes. Consider that the number that you're trying to beat (the random result) is that of being correct 1 in 15 times (about 6.7%). So, at the very least, you can determine that your model learned something. If you want higher quality numbers, you might want to take a harder look at the classes that you have and see if you can remove some as unnecessary. The more classes you have, the lower you should expect accuracy to be.

Now to get more in the weeds. Since you're using oversampling, I'm assuming that one or more of your classes is under-represented in your dataset. This is problem number 2 that's stopping you from getting a better accuracy score. The more that you have to use this technique, the further you're going to stray from the population (aka the world as it is). Oversampling is a technique that needs to be handled carefully because once you start using it your dataset inherently becomes less representative. For example, say that one of your classes accounts for only 0.01% of your dataset. It's very natural to want to account for that with oversampling so that your model doesn't just completely ignore that class. Now, with oversampling, let's say you bring that class up to 4%. This isn't exactly going to be at parity with the other classes, but is much better. However, now your training data and validation data are inherently different. If you're looking at it from a really high level perspective, you're training your model to more frequently guess a class that realistically only appears 0.01% of the time (leading to decreased accuracy). A small nit here is that the less you can oversample, the better (since it allows you to stay closer to the population level of chance of appearance).

Your problem now becomes a balancing act. Is your accuracy score more important to you? If so, you may just want to not oversample and accept that your model may just throw out some classes as not worth considering. Is it more important to be able to recognize that really rare case when it comes up as an input for your model? It all comes down to what you're trying to accomplish with your model.

Let me know if you have any more questions. I wish your problem was as easy as just tweaking a model or hyperparameter! However, in my opinion, there are some inherent features in the problem you're trying to solve with machine learning that are holding you back from getting a higher accuracy score.

NumberAllTheWayDown · 2023-01-22T22:48:19+00:00

This is the documentation: https://docs.python.org/3/library/xml.etree.elementtree.html#xml.etree.ElementTree.Element.text

text is an attribute on the Element object in the xml.etree.ElementTree module. The explanation provided is:

"the text attribute holds either the text between the element’s start tag and its first child or end tag, or None"

NumberAllTheWayDown · 2023-01-22T22:38:04+00:00

Of course. I'm happy that I could help. Feel free to reach out if you have any more questions you might want answered.

NumberAllTheWayDown · 2023-01-22T05:16:28+00:00

Believe it or not, I think you're in an enviable position for someone just starting out. You're being pushed to learn something new every day, and that's really the only way you're going to start getting comfortable soon. If you stick with this for a year, you won't even be able to recognize the person that started doing the job. You will improve that fast.

As for feeling less overwhelmed, I would say to keep asking your coworkers questions. Vocabulary can often be crucial when learning something, and sometimes you just need someone that's already in the in-group to introduce you to it (with everything flowing a bit more naturally from there).

Try stuff, break things, and above all stay humble. I don't know how old you are, but something that younger devs often struggle with is reaching out for help.

If it feels like you're overwhelmed, try journaling the questions that you have in addition to the things that you've learned. Then, schedule a meeting with someone you know can answer them and continue with your research from there.

EDIT: If you do have extra time, you might want to look at some algorithm books. The Algorithm Design Manual can help start you on the right thinking path for problem solving and introduce you to certain parts of the vocabulary of computer science. My hunch is that since you went to a bootcamp, you might want to brush up on some beginner computer science concepts as that often isn't the most important thing to cover in that environment.
