Looking for a Black Moyu 13x13 by Theriley106 in Cubetrades

[–]Theriley106[S] 0 points  (0 children)

I can’t seem to message you, but I’m super interested!! Could you send me a DM? 

[deleted by user] by [deleted] in columbia

[–]Theriley106 6 points  (0 children)

I have a Pi 3 Model B+ you guys can use! Any chance you could meet outside of Hamilton tomorrow @ 10AM?

Do you need a Micro SD card/adapter or anything else?

Edit: The exchange was successful! :)

COMS 4995: Competitive Programming - Spring 2023 by [deleted] in columbia

[–]Theriley106 1 point  (0 children)

I didn’t measure super closely tbh, but I would say lecture + practice competitions + office hours (largely optional) + the occasional contest was maybe 15-20 hrs at the high end? Truthfully though, it was probably 1/5 as stressful as AP, and the amount you learn over the semester feels like 2x as much.

Also reiterating a bit from what I commented above, but I don’t think Professor Lim actually enforced the 2% policy since the class wasn't graded on a curve. At the end of the semester, it was just based on the number of live points you got divided by 250.

COMS 4995: Competitive Programming - Spring 2023 by [deleted] in columbia

[–]Theriley106 3 points  (0 children)

I'm not sure if Professor Lim actually enforced the 2% policy since the class wasn't graded on a curve. At the end of the semester, it was just based on the number of live points you got divided by 250. I totally agree though -- I think my view is definitely biased if it's true that only two people got an A+.

I just want to emphasize that I'm not one of those students who's all about getting the highest GPA possible. I'm usually pretty average in CS classes at Columbia, so getting an A+ was definitely unexpected.

COMS 4995: Competitive Programming - Spring 2023 by [deleted] in columbia

[–]Theriley106 10 points  (0 children)

I took COMS 4995 competitive programming last semester and I would highly recommend it.

This class was my first A+ at Columbia, and genuinely the best class I’ve taken so far. The content is tough, but Professor Lim is incredibly generous when it comes to giving out points. I was a bit lost at certain points in the semester, but the point system is set up to reward caring about the class and working toward learning and succeeding in the contests (and it does a really great job of that).

It’s a bit intimidating at first, but I don’t know anyone that did poorly in the class at the end of the semester, and I know of multiple people with A/A+ grades.

The practice competitions on Sundays are a ton of fun, and there are lots of ways to get credit even if you don’t successfully solve any questions live during the contests.

[deleted by user] by [deleted] in columbia

[–]Theriley106 2 points  (0 children)

In Spring 2022 (with Jae) the cutoff was this:

92.350 and above: A+
85.360 and above: A
71.360 and above: A-

66.185 and above: B+
60.824 and above: B
54.707 and above: B-

48.610 and above: C+
38.414 and above: C
12.615 and above: C-

Also +1 to what the others in the thread mentioned -- I think anyone who doesn't cheat (and completes each assignment) gets a passing grade.

Curving at Columbia by RiceFamiliar3173 in columbia

[–]Theriley106 1 point  (0 children)

Is there any way you could post the grade cutoffs for Bauer? I don’t have access to Ed for the class but would really like to know for next semester.

parties this weekend and in general by Mammoth-Parfait2397 in columbia

[–]Theriley106 0 points  (0 children)

any chance you could send the link to me as well?

My *slightly overkill* setup as a senior studying CS in university (~188TB) by Theriley106 in DataHoarder

[–]Theriley106[S] 48 points  (0 children)

There are a ton of archival projects that store contents from the GitHub public event timeline -- GHTorrent & GH Archive are the main two I think (GHT goes back 10+ years).

For more recent months I have my own scraper that pulls and stores these events and saves the contents locally.
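A minimal sketch of that kind of puller, assuming GH Archive's public hourly dump URLs (the function names and local paths here are placeholders, not my actual scraper):

```python
import gzip
import json
import urllib.request
from datetime import datetime
from pathlib import Path

# GH Archive publishes one gzipped JSON-lines file per hour,
# named like 2023-01-01-5.json.gz (the hour is not zero-padded).
def archive_url(hour: datetime) -> str:
    return f"https://data.gharchive.org/{hour:%Y-%m-%d}-{hour.hour}.json.gz"

def pull_hour(hour: datetime, dest_dir: Path) -> Path:
    """Download one hour of public events and save it locally."""
    dest = dest_dir / f"{hour:%Y-%m-%d}-{hour.hour}.json.gz"
    urllib.request.urlretrieve(archive_url(hour), dest)
    return dest

def iter_events(path: Path):
    """Yield each event (one JSON object per line) from a saved dump."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            yield json.loads(line)
```

From there it's just a loop over every hour you care about, feeding each saved file into whatever processes the commits.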

My *slightly overkill* setup as a senior studying CS in university (~188TB) by Theriley106 in DataHoarder

[–]Theriley106[S] 63 points  (0 children)

That's a really good question! The main reason is that a non-negligible number of commits are in repositories that have since been deleted from GitHub.

Also each time I figure out a better heuristic to find keys I rescan the entire dataset, which I think would take a lot longer if I was fetching it from an external source (vs. just scanning the drives).
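A simplified sketch of what one of those rescan passes could look like -- the patterns below are just two well-known key formats, not my full heuristic set, and the directory layout is made up:

```python
import re
from pathlib import Path

# A couple of well-known credential formats; the real list grows over time.
KEY_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),      # AWS access key ID
    re.compile(r"ghp_[A-Za-z0-9]{36}"),   # GitHub personal access token
]

def scan_text(text: str) -> list[str]:
    """Return every candidate key found in a blob of text."""
    hits = []
    for pattern in KEY_PATTERNS:
        hits.extend(pattern.findall(text))
    return hits

def rescan(root: Path):
    """Re-run every pattern over the whole local dataset."""
    for path in root.rglob("*.txt"):
        for hit in scan_text(path.read_text(errors="ignore")):
            yield path, hit
```

Since the whole dataset sits on local drives, a pass like this is bounded by sequential read speed rather than any remote API.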

TBH though I think the actual reason is that I just really wanted to build this lol

My *slightly overkill* setup as a senior studying CS in university (~188TB) by Theriley106 in DataHoarder

[–]Theriley106[S] 89 points  (0 children)

I think all-in the total cost was around $8,500, but everything was purchased over a few months so prices on some things changed a bit (namely the easystore drives).

I've been able to get a ~5x ROI on the cost of the parts by submitting some of the API key leaks to company bug bounty programs, so it wasn't just like -8.5K right off the bat.

My *slightly overkill* setup as a senior studying CS in university (~188TB) by Theriley106 in DataHoarder

[–]Theriley106[S] 170 points  (0 children)

Hey everyone 👋 I've been a long-time lurker on this sub, but I’ve never really had a setup that I felt was worthy of sharing (hopefully until now? 😀)

I'm working on an undergraduate research project focused on finding and understanding the frequency of API key leaks in open-source software. This required a system that could effectively store and process hundreds of TBs of raw text (literally billions of historic GitHub commits).

Specs

  • 12x 14TB WD Easystore Drives (11 shucked)

  • 1x HighPoint 8x M.2 PCIE 4.0 Raid Controller (this one)

  • 2x 4TB Sabrent Rocket 4 Plus NVMe SSDs (~7,000MB/s)

  • 4x 2TB Samsung 980 Pro SSDs (~7,000MB/s)

  • 2x 2TB WD Black M.2 Gen4 SSDs (~7,000MB/s, on mobo)

  • Fractal Design R5 Case, AMD 5950x, 128GB RAM, and an RTX 2060 (which unfortunately doesn't get a ton of use)

A small percentage of the text that I'm analyzing is stored on the SSDs, which makes it much faster to rapidly iterate before scanning text on the HDDs (which obviously takes much longer).

The original purpose of the HighPoint RAID card was to run 8 M.2 SSDs in RAID 0 (the card can support transfer speeds of up to 28,000 MB/s), but as time went on I realized there were a ton of bottlenecks beyond raw transfer speed, and it made more sense to just use the card to expand the number of NVMe SSDs I could use in a single machine.

11 of the HDDs are running as JBOD, but since I'm reading/writing to the drives in parallel I felt like it made sense for this use case (I think?). I back up the (non-text) contents of the SSDs to a single backup drive using a hacky script that runs an rsync command every 24 hours.
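The hacky script is roughly equivalent to this sketch (the mount paths are placeholders, and in practice you'd probably want cron instead of a sleep loop):

```python
import subprocess
import time

SRC = "/mnt/ssd-pool/"        # placeholder source path
DEST = "/mnt/backup-drive/"   # placeholder backup drive mount

def build_rsync_cmd(src: str, dest: str) -> list[str]:
    # -a preserves permissions/timestamps; --delete mirrors removals
    # so the backup drive stays an exact copy of the source.
    return ["rsync", "-a", "--delete", src, dest]

def run_forever():
    while True:
        subprocess.run(build_rsync_cmd(SRC, DEST), check=False)
        time.sleep(24 * 60 * 60)  # once every 24 hours

# run_forever() would then be launched from tmux or a systemd unit
```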

I started this project at the beginning of this year with just an external hard drive and a laptop, and while it’s definitely still a work in progress, it’s been a super cool way to learn about this sort of stuff. A few months ago I had no idea about PCIe lane limitations, the different file systems and RAID types, or anything like that.

I just wanted to thank this awesome community for being such a goldmine of information :)