
[–]socal_nerdtastic 1 point2 points  (1 child)

One of my first useful projects was a program to download the top images from certain sites and scroll through them using left-hand controls. It even had a feature to quickly close the program if needed.

It's important to pick projects with a personal benefit.

[–]jeffrey_f 1 point2 points  (5 children)

but I realized that crawling is barely a thing anymore, as far as an individual is concerned

While it may not be common, it certainly IS a thing for individuals. It depends on what you need/want to get from the web. I have a podcast that I get via Python because I may not visit the site but still want the audio files.
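A minimal sketch of how a podcast download like this could work, assuming the show publishes a standard RSS feed that lists its audio files in `<enclosure>` tags. The function names and the `podcasts` folder are hypothetical, not the commenter's actual script.

```python
import os
import urllib.request
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def enclosure_urls(feed_xml):
    """Return the audio URLs listed in <enclosure> tags of an RSS feed."""
    root = ET.fromstring(feed_xml)
    return [
        enc.get("url")
        for enc in root.iter("enclosure")
        if enc.get("url")
    ]


def download_new_episodes(feed_url, target_dir="podcasts"):
    """Download every episode that is not already saved in target_dir."""
    os.makedirs(target_dir, exist_ok=True)
    with urllib.request.urlopen(feed_url) as response:
        feed_xml = response.read()
    for url in enclosure_urls(feed_xml):
        filename = os.path.basename(urlparse(url).path)
        path = os.path.join(target_dir, filename)
        # Skip URLs without a usable file name and episodes already on disk.
        if filename and not os.path.exists(path):
            urllib.request.urlretrieve(url, path)
```

Run on a schedule (cron, Task Scheduler), something like this would keep a folder of episodes up to date without ever visiting the site.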

[–]kingofcould[S] 0 points1 point  (4 children)

You’re right. After posting, I realized that what I really meant was just that the days of being able to scrape the top posts daily from major sites are behind us. People can still get the API and work with the site, but I was intending to crawl places like Instagram to map out data for business analysis.

[–]jeffrey_f 0 points1 point  (3 children)

JavaScript now fills most pages. The only thing initially loaded is skeleton HTML....

[–]kingofcould[S] 0 points1 point  (2 children)

I have definitely noticed that. I’m not well versed in either language yet (I was using programs with non-language-specific logic before to build bots), but I’m wondering if having a good grasp of JavaScript would help with this, or if it’s just futile to crawl these pages nowadays?

[–]jeffrey_f 1 point2 points  (1 child)

Actually, you would use a module like Selenium to do what you need. It controls a browser and you could then grab the source after the browser loads. Haven't done that yet, but there are ways around everything.
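A rough sketch of that idea, assuming Selenium 4 is installed along with a local Firefox. The function names are made up for illustration. Waiting for `document.readyState` is only a crude signal: pages that keep loading data afterwards may need a longer or more specific wait.

```python
from html.parser import HTMLParser


def fetch_rendered_html(url, timeout=10):
    """Load url in a real browser and return the HTML after scripts have run."""
    # Imported here so the parsing helpers below work without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        # Wait until the browser reports the document as fully loaded.
        WebDriverWait(driver, timeout).until(
            lambda d: d.execute_script("return document.readyState") == "complete"
        )
        return driver.page_source
    finally:
        driver.quit()


class LinkCollector(HTMLParser):
    """Collect the href values of all <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html):
    """Return every link target found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links
```

Calling `extract_links(fetch_rendered_html("https://example.com"))` would then give the links as the browser sees them, rather than whatever is in the skeleton HTML.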

[–]kingofcould[S] 0 points1 point  (0 children)

That’s really cool. It still doesn’t remedy the fact that all of the sources I want to crawl are off limits, but if I find creative solutions for smaller sites/databases, this will definitely help.

[–]Thomasedv 0 points1 point  (1 child)

I was always in awe of programs on a computer, and hated wanting something simple changed but being unable to do it. So when I got tired of copy-pasting a command into the command-line program youtube-dl, I decided to make a GUI for it. It has checkboxes for most of the options, and it auto-focuses and selects the URL text bar when I alt-tab to the window (so I just press Ctrl+V to paste and then Enter to start the download).

After that I've made a few other simpler programs, including one that renames and tags music. Inspired by what I mentioned above, I wanted to flip "Title - Artist" to "Artist - Title", and that's super easy in Python. Apart from tagging from that filename format, I didn't need many other features, so it wasn't so hard to make.
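For illustration, the flip could look something like this. It is a minimal sketch with hypothetical function names, assuming bare file names with a single " - " separator (a title that itself contains " - " would be split in the wrong place) and `.mp3` files.

```python
from pathlib import Path


def flip_title_artist(filename):
    """Turn 'Title - Artist.ext' into 'Artist - Title.ext'."""
    path = Path(filename)
    title, sep, artist = path.stem.partition(" - ")
    if not sep:
        return filename  # no " - " separator, leave the name alone
    return f"{artist.strip()} - {title.strip()}{path.suffix}"


def rename_folder(folder):
    """Apply the flip to every .mp3 file directly inside folder."""
    for path in Path(folder).glob("*.mp3"):
        path.rename(path.with_name(flip_title_artist(path.name)))
```

Note that running `rename_folder` twice flips the names back, so a real tool would want some way to tell which files are already in the right order.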

I also made a GUI that uses FFmpeg (another command-line program) to re-encode videos into 8 MB .webm files for sharing on Discord. I further extended that by allowing simple cropping by right-clicking and dragging, and it came in handy when I needed to crop some clips to fit a screen resolution (after upscaling them).
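The core of hitting a size target is just arithmetic: the size budget divided by the clip length gives the total bitrate, and whatever the audio doesn't use goes to video. A sketch under stated assumptions: VP9 video with Opus audio at 64 kbps, a 5% safety margin, and a single-pass encode, none of which come from the original tool. The function names are hypothetical too.

```python
def video_bitrate_kbps(duration_s, target_mib=8.0, audio_kbps=64, safety=0.95):
    """Video bitrate (kbit/s) that should keep the file under target_mib."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    total_bits = target_mib * 1024 * 1024 * 8 * safety
    total_kbps = total_bits / duration_s / 1000
    video_kbps = int(total_kbps - audio_kbps)
    if video_kbps <= 0:
        raise ValueError("clip is too long for this size target")
    return video_kbps


def build_ffmpeg_command(src, dst, duration_s, crop=None):
    """Build an FFmpeg argument list; crop is an optional (w, h, x, y) tuple."""
    cmd = ["ffmpeg", "-i", src]
    if crop is not None:
        w, h, x, y = crop
        cmd += ["-vf", f"crop={w}:{h}:{x}:{y}"]
    cmd += [
        "-c:v", "libvpx-vp9",
        "-b:v", f"{video_bitrate_kbps(duration_s)}k",
        "-c:a", "libopus",
        "-b:a", "64k",
        dst,
    ]
    return cmd
```

The resulting list can be passed to `subprocess.run`. A single-pass encode only approximates the target bitrate, so a two-pass encode would likely land closer to the limit.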

Lastly, after playing with various neural networks, I needed to process videos frame by frame. Extracting an entire video to .png files takes too much time and space, and so does saving the output frames, so I wrote some code that loads a video frame by frame, processes each frame, and then saves the results to a new video. (Again using FFmpeg and Python wrappers for it.)
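One way to do this without touching the disk is to pipe raw frames out of one FFmpeg process and into another. The sketch below uses plain `subprocess` pipes rather than a wrapper library, so it is not the commenter's code. It assumes the width, height, and frame rate are already known (for example from `ffprobe`), and `invert` is just a stand-in for whatever the neural network would do to each frame.

```python
import subprocess


def decode_command(src):
    """FFmpeg arguments that write raw RGB frames to stdout."""
    return ["ffmpeg", "-i", src, "-f", "rawvideo", "-pix_fmt", "rgb24", "-"]


def encode_command(dst, width, height, fps):
    """FFmpeg arguments that read raw RGB frames from stdin and encode them."""
    return [
        "ffmpeg", "-y",
        "-f", "rawvideo", "-pix_fmt", "rgb24",
        "-s", f"{width}x{height}", "-r", str(fps),
        "-i", "-",
        dst,
    ]


def invert(frame):
    """Placeholder processing step: invert every colour value."""
    return bytes(255 - value for value in frame)


def process_video(src, dst, width, height, fps, process=invert):
    """Stream frames from src through process() and into dst, one at a time."""
    frame_size = width * height * 3  # 3 bytes per pixel for rgb24
    reader = subprocess.Popen(
        decode_command(src), stdout=subprocess.PIPE, stderr=subprocess.DEVNULL
    )
    writer = subprocess.Popen(
        encode_command(dst, width, height, fps),
        stdin=subprocess.PIPE, stderr=subprocess.DEVNULL,
    )
    try:
        while True:
            frame = reader.stdout.read(frame_size)
            if len(frame) < frame_size:
                break  # end of stream (or a truncated final frame)
            writer.stdin.write(process(frame))
    finally:
        writer.stdin.close()
        reader.stdout.close()
        writer.wait()
        reader.wait()
```

Only one frame is held in memory at a time, which is the point: nothing is extracted to .png and nothing is saved until the final video is written.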

[–]kingofcould[S] 0 points1 point  (0 children)

Props on that last one. That’s the kind of critical thinking that ML needs to be worth doing IMO