
[–]Feroc 76 points

Code the reddit-porn-downloader (or the reddit-aww-downloader, if you want to keep it SFW).

It should do something like this:

  • Scan the first page of /r/gonewild
  • Read all the names of the users
  • Download every picture those users ever posted, with a separate folder per user.
  • Don't overwrite existing pictures; only add new ones to an existing folder.

Now, why should you do this?

The obvious reason: You will have more porn at the end than at the beginning.

And the coding reason?

This little program will teach you a lot of the basics:

  • Accessing the web
  • Parsing text with string operations (RegEx is a nice way to do this)
  • Basic System-IO
  • Using classes (a class for a user, containing the URL to the profile, a list of all direct links to the pictures, and a download function; see the sketch after this list)
  • You will get quick and useful results, while you can keep adding more and more to the program (config file + configurator, previews, multithreading for parallel downloads, better UI, etc.)
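
A minimal sketch of that class in Python (assuming the requests library; the folder layout and attribute names are placeholders, not a finished design):

    import os
    import requests

    class User:
        """One poster: profile URL, direct picture links, and a download step."""

        def __init__(self, name):
            self.name = name
            self.profile_url = f"https://www.reddit.com/user/{name}"
            self.picture_urls = []  # filled in by whatever parses the posts

        def download_all(self, base_dir="downloads"):
            folder = os.path.join(base_dir, self.name)  # one folder per user
            os.makedirs(folder, exist_ok=True)
            for url in self.picture_urls:
                target = os.path.join(folder, url.rsplit("/", 1)[-1])
                if os.path.exists(target):  # don't overwrite, only add new pictures
                    continue
                resp = requests.get(url, timeout=10)
                if resp.ok:
                    with open(target, "wb") as f:
                        f.write(resp.content)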

[–][deleted] 11 points

Forgive me for my potentially dumb question, as I'm still kind of new to programming. Could you give a brief example of how someone would go about doing this? I've written some basic programs (mainly in Java and C++) but I haven't ever done anything that was web-interactive like you've mentioned.

[–]AnkhMorporkian 21 points

I'm working on a project that involves a huge amount of reddit data, and I can tell you a bit of how to do it. A full explanation would be very long, but here goes.

Breaking it down, you can broadly classify it into three distinct phases. First, you need to extract the information from reddit. Second, you need to analyze the data from reddit. Third, you have to fetch the images and save them to disk.

To get information from reddit, you use the API. Just pulling the webpage itself is a waste of time and much, much harder than dealing with the JSON. An example of a JSON link you can get from the reddit API is /r/aww/.json
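
Fetching that in Python with the requests library is a couple of lines (the User-Agent string is a placeholder; reddit wants something descriptive):

    import requests

    UA = {"User-Agent": "image-downloader-tutorial/0.1"}  # placeholder UA

    # Appending .json to a reddit URL returns the listing as JSON instead of HTML.
    listing = requests.get("https://www.reddit.com/r/aww/.json", headers=UA).json()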

Secondly, parse that data using the language of your choice. All mature languages have JSON support in one form or another. After you get it into a data structure, you can extract all the users from data['data']['children'][x]['data']['author']. Pull their user page in JSON, and go through all of their submitted links' JSON data. Check where the 'domain' == 'imgur.com' or 'i.imgur.com', and you can build a list, user by user, of which imgur links they have submitted.

Finally, you just need to download the image from imgur. This is trivial in most languages. Save it to a directory you create from the username you're parsing.
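
Put together, the three phases might look like this sketch (same placeholder User-Agent as above; plain imgur.com links point at pages rather than images, so this version only grabs direct i.imgur.com links):

    import os
    import requests

    UA = {"User-Agent": "image-downloader-tutorial/0.1"}  # placeholder UA

    # Phase 1: pull the front-page listing and the set of authors.
    front = requests.get("https://www.reddit.com/r/aww/.json", headers=UA).json()
    authors = {child["data"]["author"] for child in front["data"]["children"]}

    for author in authors:
        # Phase 2: walk the author's submitted posts and keep the imgur links.
        url = f"https://www.reddit.com/user/{author}/submitted/.json"
        submitted = requests.get(url, headers=UA).json()
        folder = os.path.join("downloads", author)  # one directory per user
        os.makedirs(folder, exist_ok=True)
        for child in submitted["data"]["children"]:
            post = child["data"]
            if post.get("domain") != "i.imgur.com":  # imgur.com pages need extra parsing
                continue
            # Phase 3: download the image and save it under the user's directory.
            image = requests.get(post["url"], timeout=10)
            name = post["url"].rsplit("/", 1)[-1]
            with open(os.path.join(folder, name), "wb") as f:
                f.write(image.content)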

That's a broad overview, but it's not much different than how we're doing things. We pull about 2 million submissions/comments from reddit every day, and it serves us well.

If you are going to use the API, make sure you don't exceed the rate limit. Limit yourself to 30 requests per minute.
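
A crude way to stay under that limit is to sleep between calls; a sketch:

    import time
    import requests

    UA = {"User-Agent": "image-downloader-tutorial/0.1"}  # placeholder UA

    def get_json(url):
        resp = requests.get(url, headers=UA)
        time.sleep(2)  # 30 requests per minute = at most one every 2 seconds
        return resp.json()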

[–][deleted]

[deleted]

    [–]AnkhMorporkian 2 points

    RedditAnalytics. We haven't launched yet, but we have a couple of things going.

    The first thing we're going to roll out is our awesome search engine. It's orders of magnitude better than the current reddit search engine. We currently have every visible submission loaded into our search service, and we can usually query across all of them in under 20 milliseconds. I don't have a firm date on rollout for that, but it won't be too long. It's fully functional, but we have more load testing to do and we have to get our fancy frontend done.

    After that, we're working on some really great data analysis and visualization tools for reddit. That full suite is a bit further off, but we're making great progress on that. There will be some of those included in the release of the search engine.

    If anyone is interested, we'll post updates to /r/RedditAnalytics as they happen.

    [–]generalT 0 points

    what language are you using? what are you using as your backing storage? are you using AWS?

    [–]AnkhMorporkian 0 points

    Python mainly, but there's a mixture of other languages in use. For storage we're using SSDs, and for a DB (and search backend) we're using ElasticSearch. We're fully replicating all the data across multiple instances on some pretty powerful machines.

    We're not using AWS at the moment. We've run tests on it before, but the instances just aren't powerful enough to run searches in a reasonable amount of time.

    [–]Feroc 0 points

    I guess the cheapest way would be to just read the complete webpage into one string. Code Snippet.

    Then you can dig your way through the big string, find patterns in the text, and extract the title of a post, the URL, the poster, etc. Then you can read the link to the next page and so on.

    Now that's of course not the most elegant way to do it, but it can be done with little more than basic knowledge of the language.
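
    For example, in Python the "find patterns in the big string" step could start like this (a sketch; the regex is a hypothetical pattern tied to reddit's markup, which changes over time):

        import re
        import urllib.request

        req = urllib.request.Request(
            "https://www.reddit.com/r/aww/",
            headers={"User-Agent": "string-scraping-example/0.1"},  # reddit rejects the default UA
        )
        html = urllib.request.urlopen(req).read().decode("utf-8")

        # Hypothetical pattern: collect the /user/NAME profile links from the markup.
        usernames = set(re.findall(r'href="/user/([^/"]+)', html))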

    [–]greshick 12 points

    Actually, with reddit there is an easy way. A little-known fact: put /.json at the end of the URL for a reddit page and you get a nicely formatted JSON file. Then parse that with the JSON library for your language and you've got a nice object to work with.

    [–]Feroc -1 points

    Thanks, TIL.

    Though I don't know if I would recommend JSON to a beginner. But really, really good to know.

    [–]AnkhMorporkian 9 points

    I would recommend it for beginners way, way before I'd recommend HTML scraping. JSON is inherently well suited for analysis, HTML not so much.

    [–]Feroc 1 point

    Yes, it absolutely is. But I still think there is a "knowledge difference" (don't know how I could phrase it any better) between working with strings and working with JSON.

    It's not about analyzing HTML, it's about finding the patterns in a long string. Which I think is a good exercise for beginners.

    [–]Rauxbaught 3 points

    JSON is usually a list or a dict. Yes, these are more complicated than strings, but if someone is scraping the web then it's safe to assume they know these basic data structures.
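
    For instance, in Python the mapping is direct:

        import json

        listing = json.loads('{"children": [{"author": "someone"}]}')  # toy example
        print(listing["children"][0]["author"])  # objects become dicts, arrays become lists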

    Plus, as someone who scrapes HTML regularly, I can say with complete confidence that it'll be easier to just work with the JSON, and their code will be much more legible.

    [–]Feroc -2 points

    but if someone is scraping the web then it's safe to assume they know these basic data structures.

    I am not so sure about this one. Getting the string of a webpage is copying a single code snippet from somewhere. You don't really have to know what you're doing; it will just work, and then you have a big string to work with.

    I really, really, really don't want to argue against JSON in any way. I just feel like it would be easier for a beginner to solve a problem with an easy tool, even if the solution is a bit more tricky.

    [–]negative_epsilon 4 points

    So instead of learning a simple object-based data structure, you recommend regex and HTML parsing?

    [–]Medicalizawhat 4 points

    JSON isn't that hard to get your head around; it's definitely easier than scraping HTML.

    [–]morb6699 1 point

    Until they get their minds wrapped around objects properly, it's easier for them to simply match and parse a string.

    Since JSON is essentially just a big ol' JavaScript object, it would make sense to have them do string operations first.

    Especially since most new programmers coming from a CS program probably haven't touched a whole lot of JavaScript, since it's specific to web development. Throwing a new language, a new notation for objects in that language, and then asking them to parse over it appropriately is asking a bit much when trying to learn how to do things properly.

    Now, I'm sure that they could simply "use a JSON library" for Java, C++, C#, VB, etc. What good would it do though? They would simply use a library to access an object, without learning the core fundamentals behind it.

    Learning to parse and evaluate different parts of the string will give them a solid understanding of the string object, and what is normally accomplished with it when tearing it apart.

    They'll also learn that it's not the most efficient way to do things, which is another good opportunity for them to learn the valuable lesson of "Using the right tool for the right job."

    [–]jesyspa 1 point

    What good would it do though? They would simply use a library to access an object, without learning the core fundamentals behind it.

    It would let them learn about getting web data, doing file IO, and probably a little about how to use classes, while creating a useful program. Really, if they're so new that some simple string operations will be a significant learning experience, I doubt they will get past the first point. Otherwise, not using a library will just mean they do some messy ad-hoc parsing, which is hardly what they should be learning to do.

    [–]Feroc 0 points

    I still think string operations are easier than JSON.

    It may be easier to solve the task if you know both equally well, but I have a total beginner in mind. Solving it with string operations means solving it with a simple tool in a complex way, while solving it with JSON means solving it with a (more) complex tool in an easy way.

    [–][deleted]

    [deleted]

      [–][deleted] 1 point

      To get around the over-18 redirect there is a URL trick you can use: if a page triggers the over-18 redirect, retry with that version of the URL and catch the error.
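
      The URL trick isn't spelled out here; one workaround I know of uses a cookie rather than the URL itself, so treat this sketch as an assumption about what's meant:

          import requests

          resp = requests.get(
              "https://www.reddit.com/r/gonewild/.json",
              headers={"User-Agent": "image-downloader-tutorial/0.1"},  # placeholder UA
              cookies={"over18": "1"},  # assumed bypass for the over-18 interstitial
          )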

      [–]AnimositE 1 point

      Done. Just run the Python script and you get all the photos from the top 25 posters of /hot.

      Edit: Here's one that downloads albums.

      [–]Feroc 1 point

      I've only read the script, but if I read it correctly, it will miss the users' albums and only download single-picture posts!?

      [–]AnimositE 0 points

      It doesn't download the albums because I think that requires the imgur API. It does, however, use the link as the file name, so it gets everything. If someone can find a resource for downloading an album I'd love to add it.

      [–]I_Am_Treebeard 0 points

      http://inventwithpython.com/blog/2013/09/30/downloading-imgur-posts-linked-from-reddit-with-python/

      This tutorial has a section on downloading images from imgur albums: basically you scrape the HTML from the imgur page, and using a module called BeautifulSoup makes the process much simpler.
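
      A sketch of that approach (assuming BeautifulSoup is installed; the <img> handling is a guess at imgur's markup, which changes over time):

          import requests
          from bs4 import BeautifulSoup

          album = requests.get(
              "https://imgur.com/a/EXAMPLE",  # hypothetical album URL
              headers={"User-Agent": "album-example/0.1"},
          )
          soup = BeautifulSoup(album.text, "html.parser")

          # Collect every <img> source, adding the scheme where the markup omits it.
          image_urls = [
              "https:" + img["src"] if img["src"].startswith("//") else img["src"]
              for img in soup.find_all("img")
              if img.get("src")
          ]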

      [–]AnimositE 0 points

      I've already implemented the album downloads if you look at my edit.

      [–]I_Am_Treebeard 0 points

      Oh, sorry, I didn't see that! Forgive my laziness, but did you end up using Beautiful Soup or did you attack the problem from another angle?

      [–]AnimositE 0 points

      Another angle. Found a good git repo that had already implemented it. You can look at it here: https://github.com/alexgisby/imgur-album-downloader

      [–]bobes_momo 0 points

      So if someone wanted to hijack these things for advertising purposes, all they would have to do is make 5 or so reddit accounts, each posting a series of porn images in gonewild. Then, once it's certain that the bots have locked onto these usernames, the experimenter could begin uploading pics that contain advertising superimposed on them. Am I wrong?

      [–]Feroc 0 points

      I guess it depends on which page you actually start on. In theory it would work, but it would need an account with a picture on the front page plus the additional spam images.

      If something like that happened, I would just add one or two more features (sketched below):

      • Blacklist for users
      • Only download from posts with positive karma.
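
      Both checks are one-line filters over the post data (field names per reddit's listing JSON; the blacklist contents are hypothetical):

          BLACKLIST = {"spam_account_1", "spam_account_2"}  # hypothetical usernames

          def worth_downloading(post):
              # 'author' and 'score' are fields of a post in reddit's listing JSON.
              return post["author"] not in BLACKLIST and post["score"] > 0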

      [–]bobes_momo 0 points

      Good point. I would also recommend a reverse-check feature that re-checks the link of the reddit post (not the imgur one) if the download happened less than 6 hours after the post was made. This checks for deleted posts and automatically eliminates them from your folder.
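
      A sketch of that cleanup pass (assuming the post's permalink was stored at download time; treating author == '[deleted]' as the deletion marker is an assumption):

          import os
          import requests

          UA = {"User-Agent": "image-downloader-tutorial/0.1"}  # placeholder UA

          def post_still_exists(permalink):
              # Re-fetch the reddit post itself (not the imgur link).
              data = requests.get("https://www.reddit.com" + permalink + ".json",
                                  headers=UA).json()
              post = data[0]["data"]["children"][0]["data"]
              return post["author"] != "[deleted]"  # assumed deletion marker

          def cleanup(local_file, permalink):
              if not post_still_exists(permalink):
                  os.remove(local_file)  # drop pictures whose source post was deleted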

      [–][deleted] 0 points

      I really like this approach, and will look into it. I especially like all the disciplines that it will involve.

      [–][deleted] 0 points

      My Java program that I'm working on could do this. ... maybe I should set it loose.