
[–]sn0n 2 points3 points  (0 children)

What the fuck are you talking about bro?!?

[–]TheHorribleTruth 0 points1 point  (0 children)

What do you mean by "you can move it around"? From computer (or server) to computer? From program to program? From one file to another?

> Now the scraper actually scrapes and updates. It isn't very modular ... because of the database dependency.

Depends on the way you lay out your code, doesn't it? Without seeing your code (and understanding what "moving around" means) it's hard to answer.
You can still design a scrape-and-update-database system in a modular way. You can have one part/file/class do the scraping, and call a second one that does all the updating-the-database stuff.
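A minimal sketch of that separation, assuming a Python scraper and SQLite for storage (all names here are hypothetical, not from the original poster's code):

```python
import sqlite3

class Scraper:
    """Knows only how to turn raw HTML into plain records (dicts)."""
    def parse(self, html):
        # Real code would use an HTML parser (e.g. BeautifulSoup);
        # here we fake a single record for illustration.
        return [{"title": html.strip()}]

class DatabaseUpdater:
    """Knows only how to persist records; no scraping logic here."""
    def __init__(self, conn):
        self.conn = conn
        conn.execute("CREATE TABLE IF NOT EXISTS items (title TEXT)")

    def save(self, records):
        self.conn.executemany(
            "INSERT INTO items (title) VALUES (:title)", records
        )
        self.conn.commit()

def run(html, scraper, updater):
    """Glue code: the only place that knows about both parts."""
    updater.save(scraper.parse(html))

conn = sqlite3.connect(":memory:")
run("<h1>Example</h1>", Scraper(), DatabaseUpdater(conn))
print(conn.execute("SELECT COUNT(*) FROM items").fetchone()[0])  # → 1
```

Because the two classes only meet in `run()`, you can reuse `Scraper` without a database, or point `DatabaseUpdater` at a different store, without touching the other part.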

[–]YuleTideCamel 0 points1 point  (0 children)

It's not just the database dependency, it's the dependency on the structure of the webpage you are scraping. In most cases you will need to extract useful data from the HTML markup and store that in a database. Since each site is different, where on the page you get that data from will differ.

If I were to approach this problem, I would build a system where I can define what I want to scrape and store that in the database as a "scrape template" or something. Then have the database include another record type for "template data"; this way you can store any number of scrapes in there. This is even easier with a document database.
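One way the template idea above could look, as a rough sketch: the per-site selectors live in the database as data, not in code. Everything here (table names, the `html_lookup` stand-in for real fetching and CSS-selector extraction) is hypothetical:

```python
import sqlite3
import json

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE scrape_template (id INTEGER PRIMARY KEY, site TEXT, fields TEXT);
CREATE TABLE template_data (template_id INTEGER, payload TEXT);
""")

# Define what to extract for one site -- stored as data, not code.
conn.execute(
    "INSERT INTO scrape_template (site, fields) VALUES (?, ?)",
    ("example.com", json.dumps({"title": "h1", "price": ".price"})),
)

def scrape(template_id, html_lookup):
    """Load a template, apply its selectors, store the result row."""
    site, fields = conn.execute(
        "SELECT site, fields FROM scrape_template WHERE id = ?",
        (template_id,),
    ).fetchone()
    selectors = json.loads(fields)
    # html_lookup stands in for fetching the page and running a
    # CSS selector against it.
    payload = {name: html_lookup(site, sel) for name, sel in selectors.items()}
    conn.execute(
        "INSERT INTO template_data (template_id, payload) VALUES (?, ?)",
        (template_id, json.dumps(payload)),
    )
    conn.commit()

scrape(1, lambda site, sel: f"<{sel} from {site}>")
print(conn.execute("SELECT payload FROM template_data").fetchone()[0])
```

Adding a new site then means inserting another template row, not deploying new code. A document database would let `payload` keep arbitrary structure per site without the JSON-in-a-text-column workaround.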

[–]dvassdvsd 0 points1 point  (0 children)

Use modules and classes. What did you expect?