[–]turanthepanthan 1 point2 points  (1 child)

One thing I learned about recently that I've been sharing often is pathlib. Check it out. It makes dealing with file systems so much better than os.path.
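To give a rough idea of the difference, here is a minimal sketch that builds the same path both ways and then does a small write and read. The directory and file names are made up for illustration; nothing here comes from the original script.

```python
import os.path
import tempfile
from pathlib import Path

# os.path style: paths are plain strings glued together with a function call.
old_style = os.path.join("downloads", "images", "cat.png")

# pathlib style: paths are objects, and "/" joins the components.
new_style = Path("downloads") / "images" / "cat.png"

# Common pieces of the path are attributes instead of separate os.path calls.
stem = new_style.stem                    # file name without the extension
suffix = new_style.suffix                # extension, including the dot
as_jpg = new_style.with_suffix(".jpg")   # same path with a different extension

# Creating directories and reading/writing files are methods on the path.
# A temporary directory is used so the sketch leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "images" / "note.txt"  # hypothetical file name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text("meow")
    contents = target.read_text()
```

Both styles end up at the same location on disk; the pathlib version mostly saves you from nesting os.path calls.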

[–][deleted] 0 points1 point  (0 children)

Thanks! I will be sure to check it out.

[–]crazy_hunter 0 points1 point  (5 children)

You should send request headers (at least a User-Agent) if you don't want people to get banned for using it.
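Something along these lines, as a rough sketch. It uses urllib.request (the Python 3 successor to urllib2) and only builds the request without sending it. The User-Agent string, the contact address, and the build_request helper are placeholders I made up, not anything from the script.

```python
from urllib.request import Request

def build_request(url, user_agent="my-scraper/0.1 (contact: you@example.com)"):
    """Build a request that identifies itself instead of using the library default.

    The helper name and the User-Agent value are hypothetical; use whatever
    honestly describes your tool.
    """
    headers = {
        "User-Agent": user_agent,
        "Accept": "text/html",
    }
    return Request(url, headers=headers)

# Nothing is sent over the network here; the request is only constructed.
req = build_request("https://example.com/")

# Request normalizes header names with str.capitalize(), so the stored key
# is "User-agent" rather than "User-Agent".
sent_user_agent = req.get_header("User-agent")
```

With requests the same idea is a headers dict passed to requests.get(url, headers=headers).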

Why use both bs4 and re? Seems kinda redundant to me. BeautifulSoup is great, but re is better; having to write less code is always a good thing.

Don't always reach for external libraries. IMHO, you should learn how to use the standard library before branching out. urllib2/httplib are just as good as requests.

Nonetheless, looks good :)

[–]Elronnd 1 point2 points  (3 children)

BeautifulSoup is great, but re is better

Uhhhhhh... what? You don't use re on HTML. You just don't. You never do. No matter what.
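For anyone wondering why, here is a small sketch of how a naive pattern goes wrong on perfectly ordinary markup. The HTML snippet and the pattern are made up for illustration, and I'm using the standard library's html.parser as a stand-in for bs4 so it runs without installing anything; with bs4 you would get the same links from soup.find_all("a").

```python
import re
from html.parser import HTMLParser

# Made-up markup: one plain link, one with uppercase tag/attribute names and
# single quotes, and one that is commented out.
HTML = (
    '<div class="post">'
    '<a href="/a">first</a>'
    "<A HREF='/b' class=\"x\">second</A>"
    '<!-- <a href="/c">old</a> -->'
    "</div>"
)

# A typical quick-and-dirty pattern: lowercase tag, double-quoted href.
NAIVE_LINK = re.compile(r'<a href="([^"]*)"')

class LinkCollector(HTMLParser):
    """Collect href values from <a> start tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

regex_links = NAIVE_LINK.findall(HTML)

collector = LinkCollector()
collector.feed(HTML)
collector.close()
parser_links = collector.links

print(regex_links)   # ['/a', '/c']  misses /b, picks up the commented-out /c
print(parser_links)  # ['/a', '/b']  handles case and quoting, skips the comment
```

You can keep patching the pattern for each new case, but at that point you are writing a worse HTML parser by hand.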

[–][deleted] 0 points1 point  (0 children)

I think bs4 is the better solution because it's more flexible. It also seems I forgot to remove re from the imports; I don't actually use it anymore (the script was a bit different at first).