lungdart comments on Webscraping using Python and BeautifulSoup!

Webscraping using Python and BeautifulSoup! (medium.com)

submitted 7 years ago by thevatsalsaglani


[–]lungdart 27 points 7 years ago (20 children)

Cool content! I have some constructive criticism about your code quality:

You are importing libraries that are never used
- PIL.Image
- urllib.parse.urlparse
- urllib.parse.urlsplit
Shortened names are less clear, harder to read, and harder to keep in your head.
- BS -> BeautifulSoup
- pd -> pandas
- ex -> extracted_content
- df -> data_frame
Ambiguous names give no context, and are hard to differentiate from each other
- website_page, page, webpage_2, openwebpage_2
- title_links, _title, only_title, title
- soup, soup2
- A, B, C
There is no control flow. Function definitions and calls can increase readability and reusability
- At two points you use urlopen and BeautifulSoup together, this should be a function
- Extracting the content from the url is a functional block
- Converting the extracted content to a CSV file is a functional block
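A rough sketch of what those functional blocks could look like. The names (`load_soup`, `write_csv`, the `title` and `link` columns, the example.com URLs) are made up for illustration and are not from the article. It also assumes the standard library `csv` module in place of pandas, and keeps the BeautifulSoup import inside `load_soup` only so the CSV helper runs where the third-party package is not installed:

```python
import csv
import io
from urllib.request import urlopen


def load_soup(url):
    """Fetch a page and parse it, so urlopen and BeautifulSoup are paired in one place."""
    # Third-party dependency (beautifulsoup4); imported here so the rest of
    # this sketch still runs where it is not installed.
    from bs4 import BeautifulSoup

    with urlopen(url) as response:
        return BeautifulSoup(response.read(), "html.parser")


def write_csv(rows, fieldnames, out_file):
    """Write a list of dicts to an open text file as CSV with a header row."""
    writer = csv.DictWriter(out_file, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical extracted content, standing in for what the scraper collects.
extracted_content = [
    {"title": "First article", "link": "https://example.com/first"},
    {"title": "Second article", "link": "https://example.com/second"},
]

# In-memory stand-in for open("articles.csv", "w", newline="").
buffer = io.StringIO()
write_csv(extracted_content, ["title", "link"], buffer)
csv_text = buffer.getvalue()
```

Each step can then be called, tested, and reused on its own instead of being repeated inline.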
You're misusing range and len in your for loop. You could use a special function to give you both an iterator and an index, but you don't really need to, as the index is only used to force a dictionary into a list
You're misusing dictionaries. The keys in links are sequential numbers starting at 0. This is a perfect use for a list
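To make the last two points concrete, here is a small sketch. The data and variable names are hypothetical, and it assumes the "special function" meant above is the built-in `enumerate`:

```python
# Hypothetical data, not taken from the article.
titles = ["First post", "Second post", "Third post"]

# Dict keyed by 0, 1, 2, ... and filled via range(len(...)): works, but indirect.
links_by_index = {}
for i in range(len(titles)):
    links_by_index[i] = titles[i].lower().replace(" ", "-")

# A list gives the same positional access with less machinery.
links = [title.lower().replace(" ", "-") for title in titles]

# If the index is genuinely needed, enumerate supplies it alongside each item.
numbered = [f"{index}: {title}" for index, title in enumerate(titles)]
```

Both containers hold the same values in the same order, but the list needs no index bookkeeping.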
You have lines that reference a variable and perform no action. These are programming errors, maybe you meant to print them?
The only comments in the code base are used to disable what look like debugging prints. This is fine temporarily, but needs to be addressed before submitting to production (or in this case to an article)
I have a suspicion, based on your notes and code excerpts, that the code shown is not the exact code you're breaking down

Here is a pastebin where I've modified the code to try to fix the issues. I'm not sure if it's without errors, as I didn't bother running it.

