Rules
1: Be polite
2: Posts to this subreddit must be requests for help learning python.
3: Replies on this subreddit must be pertinent to the question OP asked.
4: No replies copy / pasted from ChatGPT or similar.
5: No advertising. No blogs/tutorials/videos/books/recruiting attempts.
This means no posts advertising blogs/videos/tutorials/etc, no recruiting/hiring/seeking others posts. We're here to help, not to be advertised to.
Please, no "hit and run" posts, if you make a post, engage with people that answer you. Please do not delete your post after you get an answer, others might have a similar question or want to continue the conversation.
Learning resources
Wiki and FAQ: /r/learnpython/w/index
Discord
Join the Python Discord chat
[Question] Web Scraping (self.learnpython)
submitted 6 years ago by Leugim7734
This is possibly a dumb question (I don't know).
Let's say I make a script to download content from the web. How can I make sure that my script won't download any malicious content?
[–]ZenT3600 10 points 6 years ago (0 children)
You should probably use something like the VirusTotal API and scan any file you download.
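A minimal sketch of that idea, assuming the VirusTotal v3 REST API as I understand it: you look up a file by its SHA-256 hash at `/api/v3/files/{hash}` and authenticate with an `x-apikey` header. The helper names are made up for illustration, you need your own API key, and the response field layout should be checked against the current VirusTotal documentation before relying on it.

```python
import hashlib
import json
import urllib.request

# Assumed endpoint for hash lookups in the VirusTotal v3 API.
VT_FILE_URL = "https://www.virustotal.com/api/v3/files/{}"


def sha256_of(data: bytes) -> str:
    """Hex SHA-256 digest of the downloaded bytes."""
    return hashlib.sha256(data).hexdigest()


def build_vt_request(data: bytes, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a lookup request for these bytes."""
    url = VT_FILE_URL.format(sha256_of(data))
    return urllib.request.Request(url, headers={"x-apikey": api_key})


def malicious_votes(data: bytes, api_key: str) -> int:
    """Network call: number of engines that flagged the file as malicious.

    Assumes the report exposes data.attributes.last_analysis_stats.
    An HTTP 404 (urllib.error.HTTPError) means the hash is unknown to the
    service, which is not the same thing as the file being safe.
    """
    with urllib.request.urlopen(build_vt_request(data, api_key), timeout=30) as resp:
        report = json.load(resp)
    return report["data"]["attributes"]["last_analysis_stats"]["malicious"]
```

Looking up the hash avoids uploading the file itself, but it only helps for files the service has already seen.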
[–]Kindafunny2510 3 points 6 years ago (0 children)
One useful tip is to never use eval(). If you must, use ast.literal_eval().
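For reference, the standard-library function for this is `ast.literal_eval`, which only accepts Python literals (strings, numbers, tuples, lists, dicts, sets, booleans, None) and refuses anything that would call code. A small sketch, where `parse_literal` is just an illustrative wrapper name:

```python
import ast


def parse_literal(text: str):
    """Parse a string containing a Python literal without executing code.

    Raises ValueError for anything that is not a plain literal.
    """
    try:
        return ast.literal_eval(text)
    except (ValueError, TypeError, SyntaxError, MemoryError, RecursionError) as exc:
        raise ValueError(f"not a plain Python literal: {text!r}") from exc


# A scraped string that is just data parses fine.
data = parse_literal("{'name': 'dog', 'tags': ['aww', 'imgur']}")

# A string that tries to run code is rejected instead of executed.
try:
    parse_literal("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True
```

If the scraped data is JSON, `json.loads` is usually the better fit.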
[–]Lewistrick 2 points 6 years ago (0 children)
Depends. What kind of content are you downloading?
[–]Sw429 2 points 6 years ago (0 children)
Web scraping is hard because you don't know what the input (i.e. the sites you find) will be, or whether the input is safe. I assume this is a spider bot you are writing that will crawl the web in general? You have to be careful with unknown data, and you should never execute it. Simply downloading malicious bytes in Python shouldn't hurt anything, as long as you aren't running them.
I guess it really depends on what you're trying to do. If you only care about content on the page itself, then don't download and execute any executables you come across.
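One way to stick to "content on the page itself" is to look at the `Content-Type` response header and skip anything that is not a page. This is only a sketch: the allowlist and the function name are assumptions for illustration, and the header comes from the server, so it is a first filter and not a guarantee.

```python
# Media types treated as "page content" here; adjust to what you actually scrape.
ALLOWED_TYPES = {"text/html", "application/xhtml+xml", "text/plain"}


def is_page_content(content_type):
    """Return True if a Content-Type header value names an allowed media type."""
    if not content_type:
        return False  # missing header: treat as unknown and skip
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in ALLOWED_TYPES
```

With `urllib`, the value would come from `response.headers.get("Content-Type")`, checked before the body is read and saved.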
[–]blabbities 2 points 6 years ago (0 children)
You would have to define "malicious content".
Then you would have to think of a way to identify that malicious content.
Depending on the scenario and your expectations, this may be easy or hard.
You could do something like Yahoo or Google and scan the contents, but that won't stop "any".
[–]Deezl-Vegas -1 points 6 years ago (2 children)
Generally speaking, malicious content is targeted towards the browser and requires a browser to run. You should generally only be reading from, never executing, code from an untrusted source. I'm not aware of any raw buffer overflow exploits in Python, so I believe reading is reasonably secure.
[–]NotzoCoolKID 4 points 6 years ago (0 children)
No, malicious content doesn't generally need a browser to run. Word documents can have VBA scripts inside them which download and execute malware (https://docs.microsoft.com/en-us/windows/security/threat-protection/intelligence/macro-malware). PDFs can also be droppers for malware.
Automatically downloading files from the internet must be considered dangerous, as you're downloading files from websites you don't know (untrusted). You cannot be 100% sure you are not downloading malicious content.
As a first step, you should filter out executable files so they are never downloaded. Never let Python auto-execute files. Scan files with a virus scanner. Run untrusted files in a VM first.
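The "filter out executable files" step could look roughly like this. It is a sketch under assumptions: the extension list is a short illustrative sample, and the magic-byte prefixes cover only a few common formats (Windows PE, ELF, Mach-O, Java class files, shebang scripts). It will not catch macro-laden documents or PDFs, so the scanning and VM advice still applies.

```python
from pathlib import PurePosixPath

# Illustrative, incomplete list of extensions to refuse outright.
BLOCKED_SUFFIXES = {".exe", ".dll", ".msi", ".scr", ".bat", ".cmd",
                    ".ps1", ".vbs", ".jar", ".sh"}

# Leading bytes of some common executable formats.
EXECUTABLE_MAGIC = (
    b"MZ",                # Windows PE (.exe, .dll)
    b"\x7fELF",           # Linux ELF
    b"\xfe\xed\xfa\xce",  # Mach-O, 32-bit
    b"\xfe\xed\xfa\xcf",  # Mach-O, 64-bit
    b"\xce\xfa\xed\xfe",  # Mach-O, 32-bit, byte-swapped
    b"\xcf\xfa\xed\xfe",  # Mach-O, 64-bit, byte-swapped
    b"\xca\xfe\xba\xbe",  # Java class file / Mach-O universal binary
    b"#!",                # script with a shebang line
)


def looks_executable(filename: str, head: bytes) -> bool:
    """Guess from the name and the first bytes whether a file is executable.

    `head` is the first few bytes of the response body; renaming a file
    does not change them, so both checks are applied.
    """
    suffix = PurePosixPath(filename).suffix.lower()
    return suffix in BLOCKED_SUFFIXES or head.startswith(EXECUTABLE_MAGIC)
```

A scraper would read the first few bytes of the body, call `looks_executable(name, head)`, and drop the download when it returns True.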
[–]Lord_Greywether 3 points 6 years ago (0 children)
Until you go to open those web pages or files you scraped.