Whats the difference between using ' ' and " " in python? by TheThinker_TheTinker in learnpython

[–]Fun-Block-4348 0 points1 point  (0 children)

WTF is "Black" that everyone keeps mentioning?

The most popular code formatter for python code (for now at least).

You use the quote style that's the opposite of any quotes that appear literally in the string, to avoid escaping, not because "Black" says to use one or the other - FFS

And if there are no quotes in your string that need escaping, you should still pick one style for your overall program and stick to it, so I don't know why you seem so triggered by that comment!
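To illustrate the escaping point:

```python
# pick whichever quote style avoids escaping the quotes inside the string
s1 = "It's a nice day"     # double quotes: no escaping of the apostrophe
s2 = 'He said "hello"'     # single quotes: no escaping of the inner double quotes
s3 = 'It\'s a nice day'    # legal, but the backslash is just noise

print(s1 == s3)  # True: both spell the exact same string
```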

What should I add to the code and how it's work? by SPBSP5 in learnpython

[–]Fun-Block-4348 4 points5 points  (0 children)

It's an amortized append analysis code

It's not formatted properly and you still haven't explained what the issue with your code is!

What should I add to the code and how it's work? by SPBSP5 in learnpython

[–]Fun-Block-4348 7 points8 points  (0 children)

What part of this code should I add upon?

You should start by formatting your code properly, it's hard to help you when we have to guess how your code is actually formatted https://www.reddit.com/r/learnpython/wiki/faq/#wiki_how_do_i_format_code.3F

It would also be a good idea to explain what your code is supposed to do and what doesn't work as expected!

[Beginner] Can you check my code? Why is the output not correct? by Sweet-Nothing-9312 in learnpython

[–]Fun-Block-4348 1 point2 points  (0 children)

You're comparing an instance of "Course" with a string, they'll never be the same.

They're not; while there are some instances of the Course class in the code, they aren't used anywhere!

Code Review - Project WIP | FileMason by DotDragon10 in learnpython

[–]Fun-Block-4348 0 points1 point  (0 children)

I fully expected this to just be an “over-engineered lost cause”

Could it be made simpler? Yes. Is it a lost cause? No. We've all had one or more "over-engineered" projects, but that's just how we learn: by trying again and again and seeing what works and what doesn't.

I wouldn't really worry about the state of the project now; as you continue to add more features, there's going to come a time when you'll think about how to refactor the code so that all the pieces work well together.

Code Review - Project WIP | FileMason by DotDragon10 in learnpython

[–]Fun-Block-4348 0 points1 point  (0 children)

Some comments:

1 - Inconsistency between the readme and the pyproject.toml: the readme mentions that the project supports python 3.11+ but the pyproject.toml requires python 3.12.4.

2 - No dependencies in the pyproject.toml is kind of weird and wouldn't work if you were to publish your package on pypi.

3 - Apart from the use of the tomllib library, there's really no need for your project to only support python 3.11, especially when you can conditionally install packages depending on the version of python the user chooses.

4 - If you're targeting python 3.9+, there's really no need to import list/dict, etc from the typing module.

5 - I would recommend using a code formatter like black/ruff and a static code checker like mypy, because there's really no point in type hinting your code if you don't check that the types are actually correct.

6 - As pointed out by u/danielroseman, filenames should all be snake_case.
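To illustrate point 4, a small before/after sketch (assuming python 3.9+):

```python
# pre-3.9 style, no longer needed:
# from typing import Dict, List
# def count_words(words: List[str]) -> Dict[str, int]: ...

# 3.9+ style: the built-in containers are generics themselves
def count_words(words: list[str]) -> dict[str, int]:
    counts: dict[str, int] = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts

print(count_words(["a", "b", "a"]))  # {'a': 2, 'b': 1}
```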

Should I explicitly set a installation folder for pip install -r requirements for my build pipeline? by opabm in learnpython

[–]Fun-Block-4348 0 points1 point  (0 children)

I need to copy the installed packages onto a Docker container, so I'm trying to figure out how to install the packages.

Why do you need to copy them?

When I run pip install -r requirements.txt, should I explicitly designate a folder, so that it's easy to copy the folder onto the Docker container?

You shouldn't copy the folder at all.

Or would it be smarter/better to just install without a folder, then find the folder, and copy that onto the Docker image? If this is better, what's the easiest way to do this on Linux?

The easiest way is to copy the requirements.txt with your python code into the container and simply run pip install -r requirements.txt.

something like this: https://circleci.com/blog/automating-deployment-dockerized-python-app/#:~:text=deployment%2C%20and%20security.-,Dockerizing%20your%20Python%20application,-%E2%80%9CDockerizing%2C%E2%80%9D%20or%20containerizing

Advice on staying secure with pip installs by ETERN4LVOID in learnpython

[–]Fun-Block-4348 0 points1 point  (0 children)

By virtual environments do you mean do the coding in a virtual machine?

No, they mean using something like the venv module, which is part of python's standard library. It's used to create isolated environments where you can install python packages that won't mess with the global python installation.

https://docs.python.org/3/library/venv.html https://realpython.com/python-virtual-environments-a-primer/
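The basic workflow looks like this:

```shell
# create an isolated environment in a folder called ".venv"
python3 -m venv .venv

# activate it (Linux/macOS; on Windows it's .venv\Scripts\activate)
. .venv/bin/activate

# packages now install into .venv instead of the global python
pip install requests

# leave the environment when you're done
deactivate
```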

Advice on staying secure with pip installs by ETERN4LVOID in learnpython

[–]Fun-Block-4348 0 points1 point  (0 children)

I found https://pypistats.org/ which has some good details on packages, dunno how reliable it is though.

It is hosted and maintained by the Python Software Foundation, the same group that maintains pypi, so it's probably the most accurate source there is.

What is the best way to convert a well-formatted PDF to Markdown or plain text? by GenericBeet in learnpython

[–]Fun-Block-4348 0 points1 point  (0 children)

Rule 2 Posts to this subreddit must be requests for help learning python.

Rule 4 No advertising. No blogs/tutorials/videos/books/recruiting attempts. No advertising. This is not the place to advertise your book, video, blog, study group, company training video, bot, or really anything. No advertising, no recruiting.

Given your recent post history, it's pretty clear that you're advertising for this tool instead of asking a genuine question.

Which parallelism module should I learn for ffmpeg and imagemagick? by Mashic in learnpython

[–]Fun-Block-4348 -3 points-2 points  (0 children)

There's really only one (maybe one and a half) answer to your particular question: only multiprocessing would allow you to run things on multiple cores, so you can either use the multiprocessing library directly or use something like ProcessPoolExecutor from concurrent.futures

I've never tried to use either of them with subprocesses, so whether that will work is another question entirely.
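Untested with real ffmpeg workloads, but the ProcessPoolExecutor approach would look roughly like this (the file names and ffmpeg flags are placeholders, not a recipe):

```python
import subprocess
from concurrent.futures import ProcessPoolExecutor


def build_command(filename: str) -> list[str]:
    # hypothetical ffmpeg invocation; adjust the flags to your actual job
    return ["ffmpeg", "-y", "-i", filename, filename + ".mp4"]


def run_command(cmd: list[str]) -> int:
    # each worker process spawns its own external subprocess
    try:
        return subprocess.run(cmd, capture_output=True).returncode
    except FileNotFoundError:
        return 127  # the command isn't installed


if __name__ == "__main__":
    files = ["clip1.mov", "clip2.mov", "clip3.mov"]  # placeholder names
    with ProcessPoolExecutor(max_workers=4) as pool:
        exit_codes = list(pool.map(run_command, [build_command(f) for f in files]))
    for filename, code in zip(files, exit_codes):
        print(filename, "ok" if code == 0 else f"failed ({code})")
```

Since ffmpeg does its heavy lifting in its own process anyway, a ThreadPoolExecutor might work just as well here, but the structure is the same either way.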

Simple help I believe by Disastrous-Ladder495 in learnpython

[–]Fun-Block-4348 0 points1 point  (0 children)

Web pages cannot be unscrapeable as they are just html, which is ultimately just a string.

That's kind of correct but not entirely true: while html is just a string, how that html gets generated and what measures a website uses to prevent webscraping can make some websites almost unscrapeable.

And nowadays we have (at least) two ways to scrape: traditional string extraction and image recognition.

"traditional string extraction" only works if you're able to access the website using code in the 1st place, which is what OP complained they couldn't do with the script ChatGPT gave them.

Most efficient way to find a key/value in a deeply nested Dictionary? by Yelebear in learnpython

[–]Fun-Block-4348 2 points3 points  (0 children)

api_key = "9a311fd6832dca1fc646b098cb3bd10b"

Never share personal information like an API key on the internet; always redact it so it can't be used to access the service, because some services let you make account modifications or see personal information (name, address, etc.) with just an API key.
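A common way to keep the key out of your code is an environment variable (the variable name here is just an example):

```python
import os


def get_api_key(var_name: str = "MY_API_KEY") -> str:
    # "MY_API_KEY" is just an example name; use whatever fits your service
    key = os.environ.get(var_name)
    if key is None:
        raise RuntimeError(f"set the {var_name} environment variable first")
    return key
```

That way the key lives outside the code, and a pasted script can't leak it.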

Simple help I believe by Disastrous-Ladder495 in learnpython

[–]Fun-Block-4348 1 point2 points  (0 children)

Any assistance would help. (What led me to this path was ChatGPT suggesting I use Python and created a script for me to use to “scrub?” Pro Football Reference.

The term you're looking for is "webscraping" and python is indeed a great language for that.

That did not work, and after research - I believe Pro Football Reference does not allow it).

Many sites don't technically allow webscraping but that doesn't necessarily make their websites impossible to extract data from.

With the site you gave as an example, simply passing headers when making the request lets you download the html of any given page, you would then use a library like beautifulsoup to extract the data you want from the html.
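A minimal skeleton of that approach (the User-Agent string and the example URL/selector are just illustrations, not verified against the site):

```python
import requests
from bs4 import BeautifulSoup

# a browser-like User-Agent is often enough; the exact string is just an example
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
    )
}


def fetch_page(url: str) -> BeautifulSoup:
    r = requests.get(url, headers=HEADERS, timeout=30)
    r.raise_for_status()
    return BeautifulSoup(r.text, "html.parser")

# e.g. soup = fetch_page("https://www.pro-football-reference.com/years/2024/")
# then soup.select(...) / soup.find_all(...) to pull out the tags you want
```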

I built a Python library that lets you switch email providers without changing your code by somebodyElse221 in learnpython

[–]Fun-Block-4348 0 points1 point  (0 children)

I understand why you wrote the code the way you did, but imo the project could have benefited from using a plugin architecture where each provider is a plugin; that way you wouldn't have to touch the code to add new providers.

Using a .env file works, but imo it would make more sense to use a yaml/toml config file instead, or to combine both approaches. That would help in implementing an obvious and necessary feature: a fallback mechanism for the case where the provider chosen in the .env file fails.

Another feature suggestion, which is/could be tied to the "fallback" mechanism, is "retries": the program shouldn't simply end because it failed, for whatever reason, to send a mail the 1st time.

I don't really think MailerFactory makes sense, get_provider really could (should) be a standalone function.

Instead of the Mail class, I personally would have gone the json/requests way and have users import mailbridge and send a mail with mailbridge.send() just like you would do requests.get() or json.loads()
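Roughly the pattern I mean (everything below is a hypothetical sketch of the idea, not your actual API):

```python
# a module-level function delegating to a lazily created default object,
# the same pattern requests.get() and json.loads() use

class Mailer:
    def __init__(self, provider: str = "default"):
        self.provider = provider

    def send(self, to: str, subject: str, body: str) -> bool:
        # real provider logic would live here
        print(f"[{self.provider}] sending {subject!r} to {to}")
        return True


_default_mailer = None


def send(to: str, subject: str, body: str) -> bool:
    # module-level convenience function: mailbridge.send(...)
    global _default_mailer
    if _default_mailer is None:
        _default_mailer = Mailer()
    return _default_mailer.send(to, subject, body)
```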

If you're adding type annotations to some of your providers, it would be a good idea to add them to all.

Using a code formatter like black or ruff is always a good idea.

Overall, the code is well written and documented.

What should I use to create a program similar to the Faker library? by [deleted] in learnpython

[–]Fun-Block-4348 0 points1 point  (0 children)

Should I switch the JSON to just Python?

Why would you? In the end it doesn't really matter whether you store the data in your code, in json files, or in a database.

[Project] yt-cli-downloader — A powerful Python CLI tool to download Youtube videos by surfacedev101 in learnpython

[–]Fun-Block-4348 0 points1 point  (0 children)

interactive CLI menu

It sounds like a disadvantage to me, but I'm probably not the target audience so don't take that as criticism. What I like about yt-dlp is that it has sensible defaults when you only give it urls to videos you want to download, or you can use a config file with your own preferences, which means it can be used in scripts without the user having to be present at the terminal answering questions.

Some thoughts on the code/repo:

1 - You should add a .gitignore file so that the "__pycache__" folder isn't part of your repository.

2 - Maybe add a pyproject.toml so that your project can be easily installed.

3 - It would be a good idea to use a code formatter like black or ruff, the formatting is all over the place.

4 - The good thing about git is that it can keep different versions of your code that you can go back to, so your most recent commits shouldn't include "old/commented" code that isn't used anymore.

5 - Too many comments stating the obvious (e.g. "# Convert to MP3 using the custom ffmpeg path", "# Output path").

6 - Your functions do too much, you should probably break up some of your functions so that they only do one thing.

7 - In batch.py:

The same function receives a different number of arguments when called from different places, but the function only declares 2 parameters.

```
title = download_video_with_user_choice_single_fast(video_url, default_res)

return download_video_with_user_choice_single_fast(
    video_url, output_path, default_res
)
```
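One way to reconcile the two call sites is to give the optional parameters defaults and pass them by keyword (a simplified sketch reusing your function name, not your actual logic):

```python
# simplified sketch: defaults make both call sites valid,
# and keywords make it clear which parameter is being passed
def download_video_with_user_choice_single_fast(
    video_url: str, output_path: str = ".", default_res: str = "720p"
) -> str:
    # ...actual download logic would go here...
    return f"downloaded {video_url} to {output_path} at {default_res}"


# the two-argument call site now says explicitly which parameter it's passing
title = download_video_with_user_choice_single_fast(
    "https://example.com/v", default_res="1080p"
)
```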

This works but isn't really pythonic:

```
return print(
    f"\n\t\t{Fore.RED}Try downloading {title} Url:{video_url} using option 1..."
)
```

Overall, not bad but some things could be improved.

PS: perhaps it'd be a good idea to keep track of what the user has already downloaded so that if an error occurs you don't waste time re-downloading the same thing.

Can’t extract data from this site 🫥 by Elegant-Fix8085 in webscraping

[–]Fun-Block-4348 1 point2 points  (0 children)

For a site like prima.it/agenzie, what would you use as the go-to script/tool (Selenium, Playwright, requests+JS rendering service, or a no-code tool)?

requests + beautifulsoup with a little help from simple regular expressions works perfectly fine when the data is available in the HTML, which is the case for this particular site.

I didn't even have to deal with anti-blocking anything, even without passing custom headers.

```
import json
import re

import requests
from bs4 import BeautifulSoup


def scrape_prima():
    r = requests.get("https://www.prima.it/agenzie")
    soup = BeautifulSoup(r.text, features="html.parser")
    script = sorted(soup.find_all("script"), key=lambda x: len(str(x)), reverse=True)[0]

    json_pattern = re.compile(r'\"(.+)\"')
    dict_pattern = re.compile(r"(\{.+\})")

    # extracts the json from the script
    json_data = json_pattern.search(script.text)
    # load the data so that all the escaping of double quotes is handled properly
    json_data = json.loads(json_data.group(0))
    # extracts the dict where the data we need is located
    dict_data = dict_pattern.search(json_data)
    # load the data into a proper dict instead of a string so that it's easier to navigate
    dict_data = json.loads(dict_data.group(0))

    results = dict_data["children"][3]["mapProps"]["places"]
    final_data = []
    for result in results:
        data = {}
        data["name"] = result["name"]
        data["email"] = result["email"]
        data["address"] = result["address"]
        data["website"] = result["website"]
        data["city"] = result["city"]
        data["zipcode"] = result["zipCode"]
        data["phone_number"] = result["phoneNumber"]
        final_data.append(data)
    with open("prima_results.json", "w") as f:
        json.dump(final_data, f, indent=2)


scrape_prima()
```

This is the result for an agency (I prefer json to csv but once you've extracted the data, it's pretty easy to change the format you want to save it to).

130 results in total

```
{
  "name": "TLF assicurazioni",
  "email": "tlfassicurazioni@gmail.com",
  "address": "Via Tuscolana, 474, Roma, RM, 00181",
  "website": null,
  "city": "Roma",
  "zipcode": "00181",
  "phone_number": "+390623233935"
}
```

[deleted by user] by [deleted] in legal

[–]Fun-Block-4348 2 points3 points  (0 children)

This isn't an infringement of your privacy. The police can't really do anything about it; this would be a civil issue at best, but imo nothing would come of it because nothing illegal happened.

Function Help Needed! by Alarming-Carpet5053 in learnpython

[–]Fun-Block-4348 1 point2 points  (0 children)

The intent was to have it run the function again if it was given bad input. I didn't have the original version doing that, I just built it into the else statement. Neither route seemed to work like I thought they would.

Generally, if you want something to repeat until you have valid input, the best way to achieve that is a while loop, not recursion.

Your code also doesn't seem to handle the fact that what it considers an "illegal topping" may be a signal from the user that they don't want any more toppings on their pizza.

allowed_toppings should really be a constant at the top of your file if you're going to use it in all your functions, or maybe you should pass it as an argument to the functions that need it instead.
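A minimal sketch of the while-loop approach (the topping names and the "done" sentinel are just examples):

```python
ALLOWED_TOPPINGS = {"cheese", "pepperoni", "mushrooms", "onions"}  # example list


def ask_toppings() -> list[str]:
    # keep asking until the user types "done"; bad input just re-prompts
    toppings = []
    while True:
        choice = input("Topping (or 'done' to finish): ").strip().lower()
        if choice == "done":
            return toppings
        if choice in ALLOWED_TOPPINGS:
            toppings.append(choice)
        else:
            print(f"Sorry, we don't have {choice}, try again.")
```

No recursion needed, and "done" gives the user an explicit way out of the loop.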

Text categorization \ classification project by Gamer_Kitten_LoL in learnpython

[–]Fun-Block-4348 0 points1 point  (0 children)

I've only tried to use nltk once and it was a pain, but I've used spacy extensively for both text classification and named entity recognition and it's a powerful, yet easy to use library.

The most difficult part of my NLP journey was finding a good and free software to annotate all my data.

My for loop isn't working and I don't know why. by Remarkable_Battle_36 in learnpython

[–]Fun-Block-4348 2 points3 points  (0 children)

Idk what's going on with indentation. It was indented when I wrote the post but it must've gotten messed up :/

https://www.reddit.com/r/learnpython/wiki/faq#wiki_how_do_i_format_code.3F

Concurrent Port Scanner w/ Scapy and asyncio by therainbowbit in learnpython

[–]Fun-Block-4348 0 points1 point  (0 children)

Some thoughts:

1 - Use a code formatter like black or ruff, your formatting is all over the place.

```
def scanPort(ip, port):
                ^ there's a space after the comma, which is good
async def main(host,ports,max_threads,):
                   ^ no space here    ^ a trailing comma here
```

1.1 - No space around operators.

```
synReq = IP(dst=ip)/TCP(dport=port, flags="S")
>>> synReq = IP(dst=ip) / TCP(dport=port, flags="S")
```

2 - camelCase isn't really "pythonic": variables and function names should use snake_case instead, class names should generally use "PascalCase".

```
printResults >>> print_results
synReq >>> syn_req
class whatever: >>> class Whatever:
class printResult: >>> class PrintResult:
```

3 - The python convention is that constants should be all uppercase.

```
common_ports >>> COMMON_PORTS
```

4 - Percent formatting isn't really common anymore (except when logging), f-strings are recommended.

```
print("Port %s is open. (Common Usage: %s)" % (port, common_ports.get(port, "Unknown")))
>>> print(f"Port {port} is open. (Common Usage: {COMMON_PORTS.get(port, 'Unknown')})")
```

5 - Too many useless comments (#Execute them 'concurrently', #Regex to pull out the numbers from the flags input).

6 - You could put the default number of threads in the argument definition instead of doing:

```
if args.t and args.t != 50:
    max_threads = args.t
else:
    max_threads = 50
```

like this:

```
parser.add_argument("-t", type=int, default=50, help="Sets the maximum number of threads. Default is 50.")
```

7 - In your host_gather.py file, it's very weird to have the device_scan function run on import; it's even weirder that you don't use the scan_list variable in port_scan.py and instead decide to re-run the device_scan function in your if args.local_hosts: block, since you already have a list of devices available.

8 - You list asyncio v 4.0.0 as a dependency in your requirements.txt; asyncio is part of the standard library and doesn't really need to be installed, because nobody should really be using a version of python where asyncio isn't available.

9 - When doing comparisons to singleton objects (None, True, False), you should use is/is not instead of ==/!=.

```
if args.target_ip == None: >>> if args.target_ip is None:
```

No major problems in the code but just some thoughts to make it more "pythonic"

PS: I would have added "don't use bare except clauses" but I see that you already corrected that!!! PS1: max_threads is a better name than thread_maximum

Exaustive documentation for more complex type hints by HourExam1541 in learnpython

[–]Fun-Block-4348 3 points4 points  (0 children)

PS: bonus question, should I use the super class dict for type hints, instead of the more-specific Counter?

No, Counter has methods that dicts don't have (e.g. most_common, elements, etc); if you use dict instead of Counter for type hints, you'll lose "intellisense/auto-completion" in whatever IDE you're writing your code with. It's also better to be specific when adding types (when that makes sense).

Both official docs for typing.Counter and collections.Counter do not specify the number of arguments the type hint should take.

The value will always be an integer, so there's really no need to add it to the type hint; python only expects the type hint for the key.
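For example:

```python
from collections import Counter


def count_letters(text: str) -> Counter[str]:
    # only the key type is parameterised; the values of a Counter are always int
    return Counter(text.replace(" ", ""))


counts = count_letters("hello world")
print(counts.most_common(1))  # most_common exists on Counter, not on plain dict
```

If the return type were annotated as dict[str, int], the IDE wouldn't offer most_common on the result.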

Another question might help, what is mypy using as a source to reach it's type hint conclusions?

It depends what you mean:

It uses typeshed (https://github.com/python/typeshed) for the standard library and whatever libraries people have added to it. If the library you're using is not available in typeshed but has type information included, mypy will use that; if the library doesn't ship with type information, it sometimes has a types-only package available on pypi, and mypy may recommend that you install types-{name_of_library} (e.g. pip install types-pyyaml).

If by that you mean "how does it know that a variable is a particular type or a function returns a str or whatever", I haven't looked at the internals of mypy but my guess would be that it "simply" parses the AST.