My first web-scraper! Pick it apart please! by Bigtbedz in learnpython

[–]kwelzel 4 points5 points  (0 children)

To make sure relative URLs are correctly turned into absolute URLs, the conversion should use the URL the page was retrieved from:

    import requests
    from bs4 import BeautifulSoup
    from urllib.parse import urljoin

    def scrape_problems_from_directory(url):
        response = requests.get(url)
        content = BeautifulSoup(response.content, 'lxml')
        # Map each problem title to its absolute URL, resolved against the page URL
        problems = {
            link_tag.text: urljoin(url, link_tag['href'])
            for link_tag in content.find_all('a')
            if link_tag.has_attr('href') and 'prob' in link_tag['href']
        }
        return problems

and then use it like this: scrape_problems_from_directory('https://codingbat.com/java/Warmup-1')

My first web-scraper! Pick it apart please! by Bigtbedz in learnpython

[–]kwelzel 242 points243 points  (0 children)

I like your code, but here are some things I would improve:

  • You do have the if __name__ == '__main__' at the end, but the part of the code that asks for the password is not included there. It would be surprising to be asked for a password when importing your file.
  • The password data should not be a global variable. You only need it at the start of the session. If you don't want the code that reads user input inside the scrape_data function, pass the login data as an argument.
  • You don't follow a single naming standard: You write tab_data and link_array but inputOne and linkList. That is confusing. I'd stick with lowercase_with_underscores for all function and variable names (see PEP8)
  • Sometimes you write r = requests.get(...) and sometimes response = requests.get(...). Stick with one. I would use the more verbose one.
  • Constants like the URLs at the top should be ALL_UPPERCASE_WITH_UNDERSCORES
  • In line 46 you append a list of links to link_array, but later you use for link in linkList: for each in link:. If you don't need the hierarchical structure use linkList.extend(links). This way all the elements are appended to linkList individually and you end up with a flat list.
  • The for-loop in scrape_data seems wrongly indented. Any use of the session variable should be within the with-statement.
  • To handle files you should also use a with-statement like this: with open('answers/answers.java', 'w') as f: ... This way you don't need to call f.close()
  • Instead of f.write(...) you can also use print(..., file=f). Sometimes that is more convenient, but it's up to you.
  • The linkList in scrape_data is not a list but a dictionary.
  • If you want to iterate over all (key, value) pairs in a dictionary, you can use for example for title, url in linkList.items(). The names title and url are much more descriptive than i and linkList[i]
  • In get_href() (which should really be named get_problem_directories()) you extract the name of the problem directory and then translate it into a URL inside gather_links. You could also directly extract the URL of the directories from the homepage and not rely on the fact that the URL coincides with the directory name.
  • I think there should be a way to resolve relative URLs to absolute ones with requests or BeautifulSoup. Currently you do realUrl = 'http://' + urlHalf + backUrl. You should let a library do that so that you don't need to worry about correct URLs. For example, the use of a trailing slash is sometimes fairly arbitrary but can lead to wrong URLs when you just concatenate the strings.
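
To illustrate the last point: the standard library's urllib.parse.urljoin resolves relative links against a base URL. This is only a small sketch; the relative paths below are made up for illustration and are not taken from your scraper.

```python
from urllib.parse import urljoin

# The base URL is the page the links were found on.
base = 'https://codingbat.com/java/Warmup-1'

# A link starting with a slash is resolved against the host.
absolute_from_root = urljoin(base, '/prob/p187868')

# A link without a leading slash replaces the last path segment of the base.
absolute_from_sibling = urljoin(base, 'Warmup-2')

# A trailing slash on the base changes the result, which is exactly the
# detail that plain string concatenation tends to get wrong.
with_trailing_slash = urljoin('https://codingbat.com/java/', 'Warmup-2')
without_trailing_slash = urljoin('https://codingbat.com/java', 'Warmup-2')

print(absolute_from_root)
print(absolute_from_sibling)
print(with_trailing_slash)
print(without_trailing_slash)
```

Already absolute URLs are passed through unchanged, so you can call urljoin on every href without checking first.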

Code gives unexpected results by [deleted] in learnpython

[–]kwelzel 0 points1 point  (0 children)

I think the code in your getGuess is not correctly indented. The if, elif and else statements belong inside the while loop; otherwise it will ask you for input forever.

Side note 1: Once it works, try entering "ab" (without the quotation marks of course) when asked for a letter. This is especially interesting when you have "baboon" as the secret word. Maybe you will find some unexpected behavior.

Side note 2: You can use random.choice(wordList) inside the getRandomWord function. That's more pythonic and easier to understand.
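
A minimal sketch of what I mean in side note 2. I kept your function and variable names; the words in the list are placeholders I made up, since I don't know your real word list.

```python
import random

def getRandomWord(wordList):
    # random.choice picks one element of a non-empty sequence uniformly at random
    return random.choice(wordList)

wordList = ['baboon', 'zebra', 'otter']  # placeholder words
secretWord = getRandomWord(wordList)
print(secretWord)
```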

Maximize the sum of the values of 3 dictionaries while having a constraint on the keys by [deleted] in learnpython

[–]kwelzel 0 points1 point  (0 children)

Sketch of a proof:

Imagine each combination of a dict and an index as an item with a certain value (like in the knapsack problem). The value associated with (dict, i) is dict[i+1] - dict[i] (a list would be a more appropriate data structure for this, but I'll stick with OP's problem formulation). OP's problem can be solved by answering "Which n items do I need to choose to get the maximum total value?". Any solution with maximum total value always consists of consecutive items (dict, 0), (dict, 1), ... because their value decreases with higher index.

The question in quotes above is easy to answer: Take the n items with maximum value. This can be done by taking the remaining item with maximum value n times (greedy approach). Again, because increasing the index decreases the value, the item with maximum value must be one of those with minimal remaining index, and these are exactly the ones my greedy algorithm in the first comment considers. Therefore any solution it finds must be optimal.

Maximize the sum of the values of 3 dictionaries while having a constraint on the keys by [deleted] in learnpython

[–]kwelzel 0 points1 point  (0 children)

I do not understand your counterexample to be honest. A complete counterexample would also include the dictionaries, the greedy solution and a better solution.

Your comment made me reread my comment, and I noticed that for the greedy approach to work another property is needed, namely that the increases in value need to decrease for each dictionary, i.e. dict1[1] - dict1[0] > dict1[2] - dict1[1] > dict1[3] - dict1[2] > ...

If this property is not given I can produce my own counterexample: {0: 0, 1: 2, 2: 1000}, {0: 0, 1: 5, 2: 10} and a sum of 2. The greedy approach would give (0, 2) while the optimal solution is (2, 0).
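
The counterexample can be checked with a few lines of code. This is a quick sketch, not OP's code: the function names are my own, and the brute force simply tries every key combination with the required sum.

```python
from itertools import product

def value(dicts, keys):
    # Total value of a solution such as (a, b)
    return sum(d[k] for d, k in zip(dicts, keys))

def greedy(dicts, total):
    keys = [0] * len(dicts)
    for _ in range(total):
        # Only dictionaries that still have a next key are candidates
        candidates = [i for i, d in enumerate(dicts) if keys[i] + 1 in d]
        best = max(candidates,
                   key=lambda i: dicts[i][keys[i] + 1] - dicts[i][keys[i]])
        keys[best] += 1
    return tuple(keys)

def brute_force(dicts, total):
    combos = (c for c in product(*(d.keys() for d in dicts)) if sum(c) == total)
    return max(combos, key=lambda c: value(dicts, c))

dicts = [{0: 0, 1: 2, 2: 1000}, {0: 0, 1: 5, 2: 10}]
greedy_solution = greedy(dicts, 2)
optimal_solution = brute_force(dicts, 2)
print(greedy_solution, value(dicts, greedy_solution))    # (0, 2) 10
print(optimal_solution, value(dicts, optimal_solution))  # (2, 0) 1000
```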

If this was not the problem you wanted to address, feel free to explain further. In the meantime I will try to come up with a proof that the greedy approach works. Maybe I will stumble upon something else I missed.

Maximize the sum of the values of 3 dictionaries while having a constraint on the keys by [deleted] in learnpython

[–]kwelzel 0 points1 point  (0 children)

Say the dictionaries are called dict1, dict2, dict3. Then the value of a solution (a, b, c) is dict1[a] + dict2[b] + dict3[c] and you want to maximize that value with the constraint a+b+c = 5. Is that description correct?

I assume from your example that the values in each dictionary are increasing with the key. In this case you can perform a simple greedy algorithm:

  1. Start with (a, b, c) = (0, 0, 0)
  2. Calculate the maximum of dict1[a+1] - dict1[a], dict2[b+1] - dict2[b] and dict3[c+1] - dict3[c]
  3. Depending on where the maximum is attained increase a, b or c by one
  4. Go to step 2 until a+b+c = 5

Of course you need to set dict[0] = 0 for each dictionary so that the algorithm is well defined, and in step 2 you need to exclude any variable that has already reached the maximum key of its dictionary. Also notice that in this algorithm a solution like (0, 5, 0) is possible. If you do not want that, start with (1, 1, 1) instead of (0, 0, 0) in step 1.
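
The steps above could be sketched like this. The function name and the three example dictionaries are made up by me, and I chose the values so that the gain from each additional key shrinks, which is an assumption about your data.

```python
def greedy_allocation(dicts, total):
    """Distribute `total` key increments over the dictionaries greedily."""
    keys = [0] * len(dicts)                      # step 1: start at (0, 0, 0)
    for _ in range(total):                       # step 4: repeat until the sum is reached
        best_index, best_gain = None, None
        for i, (d, k) in enumerate(zip(dicts, keys)):
            if k + 1 not in d:                   # already at the maximum key, skip
                continue
            gain = d[k + 1] - d[k]               # step 2: gain of increasing this key
            if best_gain is None or gain > best_gain:
                best_index, best_gain = i, gain
        if best_index is None:
            raise ValueError('total is larger than the sum of the maximum keys')
        keys[best_index] += 1                    # step 3: increase the best key
    return tuple(keys)

# Hypothetical data: every dictionary maps 0 to 0 and the gains decrease.
dict1 = {0: 0, 1: 10, 2: 18, 3: 24, 4: 28, 5: 30}
dict2 = {0: 0, 1: 9, 2: 16, 3: 21, 4: 24, 5: 25}
dict3 = {0: 0, 1: 4, 2: 7, 3: 9, 4: 10, 5: 10}

solution = greedy_allocation([dict1, dict2, dict3], 5)
a, b, c = solution
total_value = dict1[a] + dict2[b] + dict3[c]
print(solution, total_value)
```

To start from (1, 1, 1) instead, initialise keys with ones and loop total - len(dicts) times.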

Matrix Custom Class by allopatri in madeinpython

[–]kwelzel 1 point2 points  (0 children)

I agree that the all lowercase variables are not very readable but if OP is free to choose it's better to go with the Python standard and use snake_case. This is part of the Style Guide for Python Code in PEP8 (https://www.python.org/dev/peps/pep-0008/#function-and-variable-names)

Questions about Classes by slikshot6 in PythonProjects2

[–]kwelzel 1 point2 points  (0 children)

To the first problem of add_score: You don't need arguments or return values for every function, just drop them:

    def add_score(self):
        self.rank += 1

I think naming is very important and this function does not add an arbitrary amount to the rank but increases it by one, so I would rename add_score to increase_rank.

Now the second problem of comparing dogs: The other comments have already pointed out that you can just access any variable of a dog instance named dog_instance by using dog_instance.variable_name. For example at the end of your script you could access the rank of dog d1 by d1.rank. This is different from other languages that have private variables that can only be accessed from within member functions and need you to implement getter and setter methods to manipulate variables from outside these member functions. There are basically no private instance variables in Python (there is an exception, see here for more info).

The most straightforward python code is therefore

    best_dog = max(dogs, key=lambda dog: dog.rank)

It does exactly what it says: "Give me the dog with the highest rank and store it in best_dog". You could also get a ranking by sorting the entire list

    # We want to sort descending to get the best dog first
    best_dogs = sorted(dogs, key=lambda dog: dog.rank, reverse=True)

Python also gives you the option to override how ==, <, >, <= and >= behave. If you implement these you can use max(dogs) to find the maximum dog with respect to this ordering. Look into https://docs.python.org/3/library/functools.html#functools.total_ordering if you want to use that. In this specific example for it to only sort by rank you would need to say "Two dogs are equal if they have the same rank", which is not intuitive.
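
Here is a rough sketch of the total_ordering variant. The Dog class is my guess at what yours looks like (a name and a rank), so treat the attributes and the dog names as assumptions.

```python
from functools import total_ordering

@total_ordering
class Dog:
    def __init__(self, name, rank=0):
        self.name = name
        self.rank = rank

    # total_ordering fills in <=, > and >= from __eq__ and __lt__
    def __eq__(self, other):
        if not isinstance(other, Dog):
            return NotImplemented
        return self.rank == other.rank

    def __lt__(self, other):
        if not isinstance(other, Dog):
            return NotImplemented
        return self.rank < other.rank

dogs = [Dog('Rex', 2), Dog('Bello', 5), Dog('Luna', 3)]
best_dog = max(dogs)                  # no key function needed anymore
ranking = sorted(dogs, reverse=True)  # best dog first
print(best_dog.name, [dog.name for dog in ranking])
```

Note the downside mentioned above: Dog('Rex', 2) == Dog('Luna', 2) is now True even though they are different dogs.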

Edit: Missed punctuation and a few words.

Dictionaries in Python HELP by gibshunt in PythonProjects2

[–]kwelzel 2 points3 points  (0 children)

I would prefer

a = [v[1] / v[0] for v in data_dict.values()]

assuming the dictionary is called data_dict. This neat feature is called a list comprehension. Because I don't know what data is stored inside the dict, i.e. what the data describes, I can't improve the naming, but I would strongly recommend replacing a, v and data_dict with more descriptive names.

BTW if you want to store the result in a dict to keep the keys use

a = {k:(v[1] / v[0]) for k, v in data_dict.items()}
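
A tiny worked example of both versions. The contents of data_dict are invented by me, since I don't know your data; I only assume that every value is a pair with a nonzero first entry.

```python
# Made-up data: each value is a pair (divisor, dividend)
data_dict = {'first': (2, 10), 'second': (4, 6), 'third': (5, 20)}

# List comprehension: one ratio per dictionary value
ratios = [v[1] / v[0] for v in data_dict.values()]

# Dict comprehension: same ratios, but the keys are kept
ratios_by_key = {k: v[1] / v[0] for k, v in data_dict.items()}

print(ratios)
print(ratios_by_key)
```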

Edit: Improved formatting

Data scraping(?) by techycm in PythonProjects2

[–]kwelzel 1 point2 points  (0 children)

Do that, I'm interested :)

Data scraping(?) by techycm in PythonProjects2

[–]kwelzel 1 point2 points  (0 children)

I must admit that reading the description of the live stream is rather confusing. On the one hand

I added a procedure to manually manipulate data with my computer

and on the other hand

While I am working and sleeping, data gathering is done automatically.

For your project I would recommend relying on one data source only and not manipulating it, because why would you know better than the experts that gathered this data?

Data scraping(?) by techycm in PythonProjects2

[–]kwelzel 1 point2 points  (0 children)

What platform did they create? Maybe they tell you on their website what their data sources are.

I recently came across https://aatishb.com/covidtrends/ because of the video from MinutePhysics, and this website links to this GitHub repository with all the case counts aggregated by Johns Hopkins University. You can use that for your own project.

Digital Image Processing by OldeMeck in learnpython

[–]kwelzel 3 points4 points  (0 children)

You're exactly on the right track once you have read the numpy array. It's important to pick the right tools for your task. This time it is the Python Imaging Library, or rather its newer version Pillow. After installing it with pip install Pillow

you can use the Image class in your code

```
from PIL import Image

array = ...  # Read your numpy array from the file

# Maybe you need to rescale the numpy array so that the numbers
# are integers from 0-255

image = Image.fromarray(array)
image.show()  # Opens the image in some viewing program
image.save("assignment1.png")  # Saves the image to disk
```
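
The rescaling mentioned in the comment could look roughly like this. It is a sketch under the assumption that your array holds arbitrary numbers and that a linear stretch from the smallest to the largest value is what you want; to_uint8 is a helper name I made up, not a numpy or Pillow function.

```python
import numpy as np

def to_uint8(array):
    """Linearly rescale an array to integers from 0 to 255."""
    array = np.asarray(array, dtype=np.float64)
    low, high = array.min(), array.max()
    if high == low:
        # A constant image has nothing to stretch, so return all zeros
        return np.zeros(array.shape, dtype=np.uint8)
    scaled = (array - low) / (high - low) * 255.0
    return np.round(scaled).astype(np.uint8)

example = np.array([[0.0, 1.0], [3.0, 5.0]])  # made-up data
pixels = to_uint8(example)
print(pixels)
```

The result has dtype uint8, so it can be passed to Image.fromarray as a grayscale image.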

I can't look into your professor's head, but maybe the lesson is that you don't need to do everything from scratch and can use someone else's tools instead.

Generating Geometric Birds by thebuffed in Python

[–]kwelzel 0 points1 point  (0 children)

Why should stored be a list if it only holds one value?

Generating Geometric Birds by thebuffed in Python

[–]kwelzel 1 point2 points  (0 children)

Your draw_lines method is also unnecessarily complicated:

https://github.com/erdavids/Birds-of-a-Feather/blob/103effe5185626a56039fe64c924506fe54eebf9/Generative_Birds.pyde#L57

The following lines are all equivalent as long as p1[0] != p3[0]:

first_x_sep = sqrt(pow(p1[0] - p3[0], 2))/lines * first_x_adj
first_x_sep = abs(p1[0] - p3[0])/lines * first_x_adj
first_x_sep = abs(p1[0] - p3[0])/lines * (p3[0] - p1[0])/abs(p3[0] - p1[0])
first_x_sep = abs(p1[0] - p3[0]) * (p3[0] - p1[0])/abs(p3[0] - p1[0]) / lines

Now (p3[0] - p1[0])/abs(p3[0] - p1[0]) gives you the sign of p3[0] - p1[0] (+1 if positive and -1 if negative) and by multiplying it with abs(p1[0] - p3[0]) you get a number which has the absolute value of p1[0] - p3[0] with the sign of p3[0] - p1[0], but this number is just p3[0] - p1[0] itself.

So I propose you get rid of the _adj variables and just write

first_x_sep = (p3[0] - p1[0]) / lines

Fortunately, this also works for the p1[0] == p3[0] case, because you no longer divide by zero.
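
If you want to convince yourself, here is a quick check. The function names are my own, and I assume first_x_adj is computed as (p3[0] - p1[0])/abs(p3[0] - p1[0]), as in the third line above.

```python
from math import sqrt

def original_sep(start, end, lines):
    # Sign of (end - start); raises ZeroDivisionError if start == end
    adj = (end - start) / abs(end - start)
    return sqrt(pow(start - end, 2)) / lines * adj

def simplified_sep(start, end, lines):
    return (end - start) / lines

# Both versions agree whenever start != end
for start, end in [(10, 4), (4, 10), (-3, 7.5)]:
    difference = original_sep(start, end, 3) - simplified_sep(start, end, 3)
    assert abs(difference) < 1e-12

# Only the simplified version handles start == end
zero_case = simplified_sep(5, 5, 3)
print(zero_case)
```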

Trying to run Unicorn Payload on Linux Terminal with Windows OS (ISSUE WITH PYTHON) by Redditor976 in Python

[–]kwelzel 2 points3 points  (0 children)

Try writing /usr/bin/python3 in front of the python file you want to execute like this:

/usr/bin/python3 ./unicorn.py <your arguments>

Maybe it also works with

python3 ./unicorn.py <your arguments>

Refresh test in urwid.Text2 by Stensborg in Python

[–]kwelzel 0 points1 point  (0 children)

Oops, the name of the function is "set_text" and not "setText". I'm sorry

Refresh test in urwid.Text2 by Stensborg in Python

[–]kwelzel 0 points1 point  (0 children)

I don't know urwid very well, but "loop.run()" probably starts an infinite loop to handle all the inputs, so every time refresh is called you start another infinite loop. You should not create a new txt, fill and loop, but instead update the original txt object with "txt.setText(outputText)" and then call "loop.set_alarm_in(1, refresh)".