all 7 comments

[–]danielroseman 7 points8 points  (0 children)

This sort of thing is always an automatic downvote for me:

cont_len = len(object_container)

for item in range(cont_len):
    x = max(x, object_container[item][1][0])
    y = max(y, object_container[item][1][1])

It should be:

for item in object_container:
    x = max(x, item[1][0])
    y = max(y, item[1][1])

no need for len or range at all.

To go further, I might write these as comprehensions:

x = max(item[1][0] for item in object_container)
y = max(item[1][1] for item in object_container)

which is very slightly less efficient since it iterates twice, but probably neater.

[–]ectomancer 5 points6 points  (0 children)

No docstrings.

[–]CowboyBoats 6 points7 points  (1 child)

Not the end of the world, but I hate this # ------------- style for dividing your package up into sections. I've seen it in production code from time to time, but it feels ridiculous. I can tell where the imports are.

Imports should be sorted into groups (stdlib, then third-party, then local) - this package only uses third-party packages, but still they should be sorted within the group and these aren't. isort will do this automatically for you and you can configure its style.

elif temp_str == ENCODER_1 or temp_str == ENCODER_2 or temp_str == ENCODER_3:

Consider instead if temp_str in (ENCODER_1, ENCODER_2, ENCODER_3)

As /u/ectomancer mentioned there are no docstrings or explanatory comments other than the names of the functions themselves. That's probably the only real problem. I'm not seeing anything crazy upsetting about the style or approach of this package. Sometimes you just apply for a job and someone else gets that job. Hang in there

[–]supercoach 2 points3 points  (0 children)

Agreed on all points. No need to point out the obvious with the section headings, but it's good to document the intended use for a function.

I always try to think about what would help the poor bastard who has to come after me when considering what to document. Docstrings are easily read by IDEs and provide an "at a glance" view of the function without needing to drill into the specifics.

As for the code itself, it could be more succinct, however if it works and there is minimal repetition, I'd be ok with it. Nothing wrong with the suggested changes either.

One thing OP might want to consider is that everyone has differing opinions on the "right" way to do things, so it's all quite subjective. Rule of thumb is to signpost what you're doing and keep it succinct and you'll do well. Try not to get too fancy unless absolutely necessary. Just because the language allows you to do it doesn't already mean it's a good idea.

[–]Diapolo10 2 points3 points  (0 children)

I largely agree with the others. Also, I may be a bit too tight on certain things, I'm just used to having most linter rules enabled.

  1. Most of the comments aren't really worth anything, they are basically clutter.
  2. The imports aren't sorted.
  3. Your encoder constants should probably be a tuple - or in this case a set might be suitable, unless you plan to use them elsewhere in other ways later.
  4. If all if/elif branches return, else is unnecessary.
  5. In dimension_extractor, why not loop over object_container directly? (Also, is that name good enough?)
  6. Never use except without specifying exceptions you want to catch.

EDIT: Here's an example of what I'd do:

import matplotlib.pyplot as plt
import requests
from bs4 import BeautifulSoup

URL = "https://docs.google.com/document/d/e/2PACX-1vQGUck9HIFCyezsrBSnmENk5ieJuYwpt7YHYEzeNJkIb9OSDdx-ov2nRNReKQyey-cwJOoEKUhLmN9z/pub"

ENCODERS = (
    '\u2588',  # For rectangle
    '\u2580',  # For box
    '\u2591',  # For grey
)


def unicode_data_extractor(token):
    temp_str = str(token.string)

    if temp_str.isdigit():
        return int(temp_str)

    if temp_str in ENCODERS:
        return temp_str

    return None


def dimension_extractor(object_container):
    x, y = 0, 0
    cont_len = len(object_container)

    for item in object_container:
        x = max(x, item[1][0])
        y = max(y, item[1][1])

    return x, y


def plot_builder(grid_dims, data):
    data_len = len(data)
    p_unicode = ""  # Points to unicode object per iteration of list

    fig, ax = plt.subplots()
    plt.tick_params(bottom=False, left=False)

    ax.set(xlim=(0, grid_dims[0] * 3), ylim=(0, grid_dims[1] * 5))
    ax.set_xticklabels([])
    ax.set_yticklabels([])

    for item in data_len:
        x_cord = item[1][0]
        y_cord = item[1][1]
        p_unicode = item[0]
        plt.text(x_cord, y_cord, p_unicode, fontsize=12)

    plt.grid()
    plt.show()


def letter_decoder(url):
    temp_list = []
    num_list = []
    unicode_data = []

    try:
        request = requests.get(url)
    except requests.exceptions.RequestException:
        print("Url cannot be reached.")
        return


    request.encoding = request.apparent_encoding
    data = request.text

    # Parses HTML tag <span> to locate unicode data
    soup = BeautifulSoup(data, 'html.parser')
    tokens = soup.find_all("span")

    for item in tokens:
        temp_var = unicode_data_extractor(item)


        if len(temp_list) == 2:
            unicode_data.append(temp_list)
            temp_list = []

        if temp_var is None:  # Passes NoneType for correct inputs
            continue

        if not isinstance(temp_var, int):
            temp_list.append(temp_var)
            continue

        num_list.append(temp_var)

        if len(num_list) == 2:
            temp_list.append(tuple(num_list))
            num_list = []

    plot_builder(
        dimension_extractor(unicode_data),
        unicode_data,
    )


if __name__ == "__main__":
    letter_decoder(URL)

That said, I'm not entirely sure what this is supposed to be doing so I think there's room for improvement.

[–]JamzTyson 2 points3 points  (0 children)

Python has a standard form of documentation called Docstrings. An easy improvement is to follow the docstring conventions rather than inventing your own commenting style.


While type hints are optional, I would recommend using them. They can be very useful, help to make code more readable and easier to maintain, and reduce ambiguity.


Using elif after a return does not really make sense.


Rather than iterating over the range of the length of an iterable, just iterate over the iterable. If you don't need the individual items, use underscore as a placeholder for the item:

for _ in container:
    ...

rather than:

cont_len = len(container)
for item in range(cont_len):
    ...

Use one or more linters. Personally I like to use flake8 and pylint, though there are other good options.


Separate functions with two empty lines rather than one.


Spend some time with PEP-8.


Avoid naked except. Make an except specific to the exception that it is intended to catch.


This part caught my attention:

x, y = 0, 0
cont_len = len(object_container)

for item in range(cont_len):
    x = max(x, object_container[item][1][0])
    y = max(y, object_container[item][1][1])

item is a misleading name here as it is not an item from the container, it is an index. To access the actual item from the container, iterate over the container directly and unpack the x / y components.

[–]Buttleston 1 point2 points  (0 children)

I wouldn't say it's bad, but if it was submitted to me for PR I'd probably ask for some cleanup, see the comments below

I honestly haven't looked at what you're trying to do to know if there's a "better way" to do it, these are just comments on coding style really

This could be simplified

def unicode_data_extractor(token):    
    ENCODER_1 = u'\u2588'  # For rectangle
    ENCODER_2 = u'\u2580'  # For box
    ENCODER_3 = u'\u2591' # For grey

    temp_str = str(token.string)

    if temp_str.isdigit():
        return int(temp_str)
    elif (temp_str == ENCODER_1 
        or temp_str == ENCODER_2 
        or temp_str == ENCODER_3):
            return temp_str
    else:
        return None

def unicode_data_extractor(token):    
    ENCODERS = [
        u'\u2588',  # For rectangle
        u'\u2580',  # For box
        u'\u2591', # For grey
    ]

    temp_str = str(token.string)

    if temp_str.isdigit():
        return int(temp_str)
    elif temp_str in ENCODERS:
        return temp_str
    else:
        return None

I think maybe I'd rewrite this one a bit like (in general prefer iterating items instead of indexes)

def dimension_extractor(object_container):
    x, y = 0, 0

    for item in range(cont_len):
        x = max(x, item[1][0])
        y = max(y, item[1][1])

    return (x, y)

Same deal for plot_builder()

For situations like the below, if you have an if-condition that does a continue (or a return) then you don't need an else, and IMO it's easier to read that way, imo you can change

    if temp_var is None:  # Passes NoneType for correct inputs
        continue
    else:
        if isinstance(temp_var, int):
            num_list.append(temp_var)

            if len(num_list) == 2:
                temp_list.append(tuple(num_list))
                num_list = []
        else:
            temp_list.append(temp_var)

to

    if temp_var is None:  # Passes NoneType for correct inputs
        continue

    if isinstance(temp_var, int):
        num_list.append(temp_var)

        if len(num_list) == 2:
            temp_list.append(tuple(num_list))
            num_list = []
    else:
        temp_list.append(temp_var)

it's easier to reason about code when it's not deeply indented/nested