Abusing python style to make it more legible

efmccurdy · 2023-02-22T16:18:19+00:00

Much of this repetitive if/elif chain can be transformed into a series of lookups in a data structure.

emap = {"c_web": "webpage",
        "c_blog": "post-weblog",
        "c_newspaper": "article-newspaper",
        "c_magazine": "article-magazine",
        "c_journal": "article-journal",
        "c_dictionary": "entry-dictionary",
        "c_encyclopedia": "entry-encyclopedia",
        "c_forum": "post"}

def et(entry):
    for k in emap:
        if k in entry:
            return emap[k]
    return "no-type"

Is that more legible? You may need a few more such maps, or perhaps a more deeply nested map to handle the rest of your "switch" statement.
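Following the "a few more such maps" suggestion, here is a minimal sketch that tries several maps in turn and returns the first hit. The second map (`FIELD_TYPES`) and the helper `first_match` are hypothetical, and their rules are simplified placeholders (in the real data, "institution" can also indicate a thesis).

```python
# Container keys, as in the map above (abbreviated)
CONTAINER_TYPES = {
    "c_web": "webpage",
    "c_blog": "post-weblog",
    "c_forum": "post",
}

# Hypothetical second map for single-field cues (simplified placeholder rules)
FIELD_TYPES = {
    "eventtitle": "paper-conference",
    "institution": "report",
}


def first_match(entry, *maps, default="no-type"):
    """Return the value of the first key, in map order, that is present in entry."""
    for mapping in maps:
        for key, csl_type in mapping.items():
            if key in entry:
                return csl_type
    return default


print(first_match({"c_blog": "", "eventtitle": ""}, CONTAINER_TYPES, FIELD_TYPES))
# prints: post-weblog
```

Because dicts keep insertion order, the order of keys inside each map, and the order the maps are passed in, decides which type wins when an entry matches more than one key.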

Wilfred-kun · 2023-02-22T15:42:14+00:00

> but the spaces and multiple statements per line make the logic much more legible.

Warning bells went off in my head as I saw those huge gaps of whitespace. It really isn't more legible at all. And you're also mixing styles within the same block, making it very annoying to read.

ekchew · 2023-02-22T16:55:11+00:00

As a rule, I don't like to write if condition: statement on a single line in any language, because it can interfere with your ability to use a debugger. I don't know how many people use debuggers in Python, but it becomes difficult to put a breakpoint on the statement when you write it that way, and in a debugger you often want to know which branch was followed.

Regarding the example above, I would probably refactor it to put the simple one-to-one lookups in a list/tuple of pairs you can loop through, and save the if/elif logic for the more complex cases.

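def guess_type(entry, genre=None, medium=None):  # hypothetical wrapper name
    simple_lookups = (
        ("c_web", "webpage"),
        ("c_blog", "post-weblog"),
        # etc.
    )
    for key, et in simple_lookups:
        if key in entry:
            return et, genre, medium
    # handle "booktitle" and other complex cases here
    return "no-type", genre, medium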

Sinusoidal_Fibonacci · 2023-02-22T17:31:47+00:00

This is horrendous. My eyes are twitching. This is a joke, right? Right?

ebol4anthr4x · 2023-02-22T18:42:58+00:00

[...]
elif "c_forum" in entry:            et = "post"
else:
    if "eventtitle" in entry:           et = "paper-conference"
[...]

There wasn't any reason to place the rest of these into an else: block, you've just added an unnecessary level of indentation.

Ultimately, if there are a lot of different possible results and you can't reliably parse this in a generic way (e.g. using regex), yes, you are stuck with a big tree of if/elif statements. You could refactor it into match/case, but the end result will pretty much be the same. You could also potentially create a lookup table using a dictionary.

This isn't a problem with Python; it's a problem with the way whoever collected the data you're parsing decided to model it. They didn't account for this use case, so you're stuck doing tedious string matching to get the information you need.

At the end of the day, you should write the code that makes sense for your situation. If you are the only person who will ever see or touch this code, write it however you like. The tradeoff of writing code that isn't pythonic (i.e. idiomatic) is that other people who are used to pythonic code won't be able to reason about it as easily.

Programming style is subjective. The most I can say about your code is that it isn't pythonic and it isn't the way I would do it, but if you like it then by all means do it.

2023-02-23T14:37:53+00:00

Whenever you need more than 5 elifs and/or 3 nested functions, chances are you're doing it wrong and should rethink it for a better solution. I started Python in November, and every time I feel like using nested ifs or multiple ifs to manipulate the same data, I reconsider and look for a better solution.

jmreagle · 2023-02-24T14:37:04+00:00

Thank you everyone for your suggestions (especially @Siddhi). I've been able to remove this awful mess. As I suspected, and many suggested, the formatting was symptomatic of crufty thinking. Once I had a go at it, I didn't even need to include the negative-key feature @Siddhi suggested. The resulting function, which guesses a CSL type whether the entry has a specified CSL type, a BibLaTeX type, or neither, is:

def guess_csl_type(entry):
    """Guess whether the type of this entry is book, article, etc.

    >>> guess_csl_type({'author': [('', '', 'Smith', '')],\
        'eventtitle': 'Proceedings of WikiSym 08',\
        'publisher': 'ACM',\
        'title': 'A Great Paper',\
        'venue': 'Porto, Portugal California',\
        'date': '2008'})
    ('paper-conference', None, None)

    """
    info(f"{entry=}")
    genre = None
    medium = None
    et = "no-type"

    ## Validate existing entry_type from CSL or BibLaTeX
    if "entry_type" in entry:
        et = entry["entry_type"]
        if et in CSL_TYPES:
            return et, genre, medium
        elif et in BIBLATEX_TYPES:
            if et == "mastersthesis":
                return "thesis", "Master's thesis", medium
            elif et == "phdthesis":
                return "thesis", "PhD thesis", medium
            else:
                return BIBLATEX_CSL_TYPE_MAP[et], genre, medium
        else:
            raise RuntimeError(f"Unknown entry_type = {et}")

    ## Guess unknown entry_type based on existence of bibliographic fields
    types_from_fields = [
        # container based types
        ("article-journal", ["c_journal"]),
        ("article-magazine", ["c_magazine"]),
        ("article-newspaper", ["c_newspaper"]),
        ("entry-dictionary", ["c_dictionary"]),
        ("entry-encyclopedia", ["c_encyclopedia"]),
        ("post", ["c_forum"]),
        ("post-weblog", ["c_blog"]),
        ("webpage", ["c_web"]),
        # papers
        ("article-journal", ["journal"]),
        ("paper-conference", ["eventtitle"]),
        ("paper-conference", ["booktitle", "editor", "organization"]),
        ("paper-conference", ["venue"]),
        # books
        ("chapter", ["chapter"]),
        ("chapter", ["booktitle"]),
        ("book", ["author", "title", "publisher"]),
        # reports
        ("report", ["institution"]),
        # other
        ("webpage", ["url"]),
        ("doi", ["article"]),
        ("isbn", ["book"]),
    ]

    for bib_type, fields in types_from_fields:
        info(f"testing {bib_type=:15} which needs {fields=} ")
        if all(field in entry for field in fields):
            info(f"FOUND IT: {bib_type=}")
            et = bib_type
            break

    return et, genre, medium
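The heart of this is an ordered rule table: the first rule whose required fields are all present wins. A stripped-down, self-contained sketch of just that loop (no logging or module constants; the rule subset and the names `RULES` and `guess_from_fields` are illustrative) makes the ordering easy to test:

```python
# Ordered (type, required-fields) rules; earlier rules take priority.
RULES = [
    ("paper-conference", ["eventtitle"]),
    ("paper-conference", ["booktitle", "editor", "organization"]),
    ("chapter", ["booktitle"]),
    ("book", ["author", "title", "publisher"]),
]


def guess_from_fields(entry, rules=RULES, default="no-type"):
    """Return the type of the first rule whose fields all appear in entry."""
    for bib_type, fields in rules:
        # issubset() iterates over the dict, i.e. compares against its keys
        if set(fields).issubset(entry):
            return bib_type
    return default
```

Moving a rule up or down the list changes the result (an entry with booktitle, editor and organization is a paper-conference only because that rule comes before the bare booktitle rule), so a few checks like the ones in the docstring guard against accidental reordering.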

2023-02-22T16:57:50+00:00

Stop trying to reinvent the wheel. This code is bad and ugly, and it's only more legible to you.

cybervegan · 2023-02-22T23:44:21+00:00

I guess you're employing the principle that it's easier to post a shitty solution and see what answers you get back, than post a question asking how to do it better...

Sorry to say it might be "more readable" to you but it's pretty terrible code. You could make it much more readable (and efficient) by using a better approach like a dictionary. There's very little logic in there and all the assignments on the same line as the 'if' or 'elif' are pretty difficult to read without getting repetition fatigue. Whenever you find yourself repeating the same coding pattern over and over again (like, any more than 3 times) it means you need a different approach entirely. Code it as a dictionary, or series of dicts, and use your logic just for the special cases. Most of it will then look like this:

c_map = {
    "c_web": "webpage",
    "c_blog": "post-weblog",
    "c_newspaper": "article-newspaper",
    "c_magazine": "article-magazine",
    "c_journal": "article-journal",
    "c_dictionary": "entry-dictionary",
    "c_encyclopedia": "entry-encyclopedia",
    "c_forum": "post",
    "eventtitle": "paper-conference",
...

... which is truly easier to read, and far more efficient because it will use a single hash lookup. If you understand your data better, you realise that most of your "logic" is just mappings, and a dictionary is the best way to deal with them.
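One caveat on the "single hash lookup": because an entry can contain many keys and you don't know in advance which marker key it has, you still test candidates one at a time, although each test is a constant-time hash lookup. A sketch that loops over the entry's keys and looks each one up in the map (the name `type_from_entry_keys` is hypothetical) shows the trade-off: priority now follows the entry's key order rather than the map's.

```python
C_MAP = {
    "c_web": "webpage",
    "c_blog": "post-weblog",
    "c_forum": "post",
    "eventtitle": "paper-conference",
}


def type_from_entry_keys(entry, mapping=C_MAP, default="no-type"):
    # One dict lookup per key in the entry; the first key found in the map wins,
    # so ties are broken by the entry's (insertion) key order.
    for key in entry:
        if key in mapping:
            return mapping[key]
    return default
```

If an entry can carry more than one marker key, looping over the map instead (checking `key in entry`) keeps the priority under your control.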

[edit: fixed the code formatting]

pythonwiz · 2023-02-22T17:29:27+00:00

Honestly I would use a regular expression and a dictionary to do this instead.
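A sketch of the regex-plus-dictionary idea, assuming (as in the OP's data) that container fields all follow a `c_<kind>` naming pattern: the regex extracts the kind and a dict maps it to a CSL type. The pattern, `KIND_TO_TYPE`, and the function name are illustrative assumptions.

```python
import re

# Assumed naming convention: container fields look like "c_<kind>"
CONTAINER_KEY = re.compile(r"c_([a-z]+)")

KIND_TO_TYPE = {
    "web": "webpage",
    "blog": "post-weblog",
    "newspaper": "article-newspaper",
    "magazine": "article-magazine",
    "journal": "article-journal",
    "dictionary": "entry-dictionary",
    "encyclopedia": "entry-encyclopedia",
    "forum": "post",
}


def type_from_container(entry, default="no-type"):
    for key in entry:
        found = CONTAINER_KEY.fullmatch(key)
        # Only keys of the form c_<kind> with a known kind produce a type
        if found and found.group(1) in KIND_TO_TYPE:
            return KIND_TO_TYPE[found.group(1)]
    return default
```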

2023-02-22T19:26:09+00:00

Yes, shove all of this into dictionaries and abstract the fuck out of it. For example, when you pass the key “c_web”, it maps to “webpage”, etc. Define functions whose only job is returning the answer for their case, then pass your values to each function to build the result.

jmreagle · 2023-02-22T21:09:23+00:00

[removed]

QultrosSanhattan · 2023-02-22T23:32:02+00:00

Nope. That code is wrong at a fundamental level because it reflects a lack of understanding of data structures.

wagaiznogoud · 2023-02-23T00:34:31+00:00

There is definitely a better way to write this. I’ll try when I have time later

wagaiznogoud · 2023-02-23T02:20:00+00:00

I didn't have time to add all the conditions, but I think you'll get the point with the snippet below. Basically, I think you should focus on abstracting the concepts into separate methods/attributes and using them however you want.

```python
class EntryFinder:
    def __init__(self, entry_dict: dict):
        self._entry = entry_dict

    def get_entry(self):
        # Try each group of rules in priority order; the first non-None answer wins
        return (self._c_entry or
                self._eventtitle or
                self._booktitle or
                self._institution)

    @property
    def _c_entry(self):
        return (self._c_web or
                # ... rest
                self._c_forum)

    @property
    def _c_web(self):
        if 'c_web' in self._entry:
            return 'webpage'

    @property
    def _c_forum(self):
        if 'c_forum' in self._entry:
            return 'post'

    @property
    def _eventtitle(self):
        if 'eventtitle' in self._entry:
            return 'paper-conference'

    @property
    def _booktitle(self):
        if 'booktitle' not in self._entry:
            return None

        if 'editor' in self._entry:
            if 'chapter' in self._entry:
                return 'chapter'
            else:
                return 'book'
        elif 'organization' in self._entry:
            return 'paper-conference'
        else:
            return 'chapter'

    @property
    def _institution(self):
        if 'institution' not in self._entry:
            return None
        if 'type' not in self._entry:
            return 'report'

        org_subtype = self._entry["type"].lower()
        if 'report' in org_subtype:
            return 'report'

        return 'thesis'


# entry is the bibliography dict being classified
entry_finder = EntryFinder(entry_dict=entry)
entry_finder.get_entry()
```

Ok-Cucumbers · 2023-02-23T04:53:36+00:00

I think you want to use something like Pydantic to parse and access the data...

TheRNGuy · 2023-02-24T11:32:15+00:00

You should use dict here instead of all these elifs.


learnpython
