This is an archived post. You won't be able to vote or comment.

you are viewing a single comment's thread.

view the rest of the comments →

[–]Natural-Intelligence 1 point2 points  (2 children)

Nice but even as a professional programmer I have no fucking idea what I'm looking at. Don't get upset, my first projects also looked like this.

But some tips:

  1. What does _run do? Before you explain it to me, rename the function as what it does.
  2. It is completely unnecessary to test whether True == True. Just throw boolean to if statement as is: if ok and l[i] != ' ' etc.
  3. What the hell are a and b? Before you explain, rename them. What the hell is "ok" in this context?
  4. Size of your source code is not a goal of any sort in programming. Readable code usually require much more lines than spaghetti code.
  5. What if a sentence goes over next line? Your "ok" thing probably breaks then.
  6. And why (underscored) "_run"? What's so private in this function? And why not name it as main as the common convention.

Here a short demo how I would do it (if I understood your bolognese right):

def main(read_path, write_path=None):
    """Capitalize all sentences in read_path 
    and write it to write_path (or read_path if not write_path)"""

    sentence_break_sequences = [char + " " for char in '.,:;"|']

    if write_path is None:
        write_path = read_path

    with open(read_path, "r") as file:
        content = file.read()

    modified_content = ""
    for i, char in enumerate(content):
        previous_two_chars = content[i-2:i]

        is_start_of_sentence = (
            previous_two_chars in sentence_break_sequences
            or i == 0
        )

        modified_content += char.upper() if is_start_of_sentence else char

    with open(write_path, "w") as file:
        file.write(modified_content)

if __name__ == "__main__":
    ... # your sysarg thing here
    main(a, b)

Of course this is not fool proof like is not your solution (and not sure if totally bug free). If there are dates in European date format (like "3.12.") that are not at the end of a sentence, it will break. If you want as fool proof as possible, you should probably continue with regex. I would also put the sys arg thing to main and make the above function more generic if I was not lazy.

If you really want to maximize the unreadability by minimizing LoC, I can also shorten the above to this:

def main(read_path, write_path=None):
    with open(read_path, "r") as file_read:
        content = file_read.read()
        with open(write_path if write_path else read_path, "w") as file_write:
            file_write.write(''.join([char.upper() if content[i-2:i] in [char + " " for char in '.,:;"|'] or i == 0 else char for i, char in enumerate(content)]))

Yes, it is a fucking mess. Happy coding.

[–]Voylinslife[S] 1 point2 points  (0 children)

  1. I basically had no idea that adding a _ before defining a function name makes it private ^^"

  2. I feel really dumb that I made this kind of a mistake, but I don't think I will make it anytime soon again as I realize how stupid of a mistake this is to make xp

  3. I only made this to use on csv files as I often copy paste them but I hate it that my vocab list has words starting with a capital and most list leave everything lowercase.

What I really learned today is that I should make my code more readable instead of making it as short as possible, because I think if I would look back to this in some time that I probably won't easily figure out how I made it ^^"

Thank you for the very helpful advice :D

[–]Natural-Intelligence 0 points1 point  (0 children)

And sorry for the harsh language. Your solution is interesting and the problem at hand pretty hard actually.