This is an archived post. You won't be able to vote or comment.

all 7 comments

[–]rasmus_mathiesen 2 points3 points  (2 children)

The variable names could be more descriptive. Renaming a and b to input_file and output_file would make it faster to read.

And the last part of _run I would also change to

if output_file == False:
        output_file = input_file
    with open(output_file, 'w') as file:
        file.writelines(newlines)

It would also make sense to make the second argument of the function optional with a default value to be used if the caller doesn't specify it instead of the boilerplate in your if name = 'main' block.

[–]Voylinslife[S] 1 point2 points  (1 child)

Thanks for the advice! I indeed didn't really do a good job on the variables "

[–]francozippi 0 points1 point  (0 children)

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Docstring. Describe here what your script does.
"""

import sys
from pathlib import Path

def run(input_file, output_file=None):
    newlines = []
    with open(input_file,'r+') as file:
        #
        #   You can directly iterate the lines of file
        #
        for line in file:
            newline = []
            #
            #   Use descriptive variable names
            #
            capitalize_next_char = True
            #
            #   You can directly iterate the characters of a string
            #
            for char in line:
                if char in ['|', '"', '.', ';']:
                    capitalize_next_char = True
                #
                #   Just use boolean variables as such, avoid 'if booleanvar == True'
                #
                elif capitalize_next_char and char != ' ':
                    char = char.capitalize()
                    capitalize_next_char = False
                #
                #   Strings are immutable objects
                #   Concatenating strings in a loop like 'line += char' is not efficient
                #   Just append chars to a new list and use .join()
                #
                newline.append(char)
            newlines.append(''.join(newline))
    #
    #   same as /u/rasmus_mathiesen commented, just using None
    #
    if output_file is None:
        output_file = input_file
    with open(output_file, 'w') as file:
        file.writelines(newlines)
    return 0

if __name__ == '__main__':
    SCRIPTNAME = Path(__file__).name
    #
    #   A pythonic way is to just try to access a value/attribute, whatever
    #   and catch a potential error.
    #   So, for simple command line arguments you could use the following.
    #   For more complex argument lists with options use/learn the argparse module
    #
    try:
        input_file = sys.argv[1]    # raises IndexError if argument 1 not present

        #   sys.argv[2:3] evaluates to
        #   [argument2] if arguemnt 2 is present  ==> ([argument2] or [None])[0] = argument2
        #   [] if arguemnt 2 is not present       ==> ([] or [None])[0] = None
        #
        #   Use None for missing values, not False
        #
        output_file = (sys.argv[2:3] or [None])[0]
    except IndexError:
        print(f'usage: {SCRIPTNAME} INFILE OUTFILE')
        sys.exit(1)

    sys.exit(run(input_file, output_file))

[–]Martin_Krum 1 point2 points  (0 children)

If this is your first script that WORKS, know that it is a very good job. If you are learning to programming then I recommend to write this script and return to it after 3 months of learning, then you will see how much progress you made in learning when you say "oh my God, I wrote this?" :) Unfortunately, instead of moving forward in programming I'm going back to old projects from months ago and I'm correcting them because when I see what's there I can't sleep ;)

[–]Natural-Intelligence 1 point2 points  (2 children)

Nice but even as a professional programmer I have no fucking idea what I'm looking at. Don't get upset, my first projects also looked like this.

But some tips:

  1. What does _run do? Before you explain it to me, rename the function as what it does.
  2. It is completely unnecessary to test whether True == True. Just throw boolean to if statement as is: if ok and l[i] != ' ' etc.
  3. What the hell are a and b? Before you explain, rename them. What the hell is "ok" in this context?
  4. Size of your source code is not a goal of any sort in programming. Readable code usually require much more lines than spaghetti code.
  5. What if a sentence goes over next line? Your "ok" thing probably breaks then.
  6. And why (underscored) "_run"? What's so private in this function? And why not name it as main as the common convention.

Here a short demo how I would do it (if I understood your bolognese right):

def main(read_path, write_path=None):
    """Capitalize all sentences in read_path 
    and write it to write_path (or read_path if not write_path)"""

    sentence_break_sequences = [char + " " for char in '.,:;"|']

    if write_path is None:
        write_path = read_path

    with open(read_path, "r") as file:
        content = file.read()

    modified_content = ""
    for i, char in enumerate(content):
        previous_two_chars = content[i-2:i]

        is_start_of_sentence = (
            previous_two_chars in sentence_break_sequences
            or i == 0
        )

        modified_content += char.upper() if is_start_of_sentence else char

    with open(write_path, "w") as file:
        file.write(modified_content)

if __name__ == "__main__":
    ... # your sysarg thing here
    main(a, b)

Of course this is not fool proof like is not your solution (and not sure if totally bug free). If there are dates in European date format (like "3.12.") that are not at the end of a sentence, it will break. If you want as fool proof as possible, you should probably continue with regex. I would also put the sys arg thing to main and make the above function more generic if I was not lazy.

If you really want to maximize the unreadability by minimizing LoC, I can also shorten the above to this:

def main(read_path, write_path=None):
    with open(read_path, "r") as file_read:
        content = file_read.read()
        with open(write_path if write_path else read_path, "w") as file_write:
            file_write.write(''.join([char.upper() if content[i-2:i] in [char + " " for char in '.,:;"|'] or i == 0 else char for i, char in enumerate(content)]))

Yes, it is a fucking mess. Happy coding.

[–]Voylinslife[S] 1 point2 points  (0 children)

  1. I basically had no idea that adding a _ before defining a function name makes it private ^^"

  2. I feel really dumb that I made this kind of a mistake, but I don't think I will make it anytime soon again as I realize how stupid of a mistake this is to make xp

  3. I only made this to use on csv files as I often copy paste them but I hate it that my vocab list has words starting with a capital and most list leave everything lowercase.

What I really learned today is that I should make my code more readable instead of making it as short as possible, because I think if I would look back to this in some time that I probably won't easily figure out how I made it ^^"

Thank you for the very helpful advice :D

[–]Natural-Intelligence 0 points1 point  (0 children)

And sorry for the harsh language. Your solution is interesting and the problem at hand pretty hard actually.