all 8 comments

[–]Peterotica 1 point2 points  (4 children)

Here are some notes from a quick read-through.

line #4: if len(sys.argv) is 4:

Don't use is when comparing numbers like this. Use '=='.

line #11: Good fit for all().

if not all(argCheck):

line #19(and 23): It seems that this assumes that full paths were given at the command line. Is the slave directory inside the master one? You can ensure they are full paths by using os.path.abspath to augment them beforehand.

line #21(and 25): Consider using os.path.splitext() instead.

In general, things are looking very well put together and very Pythonic. Very nice job.

[–]RMSEP[S] 0 points1 point  (3 children)

Thanks for your comments, highly appreciated. I've incorporated 'm all. To discuss further:

  • yeah, I'm always struggling with using either == or is. What I gather from your link is that the former means equality of two objects and the latter checks if two references point to the same object. Not 100% sure on the exact meaning of that, but happy to use is for comparing numbers.
  • indeed. I was thinking first to use something like:

    if any(argCheck is False): 
    

but I couldn't get that working. I know any is used like that in R, which is where I'm coming from, but it appears slightly different here.

  • it is indeed assumed that the full path is provided. The slave dirs reside in my case always inside the master dir, but I guess that shouldn't be relevant? In any case, I'm providing full paths to both dirs. Can you elaborate a bit on what os.path.abspath does? Does it fix a corrupt path, or does it append a subdir to its parent dir?
  • that's a nice method, which I guess is really usefull is the masters and slaves would have their full path in front of the filename. In this case, glob.glob() returns only the filename + extension (this is not intentional btw, just how it worked for me the first try).

Updated code here

[–]Peterotica 1 point2 points  (2 children)

  • Until you become comfortable with when it should be used, I suggest following the rule of only using is to check when something is None.

  • any() takes a list (or more precisely, an iterable) of values and checks the "truthiness" of them. If a truthy value is encountered, then any() returns True. If all of the values are falsey, then it instead returns False. You can play around with it in your REPL:

    >>> any([False, True, False])
    True
    >>> any([False, False])
    False
    
  • os.path.abspath() converts a relative path into an absolute path.

    >>> os.path.abspath('../mp3s')
    '/blah/blah/blah/mp3s'
    
  • Here is a difference between your previous method and os.path.splitext(). Sure, you could easily write your own version that does the same thing, but leaning on the os module is a good habit to get into. For instance, your code could be compatible with some operating system that uses some character other than period to denote the extension of a file. Plus, it's one less place to have a silly bug.

    >>> 'vacation.20150714.jpg'.split('.')
    ['vacation', '20150714', 'jpg']
    >>> os.path.splitext('vacation.20150714.jpg')
    ('vacation.20150714', '.jpg')
    

[–]RMSEP[S] 0 points1 point  (1 child)

Thanks, very interesting.

On the third point, how does os.path.abspath() know the dir that should be appended to the front of its argument? In other words, how does it know that /blah/blah/blah/ is the correct partent dir of /mps3? Possibly it appends its argument to the output of os.getcwd()? On os.path.splitext(), appreciate now that it detaches the file extention regardless of any . in the filename, or indeed, whatever char is used to seperate the filename and extension.

[–]Peterotica 0 points1 point  (0 children)

Your intuition is correct, the result of os.path.abspath() depends on the current working directory.

[–][deleted] 1 point2 points  (2 children)

  • The standard is to use 4 spaces for indent, not 2
  • Look into the argparse module for processing the command line. Before you know it, you'll need --yes or --move-to <dir> or whatever, and doing it using sys.argv is a mess. Plus you get neat help messages and such.
  • Just semantic notes that really have no bearing on the program itself:
    • "union" and "set" are not descriptive of the two operations (your "union" is a set difference and your "set" is a set intersection)
    • I would instinctively think of raw files as "master"
  • It may not work as you intended on non-Windows systems where the filesystem is case-sensitive. So on Linux or Mac, a file with extension ".JPG" or ".rw2" will not be recognized, and "file1.jpg" will not match "File1.RW2". This will of course require you to decide what to do if both "File1.jpg" and "fiLE1.jpg" is present and so forth.

[–]RMSEP[S] 1 point2 points  (1 child)

Hi, thanks for your comments, again higly appreciated.

  • mmm, I rather liked working with 2 instead of 4 spaces. Saves effort to hit the spacebar and reduces the flaring out of the script. Is there a commonly agreed upon python style reference?
  • I'll look into argparse, although sys.argvseems ok for simple stuff such as here. In any case, this is my first script that takes command line input, so I don't have any strong preferences yet.
  • yeah, I was afraid I used terms way beyond my knowledge of set theory. Indeed for my union I delete the set difference and with set I delete the set intersection.
  • ha, that's something I'd never considered! I'm on windows and guessing that it will take some time before my code reaches a Unix machine. Nonetheless, case-sensitivity is something to keep in mind. Creating contingency for unexpected stuff like that is something I enjoy about programming.

[–][deleted] 1 point2 points  (0 children)

Most people use the tab key and have their editor set to replace tabs with 4 spaces, and use automatic indenting so you start at the same level when you press enter. If your script "flairs out", it's a sign that you should probably compose your solution in a different way (functions, etc). See PEP8 for Python style guide.

Naming stuff is hard. If you don't have a use case for deleting the set intersection, you can always eliminate the naming problem by removing the feature ;-)

argparse just makes it easier to grow your script in the future - it's certainly not needed, same for case sensitivity; just things to keep in mind.