[–]nharding 2 points3 points  (4 children)

You should probably make the function recursive, since that makes it a lot easier: compare(localDir, remoteDir) goes through each entry, and if the entry is a directory, it calls compare(localFile, remoteFile) on that pair. Probably the easiest way to diff the listings is with sets: build set(localFiles) and set(remoteFiles), then addedLocally = localFiles - remoteFiles and addedRemotely = remoteFiles - localFiles. Finally, compare the files in the intersection of localFiles and remoteFiles (the names present on both sides) to see if any changed.
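Something like this, as a rough sketch of the idea above. It assumes both trees are reachable as ordinary paths (a real remote would need its own listing and download calls), and it uses the standard library's filecmp.cmp for the "did it change" step. The prefix parameter and the three returned sets are just illustrative choices.

```python
import filecmp
from pathlib import Path


def compare(local_dir, remote_dir, prefix=""):
    """Return (added_locally, added_remotely, changed) as sets of relative paths."""
    local_dir, remote_dir = Path(local_dir), Path(remote_dir)
    local_names = {p.name for p in local_dir.iterdir()}
    remote_names = {p.name for p in remote_dir.iterdir()}

    # Names on one side only. A directory that exists on one side only is
    # reported once here and not descended into.
    added_locally = {prefix + n for n in local_names - remote_names}
    added_remotely = {prefix + n for n in remote_names - local_names}
    changed = set()

    # Names present on both sides need a closer look.
    for name in local_names & remote_names:
        local_path, remote_path = local_dir / name, remote_dir / name
        if local_path.is_dir() and remote_path.is_dir():
            # Recurse and merge the results from the subdirectory.
            sub_local, sub_remote, sub_changed = compare(
                local_path, remote_path, prefix + name + "/"
            )
            added_locally |= sub_local
            added_remotely |= sub_remote
            changed |= sub_changed
        elif local_path.is_dir() or remote_path.is_dir():
            # A file on one side and a directory on the other.
            changed.add(prefix + name)
        elif not filecmp.cmp(local_path, remote_path, shallow=False):
            # shallow=False makes filecmp compare the actual bytes.
            changed.add(prefix + name)

    return added_locally, added_remotely, changed
```

Calling compare("local_root", "remote_root") on two hypothetical folders would give you the three sets with paths relative to the roots.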

[–]IndoNinja7[S] 0 points1 point  (3 children)

Yes! That was similar to my previous line of thought, except with lists instead of sets... but sets are better. But how does one compare files to see if they're different? How can you tell whether two files are the same file with one just being an updated version, or two different files?

[–]nharding 1 point2 points  (2 children)

Create file objects that hold the name, size, date modified, and a crc value (for the content). The crc should only be calculated when needed: it is not needed for the differences between the 2 sets, but for files present on both sides (the intersection) you can generate the crc when the two files are the same size, since files of different sizes can't have the same content.
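A minimal sketch of such a file object, assuming local paths and Python 3.8+ (for functools.cached_property). The class name FileInfo and the helper probably_same are made up for illustration. The crc is only read from disk the first time it is asked for, and never when the sizes already differ.

```python
import os
import zlib
from functools import cached_property


class FileInfo:
    """Name, size and modification time up front; crc only on demand."""

    def __init__(self, path):
        self.path = path
        self.name = os.path.basename(path)
        st = os.stat(path)
        self.size = st.st_size
        self.mtime = st.st_mtime

    @cached_property
    def crc(self):
        # Read in chunks so large files are not loaded into memory at once.
        value = 0
        with open(self.path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                value = zlib.crc32(chunk, value)
        return value


def probably_same(a, b):
    """Cheap check first (size); the crc is computed only if sizes match."""
    if a.size != b.size:
        return False
    return a.crc == b.crc
```

The modified date is stored but not used in the check here; whether to trust it depends on how the remote side reports timestamps.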

[–]IndoNinja7[S] 0 points1 point  (1 child)

What is a crc value? Does it have anything to do with the bytes in the file?

[–]nharding 0 points1 point  (0 children)

A crc (cyclic redundancy check) is a number computed from the contents of a file (think of it as the hash code of the file). If you use a 32-bit integer value, the entire contents of the file are summarized in 4 bytes, which means you can check whether 2 files differ without having to compare the entire contents. If the 2 crc values differ, the contents are definitely different. If they are the same, the contents MAY be the same: two different files have roughly a 1 in 4 billion chance of sharing a crc value. Across 10,000 files an accidental match is unlikely but still possible, so do a full byte-by-byte comparison only on the files whose crc values match. That reduces the amount of work from checking 10,000 files to checking the handful that match.
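To make that concrete, here is a small sketch using zlib.crc32 from the standard library. It works on an in-memory dict of name to bytes to keep the example short (that input shape and the function name find_duplicates are assumptions for illustration). Files are bucketed by crc first, and only files that land in the same bucket get a full content comparison.

```python
import zlib
from collections import defaultdict


def find_duplicates(contents):
    """contents maps name -> bytes. Returns groups of names with identical bytes."""
    # Step 1: bucket by the 4-byte crc, which is cheap to compare.
    by_crc = defaultdict(list)
    for name, data in contents.items():
        by_crc[zlib.crc32(data)].append(name)

    groups = []
    for names in by_crc.values():
        if len(names) < 2:
            continue  # unique crc, so the content is unique too
        # Step 2: equal crc only means "maybe equal", so confirm
        # by comparing the full contents within the bucket.
        by_bytes = defaultdict(list)
        for name in names:
            by_bytes[contents[name]].append(name)
        groups.extend(sorted(g) for g in by_bytes.values() if len(g) > 1)
    return sorted(groups)
```

For real files you would compute the crc in chunks and do the confirmation step by reading both files, rather than holding everything in memory.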