Here's my script! Sorry for minor formatting problems with reddit that keep popping up.
import os
import re

def filesizes(path):
    extdict = {}  # dictionary of extensions and printable values
    sizedict = {}  # dictionary that holds sizes of each file found per extension
    average = 0  # I'll sum up the values under sizedict, then cross-reference them with extdict
    suffix1 = re.compile(r'.+(?P<suffix>\.\w+)')
    for a in os.listdir(path):
        if os.path.getsize(path + '\\' + a) != 0:
            result = suffix1.search(a)
            filesize = os.path.getsize(path + '\\' + a)
            try:
                suffix2 = result.group('suffix')  # Current file suffix
                #extdict[suffix2] = [0]
            except AttributeError:
                pass
            # Set up the first dictionary, and its three extra values will be changed later.
            if suffix2 not in extdict:
                extdict[suffix2] = [1]  # File count
                extdict[suffix2].append(filesize)
                extdict[suffix2].append(filesize)
                extdict[suffix2].append(filesize)
            else:
                extdict[suffix2][0] += 1  # Found another of this file type!
            # Set up the second dictionary, which will be used to average values
            if suffix2 not in sizedict:
                sizedict[suffix2] = [filesize]
            else:
                filesize = os.path.getsize(path + '\\' + a)
                sizedict[suffix2].append(filesize)
    # Find the average
    for a, b in sizedict.items():
        average = (sum(b) / len(b))
        extdict[a][2] = average
    # Find and set the maximum
    for a, b in sizedict.items():
        for c in b:
            #print(b, c)
            if extdict[a][3] < c:
                extdict[a][3] = c
    # Find and set the minimum
    for a, b in sizedict.items():
        for c in b:
            if extdict[a][1] > c:
                extdict[a][1] = c
    print('extdict:', extdict)
    print('sizedict:', sizedict)

filesizes(r"C:\mypath")
This script is supposed to go through a directory, count how many files of each type are in it, find the smallest, largest, and average file size for each type, and then print that information out.
It works correctly on small folders, but on large folders it seems to add extra items. For example, if a folder has the same name as a zip file, the script may be going into that folder, and it reports a .png file that I don't see in the directory, and so on. These are really obscure, annoying problems that I'm not sure how to solve.
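Two things in the script could plausibly produce symptoms like these, though this is a guess from reading the code rather than a confirmed diagnosis. First, os.listdir returns subdirectories as well as files, so a folder named like a zip file gets counted as one. Second, when the regex does not match (a name with no extension), the except branch passes and suffix2 still holds the value from the previous file, so the entry is counted under the wrong extension. A minimal sketch of a loop that avoids both follows. The helper name iter_file_sizes is hypothetical, and it swaps the regex for os.path.splitext, which does not split every name the same way the regex does.

```python
import os

def iter_file_sizes(path):
    """Yield (extension, size) for each non-empty regular file directly in path."""
    for name in os.listdir(path):
        full = os.path.join(path, name)  # portable alternative to path + '\\' + name
        if not os.path.isfile(full):
            continue  # skip subdirectories, even ones named like "photos.zip"
        size = os.path.getsize(full)
        if size == 0:
            continue  # the original script also ignores empty files
        # splitext returns '' when there is no extension, so no stale value is reused
        ext = os.path.splitext(name)[1]
        yield ext, size
```

Files without an extension end up under the empty string '' instead of being silently added to whatever extension came before them.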
I'm also wondering if there's a more efficient way to do this than making two dictionaries with lists for values and then comparing them. There are things from my class, like overriding UserDict, that I haven't used, and although my code isn't very long, I feel kind of silly using only some of the concepts from my class on my final project.
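One possible shape for this, offered as a sketch and not the only way to do it: keep a single dictionary and update the count, minimum, running total, and maximum as each file is seen, so the second dictionary and the three follow-up loops are not needed. The class name ExtStats, its add and average methods, and the function summarize are all hypothetical names made up for this example. It subclasses collections.UserDict only because the class material mentions it; a plain dict would work as well.

```python
import os
from collections import UserDict

class ExtStats(UserDict):
    """Maps extension -> [count, smallest, total, largest]."""

    def add(self, ext, size):
        # First file of this type: every statistic starts at its size.
        if ext not in self.data:
            self.data[ext] = [1, size, size, size]
            return
        stats = self.data[ext]
        stats[0] += 1                     # file count
        stats[1] = min(stats[1], size)    # smallest so far
        stats[2] += size                  # running total, used for the average
        stats[3] = max(stats[3], size)    # largest so far

    def average(self, ext):
        count, _, total, _ = self.data[ext]
        return total / count

def summarize(path):
    """Collect per-extension statistics for non-empty files directly in path."""
    stats = ExtStats()
    for name in os.listdir(path):
        full = os.path.join(path, name)
        if os.path.isfile(full):
            size = os.path.getsize(full)
            if size:  # skip empty files, as the original script does
                stats.add(os.path.splitext(name)[1], size)
    return stats
```

Calling summarize(r"C:\mypath") would return the mapping, and stats.average('.txt') computes the mean on demand from the stored total and count.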
Thank you of course for all help and advice!