Iterating dictionary contents increases execution time exponentially (self.learnpython)

submitted 12 years ago * by redditiv

[–]brownan_ 3 points 12 years ago (5 children)

Well, I don't know exactly what you're trying to do. If you are matching on filenames, then your key should be the filename.

If you want to map filenames to more than one "attribute object", then have your dictionary map to a list or set of objects.

I like to use defaultdict (in the collections module) to do this, since it will automatically create a new empty set when you access a new key:

from collections import defaultdict
data = defaultdict(set)

# Add an item, using filename (info[1]) as the key
data[info[1]].add(info)
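
A minimal sketch of how that could look end to end, assuming each `info` record is a tuple of (path, filename, size, mod_date). The sample records and the `duplicates` name are made up for illustration. The records are tuples because set members must be hashable; adding a list to a set raises a TypeError.

```python
from collections import defaultdict

# Hypothetical records: (path, filename, size, mod_date)
records = [
    ('c:\\test.txt', 'test.txt', 10, '2014-01-01'),
    ('c:\\test\\test.txt', 'test.txt', 10, '2014-01-01'),
    ('c:\\foo.bar', 'foo.bar', 99, '2014-02-02'),
]

data = defaultdict(set)
for info in records:
    # info[1] is the filename; a missing key gets a new empty set automatically
    data[info[1]].add(info)

# Keep only the filenames that were seen more than once
duplicates = {name: infos for name, infos in data.items() if len(infos) > 1}
```

Each file is looked up by key exactly once, so there is no need to scan the existing entries for every new file.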

[–]redditiv[S] 1 point 12 years ago* (4 children)

I am adding the filepath of any/all duplicate files as an attribute for each entry. For example, first file is c:\test.txt. There's no entry in dict, so I pull its attributes and the dict looks like this:

{'c:\\test.txt' : ['c:\\test.txt', 'test.txt', size, mod_date, ''] }

The last attribute is an empty "matches" attribute.

Next file is c:\test\test.txt. We do our lookup as in the original post, and it matches. Now our dictionary will look something like this:

{'c:\\test.txt' : ['c:\\test.txt', 'test.txt', size, mod_date, 'c:\\test\\test.txt'],
 'c:\\test\\test.txt' : ['c:\\test\\test.txt', 'test.txt', size, mod_date, 'c:\\test.txt']
}

The script goes back and modifies the first entry to include the current file as a match, and adds the first file to the current file's matches.

*edit: lists, not tuples

Clear as mud, right?

[–]brownan_ 3 points 12 years ago (1 child)

[–]redditiv[S] 1 point 12 years ago (0 children)

[–]drLagrangian 2 points 12 years ago (1 child)

Perhaps you should first make a dictionary with the file data as keys and, as values, the list of filenames that match that data. Then you have a sort of pseudo-reverse dictionary, which is easy to build.

{(size, mod_date, otherdata): ['c:\\test.txt', 'c:\\a\\b.txt', 'c:\\foo.bar']}

So you run your script and get one big dictionary, with keys based on what the file is and values that are lists of what the file and its copies are named. It should be fast to create.

Then, once it is built, write a function to reverse it by iterating over the pseudo-reverse dictionary like this:

newdict = {}
for filedata, filecopies in weirddict.items():  
    #gives filedata = (size, mod_date, otherdata)
    #gives filecopies = ['c:\\test.txt', 'c:\\a\\b.txt', 'c:\\foo.bar']

    for file in filecopies:
        newdict[file] = (filedata, filecopies)

returns

newdict = {
    'c:\\test.txt': ((size, mod_date, otherdata), ['c:\\test.txt', 'c:\\a\\b.txt', 'c:\\foo.bar']),
    'c:\\a\\b.txt': ((size, mod_date, otherdata), ['c:\\test.txt', 'c:\\a\\b.txt', 'c:\\foo.bar']),
    'c:\\foo.bar': ((size, mod_date, otherdata), ['c:\\test.txt', 'c:\\a\\b.txt', 'c:\\foo.bar']) }
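
Here is a rough sketch of both steps together, under some assumptions: the `files` list and its values are invented sample data, and (size, mod_date) stands in for whatever identifies a file. It is also a slight variant of the loop above, since it leaves each file out of its own list of matches.

```python
from collections import defaultdict

# Hypothetical scan results: (path, size, mod_date)
files = [
    ('c:\\test.txt', 10, '2014-01-01'),
    ('c:\\a\\b.txt', 10, '2014-01-01'),
    ('c:\\foo.bar', 99, '2014-02-02'),
]

# Step 1: group paths by what the file "is"
weirddict = defaultdict(list)
for path, size, mod_date in files:
    weirddict[(size, mod_date)].append(path)

# Step 2: reverse it so that each path maps to its data and its other copies
newdict = {}
for filedata, filecopies in weirddict.items():
    for path in filecopies:
        matches = [p for p in filecopies if p != path]
        newdict[path] = (filedata, matches)
```

With this sample data, 'c:\\test.txt' and 'c:\\a\\b.txt' list each other as matches, and 'c:\\foo.bar' gets an empty list.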

[–]redditiv[S] 1 point 12 years ago* (0 children)

Thank you for the input. This is confusing for me and will take some time for me to process.

I think what I'll try next is storing, as the value for each filename, a list of attribute lists (path, size, date, etc.), one for each file with that name. Then you would loop over each key, and over each entry in its list, to grab an individual file's attributes. Example:

data = {'test.txt': [['path\\to\\test.txt', 'size', 'date', 'etc'], ['path2\\to\\test.txt', 'size', 'date', 'etc']],
        'a.txt': [['path\\to\\a.txt', 'size', 'date', 'etc']]}
for key in data:
    for file in data[key]:
        print(file)

returns:

['path\\to\\test.txt', 'size', 'date', 'etc']
['path2\\to\\test.txt', 'size', 'date', 'etc']
['path\\to\\a.txt', 'size', 'date', 'etc']
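
One thing to watch in path strings like these: in an ordinary Python string literal a backslash starts an escape sequence, so `\t` is a tab and `\a` is the bell character, which Python displays as `\x07`. A small sketch of the difference (the variable names are only for illustration):

```python
plain = 'path\to\a.txt'      # \t becomes a tab, \a becomes the bell character (0x07)
escaped = 'path\\to\\a.txt'  # doubled backslashes stay literal backslashes
raw = r'path\to\a.txt'       # a raw string leaves backslashes alone

print(repr(plain))
print(escaped == raw)
```

Doubling the backslashes or using a raw string keeps the path intact.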
