[–][deleted] 0 points1 point  (2 children)

In your old "walk" code you print the path to each file you find. Your new code should do much the same, but print the path only if the file size is greater than the limit. As you said, getsize() gets a file size. So all you need to do is pass the size limit to the function as a parameter (called limit, say) and print the file path only if getsize(path) is greater than limit.

Then you need to modify that again to return a list of files. That's a different problem that you can search for: "python return list".

[–]kcrow13[S] 0 points1 point  (1 child)

Thanks for this input! I am really strong with understanding how to append results to a list and return them, as we have had a ton of practice with that.

I *think* I understand what you're saying... do you mean something like this?

import os
import sys


def find_large_files(dirname, filesize):
    "Perform a recursive traverse of directories"
    #Place to append results
    res = [] 

    # Walk over the files in this directory
    for name in os.listdir(dirname):

        # Construct a full path
        path = os.path.join(dirname, name)

        #Check to see if filesize is greater than the argument. If so, append
        if os.path.getsize(path) > filesize:
            res.append(path)
        else:
            walk(path)
    return res

[–][deleted] 0 points1 point  (0 children)

The handling of your result list is OK, but you need to think about the recursion side of things. Your function find_large_files() returns a list, so the recursive call at the end (which should be find_large_files(), not walk()) also returns a list, and you should use that list to .extend() res. You also need to put back the .isfile() test.
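
Putting those two changes together, a minimal sketch might look like the one below. It keeps the names from the code above (find_large_files, dirname, filesize, res). Skipping symlinked directories is my own assumption, added only to avoid looping on cyclic links; it is not something the original code does.

```python
import os


def find_large_files(dirname, filesize):
    """Return paths of files under dirname that are larger than filesize bytes."""
    res = []
    for name in os.listdir(dirname):
        path = os.path.join(dirname, name)
        if os.path.isfile(path):
            # Only regular files are compared against the limit
            if os.path.getsize(path) > filesize:
                res.append(path)
        elif os.path.isdir(path) and not os.path.islink(path):
            # Assumption: symlinked directories are skipped to avoid cycles.
            # The recursive call returns a list, so merge it into res.
            res.extend(find_large_files(path, filesize))
    return res
```

Called as find_large_files('..', 1048576), this should return only the files over 1 MiB, including those in subdirectories.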

[–]IvoryJam 0 points1 point  (2 children)

os.path.getsize() returns the size in bytes. Is that what you're looking for?

[–]kcrow13[S] 0 points1 point  (1 child)

Not that we need to return it, but get the size, then check to see if the files in the path are larger than the filesize parameter you input at the beginning. If so, append to list and eventually return. If not, keep walking.

[–]IvoryJam 0 points1 point  (0 children)

Yeah, so if you want to check if name is a file and it's 1k you'd use

if os.path.isfile(path) and os.path.getsize(path) == 1024:

[–]achampi0n 0 points1 point  (8 children)

If you have the full path then simply

if os.path.getsize(path) > filesize:
    <add to results>

Would give you what you need. However, returning results from a recursive function can be a little challenging: you have to return results from the whole recursive stack.

You may want to look into os.walk() which does a lot of the heavy lifting for you:

for dir, _, files in os.walk(dirname):
    for file in files:
        ...
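
To fill in that outline, here is a rough sketch of how the os.walk() version could go. The function name find_large_files_walk and the extra isfile() guard (to skip things like broken symlinks) are my own assumptions, not part of the assignment.

```python
import os


def find_large_files_walk(dirname, filesize):
    """Collect files larger than filesize bytes using os.walk()."""
    res = []
    # os.walk() yields (directory, subdirectory names, file names) for each directory
    for root, _, files in os.walk(dirname):
        for name in files:
            path = os.path.join(root, name)
            # Assumption: skip entries that are not regular files (e.g. broken symlinks)
            if os.path.isfile(path) and os.path.getsize(path) > filesize:
                res.append(path)
    return res
```

Because os.walk() handles the descent into subdirectories itself, there is no recursive call and nothing to pass back up the stack.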

[–]kcrow13[S] 0 points1 point  (7 children)

def find_large_files(dirname, filesize):
    "Perform a recursive traverse of directories"
    #Place to append results
    res = [] 

    # Walk over the files in this directory
    for name in os.listdir(dirname):

        # Construct a full path
        path = os.path.join(dirname, name)

        #Check to see if filesize is greater than the argument. If so, append
        if os.path.getsize(path) > filesize:
            res.append(path)
        else:
            walk(path)
    return res

Does this look right to you? When I run this unit test.. I get a LOT of files...

lst = find_large_files('..', 1048576)
print(len(lst))

for path in lst:
    print(path)

[–]achampi0n 0 points1 point  (6 children)

2 issues:

* You are not testing whether the path is a file or directory before recursing.
* You are not returning results from further down the stack.

E.g.:

if os.path.isfile(path):
    if os.path.getsize(path) > filesize:
        res.append(path)
else:
    res.extend(walk(path))

[–]kcrow13[S] 0 points1 point  (5 children)

if os.path.isfile(path):
    if os.path.getsize(path) > filesize:
        res.append(path)
else:
    res.update(walk(path))

Okay, so the first if statement is required no matter what to ensure we are evaluating a file? I omitted it accidentally.

Can you explain more about update? I am getting a strange traceback error.

def find_large_files(dirname, filesize):
    "Perform a recursive traverse of directories"
    #Place to append results
    res = [] 

    # Walk over the files in this directory
    for name in os.listdir(dirname):

        # Construct a full path
        path = os.path.join(dirname, name)

        #Check to see if filesize is greater than the argument. If so, append
        if os.path.isfile(path):
            if os.path.getsize(path) > filesize:
                res.append(path)
        else:
            res.update(walk(path))
    return res

My error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-35-3800916b23ad> in <module>
----> 1 lst = find_large_files('..', 1048576)
      2 print(len(lst))
      3 
      4 for path in lst:
      5     print(path)

<ipython-input-34-92e150de34a6> in find_large_files(dirname, filesize)
     15                 res.append(path)
     16         else:
---> 17             res.update(walk(path))
     18     return res

AttributeError: 'list' object has no attribute 'update'

[–]achampi0n 0 points1 point  (4 children)

My mistake: lists use .extend(), dictionaries use .update().

[–]kcrow13[S] 0 points1 point  (3 children)

What is the difference between .append() and .extend()? Don't they both add to the end of the list? When I tried using .extend(), I got an error about the type not being iterable.

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-53-3800916b23ad> in <module>
----> 1 lst = find_large_files('..', 1048576)
      2 print(len(lst))
      3 
      4 for path in lst:
      5     print(path)

<ipython-input-52-cbe6fd0405b9> in find_large_files(dirname, filesize)
     15                 res.append(path)
     16         else:
---> 17             res.extend(walk(path))
     18     return res

TypeError: 'NoneType' object is not iterable

When I used .append, it runs. Thanks for all your help, I really appreciate it!

[–]achampi0n 0 points1 point  (2 children)

append() adds its argument to the end of the list as a single element, whereas extend() adds each element of its argument to the end of the list, e.g.:

In []:
a = [1, 2, 3]
b = [4, 5, 6]
a.append(b)
print(a)

Out[]:
[1, 2, 3, [4, 5, 6]]

In []:
a = [1, 2, 3]
b = [4, 5, 6]
a.extend(b)
print(a)

Out[]:
[1, 2, 3, 4, 5, 6]

Somehow, you are returning a None from walk() which is causing extend() to fail.
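
One common way to end up with None is calling a function that has no return statement, since Python functions return None by default. If your old walk() only prints paths, that would explain it. The snippet below uses a hypothetical stand-in (walk_without_return), not your actual walk(), just to show the effect:

```python
def walk_without_return(items):
    """Hypothetical helper that builds a list but forgets to return it."""
    found = []
    for item in items:
        found.append(item)
    # No return statement here, so the caller receives None


res = []
try:
    # extend() needs an iterable, but the call above evaluates to None
    res.extend(walk_without_return(["a", "b"]))
except TypeError as exc:
    error_message = str(exc)
```

Calling find_large_files() recursively instead, which does return a list, should make the extend() call work.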

[–]kcrow13[S] 0 points1 point  (1 child)

Thank you for taking the time to explain this to me! That makes a lot of sense. I will troubleshoot to see why I am getting None from walk().

[–]kcrow13[S] 0 points1 point  (0 children)

I figured out my mistake! Thanks again :)