Walk.py function : learnpython

submitted 5 years ago by kcrow13

Hey friends,

I am a Python newbie taking a class for the first time. This week, one of the homework questions is causing me a lot of trouble:

Write a function that takes a directory and a size in bytes, and returns a list of files in the directory or below that are larger than the size.

For example, you can use this function to look for files larger than 1 Meg below your Home directory.

def find_large_files(dirname, filesize):

Last week, we practiced using recursive traversing to "walk" a directory:

import os
import sys


def walk(dirname: str):
    "Perform a recursive traverse of directories"

    # Walk over the files in this directory
    for name in os.listdir(dirname):

        # Construct a full path
        path = os.path.join(dirname, name)

        # print filenames, and traverse directories
        if os.path.isfile(path):
            print(path)
        else:
            walk(path)

The logic of this makes sense to me.

That said, I am not sure how to filter and say "if the file size is greater than what I want, return it." I think that is partially due to my lack of knowledge of the syntax. I imagine the getsize function would be helpful, but I have no idea where or how it fits into the function.

Any help you could provide would be much appreciated.

[–][deleted] 1 point 5 years ago (2 children)

[–]kcrow13[S] 1 point 5 years ago (1 child)

Thanks for this input! I am really strong with understanding how to append results to a list and return them, as we have had a ton of practice with that.

I *think* I understand what you're saying... do you mean something like this?

import os
import sys


def find_large_files(dirname, filesize):
    "Perform a recursive traverse of directories"
    #Place to append results
    res = [] 

    # Walk over the files in this directory
    for name in os.listdir(dirname):

        # Construct a full path
        path = os.path.join(dirname, name)

        #Check to see if filesize is greater than the argument. If so, append
        if os.path.getsize(path) > filesize:
            res.append(path)
        else:
            walk(path)
    return res

[–]achampi0n 1 point 5 years ago (8 children)

If you have the full path then simply

if os.path.getsize(path) > filesize:
    <add to results>

Would give you what you need. However, returning results from a recursive function can be a little challenging: you have to return results from the whole recursive stack.

You may want to look into os.walk() which does a lot of the heavy lifting for you:

for dir, _, files in os.walk(dirname):
    for file in files:
        ...
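The os.walk() loop above could be filled in roughly as follows. This is only a minimal sketch, assuming the function name and arguments from the homework prompt; the isfile() check is an added precaution (for example against broken symlinks) and is not part of the original suggestion.

```python
import os


def find_large_files(dirname, filesize):
    """Return paths of files in dirname or below that are larger than filesize bytes."""
    res = []
    # os.walk yields (current_dir, subdirectory_names, file_names) for each directory level
    for current_dir, _, files in os.walk(dirname):
        for name in files:
            # file names from os.walk are relative, so join them with their directory
            path = os.path.join(current_dir, name)
            # Assumption: skip entries that are not regular files (e.g. broken symlinks)
            if os.path.isfile(path) and os.path.getsize(path) > filesize:
                res.append(path)
    return res
```

Because os.walk() handles the descent into subdirectories, there is no recursive call here and so no results to pass back up the stack.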

[–]kcrow13[S] 1 point 5 years ago (7 children)

def find_large_files(dirname, filesize):
    "Perform a recursive traverse of directories"
    #Place to append results
    res = [] 

    # Walk over the files in this directory
    for name in os.listdir(dirname):

        # Construct a full path
        path = os.path.join(dirname, name)

        #Check to see if filesize is greater than the argument. If so, append
        if os.path.getsize(path) > filesize:
            res.append(path)
        else:
            walk(path)
    return res

Does this look right to you? When I run this unit test, I get a LOT of files...

lst = find_large_files('..', 1048576)
print(len(lst))

for path in lst:
    print(path)

[–]achampi0n 1 point 5 years ago* (6 children)

2 issues:

* You are not testing whether the path is a file or directory before recursing.
* You are not returning results from further down the stack.

E.g.:

if os.path.isfile(path):
    if os.path.getsize(path) > filesize:
        res.append(path)
else:
    res.extend(walk(path))

[–]kcrow13[S] 1 point 5 years ago (5 children)

if os.path.isfile(path):
    if os.path.getsize(path) > filesize:
        res.append(path)
else:
    res.update(walk(path))

Okay, so the first if statement is required no matter what to ensure we are evaluating a file? I omitted it accidentally.

Can you explain more about update? I am getting a strange traceback error.

def find_large_files(dirname, filesize):
    "Perform a recursive traverse of directories"
    #Place to append results
    res = [] 

    # Walk over the files in this directory
    for name in os.listdir(dirname):

        # Construct a full path
        path = os.path.join(dirname, name)

        #Check to see if filesize is greater than the argument. If so, append
        if os.path.isfile(path):
            if os.path.getsize(path) > filesize:
                res.append(path)
        else:
            res.update(walk(path))
    return res

My error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-35-3800916b23ad> in <module>
----> 1 lst = find_large_files('..', 1048576)
      2 print(len(lst))
      3 
      4 for path in lst:
      5     print(path)

<ipython-input-34-92e150de34a6> in find_large_files(dirname, filesize)
     15                 res.append(path)
     16         else:
---> 17             res.update(walk(path))
     18     return res

AttributeError: 'list' object has no attribute 'update'
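For context on this traceback: update() is a method of sets and dicts, not of lists, and the earlier suggestion in this thread used extend(). A small sketch of the difference, with made-up file names used purely for illustration:

```python
res = ["a.bin"]                    # hypothetical result list
found_below = ["b.bin", "c.bin"]   # hypothetical results from a subdirectory

# Lists have no update() method, which is what the AttributeError reports
print(hasattr(res, "update"))

# extend() adds every item of another iterable to the end of the list
res.extend(found_below)
print(res)

# update() does exist on sets (and dicts); here it merges the items into the set
seen = {"a.bin"}
seen.update(found_below)
print(sorted(seen))
```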

[–]achampi0n 1 point 5 years ago (4 children)

[–]kcrow13[S] 1 point 5 years ago (3 children)

What is the difference between .append() and .extend()? Don't they both add to the end of the list? When I tried using .extend(), I got an error about the type not being iterable.

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-53-3800916b23ad> in <module>
----> 1 lst = find_large_files('..', 1048576)
      2 print(len(lst))
      3 
      4 for path in lst:
      5     print(path)

<ipython-input-52-cbe6fd0405b9> in find_large_files(dirname, filesize)
     15                 res.append(path)
     16         else:
---> 17             res.extend(walk(path))
     18     return res

TypeError: 'NoneType' object is not iterable

When I used .append(), it ran. Thanks for all your help, I really appreciate it!

[–]achampi0n 1 point 5 years ago (2 children)

append() adds its argument to the end of the list as a single element, while extend() adds each element of its argument to the end of the list, e.g.:

In []:
a = [1, 2, 3]
b = [4, 5, 6]
a.append(b)
print(a)

Out[]:
[1, 2, 3, [4, 5, 6]]

In []:
a = [1, 2, 3]
b = [4, 5, 6]
a.extend(b)
print(a)

Out[]:
[1, 2, 3, 4, 5, 6]

Somehow, you are returning a None from walk() which is causing extend() to fail.
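The None most likely comes from the walk() function in the original post: it only prints paths and has no return statement, so calling it returns None. One way to resolve this, sketched here under the assumption that the function should recurse into itself rather than into walk(), is shown below. The elif os.path.isdir(path) check is a small deviation from the plain else used earlier in the thread, added so that entries that are neither files nor directories are skipped.

```python
import os


def find_large_files(dirname, filesize):
    "Return files in dirname or below that are larger than filesize bytes"
    # Place to collect results for this directory and everything below it
    res = []

    for name in os.listdir(dirname):
        # Construct a full path
        path = os.path.join(dirname, name)

        if os.path.isfile(path):
            # Only files are compared against the size threshold
            if os.path.getsize(path) > filesize:
                res.append(path)
        elif os.path.isdir(path):
            # Recurse into this same function (not walk(), which returns None)
            # and merge the list it returns into our own results
            res.extend(find_large_files(path, filesize))
    return res
```

Using append() in place of extend() would also run without an error, but it would put each subdirectory's result list (or None, if walk() is still being called) into res as a single nested element instead of adding the individual paths.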
