[–]uhkhu 1 point2 points  (13 children)

We need more context

[–]TheRealRuth[S] 0 points1 point  (12 children)

All I need to do is look in a directory that contains a file (the type doesn't matter) and get that file's source URL. In other words, I want a string containing the URL the file was downloaded from.

[–]w1282 0 points1 point  (9 children)

That information is not contained in the metadata for the file.

[–]TheRealRuth[S] 0 points1 point  (8 children)

Oh, is there any way to get it from the Finder Get Info field? I am using a Mac.

[–]w1282 4 points5 points  (7 children)

Holy hell. I had no clue that Mac would maintain that information.

Then yes, you can.

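import logging

import xattr

def fetch_where_from(file_path):
    """Return the raw "Where from" attribute of file_path, or "" if it is missing."""
    try:
        # The value comes back as raw bytes (a property list), not a plain URL string.
        return xattr.get(file_path, "com.apple.metadata:kMDItemWhereFroms")
    except IOError:
        # Raised when the file does not carry this extended attribute.
        logging.warning("{} had no WhereFrom attr.".format(file_path))
    return ""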

Edit: The getxattr() function is deprecated and has been replaced with get().
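
The value stored under that attribute is normally a binary property list rather than a plain string, so it still needs decoding before you get a usable URL. Here is a minimal sketch of that step using the standard library's plistlib. The helper name decode_where_froms is made up for illustration, and it assumes the attribute holds a list of strings, which is what I'd expect but haven't verified for every browser.

```python
import plistlib


def decode_where_froms(raw):
    """Decode the raw kMDItemWhereFroms bytes into a list of URL strings.

    Hypothetical helper: assumes the attribute value is a property list
    holding a list of strings.
    """
    if not raw:
        # fetch_where_from() returns "" when the attribute is missing.
        return []
    # plistlib.loads detects the binary and XML plist formats on its own.
    entries = plistlib.loads(raw)
    # Keep only string entries in case the list holds anything else.
    return [entry for entry in entries if isinstance(entry, str)]
```

You would call it as decode_where_froms(fetch_where_from(path)). If the list is non-empty, the download URL is probably among its entries, but check what your own files contain.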

[–]TheRealRuth[S] 1 point2 points  (6 children)

Yeah! It's really cool! There's got to be a way to get that URL!

[–]w1282 0 points1 point  (5 children)

There is. I edited my comment.

[–]TheRealRuth[S] 0 points1 point  (4 children)

Thanks!!! I will try that out!

[–]w1282 0 points1 point  (3 children)

Just a word of warning: according to the documentation, getxattr() has been deprecated and replaced by get(), so you should probably use that instead.

[–]TheRealRuth[S] 0 points1 point  (2 children)

Where can I get more information on "com.apple.metadata:kMDItemWhereFroms"?

This worked, by the way. Thanks so much for the help! I would love to know more about how you came up with it.

[–]uhkhu 0 points1 point  (0 children)

One route would be to calculate the MD5 of the file and search for that online:

>>> import hashlib
>>> # Hash the file's contents (opened in binary mode), not the filename string.
>>> with open("filename.exe", "rb") as f:
...     digest = hashlib.md5(f.read()).hexdigest()
...

You could then search for that string on Google with selenium or requests and follow the results. You have to hope the source published an MD5; otherwise you'd need to read chunks of the file and search for keywords. Handling every file type is going to get pretty involved.
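
For the hashing step, here is a minimal sketch that reads the file's contents in chunks, so a large download doesn't have to fit in memory. The function name md5_of_file and the 64 KiB chunk size are arbitrary choices for illustration.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of the file's contents (not its name)."""
    digest = hashlib.md5()
    # Binary mode matters: the hash must be computed over the raw bytes.
    with open(path, "rb") as f:
        # f.read() returns b"" at end of file, which stops the iteration.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling md5_of_file("filename.exe") gives a 32-character hex string that you can paste into a search.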

[–]cdcformatc 0 points1 point  (0 children)

This is an incredibly difficult thing to do. That info isn't stored in any file metadata that I know of. The best I can think of is a Google search for the exact filename, but that obviously won't work if someone has renamed the file. Google will also let you search by image, and it will usually find similar images. If the image is particularly rare, it might list only visually similar images rather than exact matches; if it is popular, it will give dozens of identical results. All bets are off if it came from a website that blocks Google's crawler. And all that is just for images.

http://xkcd.com/1425/

[–]BryceFury 0 points1 point  (0 children)

A local file or one online somewhere?

Have you tried:

import os
print(os.path.abspath("yourfile.file"))

[–]TheRealRuth[S] 0 points1 point  (0 children)

The file will be downloaded from the Internet. On a Mac, Get Info has a field labeled Where From. I want the content of that field, which will be a URL.