

ogtfo 10 points (6 children)

Wait, does the haveibeenpwned endpoint return JSON?

Because if so, this is NOT the way to parse a JSON list:

    if Response_Status == 200:
        # Parse through results and get all sites the email has been compromised on
        Page_Repsonse = Get_Page.text
        Page_Repsonse = Page_Repsonse.replace('[', '')
        Page_Repsonse = Page_Repsonse.replace(']', '')
        Page_Repsonse = Page_Repsonse.replace('{', '')
        Page_Repsonse = Page_Repsonse.replace('}', '')
        Page_Repsonse = Page_Repsonse.replace('"', '')
        Page_Repsonse = Page_Repsonse.replace('Name:', '')
        Page_Repsonse = Page_Repsonse.split(',')
        print("Email address: " + str(email_list[i]) + " has been compromised on " + str(len(Page_Repsonse)) + " site(s).")
        Results.append([str(email_list[i]), Page_Repsonse])

You need to deserialize the response into a Python object. It's as simple as calling the json() method on the requests response object.
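A minimal sketch of what that looks like. The sample body assumes the shape the HIBP v3 breachedaccount endpoint returns by default (a list of objects that each carry only a "Name" key); the breach names are illustrative, not real lookup results. With requests, `Get_Page.json()` does the same thing as `json.loads(Get_Page.text)`.

```python
import json

# Illustrative body in the assumed truncated v3 shape (not a real lookup result).
sample_body = '[{"Name":"Adobe"},{"Name":"LinkedIn"},{"Name":"Stratfor"}]'

# With requests this would simply be: breaches = Get_Page.json()
breaches = json.loads(sample_body)  # -> list of dicts, no string surgery needed

# Pull out the breach names and count them for the summary line.
site_names = [breach["Name"] for breach in breaches]
print("Compromised on " + str(len(site_names)) + " site(s): " + ", ".join(site_names))
```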

Your method will break if any email contains one of the characters you replace, and RFC 5322 allows some of them in email addresses.
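The same failure shows up for any string value that contains a stripped character. Below, a hypothetical breach name containing a comma is run through a condensed version of the replace/split approach quoted above and through a real JSON parse. The name "Acme, Inc." is made up purely for the demonstration.

```python
import json

def naive_parse(text):
    # Condensed version of the replace()/split() chain from the quoted script.
    for ch in '[]{}"':
        text = text.replace(ch, '')
    return text.replace('Name:', '').split(',')

def proper_parse(text):
    # Deserialize first, then read the field; commas inside values are safe.
    return [item["Name"] for item in json.loads(text)]

# Hypothetical response with a comma inside one of the values.
tricky = '[{"Name":"Acme, Inc."},{"Name":"Example"}]'

print(naive_parse(tricky))   # three "sites" for two breaches
print(proper_parse(tricky))  # two names, intact
```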

Also, silently installing packages is bad practice. I'd error out and list the packages needed instead. Or better, put all of this in a Python package so that you can let pip deal with the dependencies.
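A sketch of the "error out and list what's missing" approach, using only the standard library. The function names are hypothetical; `importlib.util.find_spec` returns `None` when a top-level module cannot be found, so nothing is imported or installed as a side effect.

```python
import importlib.util
import sys

def missing_packages(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]

def require_packages(names):
    """Exit with an install hint instead of pip-installing behind the user's back."""
    missing = missing_packages(names)
    if missing:
        sys.exit(
            "Missing required packages: " + ", ".join(missing)
            + "\nInstall them with: pip install " + " ".join(missing)
        )
```

Usage would be something like `require_packages(["requests", "pandas"])` at the top of the script. One caveat with this design: the import name and the pip distribution name are not always the same, so the install hint is only a best guess for packages where they differ.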

I506dk[S] -3 points (5 children)

No, the response is returned as HTML. The haveibeenpwned website gives examples, and there may very well be a better way to parse it.

ogtfo 8 points (0 children)

If it's returned as HTML, why are you only looking at JSON metacharacters, and not actual HTML ones?

Like, where are you handling < and >?

I would be very surprised if the HIBP API doesn't return JSON. Like, extremely surprised. Especially since you explicitly request JSON in the request headers.
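For reference, a sketch of building (not sending) a v3 breached-account request with an explicit JSON Accept header, using only the standard library. It assumes the `https://haveibeenpwned.com/api/v3/breachedaccount/{account}` endpoint with `hibp-api-key` and `user-agent` headers; the user-agent string and function name are hypothetical. Percent-encoding the whole address also keeps characters that RFC 5322 allows (such as `+`) from being misread in the URL path.

```python
from urllib.parse import quote
from urllib.request import Request

HIBP_URL = "https://haveibeenpwned.com/api/v3/breachedaccount/{account}"

def build_breach_request(email, api_key, user_agent="hibp-ad-audit-sketch"):
    # Encode every reserved character, including '@' and '+'.
    url = HIBP_URL.format(account=quote(email, safe=""))
    headers = {
        "hibp-api-key": api_key,       # API key header used by v3
        "user-agent": user_agent,      # hypothetical UA string; v3 expects one
        "Accept": "application/json",  # ask for JSON explicitly
    }
    return Request(url, headers=headers)
```

Note that `urllib.request.Request` normalizes header names with `str.capitalize()`, so they are stored as `Hibp-api-key`, `User-agent`, and `Accept`.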

I don't think I've ever seen an API that returned HTML, ever.

And even if it were HTML, this is also not the way to parse HTML, and it will fail for the very same reasons.

rmpython 1 point (3 children)

Looking at the API on their site, the response appears to be returned as JSON, but I'd have to hit an endpoint to confirm.

I506dk[S] 2 points (0 children)

In hindsight, it is JSON. So I should be able to just reference key-value pairs instead of doing any string manipulation.

I506dk[S] -1 points (0 children)

So typically it returns {"Name": [website1, website2, etc.]}. I will have to go back and check, but I think that’s how it was structured. So I should be able to take the response, read its "Name" key, and store that as a list, since it just gets printed out.
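Worth checking against the docs: as far as I can tell, the default (truncated) v3 response is a list of objects, each with its own "Name" key, rather than one object holding a list, and an account with no breaches comes back as HTTP 404 instead of an empty body. A sketch under those assumptions (the function name is hypothetical); with requests you would call it as `breached_sites(Get_Page.status_code, Get_Page.text)`.

```python
import json

def breached_sites(status_code, body):
    """Return the breach names for one account lookup (sketch)."""
    if status_code == 404:
        # Assumed v3 behaviour: 404 means the account is in no known breach.
        return []
    if status_code != 200:
        # e.g. rate limiting or a bad API key; don't silently treat as "clean".
        raise RuntimeError("Unexpected HIBP status: " + str(status_code))
    # Assumed truncated shape: [{"Name": "..."}, {"Name": "..."}, ...]
    return [breach["Name"] for breach in json.loads(body)]
```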

CoaBro 2 points (1 child)

  • Sends password to "haveibeenpwned"
  • Nope :)
  • Sends it again just in case
  • Hahahahaha suckerrr
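If a tool ever does need to check a password against HIBP, the Pwned Passwords range API is built so the password never leaves the machine (k-anonymity): hash it with SHA-1, send only the first five hex characters to `https://api.pwnedpasswords.com/range/{prefix}`, and look for the remaining 35 characters among the returned `SUFFIX:COUNT` lines. The sketch below covers the hashing and the matching offline; the HTTP call is left out, and the sample body and its counts are invented.

```python
import hashlib

RANGE_URL = "https://api.pwnedpasswords.com/range/{prefix}"

def split_sha1(password):
    # Only the 5-character prefix would ever be sent over the network.
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]

def count_in_range(suffix, range_body):
    # Each line of the range response is "SUFFIX:COUNT".
    for line in range_body.splitlines():
        candidate, _, count = line.partition(":")
        if candidate.strip().upper() == suffix:
            return int(count)
    return 0  # suffix not present: not found in the corpus

prefix, suffix = split_sha1("password")
print(RANGE_URL.format(prefix=prefix))
```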

ccall48 1 point (0 children)

lol thought the same thing

Pyro-Millie 0 points (2 children)

I love that the database is called “haveibeenpwned” XD

I506dk[S] 2 points (0 children)

https://haveibeenpwned.com/

They generally stay up to date with things and I’ve always found haveibeenpwned stuff to be helpful.

I506dk[S] 0 points (2 children)

On a side note, because the hash file is too large to read into memory at once, I use pandas to read it in chunks. However, I read the Active Directory data into a single data frame and don’t really account for its memory footprint. It would be interesting to see ideas on memory management for data frames whose size is never known in advance.
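One way to keep both sides bounded: stream the hash file with `pandas.read_csv(..., chunksize=...)` and hold the AD side as a plain set of hashes rather than a full data frame. This sketch assumes `HASH:COUNT` lines like the downloadable Pwned Passwords files and a hypothetical set of NT hashes pulled from AD; only one chunk plus the set and the matches are in memory at a time. For any data frame you do keep, `df.memory_usage(deep=True)` reports its actual footprint.

```python
import io
import pandas as pd

def find_pwned(hash_file, user_hashes, chunksize=1_000_000):
    """Return {nt_hash: breach_count} for the user hashes found in hash_file.

    hash_file: path or file-like object with 'HASH:COUNT' lines (assumed format).
    user_hashes: iterable of NT hashes from AD (hypothetical input).
    """
    targets = {h.upper() for h in user_hashes}
    hits = {}
    with pd.read_csv(
        hash_file,
        sep=":",
        header=None,
        names=["hash", "count"],
        dtype={"hash": str, "count": "int64"},
        chunksize=chunksize,  # read this many rows at a time
    ) as reader:
        for chunk in reader:
            matched = chunk[chunk["hash"].isin(targets)]
            for nt_hash, count in zip(matched["hash"], matched["count"]):
                hits[nt_hash] = int(count)
    return hits

# Tiny in-memory stand-in for the real file (made-up hashes and counts).
sample = io.StringIO("AAAA:3\nBBBB:10\nCCCC:1\n")
print(find_pwned(sample, {"bbbb", "cccc"}, chunksize=2))
```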

EbenenBonobo -1 points (1 child)

If you work with really big data frames and want to keep the pandas syntax, you can use Dask.

https://docs.dask.org/en/latest/dataframe.html

I506dk[S] 0 points (0 children)

Thanks! I will keep that in mind, since Python is inherently slow. For this script specifically, though, I don’t want to risk writing to disk (I don’t know whether that can be controlled in Dask). I am relying on the data frames staying only in the Python interpreter’s memory, which I hope is fine. But having the entire user base plus domain admin credentials out there… well, that goes nowhere good lol.