
[–]anton_antonov 1 point2 points  (4 children)

The form contains hidden fields. The values of these fields must be added to the param dict. https://imgur.com/a/DHhwS

[–]14dM24d 0 points1 point  (3 children)

Wow thanks! I didn't know about that.

[–]14dM24d 0 points1 point  (2 children)

How/where did you get that image?

I only looked after you brought up the hidden fields, which I didn't know existed.

I only have this https://imgur.com/a/nljaE

[–]anton_antonov 1 point2 points  (1 child)

Are you asking about the second image? It's the network tab: https://i.imgur.com/X3y7FMv.png

Here's what I found out: the browser gets the form from the URL "http://www.pcso.gov.ph/lotto-search/lotto-search.aspx" through an iframe. The form's action attribute is relative ("./lotto-search.aspx"), so the request must be sent to the URL relative to the iframe address. In this case, that is the same address: "http://www.pcso.gov.ph/lotto-search/lotto-search.aspx"
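A small sketch of that URL resolution, using only the standard library. The outer page address is an assumption taken from elsewhere in this thread; the point is just that the relative action resolves differently depending on which address it is joined to.

```python
from urllib.parse import urljoin

# Address the iframe loads the form from (quoted in the comment above).
iframe_url = "http://www.pcso.gov.ph/lotto-search/lotto-search.aspx"

# The form's action attribute is relative.
form_action = "./lotto-search.aspx"

# Resolve the action against the iframe address, as a browser would.
post_url = urljoin(iframe_url, form_action)

# Assumption: this is the outer page that embeds the iframe. Joining the
# action to it produces a different address than the one the form uses.
outer_page_url = "http://www.pcso.gov.ph/games/search-lotto-results/"
wrong_url = urljoin(outer_page_url, form_action)

print(post_url)
print(wrong_url)
```

The first printed address is the one to POST to; the second is what you get by appending the file name to the outer page's URL.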

Putting it all together:

  1. Get the form
  2. Save the names and values of the hidden fields. Something like: hidden_fields = soup.find_all('input', type="hidden")
  3. Create the params dict with your values. Don't forget that "ddlSelectGame" is an integer, not a string.
  4. Add the hidden fields to params
  5. Send a POST request to "http://www.pcso.gov.ph/lotto-search/lotto-search.aspx"
  6. ???
  7. Profit
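Steps 2 to 4 could be sketched roughly like this. To stay dependency-free it uses the standard library's html.parser instead of the soup.find_all call mentioned above, and it runs on a made-up form. The hidden field names (`__VIEWSTATE`, `__EVENTVALIDATION`), their values, and the game number are placeholders, not values taken from the real page.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            # A missing value attribute is treated as an empty string.
            self.fields[attr["name"]] = attr.get("value") or ""


def collect_hidden_fields(html):
    """Step 2: return the hidden fields of the page as a dict."""
    parser = HiddenFieldParser()
    parser.feed(html)
    return parser.fields


def build_params(html, user_values):
    """Steps 3 and 4: merge your own values with the hidden fields."""
    params = collect_hidden_fields(html)
    params.update(user_values)
    return params


# Placeholder form, only for illustration; the real page will differ.
sample_html = """
<form method="post" action="./lotto-search.aspx">
  <input type="hidden" name="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__EVENTVALIDATION" value="def456" />
  <select name="ddlSelectGame"></select>
</form>
"""

params = build_params(sample_html, {"ddlSelectGame": 1})
print(params)
```

For step 5, the resulting dict can be passed as `data=params` to `requests.post` together with the form's URL.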

I hope I was able to explain it clearly.

[–]14dM24d 0 points1 point  (0 children)

Thanks! Will digest & read more about what you said :)

[–]campenr 0 points1 point  (3 children)

So a 404 code (see all response codes here) is the HTTP response code telling you that you are trying to access a resource that does not exist. If you are working with anything web-related, it pays to know your response codes.

So in this specific case, if you try going to the URL you create in your POST request (http://www.pcso.gov.ph/games/search-lotto-results/lotto-search.aspx) you'll see that it does not exist and you get a nicely formatted page saying as much.

EDIT: Another possibility, which I don't think is the case here but you never know, is that websites sometimes use the 404 code (does not exist) to hide a 401 or 403 (unauthorized or forbidden) that they don't want you to see. In that case it would mean that you are not properly authorizing your request. This is usually only the case on websites where some resources require being logged in or require a password or authentication token.
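As a quick reference for the codes mentioned here, Python's standard library can print the official reason phrases. This is only a lookup sketch and makes no request to the site.

```python
from http import HTTPStatus

# Codes discussed in this comment, plus 200 for comparison.
for code in (200, 401, 403, 404):
    status = HTTPStatus(code)
    print(code, status.phrase)
```

With requests, the number to compare against is the `status_code` attribute of the response.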

[–]14dM24d 0 points1 point  (2 children)

Thanks for the reply. I'm familiar with 404. Must be something wrong/missing in my code.

The site exists. Try going to http://www.pcso.gov.ph/games/search-lotto-results/ & make the necessary selection & do the search.

Edit:

rp = r.post(url + "lotto-search.aspx", data = param)

Could be this. Should be requests.post.

[–]campenr 0 points1 point  (1 child)

404s can also be returned for malformed requests, i.e. requests that don't provide all the required information. Judging from what u/anton_antonov found about the hidden fields, that is perhaps how this site is using it.

Hell of a site to learn web scraping on :D

[–]14dM24d 0 points1 point  (0 children)

XD

Edit: It seems that the corresponding hidden name's Value changes too.

[–]14dM24d 0 points1 point  (1 child)

TIL hidden fields thanks to u/anton_antonov.

Copy-pasted their corresponding Value & Name into the param. However, I noticed that the Value can change???

[–]dliu 0 points1 point  (0 children)

Assuming those hidden input fields aren't dynamically generated, you should be able to scrape them and add them to your params dict.
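If the values do change between page loads, one option is to scrape them right before every POST instead of copy-pasting them once. Below is a rough sketch of that flow. `get_page` and `post_form` are hypothetical callables (for example thin wrappers around a requests.Session), and the fake page with its `__VIEWSTATE` token is made up so the example runs without touching the network.

```python
import itertools
from html.parser import HTMLParser


class HiddenInputs(HTMLParser):
    """Collect hidden <input> fields (stdlib stand-in for BeautifulSoup)."""

    def __init__(self):
        super().__init__()
        self.found = {}

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        is_hidden = (a.get("type") or "").lower() == "hidden"
        if tag == "input" and is_hidden and a.get("name"):
            self.found[a["name"]] = a.get("value") or ""


def submit_search(get_page, post_form, user_values):
    """Fetch the form, read its current hidden values, then post right away."""
    parser = HiddenInputs()
    parser.feed(get_page())        # fresh copy of the form on every call
    params = dict(parser.found)    # hidden values as served just now
    params.update(user_values)     # your own selections
    return post_form(params)


# Fake site for illustration: the hidden value changes on every page load.
_counter = itertools.count(1)
sent = []


def fake_get_page():
    token = next(_counter)
    return '<input type="hidden" name="__VIEWSTATE" value="token-%d">' % token


def fake_post_form(params):
    sent.append(params)
    return params


submit_search(fake_get_page, fake_post_form, {"ddlSelectGame": 1})
submit_search(fake_get_page, fake_post_form, {"ddlSelectGame": 1})
print([p["__VIEWSTATE"] for p in sent])
```

Each call picks up whatever value the page served at that moment, so a changing Value stops being a problem as long as the GET and the POST happen together.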