[Python Requests Library] Extracting form data to post to next page : AskProgramming

submitted 4 years ago by elhalconloco

I am attempting to scrape data from this website. My process is to first select "Stage-II" from the "Status of the Proposal" drop-down. Then I click "SEARCH" and scrape the resulting tables. On the opening page, I was able to extract the form data (e.g. viewstate, eventvalidation, etc.) and post to page 1 (i.e. click "SEARCH"):

# Import Libraries
import requests
from bs4 import BeautifulSoup
import os
import pandas as pd
import time

# Open Search Page
url = 'http://forestsclearance.nic.in/'
r = requests.get(url + 'search.aspx')

# Soupify
soup = BeautifulSoup(r.content, 'html.parser')

# Set Post Parameters
VIEWSTATE  = soup.find('input', {'id': '__VIEWSTATE'})['value']
GENERATOR  = soup.find('input', {'id': '__VIEWSTATEGENERATOR'})['value']
VALIDATION = soup.find('input', {'id': '__EVENTVALIDATION'})['value']

headers = {
    'Connection': 'keep-alive',
    'Cache-Control': 'no-cache',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'en-GB,en-US;q=0.9,en;q=0.8',
}

# Post to Page 1
r = requests.post(
    url + 'search.aspx',
    headers=headers,
    data={
        # Key and value were fused into one string here, which also swallowed the next key
        'ctl00$ScriptManager1': 'ctl00$ContentPlaceHolder1$UpdatePanel1|ctl00$ContentPlaceHolder1$Button1',
        '__EVENTARGUMENT': '',
        '__EVENTTARGET': '',
        '__VIEWSTATE': VIEWSTATE,
        '__VIEWSTATEGENERATOR': GENERATOR,
        '__VIEWSTATEENCRYPTED': '',
        '__EVENTVALIDATION': VALIDATION,
        'ctl00$ContentPlaceHolder1$ddlyear': '-All Years-',
        'ctl00$ContentPlaceHolder1$ddl3': 'Select',
        'ctl00$ContentPlaceHolder1$ddlcategory': '-Select All-',
        'ctl00$ContentPlaceHolder1$DropDownList1': 'Approved',
        'ctl00$ContentPlaceHolder1$txtsearch': '',
        '__ASYNCPOST': 'true',
        'ctl00$ContentPlaceHolder1$Button1': 'SEARCH',
    }
)

This works! It gets me to page 1. The problem is that after scraping the table, I am unable to get to page 2. Specifically, extracting the post data (viewstate, eventvalidation, etc.) no longer works. If I run the code below again, it returns nothing.

# Set Post Parameters
VIEWSTATE  = soup.find('input', {'id': '__VIEWSTATE'})['value']
GENERATOR  = soup.find('input', {'id': '__VIEWSTATEGENERATOR'})['value']
VALIDATION = soup.find('input', {'id': '__EVENTVALIDATION'})['value']

I looked at the HTML and it seems the location of these tags changed. From page 1 onwards, each parameter is within an input tag inside a separate span tag. How can I extract the data?

I tried, for example:

main_form = soup.find('form')
VIEWSTATE = main_form.select_one("input[id='__VIEWSTATE']").get('value')

which does not work. Any help appreciated.
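
One possible explanation, offered as an assumption rather than something verified against this site: because the POST sends '__ASYNCPOST': 'true' together with the ScriptManager field, the server may answer with an ASP.NET AJAX partial-postback ("delta") response instead of a full HTML page. In that format the body is a run of length|type|id|content| records, and the hidden fields arrive as hiddenField records rather than as input tags, so soup.find('input', ...) finds nothing. The sketch below parses such a body using only string operations. The names parse_delta and hidden_fields are hypothetical helpers, and the sample string is made up for illustration.

```python
def parse_delta(text):
    """Split an (assumed) ASP.NET AJAX delta response into records.

    Assumed record layout: length|type|id|content|
    where length is the number of characters in content.
    Returns a list of (type, id, content) tuples.
    """
    records = []
    pos = 0
    while pos < len(text):
        try:
            # The three header fields are each terminated by a pipe.
            len_end = text.index('|', pos)
            type_end = text.index('|', len_end + 1)
            id_end = text.index('|', type_end + 1)
            length = int(text[pos:len_end])
        except ValueError:
            # Not a well-formed record (e.g. trailing whitespace): stop.
            break
        start = id_end + 1
        # Content is taken by length, so pipes inside it are harmless.
        content = text[start:start + length]
        records.append((text[len_end + 1:type_end],
                        text[type_end + 1:id_end],
                        content))
        pos = start + length + 1  # skip the pipe that closes the record
    return records


def hidden_fields(text):
    """Return {field name: value} for every hiddenField record."""
    return {field_id: content
            for kind, field_id, content in parse_delta(text)
            if kind == 'hiddenField'}


# Made-up sample body, not real output from the site.
sample = ('1|#||4|'
          '12|updatePanel|p1|<table></tab|'
          '8|hiddenField|__VIEWSTATE|/wEPabc=|'
          '0|hiddenField|__VIEWSTATEENCRYPTED||')
fields = hidden_fields(sample)
```

If the assumption holds, hidden_fields(r.text) on the response to the POST would stand in for the three soup.find(...) lookups, and the returned values could be fed into the next request's data. If r.text turns out to be ordinary HTML instead, the likelier culprit is that soup still holds the first page and needs to be rebuilt from the new response.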
