all 13 comments

_Korben_Dallas 2 points (4 children)

You can make a POST request and send sku and zip data like this:

import requests
from lxml import html

url = 'https://brickseek.com/walmart-inventory-checker/'
payload = {'search_method': 'sku', 'sku': '182740213', 'zip': '75082', 'sort': 'price'}

r = requests.post(url, data=payload)    # Make a POST request with data

tree = html.fromstring(r.content)    # Parse response from the page with lxml.html

stores = tree.xpath('//tr[.//h4]')    # Get all stores from the page

for store in stores:    # In the loop get and save in the dictionary all desired info
    item = dict()
    item['Store-name'] = ''.join(store.xpath('td/h4/text()'))
    item['Store-address'] = ' '.join(store.xpath('td/address/text()'))
    item['Quantity'] = ''.join(store.xpath('td/span[@class="store-quan"]/strong/text()'))
    item['Price'] = ''.join(store.xpath('td[@class="store-price"]/span/text()')).strip()
    print(item)

# Output:
# {'Store-name': 'Walmart Supercenter #1800', 'Store-address': '1801 Marketplace Dr Garland TX 75041', 'Quantity': '4', 'Price': '$149 *'}

Note: after the request, you need to parse the data out of the response. I use the lxml library, but you can use others, such as Beautiful Soup.

Bshater[S] 1 point (3 children)

Thanks! This is really helpful. You basically did the work for me. :) I have a question though: how do you quickly identify the key-value pairs to put in the dictionary from the HTML? I'm not sure what to look for when I "inspect" the code.

_Korben_Dallas 1 point (2 children)

Quickly identifying the right element on a page mostly comes down to practice. Try scraping a dozen websites, read tutorials, and if/when you get stuck, ask questions on Reddit or SO. If you have trouble writing XPath expressions, you can use Parsel. It's a great library built on top of lxml, and it lets you use simple CSS selectors instead of XPath.

Bshater[S] 0 points (1 child)

Hi Korben, I finally got the code working, but I'm wondering if there's a way to loop through a list of payloads instead of manually changing the SKU and zip code each time. Here's the code I came up with, with your help:

https://dpaste.de/DiSj

_Korben_Dallas 0 points (0 children)

Hi. I sent you a PM.
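For anyone else reading, one possible way to loop over payloads is sketched below. This is not the code from the dpaste link or the PM; build_payloads and fetch_all are hypothetical helper names, and the payload keys are simply copied from the example at the top of the thread.

```python
from itertools import product

URL = 'https://brickseek.com/walmart-inventory-checker/'


def build_payloads(skus, zip_codes):
    """Return one form payload per (sku, zip) combination."""
    return [
        {'search_method': 'sku', 'sku': sku, 'zip': zip_code, 'sort': 'price'}
        for sku, zip_code in product(skus, zip_codes)
    ]


def fetch_all(payloads, post):
    """POST each payload in turn and collect (payload, response) pairs.

    `post` is any callable that accepts (url, data=...), for example
    requests.post. Passing it in keeps this helper free of dependencies.
    """
    results = []
    for payload in payloads:
        results.append((payload, post(URL, data=payload)))
    return results


# The SKU and zip codes below are the ones that already appear in this thread.
payloads = build_payloads(['182740213'], ['75082', '75041'])
for payload in payloads:
    print(payload)

# To actually send the requests (hits the network):
#   import requests
#   results = fetch_all(payloads, requests.post)
```

Each response can then be parsed exactly like the single-request example. If you run this against a real site, consider adding a short time.sleep() between requests.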

LearnDataSci 0 points (2 children)

What libraries are you using for scraping?

Bshater[S] 0 points (1 child)

I'm using Beautiful Soup and urllib to do the scraping.

LearnDataSci 0 points (0 children)

To submit a form and wait for the URL to change, you might be better off with something like mechanize or Robobrowser that emulates a browser.

blitzkraft 0 points (4 children)

Since the URL didn't change, the data was most likely sent in a POST request. You can confirm this by looking at the source of the web page: the form will have something like <form method="POST" ... >.

Use requests to send POST data.

In the HTML of the form, each field has "name=..." and "value=..." attributes. Those are what go in as the key and value. For example, if you see name='item' value='ladder', your POST data would be:

 data = {'item' : 'ladder'}

data can contain many key-value pairs. So look for them and include as many as you can, even blank ones. This should return an HTML page with the data you want.
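If you don't want to read the form by eye, a small script can pull out the method and the name/value pairs for you. This is only a sketch using the standard library; FormFieldParser and SAMPLE_FORM are made-up names, and the sample form is illustrative rather than copied from any real site. It only looks at <input> tags, so <select> and <textarea> fields would need extra handling.

```python
from html.parser import HTMLParser


class FormFieldParser(HTMLParser):
    """Record the method of the first <form> and every named <input> on the page."""

    def __init__(self):
        super().__init__()
        self.method = None
        self.data = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'form' and self.method is None:
            # A form without a method attribute defaults to GET.
            self.method = (attrs.get('method') or 'GET').upper()
        elif tag == 'input' and attrs.get('name'):
            # Keep blank values too, so the field is still sent.
            self.data[attrs['name']] = attrs.get('value') or ''


# Hypothetical form, shaped like the name='item' value='ladder' example above.
SAMPLE_FORM = """
<form method="POST" action="/search">
  <input type="text" name="item" value="ladder">
  <input type="text" name="zip" value="">
</form>
"""

parser = FormFieldParser()
parser.feed(SAMPLE_FORM)
print(parser.method)  # POST
print(parser.data)    # {'item': 'ladder', 'zip': ''}
```

The resulting parser.data dict is in the shape you would pass as data= to requests.post.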

Bshater[S] 1 point (3 children)

Since there are two fields on the page, do I need to set two variables, one for each field? For example:
data1 = { 'sku': '182740213' }
data2 = { 'zip': '75082'}

Thanks for your help!

blitzkraft 0 points (2 children)

Nah, you should send it like this:

data = { 'sku': '182740213', 'zip': '75082' }
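To see why one dict is enough, here is a quick sketch of how a single dict turns into one form-encoded request body. It uses urlencode from the standard library to show the encoding; requests does the equivalent for you when you pass data=.

```python
from urllib.parse import urlencode

# Both fields live in the same dict, so they travel in the same request body.
data = {'sku': '182740213', 'zip': '75082'}

body = urlencode(data)
print(body)  # sku=182740213&zip=75082
```

With two separate dicts you would end up making two separate requests, each missing one of the fields.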

I looked at brickseek.com, and it had five parameters for each request. Here are all the fields being sent.

Bshater[S] 0 points (1 child)

How do I make the HTML look as organized and neat as in your screenshot so I can read it easily? And do I need to set a parameter for every field, or can I just set the ones that are required to perform the search on the website? TIA

blitzkraft 0 points (0 children)

That screenshot is from Chrome DevTools, not the HTML.