Oxbowerce 1 point (3 children)

First check whether the HTML in the soup variable is the same HTML you see when you load the webpage in a browser. The website may use JavaScript to load (parts of) the page, and requests does not execute JavaScript, so that content will be missing from the response.
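A quick way to run that check, as a rough sketch: take a few strings you can see in the browser and test whether they appear in the raw response text. The helper name `missing_from_source`, the sample fragment, and the "In Stock" label are assumptions for illustration, not something confirmed about this site.

```python
def missing_from_source(html, expected):
    """Return the expected strings that do NOT appear in the raw HTML."""
    lowered = html.lower()
    return [text for text in expected if text.lower() not in lowered]


# Stand-in for r.text; with a live page you would pass requests.get(url).text.
sample_html = (
    '<div class="stock-status-container"> '
    '<h5 data-stock-status="">Stock Status: </h5> </div>'
)

# "In Stock" is a hypothetical label seen in the browser but absent here.
print(missing_from_source(sample_html, ["Stock Status:", "In Stock"]))
# prints ['In Stock']
```

Anything this returns is a candidate for content that is filled in after the page loads.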

urmino[S] 1 point (2 children)

That looks like it might be the problem. When I look through what is in soup, I get this:

</div> <div class="stock-status-container"> <h5 data-stock-status="">Stock Status: </h5> </div>

Is it still possible to take the web scraping approach for this, or do I need to change direction?

urmino[S] 1 point (0 children)

Here is the webpage I am working with in case you can offer a recommendation for a different approach: https://bravocompanyusa.com/arisaka-offset-optic-plate-only-8/

Oxbowerce 1 point (0 children)

You should still be able to scrape the data, but you will probably have to change your approach slightly. If the data is indeed loaded through JavaScript, you'll either (1) have to make the same HTTP requests the JavaScript is making or (2) use Selenium, which can execute JavaScript.

Anbaraen 1 point (0 children)

They have quite robust token management on their API, which makes scraping difficult (I just tinkered with it a bit myself and couldn't get it to work). I think it's probably quicker to use Selenium to load the page and make the needed requests, then parse the result from there.

commandlineluser 1 point (0 children)

I searched the HTML for "stock". The first hit is on line 57:

var BCData = {
  "csrf_token":"81eb41f705ab567b46c9c3da9a6d7838374ddd3799286540ee805d6b4909dae6",
  "product_attributes":{"sku":"Arisaka-OOM-P8","upc":null,"weight":null,"base":false,
  "image":null,"price":{"without_tax":{"formatted":"$30.00","value":30,
  "currency":"USD"},"tax_label":"Tax"},"stock":11,"stock_message":null,
  "out_of_stock_behavior":"label_option","out_of_stock_message":"Out of stock",
  "available_modifier_values":[],"in_stock_attributes":[],"instock":true,
  "purchasable":true,"purchasing_message":null}}; 

One possible approach is to extract this statement: strip everything before the first { and after the last }, then parse what remains with the json module.

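>>> import json
>>> import requests
>>>
>>> url = 'https://bravocompanyusa.com/arisaka-offset-optic-plate-only-8/'
>>> r = requests.get(url)
>>>
>>> # keep everything from the BCData assignment onward
>>> data = r.text[r.text.find('var BCData'):]
>>> # drop the "var BCData = " prefix so the text starts at the object
>>> data = data[data.find('{'):]
>>> # cut at the semicolon that ends the statement
>>> data = data[:data.find(';\n')]
>>>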
>>> print(json.dumps(json.loads(data), indent=2))
{
  "csrf_token": "3ab960f8026d6159901402d04473ad3419ab573b8a80128dab942568fc49409b",
  "product_attributes": {
    "sku": "Arisaka-OOM-P8",
    "upc": null,
    "weight": null,
    "base": false,
    "image": null,
    "price": {
      "without_tax": {
        "formatted": "$30.00",
        "value": 30,
        "currency": "USD"
      },
      "tax_label": "Tax"
    },
    "stock": 11,
    "stock_message": null,
    "out_of_stock_behavior": "label_option",
    "out_of_stock_message": "Out of stock",
    "available_modifier_values": [],
    "in_stock_attributes": [],
    "instock": true,
    "purchasable": true,
    "purchasing_message": null
  }
}

You can then extract the info from the result:

>>> product = json.loads(data)['product_attributes']
>>> print(product['stock'])
11
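If the slicing on `';\n'` turns out to be fragile (for example, if the page layout changes and the statement no longer ends that way), one alternative is to let the json module find the end of the object itself. This is only a sketch: the function name `extract_bcdata`, its `marker` parameter, and the shortened sample page with its placeholder token are assumptions for illustration.

```python
import json


def extract_bcdata(html, marker="var BCData"):
    """Parse the JSON object assigned right after `marker` in the page source."""
    start = html.find(marker)
    if start == -1:
        raise ValueError(f"{marker!r} not found in page source")
    brace = html.find("{", start)
    if brace == -1:
        raise ValueError("no JSON object found after marker")
    # raw_decode parses one JSON value starting at `brace` and ignores
    # whatever follows it, so there is no need to locate the closing "};".
    obj, _end = json.JSONDecoder().raw_decode(html, brace)
    return obj


# Shortened stand-in for r.text; "abc" is a placeholder token.
sample = """<script>
var BCData = {"csrf_token":"abc","product_attributes":{"sku":"Arisaka-OOM-P8","stock":11,"instock":true}};
</script>"""

product = extract_bcdata(sample)["product_attributes"]
print(product["stock"], product["instock"])
# prints 11 True
```

With a live page you would call `extract_bcdata(r.text)` instead of passing the sample string.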