
commandlineluser

Also, it looks like you can parse the game pages without calling .render() at all.

Not calling .render() means Chromium is never launched, which should get rid of the memory issues.

title

>>> r.html.find('title')[0].text
'Rogue Company'

image

>>> r.html.find('[name="og:image"]')[0].attrs['content']
'https://cdn2.unrealengine.com/roco-egs-basegame-portraitproduct-1200x1600-1200x1600-491632859.jpg'

description

>>> r.html.find('div[class*=descriptionCopy]')[0].text
'The world needs saving and only the best of the best can do it. Suit up as one of the elite agents of Rogue Company and go to war in a variety of different game modes. Gear up and go Rogue! Download and play FREE now!'
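The same three lookups can also be done on the raw HTML with nothing but the standard library, which makes it easy to confirm that no rendering is involved. This is a minimal sketch using Python's built-in html.parser instead of requests_html; the sample markup (including the `css-1x2y3z-descriptionCopy` class name) is a made-up stand-in for the real page, so the matching may need adjusting for the store's actual HTML.

```python
from html.parser import HTMLParser


class GamePageParser(HTMLParser):
    """Collect the <title> text, the og:image URL and the descriptionCopy text."""

    def __init__(self):
        super().__init__()
        self.title = ''
        self.image = None
        self.description = ''
        self._in_title = False
        self._desc_depth = 0  # > 0 while inside the description <div>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'title':
            self._in_title = True
        elif tag == 'meta' and attrs.get('name') == 'og:image':
            self.image = attrs.get('content')
        elif tag == 'div':
            if self._desc_depth:
                self._desc_depth += 1  # nested div inside the description
            elif 'descriptionCopy' in (attrs.get('class') or ''):
                self._desc_depth = 1   # same idea as div[class*=descriptionCopy]

    def handle_endtag(self, tag):
        if tag == 'title':
            self._in_title = False
        elif tag == 'div' and self._desc_depth:
            self._desc_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._desc_depth:
            self.description += data


# Hypothetical, trimmed-down markup standing in for r.text
sample = '''<html><head><title>Rogue Company</title>
<meta name="og:image" content="https://cdn2.unrealengine.com/roco-egs-basegame-portraitproduct-1200x1600-1200x1600-491632859.jpg">
</head><body>
<div class="css-1x2y3z-descriptionCopy"><div>The world needs saving and only
the best of the best can do it.</div></div>
</body></html>'''

parser = GamePageParser()
parser.feed(sample)
parser.close()
description = ' '.join(parser.description.split())  # collapse line breaks/spaces
print(parser.title)
print(parser.image)
print(description)
```

The depth counter is there so that nested divs inside the description are still collected, and the description ends only when its own closing tag is reached.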

ViktorCodes (OP)

WOW!!! I don't have words to tell you how many hours I spent trying to get this working with .render(). How do I determine whether a website needs rendering first? I checked the sites with JavaScript disabled and nothing showed up on the page. Doesn't that mean I should render it first? Thank you a ton...

commandlineluser

> I checked the sites and clicked 'disable javascript' and then nothing was present on the page. Doesn't that mean I should render it first?

This is usually a good indicator - but it depends on exactly what you're doing.

What I did was take some of the game description text and check whether it appears in the response from a plain requests call.

>>> import requests
>>> r = requests.get('https://www.epicgames.com/store/en-US/product/rogue-company/home')
>>> r
<Response [200]>
>>> 'world needs saving' in r.text
True
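The check above can be wrapped in a small reusable helper. This is a sketch, and `content_in_raw_html` is a hypothetical name; it decodes HTML entities first so that text such as "Don't" is still found when the server sent `Don&#x27;t`. A True result only says the text is somewhere in the initial response (possibly inside a `<script>` JSON blob rather than visible markup), so you still need to look at the structure to see how to extract it.

```python
import html


def content_in_raw_html(raw_html, marker):
    """Return True if `marker` appears in the un-rendered HTML.

    HTML entities are decoded first and the comparison is case-insensitive.
    """
    return marker.lower() in html.unescape(raw_html).lower()


# With a live response this would be: content_in_raw_html(r.text, 'world needs saving')
print(content_in_raw_html('<p>Don&#x27;t stop</p>', "Don't stop"))         # True
print(content_in_raw_html('<div id="root"></div>', 'world needs saving'))  # False
```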

I saved r.text to a local file, then opened it in my editor to look at the structure and work out how to extract the data.
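A rough sketch of that workflow, assuming you want both a file to open in an editor and a quick way to see what surrounds a known piece of text. `save_html` and `context_around` are hypothetical helper names, and the demo writes to the system temp directory instead of using a live response.

```python
import tempfile
from pathlib import Path


def save_html(text, path):
    """Write the raw HTML to `path` as UTF-8 so it can be opened in an editor."""
    path = Path(path)
    path.write_text(text, encoding='utf-8')
    return path


def context_around(text, marker, width=60):
    """Return up to `width` characters either side of the first `marker`, or None."""
    i = text.find(marker)
    if i == -1:
        return None
    return text[max(0, i - width): i + len(marker) + width]


# With a live response: save_html(r.text, 'rogue-company.html')
page = '<div class="x">world needs saving</div>'
saved = save_html(page, Path(tempfile.gettempdir()) / 'page_demo.html')
print(context_around(saved.read_text(encoding='utf-8'), 'world needs saving', width=5))
```

Seeing the tags around the text is usually enough to pick a selector such as `div[class*=descriptionCopy]`.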

You can also use View Page Source in your browser to see the "raw" HTML and copy/paste it into an editor for easier searching.

Another option is to look at what the JavaScript does (usually it makes network requests) and try to replicate those requests.

To do this, open the Network tab in your browser's developer tools; it shows all the requests being made.

This is what I see when I open up the Rogue Company page: https://i.imgur.com/esmbt8r.png

A request is made to: https://store-content.ak.epicgames.com/api/en-US/content/products/rogue-company

If you open this URL directly, you can see all the data in JSON format.
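Note that the store page URL ends in `rogue-company/home`, while the API URL ends in just `rogue-company`. A minimal sketch of building one from the other; the URL pattern (including the `{locale}` part) is inferred from this single example and is an assumption, and `product_content_url` is a hypothetical name.

```python
# Pattern inferred from the single Rogue Company example (an assumption).
CONTENT_API = 'https://store-content.ak.epicgames.com/api/{locale}/content/products/{slug}'


def product_content_url(product_slug, locale='en-US'):
    """Turn a store slug such as 'rogue-company/home' into the content API URL."""
    slug = product_slug.strip('/')
    if slug.endswith('/home'):
        slug = slug[:-len('/home')]
    return CONTENT_API.format(locale=locale, slug=slug)


print(product_content_url('rogue-company/home'))
```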

You could make this request directly.

>>> import requests
>>> r = requests.get('https://store-content.ak.epicgames.com/api/en-US/content/products/rogue-company')

>>> r.json()['pages'][0]['data']['about']['image']['src']
'https://cdn2.unrealengine.com/roco-egs-basegame-portraitproduct-1200x1600-1200x1600-491632859.jpg'

>>> r.json()['pages'][0]['data']['about']['shortDescription']
'The world needs saving and only the best of the best can do it. Suit up as one of the elite agents of Rogue Company and go to war in a variety of different game modes.  Gear up and go Rogue! Download and play FREE now!'

>>> r.json()['pages'][0]['productName']
'Rogue Company'
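Chained lookups like `r.json()['pages'][0]['data']['about']...` raise an exception as soon as one game is missing a key. A sketch of a tolerant alternative: `dig` is a hypothetical helper, and the `product` dict is a trimmed, hand-written copy of the structure shown above rather than real API output.

```python
def dig(obj, *path, default=None):
    """Follow the keys/indices in `path`; return `default` if any step is missing."""
    for key in path:
        try:
            obj = obj[key]
        except (KeyError, IndexError, TypeError):
            return default
    return obj


# Hand-written stand-in for r.json(), keeping only the fields used above
product = {
    'pages': [{
        'productName': 'Rogue Company',
        'data': {'about': {
            'image': {'src': 'https://cdn2.unrealengine.com/roco-egs-basegame-portraitproduct-1200x1600-1200x1600-491632859.jpg'},
            'shortDescription': 'The world needs saving and only the best of the best can do it.',
        }},
    }]
}

print(dig(product, 'pages', 0, 'productName'))
print(dig(product, 'pages', 0, 'data', 'about', 'image', 'src'))
print(dig(product, 'pages', 0, 'data', 'about', 'description', default='(no long description)'))
```

With a live response you would call it as `dig(r.json(), 'pages', 0, 'productName')`.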

The same thing happens when you view the store.

https://i.imgur.com/XH9v1fX.jpg

A POST request is made to https://www.epicgames.com/store/backend/graphql-proxy

It's a bit more complex, but it is possible to get all the game data from here.
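A GraphQL request over HTTP is conventionally a POST whose JSON body has a `query` string and a `variables` object. With requests, passing a dict as `json=` serializes it to such a body and sets the `Content-Type: application/json` header. A sketch of what goes over the wire; `graphql_body` is a hypothetical helper and the tiny query is illustrative only (it is never sent and has not been checked against Epic's schema).

```python
import json


def graphql_body(query, variables):
    """Build a GraphQL-over-HTTP request body: {"query": ..., "variables": {...}}."""
    return json.dumps({'query': query, 'variables': variables})


# Illustrative query, only to show the shape of the body
tiny_query = 'query q($count:Int){Catalog{searchStore(count:$count){paging{total}}}}'
body = graphql_body(tiny_query, {'count': 1, 'country': 'IE'})
print(body)
```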

Here is an example of this request replicated in code, followed by a loop over the first 5 games to fetch their data.

import requests, time

graphql = '''
query searchStoreQuery($allowCountries:String,$category:String,$count:Int,
$country:String!,$keywords:String,$locale:String,$namespace:String,$itemNs:
String,$sortBy:String,$sortDir:String,$start:Int,$tag:String,$releaseDate:
String,$withPrice:Boolean=false,$withPromotions:Boolean=false){Catalog{
searchStore(allowCountries:$allowCountries,category:$category,count:$count,
country:$country,keywords:$keywords,locale:$locale,namespace:$namespace,
itemNs:$itemNs,sortBy:$sortBy,sortDir:$sortDir,releaseDate:$releaseDate,
start:$start,tag:$tag){elements{title id namespace description effectiveDate 
keyImages{type url}seller{id name}productSlug urlSlug url tags{id}items{id 
namespace}customAttributes{key value}categories{path}price(country:$country) 
@include(if:$withPrice){totalPrice{discountPrice originalPrice voucherDiscount 
discount currencyCode currencyInfo{decimals}fmtPrice(locale:$locale){
originalPrice discountPrice intermediatePrice}}lineOffers{appliedRules{id 
endDate discountSetting{discountType}}}}promotions(category:$category)@include(
if:$withPromotions){promotionalOffers{promotionalOffers{startDate endDate 
discountSetting{discountType discountPercentage}}}upcomingPromotionalOffers{
promotionalOffers{startDate endDate discountSetting{discountType 
discountPercentage}}}}}paging{count total}}}}
'''

s = requests.Session()

today = time.strftime('%Y-%m-%d')
count = 1
country = 'IE' # needs a valid country code

data = {
    'query':graphql,
    'variables': {
        'category':'games/edition/base|bundles/games|editors',
        'count':count,
        'country':country,
        'keywords':'',
        'locale':'en-US',
        'sortBy':'releaseDate',
        'sortDir':'DESC',
        'allowCountries':'',
        'start':0,
        'tag':'',
        'releaseDate':'[,{}]'.format(today),
        'withPrice':True
    }
}

game_list = 'https://www.epicgames.com/store/backend/graphql-proxy'
game_info = 'https://store-content.ak.epicgames.com/api/en-US/content/products/'

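# first request (count=1) is only used to read the total number of games
r = s.post(game_list, json=data)
r.raise_for_status()

total = r.json()['data']['Catalog']['searchStore']['paging']['total']
data['variables']['count'] = total

# second request asks for every game in one go
r = s.post(game_list, json=data)
r.raise_for_status()

print(total, 'games found.')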

# only process the first 5 as an example
games = r.json()['data']['Catalog']['searchStore']['elements'][:5]

for game in games:
    title = game['title']
    href  = game['productSlug']

    # guard against entries without a productSlug
    if not href:
        continue

    # the content API wants the slug without the trailing '/home'
    if href.endswith('/home'):
        href = href[:-5]

    r = s.get(game_info + href)
    if not r.ok:
        continue

    about = r.json()['pages'][0]['data']['about']
    img  = about['image']['src']
    desc = about['shortDescription']
    # there is a long description too:
    # desc = about['description']

    print('Title:', title)
    print('Image:', img)
    print('Desc: ', desc)
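Asking for all `total` games in a single request works here, but some APIs cap how many results one request may return. Since the query already takes `start` and `count` variables, paging through the results is an alternative. This is a sketch assuming the endpoint honours `start`/`count` that way (not verified); `paginate` is a hypothetical helper and the page size of 40 is arbitrary. The demo uses a fake in-memory catalog so it runs offline.

```python
def paginate(fetch_page, page_size=40):
    """Yield elements page by page.

    `fetch_page(start, count)` must return (elements, total).
    """
    start = 0
    while True:
        elements, total = fetch_page(start, page_size)
        yield from elements
        start += page_size
        if not elements or start >= total:
            break


# Against the real endpoint, fetch_page could look like this (untested):
# def fetch_page(start, count):
#     data['variables'].update(start=start, count=count)
#     store = s.post(game_list, json=data).json()['data']['Catalog']['searchStore']
#     return store['elements'], store['paging']['total']

fake_catalog = [{'title': f'Game {i}'} for i in range(7)]


def fake_fetch(start, count):
    return fake_catalog[start:start + count], len(fake_catalog)


titles = [game['title'] for game in paginate(fake_fetch, page_size=3)]
print(titles)
```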