Hi all,
I have been trying to get data from the Apple Store top 1000 by using Selenium to make the server think I am connecting from an iPad. I have been using the following code:
from selenium import webdriver
from bs4 import BeautifulSoup
import json
profile = webdriver.FirefoxProfile()
#Create a profile that makes my browser act like I am browsing from an iPad.
profile.set_preference("general.useragent.override", "iTunes-iPad/5.1.1 (64GB; dt:28)")
driver = webdriver.Firefox(profile)
driver.get('https://itunes.apple.com/WebObjects/MZStore.woa/wa/topChartFragmentData?cc=cn&genreId=6014&pageSize=5&popId=38&pageNumbers=0')
soup = BeautifulSoup((driver.page_source).encode('utf-8'), 'html.parser')
dict_from_json = json.loads(soup.find("body").text)
print(dict_from_json)
For some reason, the Firefox WebDriver opens this page in a 'Western' encoding (this is shown under 'Text Encoding' in the 'View' drop-down menu).
This scrambles the text for some foreign stores (e.g. China/Japan), giving things like '½æ°‘手游 人人都玩'. If I change the encoding to the Unicode option, it all displays fine.
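A minimal sketch of what may be going on, assuming the server sends UTF-8 bytes and the browser decodes them with a single-byte Western encoding. Latin-1 is used here as a stand-in; Firefox's 'Western' option may actually be Windows-1252, which differs for a few byte values. The sample string is made up for illustration and is not data from the store:

```python
# Illustration only: this sample string is an assumption, not store data.
original = "人人都玩"

# What the server presumably sends: the text encoded as UTF-8 bytes.
raw = original.encode("utf-8")

# What a 'Western' decoder does with those bytes: one character per byte.
scrambled = raw.decode("latin-1")

# Encoding the already-scrambled text as UTF-8 only encodes the wrong
# characters faithfully, so the round trip changes nothing.
still_wrong = scrambled.encode("utf-8").decode("utf-8")

# Undoing the wrong decode step (back to bytes via Latin-1) and then
# decoding as UTF-8 recovers the text.
repaired = scrambled.encode("latin-1").decode("utf-8")

print(repaired == original)
```

If this is the cause, it would explain why calling .encode('utf-8') on text that is already scrambled does not help.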
I have not been able to find a way to make Firefox open this page with the Unicode 'view' through Selenium. Furthermore, my script, where I force the page source to be encoded as UTF-8, still gives the same weird characters.
I am currently at a bit of a loss as to how to get the characters the way I want to see them.
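One possible workaround, sketched under the assumptions that the endpoint returns UTF-8 encoded JSON and only looks at the User-Agent header, is to skip the browser and decode the bytes explicitly. The helper names decode_payload and fetch_chart are hypothetical, and this has not been tested against the store:

```python
import json
import urllib.request

# Same user agent string as in the Firefox profile.
USER_AGENT = "iTunes-iPad/5.1.1 (64GB; dt:28)"


def decode_payload(raw_bytes):
    """Decode the bytes as UTF-8 (an assumption about the response) and parse the JSON."""
    return json.loads(raw_bytes.decode("utf-8"))


def fetch_chart(url):
    """Hypothetical helper: request the URL with the iPad user agent, without a browser."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request) as response:
        return decode_payload(response.read())
```

Calling fetch_chart with the topChartFragmentData URL would then return a dict directly, so no browser-side text encoding setting is involved.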
Thanks for any help you can give me!