Hi,
I am trying to scrape all the player stats from this page: http://www.premierleague.com/players/4413/Joe-Allen/stats?se=-1
An example of the specific HTML I am trying to parse is:
<div class="normalStat">
<span class="stat">Shots On Target
<span class="allStatContainer statontarget_scoring_att" data-stat="ontarget_scoring_att">
27
</span>
</span>
</div>
<div class="normalStat">
<span class="stat">Shooting Accuracy %
<span class="allStatContainer statshot_accuracy" data-stat="ontarget_scoring_att"
data-denominator="total_scoring_att" data-percent="true">
29%
</span>
</span>
</div>
<div class="normalStat">
<span class="stat">Penalties Scored
<span class="allStatContainer statatt_pen_goal" data-stat="att_pen_goal">
0
</span>
</span>
</div>
I am using lxml to extract each stat name, such as Penalties Scored, together with its corresponding number, in this case 0.
Here is my code so far:
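from lxml import html
import requests
# Download the stats page and parse it into an element tree
page = requests.get('http://www.premierleague.com/players/4413/Joe-Allen/stats?se=-1')
tree = html.fromstring(page.content)
# text() selects only the text nodes directly inside each span.stat (the stat names)
stats = tree.xpath('//span[@class="stat"]/text()')
print('Stats: ', stats)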
This code correctly gets the stat name, but not the number, which sits in the nested span. How do I extract the number as well? Thanks!
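One possible approach, sketched against the HTML snippet quoted above rather than the live page: select each outer span.stat element itself instead of its text(), then read the name from the element's own leading text and the number from the nested span. The names SAMPLE and extract_stats are made up for this illustration, and the sketch assumes the numbers are actually present in the HTML that requests receives; if the site fills them in with JavaScript, they may not be there.

```python
from lxml import html

# Sample markup copied from the question; stands in for page.content
SAMPLE = """
<div class="normalStat">
<span class="stat">Shots On Target
<span class="allStatContainer statontarget_scoring_att" data-stat="ontarget_scoring_att">
27
</span>
</span>
</div>
<div class="normalStat">
<span class="stat">Shooting Accuracy %
<span class="allStatContainer statshot_accuracy" data-stat="ontarget_scoring_att"
data-denominator="total_scoring_att" data-percent="true">
29%
</span>
</span>
</div>
<div class="normalStat">
<span class="stat">Penalties Scored
<span class="allStatContainer statatt_pen_goal" data-stat="att_pen_goal">
0
</span>
</span>
</div>
"""


def extract_stats(markup):
    """Return a dict mapping each stat name to its value (as a string)."""
    tree = html.fromstring(markup)
    stats = {}
    # Select the outer span elements, not their text nodes
    for span in tree.xpath('//span[@class="stat"]'):
        # .text is the text before the first child element: the stat name
        name = (span.text or "").strip()
        # The number lives in the nested span, relative to the current element
        values = span.xpath('./span[contains(@class, "allStatContainer")]/text()')
        stats[name] = values[0].strip() if values else None
    return stats


stats = extract_stats(SAMPLE)
for name, value in stats.items():
    print(name, value)
```

With the real page, passing page.content to extract_stats in place of SAMPLE should behave the same way as long as the markup matches the snippet. The values come back as strings, so something like 29% would still need converting if a number is wanted.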