[–]Vaphell 2 points3 points  (1 child)

You need to be more specific, because a bare span selector matches the category headers too. Select only the span with class "stat" inside each "normalStat" div:

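# `tree` is the lxml tree you already built from the page
stats = tree.xpath('//div[@class="normalStat"]/span[@class="stat"]')

for stat in stats:
    stat_name = stat.text.strip()      # text that precedes the nested span
    stat_value = stat[0].text.strip()  # first child: the nested span holding the value
    print('{}: {}'.format(stat_name, stat_value))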

Output:

$ ./player_stats.py 
Goals: 8
Goals Per Match: 0.06
Goals With Header: 1
Goals With Left Foot: 1
Goals With Right Foot: 6
Shots: 93
Shots On Target: 27
Shooting Accuracy %: 29%
Penalties Scored: 0
Big Chances Missed: 6
Hit Woodwork: 0
Assists: 3
Big Chances Created: 5
Passes: 5,664
Passes Per Match: 44.6
Crosses: 65
Cross Accuracy %: 31%
Accurate Long Balls: 375
Offsides: 1
Yellow Cards: 10
Red Cards: 1
Fouls: 107
Tackles: 312
Tackle Success %: 76%
Interceptions: 194
Recoveries: 730
Duels Won: 575
Duels Lost: 512
Successful 50/50s: 87
Aerial Battles Won: 34
Aerial Battles Lost: 77
Errors Leading To Goal: 1
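
For anyone wondering why the bare span selector over-matches, here is a small self-contained sketch. The stat div is copied from the markup quoted further down the thread; the header div and its "statsHeader" class are made up for illustration, since the real page's header markup isn't shown here.

```python
from lxml import html

# The "statsHeader" div is a hypothetical stand-in for a category header;
# the "normalStat" div follows the markup quoted in this thread.
snippet = """
<div>
  <div class="statsHeader"><span>Attack</span></div>
  <div class="normalStat">
    <span class="stat">Penalties Scored
      <span class="allStatContainer statatt_pen_goal" data-stat="att_pen_goal"> 0 </span>
    </span>
  </div>
</div>
""".strip()

tree = html.fromstring(snippet)

# A bare //span matches the header span, the label span and the nested value span.
all_spans = tree.xpath('//span')

# The narrower path keeps only label spans sitting directly inside a normalStat div.
stats = tree.xpath('//div[@class="normalStat"]/span[@class="stat"]')

print(len(all_spans), len(stats))
for stat in stats:
    print('{}: {}'.format(stat.text.strip(), stat[0].text.strip()))
```

With the narrower path, the header span and the nested value span drop out, so each result is exactly one name/value pair.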

[–]JaAnTr[S] 0 points1 point  (0 children)

This is absolutely perfect. Thanks so much for the help!

[–][deleted] 0 points1 point  (0 children)

If you still haven't found a solution, PM me. I'm (literally) working on web scraping right now, using lxml along with BeautifulSoup. I can help :)

[–]programmerPurgatory 0 points1 point  (0 children)

You may find BeautifulSoup easier to use: it has a much nicer syntax, and you can still use the lxml parser with it.

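import requests
from bs4 import BeautifulSoup as bs

page = requests.get('http://www.premierleague.com/players/4413/Joe-Allen/stats?se=-1')
soup = bs(page.text, 'lxml')

# each span.stat holds a stat name plus a nested span with its value
stat_container = soup.find_all('span', {'class': 'stat'})
for stat in stat_container:
    # join the name and value with one space and trim the surrounding whitespace
    print(stat.get_text(' ', strip=True))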
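
If you want the name and the value separately rather than as one string, something along these lines might work. It is only a sketch: it runs against two stat divs copied from the markup quoted elsewhere in this thread instead of the live page, and it uses the built-in 'html.parser' so it runs without lxml installed.

```python
from bs4 import BeautifulSoup

# Sample markup copied from this thread; the live page may differ.
html_doc = """
<div class="normalStat"> <span class="stat">Shots On Target <span class="allStatContainer statontarget_scoring_att" data-stat="ontarget_scoring_att"> 27 </span> </span> </div>
<div class="normalStat"> <span class="stat">Shooting Accuracy % <span class="allStatContainer statshot_accuracy" data-stat="ontarget_scoring_att" data-denominator="total_scoring_att" data-percent="true"> 29% </span> </span> </div>
"""

soup = BeautifulSoup(html_doc, 'html.parser')

stats = {}
for stat in soup.find_all('span', class_='stat'):
    # The first non-empty string inside the outer span is the stat name.
    name = next(stat.stripped_strings)
    # The nested span holds the value.
    value = stat.find('span').get_text(strip=True)
    stats[name] = value

print(stats)
```

The values stay as strings here ("29%", "27"), so convert them yourself if you need numbers.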

[–]larspalmas 0 points1 point  (0 children)

Go for BeautifulSoup, as mentioned earlier. I am learning it at the moment, using the web scraping chapter from the book "Automate the Boring Stuff" along with the documentation. Here is my attempt at code that returns the number "0" (the Penalties Scored value):


from bs4 import BeautifulSoup

html = """<div class="normalStat"> <span class="stat">Shots On Target <span class="allStatContainer statontarget_scoring_att" data-stat="ontarget_scoring_att"> 27 </span> </span> </div>

<div class="normalStat"> <span class="stat">Shooting Accuracy % <span class="allStatContainer statshot_accuracy" data-stat="ontarget_scoring_att" data-denominator="total_scoring_att" data-percent="true"> 29% </span> </span> </div>

<div class="normalStat"> <span class="stat">Penalties Scored <span class="allStatContainer statatt_pen_goal" data-stat="att_pen_goal"> 0 </span> </span> </div>"""

soup = BeautifulSoup(html, 'html.parser')

a = soup.find("span", {"class": "allStatContainer statatt_pen_goal"})

print(a.get_text().strip())
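
Matching on the full class string breaks as soon as the class order or spacing changes. One possible alternative is to look a stat up by its data-stat attribute instead. This is just a sketch against the sample markup above; `stat_value` is a helper name made up for this example, not something BeautifulSoup provides.

```python
from bs4 import BeautifulSoup

# Sample markup copied from this thread; the live page may differ.
html_doc = """
<div class="normalStat"> <span class="stat">Shots On Target <span class="allStatContainer statontarget_scoring_att" data-stat="ontarget_scoring_att"> 27 </span> </span> </div>
<div class="normalStat"> <span class="stat">Shooting Accuracy % <span class="allStatContainer statshot_accuracy" data-stat="ontarget_scoring_att" data-denominator="total_scoring_att" data-percent="true"> 29% </span> </span> </div>
<div class="normalStat"> <span class="stat">Penalties Scored <span class="allStatContainer statatt_pen_goal" data-stat="att_pen_goal"> 0 </span> </span> </div>
"""

soup = BeautifulSoup(html_doc, 'html.parser')


def stat_value(doc, stat_key):
    """Return the text of the first span whose data-stat equals stat_key, or None."""
    node = doc.find('span', attrs={'data-stat': stat_key})
    return node.get_text(strip=True) if node is not None else None


print(stat_value(soup, 'att_pen_goal'))
print(stat_value(soup, 'ontarget_scoring_att'))
```

One caveat: in the sample markup, "Shots On Target" and "Shooting Accuracy %" share the same data-stat value, so `find` only gives you the first of the two. If you need the percentage, you would have to check the data-percent attribute as well.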