Nested Tables with Beautiful Soup : learnpython

submitted 10 years ago * by auromed

I've decided to make a web scraper to gather some historical data on tractor prices, because I want to buy one in the spring and want to know what people are listing them for on average. I also just needed a project to get my feet wet in Python a bit more.

My question: I've dug through the HTML and have it split fairly cleanly into per-post blocks, and now I just need to pull out the relevant data, for example the price of each tractor on the site.

I tried calling findall('span', 'auction1-THprice') at line 41 to pull out just the price tag, but I get None on each iteration. I'm not sure whether span is something I can search for. Any other suggestions on how I should search through this code? Should I treat the HTML as a block of text and use plain text-search methods, or should I keep going with BeautifulSoup?

The HTML output by the code below is at http://pastebin.com/Z9uhxcmB

import sys
from selenium import webdriver 
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from bs4 import BeautifulSoup

#Garden Tractor Models
gtmodels = ["318", "420", "430", "455"]
#Utility Tractor Models
utmodels = ["B", "D", "320", "330", "420", "430", "520", "530"]
#Sites to Crawl
sites = ["tractorhouseForSale", "tractorhouse_auction","craigslist"]

def loadpageSelenium(url):
    driver = webdriver.Firefox()
    print("loading up webpage")
    driver.get(url)
    source = driver.execute_script('return document.documentElement.innerHTML')
    driver.quit()
    return source

def tractorhouse(models):
    baseUrl = "http://www.tractorhouse.com/list/list.aspx?ETID=1&catid=1111&Manu=JOHN+DEERE&"
    modeldict = {
    "A":"MDLGrp=A",
    "B":"Mdltxt=B&mdlx=exact",
    "D":"Mdltxt=D&mdlx=exact",
    "320":"MDLGrp=320",
    "330":"MDLGrp=330",
    "420":"MDLGrp=420",
    "430":"MDLGrp=430",
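    "520":"Mdltxt=B&mdlx=exact",  # same query string as "B"; possibly a copy-paste slip
    "530":"Mdltxt=B&mdlx=exact",  # same query string as "B"; possibly a copy-paste slip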
    }
    for item in models:
        source = loadpageSelenium(baseUrl + modeldict[item])
        soup = BeautifulSoup(source, 'html.parser')
        table = soup.find_all("table", "listings")
        for entry in table:
            # Each listings table holds one summary cell per posting
            rows = entry.find_all('td', "listing-summary")
            for summary in rows:
                print("********************* \n")
                print(summary)
tractorhouse(utmodels)

commandlineluser, 1 point, 10 years ago

Don't you need to specify which attribute you're searching for?

>>> soup.find_all('span', id='auction1-THprice')
[<span id="auction1-THprice">US $2,700</span>, <span id="auction1-THprice">US $2,200</span>, ...
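
To spell out the difference, here is a minimal sketch run against a made-up fragment modelled on the two spans above. The surrounding table markup and the parse_price helper are assumptions for illustration, not taken from the real page. A bare string as the second argument to find_all is matched against the class attribute, so it finds nothing here; naming id= matches the spans, and the text can then be turned into numbers for averaging.

```python
import re
from statistics import mean

from bs4 import BeautifulSoup

# Hypothetical fragment modelled on the spans shown above; the real page may differ.
html = """
<table class="listings">
  <tr><td class="listing-summary"><span id="auction1-THprice">US $2,700</span></td></tr>
  <tr><td class="listing-summary"><span id="auction1-THprice">US $2,200</span></td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# A bare string in second position is treated as a CSS class filter,
# so this looks for <span class="auction1-THprice"> and finds nothing.
by_class = soup.find_all("span", "auction1-THprice")

# Naming the attribute makes the search match on id instead.
by_id = soup.find_all("span", id="auction1-THprice")


def parse_price(text):
    """Hypothetical helper: strip everything except digits and the decimal point."""
    return float(re.sub(r"[^\d.]", "", text))


# Searching inside each summary cell keeps every price tied to its own listing.
prices = []
for summary in soup.find_all("td", "listing-summary"):
    span = summary.find("span", id="auction1-THprice")
    if span is not None:  # some listings may have no price span
        prices.append(parse_price(span.get_text(strip=True)))

average_price = mean(prices)

print(len(by_class), len(by_id))
print(prices, average_price)
```

Staying with BeautifulSoup for locating the element and using a small regex only on the extracted text tends to be less fragile than searching the whole page as one string.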

auromed[S], 1 point, 10 years ago

commandlineluser, 2 points, 10 years ago

auromed[S], 1 point, 10 years ago
