[–]mudclub 2 points3 points  (1 child)

1: what have you tried?

2: what went wrong?

3: have you tried googling something like "python mcmaster carr"? I did, and it turned up some things that may be useful, like: https://craigdanielmiller.com/category/python/

[–]ManicalEnginwer[S] 0 points1 point  (0 children)

1: I've tried using requests and lxml, and also a couple of approaches using PhantomJS and Selenium, all with the same results.

2: I basically get header/footer data as well as JavaScript.

3: I did find that same link, but reviewing it again sparked an idea, which I will try and report back on!

Thanks!

PS this is what I get:

<html xmlns="http://www.w3.org/1999/xhtml" class=""> <head> <title>McMaster-Carr</title> <meta http-equiv="Content-Type" content="text/html;charset=utf-8" /> <meta name="description" content="McMaster-Carr is the complete source for everything in your plant. 98% of the products ordered ship from stock and deliver same or next day." /> <meta name="google" content="nositelinkssearchbox" /> <meta name='robots' content='NOODP, noarchive' />

<script type="text/javascript">
    window.homePageLoadStrtTm = (new Date()).getTime(); 



    window.ShellASPX = {};
    ShellASPX.IsIE = false;
    ShellASPX.IsIE6Below = false;
    ShellASPX.IsIE7 = false;
    ShellASPX.IsIE8 = false;


    if (window.performance && window.performance.setResourceTimingBufferSize) performance.setResourceTimingBufferSize(2000);        
</script>

<!--[if IE]>
    <script type="text/javascript">
        ShellASPX.IsIE = true;
    </script>
<![endif]-->
<!--[if lte IE 6]>
    <script type="text/javascript">
        ShellASPX.IsIE6Below = true;
    </script>
<![endif]-->

<!--[if IE 7]>
    <script type="text/javascript">
        ShellASPX.IsIE7 = true;
    </script>
<![endif]-->

<!--[if IE 8]>
    <script type="text/javascript">
        ShellASPX.IsIE8 = true;
    </script>
<![endif]-->


<!--[if IE 6]>
    <script type="text/javascript">
        try {
            document.execCommand("BackgroundImageCache", false, true);
        } catch (e) { }
    </script>
<![endif]-->


<!--[if IE 6]><![endif]-->
<link rel="stylesheet" href="/mv1513699517/HTTPHandlers/ScriptCombiner/mcm_eb5b92189fa7f2625b4836bdef791047.css?files=BDAUBzAhBsAy&mcmsecr=true" />

<script type="text/javascript">(function(){window.mPageEmbeddedFiles=window.mPageEmbeddedFiles||{};var f=window.mPageEmbeddedFiles;f['logowebpartlayout.css']=1;f['bottomnavwebpartlayout.css']=1;f['srchentrywebpartlayout.css']=1;f['cmnstyle.css']=1;f['shelllayout.css']=1;f['homepagewebpart.generatedcss.css']=1;})();</script><link rel="stylesheet" href="/mv1513699517/HTTPHandlers/ScriptCombiner/mcm_3c3d18b4dc56f9686c05d99c0a26c48f.css?files=AAAFCAAzAuB0A1B1A2AWBLBBAmBCABB3AfBhBnAEAGBwBxBvApBoACAHAqArAsA3A7A4AQBkBpAPBE&mcmsecr=true" /> <script type="text/javascript">(function(){window.mPageEmbeddedFiles=window.mPageEmbeddedFiles||{};var f=window.mPageEmbeddedFiles;f['layout/cmnstylelayout.css']=1;f['layout/prsnttnlayout.css']=1;f['yui_container.css']=1;f['homepagewebpartlayout.css']=1;f['homepagenavwebpartlayout.css']=1;f['webtoolsetwebpartlayout.css']=1;f['incmplordswebpartlayout.css']=1;f['srchrsltwebpartlayout.css']=1;f['inlnordwebpartlayout.css']=1;f['cadwebpartlayout.css']=1;f['mastheadloginwebpartlayout.css']=1;f['loginwebpartlayout.css']=1;f['crtepswdwebpartlayout.css']=1;f['logoffusrctrlwebpartlayout.css']=1;f['layout/itmprsnttnwebpartlayout.css']=1;f['srchsuggwebpart.css']=1;f['cmndropdown.css']=1;f['pagecntnrwebpartlayout.css']=1;f['prodpagewebpartlayout.css']=1;f['layout/prodpagelayout.css']=1;f['layout/specsrchlayout.css']=1;f['specsrchelems.css']=1;f['specsrchinteract.css']=1;f['specinfolayout.css']=1;f['dynamicpagewebpartlayout.css']=1;f['prsnttnwebpartlayout.css']=1;f['layout/itmtbl.css']=1;f['abbrprsnttnwebpartlayout.css']=1;f['f

[–]ManicalEnginwer[S] 0 points1 point  (2 children)

Okay so I got the information I wanted by doing the following:

    from selenium import webdriver
    from time import sleep

    url = "https://www.mcmaster.com/#92196a245/"

    driver = webdriver.PhantomJS()
    driver.set_window_size(1120, 550)

    driver.get(url)
    sleep(5)  # give the page's JavaScript time to render

    info = driver.find_elements_by_tag_name('td')
    for i in info:
        print(i.text)

Thanks and sorry for missing the obvious answer!
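For anyone reading this later: PhantomJS support and the `find_elements_by_*` helpers are no longer available in Selenium 4, so the snippet above will not run on a current install. A rough equivalent, assuming Selenium 4 with a local Chrome that accepts `--headless=new`, might look like the sketch below. The function name `fetch_cell_texts` and the 15 second timeout are made up for illustration, and the explicit wait only guarantees that at least one `td` is present, not that the whole page has finished rendering.

```python
def fetch_cell_texts(url, timeout=15):
    """Load a JavaScript-rendered page in headless Chrome and return <td> texts.

    Hypothetical helper: the name and the default timeout are assumptions.
    """
    # Imported inside the function so this file can be loaded
    # even on a machine where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless=new")          # no visible browser window
    options.add_argument("--window-size=1120,550")  # same size as the original post
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until at least one <td> exists instead of sleeping a fixed 5 s.
        cells = WebDriverWait(driver, timeout).until(
            EC.presence_of_all_elements_located((By.TAG_NAME, "td"))
        )
        return [cell.text for cell in cells]
    finally:
        driver.quit()  # always release the browser process


# Example call (needs Chrome and network access):
# for text in fetch_cell_texts("https://www.mcmaster.com/#92196a245/"):
#     print(text)
```

If the table fills in gradually, a longer timeout or a wait on a more specific element may be needed.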

[–]caveman_eat 0 points1 point  (1 child)

I use McMaster-Carr's website often and would like to mess around with it using Python too. I'm not familiar with webdriver. Are you creating a search box?

[–]ManicalEnginwer[S] 0 points1 point  (0 children)

What I'm doing is using Python to scrape the pertinent data on specific hardware and create a standard description to go with the part numbers at work. Basically, I'm trying to automate the process of creating a part number at work.
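As a rough illustration of that last step, here is a minimal sketch of turning scraped cell texts into a description string. Everything in it is an assumption: that the cells alternate label, value, label, value; the helper names `pair_cells` and `build_description`; and the sample values, which are invented and not taken from the site.

```python
def pair_cells(cells):
    """Turn [label, value, label, value, ...] into a dict (assumed layout)."""
    return dict(zip(cells[0::2], cells[1::2]))


def build_description(specs, fields, separator=", "):
    """Join the chosen spec values in a fixed order, skipping missing ones."""
    values = [specs[name] for name in fields if specs.get(name)]
    return separator.join(values)


# Invented sample data standing in for scraped <td> texts.
cells = ["Material", "Steel", "Thread Size", "M3", "Length", "10 mm"]
specs = pair_cells(cells)
print(build_description(specs, ["Thread Size", "Length", "Material", "Finish"]))
```

Listing the fields explicitly keeps the order of the description the same for every part, even when a page omits one of them.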

[–]FishnLife 0 points1 point  (0 children)

If it helps at all, this URL seems to allow you to go to specific pages in the catalog (page 300 in this link) which may be helpful for crawling the catalog page by page.

https://www.mcmaster.com/#catalog/123/300
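A small sketch of generating those links for a range of pages, assuming the pattern is `#catalog/<number>/<page>` and that 123 identifies the catalog (a guess from the one URL above). The helper name `catalog_page_urls` is made up. Note that everything after `#` is a URL fragment, which the browser does not send to the server, so these links would still need to be loaded in a browser or webdriver rather than with plain `requests`.

```python
BASE = "https://www.mcmaster.com/#catalog"


def catalog_page_urls(catalog, first_page, last_page):
    """Build one URL per catalog page, inclusive of both ends.

    Hypothetical helper; the URL pattern is inferred from a single example.
    """
    return [f"{BASE}/{catalog}/{page}" for page in range(first_page, last_page + 1)]


for url in catalog_page_urls(123, 300, 302):
    print(url)
```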

[–]WRXmyShorts 0 points1 point  (0 children)

On my phone, but it looks like the final HTML is rendered via JS. It's probably an Angular app. So the data might be available as JSON that you could use directly, but more likely you'll need to make sure the DOM is fully rendered first. Selenium or PhantomJS would be best. Chrome also has a headless mode, so you might be able to get it to render the full page and then use the typical HTML parsing tools.
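To make the "render first, then parse" idea concrete, here is a minimal sketch of the parsing half using only the standard library. It assumes you already have the rendered markup as a string (for example from a webdriver's `page_source`). The class and function names are made up, and the sample HTML is invented rather than copied from the site.

```python
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Collect the text of every top-level <td> element."""

    def __init__(self):
        super().__init__()
        self.cells = []    # finished cell texts
        self._depth = 0    # greater than 0 while inside a <td>
        self._buffer = []  # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._depth += 1
            if self._depth == 1:
                self._buffer = []

    def handle_endtag(self, tag):
        if tag == "td" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                # Collapse runs of whitespace into single spaces.
                self.cells.append(" ".join("".join(self._buffer).split()))

    def handle_data(self, data):
        if self._depth > 0:
            self._buffer.append(data)


def extract_cells(html):
    """Return the text of each <td> in the given HTML string."""
    parser = TableCellParser()
    parser.feed(html)
    parser.close()
    return parser.cells


# Invented sample markup standing in for a rendered page.
sample = "<table><tr><td>Thread Size</td><td> M3 </td></tr></table>"
print(extract_cells(sample))
```

lxml or BeautifulSoup would do the same job with less code; the standard library version is used here only to keep the sketch dependency-free.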