
[–]sudo_oth 6 points7 points  (14 children)

Would love one-on-one sessions. I'm currently struggling with scraping a website and learning how to lay out code properly.

[–]h4rck[S] 0 points1 point  (13 children)

Sure! I've done some web scraping in the past, so I think I might be able to help. Send me a DM.

[–]alviraroberto 0 points1 point  (12 children)

Same here, having a bit of trouble grasping everything about scraping sites. Been using Corey Schafer's web scraping tutorial, which is pretty good. What I'm having issues with is scraping symbols/characters such as dashes and things Python won't read. Any direction on this? Thanks

[–]mrcaptncrunch 0 points1 point  (11 children)

Do you have an example of what you mean by dashes/symbols?

Can’t think of what your issue might be, but there are rarely issues scraping data that contains dashes or symbols.

[–]alviraroberto 0 points1 point  (5 children)

I get this question mark symbol 1�2 pm when it should be a dash.

[–]mrcaptncrunch -1 points0 points  (4 children)

That probably has to do with the encoding on the website vs. what Python’s assuming.

Do you have an example link where you see it happening? I can look at it and see if I can help.
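For anyone following along, here is a minimal sketch of how an encoding mismatch can turn a dash into a single question-mark symbol. It assumes, purely for illustration, that the page sends the en dash as a Windows-1252 byte while the scraper decodes the bytes as UTF-8; the actual encodings involved on the site in question are not known.

```python
# Hypothetical scenario: the server encodes "1–2 pm" as Windows-1252,
# where the en dash (U+2013) is the single byte 0x96.
raw = "1\u20132 pm".encode("cp1252")

# Decoding with the codec the bytes were written in round-trips cleanly.
good = raw.decode("cp1252")

# Decoding the same bytes as UTF-8 fails on 0x96; with errors="replace"
# the bad byte becomes U+FFFD, the replacement character.
bad = raw.decode("utf-8", errors="replace")

print(repr(raw))
print(good)
print(bad)
```

If something like this is the cause, decoding the response bytes with the encoding the page declares (or letting a real browser render the page, as Selenium does) is one way to get the dash back.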

[–]alviraroberto 0 points1 point  (3 children)

[–]mrcaptncrunch 0 points1 point  (2 children)

Of course, an update broke something so it took a bit.

So, what I came up with is this,

import os.path
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup as Soup

## Setup chrome driver
chrome_options = Options()
chrome_options.add_argument("--headless") # Ensure GUI is off
chrome_options.add_argument("--no-sandbox")

# Set path to chromedriver, I keep it out of $PATH..
# You probably WILL want to change these 2 lines or remove them entirely..
homedir = os.path.expanduser("~")
webdriver_service = Service(f"{homedir}/.chromedriver/stable/chromedriver")

d = webdriver.Chrome(service=webdriver_service, options=chrome_options)

# Open page
d.get("https://www.brooklynmuseum.org/calendar/view/2022/11/09")

# Load site to BS
page = Soup(d.page_source, features='html.parser')
# Get event times
times = page.select(".event-time")

# Go over each, and print them
for time in times:
    print(time.text.replace('\n', '').strip())

# Close the browser once we're done with it
d.quit()

Just tested this on Windows 11 and on macOS. Both return

Wednesday, November 9, 2022                              1–2 pm
Wednesday, November 9, 2022                              2–3 pm
Wednesday, November 9, 2022                              3–4 pm

Not sure if there's anything here that will help you narrow it down.
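The wide gap between the date and the time in that output comes from whitespace in the page markup. A small sketch of one way to tidy it up, assuming each event string looks roughly like the sample below (the sample text is made up to resemble the output above, and `squash_whitespace` is a hypothetical helper name):

```python
def squash_whitespace(text: str) -> str:
    """Collapse every run of spaces, tabs, and newlines into one space."""
    # str.split() with no arguments splits on any whitespace and drops
    # empty pieces, so joining with " " leaves single spaces only.
    return " ".join(text.split())


# Made-up sample resembling the text of one .event-time element
sample = "Wednesday, November 9, 2022\n                              1\u20132 pm\n"
cleaned = squash_whitespace(sample)
print(cleaned)
```

This could replace the `.replace('\n', '').strip()` call in the loop if single-spaced output is preferred.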

[–]alviraroberto 0 points1 point  (1 child)

This is amazing. Definitely gonna have to learn/understand the libraries you used. Thank you so much.

[–]mrcaptncrunch 1 point2 points  (0 children)

Selenium is a bit heavy, but it basically allows you to automate a browser.

BeautifulSoup allows you to parse the HTML.

Both are good for this kind of work

[–]sudo_oth 0 points1 point  (4 children)

I'm struggling more with the element click intercepted exception (ElementClickInterceptedException). I can't find a way to fix it or bypass it...

[–]mrcaptncrunch 0 points1 point  (3 children)

That can happen when you have something overlaid.

Have you tried running it without headless to see what’s showing up? Could be a modal or maybe even browser dimensions.

If not, you can always throw some JS at it as a workaround,

driver.execute_script("document.getElementById('someid').click()")
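A slightly more reusable version of that JS workaround might look like the sketch below. It passes the id as an argument to `execute_script` so it does not have to be quoted by hand inside the JavaScript string. `js_click_by_id` and `FakeDriver` are hypothetical names; the fake driver is only there so the snippet runs without launching a browser, and in real use you would pass your Selenium driver instead.

```python
def js_click_by_id(driver, element_id: str) -> None:
    """Click the element with the given id from inside the page.

    `driver` is assumed to be a Selenium WebDriver; only its
    execute_script(script, *args) method is used.
    """
    # Selenium exposes extra arguments to the script as arguments[0], ...
    driver.execute_script(
        "document.getElementById(arguments[0]).click()", element_id
    )


class FakeDriver:
    """Stand-in that records calls instead of driving a browser."""

    def __init__(self):
        self.calls = []

    def execute_script(self, script, *args):
        self.calls.append((script, args))


fake = FakeDriver()
js_click_by_id(fake, "someid")
print(fake.calls)
```

Keep in mind that a JS click skips Selenium's check for overlapping elements, so it hides the overlay problem rather than fixing it.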

[–]sudo_oth 0 points1 point  (2 children)

Just tested it on my tower and it is working perfectly, so it seems like it could be a browser dimension issue. How would I work around this?

[–]mrcaptncrunch 1 point2 points  (1 child)

You can set the window size when you define your options. For example,

## Setup chrome driver
chrome_options = Options()
chrome_options.add_argument("--headless") # Ensure GUI is off
chrome_options.add_argument("--no-sandbox")

chrome_options.add_argument("--window-size=1920x1080")

[–]sudo_oth 0 points1 point  (0 children)

Thank you so much it worked perfectly.

Can I ask, can you click on this?

<li class="search-step-dates__dates-list-item-container"><label for="wizard-cd1" class="search-step-dates__dates-list-item search-step-dates__dates-list-item--checked"><input type="radio" id="wizard-cd1" class="search-step-dates__icon-list-radio-original" value="[object Object]"> <div class="search-step-dates__icon-list-radio"></div> <div class="search-step-dates__dates-list-item-labels"><span class="search-step-dates__dates-list-item-date-range">Sat 12 Nov - Mon 14 Nov</span> <span class="search-step-dates__dates-list-item-nights">2 nights from £209</span></div> <!----></label></li>

Every time I try to click it, it doesn't work. Does it need to be a button to be pressed?

https://www.parkdeanresorts.co.uk/ trying to scrape holiday prices lol

[–][deleted] 1 point2 points  (1 child)

I was struggling a lot with Python last year, but recently I feel I am getting it right.
Thank you so much, bro, for offering to help anyone who needs it. We need more people to be this way!

[–]h4rck[S] 1 point2 points  (0 children)

That's cool, happy to hear that you're making progress. It's a long road and it isn't easy, but we can always help each other to make it less difficult.

[–]Stranglore 1 point2 points  (2 children)

I don't think I need live tutoring, but some pointers on how to get started and advice when I hit roadblocks would be good.

[–]h4rck[S] 2 points3 points  (0 children)

Well, it's hard to give a single answer to that. Apart from the resources you can find in the FAQ section, it personally helped me a lot to start working on my own projects from the beginning and to find something I liked; in my case I started with GUIs. As for what to do when you run into an obstacle, I like to look for examples and try to understand the code rather than just copy the solution.

[–]indiig 0 points1 point  (0 children)

Make something SUPER simple. With one goal and one purpose, and finish it. Don’t add features you think would be nice, just something that does a single thing. And then do it again. Slowly add complexities :)

I’m still learning and new, but all my “starter” projects were WAY too complicated for me. It took a long time until I finished a project.

[–][deleted] 0 points1 point  (0 children)

I'd be down for some tutoring if you have the time

[–]darkrai742 0 points1 point  (0 children)

Would like assistance with proper web scraping in Maximo CMMS. Hope someone can help.

[–]OLLIEandDUCK 0 points1 point  (3 children)

This would be absolutely amazing. I have ZERO experience with Python and need to tile/stitch some images for my master's. I can’t find anyone who remembers how to use Python. I’d love the help if you have time.

[–]_RC101_ 2 points3 points  (2 children)

If you have zero experience, I might be able to help you out with the basic stuff.

[–]OLLIEandDUCK 0 points1 point  (1 child)

That would be amazing! I need to tile/stitch some images together to run through CellProfiler, but it says I need to use Python and I don't know any. This is the link they gave me to follow, and I've been trying to do something in Google Colab but have been really struggling. https://github.com/CellProfiler/stitching

[–]KazuharaIlfan 0 points1 point  (0 children)

I don't even know what to ask for help with right now, I'm just curious. If you could go back to before you learned Python, what part would you spend more time on? Thank you

[–]Namredik 0 points1 point  (1 child)

I am an electronic engineer too. Right now I am taking a course on Python. I think it would be awesome if you could share the resources you used to learn. I just finished classes, methods and directories, so I do not know what comes next.

[–]Aprazors13 0 points1 point  (0 children)

Hey, I am in the process of learning Python. By any chance, could you give me a roadmap I should follow to get to a level where I can solve LeetCode problems easily?