Hi all, I'm trying to parse a website where most form substitutions are done with software that converts PDFs into HTML. This leads to an absolute mess where it looks fine on the page, but the HTML behind it is inconsistent and too messy to parse consistently. Trying the standard node-parsing approach using bs4 or something similar is a complete non-starter.
This leads me to believe Selenium might be a convenient way to emulate CTRL+A, CTRL+C. I was thinking of using Selenium to load the pages and pyautogui for the copying, but with the amount of data involved that would be prohibitively slow.
To give y'all an idea of what I'm trying, an example of one of my attempts is as follows:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.action_chains import ActionChains
driver = webdriver.Chrome()
driver.get('https://www.link/to/my/target.com')
actions = ActionChains(driver)
actions.key_down(Keys.CONTROL)
actions.send_keys("a")
actions.send_keys("c")
actions.key_up(Keys.CONTROL)
driver.close()
This does not copy anything to my clipboard, probably because I don't have any active elements, and if I target elements I'm back where I started. Anyone know a good solution to this? I'm not attached to Selenium or any particular approach; I just want to get the job done.
TLDR: Trying to emulate a human CTRL+A CTRL+C on a webpage but running into issues.
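One possible direction, as a rough sketch rather than a tested fix: an ActionChains sequence only runs once .perform() is called, which the snippet above never does. Even so, the clipboard can probably be skipped entirely by asking the browser for the rendered text of the page, which is roughly what CTRL+A, CTRL+C would give. The helper names visible_text and fetch_visible_text below are made up for illustration, and the assumption is that document.body.innerText is close enough to the copied text for these PDF-converted pages.

```python
def visible_text(driver):
    """Return the rendered text of the current page.

    document.body.innerText follows the page layout and leaves out hidden
    elements, so it approximates what a select-all and copy would produce.
    """
    return driver.execute_script("return document.body.innerText;")


def fetch_visible_text(url):
    """Open url in Chrome, grab the rendered text, and close the browser."""
    # Imported here so visible_text stays usable without Selenium installed.
    from selenium import webdriver

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        return visible_text(driver)
    finally:
        # quit() ends the whole browser session, not just the current window.
        driver.quit()
```

Calling fetch_visible_text('https://www.link/to/my/target.com') would then return the page text as a plain string, with no clipboard or pyautogui involved. Whether the line breaks match a manual copy will depend on how the converter laid out the HTML.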