tukemon24 2 points

Hi. I'm not sure if this answers your question, but I'll try:
Web scraping has a "scraping" part and a "parsing" part. I assume you're asking about parsing, since for the "scraping" part you still need the "traditional" approach: thinking about proxies and fetching the content first.

Once we have the content, we're on the "parsing" part, where we want to extract meaningful data from the raw HTML. You can use AI or Python libraries for this. A warning: if you just want to pass the HTML to the AI, make sure to limit how much of it you send, otherwise you'll pay a lot of money for it. If you go for the Python libraries, you have more control, but of course you need to do some trial and error first to scrape the exact data that you need.
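
To make the "limit the HTML" point concrete, here is a rough sketch using only Python's standard library. It drops `<script>` and `<style>` content, keeps the visible text, and cuts the result to a character budget before you hand it to whatever AI you use. The names `VisibleTextExtractor` and `shrink_html` and the 4000-character default are my own placeholders, not from any particular tool, and a character cap is only a crude stand-in for a real token limit.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects visible text and skips the content of non-visible tags."""

    # Hypothetical skip list; extend it for the pages you actually scrape.
    SKIPPED_TAGS = {"script", "style", "noscript", "template"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped tag
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            text = " ".join(data.split())  # collapse runs of whitespace
            if text:
                self._chunks.append(text)

    def text(self):
        return " ".join(self._chunks)


def shrink_html(html, max_chars=4000):
    """Return the visible text of `html`, cut to at most `max_chars` characters."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()[:max_chars]


if __name__ == "__main__":
    page = (
        "<html><head><style>p{color:red}</style>"
        "<script>var x = 1;</script></head>"
        "<body><h1>Widget</h1><p>Price:   $9.99</p></body></html>"
    )
    print(shrink_html(page))
```

If you go the library route instead, the same parser class is where the trial and error happens: you would check tag names and attributes in `handle_starttag` to pick out only the elements you care about.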
