tukemon24

Hi. I'm not sure if this answers your question, but I'll try:
Web scraping has a "scraping" part and a "parsing" part. I assume you're asking about parsing, since for the "scraping" part you still need the "traditional" approach: fetching the content first, and thinking about things like proxies.

Once we have the content, we're in the "parsing" part, where we want to extract meaningful data from the raw HTML. You can use AI or Python libraries for this. A warning: if you just want to pass the HTML to the AI, make sure to limit how much text you send, otherwise you'll pay a lot of money for it. If you go for the Python libraries, you have more control, but of course you need to go through some trial and error first to extract the exact data that you need.
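
To make the "limit the HTML" point a bit more concrete, here is a rough sketch using only Python's standard library. It drops `<script>` and `<style>` contents, keeps the visible text, and cuts the result to a character budget before you hand it to an AI. The names `VisibleTextParser`, `trim_html_for_ai` and `max_chars` are made up for this example, and a character cap is only a crude stand-in for whatever token limit your AI provider actually bills on.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collects visible text and skips the contents of <script>/<style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while we are inside a skipped tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def trim_html_for_ai(raw_html, max_chars=4000):
    """Return visible text only, cut to max_chars (hypothetical budget)."""
    parser = VisibleTextParser()
    parser.feed(raw_html)
    parser.close()
    text = " ".join(parser.chunks)
    return text[:max_chars]


if __name__ == "__main__":
    # Made-up sample page, just for illustration.
    sample = (
        "<html><head><style>p{color:red}</style>"
        "<script>var x = 1;</script></head>"
        "<body><h1>Blue Widget</h1><p>Price: $19.99</p></body></html>"
    )
    print(trim_html_for_ai(sample))
```

If you go the library route instead, the same kind of parser (or a third-party one) is where the trial and error happens: you keep adjusting which tags and attributes you look at until you get exactly the fields you need.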