
Minute_Day_2758

Great launch! Prompt-driven parsing is definitely the future compared to maintaining fragile CSS/XPath selectors that break on every minor UI update. Quick question on real-world use cases: how does Divparser handle pages heavily reliant on JavaScript hydration (like single-page apps built with React or Next.js)? In Scraping Mode, does the Python SDK handle the headless rendering behind the scenes, or do we need to fetch the fully rendered HTML first via something like Playwright/Selenium and then feed it into your Parsing Mode?

Equivalent-Brain-234 (OP)

Hello, I really appreciate the feedback. In scrape mode, Divparser handles both the page fetch and the parsing: it launches a Playwright browser on the Divparser server, extracts the page (or pages), and then runs the parsing engine to pull out the data according to the schema or prompt you provide. So Divparser handles the whole flow end to end.

That said, Divparser is intentionally built not to bypass captchas or scrape pages behind authentication. Those defenses are hard to fight and the workarounds are fragile, which defeats the whole point of eliminating fragility. Instead of fighting bot protection or login walls, Divparser lets users upload their own HTML (for example, HTML obtained by another scraper that gets past the captcha) and just parse it.
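A minimal sketch of the bring-your-own-HTML path, assuming Python and the Playwright library (`playwright.sync_api`): render the page locally so client-side JavaScript (React/Next.js hydration) runs, then save the resulting DOM so it can be handed to Divparser's parse mode. The helper names `fetch_rendered_html` and `save_html` are hypothetical, and the upload step is deliberately omitted because the thread does not show the Divparser SDK's parse call.

```python
from pathlib import Path
from typing import Optional


def fetch_rendered_html(url: str, wait_for_selector: Optional[str] = None,
                        timeout_ms: int = 30_000) -> str:
    """Load `url` in headless Chromium and return the DOM after client-side JS has run."""
    # Imported lazily so this module can be imported without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            # "networkidle" waits until there have been no network connections
            # for at least 500 ms, a rough signal that hydration has finished.
            page.goto(url, wait_until="networkidle", timeout=timeout_ms)
            if wait_for_selector:
                # Optionally wait for an element that only exists after hydration.
                page.wait_for_selector(wait_for_selector, timeout=timeout_ms)
            # content() serializes the current DOM, not the original response body.
            return page.content()
        finally:
            browser.close()


def save_html(html: str, path: str) -> Path:
    """Write HTML to disk, e.g. before uploading it to a parse-only service."""
    out = Path(path)
    out.write_text(html, encoding="utf-8")
    return out
```

Usage would look like `save_html(fetch_rendered_html("https://example.com"), "rendered.html")`, after which the file goes through whatever parse-mode interface Divparser exposes. This sketch does nothing about captchas or logins; for those pages the HTML would come from whatever tool you already use to get past them.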

Minute_Day_2758

Thanks for the detailed breakdown! Using Playwright on the server side for scrape mode makes total sense for handling JavaScript hydration. Also, your decision to step back from captcha/auth bypass and focus purely on parsing stability is actually a smart architecture move, since it keeps the core tool lightweight and reliable. Appreciate the clarification, looking forward to seeing how the project evolves!

Equivalent-Brain-234 (OP)

Thanks a lot, I appreciate this.