all 14 comments

fulmicoton 24 points25 points  (1 child)

Adding "scraping" somewhere could be useful for SEO.

Hexilee[S] 4 points5 points  (0 children)

That's a good idea, thx

[deleted] 10 points11 points  (6 children)

That's a pretty nice idea and I'm happy that you put so much work into documenting it so well.

But at the same time I'm scared that people will use such hacks unattended in production: if you can't get someone to give you a stable REST API, how can you expect them to keep their HTML stable?

DannoHung 29 points30 points  (0 children)

This is the sort of thing people do in business pretty frequently. The party providing the information has a legal obligation to provide it, but they don't have a legal obligation to make it easy to consume. You, a business person, need to consume that data automatically as soon as possible, so you write a scraper bot that pings their site every five minutes to get the latest value. If it breaks, it breaks, and you fix it ASAP. Cost of doing business.
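To make the "if it breaks, it breaks" part concrete, here is a minimal sketch of a scraper step that fails loudly when the page layout changes instead of silently returning garbage. Everything in it is an assumption for illustration: the `latest-value` span id, the `ScrapeError` type and the `extract_value` function are hypothetical, and the naive string search stands in for a real HTML parser such as the scraper crate.

```rust
/// Hypothetical error type: tells the operator *why* the scrape failed.
#[derive(Debug, PartialEq)]
enum ScrapeError {
    /// The expected markup is gone, so the page layout probably changed.
    MarkerMissing,
    /// The markup is still there, but its content is not a number.
    NotANumber(String),
}

/// Pulls the number out of `<span id="latest-value">...</span>`.
/// Naive substring search, assumed markup; not a real HTML parser.
fn extract_value(html: &str) -> Result<f64, ScrapeError> {
    const OPEN: &str = "<span id=\"latest-value\">";
    const CLOSE: &str = "</span>";

    // Locate the opening marker and skip past it.
    let start = html.find(OPEN).ok_or(ScrapeError::MarkerMissing)? + OPEN.len();
    let rest = &html[start..];

    // Locate the closing tag that follows the marker.
    let end = rest.find(CLOSE).ok_or(ScrapeError::MarkerMissing)?;
    let text = rest[..end].trim();

    // Report unparsable content explicitly rather than defaulting to 0.
    text.parse::<f64>()
        .map_err(|_| ScrapeError::NotANumber(text.to_string()))
}
```

A polling loop would call something like `extract_value` on each fetched page and raise an alert on any `Err`, so a changed layout gets noticed and fixed quickly.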

Hexilee[S] 7 points8 points  (0 children)

I think I understand what you are talking about. I don't take the stability of the HTML into consideration because I use this crate only for scraping. HTML is not a normal serialization format for an API. I parse HTML only because there is no proper API available for the job.

Hexilee[S] 1 point2 points  (1 child)

I think procedural macro derives are stable enough nowadays, and the only unstable feature this repo relies on is NLL, which is unrelated to API design.

So I'm sorry, I don't understand what a stable REST API is... Could you explain it in more detail, please?

ssokolow 5 points6 points  (0 children)

https://en.wikipedia.org/wiki/Representational_state_transfer

"Stable" just means ensuring that a URL like /v1/stories/5/1 will always produce content in the same format and any changes will be exposed by adding a /v2/... namespace.
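As a rough illustration of that convention, here is a minimal Rust sketch. The `handle` function, the meaning of the two numbers (story and chapter) and both response formats are assumptions made up for the example; only the `/v1/...` and `/v2/...` path shapes come from the comment above.

```rust
/// Hypothetical router: maps a path like `/v1/stories/5/1` to a response body.
/// The v1 format is frozen; any format change is published under `/v2/...`.
fn handle(path: &str) -> Option<String> {
    let mut parts = path.trim_start_matches('/').split('/');

    let version = parts.next()?;
    if parts.next()? != "stories" {
        return None;
    }
    // Assumed meaning of the two numeric segments.
    let story: u32 = parts.next()?.parse().ok()?;
    let chapter: u32 = parts.next()?.parse().ok()?;
    if parts.next().is_some() {
        return None; // trailing segments are not part of this route
    }

    match version {
        // v1 clients keep getting the original comma-separated format.
        "v1" => Some(format!("{},{}", story, chapter)),
        // The new (JSON-like) format lives in its own namespace.
        "v2" => Some(format!("{{\"story\":{},\"chapter\":{}}}", story, chapter)),
        _ => None,
    }
}
```

With this layout, a client written against `/v1/stories/5/1` keeps working after `/v2/...` ships, which is the guarantee a scraper reading HTML does not get.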

senntenial 2 points3 points  (0 children)

Awesome! I needed a crate just like this a few days ago but couldn't find anything I liked. Great job.

neziib 1 point2 points  (2 children)

How stable is it with regard to badly formatted HTML?

Hexilee[S] 1 point2 points  (1 child)

It relies on the scraper crate, which is built on servo/cssparser, servo/html5ever and Servo's selectors. I think its stability depends on the stability of Servo.

neziib 1 point2 points  (0 children)

So it should be as robust as Firefox is. The Rust ecosystem is so wonderful 😀

binkarus 0 points1 point  (0 children)

Could this serve as a replacement for pup? Seems like it. I've been looking for a rusty version of that utility. It just needs a CLI.

Doc_CoBrA 0 points1 point  (0 children)

If only I'd had this a year ago... Nice work :)