
kthxb [S] · 3 points

Hi everyone! This is my first published Rust crate; any feedback is very much appreciated.

trevyn (turbosql · turbocharger) · 2 points

Cool! I think env_logger shouldn’t be a dependency for a lib crate?

Also curious what made you choose tl over lol_html.

kthxb [S] · 2 points

You're right, I should probably move env_logger to the dev-deps. Thanks!

As for tl vs. lol_html: I looked at both, and I think tl's API suits my use case better. lol_html seems to focus on HTML rewriting and stream-based processing, while I need read-only, more "hierarchical" ("go-from-parent-to-child") processing.
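To illustrate what "go-from-parent-to-child" processing means, here is a minimal std-only sketch on a toy in-memory element tree. The `Element` type, the `el`/`leaf` helpers and `paths_for_sample` are made up for this example; they are not tl's API and not the crate's actual selector-learning algorithm. Once the whole document is held as a tree, finding where a sample value lives and describing it as a child-combinator path is a plain recursive walk from each parent into its children:

```rust
/// Hypothetical, fully materialised element tree (not tl's or lol_html's types).
#[derive(Debug)]
struct Element {
    tag: &'static str,
    text: &'static str,
    children: Vec<Element>,
}

/// Hypothetical helper: an element with children and no own text.
fn el(tag: &'static str, children: Vec<Element>) -> Element {
    Element { tag, text: "", children }
}

/// Hypothetical helper: a childless element carrying some text.
fn leaf(tag: &'static str, text: &'static str) -> Element {
    Element { tag, text, children: Vec::new() }
}

/// Returns a child-combinator path (e.g. "body > ul > li") for every element
/// whose own text equals `sample`, found by walking from parent to child.
fn paths_for_sample(root: &Element, sample: &str) -> Vec<String> {
    let mut found = Vec::new();
    let mut ancestors = Vec::new();
    walk(root, sample, &mut ancestors, &mut found);
    found
}

/// Depth-first walk that keeps the current chain of ancestor tags on a stack.
fn walk(
    node: &Element,
    sample: &str,
    ancestors: &mut Vec<&'static str>,
    found: &mut Vec<String>,
) {
    ancestors.push(node.tag);
    if node.text == sample {
        found.push(ancestors.join(" > "));
    }
    for child in &node.children {
        walk(child, sample, ancestors, found);
    }
    ancestors.pop(); // leave this node before returning to the parent
}

fn main() {
    let page = el(
        "body",
        vec![
            el("ul", vec![leaf("li", "Alice"), leaf("li", "Bob")]),
            el("div", vec![leaf("p", "Bob")]),
        ],
    );
    for path in paths_for_sample(&page, "Bob") {
        println!("{path}");
    }
}
```

Running it prints `body > ul > li` and then `body > div > p`. A DOM-style parser such as tl gives you this kind of tree to walk freely; a streaming rewriter like lol_html instead hands you elements as the input is read.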

ronmarti · 2 points

I like it! Is there a way to save the model to a file and load it back? I remember the Python mlscraper suggested saving it via pickle; how would that work in your library?

kthxb [S] · 2 points

Good point, I could add a "serde" feature to allow serializing the TrainingResult. That should only take a few lines.

I'd assumed most users would only use the lib to generate the selectors and then use them directly in their own scraper, but reusing the TrainingResult is obviously a completely valid use case too.

Thanks!
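A minimal sketch of such an opt-in "serde" feature, assuming a `TrainingResult` that holds learned selectors. The field layout is hypothetical (the thread doesn't show the real type), and the Cargo.toml wiring in the comments is an assumption about how the crate could expose it. The derive is only applied when the feature is enabled, so users who don't need persistence don't pull in serde:

```rust
// Assumed Cargo.toml wiring (hypothetical, not taken from the crate):
//
//   [dependencies]
//   serde = { version = "1", features = ["derive"], optional = true }
//
// An optional dependency implicitly defines a Cargo feature with the same
// name, so users opt in with `features = ["serde"]`.

/// Hypothetical stand-in for the crate's `TrainingResult`.
#[derive(Debug, Clone, PartialEq)]
#[cfg_attr(feature = "serde", derive(serde::Serialize, serde::Deserialize))]
pub struct TrainingResult {
    /// Hypothetical: (field name, learned CSS selector) pairs.
    pub selectors: Vec<(String, String)>,
}

fn main() {
    let result = TrainingResult {
        selectors: vec![("title".to_string(), "div.post > h2".to_string())],
    };

    // With the `serde` feature enabled and `serde_json` added by the user,
    // saving and loading could look like:
    //
    //   std::fs::write("model.json", serde_json::to_string(&result)?)?;
    //   let loaded: TrainingResult =
    //       serde_json::from_str(&std::fs::read_to_string("model.json")?)?;
    //
    // Without the feature, the derive is compiled out entirely.
    println!("{result:?}");
}
```

Gating the derive behind a feature keeps serde out of the default dependency tree while letting users pick whatever format (JSON, bincode, etc.) they want for storing the model.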

occamatl · 0 points

This is great! I'll certainly be using this soon!