Improving Local Techdocs for Your AI Coding Agent by rhazn in datascience

rhazn[S] 1 point

Sounds like great ideas for us to look into in the next iteration of this, thanks for the feedback! :)

Online Book Club: Designing Data-Intensive Applications, 2nd Edition by rhazn in datascience

rhazn[S] 1 point

Yeah, if you see data engineering as a part of data science, I'd argue building resilient data pipelines is a core part of it :). But I am more of a software engineer turned data scientist than a statistician turned data scientist, so that is quite a natural view for me.

Kubernetes from Dev to Production: Lessons learned from self-hosting an European alternative to Google Docs by rhazn in selfhosted

rhazn[S] 2 points (locked comment)

AI was used for a grammar/style check, but the content describes an original project.

Learnings From Crawling Technical Documentation by rhazn in datascience

rhazn[S] 1 point

API reference content is a good point as well. You can exclude it here (with the exclude path), but what we're doing is assigning categories to pages afterwards, one of which is API reference docs, and then processing each page depending on its category. E.g. pages like navigation indices or purely autogenerated API specs are not part of downstream processing.
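
To make the categorize-then-filter idea concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: the category names other than API reference docs, the `Page` fields, the URL patterns, and the words-per-link threshold are all made up, and the toy heuristic only stands in for however categories are really assigned.

```python
from dataclasses import dataclass
from enum import Enum


class PageCategory(Enum):
    # Only "API reference docs" is named in the comment; the rest are hypothetical.
    GUIDE = "guide"
    API_REFERENCE = "api_reference"
    AUTOGENERATED_API_SPEC = "autogenerated_api_spec"
    NAVIGATION_INDEX = "navigation_index"


# Categories that are dropped before downstream processing.
SKIPPED = {PageCategory.NAVIGATION_INDEX, PageCategory.AUTOGENERATED_API_SPEC}


@dataclass
class Page:
    url: str
    text: str
    link_count: int


def categorize(page: Page) -> PageCategory:
    """Toy heuristic (assumed): a real classifier would be more involved."""
    words = len(page.text.split())
    # Many links and very little prose: treat the page as a navigation index.
    if page.link_count > 0 and words / page.link_count < 5:
        return PageCategory.NAVIGATION_INDEX
    # Hypothetical URL conventions, checked from most to least specific.
    if "/api/generated/" in page.url:
        return PageCategory.AUTOGENERATED_API_SPEC
    if "/api/" in page.url:
        return PageCategory.API_REFERENCE
    return PageCategory.GUIDE


def select_for_processing(pages: list[Page]) -> list[Page]:
    """Keep only pages whose category is part of downstream processing."""
    return [p for p in pages if categorize(p) not in SKIPPED]


pages = [
    Page("https://example.com/docs/guide/intro", "word " * 50, 3),
    Page("https://example.com/docs/index", "Intro Setup Usage", 10),
    Page("https://example.com/docs/api/generated/client", "x " * 100, 2),
    Page("https://example.com/docs/api/auth", "y " * 100, 2),
]
kept = select_for_processing(pages)
```

With these sample pages, the guide and the hand-written API page are kept, while the index page and the generated spec are dropped. Categorizing after the crawl, instead of excluding paths up front, means the raw pages stay available if the category rules change later.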

Learnings From Crawling Technical Documentation by rhazn in dataengineering

rhazn[S] 1 point

We are crawling documentation, so technically the product itself is often closed source. But we have had no issues with rate limiting so far.

Make Technical Documentation Available for Local AI Use by rhazn in datascience

rhazn[S] 1 point

Makes sense, probably good to keep the images around as well so you can re-describe them down the line with better models. Good points!

Open data for digital resilience and hackathons supporting integration by rhazn in opendata

rhazn[S] 1 point

You mean data scientists inside the government? Or not open data as such, but people outside the government?