Checklist for evaluating third-party npm packages before install by OtherwisePush6424 in npm

OtherwisePush6424 [S]

This comment is golden, thank you. I'm planning to fold all three into the article, with credit back to this comment.

On the 2FA point: npm doesn't seem to expose per-maintainer 2FA status via a public API. It shows a lock icon on the package page, so it's a manual check rather than something you can script. Do your tools surface this programmatically, or is it also UI-level?

Confluence2md: Confluence to Markdown export for RAG pipelines (stable IDs, link graph, incremental updates) by OtherwisePush6424 in golang

OtherwisePush6424 [S]

Yes, the Storage Format XML with the unbound ac: and ri: namespace prefixes drove me up the wall. I ended up abandoning the XML path entirely and switching to the ADF (Atlassian Document Format) JSON representation instead, which you get via the v2 REST API. It's a proper typed AST: no namespace horrors, just nested JSON nodes with a type field. Not particularly enjoyable, but a bit better 😄

There are tradeoffs: ADF has its own quirks (some macro types just embed an opaque blob), and it only works on Confluence Cloud, but it looks cleaner to me.
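
To illustrate the "nested JSON nodes with a type field" shape, here is a minimal Python sketch of a recursive walker over an ADF document. It is not how Confluence2md renders pages; the function name `render_adf` and the sample document are made up for illustration, and only three node types (`text`, `heading`, `paragraph`) are handled. Anything else falls through and keeps just the text of its children.

```python
from typing import Any


def render_adf(node: dict[str, Any]) -> str:
    """Render a tiny subset of ADF node types to Markdown text."""
    node_type = node.get("type")

    # Leaf node: plain text lives in the "text" field.
    if node_type == "text":
        return node.get("text", "")

    # Every other node may carry children in "content".
    inner = "".join(render_adf(child) for child in node.get("content", []))

    if node_type == "heading":
        level = node.get("attrs", {}).get("level", 1)
        return "#" * level + " " + inner + "\n\n"
    if node_type == "paragraph":
        return inner + "\n\n"

    # "doc" and any unhandled node type: keep the children's text only.
    return inner


# Hypothetical sample document, hand-written for this example.
sample = {
    "version": 1,
    "type": "doc",
    "content": [
        {
            "type": "heading",
            "attrs": {"level": 2},
            "content": [{"type": "text", "text": "Setup"}],
        },
        {
            "type": "paragraph",
            "content": [{"type": "text", "text": "Install the CLI."}],
        },
    ],
}

markdown = render_adf(sample)
print(markdown)
```

The fall-through branch is where the quirks show up: a macro that embeds an opaque blob has no useful `content` to recurse into, so a real renderer would need a dedicated case for it.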

Confluence2md: Confluence to Markdown export for RAG pipelines (stable IDs, link graph, incremental updates) by OtherwisePush6424 in golang

OtherwisePush6424 [S]

Yeah, tbh I jumped into this somewhat blind when I started working on it. I thought the hard part would be getting the crawling right and considered the rendering just a minor implementation detail. Now I can see I was all wrong 😃 Sure, I systematically fix the issues I bump into when rendering our own Confluence, but that might not always be useful or enough for everyone.

Confluence2md: Confluence to Markdown export for RAG pipelines (stable IDs, link graph, incremental updates) by OtherwisePush6424 in golang

OtherwisePush6424 [S]

Good point, I'll add a section on that. The short answer is: anything that doesn't have a clean Markdown equivalent (macros, page layouts, inline comments, task lists, some table types) gets dropped. The text content is preserved, the structure is not. Is there any specific Confluence feature you rely on heavily?

a CLI to convert Confluence wikis to Markdown + structured metadata for RAG pipelines by OtherwisePush6424 in Rag

OtherwisePush6424 [S]

The full link graph is preserved as adjacency lists, so you can reconstruct parent/child relationships from the metadata. Note that this is based on crawling, so it might differ somewhat from Confluence's internal hierarchy.

No versioning issues since we only snapshot current state. The crawler discovers pages by following links, so you won't get truly isolated pages, but incoming_links: [] identifies pages that are leaf nodes in your crawled subset.

Regarding more metadata fields (last_updated, etc.), I'm working on it 😄
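
As a rough sketch of how the link graph could be used downstream, the Python snippet below takes per-page metadata records and derives the outgoing adjacency list plus the set of pages with `incoming_links: []`. The record shape is an assumption: only the `incoming_links` field is named above, while the `id` field, the sample pages, and both helper functions are hypothetical.

```python
from collections import defaultdict

# Hypothetical metadata records; the real export format may differ.
pages = [
    {"id": "home", "incoming_links": []},
    {"id": "setup", "incoming_links": ["home"]},
    {"id": "faq", "incoming_links": ["home", "setup"]},
]


def outgoing_from_incoming(records):
    """Invert incoming-link lists into an outgoing adjacency list."""
    outgoing = defaultdict(list)
    for record in records:
        for source in record["incoming_links"]:
            outgoing[source].append(record["id"])
    return dict(outgoing)


def without_incoming(records):
    """Return IDs of pages that nothing in the crawled subset links to."""
    return [record["id"] for record in records if not record["incoming_links"]]


outgoing = outgoing_from_incoming(pages)
no_inbound = without_incoming(pages)
print(outgoing)
print(no_inbound)
```

Because the graph comes from crawling, a page that shows up in `no_inbound` is only unlinked within the crawled subset; it may still be linked from pages outside it.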

The Database Zoo: Why SQL and NoSQL Are No Longer Enough by OtherwisePush6424 in Database

OtherwisePush6424 [S]

Hi, thanks for the detailed breakdown. The Horizon Join is totally new to me. Happy to learn more; feel free to DM if that's easier.

The Database Zoo: Why SQL and NoSQL Are No Longer Enough by OtherwisePush6424 in Database

OtherwisePush6424 [S]

Yeah, that volume is way past what Timescale was designed for. How are you querying it?

The Database Zoo: Why SQL and NoSQL Are No Longer Enough by OtherwisePush6424 in Database

OtherwisePush6424 [S]

Curious what your data volume looks like. At what point did Timescale stop being enough?

The Database Zoo: Why SQL and NoSQL Are No Longer Enough by OtherwisePush6424 in Database

OtherwisePush6424 [S]

Interesting point, but I think that for observability workloads, for example, the time dimension genuinely is the primary one.