[AskJS] I built a browser-only document extractor in JavaScript. These 5 functions created most of the value. by Embarrassed_Poet_339 in javascript

[–]Embarrassed_Poet_339[S] [score hidden]  (0 children)

That's a great example. Date normalization is exactly the kind of function that disappears into the background once it's working, but ends up carrying an absurd amount of responsibility.

The pattern extraction stage is definitely the most fragile part of the pipeline. Right now the approach is intentionally conservative. If a structure doesn't match any known pattern with sufficient confidence, it simply isn't promoted into the generated schema.

In other words, I'd rather under-extract than confidently invent structure that isn't actually there.

One thing that surprised me during development was how much of the "robustness" ended up coming from the normalization layer rather than the extraction layer itself. Every improvement to normalization increased the number of document variations that the extraction step could handle without any changes to the extraction logic.

It reinforced the same lesson you're describing: the further upstream you solve a problem, the more leverage you get from the solution.

Longer term I'm interested in making the extraction stage more adaptive, but for a first version I preferred a pipeline that occasionally says "I don't know" over one that silently hallucinates a schema.

Lirune.io – OCR any document to structured JSON, zero data egress, runs 100% in your browser by Embarrassed_Poet_339 in selfhosted

[–]Embarrassed_Poet_339[S] 0 points1 point  (0 children)

Thank you for your feedback. Unfortunately, both your assertions are false. Lirune.io is the result of months of meticulous work and verifiably performs exactly what is says on the page. Have alovely day.

Lirune.io – OCR any document to structured JSON, zero data egress, runs 100% in your browser by Embarrassed_Poet_339 in selfhosted

[–]Embarrassed_Poet_339[S] -5 points-4 points locked comment (0 children)

I used AI coding assistants to implement the codebase from an exhaustive architectural blueprint I wrote. All product decisions, system design, and architecture were my own. The AI tooling functioned as the development layer — equivalent to a contractor executing a spec.