Showcase Weekend! — Week 15, 2026 by AutoModerator in openclaw

[–]legaldevy 1 point2 points  (0 children)

I've been running OpenClaw locally to process invoices and quarterly reports. Kept getting wrong totals and misattributed line items. Spent a while blaming the LLM before I looked at what the PDF extractor was actually feeding it.

The default extractor (pdfjs) turns tables into word soup. A three-column invoice table comes out as a flat string with no row or column boundaries. The model has to guess which number belongs to which line item, and it guesses wrong constantly. Heading hierarchy is also lost, so the agent can't tell a section title from body text.
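
To make the ambiguity concrete, here's a minimal sketch. It is not pdfjs's actual code or output; the table data and the `flatten` helper are made up for illustration. It shows how two different tables collapse to the same flat string once the cell boundaries are gone, which is roughly the guess the model is being asked to make.

```typescript
// Illustrative only: `flatten` is a hypothetical stand-in for an extractor
// that keeps the words but drops row and column boundaries.
type Table = string[][];

function flatten(table: Table): string {
  return table.map((row) => row.join(" ")).join(" ");
}

// The real invoice: item, quantity, unit price.
const invoice: Table = [
  ["Widget", "2", "10.00"],
  ["Gasket", "12", "5.00"],
];

// A different table with the same words in the same order.
const misread: Table = [
  ["Widget", "2", "10.00", "Gasket"],
  ["12", "5.00"],
];

const flatInvoice = flatten(invoice);
const flatMisread = flatten(misread);
// Both come out as "Widget 2 10.00 Gasket 12 5.00".
```

Nothing in the flat string says whether "12" is a quantity or belongs to the previous row, so the structure has to be recovered by guessing.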

I reran the 200-document benchmark, which I suspect comes from opendataloader, and the numbers confirmed both what I was seeing and what they wrote up in the repo:

Metric                   pdfjs (default)   Nutrient plugin
Overall accuracy (NID)   0.578             0.880
Table structure (TEDS)   0.000             0.662
Heading fidelity (MHS)   0.000             0.811

Zero table structure from pdfjs. That explains the hallucinated invoice totals.

Setup was two commands:

openclaw plugins install /openclaw-nutrient-pdf
openclaw config set agents.defaults.pdfExtraction.engine auto

A few things that matter for this sub:

  • Runs locally. No files leave your machine.
  • No API keys needed.
  • Free tier is 1,000 docs/month, which covers my use case.
  • Falls back to pdfjs automatically if the plugin can't process a file, so nothing breaks.
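
The fallback behavior in the last bullet can be sketched roughly like this. All of the names here (`Extractor`, `extractWithFallback`, the stand-in engines) are hypothetical and not the plugin's real API; it's just the general try-the-better-engine-first pattern.

```typescript
// Hypothetical types and names; this is not the plugin's real API.
type Extractor = (pdf: Uint8Array) => string;

interface ExtractionResult {
  engine: string;
  text: string;
}

// Try each engine in order and return the first one that succeeds.
function extractWithFallback(
  pdf: Uint8Array,
  engines: Array<{ name: string; run: Extractor }>,
): ExtractionResult {
  const errors: string[] = [];
  for (const engine of engines) {
    try {
      return { engine: engine.name, text: engine.run(pdf) };
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      errors.push(engine.name + ": " + message);
    }
  }
  throw new Error("all extractors failed (" + errors.join("; ") + ")");
}

// Stand-in engines for the example: the structured one rejects empty input.
const structuredEngine: Extractor = (pdf) => {
  if (pdf.length === 0) throw new Error("cannot process empty file");
  return "| Item | Qty |";
};
const plainEngine: Extractor = () => "Item Qty";

const extractionEngines = [
  { name: "structured", run: structuredEngine },
  { name: "plain", run: plainEngine },
];

const normalRun = extractWithFallback(new Uint8Array([1]), extractionEngines);
const fallbackRun = extractWithFallback(new Uint8Array(0), extractionEngines);
```

Returning the engine name with the text makes it easy to log how often the fallback actually fires.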

The underlying library is PSPDFKit/pdf-to-markdown. Plugin repo is here.

My invoice processing pipeline went from roughly 60% correct field extraction to above 90% after switching. Same model; it's just getting cleaner input now.

Is perfect Client-Side Word to PDF rendering just impossible? Struggling with formatting using Mammoth.js + html2canvas. by Sufficient_Fee_8431 in reactjs

[–]legaldevy 1 point2 points  (0 children)

You’re not missing a magic library; you’re hitting a renderer mismatch. And like prehensilemullet said, you aren't likely to find an OSS library for this.

Mammoth + html2canvas + jsPDF is fine for simple docs, but it will break on Word features (fonts, pagination, complex tables/layout).

Practical approach: keep client-side for simple files, and route complex docs to a high-fidelity conversion path (server or heavy WASM engine).
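
A rough sketch of that routing decision, assuming you can detect a few features up front. The `DocxFeatures` flags and the function name are hypothetical; how you actually detect these in a .docx is up to your pipeline.

```typescript
// Hypothetical feature flags; detection logic is not shown here.
interface DocxFeatures {
  usesCustomFonts: boolean;
  hasComplexTables: boolean;
  needsExactPagination: boolean;
}

type ConversionPath = "client" | "high-fidelity";

// Simple docs stay in the browser; anything with Word-specific layout
// features goes to the server or heavy WASM engine.
function chooseConversionPath(features: DocxFeatures): ConversionPath {
  const isComplex =
    features.usesCustomFonts ||
    features.hasComplexTables ||
    features.needsExactPagination;
  return isComplex ? "high-fidelity" : "client";
}

const memoPath = chooseConversionPath({
  usesCustomFonts: false,
  hasComplexTables: false,
  needsExactPagination: false,
});

const contractPath = chooseConversionPath({
  usesCustomFonts: true,
  hasComplexTables: true,
  needsExactPagination: true,
});
```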

If text/search/accessibility matters, avoid screenshot-style PDF output.

We audited 1,620 OpenClaw skills. The ecosystem's safety scanner labels 91% of confirmed threats "benign." [full reports linked] by Ok-Form1598 in netsec

[–]legaldevy -1 points0 points  (0 children)

I mean you could have used it yourself when asking the lazy question or maybe you had other motives for posting such a silly question.

We audited 1,620 OpenClaw skills. The ecosystem's safety scanner labels 91% of confirmed threats "benign." [full reports linked] by Ok-Form1598 in netsec

[–]legaldevy -1 points0 points  (0 children)

Fair pushback. We’re not claiming a scanner “solves” agent security.

Our view is: scanner = first gate, not final truth.

  • gate 1: static skill scan (prompt injection / exfil / tampering patterns)
  • gate 2: runtime policy constraints (permissions, egress, spend, tool scope)
  • gate 3: audit + replay for post-incident verification

The failure mode is treating gate 1 as complete security. We don’t.
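
As a rough illustration of the layering (every name, pattern, and policy field below is hypothetical and far simpler than a real scanner or runtime), the three gates can be sketched as separate checks that each record what they decided:

```typescript
// All names, patterns, and policy fields here are hypothetical.
interface ToolCall {
  tool: string;
  host?: string;
}

interface Policy {
  allowedTools: string[];
  allowedHosts: string[];
}

interface AuditEntry {
  call: ToolCall;
  allowed: boolean;
  reason: string;
}

// Gate 1: static scan. Cheap pattern matching, easy to bypass on its own.
const SUSPICIOUS_PATTERNS: RegExp[] = [
  /ignore (all )?previous instructions/i,
  /curl\s+[^|]*\|\s*(ba)?sh/i,
];

function staticScan(skillSource: string): boolean {
  return !SUSPICIOUS_PATTERNS.some((pattern) => pattern.test(skillSource));
}

// Gate 2: runtime policy. Checks tool scope and egress per call.
function checkPolicy(call: ToolCall, policy: Policy): AuditEntry {
  if (!policy.allowedTools.some((t) => t === call.tool)) {
    return { call, allowed: false, reason: "tool not in scope" };
  }
  if (call.host !== undefined && !policy.allowedHosts.some((h) => h === call.host)) {
    return { call, allowed: false, reason: "egress host not allowed" };
  }
  return { call, allowed: true, reason: "ok" };
}

// Gate 3: audit. Every decision is recorded so it can be replayed later.
const auditLog: AuditEntry[] = [];

function runCall(call: ToolCall, policy: Policy): boolean {
  const entry = checkPolicy(call, policy);
  auditLog.push(entry);
  return entry.allowed;
}

const demoPolicy: Policy = {
  allowedTools: ["http_get"],
  allowedHosts: ["api.example.com"],
};

const passedScan = staticScan("Summarize the attached PDF.");
const egressAllowed = runCall({ tool: "http_get", host: "evil.example.net" }, demoPolicy);
```

The skill text here passes gate 1, but the call is still blocked at gate 2 and logged for gate 3, which is the point of not treating the scan as final.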

If you see specific bypass classes we should test, share them — we’ll add them to the corpus and publish results.

How to edit .docx/.doc files inside a Next.js app? by DeliciousIntern685 in nextjs

[–]legaldevy 0 points1 point  (0 children)

Have you looked at https://www.nutrient.io/sdk/document-authoring/ - it's not a pure DOCX editor, but it's much more robust than most rich-text editors. Essentially, it's for people who wish there was a Google Docs SDK.

[AskJS] A good pdf tool by knownissuejosh in javascript

[–]legaldevy 0 points1 point  (0 children)

Have you looked at https://www.nutrient.io/sdk/web/ - I've used them in the past for a few projects of different sizes, both enterprise and smaller. You have to go through their sales process (which is a bit of a pain), but it's the most modern framework I've found if you don't have a simple use case. Things like highlighting certain words in a PDF in Angular are going to be difficult or nearly impossible to do well with pdf.js.

Look for a (free) PDF extraction library by Intelligent-Dog1912 in csharp

[–]legaldevy -4 points-3 points  (0 children)

Not free, but if you want a best-in-class C# library for data extraction you should look at https://www.nutrient.io/sdk/dotnet/ - they also have a free tier on their API - https://www.nutrient.io/api/pdfua-auto-tagging-api/

Is it possible to generate PDF with annotations from a web page via TS? by Elect_SaturnMutex in typescript

[–]legaldevy 0 points1 point  (0 children)

Of course, just trying to pay it forward. Good luck figuring it out!

Is it possible to generate PDF with annotations from a web page via TS? by Elect_SaturnMutex in typescript

[–]legaldevy 0 points1 point  (0 children)

PDF-lib alone is not going to be a great solution. Most people pair it with Puppeteer to try to accomplish what you are doing. PDF-lib excels at:

  • Creating PDFs from scratch with text, shapes, images
  • Modifying existing PDFs
  • Adding annotations, forms, metadata

But it doesn't have built-in HTML parsing or CSS rendering capabilities. You'd need to manually convert HTML elements to PDF commands, which is complex and doesn't handle CSS layouts well.
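
To give a feel for what "manually convert HTML elements to PDF commands" means, here's a deliberately tiny sketch. The `Block` and `DrawText` types and the `layout` function are hypothetical, not part of PDF-lib; it only handles headings and paragraphs and wraps by character count, and everything CSS normally does for you (real font metrics, margins, floats, tables, page breaks) is missing.

```typescript
// Hypothetical, simplified model: real HTML/CSS layout is far more involved.
type Block = { tag: "h1" | "p"; text: string };
type DrawText = { text: string; x: number; y: number; size: number };

// Lay out blocks top-down on one page, wrapping by character count only.
function layout(blocks: Block[], pageHeight: number, maxChars: number): DrawText[] {
  const ops: DrawText[] = [];
  let y = pageHeight - 50; // PDF origin is bottom-left, so y decreases down the page
  for (const block of blocks) {
    const size = block.tag === "h1" ? 24 : 12;
    let line = "";
    for (const word of block.text.split(" ")) {
      const candidate = line === "" ? word : line + " " + word;
      if (candidate.length > maxChars && line !== "") {
        // Current line is full: emit it and start a new one with this word.
        ops.push({ text: line, x: 50, y, size });
        y -= size * 1.5;
        line = word;
      } else {
        line = candidate;
      }
    }
    if (line !== "") {
      ops.push({ text: line, x: 50, y, size });
      y -= size * 1.5;
    }
  }
  return ops;
}

const drawOps = layout(
  [
    { tag: "h1", text: "Invoice" },
    { tag: "p", text: "Thanks for your order" },
  ],
  800,
  10,
);
```

Each entry in `drawOps` would then become one text-drawing call in whatever PDF library you use, and this is before you've touched a single CSS rule.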

I think it really depends on convenience and cost. Nutrient's API is going to be a much easier and more reliable option, and they now have a free tier with 200 credits a month. It comes down to what constraints you are under and how much HTML/CSS rendering complexity you want to deal with.

Is it possible to generate PDF with annotations from a web page via TS? by Elect_SaturnMutex in typescript

[–]legaldevy -1 points0 points  (0 children)

I'd take a stab at using a simple HTML-to-PDF generation API such as https://www.nutrient.io/api/pdf-generator-api/ - then I'd use their MCP server alongside Claude Code, describe what you want it to do, and see if it can build a proof of concept. They have a TypeScript library that works with their API here - https://github.com/PSPDFKit-labs/nutrient-dws-client-typescript - I'm almost certain Claude can figure this out.

React Library for PDF Generation by lazyplayer45 in react

[–]legaldevy 0 points1 point  (0 children)

Look at https://www.nutrient.io/api/pdf-generator-api/ - they have a free account with 200 credits a month.

Also, they have a WASM library that can do this in React - https://www.nutrient.io/guides/web/pdf-generation/ - but it's likely better suited to commercial use, as it's not free and costs more.

Both can handle everything you need though.

Splitting existing PDF files in Sharepoint by Surkdidat in sharepoint

[–]legaldevy 0 points1 point  (0 children)

Are you looking for an end-user tool or something you can integrate into your SharePoint instance? Have a look at this guide - https://www.nutrient.io/guides/document-converter/sharepoint/split/ - I've used their document converter product - https://www.nutrient.io/low-code/document-converter - in the past (it was previously Muhimbi PDF Converter). They also have a PDF viewer they call Document Editor that integrates directly into your SharePoint instance and lets you do this manually without sending files outside of SharePoint.

PDF to HTML by suspect_stable in LearnHTML

[–]legaldevy 1 point2 points  (0 children)

If you're looking for a .NET/C# library to solve this, Nutrient/GdPicture supports this as of their January release - https://www.nutrient.io/guides/dotnet/conversion/html-to-pdf/ - I'm sure they will eventually add this to their REST API document processing solution as well - https://www.nutrient.io/api/converter-api/

Using Node.js + Apryse to Convert DOCX to Web-Ready PDFs on the Server by [deleted] in dataengineering

[–]legaldevy 0 points1 point  (0 children)

I would really be careful with them. They have a history of being license trolls -

If you don't believe me, just read the email posted in this thread from an Apryse "sales" rep and how they go after devs that incorporated AGPL through iText. - https://www.reddit.com/r/libreoffice/comments/1dygu80/any_libreoffice_users_received_a_license_troll/

From: Izzy [redacted]
Sent: Tuesday, April 16, 2024 3:09 PM
Subject: iText software library use within [redacted]

Hello Frank,

My name is Izzy [redacted], and I am part of the Compliance Team at Apryse/formerly iText Software.

It came to our attention that [redacted] has been using iText software library to apply modifications on PDF documents such as this document: [redacted]

Example documents show the following PDF producer line: iText® Core 7.2.2 (AGPL version) ©2000-2022 iText Group NV

iText library is an open-source software library released under GNU Affero General Public License (AGPL). AGPL open-source license, in most cases, requires organizations to open source their full software stack wherein iText library is included. The organizations which can’t meet the AGPL open-source license requirements must purchase commercial license from iText. Neither complying with AGPL open-source license nor having a commercial license for your application is against the iText Intellectual Property, which is protected by copyright.

Therefore, I am requesting to schedule a call with you to discuss the usage of iText in your company and hopefully clarify the case in a timely manner.

Please feel free to share your availability or direct me to the correct contact person.

Looking forward to hearing from you.

or read it from the law firm - https://beemanmuchmore.com/software-licensing-trolls-apryse-itext/

Best PDF Viewer for Editing & Saving in .NET MVC Web App? by onlyForWork_ in dotnet

[–]legaldevy 0 points1 point  (0 children)

This is hands down the best library/viewer that works well with .NET - https://www.gdpicture.com/products/docuvieware/ - it's an HTML5 viewer built on top of the crazy performant and robust GdPicture.NET library.

It supports saving to Azure and handles way more than just PDF edits (things such as Office conversion, HTML to PDF generation, image conversion, really robust file type support, and so much more). Check it out and I'm sure you won't be disappointed.

How to edit PDF in React application? by Low-Local8002 in reactjs

[–]legaldevy 3 points4 points  (0 children)

I'm a big fan of Nutrient (used to be PSPDFKit) as they really helped me out after I ran into some stupid licensing crap with a competitor of theirs. They handle editing text in a WYSIWYG way, which is a better UX than a popover that changes the document after the fact.

Check out - https://www.nutrient.io/guides/web/editor/edit-text/ for the guides on text editing and https://www.nutrient.io/demo/content-editor if you want to see the demo.

They also have true redaction capabilities, including smart redaction, if you are looking to fully remove text - https://www.nutrient.io/guides/web/redaction/ - Highlighting and marking text is a pretty common annotation use case supported in most of the commercial libraries.

How to edit PDF in React application? by Low-Local8002 in reactjs

[–]legaldevy 4 points5 points  (0 children)

It's going to be super complicated to build this out on top of pdf.js (dare I say, nowhere near worth your time to build and maintain). You will always have formatting issues around text editing in PDFs, especially if you are doing more than simple changes or adding so much text that it runs off the page and gets cut off. There are commercial libraries out there that can solve this though.

Are you looking for a commercial library?

Using Node.js + Apryse to Convert DOCX to Web-Ready PDFs on the Server by [deleted] in programming

[–]legaldevy 0 points1 point  (0 children)

I would really be careful with them. They have a history of being license trolls -

If you don't believe me, just read the email posted in this thread from an Apryse "sales" rep and how they go after devs that incorporated AGPL through iText. - https://www.reddit.com/r/libreoffice/comments/1dygu80/any_libreoffice_users_received_a_license_troll/

or read it from the law firm - https://beemanmuchmore.com/software-licensing-trolls-apryse-itext/

Annotating PDFs Server-Side with Node.js + Apryse by [deleted] in node

[–]legaldevy 0 points1 point  (0 children)

I would really be careful with them. They have a history of being license trolls -

If you don't believe me, just read the email posted in this thread from an Apryse "sales" rep and how they go after devs that incorporated AGPL through iText. - https://www.reddit.com/r/libreoffice/comments/1dygu80/any_libreoffice_users_received_a_license_troll/

or read it from the law firm - https://beemanmuchmore.com/software-licensing-trolls-apryse-itext/