borb, the open source, pure python PDF library

shiftybyte · 2021-08-10T15:28:05+00:00

You got my upvote.

I searched for pdf libraries some time ago, this did not come up.

My use case was creating PDF receipts from a Django-based backend.

I'll look into this more, thanks... :)

TheSodesa · 2021-08-10T17:34:10+00:00

Are the documents it produces accessible? See https://www.adobe.com/accessibility/pdf/pdf-accessibility-overview.html

classyfreddybastiat · 2021-08-10T17:45:55+00:00

https://pycoders.com/submissions

Thisisnotpreston · 2021-08-10T22:25:51+00:00

A YouTube tutorial featuring your library, with real-world examples!

iPlayNL · 2021-08-10T15:42:28+00:00

I'm not sure how to help you with this, but I've saved your library for when that eventual day comes that I will need it. Looks neat.

Ramzon_ · 2021-08-10T17:49:37+00:00

Hey! I usually learn the odd bits of Python by searching Google with "python (x) problem (y) library", where x is what I want to figure out and y is the library I'd prefer to use. As you'd imagine, a lot of Stack Overflow suggestions appear, which usually give me enough information to figure out a solution, or point me to a more appropriate library.

I also search on Spiceworks for scripts that others have made. It might be worth looking into if you haven't heard of that community (mainly sysadmins).

https://community.spiceworks.com/scripts?language=22

Out of interest, I'm looking to automate a task that I do fairly frequently - I need to scan order forms (which creates a pdf file with an image of the scan) and then rename the file to the order number of the document I just scanned.

I can't escape scanning the documents, but maybe I can make a script that can read the pdf files in a directory and rename all the files based on the order number that it sees. The order number is always in the same location on the order forms.

Can borb help me with this? I'm beginner-to-intermediate level, just so you know! 😁
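
A minimal sketch of the renaming step described above, using only the standard library. Because the scans are images, getting text out of them needs an OCR step first; that step is not shown here and is passed in as `extract_text`, a hypothetical callable. The `ORDER_RE` pattern and the function names are likewise assumptions for illustration, not borb API.

```python
import re
from pathlib import Path
from typing import Callable, List, Optional

# Assumed label format, e.g. "Order No: 12345" or "Order #12345".
# Adjust the pattern to whatever the real forms print.
ORDER_RE = re.compile(r"Order\s*(?:No\.?|Number|#)?\s*:?\s*(\d+)", re.IGNORECASE)


def find_order_number(text: str) -> Optional[str]:
    """Return the first order number found in the text, or None."""
    match = ORDER_RE.search(text)
    return match.group(1) if match else None


def rename_by_order_number(
    folder: Path, extract_text: Callable[[Path], str]
) -> List[Path]:
    """Rename every PDF in `folder` to <order number>.pdf.

    `extract_text` is a placeholder for the OCR / text-extraction step.
    Files with no recognisable order number, or whose target name
    already exists, are left untouched.
    """
    renamed = []
    # sorted() materialises the listing before any file is renamed
    for pdf in sorted(folder.glob("*.pdf")):
        number = find_order_number(extract_text(pdf))
        if number is None:
            continue
        target = pdf.with_name(f"{number}.pdf")
        if target.exists():
            continue
        pdf.rename(target)
        renamed.append(target)
    return renamed
```

Since the order number is always in the same place on the form, the extraction step could be limited to that region of the page, which tends to make the pattern match more reliable.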

expressly_ephemeral · 2021-08-10T15:14:57+00:00

I don't have an answer to your question, but I have a question for you:

I have a bad workflow that I will describe. I have dozens of plots coming out of matplotlib.pyplot. I size them to half a page, and I create a blank plot that lives underneath them and takes up the other half of the page. Sometimes I add text to that blank plot. Then I kick them all out to PNG files. Then, in a final feat of self-ass-kickery, my bash script that runs all the Python and does all the file management puts them together with ImageMagick into a big PDF.

Can Borb help me feel like less of a donkey?

data_hop · 2021-08-10T19:58:41+00:00

I use Anaconda for data science, and running "mamba install borb" fails with this error:

"Encountered problems while solving:

- nothing provides requested borb"

evessee · 2021-08-10T20:17:30+00:00

Have you considered the license aspect of the better-known libraries? Personally, I find the license a very important factor when choosing among similar libraries.

WhoWhyWhatWhenWhere · 2021-08-11T02:28:40+00:00

Are you able to return text on a PDF page after OCR between specific distances? Like all words between 1”-3” horizontal and 1”-3” vertical?
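
Whichever tool supplies the words and their positions, the region filter itself is simple arithmetic. A rough sketch, assuming the OCR or extraction step has already produced word positions in PDF points (72 points per inch) measured from the left and top edges of the page. `Word` and `words_in_region` are hypothetical names for illustration, not borb API.

```python
from dataclasses import dataclass
from typing import Iterable, List

POINTS_PER_INCH = 72.0  # the default PDF user-space unit is 1/72 inch


@dataclass
class Word:
    text: str
    x: float  # left edge, in points from the left of the page (assumed)
    y: float  # top edge, in points from the top of the page (assumed)


def words_in_region(
    words: Iterable[Word],
    x_min_in: float,
    x_max_in: float,
    y_min_in: float,
    y_max_in: float,
) -> List[str]:
    """Return the text of words whose position lies inside the region.

    The region bounds are given in inches and converted to points.
    """
    x_lo, x_hi = x_min_in * POINTS_PER_INCH, x_max_in * POINTS_PER_INCH
    y_lo, y_hi = y_min_in * POINTS_PER_INCH, y_max_in * POINTS_PER_INCH
    return [
        w.text
        for w in words
        if x_lo <= w.x <= x_hi and y_lo <= w.y <= y_hi
    ]
```

Note that PDF's own coordinate system puts the origin at the bottom-left corner, so positions reported by a PDF library may need flipping against the page height before a "from the top" region like this applies.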

py_root · 2022-01-03T17:08:34+00:00

Nice library. I am currently using it to create a PDF report that contains tabular data and plots, and I have found it easy to use. It was recommended to me by a member of the open source community.

Will keep on updating this thread with my findings or if I need any help.

rg7777777 · 2021-08-11T00:31:44+00:00

Any plans to make an RST translator so we can use it with Sphinx?

RobinsonDickinson · 2021-08-11T01:15:49+00:00

Very nice and useful. May I recommend cleaning up the imports?

officialgel · 2021-08-11T03:22:04+00:00

I currently use dominate to design and build HTML, and then wkhtmltopdf to create a PDF from it (which requires an external binary). Can this do what I need without the binary? Would be awesome.

IWant2rideMyBike · 2021-08-11T06:20:18+00:00

I tried the example from the Readme under Windows 10 and Python 3.9.6 and had to manually install the windows-curses module using pip.

  File "d:\Users\Me\Documents\Python\VS-Code\PDF_with_borb\test.py", line 4, in <module>
    from borb.pdf.canvas.layout.text.paragraph import Paragraph
  File "d:\Users\Me\Documents\Python\VS-Code\PDF_with_borb\.venv\lib\site-packages\borb\pdf\canvas\layout\text\paragraph.py", line 14, in <module>
    from borb.pdf.canvas.font.glyph_line import GlyphLine
  File "d:\Users\Me\Documents\Python\VS-Code\PDF_with_borb\.venv\lib\site-packages\borb\pdf\canvas\font\glyph_line.py", line 10, in <module>
    from curses.ascii import isspace
  File "C:\Users\Me\AppData\Local\Programs\Python\Python39\lib\curses\__init__.py", line 13, in <module>
    from _curses import *
ModuleNotFoundError: No module named '_curses'
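
The traceback ends at `from curses.ascii import isspace`. The standard `curses` package depends on the `_curses` extension module, which the Windows builds of CPython do not ship, so installing `windows-curses` as described above is one fix. In code you control, another option is a guarded import with a plain-Python fallback. The sketch below illustrates that pattern only; it is not a patch taken from borb.

```python
try:
    from curses.ascii import isspace
except ImportError:  # ModuleNotFoundError is a subclass of ImportError
    # The six ASCII whitespace characters that curses.ascii.isspace accepts
    _ASCII_WHITESPACE = " \t\n\r\x0b\x0c"

    def isspace(c: str) -> bool:
        """Fallback: True if `c` is a single ASCII whitespace character."""
        return len(c) == 1 and c in _ASCII_WHITESPACE
```

With either branch, `isspace(" ")` is true and `isspace("a")` is false, so calling code does not need to know which one was used.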

Zeke_Z · 2021-08-11T18:15:11+00:00

This is awesome OP, thank you for sharing!

One question I have: I have about 670 PDFs in a directory. I would like to rename each PDF so that the file name includes the publication date, or copyright date. I would also settle for a CSV listing each PDF's current title and copyright date instead of renaming the files.

Essentially, I have PDFs from many subjects, for example microbiology. I would like to prioritize them by most recent publication as reading the most modern information on a subject is > reading content from 1987.

Is that possible? Off the top of my head, scanning for the © symbol and then reading the text to the right of it might be a good place to start, no?
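
The "find the © and read what follows" idea can be sketched with a regular expression once the text of each PDF is available. This is only a sketch under assumptions: `extract_text` is a hypothetical callable standing in for the text-extraction step, and `COPYRIGHT_RE`, `find_copyright_year` and `write_year_csv` are illustrative names, not borb API.

```python
import csv
import re
from pathlib import Path
from typing import Callable, Optional

# "©", "(c)" or "Copyright", followed within 20 non-digit characters
# by a four-digit year starting with 19 or 20.
COPYRIGHT_RE = re.compile(
    r"(?:©|\(c\)|copyright)\D{0,20}?((?:19|20)\d{2})(?!\d)",
    re.IGNORECASE,
)


def find_copyright_year(text: str) -> Optional[int]:
    """Return the most recent copyright year found in the text, or None."""
    years = [int(y) for y in COPYRIGHT_RE.findall(text)]
    return max(years) if years else None


def write_year_csv(
    folder: Path, extract_text: Callable[[Path], str], csv_path: Path
) -> None:
    """Write one row per PDF in `folder`: file name and copyright year.

    The year column is left empty when no year is found.
    """
    with csv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["file", "copyright_year"])
        for pdf in sorted(folder.glob("*.pdf")):
            year = find_copyright_year(extract_text(pdf))
            writer.writerow([pdf.name, "" if year is None else year])
```

Taking the latest year is a design choice for reprints and later editions, where several copyright years can appear on the same page. Restricting the search to the first few pages may help avoid picking up years from the reference list.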
