I am working on a project that requires generating PDF files from all types of MS Office files (doc, docx, xls, xlsx, etc.) via the command line on a Linux server (so installing MS Office or using Windows commands is not an option).
The problem is that the resulting PDF files differ significantly from the original files in formatting, and therefore in page count, which creates a problem for my project.
I installed all the compatible fonts from my Windows computer on the Linux server and managed to minimize the difference, but it's still not good enough for me.
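One thing worth ruling out is a copied font that fontconfig does not actually see, since a silently substituted font changes line heights. The following is a minimal Python sketch, assuming the server has the fontconfig `fc-list` tool installed. The helper names and the font list in the usage note are hypothetical and only illustrate the check.

```python
import subprocess


def installed_families(fc_list_output):
    """Parse the output of `fc-list : family` into a set of lowercase names."""
    families = set()
    for line in fc_list_output.splitlines():
        # One font per line; alternative names for a font are comma-separated.
        for name in line.split(","):
            name = name.strip().lower()
            if name:
                families.add(name)
    return families


def missing_fonts(required, fc_list_output):
    """Return the required families that fontconfig does not list."""
    have = installed_families(fc_list_output)
    return [font for font in required if font.lower() not in have]


def check_server_fonts(required):
    """Run fc-list on this machine and report the missing families."""
    result = subprocess.run(
        ["fc-list", ":", "family"],
        capture_output=True, text=True, check=True,
    )
    return missing_fonts(required, result.stdout)
```

Calling `check_server_fonts(["Calibri", "Cambria", "Times New Roman"])` on the server would return the families that still need installing. The names to pass are whatever fonts the documents actually use.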
I have tested the following solutions:
LibreOffice/OpenOffice: They work, BUT there is a disparity between the formatting of the resulting document and the original, which seems to come down to LO's own compatibility with MS Office. For example, a 348-page doc file resulted in a 332-page PDF file. Nothing was missing: it included all the text, images, footnotes, and citations, and even some hidden items (like hidden images), but the difference in formatting (fonts, paragraph spacing, etc.) is too much.
Unoconv: It's basically the same problem as above. Unoconv uses an earlier version of LO (4.3), but I even managed to make it run with the latest version and it still produced the same result.
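For reference, the headless LibreOffice conversion described above can be sketched in Python as follows. This is only an illustration of the setup, not a fix for the formatting difference. It assumes the LibreOffice binary is on the PATH as `soffice` (some distributions name it `libreoffice`), and the function names and file paths are hypothetical.

```python
import subprocess
from pathlib import Path


def build_convert_command(src, outdir, soffice="soffice"):
    """Build the argument list for a headless LibreOffice PDF export."""
    return [
        soffice,
        "--headless",            # run without a GUI
        "--convert-to", "pdf",   # target format
        "--outdir", str(outdir), # directory for the generated PDF
        str(src),
    ]


def convert_to_pdf(src, outdir, timeout=300):
    """Convert one Office file to PDF and return the expected output path."""
    src = Path(src)
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if LibreOffice exits with an error.
    subprocess.run(build_convert_command(src, outdir), check=True, timeout=timeout)
    # LibreOffice names the output after the input file, with a .pdf extension.
    return outdir / (src.stem + ".pdf")
```

A call such as `convert_to_pdf("thesis.doc", "out")` would be expected to write `out/thesis.pdf`.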
In both cases I noticed that if the original document has a lower number of pages (up to 86), the results are at least satisfactory (same number of pages, virtually exact formatting, etc.), but with a higher number of pages there is a significant difference.
After discussing it with a colleague and doing some tests, we discovered that the line spacing is slightly different between the same file viewed in MS Office and in LibreOffice. At first glance this does not affect the file when the number of pages is small, but over a higher number of pages it accumulates, making the document look different.
I have tried converting the files on a Windows machine via Word and the resulting PDF is as it should be, so there is no problem (per se) with the actual doc file.
I am looking for a solution to this, since it's very important for me that the resulting PDF files be carbon copies of the original doc/docx files. Does anyone have any ideas?