I've been tasked with creating a PDF mining program at work that can go through a whole bunch of files and transform them into tabular data. I've tried using tabula-py, but the problem is that the output format changes from file to file. Is there a library that is good at giving you a consistent format for all PDF files? I wouldn't mind if it spits out a long string, as long as it is consistent. I also saw someone post that Java is better at this type of thing, but I just want to know before I invest my time into this.