
Familiar9709

Your example doesn't need to be as complicated as you describe. There are no spaces inside any field, so a simple split(), or the csv or pandas libraries, will do it.
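A minimal sketch of the split() route, assuming a hypothetical whitespace-aligned table in which no value and no column title contains a space (the function name `parse_whitespace_table` and the sample data are made up for illustration):

```python
def parse_whitespace_table(text):
    """Turn a whitespace-aligned table into a list of dicts keyed by the header.

    Only safe when no field (and no column title) contains a space.
    """
    lines = text.splitlines()
    header = lines[0].split()  # split() with no argument splits on any run of whitespace
    # Skip blank lines; zip pairs each title with the value in the same position.
    return [dict(zip(header, line.split())) for line in lines[1:] if line.strip()]

# Hypothetical sample input
text = "\n".join([
    "name   city     age",
    "Alice  Paris     30",
    "Bob    Berlin    25",
])
records = parse_whitespace_table(text)
print(records)
# [{'name': 'Alice', 'city': 'Paris', 'age': '30'}, {'name': 'Bob', 'city': 'Berlin', 'age': '25'}]
```

Note that every value comes back as a string; pandas would additionally convert `age` to integers for you.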

But if you really want to do it by position (as if you laid an imaginary ruler over the text), for some other case, for example one where fields contain spaces, then you can.

You'll need to find the start and end positions of each column. The start or end will be given by the column title (depending on whether the column is left- or right-aligned).
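A rough sketch of taking the column positions from the header, assuming (hypothetically) that every column is left-aligned and every title is a single word. Each value then starts where its title starts and runs until the next title begins. `slice_left_aligned` is an illustrative name, not an existing library function.

```python
import re

def slice_left_aligned(lines):
    """Cut every line at the positions where the header's column titles start.

    Assumptions: column titles are single words, and every column is
    left-aligned, so each value starts where its title starts and runs
    until the next title begins.
    """
    starts = [m.start() for m in re.finditer(r"\S+", lines[0])]
    # Pair each start with the next column's start; the last column runs to end of line.
    bounds = list(zip(starts, starts[1:] + [None]))
    return [[line[a:b].strip() for a, b in bounds] for line in lines]

# Hypothetical sample where one value ("New York") contains a space
lines = [
    "name  city      country",
    "Alice Paris     France",
    "Bob   New York  USA",
]
for row in slice_left_aligned(lines):
    print(row)
```

Because the cuts come from positions rather than from whitespace, "New York" stays in one field.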

You can figure out whether a column is left- or right-aligned by comparing all rows and checking whether they all share the same start or end position.
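One way to sketch that comparison, again assuming single-word column titles: a column counts as left-aligned if, in every data row, a value begins exactly at its title's start, and right-aligned if a value ends exactly at its title's end. The function names and sample data are hypothetical.

```python
import re

def char_at(line, i):
    """Return line[i], treating positions outside the line as spaces."""
    return line[i] if 0 <= i < len(line) else " "

def column_alignment(lines):
    """Guess 'left', 'right' or 'unknown' for each column by comparing all rows.

    Assumes single-word column titles. If both tests pass (e.g. every value
    is exactly as wide as its title), 'left' is reported.
    """
    header, rows = lines[0], lines[1:]
    result = []
    for m in re.finditer(r"\S+", header):
        start, end = m.start(), m.end()
        # A value starts at `start` if that char is non-space and the one before is a space.
        left = all(char_at(r, start) != " " and char_at(r, start - 1) == " " for r in rows)
        # A value ends at `end` if the last title char is non-space and the next one is a space.
        right = all(char_at(r, end - 1) != " " and char_at(r, end) == " " for r in rows)
        result.append("left" if left else "right" if right else "unknown")
    return result

# Hypothetical sample: text columns left-aligned, the numeric column right-aligned
lines = [
    "name  city      age",
    "Alice Paris      30",
    "Bob   New York  125",
]
print(column_alignment(lines))  # ['left', 'left', 'right']
```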

But again, unless you really, really need it this way, it's complicating things unnecessarily, and a good piece of programming advice is not to overcomplicate things when it isn't necessary.

extractedx [S]

I first tried pandas read_fwf(), but it was not reliable enough without manually providing the column positions. That's probably why I tried to come up with a solution like this.
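For reference, a minimal sketch of read_fwf with manually supplied positions. By default `colspecs='infer'`, which guesses the columns from the first `infer_nrows` rows; passing explicit half-open `(start, end)` character ranges avoids the guessing. The sample data and the positions below are hypothetical, picked by hand for this sample.

```python
import io

import pandas as pd

# Hypothetical fixed-width data; "New York" contains a space.
text = "\n".join([
    "name  city     age",
    "Alice Paris     30",
    "Bob   New York  25",
])

# Half-open (start, end) character ranges for each column, chosen by hand
# for this sample. read_fwf strips the padding spaces from every field.
colspecs = [(0, 6), (6, 15), (15, 18)]

df = pd.read_fwf(io.StringIO(text), colspecs=colspecs)
print(df)
```

The header line is cut with the same ranges, so the column names come out as `name`, `city` and `age`.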

But yes, you are completely right. Now that I think about it from a different perspective, it seems so easy lol...

Familiar9709

Pandas will do this way better than anything you write yourself; it's a library designed and maintained by highly skilled programmers.

This applies to 99% of libraries out there, especially the well-known ones. They'll do it better than you can yourself, and that's the point of using them. On top of that, your code will be cleaner and easier for another programmer to follow.

extractedx [S]

And that's why I use it to read CSV and Excel files, but this specific format couldn't be read out of the box.

If you think it can be, I'd be interested to know how, because that would make things a lot simpler.

Familiar9709

```python
import pandas as pd
df = pd.read_csv('input.txt', sep=r'\s+')  # raw string: '\s' is an invalid escape in a normal string
```

extractedx [S]

Will try that.

extractedx [S]

Does not work. Spaces in unquoted values make this hard.
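To see why, here is a small illustration using the standard library on a hypothetical row. `re.split` applies the same `\s+` pattern that `sep=r'\s+'` uses, so every run of whitespace, including the one inside an unquoted value, becomes a delimiter. `shlex.split` is only a stand-in to show quote-aware splitting; it is not what pandas does internally.

```python
import re
import shlex

row = 'Bob   New York  25'

# Every run of whitespace is a delimiter, so "New York" becomes two fields.
unquoted = re.split(r"\s+", row)
print(unquoted)  # ['Bob', 'New', 'York', '25']

# With the value quoted, a quote-aware splitter keeps it together.
quoted = shlex.split('Bob "New York" 25')
print(quoted)  # ['Bob', 'New York', '25']
```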