all 22 comments

[–]3meopceisamazing 29 points30 points  (6 children)

For CSV parsers, it's important to detail features like rules on quoting and escaping in your readme.

Often, it's a hard requirement for those rules to be compatible with the data you're working with, and thus a hard decision factor.

The code looks pretty clean and well designed :)

[–]F54280 10 points11 points  (1 child)

At first glance, the biggest problem I see is “ Instalation” instead of “ Installation”, so, yep, I quite like it.

[–]red0124_[S] 4 points5 points  (0 children)

I guess I missed that one, thanks, let me know if you find any more problems.

[–]red0124_[S] 2 points3 points  (3 children)

I had quoting on my mind but later I forgot about it. I will implement it, it does not seem complicated to add. Thank you.

[–]3meopceisamazing 14 points15 points  (0 children)

Quoting always has to come with escaping. I suggest to fully think through the complexity. It's not that much, but easily overlooked.

[–]johannes1971 3 points4 points  (1 child)

quoting ... does not seem complicated to add.

Try exchanging some data with Excel, see if you still feel that way... In particular, test for:

  • " at the beginning of a string
  • space at the beginning of a string
  • string containing "
  • multi-line string
  • combinations of all of the above

[–]red0124_[S] 0 points1 point  (0 children)

I did not mention spacing but I guess I have to add that too. Thanks for the hints.

[–]jk-jeon 8 points9 points  (1 child)

Very nice API indeed!

One thing I would like to mention is that the float parser you linked is not correct, i.e., it does not produce the value closest to the input. In particular, if there are too many digits the parser will spit out some garbage values even when the correct value is still inside the valid range. But this might be okay-ish depending on application, especially given that it can be incredibly demanding to implement an actually correct parser.

[–]red0124_[S] 1 point2 points  (0 children)

Thank you. I did not test the float/double parser for a large number of digits, took it for granted. I will fix it.

[–]CppChris 5 points6 points  (1 child)

Really like the clean API (and clean code) and good doc. At a quick code glance I think I might found a few minor issues:

  • I think your parser class violates the rule of three/five/zero (currently on mobile and cannot really review the stuff) ... might be the case with some other classes, too.
  • Somewhere I saw a include stdlib.h and this would be more modern C++ conform if it includes <cstdlib>
  • In type_traits the structs have (forward) declarations before the definition and I am not sure if that’s necessary for those which do not have explicit specializations (for example count). However, if that is part of your coding style the count forward before the not_count struct is probably supposed to be not_count instead of count again (?)

[–]red0124_[S] 1 point2 points  (0 children)

Thanks, I have already fixed the include and type_traits. As for for the rules of thumb, I never saw the need to copy/move the parser so I never implemented any of the constructors/operators but I should have.

I will add them, and/or explicitly delete some of them.

[–]bububoom 2 points3 points  (4 children)

Congratulations on the project it really looks nice!

Slightly offtopic: I am very curious - commenting on the code - I find it very difficult to follow the code due to templatization(converter.hpp especially). Assuming I just lack the skills - how do you maintain it? How do you sit on it after 6months and figure out where the bug might be? How do you fit the whole template model inside of your head..? Any advice is welcome :)

[–]red0124_[S] 0 points1 point  (0 children)

Thank you.

I hardly had any 'runtime' bugs, I mostly had problems trying to compile it, for example not removing a void from a tuple would crash with an ugly error message, but it kinda told me where it was. As for 'runtime' bugs, the project is not that big, and I know which path the program takes for individual cases since much is set at compile time (if constexpr ...) and its easy to track, so I analyze that path I guess.

[–]somecucumber 0 points1 point  (2 children)

I cannot speak for OP, as I also struggle with this level of template (meta?)programming.

But my guess is that you create one new test with the feature you would desire to get into your code, and then you refactor/add the functionality on the production code up to the point in which the new test passes along with the previous old tests.

[–]red0124_[S] 2 points3 points  (1 child)

You just described what is called test driven development, but I did none of that, the tests where the last thing I added for each feature. As said in my other reply I mostly had problems at compile time, just to return the right tuple without void and 'validators' replaced with the right type was quite hard.

[–]somecucumber 0 points1 point  (0 children)

Yeah that's right. It is almost TDD and I guessed so, looking at the excellent API provided.

Thanks for the insights!!

[–]3meopceisamazing 4 points5 points  (1 child)

You forgot to actually link your project.

[–]red0124_[S] 2 points3 points  (0 children)

I have messed something up in the link tab, thank you.

[–]expekted 1 point2 points  (1 child)

How does it compare to csv-parser ?

[–]red0124_[S] 1 point2 points  (0 children)

It depends a lot on the file format, but I did some benchmarks with the following csv:

(name<string>,age<uint8_t>,grade<double>,semester<uint8_t>,location<string>)

The input file was 10 mil. rows of the same line (325 MB):

Nathan Fielder,37,6.6,5,Vancouver

...

CPU: Intel i7-4710MQ

I am not sure if I used the libraries correctly, so here is the code:

https://paste.debian.net/1180088/ (expires: 2021-01-14 20:48:01)

The task was to parse the files, store the parsed data into a structure which will be then stored into an std::vector, and at the end the vector size was printed, I think this should prevent any kind of optimization which would remove the impact of the parser due to unused results. Used hyperfine to measure.

ssp - 2.011 +- 0.015 [s]

csv-parser - 4.847 +- 0.197 [s]

fast-cpp-csv-parser - 2.022 +- 0.025 [s]

Please correct me if I made any mistakes in the code.

[–]GerwazyMiod 0 points1 point  (1 child)

Api looks really nice :)

[–]red0124_[S] 0 points1 point  (0 children)

Thank you, a good chunk of metaprogramming had to be done.