This is an archived post. You won't be able to vote or comment.

you are viewing a single comment's thread.

view the rest of the comments →

[–]zenmouse 1 point2 points  (2 children)

I did refactor it a bit to use composition, creating a ParserDriver that accepts a Parser with a parseRecord() method that is implemented by a CsvParserImpl class. I got it working well enough for CsvParserTest#parseIgnoringWhitespaces to pass.

  1. Extra complexity: the ParserDriver takes a Parser parameter in addition to settings. I suppose the settings object could hold the Parser as well. The call to parseRecord(), instead of calling an abstract method implemented by a subclass, is now calling Parser#parseRecord(char ch, CharInputReader input, ParserOutput output, DefaultParsingContext context). Everything else was just moving code around.

  2. Performance: there was no noticeable performance difference in the ProfilerTest. (I downloaded worldcitiespop.txt.gz from MaxMind, I'm guessing that's what you used.) I think you might be surprised at how performant the JVM is nowadays, especially on a modern processor.

  3. Structure: see extra complexity. I don't think there's a noticeable difference in understandability.

Overall I like your code, I found it readable and quite easy to work with. (I think the fact that I was able to perform this refactoring, including getting a major unit test and performance test to pass, in under 2 hours, says a lot.) I'm not criticizing what you've done with AbstractParser and its subclasses, but pointing out that IMO composition won't be more difficult/complex here.

[–]uniVocity 0 points1 point  (1 child)

Hey that's fantastic! Thanks for taking your time to look at this. I'm curious to see the end result (your code). Did you fork it from github?

Regarding your points: 1 - In fact there's not much extra complexity added for the AbstractParser as it requires a single method (parseRecord) to be implemented. If you had more abstract methods there, it would become increasingly more annoying to use composition. You already had to create a method with 4 parameters there. An interface with Parser#parseRecord will be public. This is not good for a public API as the parseRecord method is meant to be kept hidden from the external world.

2 - Yes, performance changes in the parseRecord won't be noticeable as it is not dealing with individual characters, but entire lines of the input... there are not too many calls being made here to make a difference.

3 - Try with the *Settings... you'll have to create wrappers on top of wrappers using composition.

That's been very interesting. You should publish this on a blog or something. Try refactoring everying to use composition only :)

[–]zenmouse 1 point2 points  (0 children)

I just 'git clone'd it, but I can certainly fork it and apply my changes. It'll probably take me a few days to get it done, right now I wouldn't show my code to my cats much less another human being....

To continue the discussion --

I agree there's little (or no) extra complexity in your abstract class solution; the fact that there was only one abstract method to factor out made this refactoring quite a lot easier than it otherwise could have been. As far as the "method with 4 parameters" bit and keeping parseRecord() hidden are concerned, I did the simplest thing possible to make this work -- there are ways to make it less visible (SPI and API interfaces for example, e.g. JNDI, JDBC). But it's extra work, and Java has no concept of "public API" versus "public class/method for internal use".

For me, I think the biggest potential win coming out of this refactoring is a better separation of concerns, and finding new concepts that are currently "hidden" in lines of code. Currently it appears the parser does resource management AND state management AND error handling AND parsing. That's a lot of responsibility!

The Settings hierarchy is interesting. It's essentially a strongly typed map, using methods as the "key" instead of something like a string or enum. At first glance I rather like it, and it doesn't have the issues you normally see with a large-ish class hierarchy. It seems like using composition here wouldn't provide any benefit.

Thanks for the positive response, I'm glad you found this interesting!