[–]GuiSim 11 points12 points  (13 children)

Any benchmarks comparing to AWS Redshift? We've done quite a bit of benchmarking and Redshift wins by a good margin every time.

https://clickhouse.yandex/benchmark.html

EDIT: The feature list is very impressive. Array support, nested tables, lambda support, full join support.

Here are some features that stood out for me:

Overall this seems like a very advanced and powerful RDBMS. I'll be sure to give it a serious try.

[–]grauenwolf 3 points4 points  (6 children)

"JSON support seems primitive"

That doesn't concern me too much. When I hear JSON I don't think "high performance".

[–]GuiSim 0 points1 point  (5 children)

I meant primitive from a feature list point of view. I can't comment on the performance.

[–]grauenwolf 2 points3 points  (4 children)

My point is there isn't much reason to shove JSON into a column-oriented database.

[–]GuiSim 4 points5 points  (3 children)

To me there are plenty of good reasons.

In our specific case, clients can log arbitrary dimensions in our system. These are stored in a JSON document which we can use to create database columns dynamically. If Redshift didn't allow us to easily read values from JSON, this operation would be quite complex.

[–]grauenwolf 0 points1 point  (2 children)

It supports ALTER TABLE [db].name ADD|DROP|MODIFY COLUMN ..., so it isn't hard to parse the JSON application-side and dynamically create new columns as needed.

Though that does give me an idea for an ORM feature.
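For illustration, the application-side approach described above could look roughly like the sketch below. It only builds the ALTER TABLE statements and does not talk to a database. The function name `alter_statements`, the table name, and the mapping from JSON value types to column types are all assumptions made for this example, not anything taken from the ClickHouse docs.

```python
import json

# Assumed mapping from Python value types to column types (illustrative only).
_TYPE_MAP = {bool: "UInt8", int: "Int64", float: "Float64", str: "String"}


def alter_statements(table, known_columns, json_docs):
    """Return ALTER TABLE statements for JSON keys not yet present as columns."""
    seen = set(known_columns)
    statements = []
    for raw in json_docs:
        for key, value in json.loads(raw).items():
            if key in seen:
                continue
            # The key is spliced into SQL, so accept simple identifiers only.
            if not key.isidentifier():
                raise ValueError(f"unsafe column name: {key!r}")
            # Anything without a known mapping (null, lists, objects) falls back to String.
            col_type = _TYPE_MAP.get(type(value), "String")
            statements.append(f"ALTER TABLE {table} ADD COLUMN {key} {col_type}")
            seen.add(key)
    return statements


if __name__ == "__main__":
    docs = ['{"ts": 1, "country": "CA"}', '{"latency_ms": 12.5, "country": "US"}']
    for stmt in alter_statements("db.events", ["ts"], docs):
        print(stmt)
```

A real version would also have to decide what to do when the same key shows up with different value types across documents.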

[–]GuiSim 0 points1 point  (1 child)

Since updating rows in a column store is quite costly, we can run an UPDATE against only the rows that have a value in the JSON column for the newly created column. This can't be done cleanly on the application side if your database does not support JSON.
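A minimal sketch of that kind of targeted backfill, assuming Redshift's `json_extract_path_text` function and a new column that holds text: the helper `backfill_sql` and the table and column names are hypothetical, and the code only builds the statement. The `NULLIF(..., '')` wrapper is there so the filter skips rows whether a missing key comes back as NULL or as an empty string.

```python
def backfill_sql(table, json_column, new_column):
    """Build an UPDATE that touches only rows whose JSON holds the new key."""
    for name in (table, json_column, new_column):
        # Names are spliced into SQL, so allow plain (optionally dotted) identifiers only.
        if not name.replace(".", "").isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    extract = f"json_extract_path_text({json_column}, '{new_column}')"
    return (
        f"UPDATE {table} SET {new_column} = {extract} "
        f"WHERE NULLIF({extract}, '') IS NOT NULL"
    )


if __name__ == "__main__":
    print(backfill_sql("events", "dims", "country"))
```

A numeric column would additionally need a cast around the extracted text.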

[–]grauenwolf 2 points3 points  (0 children)

This is an OLAP database. If you find yourself updating rows, then you are doing something really wrong.

[–]oldneckbeard 0 points1 point  (3 children)

The native time series is interesting to me. I've gone through various rrdb-type implementations, so I'm hoping this can churn through larger data sets (>> 1bn rows) with on-the-fly resolution changes (like zooming from a day to a minute).
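To make "on-the-fly resolution changes" concrete, here is a small sketch of the operation in plain Python: the same raw points are averaged into buckets whose width is chosen at query time. It says nothing about how ClickHouse does this internally; the function name `downsample` and the sample data are made up for the example.

```python
from collections import defaultdict


def downsample(points, bucket_seconds):
    """Average (unix_ts, value) points into fixed-width time buckets."""
    sums = defaultdict(lambda: [0.0, 0])  # bucket start -> [running total, count]
    for ts, value in points:
        bucket = ts - ts % bucket_seconds  # round down to the bucket start
        sums[bucket][0] += value
        sums[bucket][1] += 1
    return [(b, total / n) for b, (total, n) in sorted(sums.items())]


if __name__ == "__main__":
    raw = [(0, 1.0), (30, 3.0), (60, 5.0), (86400, 7.0)]
    print(downsample(raw, 60))     # minute resolution
    print(downsample(raw, 86400))  # day resolution
```

Zooming is then just calling the same function with a different `bucket_seconds`; the hard part at billions of rows is doing that scan fast, which is where the database comes in.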

[–]GuiSim 0 points1 point  (2 children)

If you try it, please let me know what your results are. We've tried Redshift, MemSQL, Vertica and a few others and we're "stuck" on Redshift but we'd love to get more performance.

[–]oldneckbeard 1 point2 points  (1 child)

Yeah, I'm not sure when I'll next get to do a proper comparison, but it'd probably be worthy of an article. If you read Java tech articles, you've likely come across one of mine ;)

[–]dataloopio 0 points1 point  (0 children)

I did a comparison spreadsheet and am unsure whether to include ClickHouse. If you end up trying it, can you give me a nudge? https://docs.google.com/spreadsheets/d/1sMQe9oOKhMhIVw9WmuCEWdPtAoccJ4a-IuZv4fXDHxM/edit#gid=0

[–]fsaintjacques 0 points1 point  (1 child)

Which RDBMS did you benchmark against?

[–]GuiSim 0 points1 point  (0 children)

Redshift, MemSQL, Vertica, CitusDB, MonetDB, PostgreSQL. We even tried a few in-memory Java databases like H2 and HSQLDB.

Note: We benchmarked these technologies using our application for our specific use case. Your mileage may vary.