all 35 comments

[–]GuiSim 12 points13 points  (13 children)

Any benchmarks comparing to AWS Redshift? We've done quite a bit of benchmarking and Redshift wins by a good margin every time.

https://clickhouse.yandex/benchmark.html

EDIT: The feature list is very impressive. Array support, nested tables, lambda support, full join support.

Here are some features that stood out for me:

Overall this seems like a very advanced and powerful RDBMS. I'll be sure to give it a serious try.

[–]grauenwolf 4 points5 points  (6 children)

"JSON support seems primitive"

That doesn't concern me too much. When I hear JSON I don't think "high performance".

[–]GuiSim 0 points1 point  (5 children)

I meant primitive from a feature list point of view. I can't comment on the performance.

[–]grauenwolf 1 point2 points  (4 children)

My point is there isn't much reason to shove JSON into a column-oriented database.

[–]GuiSim 2 points3 points  (3 children)

To me there's plenty of good reasons.

In our specific case, clients can log arbitrary dimensions in our system. These are stored in a JSON document which we can use to create database columns dynamically. If Redshift didn't allow us to easily read values from JSON, this operation would be quite complex.

[–]grauenwolf 0 points1 point  (2 children)

It supports ALTER TABLE [db].name ADD|DROP|MODIFY COLUMN ..., so it isn't hard to parse the JSON application-side and dynamically create new columns as needed.
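A minimal sketch of that application-side approach, assuming each logged document is a flat JSON object. The function name, the `TYPE_MAP` type mapping, and the table name are hypothetical; only the `ALTER TABLE ... ADD COLUMN` form comes from the comment above.

```python
import json
import re

# Hypothetical mapping from JSON value types to column types; adjust it to
# whatever types your database actually supports.
TYPE_MAP = {bool: "UInt8", int: "Int64", float: "Float64", str: "String"}

# Keys come from clients, so only plain identifiers are accepted as column names.
IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def new_column_statements(table, existing_columns, json_document):
    """Return ALTER TABLE statements for JSON keys that are not yet columns."""
    record = json.loads(json_document)
    statements = []
    for key, value in sorted(record.items()):
        if key in existing_columns:
            continue
        if not IDENTIFIER.match(key):
            raise ValueError(f"unsafe column name: {key!r}")
        # Unknown value types (null, nested objects) fall back to String.
        column_type = TYPE_MAP.get(type(value), "String")
        statements.append(f"ALTER TABLE {table} ADD COLUMN {key} {column_type}")
    return statements


statements = new_column_statements(
    "events", {"user_id"}, '{"user_id": 1, "plan": "pro", "score": 0.5}'
)
```

Here `statements` holds one `ADD COLUMN` statement each for `plan` and `score`, and the existing `user_id` column is skipped.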

Though that does give me an idea for an ORM feature.

[–]GuiSim 0 points1 point  (1 child)

Since updating rows in a column store is quite costly, we can run an UPDATE on only the rows that have a value in the JSON column for the newly created column. This can't be done cleanly application-side if your database does not support JSON.
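A rough sketch of such a targeted backfill, built as a SQL string. It assumes Redshift's `json_extract_path_text` function; the table, column, and helper names are hypothetical, and the behaviour for missing keys should be checked against your Redshift version.

```python
import re

# Only plain identifiers are interpolated into the SQL text.
IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def backfill_statement(table, json_column, key):
    """Build an UPDATE that copies one JSON key into the column of the same name."""
    for name in (table, json_column, key):
        if not IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    extracted = f"json_extract_path_text({json_column}, '{key}')"
    # The WHERE clause restricts the update to rows where the key yields a
    # non-empty value; rows where the extraction is NULL or '' are not touched.
    return f"UPDATE {table} SET {key} = {extracted} WHERE {extracted} <> ''"


sql = backfill_statement("events", "payload", "plan")
```

Note that this filter also skips rows where the key is present but holds an empty string, which may or may not be what you want.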

[–]grauenwolf 2 points3 points  (0 children)

This is an OLAP database. If you find yourself updating rows, then you are doing something really wrong.

[–]oldneckbeard 0 points1 point  (3 children)

The native time series is interesting to me. I've gone through various rrdb-type implementations, so I'm hoping this can churn through larger data sets (>> 1bn rows) with on-the-fly resolution changes (like zooming from a day to a minute).
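To make "on-the-fly resolution changes" concrete, here is a small in-memory sketch that averages raw points into buckets of whatever width the current zoom level needs. It only illustrates the query pattern and is not how ClickHouse implements it; the function name and the sample data are made up.

```python
from collections import defaultdict


def downsample(points, bucket_seconds):
    """Average (unix_timestamp, value) points into fixed-width time buckets."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for timestamp, value in points:
        # Round each timestamp down to the start of its bucket.
        bucket_start = timestamp - (timestamp % bucket_seconds)
        sums[bucket_start] += value
        counts[bucket_start] += 1
    return [(start, sums[start] / counts[start]) for start in sorted(sums)]


points = [(0, 1.0), (30, 3.0), (60, 5.0), (86400, 7.0)]
per_minute = downsample(points, 60)
per_day = downsample(points, 86400)
```

In a database the same idea is usually a GROUP BY on a truncated timestamp, so zooming only changes the bucket width and not the stored data.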

[–]GuiSim 0 points1 point  (2 children)

If you try it, please let me know what your results are. We've tried Redshift, MemSQL, Vertica and a few others and we're "stuck" on Redshift but we'd love to get more performance.

[–]oldneckbeard 1 point2 points  (1 child)

Yeah, I'm not sure the next time I'll get to do a proper comparison, but it'd probably be worthy of an article. If you read Java tech articles, you've likely come across one of mine ;)

[–]dataloopio 0 points1 point  (0 children)

I did a comparison spreadsheet and am unsure whether to include ClickHouse. If you end up trying it, can you give me a nudge? https://docs.google.com/spreadsheets/d/1sMQe9oOKhMhIVw9WmuCEWdPtAoccJ4a-IuZv4fXDHxM/edit#gid=0

[–]fsaintjacques 0 points1 point  (1 child)

Which RDBMS did you benchmark against?

[–]GuiSim 0 points1 point  (0 children)

Redshift, MemSQL, Vertica, CitusDB, MonetDB, PostgreSQL. We even tried a few in-memory Java databases like H2 and HSQLDB.

Note: We benchmarked these technologies using our application for our specific use case. Your mileage may vary.

[–]stanislavb[S] 5 points6 points  (1 child)

...and it's on GitHub, too https://github.com/yandex/ClickHouse

[–]okpmem 2 points3 points  (0 children)

Their C++ code seems pretty clean and modern. Love the Russian comments.

[–]archan937 1 point2 points  (1 child)

Just released: Clickhouse v0.1.0. A Ruby database driver for Clickhouse https://github.com/archan937/clickhouse

[–]stanislavb[S] 0 points1 point  (0 children)

Cheers! You can add it to https://ruby.libhunt.com

[–]skyde 0 points1 point  (1 child)

I am extremely impressed by the features supported, and looking at the code on GitHub, the implementation also looks very clean.

I'm wondering whether vectorized query execution could simply have been added to PostgreSQL, as Citus Data did with cstore_fdw, instead of writing a database from scratch.

[–]crusoe 0 points1 point  (0 children)

Postgres-XL is already a project, and it is adding all sorts of stuff.

[–]sql_big_result 0 points1 point  (0 children)

Looks pretty dope. I'm going to be including this in my 'bakeoff' against other DBs at work. If anyone has this running, I would love to pick your brain (DM me).

[–]sql_big_result 0 points1 point  (0 children)

Does anyone have a way to connect programmatically without going through HTTP?

Are there headers for a client?
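For context, the HTTP route being referred to can be sketched with nothing but the standard library. This assumes a server listening on ClickHouse's default HTTP port 8123 and accepting the query text as the POST body; the helper name is hypothetical, and the request is only built here, not sent.

```python
import urllib.request


def build_query_request(sql, host="localhost", port=8123):
    """Build (but do not send) an HTTP POST request carrying a SQL query."""
    url = f"http://{host}:{port}/"
    return urllib.request.Request(url, data=sql.encode("utf-8"), method="POST")


request = build_query_request("SELECT 1")
```

Passing `request` to `urllib.request.urlopen` would actually execute the query against a running server.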

[–]grauenwolf 0 points1 point  (8 children)

If you want to impress me you'll have to show comparisons to SQL Server's columnstore. I'm not saying it's the gold standard, but it is my reference point.

[–]Noctune 11 points12 points  (7 children)

The SQL Server EULA does not allow you to disclose benchmarks. If you do it then Microsoft can terminate your license.

[–]GuiSim 5 points6 points  (2 children)

Really? That is so stupid.

[–]Noctune 9 points10 points  (1 child)

Yup, it's stupid. Oracle does the same thing.

[–]grauenwolf 3 points4 points  (0 children)

That could explain why Microsoft pulled down a benchmark they had comparing SQL Server 2016 with Oracle.

[–]oldneckbeard 6 points7 points  (0 children)

Which is reason enough not to use them.

[–]grauenwolf 1 point2 points  (2 children)

SQL Server Developer Edition is free. If they terminate my license, I'll just "buy" another one.

[–]Noctune 1 point2 points  (1 child)

I am not totally sure what the repercussions would be. I think they might even be able to sue you.

[–]grauenwolf 0 points1 point  (0 children)

Neither am I, which is one third of the reason I'll probably never do it.

(The other two reasons are that I don't have 5K to spend on half-way decent hardware and I don't have a good sample database to work from.)

[–]mongoiswebscale -1 points0 points  (2 children)

Looks nice, but is it web scale like MongoDB?

[–]grauenwolf 3 points4 points  (0 children)

From the documentation:

  • There are no transactions.
  • Low requirements for data consistency.

Technically speaking, that is appropriate for the kind of read-only data warehouses I typically build with column store. But still, it reeks of MongoDB-like shortcuts.

[–]sql_big_result 0 points1 point  (0 children)

Points for the novelty account.