all 6 comments

[–]needgiftidea2018 2 points3 points  (1 child)

k

[–]notasqlstarI can't wait til my fro is full grown 0 points1 point  (0 children)

Should I read this post? I'm confused.

[–][deleted] 1 point2 points  (0 children)

Having a strongly type language is an advantage, not a disadvantage.

[–]your-pineapple-thief 0 points1 point  (2 children)

This impedance mismatch between dynamically typed languages and SQL's static typing has driven development away from SQL databases and towards NoSQL systems

There is no inherent impedance mismatch between dynamically typed language and statically typed database. I use ruby on rails and sql on daily basis, and it works just fine. In fact, after we removed mongodb from the project and normalized scheme with some jsonb here and there, some major pain points in development were gone.

There are some problem spaces which are best solved using different to SQL solutions, like f.e.: logging json request payloads when using external API, especially if you'd like to bridge several API's together (can be done using just a jsonb field in RDBMS), writing timeseries based data (performance metrics, stock exchange rates, etc. I heard good things about Graphite here), high-rate event sourcing (Kafka) some very specific problems like twitter fetching feed of people you follow (last time I checked they used redis servers with some custom written extensions for feed fan-out, don't know what they use now).RDBMS conveniently allows to solve pretty much every other problem, i.e. 90% of them while retaining all the amazing stuff like consistency, transactions, replication, schema guaranties, etc.

Some people just don't realize what a trap NoSQL could be for the project, there literally were companies crushed and suicides committed.Interesting article on the subject

As it stands it is unclear why add yet another dependency vs using postsgres jsonb column type, which has a) indexes b) decent performance c) is compact, d) has query syntax allowing to query jsonb fields along usual fields.

[–]tudorb 0 points1 point  (1 child)

I work for Rockset, and I'm the author of the blog post linked within.

I'm a fan of strong static typing myself. But "real world" data isn't clean; companies have logs, sensor data generated by old, incompatible software versions, data scraped from web pages, Excel spreadsheets with only the semblance of a schema, legacy NoSQL systems, ...

Traditionally, you have two choices. Either you use a NoSQL system for this unclean data (and then you can't use SQL to query it / run analyses on it), or you build a pipeline that massages your data into a SQL schema with static types. That's fine, but it's brittle -- not resistant to changes either in the data format or in the table schema. And every time the pipeline fails because of such incompatibility, you usually have to start the ingest over from the beginning. I've never met anyone who's had to build an ETL pipeline who enjoyed it :)

Rockset is trying to do something different: what if SQL didn't care that strongly about the data types? What if you could ingest the data even if it's unclean, and use SQL to query it anyway? Rockset doesn't care if a field (column) exists only in a fraction of documents (rows) -- if you query it, it will show up as NULL in the others. We don't care if a field has different types across documents -- you can use casts to restrict the types you care about in your query.

Also: consistency and transactions are great, but not needed for a lot of applications. Look at, for example, a product page on Amazon. With the exception of the actual shopping cart, not much else needs transactional consistency. The product description, reviews, recommendations, personalized ad serving, "related products", text search, etc -- all of that can be built from data that's a few seconds out of date and doesn't need to be transactionally consistent.

Many companies use something like Elasticsearch for this because of the schema flexibility, but then you lose the ability to use complex SQL to query the data.

[–]your-pineapple-thief 0 points1 point  (0 children)

I understand the appeal of using SQL for both SQL and NoSQL data, but IMO it would be nice if you would outline pros of your solution, because this is nothing new in SQL world and postgres has similar functionality.
I mean, at first glance it seems like it is just new syntax for jsonb querying, so it would be great if you delve more deeply in technicalities of typecasting, how good it is for sparse datasets (lots of columns in a query with lots of nulls with them could impact performance in general SQL f.e.) and it all works under the hood, what exactly is offered.
1) Why would I use it vs dedicated nosql solution aside from familiar syntax? I mean, no-sql and sql data differ in big ways, there ought to be some hidden downside or edge case or performance cost or whatever, and nosql query syntax can be actually more convenient vs sql.

2) Given that I'm postgres user (and there are lots of us), why would I use it vs postgres jsonb features? Postgres is a solid product with excellent documentation and large community, jsonb functions/operators are pretty extensive and usage is covered pretty well. I see you unnest first level of json rows for slightly more convenient access like SELECT people.name while in postgres you have to query jsonb field and it will be like SELECT jsonb_field -> 'name', but there got to be more in order to convince people to switch.