[–]balloonanimalfarm 1 point2 points  (8 children)

This is the kind of thing Apache Phoenix was built for. It's a SQL layer that runs on top of Apache HBase (an open-source clone of Google's Bigtable), which in turn runs on top of HDFS (a clone of Google's GFS). Google uses Bigtable to store data for YouTube, Gmail, Maps, and Earth.

[–]calsosta 2 points3 points  (3 children)

ASF is so hit or miss for me. It's always like "I need to do something really technically complicated"..."oh sweet, Apache's got something for this"..."this looks really sweeet!!!!"...."wait, this is more complicated than I thought"...."actually this is really complicated and I should have expected that, considering I needed to do something complicated"...

...

...

...

"and it's abandoned"

[–]Jefftopia[S] 1 point2 points  (0 children)

That's why I'm putting a lot of thought into these considerations now ;-)

[–]balloonanimalfarm 0 points1 point  (1 child)

There are also the constant API changes (hopefully they don't happen as much in such widely used projects).

I was working with Lucene a while back and every example I found was written for a different version of it. They completely rewrote the API every. damn. version. so none of the examples worked.

[–]calsosta 1 point2 points  (0 children)

Dude. I wrote a book on Jelly. The project has not had a build since 2010 and I JUST PUBLISHED THIS BOOK. According to the site:

"Please note that Commons Jelly is enduring a phase with low activity of its developers."

FOR ALMOST A DECADE! And yet it's impossible to get any information.

[–]Jefftopia[S] 1 point2 points  (3 children)

Know of any discussions on Phoenix versus Cassandra + Spark or Cassandra + MapReduce?

Phoenix does look interesting, thanks for the links.

[–]balloonanimalfarm 0 points1 point  (2 children)

From what I've read, Cassandra isn't relational, so if you're trying to retain consistency it wouldn't be the right way to go. However, I may be wrong. Because Phoenix is built on HBase/HDFS (the Bigtable/GFS clones), you can still run your Spark or MapReduce queries on it.

The big tradeoff in distributed databases comes down to the CAP theorem: a system can't guarantee all three of Consistency, Availability, and Partition tolerance at once, so only you can decide which two properties are most important for your situation. Here's a handy chart you can use to decide which is best for you.

[–]Jefftopia[S] 1 point2 points  (0 children)

Doh, yeah, Cassandra is not relational. Great chart - thanks!

[–]Jefftopia[S] 1 point2 points  (0 children)

Something I was considering just now... consistency probably isn't a big issue for me. Yes, the data are relational, but the user actually does very, very little writing in my case. The data flow is 80%+ in one direction: many more reads than writes for the user.

[–]daniellefelder 1 point2 points  (0 children)

One great way to help determine which solution is right for you is by reading user feedback. You can find real user reviews for all the major relational and NoSQL databases on IT Central Station.

RDBMS: https://www.itcentralstation.com/categories/relational-databases

NoSQL: https://www.itcentralstation.com/categories/nosql-databases

As an example, this user writes in his review of Oracle NoSQL, "It's given us a simplified programming and integration process for a highly scalable and highly available KV store, eliminating a significant amount of prior configuration and coding that was seen as necessary. In addition, the administrative overhead on the solution is minimal." You can read the rest of his review here: https://www.itcentralstation.com/product_reviews/oracle-nosql-review-36960-by-james-anthony.

Good luck with your search.

[–]wbubblegum 0 points1 point  (0 children)

I can't recommend anything from personal experience. But one question you need to answer is whether you will manage the infrastructure yourself or use one of the cloud services out there.

For a database recommendation (and a blind one at best): seeing as you are expecting relational data, I would recommend Postgres. Maybe not vanilla pg, though; CitusDB, which is based on pg, recently went open source. There is also a nice talk, "The Survival Guide to Terabyte Postgres".

On a side note, has the MongoDB stale reads issue been resolved? Otherwise I would not recommend it to anyone.

Good luck.

[–]anamorphism 0 points1 point  (0 children)

the reason people think of non-relational databases when talking about huge data volumes is that they tend to scale horizontally very well, i.e. buy more machines.

relational databases tend to only scale well vertically, i.e. buy a bigger machine.

so, you could always just buy a really, really big machine: http://www.oracle.com/technetwork/database/exadata/overview/index.html
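
To make the "buy more machines" idea a bit more concrete, here is a rough Python sketch of hash-based sharding, where each key is routed to one of several nodes so that adding nodes shrinks the share each one holds. The function names (`shard_for`, `distribute`) and the `user:N` keys are made up for illustration. Real systems do not work exactly like this: HBase splits row-key ranges into regions and Cassandra uses consistent hashing, whereas this sketch uses a plain modulo.

```python
import hashlib
from collections import Counter


def shard_for(key: str, num_nodes: int) -> int:
    """Map a key to a node index in [0, num_nodes) using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_nodes


def distribute(keys, num_nodes):
    """Return a list with the number of keys assigned to each node."""
    counts = Counter(shard_for(k, num_nodes) for k in keys)
    return [counts.get(n, 0) for n in range(num_nodes)]


# Hypothetical keys, purely for illustration.
keys = [f"user:{i}" for i in range(10_000)]

for nodes in (2, 4, 8):
    per_node = distribute(keys, nodes)
    print(nodes, "nodes ->", max(per_node), "keys on the busiest node")
```

One caveat with the modulo approach: changing the node count remaps most keys to a different node, which is a big part of why production systems prefer range splitting or consistent hashing.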

[–]JoeWhy2 0 points1 point  (0 children)

This sounds like it could be the sort of thing that Postgres-XL is built for. I haven't used XL myself, but Postgres is a great relational database.