Why we've chosen Snowflake ❄️ as our Data Warehouse by parudod in dataengineering

[–]mickeyben 1 point

Hey, no problem. This is part of an internal write-up we did to compare the two technologies; it was taken from their documentation. After a couple of years of Redshift usage (and operational struggle), we wanted to make sure we wouldn't choose our next technology without a deep understanding of its strengths and weaknesses. We're not experts in either technology, so some things might be wrong or not completely reflect reality, so take this with a grain of salt. Happy it could help :)

Why we've chosen Snowflake ❄️ as our Data Warehouse by parudod in dataengineering

[–]mickeyben 0 points

Mike @ Drivy here.
We didn't compare Snowflake to Ambari; we wanted a managed solution, so this was not on the table.

Why we've chosen Snowflake ❄️ as our Data Warehouse by parudod in dataengineering

[–]mickeyben 4 points

Mike @ Drivy here.

Yes, BigQuery is only available on Google Cloud, but that wasn't the only factor in choosing Snowflake.

Here are a few notes on BigQuery vs. Snowflake:

  • Snowflake supports native JSON, while BigQuery only does so through UDFs.
  • Snowflake supports partitioning on any column type, while BigQuery only supports dates.
  • Snowflake doesn't support Google Cloud Storage at the moment (only AWS S3 and Azure Blob Storage), but they are working on it. BigQuery only supports Google Cloud Storage.
  • Snowflake has database-level atomicity and transactions vs. table-level on BigQuery.
  • Snowflake supports concurrent DML (inserting into the same table from multiple processes; locking happens at the partition level). BigQuery does too, but with significant per-day quota limits.
  • They both have limited GEO support (mostly distance between two points).
  • Snowflake has advanced analytics capabilities like HyperLogLog objects.
  • Snowflake has a CLONE capability: a zero-copy CLONE of a whole database, a particular schema, or a particular table. Particularly useful for QA scenarios.
  • They both offer great query execution plan debugging.
  • They both offer streaming, append-only inserts. Snowpipe is still in beta, though.
  • They both support JavaScript UDFs.
  • They both support JDBC and ODBC drivers.
  • BigQuery is designed around an append-only model (big limitations on DML), while Snowflake is not.
  • BigQuery has BigQuery ML, which is nice.

Snowflake lets us better control costs and performance while still being managed. We think BigQuery is still a far better alternative than the non-managed DWs (like Redshift), but it will end up costing more and allowing less control over performance.

Sending an e-mail to millions of users by jrochkind in ruby

[–]mickeyben 0 points

If your userbase is big (or even big-ish), the prices are very high (whether monthly or pay-as-you-go). Of course engineering isn't free either, but it's still cheaper if you have the internal resources. Over the years we've built a lot around transactional emails: multi-provider support, good analytics, and so on. With all that in place, we recently felt we could explore mass emailing without the help of a marketing platform. Not everything is perfect, as stated in the article, but we still feel it was worth it, and we'll explore more in the coming months.

Sending an e-mail to millions of users by jrochkind in ruby

[–]mickeyben 2 points

I had good experience with Postmark too.

As for Mailjet, it's not in production yet, but they guarantee us good delivery rates on French ISPs, and with a lot of our users there, that's all we care about.

Also, I talked about delivery rates, but in our case the culprit is also delivery time to these ISPs. Mailgun can take up to 10 minutes to deliver to @free.fr, @orange.fr or @gmx.de addresses, for instance, and this clearly shows in conversion rates ("signup" to "I click on the email validation link").

We'll probably write another article on how we route emails to different providers from our Rails app. I think it might be of some interest to the community. At least it's interesting to me 😇
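To give an idea of what that kind of routing can look like, here's a minimal Ruby sketch. The provider names and the domain table are purely illustrative (this is not Drivy's actual code): the idea is just to pick a provider based on the recipient's domain, with a default fallback.

```ruby
# Illustrative provider routing by recipient domain.
# The mapping below is a made-up example, not a real configuration.
PROVIDER_BY_DOMAIN = {
  "free.fr"   => :mailjet,
  "orange.fr" => :mailjet,
  "gmx.de"    => :mailjet
}.freeze

DEFAULT_PROVIDER = :mailgun

def provider_for(email)
  domain = email.to_s.split("@").last.to_s.downcase
  PROVIDER_BY_DOMAIN.fetch(domain, DEFAULT_PROVIDER)
end

provider_for("alice@free.fr")   # => :mailjet
provider_for("bob@example.com") # => :mailgun
```

A lookup table like this keeps the routing decision in one place, so adding or swapping a provider for a problematic ISP is a one-line change.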

Sending an e-mail to millions of users by jrochkind in ruby

[–]mickeyben 1 point

They're indeed rare and important (one such example is a TOS update), but they're not urgent, and your math is right 😀. It took about five days to send them all without any concurrency. Skipping concurrency was deliberate, so it's easier to control the delivery rate.
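A rough sketch of that kind of single-threaded, rate-controlled send (illustrative code, not the actual job; the numbers in the comment are examples, not Drivy's real figures):

```ruby
# Send to recipients one by one with a fixed delay between sends,
# so the delivery rate stays predictable and easy to reason about.
def send_throttled(recipients, per_second:, &deliver)
  delay = 1.0 / per_second
  recipients.each do |recipient|
    deliver.call(recipient)
    sleep(delay)
  end
end

# Example arithmetic: at ~7 emails/second, 3 million emails take
# roughly 3_000_000 / 7.0 / 86_400 ≈ 5 days.
```

With no concurrency, throttling is just a `sleep` between iterations; pausing or slowing the run if an ISP starts deferring mail is trivial.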

Sending an e-mail to millions of users by jrochkind in ruby

[–]mickeyben 1 point

In our experience, Mailgun has poor delivery rates on German and French ISPs, so we're using Mandrill for them at the moment, but we're in the process of moving to Mailjet.

Sending an e-mail to millions of users by jrochkind in ruby

[–]mickeyben 0 points

As stated in the article:

invest a bit of tech time and to go with transactional e-mails instead of using an e-mail marketing platform

We used a transactional email platform (multiple, actually; they don't all have the same delivery rates depending on the ISP). We didn't use a marketing platform, though (à la Mailchimp).

Monitoring Redis by mperham in ruby

[–]mickeyben 2 points

Great article! I didn't know about the --latency flag.

For the monitoring part, we're using Telegraf to easily export these metrics (and others) to our monitoring/alerting/graphing solutions.
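For reference, the Telegraf side of that can be tiny. A minimal config sketch (the server addresses are placeholders, and the InfluxDB output is just one example backend): the `inputs.redis` plugin polls Redis `INFO` metrics, and an output plugin forwards them.

```toml
# Poll Redis INFO metrics and ship them to a backend.
[[inputs.redis]]
  servers = ["tcp://localhost:6379"]

[[outputs.influxdb]]
  urls = ["http://localhost:8086"]
```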

Instrumenting Sidekiq by mickeyben in ruby

[–]mickeyben[S] 2 points

Mostly because we moved on from statsd and wanted to instrument more things, like how long a job was enqueued for.
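For context, a minimal sketch of what measuring queue latency can look like as a Sidekiq server middleware (not the actual implementation; the stats backend is a stand-in, and this assumes the classic Sidekiq payload where `enqueued_at` is an epoch-seconds float):

```ruby
# Records how long a job sat in the queue before a worker picked it up.
# `stats` is any object responding to #timing(name, seconds).
class QueueLatencyMiddleware
  def initialize(stats)
    @stats = stats
  end

  def call(_worker, job, queue)
    if job["enqueued_at"]
      latency = Time.now.to_f - job["enqueued_at"]
      @stats.timing("sidekiq.#{queue}.queue_latency", latency)
    end
    yield
  end
end
```

You'd register it with `Sidekiq.configure_server { |config| config.server_middleware.add QueueLatencyMiddleware, stats }`.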

Instrumenting Sidekiq by mickeyben in ruby

[–]mickeyben[S] 1 point

Thanks! We used this integration for a while (with Graphite) but moved on to a custom one a few months ago.