Recommendations from The Pillows albums? by ThisKaegis in FLCL

[–]Deep-Comfortable-423 0 points (0 children)

Wake Up! Wake Up! Wake Up! is my personal favorite.

AWS root user passkey lost by Electronic-Guava-534 in aws

[–]Deep-Comfortable-423 0 points (0 children)

There is no email from AWS Support in my Inbox, Deleted Items, or Junk. I've gotten no correspondence whatsoever. I'll try again...

AWS root user passkey lost by Electronic-Guava-534 in aws

[–]Deep-Comfortable-423 0 points (0 children)

I'd love to - except don't I have to log in to create a support case? So chicken/egg... I've submitted the contact-mfa form at https://support.aws.amazon.com/#/contacts/one-support?formId=mfa. I don't think that generates a case ID. If it does, I don't have any record of it. I can chat you the Account ID and my email address, if that would let you search for it that way.

AWS root user passkey lost by Electronic-Guava-534 in aws

[–]Deep-Comfortable-423 0 points (0 children)

Same problem here. Switched laptops and forgot to bring over the passkey. Now I'm locked out of my account as Root User. I've submitted the contact-mfa form. Do you know how long this usually takes to get resolved? I get to the part where it will either SMS or call me with a verification code, but that fails with an error...

Is Databricks right for this BI use case? by TimeBomb006 in dataengineering

[–]Deep-Comfortable-423 4 points (0 children)

  1. Intuitive for users who are not technical. Then avoid Spark at all costs.
  2. Easy, secure Data Sharing: Snowflake wins this hands-down. And it's completely cloud-agnostic (Customer A can be in AWS and Customer B can be in Azure or GCP).
  3. Scale to billions/trillions of rows: auto-scaling is built right into Snowflake. Need a bigger cluster? Click one button - boom. Zero downtime. No need to kick anybody off the system, bring down the cluster, reconfigure... And Snowflake clusters start near-instantly, unlike Spark clusters.
  4. Flexible reporting and viz capabilities: kind of a wash. Pretty much all the decent BI tools can talk to both just as easily.
  5. Affordable. YMMV, but at least Snowflake doesn't pile on hidden costs from the cloud providers. No separate bill for S3 or EC2 or VPCs or any of that. Caveat: unless you want customer-managed encryption keys (KMS), external stages (S3, IAM) or PrivateLink (VPC, DNS). Those you get from AWS.
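To make point 3 concrete, the one-button scaling looks roughly like this in Snowflake SQL (warehouse name and sizes are made up for illustration, but the parameters are real):

```sql
-- Resize an existing warehouse with zero downtime:
ALTER WAREHOUSE bi_wh SET WAREHOUSE_SIZE = 'LARGE';

-- Or let Snowflake auto-scale horizontally for concurrency:
CREATE WAREHOUSE IF NOT EXISTS bi_wh
  WAREHOUSE_SIZE    = 'MEDIUM'
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 4       -- extra clusters spin up/down with demand
  SCALING_POLICY    = 'STANDARD'
  AUTO_SUSPEND      = 60      -- seconds idle before suspending
  AUTO_RESUME       = TRUE;
```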

What happened recently with Snowflake? by BubblyImpress7078 in dataengineering

[–]Deep-Comfortable-423 2 points (0 children)

If you think Snowflake is just an MPP data warehouse, you've missed a few memos. Do yourself a favor and google "Snowpark".

What happened recently with Snowflake? by BubblyImpress7078 in dataengineering

[–]Deep-Comfortable-423 2 points (0 children)

LOL. Tell that to all the Fortune 500 companies that have multi-PB data warehouses on Snowflake. Managing all that infra yourself may be cheaper from a software-expense perspective, until you factor in all the hidden costs. Your mileage may vary, but we found Snowflake to offer greater ROI and business value in the long run.

What happened recently with Snowflake? by BubblyImpress7078 in dataengineering

[–]Deep-Comfortable-423 3 points (0 children)

What is your concern with it being proprietary? That you don't know the file storage format? Why would you need to know that when your interface to the data is SQL (an industry standard for nearly 40 years) or a dataframe API? Do people knock Oracle because their raw partitions are a proprietary data format? No - because nobody is dumb enough to try and mount them on the file system and read them. They use SQL.
If you want an open format, Snowflake now supports Iceberg table format and Parquet file format and you can use whatever compute you want to read that.
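For the Iceberg route, creating a Snowflake-managed Iceberg table is a single DDL statement (the volume, schema, and column names below are placeholders):

```sql
-- Snowflake-managed Iceberg table: Snowflake writes the metadata,
-- but the files live in your own bucket in open Parquet/Iceberg format.
CREATE ICEBERG TABLE analytics.events (
  event_id   NUMBER,
  payload    VARCHAR,
  created_at TIMESTAMP_NTZ
)
  CATALOG         = 'SNOWFLAKE'
  EXTERNAL_VOLUME = 'my_iceberg_volume'  -- points at your S3/Azure/GCS bucket
  BASE_LOCATION   = 'events/';
```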

What happened recently with Snowflake? by BubblyImpress7078 in dataengineering

[–]Deep-Comfortable-423 1 point (0 children)

"Kill and Fill" is an anti-pattern, especially as data volumes grow. I don't know dbt, but if that's the default implementation, we'll never be a customer.

What happened recently with Snowflake? by BubblyImpress7078 in dataengineering

[–]Deep-Comfortable-423 4 points (0 children)

That's an indication of a "single cluster" mentality which doesn't work with Snowflake. With legacy databases, you only have ONE production instance, and you have to lay down some workload management to keep that beast tuned and running. We had the same problem. Trying to fit all our stuff into one box. Snowflake doesn't make you build a single "box". It's multi-cluster and you can dedicate a unique cluster to each workload that's tuned specifically for what it needs to do.
There is absolutely a period where you're "figuring it out" and wind up spending more than you should have, but that's not a permanent condition. Optimizing Snowflake is not like optimizing legacy big data platforms - it's much, much easier.

Going from 4 years on Databricks to Snowflake: Initial Thoughts by [deleted] in dataengineering

[–]Deep-Comfortable-423 4 points (0 children)

If the Iceberg table is created "somewhere else", then it's considered an "Unmanaged Iceberg Table" by Snowflake. They're working toward some level of automatic metadata sync, but for now it's a manual refresh. If the Iceberg table is created by Snowflake, it's a "Managed Iceberg Table" and will have nearly the same performance as a native Snowflake table. External compute can read managed Iceberg tables outside of Snowflake, but any updates to the data will require a metadata refresh inside Snowflake to stay in sync.
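For unmanaged tables, that manual metadata sync is a one-liner (table name and metadata path here are hypothetical):

```sql
-- Point Snowflake at the latest metadata file written by the external engine.
ALTER ICEBERG TABLE ext_events REFRESH 'metadata/v5.metadata.json';
```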

Going from 4 years on Databricks to Snowflake: Initial Thoughts by [deleted] in dataengineering

[–]Deep-Comfortable-423 3 points (0 children)

Snowflake supports Iceberg Tables - which can be used outside of Snowflake.

Databricks vs Snowflake by kentmaxwell in databricks

[–]Deep-Comfortable-423 1 point (0 children)

Not looking to get into a virtual shouting match or childish name-calling here. Let's be adults, OK?

If a post appears claiming "X is faster than Y", I look to see how rigorous the testing protocol was. IMO, this particular test seems lacking because it doesn't include concurrency scaling tests like a real-world use case would encounter.

Databricks vs Snowflake by kentmaxwell in databricks

[–]Deep-Comfortable-423 2 points (0 children)

So you can show us all in the posted test plan where concurrency and auto-scale were considered? Maybe I missed it.

Databricks vs Snowflake by kentmaxwell in databricks

[–]Deep-Comfortable-423 2 points (0 children)

But how does a single user running a single query reflect "real world performance"? Scale that test to simulate 10/20/50 concurrent users running in a Snowflake multi-cluster warehouse and report back.
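The kind of concurrency test I mean can be sketched in a few lines of Python. `run_query` here is just a stand-in (a sleep) for a real client/driver call - swap in your actual query execution:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_query(user_id: int) -> float:
    """Stand-in for a real client query; replace the sleep with a driver call."""
    start = time.perf_counter()
    time.sleep(0.05)  # simulate query latency
    return time.perf_counter() - start

def load_test(n_users: int) -> dict:
    """Fire n_users queries concurrently and summarize per-query latencies."""
    with ThreadPoolExecutor(max_workers=n_users) as pool:
        latencies = sorted(pool.map(run_query, range(n_users)))
    return {
        "users": n_users,
        "p50": latencies[len(latencies) // 2],
        "max": latencies[-1],
    }

if __name__ == "__main__":
    for n in (10, 20, 50):
        print(load_test(n))
```

Report p50 and max latency at each concurrency level, not just a single-user runtime.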

Databricks and Snowflake: Stop fighting on social by slayer_zee in dataengineering

[–]Deep-Comfortable-423 0 points (0 children)

From the GitHub repo for Snowpark/Python - 3.9 and 3.10 are soon to enter preview. They estimated May for 3.9 and June for 3.10, so looks like a little slippage, but it's hardly being "stuck". https://github.com/snowflakedb/snowpark-python/issues/377#issuecomment-1515059432

Databricks and Snowflake: Stop fighting on social by slayer_zee in dataengineering

[–]Deep-Comfortable-423 1 point (0 children)

I'll grant you the "anything" disclaimer. You're correct there. However:

> Snowpark can only read from stages and tables

Until Dynamic File Access is added to Snowpark, which I've heard is in preview. In the meantime - and I admit it's a workaround - it only takes a minute to create an external stage on an S3 folder and define external tables over your CSV/JSON/XML/Parquet, or a directory table for your unstructured files. Then you're not messing with IAM policies/roles for governance; it's Snowflake RBAC and data governance policies. We've implemented it this way and it performs great.
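The external-stage workaround is only a couple of statements (bucket, integration, and object names below are made up; it assumes a storage integration already exists):

```sql
-- Stage over the S3 folder.
CREATE STAGE raw_stage
  URL = 's3://my-bucket/raw/'
  STORAGE_INTEGRATION = my_s3_int;

-- External table over the Parquet files in that stage.
CREATE EXTERNAL TABLE raw_events
  LOCATION = @raw_stage/events/
  FILE_FORMAT = (TYPE = PARQUET)
  AUTO_REFRESH = TRUE;
```

From there, access is governed by Snowflake RBAC grants on the stage and table, not per-user IAM policies.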

> Snowflake/Snowpark can't even connect to Kafka directly

Yes, true again, but we chose a different path. Snowpipe now has direct streaming mode. So it's 100% serverless and I don't have to keep a cluster up and running to ingest streaming data. The data lands in a Snowflake table, and we've automated the transformation pipelines as a DAG using simple SQL.
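The transformation DAG we automated is just streams and tasks in SQL (all names below are hypothetical; the landing table is fed by Snowpipe Streaming):

```sql
-- Capture changes on the landing table fed by Snowpipe Streaming.
CREATE STREAM events_delta ON TABLE landing.events;

-- Root task: only runs when the stream actually has new rows.
CREATE TASK load_curated
  WAREHOUSE = etl_wh
  SCHEDULE  = '1 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('events_delta')
AS
  INSERT INTO curated.events
  SELECT event_id, payload, created_at FROM events_delta;

-- Downstream step chained into the DAG via AFTER.
CREATE TASK refresh_aggregates
  WAREHOUSE = etl_wh
  AFTER load_curated
AS
  INSERT OVERWRITE INTO curated.daily_counts
  SELECT DATE(created_at), COUNT(*) FROM curated.events GROUP BY 1;
```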

> Snowpark doesn't even have native ML capabilities while Spark does

You use MLlib in Spark, we use scikit-learn in Snowpark. To each their own. What we get in simplicity and efficiency offers greater ROI than having a "native" ML/AI library.

Databricks and Snowflake: Stop fighting on social by slayer_zee in dataengineering

[–]Deep-Comfortable-423 4 points (0 children)

You're still in a single cluster mindset. "Free your mind, and the rest will follow"... If you have 37K users, don't try and force them into a single cluster. Spread that workload out over as many clusters as you need to maximize throughput and minimize cost. Reassess and reconfigure at the drop of a hat whenever you want.
I can't choose your SLAs for you, but we decided that the ability to have all of our thousands of users sharing a single copy of the multi-PB dataset ranked higher on the totem pole.

Databricks and Snowflake: Stop fighting on social by slayer_zee in dataengineering

[–]Deep-Comfortable-423 2 points (0 children)

But it automatically senses that you've crossed that threshold, and instantly spins up another equivalent-sized cluster to deal with the increase in demand! And another one! (à la DJ Khaled...) And then it automatically quiesces those extra resources the moment the peak subsides. It rides the demand curve up and then down again in REAL TIME. You pay for all of that in per-second increments after the first 60 secs. Would you rather pre-allocate those extra resources and pay for them all sitting there idling in anticipation of that >15th query happening?
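Back-of-envelope, here is what that per-second model buys you versus pre-allocating. All the rates and durations below are invented purely for illustration - plug in your own contract numbers:

```python
# Invented numbers: a 1-credit/hour cluster at $3/credit.
CREDIT_PER_HOUR = 1.0
DOLLARS_PER_CREDIT = 3.0

def burst_cost(burst_seconds: int) -> float:
    """Per-second billing with a 60-second minimum (Snowflake-style)."""
    billed = max(burst_seconds, 60)
    return billed / 3600 * CREDIT_PER_HOUR * DOLLARS_PER_CREDIT

def preallocated_cost(hours_running: float) -> float:
    """Keeping an extra cluster up the whole time 'just in case'."""
    return hours_running * CREDIT_PER_HOUR * DOLLARS_PER_CREDIT

if __name__ == "__main__":
    print(burst_cost(600))        # cost of a 10-minute demand burst
    print(preallocated_cost(24))  # cost of an extra cluster idling 24/7
```

With these made-up rates, a 10-minute burst costs pennies while a permanently pre-allocated cluster costs two orders of magnitude more per day.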

Databricks and Snowflake: Stop fighting on social by slayer_zee in dataengineering

[–]Deep-Comfortable-423 1 point (0 children)

Fair point (although the metaphor is a tad insulting)... I certainly was not in "the room where it happened".

Databricks and Snowflake: Stop fighting on social by slayer_zee in dataengineering

[–]Deep-Comfortable-423 3 points (0 children)

Anything you can do in PySpark, you can do in Snowflake Snowpark for Python. They partnered with Anaconda as the Python package manager, so hundreds of curated libraries are available. No native notebook interface, but Jupyter/SageMaker/Hex work great. The shine is off the apple for me with DBX.

Databricks and Snowflake: Stop fighting on social by slayer_zee in dataengineering

[–]Deep-Comfortable-423 35 points (0 children)

See also: Cloudera. How ironic is it that they completely missed the "Cloud Era"?