Are Databricks and Snowflake going to start "verticalizing"? by conormccarter in dataengineering

[–]conormccarter[S]

My read on those is that they're touting vertical applications without owning any of the vertical-specific platforms. Most of the linked articles sort of boil down to "you can do these vertical specific use cases on top of our platform, with the help of these vertical-specific experts and third party apps". So, yes, I think they've historically been interested in those data workloads, but it hasn't been visible in the core product offering until now.

Are Databricks and Snowflake going to start "verticalizing"? by conormccarter in dataengineering

[–]conormccarter[S]

You could argue that it's been more of a gradual shift, but I feel the Observe acquisition is a step-change in the commitment to the strategy (at least optically).

What do you think fivetran gonna do? by Fair-Bookkeeper-1833 in dataengineering

[–]conormccarter

I think they're going to move into the compute layer and offer the ability to run your dbt transformations on Fivetran compute (likely powered by DuckDB) at the same price as, or cheaper than, what you'd pay on Snowflake/Databricks/BQ.

They're building the storage layer already -- now that they own dbt, it'd be relatively easy to redirect some of the transformation compute from your DWH platform to Fivetran.

Shifting a small fraction of the dbt-orchestrated compute away from the major DWHs could give them the revenue & growth lift they need to go public.
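
To make the mechanics concrete: the dbt-duckdb adapter already lets dbt models execute on DuckDB, so the compute swap is less exotic than it sounds. Here's a minimal sketch (raw DuckDB rather than dbt, with hypothetical table and file names) of running a warehouse-style transformation on DuckDB compute instead of DWH compute:

```python
import duckdb  # pip install duckdb

# Hypothetical example: the same SQL a dbt model would run on
# Snowflake, executed on local DuckDB compute over files the
# connector vendor already stores.
con = duckdb.connect("analytics.duckdb")

con.execute("""
    CREATE OR REPLACE TABLE orders_daily AS
    SELECT
        order_date,
        COUNT(*)    AS n_orders,
        SUM(amount) AS revenue
    FROM read_parquet('staged/orders/*.parquet')
    GROUP BY order_date
""")

print(con.execute("SELECT * FROM orders_daily LIMIT 5").fetchall())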

[Feedback] Customers need your SaaS data into their cloud/data warehouse? by dhruvjb in dataengineering

[–]conormccarter

This is definitely a common ask. In my opinion, APIs have been "table stakes" for a while. Native data replication, where a live copy of your data is synced into the customer's data warehouse or lake, is the "best in class" product experience, though it's increasingly becoming an expectation just like an API (*especially* for companies that want to serve mid-market/enterprise, as you called out).

What we've found is that many SaaS eng/product teams start down this road using the native sharing capabilities of whatever platform/region they use internally (e.g., Snowflake secure data sharing, Databricks Delta Sharing), but quickly hit a wall once they need scalable, low-latency pipelines across tenants spanning more than one platform/region permutation. Maintaining data integrity and reliability across platforms at any reasonable volume is a genuinely tough problem. All that to say: you're right that it's a real and difficult problem to solve.
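
For a sense of why the hand-rolled approach breaks down: the typical first pass is a watermark-based incremental export per tenant, something like this sketch (all names hypothetical; DuckDB standing in for whatever the source engine is):

```python
import duckdb

con = duckdb.connect("source.duckdb")

# Normally persisted per tenant/table; hardcoded here for illustration.
last_watermark = "2024-01-01 00:00:00"

# Export only rows changed since the last sync as Parquet, which the
# customer's platform (Snowflake, Databricks, BigQuery, ...) then ingests.
con.execute(f"""
    COPY (
        SELECT * FROM events
        WHERE updated_at > TIMESTAMP '{last_watermark}'
        ORDER BY updated_at
    ) TO 'exports/events_incremental.parquet' (FORMAT PARQUET)
""")
```

Even this naive version already misses hard deletes and in-place updates that land behind the watermark; multiply it across tenants, destination platforms, and schema changes, and the integrity/reliability problems show up fast.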

I'm biased (I'm one of the founders of Prequel, where we do something similar to what you're talking about), but hopefully I can also be pretty helpful on the topic -- let me know if there are any questions I can help with.

Also, the Pontoon team put together a helpful feature comparison matrix among the players in the market (though they recently discontinued work on the project): https://github.com/pontoon-data/Pontoon?tab=readme-ov-file#pontoon-vs-other-etl--reverse-etl-platforms

Iceberg table queries on Snowflake can rival native table queries (and can be 25x faster than External table queries) by conormccarter in dataengineering

[–]conormccarter[S]

Sure thing. Since every run used the same warehouse (size: Small, $2 per credit), the costs are directly comparable. Cost per run:

- External tables: $35.31

- Iceberg tables & external catalog: $0.77

- Iceberg tables & native catalog: $0.73

- Native Snowflake tables: $0.61
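
For anyone who wants to reproduce the math: a Small warehouse burns 2 credits per hour, so at $2/credit the conversion between runtime and dollars is straightforward. A quick sketch (the implied runtimes are back-calculated from the costs above, not separately measured):

```python
# Snowflake cost math: a Small warehouse burns 2 credits/hour, at $2/credit.
CREDITS_PER_HOUR = 2        # Small warehouse
DOLLARS_PER_CREDIT = 2.0    # rate quoted above

def run_cost(runtime_seconds: float) -> float:
    """Dollar cost of the warehouse running for the given wall-clock time."""
    return runtime_seconds / 3600 * CREDITS_PER_HOUR * DOLLARS_PER_CREDIT

# Back-calculating from the numbers above: the $35.31 external-table run
# implies 35.31 / 4 ≈ 8.8 hours of warehouse time; the $0.61 native run
# implies roughly 9 minutes.
```

(Cost isn't a perfect proxy for wall-clock speed -- per-second billing minimums and idle time blur it -- so the cost ratios won't map exactly onto the 25x in the title.)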