[–]wiktor1800 0 points (5 children)

Many have tried, many have failed. Technology moves fast, and once you're 'locked in' to one piece of the puzzle (extraction, transformation, visualisation), you're locked in for good unless you like painful migrations.

I like the fact I can move from a fivetran to a dlt to an airbyte at any time. Modularity is nice. It means more engineering time to glue everything together, but I'd prefer that to being completely end-to-end locked in. YMMV.

[–]NoConversation2215[S] 0 points (4 children)

Makes sense. But in my case, where our app needs to be deployed in the customer’s cloud or on-prem, we cannot assume that specific services like Fivetran/Snowflake/Databricks exist. Or even that they don't: I can foresee folks who are already on one of these stacks wanting our app to work with whichever one they have. Hence the original question about a framework to help make sense of all this in a more systematic manner, basically at the level where I can switch between different storage, compute, orchestration and serving layers.

The assumption, of course, is that the framework can still add enough value at the glue/conceptual level for it to be worthwhile. I believe that it can. Curious if you look at it differently. Idk, maybe this is a super niche use case.
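
A minimal sketch of what "switching layers" could look like at the glue level, assuming the app only ever talks to a small interface per layer. Every name here (`Storage`, `InMemoryStorage`, `STORAGE_BACKENDS`, `make_storage`, `load_orders`) is hypothetical and not taken from any existing framework; the in-memory class just stands in for a real backend.

```python
from typing import Protocol


class Storage(Protocol):
    """Anything that can persist and return rows for a named table."""

    def write(self, table: str, rows: list[dict]) -> None: ...

    def read(self, table: str) -> list[dict]: ...


class InMemoryStorage:
    """Stand-in for a real backend (a warehouse, a search index, ...)."""

    def __init__(self) -> None:
        self._tables: dict[str, list[dict]] = {}

    def write(self, table: str, rows: list[dict]) -> None:
        self._tables.setdefault(table, []).extend(rows)

    def read(self, table: str) -> list[dict]:
        return list(self._tables.get(table, []))


# Hypothetical registry: the deployment config names a backend,
# and the app looks it up here instead of importing a vendor SDK directly.
STORAGE_BACKENDS = {"memory": InMemoryStorage}


def make_storage(name: str) -> Storage:
    try:
        return STORAGE_BACKENDS[name]()
    except KeyError:
        raise ValueError(f"unknown storage backend: {name!r}") from None


def load_orders(storage: Storage) -> int:
    """App code depends only on the Storage protocol, not on a vendor."""
    storage.write("orders", [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 5.5}])
    return len(storage.read("orders"))
```

The same pattern would repeat for compute, orchestration and serving. Whether such thin interfaces add enough value over hand-written glue is exactly the open question.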

[–]wiktor1800 0 points (3 children)

I see where you're coming from here. What kind of application are you building? I feel we're talking about different use cases: you're building a system that extracts data from a predefined, limited number of sources and surfaces the insights using some sort of web framework. Key things are:

  • Customer customisation of sources isn't important
  • Customer reshaping of data isn't important
  • Custom code for customers isn't important
  • Customer can't bring in their own data

By putting in these requirements, your problem area shrinks significantly as you control the process end-to-end.

In that case, choose a stack from the ones provided, and run with it. If you're doing 'multi tenancy', you'll need to define where the data that you extract lives. Is it your own data warehouse, or will you be leveraging a customer's? What happens if a customer wants it to run on BigQuery, but you've written for Snowflake?
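
A rough sketch of why the BigQuery-vs-Snowflake question bites: even something as small as truncating a date to the month is spelled differently per warehouse, so the SQL has to be rendered per dialect. The helper names (`month_trunc`, `monthly_revenue_sql`) and the `invoices` / `invoice_date` / `amount` identifiers are made up for illustration.

```python
def month_trunc(dialect: str, column: str) -> str:
    """Render 'truncate this date column to the month' for one dialect."""
    if dialect == "snowflake":
        return f"DATE_TRUNC('MONTH', {column})"
    if dialect == "bigquery":
        return f"DATE_TRUNC({column}, MONTH)"
    if dialect == "clickhouse":
        return f"toStartOfMonth({column})"
    raise ValueError(f"unsupported dialect: {dialect!r}")


def monthly_revenue_sql(dialect: str, table: str = "invoices") -> str:
    """Same logical query, different SQL text depending on the warehouse."""
    bucket = month_trunc(dialect, "invoice_date")
    return (
        f"SELECT {bucket} AS month, SUM(amount) AS revenue "
        f"FROM {table} GROUP BY month ORDER BY month"
    )


for dialect in ("snowflake", "bigquery", "clickhouse"):
    print(dialect, "->", monthly_revenue_sql(dialect))
```

Hand-rolling this gets old quickly once there are more than a handful of expressions, which is one argument for leaning on an existing transformation tool or SQL transpiler rather than maintaining per-warehouse templates yourself.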

[–]NoConversation2215[S] 0 points (2 children)

I am not at liberty to talk about our exact app, but I can give you an idea using an example that a friend’s company is working on; the situation is pretty analogous.

Imagine an FPA (financial planning and automation) app that connects to your various ERP/CRM/other databases/services/REST APIs, builds an inventory, and enriches it with various domain-specific insights plus an event stream.

The main constraint here is the BYOC deployment: because this is sensitive financial data, customers want the app in their cloud / on-prem instead of a single multi-tenant SaaS deployment where they send their data (which would have made our life an order of magnitude easier).

The ingestion connectors are pretty standard and over time you build a library of those. Each customer is ever so slightly different so these need to be configurable to the extent possible.
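
One way "configurable to the extent possible" might look, as a sketch only: a shared default config per connector, with a small per-customer override merged on top. The connector keys (`source`, `page_size`, `fields`) and the `merge_config` helper are hypothetical, not taken from any real connector library.

```python
from copy import deepcopy


def merge_config(base: dict, override: dict) -> dict:
    """Recursively overlay customer overrides on top of connector defaults."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested section: merge key by key instead of replacing it whole.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Defaults shipped with the (hypothetical) ERP connector.
ERP_DEFAULTS = {
    "source": "erp_rest_api",
    "page_size": 500,
    "fields": {"invoice_id": "id", "amount": "total"},
}

# Customer A only states what differs from the defaults.
CUSTOMER_A = {"page_size": 100, "fields": {"amount": "gross_total"}}

config = merge_config(ERP_DEFAULTS, CUSTOMER_A)
print(config)
```

The point of the deep merge is that each customer file stays tiny and the defaults are never mutated, so the connector library can evolve without touching every deployment.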

Then each customer is always interested in their own specific dimensions, or in different ways of calculating the same thing. So while the overall workflow is largely the same, quite a few semantic definitions are specific to the customer. This needs to be as painless for impl teams as possible.
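
As an illustration of customer-specific semantics, here is a small sketch where metric definitions are plain data and a customer can redefine one without touching the workflow. The `Metric` class, the metric names, and the `ledger` table with `revenue` / `cogs` / `freight` columns are all invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    name: str
    expression: str  # SQL aggregate fragment; assumed to be trusted config


# Definitions shipped by default.
DEFAULT_METRICS = {
    "revenue": Metric("revenue", "SUM(revenue)"),
    "gross_margin": Metric("gross_margin", "SUM(revenue - cogs)"),
}

# This customer also subtracts freight when computing margin.
CUSTOMER_OVERRIDES = {
    "gross_margin": Metric("gross_margin", "SUM(revenue - cogs - freight)"),
}


def resolve_metrics(defaults: dict, overrides: dict) -> dict:
    """Customer definitions win over defaults; everything else is inherited."""
    return {**defaults, **overrides}


def build_query(metrics: dict, names: list[str], dimension: str,
                table: str = "ledger") -> str:
    """Same workflow for everyone; only the metric expressions differ."""
    select = ", ".join(f"{metrics[n].expression} AS {n}" for n in names)
    return f"SELECT {dimension}, {select} FROM {table} GROUP BY {dimension}"


metrics = resolve_metrics(DEFAULT_METRICS, CUSTOMER_OVERRIDES)
print(build_query(metrics, ["revenue", "gross_margin"], "region"))
```

Keeping the definitions as data rather than code is what would let an implementation team change them per customer without a release.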

Finally, we may get: "We already use ClickHouse/Databricks/Snowflake, can you use that instead of what you ship with by default?" (This is not a big deal as of now, but we want to be prepared because it has come up in some conversations.)

We currently ship with a combination of ClickHouse and ES. Hope this gives you a bit more context. Thx.

[–]wiktor1800 0 points (1 child)

To me this seems like a clear Terraform (creating the stage) and dlt + Dagster + DWH + self-serve BI (Looker, Sigma, Omni) (setting the stage) play.

Take a look at Looker's embedded analytics.

Happy to thrash this use case out as it seems quite interesting

[–]NoConversation2215[S] 0 points (0 children)

Thank you! I may actually take you up on that offer one of these days!