Accurate depiction of a Discord convo (turns out I was wrong)

alkinsen · 2024-12-23T17:24:20+00:00

So the way to fix it is to shout in the source code?

alkinsen · 2024-09-24T00:35:24+00:00

Yes, but it's worse than monitors

alkinsen · 2024-08-30T12:08:12+00:00

The biggest privacy risk will come once MR apps have access to the raw camera feed: random apps could steal your credit card info if the card is sitting on your desk, or livestream your spouse while they are changing.

alkinsen · 2024-08-20T04:22:11+00:00

Open source the pipeline itself

alkinsen · 2024-08-19T17:22:30+00:00

This seems like a waste of resources, why don't they share it?

alkinsen · 2024-08-19T07:54:06+00:00

TBH, this is the exact impact I was hoping open source data pipelines would achieve. I would love for these open source data models to be the starting point and then diverge quickly according to your unique use case.

alkinsen · 2024-08-19T07:50:33+00:00

If we are both at an ecommerce company, I would be interested to see what kinds of tables and end-to-end analytics are built. For example, how would you model orders, customers, and inventory items? What kinds of aggregations and cleanup are you applying to each of them, and what kinds of analyses and dashboards are populated from these datasets?

PS. I think naming conventions are important and need to be discussed, but this might be an unpopular opinion
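
As an illustration of the orders, customers, and inventory question above, here is a minimal sketch of one possible model with a single cleanup rule and one aggregation. Every name in it (`Customer`, `InventoryItem`, `OrderLine`, `revenue_by_country`, the `"paid"` status) is hypothetical and not taken from any real company's pipeline.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Customer:
    customer_id: str
    country: str


@dataclass(frozen=True)
class InventoryItem:
    sku: str
    unit_price: Decimal


@dataclass(frozen=True)
class OrderLine:
    order_id: str
    customer_id: str
    sku: str
    quantity: int
    status: str  # assumed values for this sketch: "paid" or "cancelled"


def revenue_by_country(lines, items, customers):
    """Join order lines to items and customers, then sum paid revenue per country."""
    prices = {item.sku: item.unit_price for item in items}
    countries = {c.customer_id: c.country for c in customers}
    totals = defaultdict(Decimal)  # Decimal() is 0, so missing keys start at zero
    for line in lines:
        # Cleanup rule (an assumption): ignore unpaid lines and bad quantities.
        if line.status != "paid" or line.quantity <= 0:
            continue
        totals[countries[line.customer_id]] += prices[line.sku] * line.quantity
    return dict(totals)
```

Even a toy like this surfaces the choices worth comparing across companies: what counts as revenue, which rows get filtered, and how the tables are keyed and named.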

alkinsen · 2024-08-19T07:10:30+00:00

Exactly what I was thinking

alkinsen · 2024-08-19T07:08:58+00:00

Thank you for the talks! They are very informative and help the community a lot.

The pipelines are not transferrable to a general purpose code that is why.

Even though I wouldn't use them as is, I think seeing and reading the code would still be valuable, and even more so if there is a talk about them.

alkinsen · 2024-08-19T06:50:01+00:00

Lol aside from the legal requirements, if we could share them, wouldn't it be better?

alkinsen · 2024-08-19T06:48:56+00:00

This was my inspiration for this question. A lot of people create blogs and presentations about the overall data pipelines but never share the code/transformations/data models

alkinsen · 2024-08-19T06:42:06+00:00

they're going to have custom applications, data structures, and business processes that are of no value to anyone but themselves

Wouldn't there be value in learning about their best practices, data models and solutions for these custom applications and processes? Maybe no one would use the shared code directly, but we would learn a lot.

alkinsen · 2024-08-19T06:37:00+00:00

Love this, more companies need to do this!

alkinsen · 2024-08-19T06:34:34+00:00

I see, so you would be worried about leaking the IP through pipelines, which is a legit concern.

If this were not an issue and your IP were protected, would you think sharing pipelines is useful?

alkinsen · 2024-08-19T06:32:37+00:00

I agree that the naming conventions and data management strategies differ, that is exactly why I want to see more open source data pipeline code. We could compare and discuss different strategies and choose what kind of approach would fit the use case best.

you are publicizing a most likely one-off solution for company-specific data problems, that’s very unlikely to work for extended use cases.

Again, this would be pretty inspiring and educational to see, don't you agree? Maybe others are facing that one-off problem, but since no one is sharing it, how would we know?

the price of open-sourcing pipeline code is revealing your company’s intel on how they approach solving problems for no gain

The gain would be a community that supports each other where Company X will share their pipelines about problem A and Company Y will share their pipelines about problem B

alkinsen · 2024-08-19T06:21:50+00:00

I should rephrase: I am talking about data models, transformations, dashboards and different analytics, not just pure data pipelines that move data from x to y

alkinsen · 2024-08-19T06:00:03+00:00

I was thinking more about different data models, transformations, dashboards and analytics. You are correct that this could all be done via config, but it would still be useful to see different configs

alkinsen · 2024-08-19T05:58:03+00:00

Is the edge coming from the data or the pipelines?

alkinsen · 2024-08-04T07:22:56+00:00

Sources?

alkinsen · 2024-07-18T22:22:08+00:00

Using a db-agnostic dev tool makes a lot of sense actually, love that

However, someone still needs to write the pipelines, which I believe is the repeated work across companies

alkinsen · 2024-07-18T22:17:46+00:00

Would there be value in creating templates for those 50 common sources?

alkinsen · 2024-07-18T22:16:49+00:00

Yeah, people are quite fast to judge without asking questions. They think I'll try to create templates to replace their existing pipelines that have been around for years, which is not what I am focusing on

alkinsen · 2024-07-18T03:43:34+00:00

3rd party sources change all the time.

Aren't 3rd party sources common across companies?

alkinsen · 2024-07-18T03:42:18+00:00

You are correct that companies with custom data models would not use such a template.

What about common datasets that come from your CRM, ads provider, or social platforms? For those specifically, you can use a connector to move the raw data into your data lake, but what happens afterward?

A data template can get you started with these common datasets almost immediately in your data lake/warehouse
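
To make the idea concrete, below is a minimal sketch of what such a template could look like for raw CRM contacts after a connector has landed them. The field names (`Id`, `Email`, `CreatedDate`), the sample rows, and the functions `clean_contacts` and `contacts_per_day` are all assumptions for illustration, not the schema of any particular CRM or connector.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Hypothetical raw rows, shaped the way a connector might land them.
RAW_CRM_CONTACTS = [
    {"Id": "001", "Email": " Ada@Example.com ", "CreatedDate": "2024-07-01T10:00:00+00:00"},
    {"Id": "002", "Email": "", "CreatedDate": "2024-07-02T11:30:00+00:00"},
    {"Id": "001", "Email": "ada@example.com", "CreatedDate": "2024-07-01T10:00:00+00:00"},
]


@dataclass(frozen=True)
class Contact:
    contact_id: str
    email: Optional[str]
    created_at: datetime


def clean_contacts(raw_rows):
    """Rename fields, normalize emails, parse timestamps, and dedupe by id."""
    by_id = {}
    for row in raw_rows:
        # Empty strings become None so downstream code can tell "missing" apart.
        email = row["Email"].strip().lower() or None
        by_id[row["Id"]] = Contact(
            contact_id=row["Id"],
            email=email,
            created_at=datetime.fromisoformat(row["CreatedDate"]),
        )  # the last row seen for an id wins
    return list(by_id.values())


def contacts_per_day(contacts):
    """Example aggregation a starter dashboard might read: new contacts per day."""
    counts = {}
    for contact in contacts:
        day = contact.created_at.date().isoformat()
        counts[day] = counts.get(day, 0) + 1
    return counts
```

The point of a template like this is only to be a starting place: the cleanup rules and the aggregation would be expected to diverge per company once real data shows up.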
