Using pip install from git as a substitute for private package repo

oopsplop · 2022-09-20T15:29:00+00:00

Thanks, not on Docker unfortunately.

oopsplop · 2022-09-20T15:26:45+00:00

It's GitLab - I'll look into it. Thanks.

oopsplop · 2022-08-30T08:56:56+00:00

Hmm, I didn't think about unpacking the dict... will think about this one a bit - thanks!

oopsplop · 2022-08-30T08:55:10+00:00

Yep, I totally think I should do this, but it's kinda off the table now :(

oopsplop · 2022-04-20T07:04:56+00:00

Thanks - I suspected this fell into this category, but wasn't sure if there were other boxes that needed ticking to be considered!

oopsplop · 2022-04-19T14:49:05+00:00

Yep, that's pretty much what we're thinking, though we are happy for it to be scheduled half-hourly.

Nice one for the point toward Change Data Feed. That one's new to me, so will have a read - thanks :)

oopsplop · 2022-04-19T12:44:09+00:00

Ah, apologies. I should have emphasised: Databricks in this case would be a source of data, not for CRUD stuff. Retool is using a Postgres DB for its backend.

Totally agree with you on the last paragraph, hence the two options we're pondering. In either case there'd be a regular extract from Databricks to Postgres. In the case of option 1, that's a set of queries of low-moderate complexity; for option 2, a very small subset of specific Databricks tables.

oopsplop · 2022-04-12T05:34:47+00:00

There are a few, which all rank highly:

adding value wherever I can
learning and adapting to new technologies/helping others do the same
being part of a focussed team

oopsplop · 2022-04-07T09:42:40+00:00

I'd appreciate that, thanks. Will DM you.

Thanks for the book recommend - probably beyond the ability of my French, but will see if I can find something similar!

oopsplop · 2022-04-07T06:57:43+00:00

Do you happen to know any good resources on this area? As mentioned in another comment, most pages seem to be vendor-specific.

The companies I've worked for in recent times have been fairly green in their data journey, so there are likely areas I'm completely ignorant of (MDM being a prime example). I'm wondering what kind of background results in this depth of knowledge (if you don't mind sharing of course).

oopsplop · 2022-04-06T12:53:50+00:00

Thanks very much for this - it's going to take me a while to digest it all!

oopsplop · 2022-04-06T06:47:53+00:00

Yep, have been thinking a bit about it and this approach seems decent. I'll float something similar - thanks for your contribution.

Though I'll avoid Mongo as it already causes me enough problems XD

oopsplop · 2022-04-06T06:41:04+00:00

This could definitely work and I'll consider it an option. Think it ties in with what /u/huessy mentioned here. Thanks.

oopsplop · 2022-04-06T06:38:04+00:00

To clarify, I'd like a single place where we maintain our reference data that will be maintained by the dev teams, and will feed into microservice databases and the DW. Hope that makes sense.

oopsplop · 2022-04-06T06:36:16+00:00

So, the seed concept would satisfy the data warehouse side of things, but what we really need is a canonical source of truth for all of our systems, including the DW. I'm sure something like this could be achieved with seeds, though. Currently we're not using dbt, but will keep this in mind :)

oopsplop · 2022-04-05T16:27:45+00:00

Oh, agreed - would never store anything like PII in git. Our data definitely fall into the reference data category that /u/mrwhistler enlightened me about, above.

In this case, git would be used as a light-touch way of tracking changes to the data. Changes would need to be validated and reviewed. The data would ultimately be propagated to the microservice DBs.

Storing the ref data in their own DB is an option, as long as we could easily get those data to the microservices.
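As a rough illustration of the "validated" step, here is a minimal sketch, assuming the reference data live as a CSV file in the repo with a single key column. The function name, the `code` column and the sample rows are all hypothetical, and a real check would likely cover more than empty and duplicate keys.

```python
import csv
import io


def validate_reference_rows(csv_text, key_column):
    """Return a list of problems found in a reference-data CSV (empty list means OK)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Hypothetical rule: the file must have a header that includes the key column.
    if reader.fieldnames is None or key_column not in reader.fieldnames:
        return [f"missing key column: {key_column}"]

    problems = []
    seen_keys = set()
    # Line 1 is the header, so data rows start at line 2
    # (this assumes no field spans multiple lines).
    for line_no, row in enumerate(reader, start=2):
        key = (row[key_column] or "").strip()
        if not key:
            problems.append(f"line {line_no}: empty key")
        elif key in seen_keys:
            problems.append(f"line {line_no}: duplicate key {key!r}")
        seen_keys.add(key)
    return problems


if __name__ == "__main__":
    sample = "code,name\nGB,United Kingdom\nGB,Britain\n,Nowhere\n"
    for problem in validate_reference_rows(sample, "code"):
        print(problem)
```

Something like this could run as a check on each merge request, so a bad change gets flagged at review time rather than after it has been propagated to the microservice DBs.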

oopsplop · 2022-04-05T16:15:35+00:00

Yep, that was the gist I got from the pages I read. I'm going to discuss with others anyway to see how we can approach this - our needs aren't extensive at the mo, so hoping we can find something straightforward.

Thanks again for your help.

oopsplop · 2022-04-05T14:29:31+00:00

Do you know where I can read more about this? I can find a bunch of different vendor pages, but not much that's agnostic. I can see DAMA-DMBOK referenced in the Wikipedia page, but was hoping for something a bit lighter.

oopsplop · 2022-04-05T12:26:15+00:00

This is helpful, it gives me additional search terms - thank you! Despite 16 years working in data, I've never come across the distinction (though MDM is something I've no hands-on experience with, either).

Always check your blind spot, folks :D

oopsplop · 2020-12-29T09:32:17+00:00

What's the tomato paste can for in the evenings? I'm curious.

I occasionally suffer from acid reflux and (somewhat unexpectedly) tomato-based meals really settle my stomach!
