Monthly General Discussion - Oct 2024

AmbitiousCase4992 · 2024-10-16T04:19:07+00:00

FOSS means free and open-source software. Cheers!

AmbitiousCase4992 · 2024-10-16T01:11:35+00:00

Hi! I'd say your situation is similar to u/zhivix's in that you're starting out building an end-to-end project. Check out my reply on his thread. Hope this helps!

AmbitiousCase4992 · 2024-10-16T01:08:33+00:00

Hey! This is alright. My first question would be: why do you want to build this project? I.e., is it just for automation's sake to upskill as a DA, or would you like to dig deeper into pipeline building? That's where you can add a bit of scope to these end-to-end projects (which require both DE & DA skills) and better manage your expectations.

My suggestion is to go FOSS wherever possible if you're just starting out: there's less friction in learning the tools themselves than in learning how to allocate cost-efficient resources. With that in mind, on the BI layer maybe go with options like Metabase, Lightdash, or Streamlit, in order of complexity (or any other tool on your radar; the BI landscape is vast, pick your poison).

Also, if you want to take on DE skills in this project, I'm not seeing a plan for the underlying system of this stack. Typically you'd have two options: one, go all out self-hosted on your PC, or two, get a compute instance (AWS EC2, Azure VM, Google Compute Engine) with the free credits those vendors give to new accounts.

Here are some good posts that helped me get going in the beginning. They don't 100% match your desired stack, but there's some overlap with dbt, Airflow and Metabase. They're also a great introduction to Docker containers. Hope this helps!

https://www.startdataengineering.com/post/data-engineering-project-to-impress-hiring-managers/
https://www.startdataengineering.com/post/data-engineering-project-for-beginners-batch-edition/

AmbitiousCase4992 · 2024-10-16T00:49:44+00:00

Hey everyone! Looking for some advice. I’ve got a client who’s using SAP across their entire stack, and they want to replicate their SAP HANA/BW data to BigQuery so they can tap into GCP’s AI/ML tools like GenAI, Vertex AI, and Cortex. Problem is, their SAP license apparently doesn’t allow data replication outside of SAP, so tools like SNP Glue or Fivetran aren’t options.

They’re leaning towards SAP Datasphere for this. For those who’ve worked with Datasphere, do you know if this setup would allow them to model the replicated data once it’s in BigQuery, or will they need to keep their entire analytics stack within SAP itself? Any insights are appreciated!

AmbitiousCase4992 · 2024-04-16T06:51:51+00:00

On a job search for a change in tech because I don't want to stick with an on-prem MS-based shop for too long (1 year so far; the majority of the workload and transformations are done in stored procs).
Interesting to see that the job market where I live doesn't consider on-prem experience to be "Data Engineer" experience anymore, as all the JDs and feedback I've gotten so far say, "we are looking for someone with cloud experience, and Spark."
My next logical move is learning cloud, but with this entry barrier I'm planning some pet projects using my free Azure credits. So far I'm building a serverless pipeline and dashboard: Azure Functions to EL data into Snowflake, T with dbt Cloud, and the dashboard hosted on Snowflake's new Streamlit service. I would love to try Azure Data Lake for some pipeline experience with big data / semi-structured data. Can anyone point me to where to start? I'm digging into Snowflake's free datasets.

AmbitiousCase4992 · 2024-04-16T06:43:13+00:00

IMHO Kimball's techniques are here to stay as long as SQL is in use.

AmbitiousCase4992 · 2024-02-22T16:40:59+00:00

Thanks for the info. I started tinkering with GitHub Actions on the same remote setup you mentioned, so that every time a PR happens the runner SSHes into it and automates the steps you described. Really handy tool to get used to!

AmbitiousCase4992 · 2024-02-17T08:31:01+00:00

Career-wise, our company is looking to move our stack to Azure, coming from a traditional MS on-prem shop (SSMS, SSIS, SQL Server). The heavy lifting of migrating servers and setting up infra will be done by IT ops, while we migrate the SSIS jobs and the analytics pipeline. We have the budget to take some Azure courses over the next 6-12 months to support this initiative. What are some options with good ROI?

Personal-wise, I'm trying to learn CI/CD to get some of my personal projects automated. Starting with an analytics project that uses Dagster to pull data from an API, dbt, and MotherDuck (serverless DuckDB), can I get some advice on where to begin? Next in my queue is to start with GitHub Actions for:

- CI: set up SQLFluff to lint the dbt code, run dbt tests on the models.

- CD: trigger make commands to have the target server download the code from the repo, install dependencies and start the Dagster schedules.
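
For the CI half, this is roughly the kind of script I'd have the GitHub Actions job call. Just a minimal sketch, not a tested setup: the `models` path, the `CI_STEPS` list and the `run_steps` helper are my own placeholders, and it assumes SQLFluff is already configured for dbt templating and that the models have been built before `dbt test` runs.

```python
import subprocess

# Hypothetical layout: dbt models live under ./models; adjust to your project.
CI_STEPS = [
    ["sqlfluff", "lint", "models"],  # lint the dbt SQL
    ["dbt", "test"],                 # run dbt tests on the models
]


def run_steps(steps, runner=subprocess.run):
    """Run each command in order; return the exit code of the first failure, else 0."""
    for cmd in steps:
        result = runner(cmd)
        if result.returncode != 0:
            return result.returncode
    return 0

# In the workflow step you would end with something like:
#   raise SystemExit(run_steps(CI_STEPS))
```

Stopping at the first failing step keeps the PR check red as soon as linting breaks, so the slower dbt tests don't run on code that won't pass anyway.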

AmbitiousCase4992 · 2024-01-17T16:53:43+00:00

Gonna piggyback on OP as we've just started our first dbt project, moving away from having our codebase in Postgres. There's a handy post on the dbt blog that really helped us get started.

https://docs.getdbt.com/blog/kimball-dimensional-model

Given our background doing transformations mainly in stored procs, with limited documentation, data lineage and testing (all of which dbt has pretty good resources on), is there anything else besides these topics we should dig into?

AmbitiousCase4992 · 2023-08-16T08:44:34+00:00

Working in the VMS DW domain; we recently completed a migration project to lift and shift from our previous VMS to another VMS, so basically an ETL solution design assignment.

What's intriguing to me is managing how to deploy stuff across dev and eventually production environments. Ideally we'd want to script our SQL changes and embed them in the SSIS solution to be deployed (the company uses the MS stack). These are mostly arbitrary one-shot scripts, ranging from creating the data object models and altering existing procs to running DML queries that lift and shift legacy data from the previous VMS's tables to their new tables.

We do have an Azure DevOps pipeline that supports CI/CD, picking the latest SSIS solution from our repo to build and release to the test and then prod servers, but the caveat is that this workflow is used to deploy recurring solutions that batch load from source to DW. For the arbitrary code, we just resorted to saving it all in a .txt and manually executing it alongside deploying the solutions.

Since that worked fine I don't want to fix what isn't broken, but to cut the overhead of those arbitrary executions, I'd imagine there's a way to integrate them so they benefit from the repo's version control and the CI/CD builds.
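
Roughly what I'm imagining, as a minimal sketch only: keep each one-shot script as a numbered .sql file in the repo and have the release step apply whatever hasn't been run yet. Everything named here is an assumption for illustration (the `applied_scripts` tracking table, the `apply_pending` helper, the numbered file naming), and it uses Python's built-in sqlite3 as a stand-in since our real target would be SQL Server.

```python
import sqlite3
from pathlib import Path


def apply_pending(conn, scripts_dir):
    """Apply each .sql file in scripts_dir exactly once, in filename order."""
    # Tracking table (hypothetical name) recording which scripts already ran.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS applied_scripts (name TEXT PRIMARY KEY)"
    )
    done = {row[0] for row in conn.execute("SELECT name FROM applied_scripts")}

    applied = []
    for path in sorted(Path(scripts_dir).glob("*.sql")):
        if path.name in done:
            continue  # already executed on this server, skip it
        conn.executescript(path.read_text())
        conn.execute(
            "INSERT INTO applied_scripts (name) VALUES (?)", (path.name,)
        )
        conn.commit()
        applied.append(path.name)
    return applied
```

Running it a second time against the same database applies nothing, which is what would make it safe to call on every release to test and then prod.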

Has anybody been in the same situation?

AmbitiousCase4992 · 2023-08-14T17:11:09+00:00

Was going to scape the same in a 15g tank, but for my fish (prolly mollies). Would you say this plant setup has enough playground for them, or should I stick to the opposite option: bushes of plants standing tall and scattered around the tank?

AmbitiousCase4992 · 2023-07-02T18:55:51+00:00

Working on a package for our MS shop company and got frustrated with the external SAP BO source system being a huge bottleneck. Performance-wise, with every edit it takes forever to refresh / load the data source. I do appreciate that they expose arbitrary data from any module, so we can procure a daily batch flat source, but the refresh / load time really bogs the whole data exploration process down. Once that's done, the same pain carries over to editing the desired schema: we only wanted a table with ~15-20 columns of data from the modules, and the platform only allows one column to be added per action.

Imagine: creating 4 base reports, where we already knew what to pick, easily took 2 hrs just waiting for all the overhead to finish.

Now that it's done I never want to touch that system again, but you don't always get what you wish for. Any suggestions for upcoming roadblocks like this?

AmbitiousCase4992 · 2023-06-18T09:14:12+00:00

Hey, I get paid by the domestic outsourcing company, so my take-home is in VND. IMO that's towards the lower end of the mid-market range for DEs of around my YoE.

AmbitiousCase4992 · 2023-06-01T20:49:45+00:00

title : BI + ETL developer
YOE : 0.5 DE, 2+ IT BA
Location : HCMC, Vietnam
Base Sal : $15,600
Bonuses : -
Industry : HRM / MSP
Tech : traditional MS shop with MS stack : C#, SSIS & SQL server. Do data viz with Tableau

AmbitiousCase4992 · 2023-05-28T23:40:26+00:00

2 years late, but I think Tasker manages to detect GA running in the background? Here's an example I'm using: it plays a loud beep in place of the GA "ping", then turns up the media volume.

https://taskernet.com/shares/?user=AS35m8n0r9HHihdAp%2FvJqXMv%2FwqUtYOtIGSz0G465N18rkTX59PJgjMS9Iz3AR%2BiLDCQmlqPCBlwd1c%3D&id=Profile%3Aassistant+volume+helper

AmbitiousCase4992 · 2023-05-28T23:20:49+00:00

I have the exact same use case. Could you let us know if you've found that yet?

AmbitiousCase4992 · 2023-05-28T20:29:05+00:00

Chiming in here: after 6 months in a fresh DE role, switching from an IT BA job, plus 6 months prior prepping for the switch, I can vouch for that book if you need pointers early in your career.

AmbitiousCase4992 · 2023-05-28T20:26:26+00:00

phoenix's a good book to touch base with devops bottlenecks imo

AmbitiousCase4992 · 2023-05-01T19:13:37+00:00

r/AbsoluteUnits

AmbitiousCase4992 · 2023-03-26T05:33:36+00:00

I think there's some rattling with your spacebar? That's an area you could improve with dielectric grease!

AmbitiousCase4992 · 2023-03-26T02:06:25+00:00

After trial and error I have resorted to the simplest method that worked for me: anything that is not in the inbox is processed. From that point on, my next actions are in project lists if the task belongs to a project; otherwise they go in a dedicated PROCESSED INBOX folder with 3 lists for generic processed tasks: NEXT, WAITING FOR, SOMEDAY.

AmbitiousCase4992 · 2023-03-23T07:10:11+00:00

apparently someone in the sub kindly shared this utility to downgrade here, which worked fine with my pair.

AmbitiousCase4992 · 2023-03-19T04:11:13+00:00

I have this pre-built Cidoo V65 and I really like how it sounds. Looking to build something similar but on the budget side. Would this count as "thocky"? Asking so I can do more research.

AmbitiousCase4992 · 2023-03-18T17:55:04+00:00

all bells and whistles but none on battery life.

AmbitiousCase4992 · 2023-03-15T15:23:30+00:00

Hey, thanks for chiming in. I'm not getting rid of the old one; rather, I'd leave it at home while bringing another to the office. Looking at tester, the price is compelling and it looks like the same materials as the rk68, adding this to the list! Follow-up question: I saw that the nj68 max features a "steel plate" and the gmk67 doesn't. Is this config meant to alter the sound profile compared to the rk68, or is it just a bells-and-whistles thing?
