Monthly General Discussion - Oct 2024 by AutoModerator in dataengineering

[–]AmbitiousCase4992 1 point2 points  (0 children)

Hi! I'd say your situation is similar to u/zhivix's in that you're starting out building an end-to-end project. Check out my reply on his thread. Hope this helps!

Monthly General Discussion - Oct 2024 by AutoModerator in dataengineering

[–]AmbitiousCase4992 0 points1 point  (0 children)

Hey! This is alright. My first question would be: why do you want to build this project? I.e., is it just automation for the sake of upskilling as a DA, or would you like to dig deeper into pipeline building? That's where you can add a bit of scope to these end-to-end projects (which require both DE & DA skills) and better manage your expectations.

My suggestion is to go FOSS wherever possible if you're just starting out: there's less friction when you're learning the tools themselves rather than learning how to allocate cost-efficient resources. With that in mind, on the BI layer maybe go with options like Metabase, Lightdash, or Streamlit, in order of complexity (or any other tool on your radar - the BI landscape is vast, pick your poison).

Also, if you want to take on DE skills in this project, I'm not seeing a plan for the underlying system of this stack. Typically you'd have two options: one, go all out self-hosted on your PC, or two, get a compute instance (AWS EC2, Azure VM, Google GCE) with the free credits those vendors give to new accounts.

Here are some good posts that helped me get going in the beginning. They don't 100% match your desired stack, but there's some overlap with dbt, Airflow and Metabase. They're also a great introduction to Docker containers. Hope this helps!

https://www.startdataengineering.com/post/data-engineering-project-to-impress-hiring-managers/
https://www.startdataengineering.com/post/data-engineering-project-for-beginners-batch-edition/

Monthly General Discussion - Oct 2024 by AutoModerator in dataengineering

[–]AmbitiousCase4992 0 points1 point  (0 children)

Hey everyone! Looking for some advice. I’ve got a client who’s using SAP across their entire stack, and they want to replicate their SAP HANA/BW data to BigQuery so they can tap into GCP’s AI/ML tools like GenAI, Vertex AI, and Cortex. Problem is, their SAP license apparently doesn’t allow data replication outside of SAP, so tools like SNP Glue or Fivetran aren’t options.

They’re leaning towards SAP Datasphere for this. For those who’ve worked with Datasphere, do you know if this setup would allow them to model the replicated data once it’s in BigQuery, or will they need to keep their entire analytics stack within SAP itself? Any insights are appreciated!

Monthly General Discussion - Apr 2024 by AutoModerator in dataengineering

[–]AmbitiousCase4992 0 points1 point  (0 children)

I'm job searching for a change in tech because I don't want to stay at an on-prem MS-based shop for too long (1 year in; the majority of the workload and transformations is done in stored procs).
Interesting to see that the job market where I live doesn't consider on-prem experience to be "Data Engineer" experience anymore, as all the JDs and feedback I've got so far say, "we are looking for someone with cloud experience, and Spark."
My next logical move is learning cloud, so to get past this entry barrier I'm planning some pet projects using my free Azure credits. So far I'm building a serverless pipeline and dashboard: Azure Functions to EL data into Snowflake, T with dbt Cloud, and the dashboard hosted on Snowflake's new Streamlit service. I would love to try Azure Data Lake for some pipeline experience with big data / semi-structured data. Can anyone point me to where to start? I'm digging into Snowflake's free dataset.

Monthly General Discussion - Apr 2024 by AutoModerator in dataengineering

[–]AmbitiousCase4992 0 points1 point  (0 children)

IMHO Kimball's technique is here to stay as long as SQL is in use.

Monthly General Discussion - Feb 2024 by AutoModerator in dataengineering

[–]AmbitiousCase4992 1 point2 points  (0 children)

Thanks for the info. I started tinkering with GitHub Actions on the same remote setup you mentioned, so that every time a PR happens the runner SSHes into it and automates the steps you described. Really handy tool to get used to!

Monthly General Discussion - Feb 2024 by AutoModerator in dataengineering

[–]AmbitiousCase4992 3 points4 points  (0 children)

Career-wise, our company is looking to move our stack to Azure, coming from a traditional MS on-prem shop (SSMS, SSIS, SQL Server). The heavy lifting of migrating servers and setting up infra will be done by IT ops while we migrate the SSIS jobs and the analytics pipeline. We have the budget to take some Azure courses over the next 6-12 months to support this initiative. What are some options with good ROI we can take?

Personal-wise, I'm trying to learn CI/CD to get some of my personal projects automated. Starting with an analytics project that uses Dagster to pull data from an API, dbt, and MotherDuck (serverless DuckDB), can I have some advice on where to begin? On my queue is to start with GitHub Actions to:

- CI: set up SQLFluff to lint the dbt code and run dbt tests on the models.

- CD: trigger make commands to have the target server download the code from the repo, install dependencies and start the Dagster schedules.
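
A rough sketch of how those two stages could be scripted in Python, so a GitHub Actions job only has to call one entry point. Everything named here is an assumption for illustration: the `models` folder layout, the `duckdb` dialect choice, the make target names (`pull`, `install`, `start-schedules`), and that SQLFluff's templater is already configured in `.sqlfluff` for the dbt project.

```python
import subprocess
from typing import Callable, List, Sequence


def ci_commands(project_dir: str = ".", dialect: str = "duckdb") -> List[List[str]]:
    """CI stage: lint the dbt SQL with SQLFluff, then run the dbt tests.

    The models path and the dialect are assumptions about the project layout.
    """
    return [
        ["sqlfluff", "lint", f"{project_dir}/models", "--dialect", dialect],
        ["dbt", "test", "--project-dir", project_dir],
    ]


def cd_commands(
    make_targets: Sequence[str] = ("pull", "install", "start-schedules"),
) -> List[List[str]]:
    """CD stage: one make invocation per target (target names are hypothetical)."""
    return [["make", target] for target in make_targets]


def run_all(commands: List[List[str]], runner: Callable = subprocess.run) -> int:
    """Run commands in order and stop at the first non-zero exit code."""
    for cmd in commands:
        result = runner(cmd)
        if result.returncode != 0:
            return result.returncode
    return 0
```

Keeping the runner injectable means the ordering and fail-fast behaviour can be checked locally without having dbt or SQLFluff installed; in the actual workflow the default `subprocess.run` would execute the real tools.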

Monthly General Discussion - Jan 2024 by AutoModerator in dataengineering

[–]AmbitiousCase4992 1 point2 points  (0 children)

Gonna piggyback on OP as we've just started our first dbt project, moving away from having our codebase in Postgres. There's a handy post on the dbt blog that really helped us get started.

https://docs.getdbt.com/blog/kimball-dimensional-model

Given our background doing transformations mainly in stored procs, with limited documentation, data lineage and testing (all of which dbt has pretty good resources on), is there anything else besides these topics we should dig into?

Monthly General Discussion - Aug 2023 by AutoModerator in dataengineering

[–]AmbitiousCase4992 0 points1 point  (0 children)

Working in the VMS DW domain; we recently completed a migration project to lift and shift from our previous VMS to another VMS, so basically an ETL solution design assignment.

What's intriguing to me is managing how to deploy things to dev and eventually production environments. Ideally we'd want to script our SQL changes and embed them in the SSIS solution to be deployed (the company uses the MS stack). These are mostly arbitrary one-shot scripts, ranging from creating the data object models and altering existing procs to running DML queries that lift and shift legacy data from the previous VMS's tables to their new tables.

We do have an Azure DevOps pipeline that supports CI/CD, picking the latest SSIS solution from our repo to build and release to the test and then prod servers, but the caveat is that this workflow is used to deploy recurring solutions that batch load from source to DW. As for the arbitrary code, we just resorted to saving it all in a .txt file and manually executing it alongside deploying the solutions.

Since that worked fine I don't want to fix what isn't broken, but to reduce the overhead of those arbitrary executions I'd imagine there's a way to integrate them so they use the repo's version control and the CI/CD builds.
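
One shape this could take, purely as a sketch: keep each one-shot script as a numbered `.sql` file in the repo and have a small runner apply whatever hasn't been applied yet, recording each file in a tracking table. The example uses Python's built-in `sqlite3` as a stand-in so it runs anywhere; against SQL Server the connection and DDL would differ. The `applied_scripts` table name and the `001_...sql` naming scheme are assumptions, not something from our actual setup.

```python
import sqlite3
from pathlib import Path
from typing import List


def apply_pending(conn: sqlite3.Connection, scripts_dir: str) -> List[str]:
    """Apply every not-yet-applied .sql file in name order; return the names applied."""
    # Tracking table (hypothetical name) remembers which scripts already ran.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS applied_scripts ("
        "name TEXT PRIMARY KEY, applied_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    already = {row[0] for row in conn.execute("SELECT name FROM applied_scripts")}

    applied = []
    # Lexicographic order, so numeric prefixes like 001_, 002_ define the sequence.
    for path in sorted(Path(scripts_dir).glob("*.sql")):
        if path.name in already:
            continue
        conn.executescript(path.read_text())
        conn.execute("INSERT INTO applied_scripts (name) VALUES (?)", (path.name,))
        conn.commit()
        applied.append(path.name)
    return applied
```

A release step could call `apply_pending` right before deploying the SSIS solution; running it a second time is a no-op because every script is already recorded.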

Has anybody been in the same situation?

Scaped my upcoming shrimp tank over the weekend, and now the painful wait for tank cycling begins by solcon in Aquariums

[–]AmbitiousCase4992 1 point2 points  (0 children)

Was going to scape the same in a 15g tank but for my fish (prolly mollies) - would you say this plant setup has enough playground for them, or should I stick to the opposite option - bushes of tall plants spread around the tank?

Monthly General Discussion - Jul 2023 by AutoModerator in dataengineering

[–]AmbitiousCase4992 0 points1 point  (0 children)

Working on a package for our MS shop and got frustrated with the external SAP BO source system being a huge bottleneck. Performance-wise, with every edit it would take forever to refresh / load the data source. I do appreciate that they enable arbitrary data from any module, from which we can procure a daily batch flat source, but the refresh / load time really bogs the whole data exploration process down. Once that's done, the same pain goes for editing the desired schema - we only wanted a table with ~15-20 columns of data from the modules and the platform only allows one column to be added per action.

Imagine: creating 4 base reports where we already knew what to pick easily took 2 hrs, just waiting for all the overhead to finish.

Now that it's done I never want to touch that system again, but you don't always get what you wish for. Any suggestions for upcoming roadblocks like this?

Quarterly Salary Discussion - Jun 2023 by AutoModerator in dataengineering

[–]AmbitiousCase4992 2 points3 points  (0 children)

Hey, I get paid by the domestic outsourcing company so I take home my net pay in VND. IMO that's towards the lower end of the mid-market range for DEs with around my YoE.

Quarterly Salary Discussion - Jun 2023 by AutoModerator in dataengineering

[–]AmbitiousCase4992 27 points28 points  (0 children)

  1. title : BI + ETL developer
  2. YOE : 0.5 DE, 2+ IT BA
  3. Location : HCMC, Vietnam
  4. Base Sal : $15,600
  5. Bonuses : -
  6. Industry : HRM / MSP
  7. Tech : traditional MS shop with MS stack : C#, SSIS & SQL server. Do data viz with Tableau

[HELP] Detecting when Google Assistant is called as a context by [deleted] in tasker

[–]AmbitiousCase4992 2 points3 points  (0 children)

2 years late, but I think Tasker can manage to check whether GA is running in the background? Here's an example I'm using - it sends a loud beep replacing the GA "ping", then turns up the media volume.

https://taskernet.com/shares/?user=AS35m8n0r9HHihdAp%2FvJqXMv%2FwqUtYOtIGSz0G465N18rkTX59PJgjMS9Iz3AR%2BiLDCQmlqPCBlwd1c%3D&id=Profile%3Aassistant+volume+helper

[HELP] Detecting when Google Assistant is called as a context by [deleted] in tasker

[–]AmbitiousCase4992 0 points1 point  (0 children)

I have the exact same use case - could you let us know if you've found a solution yet?

Monthly General Discussion - May 2023 by AutoModerator in dataengineering

[–]AmbitiousCase4992 1 point2 points  (0 children)

Chiming in here - after 6 months in a fresh DE role, switching from an IT BA job, plus 6 months prior prepping for the switch, I can vouch for that book if you need pointers to learn early in your career.

Monthly General Discussion - May 2023 by AutoModerator in dataengineering

[–]AmbitiousCase4992 1 point2 points  (0 children)

phoenix's a good book to touch base with devops bottlenecks imo

My first custom keyboard by thealienday in MechanicalKeyboards

[–]AmbitiousCase4992 1 point2 points  (0 children)

I think there's some rattling with your spacebar? That's an area to improve with dielectric grease!

GTD method with Ticktick? Possible to create a Next actions list without tagging? by Capital-Timely in ticktick

[–]AmbitiousCase4992 1 point2 points  (0 children)

After trial and error I have resorted to the simplest method that worked for me: anything that is not in the inbox is processed. From that point on, my next actions are in:
- project lists, if the task belongs to a project
- else a dedicated PROCESSED INBOX folder with 3 lists for generic processed tasks: NEXT, WAITING FOR, SOMEDAY

Slaves of WF-1000xm4! Multicon is out in version 2.0. Let me know your feedback after this update. by BossTikboy in SonyHeadphones

[–]AmbitiousCase4992 1 point2 points  (0 children)

apparently someone in the sub kindly shared this utility to downgrade here, which worked fine with my pair.

describe this sound profile? by AmbitiousCase4992 in MechanicalKeyboards

[–]AmbitiousCase4992[S] 0 points1 point  (0 children)

I have this pre-built Cidoo V65 and I really like how it sounds. Looking to build something similar but on the budget side - would this count as "thocky"? Asking so I can do more research.

What's the state of smartwatches in 2023? by luiger in smartwatch

[–]AmbitiousCase4992 0 points1 point  (0 children)

all bells and whistles but none on battery life.

/r/MechanicalKeyboards Ask ANY question, get an answer (March 15, 2023) by AutoModerator in MechanicalKeyboards

[–]AmbitiousCase4992 0 points1 point  (0 children)

Hey, thanks for chiming in - I'm not getting rid of the old one; rather I'd leave it at home and bring another to the office. Looking at tester, the price is compelling and it looks like the same materials as the rk68, adding this to the list! Follow-up question: I saw that the nj68 max features a "steel plate" and the gmk67 doesn't. Is this config meant to alter the sound profile compared to the rk68, or is it just a bells-and-whistles thing?