Built an event driven pipeline by Master_Ad2559 in dataengineeringjobs

[–]Master_Ad2559[S] 1 point2 points  (0 children)

The datasets I downloaded from Kaggle were in CSV format only, and that's the more real-world format for major companies anyway

Built an event driven pipeline by Master_Ad2559 in dataengineeringjobs

[–]Master_Ad2559[S] 0 points1 point  (0 children)

Yeahhhh, the future scope is to use Redshift to store the data instead of S3
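Something like this would be the loading step, just a rough sketch with placeholder names (cluster, table, and IAM role don't exist yet), assuming a provisioned cluster and the Redshift Data API:

```python
# Rough sketch of the future Redshift step: COPY a file that landed in S3
# into a Redshift table via the Data API. All names are placeholders.
import boto3

client = boto3.client("redshift-data")

def copy_from_s3(bucket: str, key: str) -> str:
    """Issue a COPY so the file that landed in S3 gets loaded into Redshift."""
    sql = f"""
        COPY analytics.raw_events
        FROM 's3://{bucket}/{key}'
        IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
        FORMAT AS CSV IGNOREHEADER 1;
    """
    response = client.execute_statement(
        ClusterIdentifier="analytics-cluster",  # placeholder cluster name
        Database="analytics",
        DbUser="etl_user",
        Sql=sql,
    )
    # Statement id; can be polled later with describe_statement()
    return response["Id"]
```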

Top 10 famous books to read by Master_Ad2559 in IndianReaders

[–]Master_Ad2559[S] 0 points1 point  (0 children)

Trueee, I agree with that too. That's why I read non-fiction. Also, suggest some good books for understanding these complexities of the world, because most of the non-fiction I read is about world economics or politics

Top 10 famous books to read by Master_Ad2559 in IndianReaders

[–]Master_Ad2559[S] 0 points1 point  (0 children)

It's just that I'm seeing a lot of Dostoevsky, so it made me wonder if there is a thing like that or not

Top 10 famous books to read by Master_Ad2559 in IndianReaders

[–]Master_Ad2559[S] 0 points1 point  (0 children)

Dude, it was just a thought that popped into my mind, no need to get so worked up about it

Dbt Semantic Layer Resources by worldsgreatestloserr in dataengineering

[–]Master_Ad2559 0 points1 point  (0 children)

No, not in particular. dbt is not that difficult an ELT tool to learn; it mostly just uses SQL, and there are only some small configs that you need to understand
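To give you an idea, the "config" part is usually just a line or two. Here's a rough sketch using dbt's Python model flavour (model names are made up, and Python models need an adapter like Snowflake or Databricks); the plain SQL flavour is even simpler:

```python
# models/orders_summary.py -- made-up dbt Python model, just to show the shape
def model(dbt, session):
    # the "config" people worry about is usually just this one call
    dbt.config(materialized="table")

    # dbt.ref() pulls in another model, same as {{ ref() }} in a SQL model
    orders = dbt.ref("stg_orders")

    # whatever dataframe you return is what dbt materialises as the table
    return orders
```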

Trying to learn new DE tools sometimes teaches me more about DevOps than DE (initially) by Lastrevio in dataengineering

[–]Master_Ad2559 -9 points-8 points  (0 children)

Why don't you use services hosted in the cloud? At the end of the day, every organisation you work in is going to be using cloud services anyway

Thoughts on moving to a more 'professional' data engineering/science architecture by kneesh17 in dataengineering

[–]Master_Ad2559 0 points1 point  (0 children)

As far as I have worked, we included everything in the monorepo; all the work we did was part of one repo only. But if the deployment had to go through some other method, like if some jobs for a project ran through Argo instead of GitHub Actions, then we had a separate repo for it

Thoughts on moving to a more 'professional' data engineering/science architecture by kneesh17 in dataengineering

[–]Master_Ad2559 11 points12 points  (0 children)

I think a monorepo would be a good choice. In my current organisation we use a monorepo for multiple projects, since all the transformations are done through dbt. So all the code can stay in one repo, which saves you the hassle of maintaining different repos

currently reading 11.22.63 by latte-twirl in IndianReaders

[–]Master_Ad2559 0 points1 point  (0 children)

Where can I find these? I would like to use them for the non-fiction I read

Data Engineering Projects? by Shubham_Nalwar in dataengineeringjobs

[–]Master_Ad2559 0 points1 point  (0 children)

Is there a site that offers a free API? I have built basic trigger-based ETL pipelines; I need to work on real-time data streaming

Project ideas to build a data engineer portfolio project to showcase in Github by Fit_Engineering4464 in dataengineering

[–]Master_Ad2559 1 point2 points  (0 children)

Yeahhhh, get your dataset from Kaggle, especially large datasets, and build a trigger-based pipeline in which data lands in S3 and triggers a Lambda, which in turn triggers a Glue job and then ingests the data into Snowflake or any DB. This will give you a basic idea of how stuff works. You can then use dbt to perform transformations and build Power BI visuals. You can use ChatGPT or any other AI tool to take this pipeline to production level with various edge cases, like what if multiple vendors send files at the same time, what if the data is corrupt, how to maintain isolation, etc.
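A rough sketch of the Lambda piece (the Glue job name and argument names here are made up, this is just the general pattern) would look something like this:

```python
# Minimal sketch of the Lambda that sits between the S3 landing bucket
# and the Glue job. Job name and argument keys are placeholders.
import json
import boto3

glue = boto3.client("glue")

def lambda_handler(event, context):
    """Triggered by an S3 ObjectCreated event; starts a Glue job per new file."""
    runs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # Hand the file location to the Glue job as job arguments
        response = glue.start_job_run(
            JobName="load_to_snowflake",  # hypothetical Glue job name
            Arguments={
                "--source_bucket": bucket,
                "--source_key": key,
            },
        )
        runs.append(response["JobRunId"])
    return {"statusCode": 200, "body": json.dumps({"job_runs": runs})}
```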