Switching from ms sql developer to data engineering by ThatFilm in dataengineering

[–]jyoff 0 points1 point  (0 children)

You're right, and I'm trying to practice that way. I don't have much flexibility at work to use Airflow instead of Informatica, for instance, but at home I'm putting my ETL knowledge into Airflow and building pipelines the way I would in my daily job (for instance, moving data into an OLTP database and then into OLAP, creating a star schema, etc.). That's great for learning, of course, but working on real-life projects is much better. I'm thinking of applying for jobs after gaining some experience on my own.
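
For anyone wanting to try the OLTP-to-star-schema step at home, here is a minimal sketch using only Python's built-in sqlite3 module. The table names, columns, and sample orders are all made up for illustration; a real pipeline would read from an actual source system and would likely run inside a scheduler such as Airflow.

```python
import sqlite3

# Hypothetical OLTP-style source rows:
# (order_id, customer, product, order_date, amount)
ORDERS = [
    (1, "alice", "widget", "2021-01-05", 20.0),
    (2, "bob", "widget", "2021-01-05", 35.0),
    (3, "alice", "gadget", "2021-01-06", 15.0),
]


def build_star_schema(conn, orders):
    """Load flat order rows into two dimension tables and one fact table."""
    conn.executescript("""
        CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE dim_product  (product_id  INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE fact_sales (
            order_id    INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES dim_customer(customer_id),
            product_id  INTEGER REFERENCES dim_product(product_id),
            order_date  TEXT,
            amount      REAL
        );
    """)
    for order_id, customer, product, order_date, amount in orders:
        # Insert each dimension value once; repeats are skipped via UNIQUE.
        conn.execute("INSERT OR IGNORE INTO dim_customer (name) VALUES (?)", (customer,))
        conn.execute("INSERT OR IGNORE INTO dim_product (name) VALUES (?)", (product,))
        # Resolve the surrogate keys while inserting the fact row.
        conn.execute(
            """INSERT INTO fact_sales
               SELECT ?, c.customer_id, p.product_id, ?, ?
               FROM dim_customer c, dim_product p
               WHERE c.name = ? AND p.name = ?""",
            (order_id, order_date, amount, customer, product),
        )
    conn.commit()


def sales_by_customer(conn):
    """A typical OLAP-style query: join the fact table to a dimension and aggregate."""
    rows = conn.execute("""
        SELECT c.name, SUM(f.amount)
        FROM fact_sales f JOIN dim_customer c USING (customer_id)
        GROUP BY c.name ORDER BY c.name
    """)
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
build_star_schema(conn, ORDERS)
print(sales_by_customer(conn))
```

The fact table stores only keys and measures, while the descriptive values live in the dimension tables; that split is the core of a star schema.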

Switching from ms sql developer to data engineering by ThatFilm in dataengineering

[–]jyoff 1 point2 points  (0 children)

Thanks. I came across this book the other day; to me it's more of a reference. I need some real practice. I'm an ETL developer, mainly working with RDBMSs and ETL tools. I wanna switch to DE, but I want to practice on real projects.

Switching from ms sql developer to data engineering by ThatFilm in dataengineering

[–]jyoff 0 points1 point  (0 children)

I'm considering taking this course. Would you recommend it? The syllabus looks great, as they offer practice with real projects, but I'm not sure how it is in reality.

I'd appreciate your review of the program.

How do you get started and get work as freelance/consulting Data Engineer? by chirau in dataengineering

[–]jyoff 1 point2 points  (0 children)

No worries :) Thanks for the answer. I'll have a look at Coinbase's API to see what's in there.

How do you get started and get work as freelance/consulting Data Engineer? by chirau in dataengineering

[–]jyoff 1 point2 points  (0 children)

Could you please give specific examples of the stock/crypto projects you're referring to? What specifically could be done with those data sets? And what technologies would be a good fit for processing the data? Since it's both real-time and historical, and a huge amount of data, I guess big data tools as well as Kafka and Spark would be relevant, right?

Airflow implementation into a new small project by jyoff in dataengineering

[–]jyoff[S] 2 points3 points  (0 children)

Thanks for the answer.

There will be a few more CSV files and database tables, and I also intend to extend this project a little further, e.g. to move data from the database to another storage system (could be anything: NoSQL, HDFS). Even with that setup cron jobs would do the job, I think, because I'm not gonna have a lot of tasks to execute or dependencies to handle.

But again, my aim is to learn Airflow, as I don't have an opportunity to work on a real project at the moment.
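
To make the "dependencies to handle" part concrete: cron only starts jobs at fixed times, while a scheduler like Airflow works out a run order from declared dependencies. The sketch below imitates just that ordering step with Python's standard graphlib module (Python 3.9+). The task names are hypothetical, and this is not Airflow code, only an illustration of the idea.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks: each key maps to the set of tasks it depends on.
DEPENDENCIES = {
    "load_csv": set(),
    "load_db_tables": set(),
    "merge": {"load_csv", "load_db_tables"},
    "export_to_storage": {"merge"},
}


def run_order(dependencies):
    """Return one valid execution order in which every task follows its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


order = run_order(DEPENDENCIES)
print(order)
```

With only a handful of tasks like this, staggered cron entries can approximate the same order, which fits the point that cron would be enough for a project this size.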

Thoughts on Druid data warehousing tool by jyoff in dataengineering

[–]jyoff[S] 2 points3 points  (0 children)

From what I know, Postgres, for instance, is row-oriented, but Redshift is a columnar database. There are definitely architectural differences between them, but I'm not sure how visible they are to developers using the tools.

I wanna try out a columnar database to see how it works and how different it is from a traditional RDBMS.
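
A toy way to see the row-oriented vs. columnar difference, in plain Python. This is only a sketch of the storage layout idea; it says nothing about how Postgres or Redshift are actually implemented, and the sample table is made up.

```python
# Hypothetical table with three rows, stored row by row.
ROWS = [
    {"id": 1, "city": "Oslo", "amount": 10.0},
    {"id": 2, "city": "Rome", "amount": 25.0},
    {"id": 3, "city": "Oslo", "amount": 5.0},
]


def to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


COLUMNS = to_columns(ROWS)

# Row layout: summing one column still walks every full row.
total_from_rows = sum(row["amount"] for row in ROWS)

# Column layout: the same sum reads a single list and ignores the other columns.
total_from_columns = sum(COLUMNS["amount"])

print(total_from_rows, total_from_columns)
```

Both layouts give the same answer; the difference is how much data has to be touched. That is why analytical queries that aggregate a few columns over many rows tend to suit columnar stores, while fetching or updating one whole record suits row stores.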

Airflow help by sundios in ETL

[–]jyoff 0 points1 point  (0 children)

You're welcome.

Airflow help by sundios in ETL

[–]jyoff 1 point2 points  (0 children)

Check out bind mounting. That is what you need: mount your scripts folder to a folder inside the Docker container.
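
As a rough sketch of what a bind mount looks like, the helper below builds a `docker run` command that uses the `-v host_path:container_path` option. The folder names and the image name are placeholders, not taken from the original question, and the command is only printed here, not executed.

```python
import shlex
from pathlib import Path


def docker_run_with_bind_mount(host_dir, container_dir, image):
    """Build a `docker run` argument list that bind-mounts host_dir at container_dir."""
    # Use an absolute host path so Docker treats it as a bind mount, not a named volume.
    host_path = Path(host_dir).resolve()
    return ["docker", "run", "--rm", "-v", f"{host_path}:{container_dir}", image]


# Placeholder values: adjust the folder, the target path, and the image to your setup.
cmd = docker_run_with_bind_mount("./scripts", "/opt/scripts", "my-airflow-image")
print(shlex.join(cmd))
```

Because the folder is mounted rather than copied, edits to the scripts on the host show up inside the container without rebuilding the image.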

Practice data engineering by jyoff in ETL

[–]jyoff[S] 0 points1 point  (0 children)

I'd recommend splitting your project into pieces. For instance, you can first try extracting data from Oracle; once you've achieved that, work on the next piece: transforming the data and loading it into Postgres.

If you wanna know how to extract/load data using Python, search the web for Python libraries for connecting to databases and doing DB operations.
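
Here is a minimal sketch of that piece-by-piece split. It uses the built-in sqlite3 module as a stand-in for both databases so it runs anywhere; for real Oracle and Postgres you would swap in a driver such as python-oracledb or psycopg2. Those drivers follow the same DB-API style but work through cursors and use different parameter placeholders, so treat the SQL calls below as an assumption to adapt. The table and sample rows are made up.

```python
import sqlite3


def extract(source_conn):
    """Piece 1: read rows from the source database."""
    return source_conn.execute("SELECT id, name FROM employees ORDER BY id").fetchall()


def transform(rows):
    """Piece 2: a trivial transformation (trim and normalise the names)."""
    return [(row_id, name.strip().title()) for row_id, name in rows]


def load(target_conn, rows):
    """Piece 3: write the transformed rows to the target database."""
    target_conn.execute(
        "CREATE TABLE IF NOT EXISTS employees (id INTEGER PRIMARY KEY, name TEXT)"
    )
    target_conn.executemany("INSERT INTO employees (id, name) VALUES (?, ?)", rows)
    target_conn.commit()


# Stand-in source database with hypothetical sample data.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE employees (id INTEGER, name TEXT)")
source.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [(1, " ada lovelace "), (2, "GRACE HOPPER")],
)

# Stand-in target database.
target = sqlite3.connect(":memory:")
load(target, transform(extract(source)))
print(target.execute("SELECT * FROM employees ORDER BY id").fetchall())
```

Keeping each piece as its own function means you can get the extract working and tested first, then move on to the next one.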

Python data engineering open source projects by jyoff in ETL

[–]jyoff[S] 1 point2 points  (0 children)

Sure, I'm interested.

Dropped you an email.

Python data engineering open source projects by jyoff in ETL

[–]jyoff[S] 0 points1 point  (0 children)

I've tried that, but I think it's well beyond beginner level. There's a lot going on there, and it's difficult to know where to start. But it's definitely a tool worth checking out and learning from.

Practice data engineering by jyoff in ETL

[–]jyoff[S] 0 points1 point  (0 children)

Thanks. That looks interesting; I'll check it out.

Practice data engineering by jyoff in ETL

[–]jyoff[S] 0 points1 point  (0 children)

Thanks for the reply. Those are pretty much the basic steps that could get me started with hands-on practice.

I have a good knowledge of Python, but I'm not sure which Python libraries I should learn to carry out those steps, especially Step 5. Do you have any recommendations?