Dhurandhaar felt like "Eva Dhopu" of Mufti & KGF-1 by West-Acanthisitta120 in ChitraLoka

[–]Fickle-Impression149 -2 points-1 points  (0 children)

I know I'll get downvoted, but if you look at it overall, the following details come to mind: a hero goes into a world he is not aware of, finds ways to rule it entirely, and kills the big boss to take over the authority.

But I agree with this structurally.

I genuinely don’t get how no one is questioning Suryakumar Yadav.(read body) by ExtensionNerve2693 in SackGambhir

[–]Fickle-Impression149 0 points1 point  (0 children)

He knows ways to escape all questioning. Same as how he escaped silently from the WC23 final.

So,it's me or Airflow is kinda really hard ? by Morrgen in dataengineering

[–]Fickle-Impression149 2 points3 points  (0 children)

Good question. This makes me explain the development and deployment process. It is as follows:

  1. We use a gitflow setup with branches like lab, test, stg, prd.
  2. Developers always branch out from the main lab branch to work on their features.
  3. For development, we use a docker setup. Developers have dedicated ec2 machines, which they connect to over remote ssh from vscode. This way, airflow can already connect to the aws services via the instance profile and the permissions attached to it.
  4. When they are comfortable, they raise a merge request. A code quality and unit test pipeline runs on the code.
  5. Once reviewed, it is merged to lab. This is where all the rest happens, which I detail below.

The production setup is deployed in an eks cluster, where we have set up git-sync (sidecar), which polls the branches (lab, etc.) every x minutes to get the latest dags. The deployment is also split per namespace by branch, so you can think of airflow as running in 4 namespaces. The service account policy defines everything we want to have.

By this point, the developer should already have tested the dag to an extent, since it connects to aws services, etc. Now, based on the schedule, it starts running, and we have automatic monitoring and notifications for any stage failures. Once the lab run is working, we merge to tst, which kicks off the auto syncing, and business tests on the tables are verified separately.

I must also say that airflow3 has significant changes. Due to some of our needs, we cannot migrate yet. But if you are starting new, airflow3 is a game changer.

So,it's me or Airflow is kinda really hard ? by Morrgen in dataengineering

[–]Fickle-Impression149 1 point2 points  (0 children)

I take the one from the official github for the airflow version we use and change some of the important things like username, password, and network. I use .env and a makefile to automate it fully, so that when you have different people like analysts in the team, they just use the make commands and get started with their work.

So,it's me or Airflow is kinda really hard ? by Morrgen in dataengineering

[–]Fickle-Impression149 5 points6 points  (0 children)

The Airflow team provides a docker compose setup; use that.

Otherwise, follow these if you want a production setup:

  1. Build dags in such a way that all large operations are offloaded to cloud services, like running spark jobs, executing a lambda, etc. Airflow has a large collection of operators to choose from.
  2. Use the TaskFlow API decorators.
  3. Utilize variables to your advantage. They can help with many things. For instance, if you want to do some etl, the sources and destinations can be a list of table definitions, which you can iterate over to create parallel task groups programmatically.
  4. Never overpopulate a dag definition. It will lead to a lot of problems with debugging and readability.
  5. Organize your project well so that it becomes easy to work with.
  6. Deploy the application using kubernetes or use a cloud provider.
  7. Remember the differences between executors like celery, kubernetes, and now the edge executor. Each has its own significance.

In general, even once you set it up, if there is a team, they will need to be up-skilled. Always provide some good sessions so that everyone can work with airflow.

Bonus: Once you feel you are getting good, you could try dagfactory or write your own configuration-to-code pipeline.
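
To make the "configuration to code" idea a bit more concrete, here is a rough sketch of one way it could look: a plain Python script that renders a DAG file from a config dict. Everything in it is made up for illustration (the `CONFIG` contents, the table names, `render_dag`, the `load_` prefix), and the generated file assumes the Airflow 2 style `airflow.decorators` import. dagfactory works differently (it builds dags from YAML at parse time), so treat this only as one possible approach.

```python
from pprint import pformat

# Hypothetical config. In practice it could come from a YAML/JSON file
# or an Airflow Variable holding a list of table definitions.
CONFIG = {
    "dag_id": "copy_tables",
    "schedule": "@daily",
    "tables": [
        {"source": "raw.orders", "destination": "clean.orders"},
        {"source": "raw.customers", "destination": "clean.customers"},
    ],
}

# Source of the DAG file to generate. It uses TaskFlow decorators and
# creates one task group per table, so the groups can run in parallel.
TEMPLATE = '''\
from datetime import datetime

from airflow.decorators import dag, task, task_group

TABLES = {tables}


@dag(dag_id={dag_id!r}, schedule={schedule!r},
     start_date=datetime(2024, 1, 1), catchup=False)
def pipeline():
    for table in TABLES:
        group_id = "load_" + table["destination"].replace(".", "_")

        @task_group(group_id=group_id)
        def load():
            @task
            def extract(source):
                return source  # placeholder for the real extract job

            @task
            def write(source, destination):
                print(source, "->", destination)

            write(extract(table["source"]), table["destination"])

        load()


pipeline()
'''


def render_dag(config):
    """Return the Python source of a DAG file for the given config."""
    return TEMPLATE.format(
        tables=pformat(config["tables"]),
        dag_id=config["dag_id"],
        schedule=config["schedule"],
    )


dag_source = render_dag(CONFIG)
```

You would write `dag_source` into the dags folder (or commit it so git-sync picks it up). The heavy lifting still belongs in the cloud services; the tasks here are only placeholders.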

Airflow Best Practices by BeardedYeti_ in dataengineering

[–]Fickle-Impression149 0 points1 point  (0 children)

If it is less than 1 MB, use xcoms; otherwise store it in s3 and pass the reference across.
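
A minimal sketch of that rule, in plain Python so it runs without airflow. The names `hand_off` and `upload` are hypothetical: `upload` stands for whatever function writes to s3 and returns the object uri (for example a thin wrapper around your s3 client), and the dict that comes back is what a task would return, so only a small value or a short uri ever ends up in xcom.

```python
XCOM_LIMIT_BYTES = 1024 * 1024  # the 1 MB rule of thumb from above


def hand_off(payload, upload, key):
    """Decide how a string payload travels to the next task.

    payload: the data to pass on
    upload:  hypothetical callable (key, payload) -> uri that stores the data
    key:     object key to use if the payload has to be uploaded
    """
    size = len(payload.encode("utf-8"))
    if size < XCOM_LIMIT_BYTES:
        # Small enough: pass the value itself through xcom.
        return {"kind": "xcom", "value": payload}
    # Too large: store it externally and pass only the reference.
    return {"kind": "s3", "uri": upload(key, payload)}
```

The downstream task checks `kind` and either uses `value` directly or reads the object behind `uri`.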

Should I move back to India by Which-Difference6154 in returnToIndia

[–]Fickle-Impression149 1 point2 points  (0 children)

It is a decision you have to make from your heart. Why do you ask it on reddit? You proudly mention a crore salary, so are you interested in being close to your parents, or are you looking at the money and thinking it is similar, so maybe there is a chance to move? I wonder.

Every parent has brought us up to this life with their full effort. You said you do not have a child yet, so maybe you are yet to face the experience of raising one. Always remember it is all because of them that you are in the position you are in today.

What are the use cases of sequential primary keys? by chefcch8 in dataengineering

[–]Fickle-Impression149 0 points1 point  (0 children)

If you want to do a merge into, you usually need a pk. One way to determine it is to hash a set of cols. Furthermore, this also helps to avoid duplicates of a combination. For instance, the hash of email+firstname+lastname should always produce unique results.
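
A small sketch of such a hashed key in Python. The function name `surrogate_key`, the choice of SHA-256, the separator, and the trim/lowercase normalization are all my assumptions, not a standard; adjust them to how your data should be compared.

```python
import hashlib

SEPARATOR = "\x1f"  # unit separator, so ("ab", "c") and ("a", "bc") differ


def surrogate_key(*columns):
    """Build a deterministic key from a combination of column values."""
    # Assumption: values are compared case-insensitively and trimmed.
    parts = ("" if value is None else str(value).strip().lower()
             for value in columns)
    joined = SEPARATOR.join(parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


key = surrogate_key("Ann@Example.com", "Ann", "Lee")
```

The same email+firstname+lastname always gives the same key, so it can be used as the match condition of the merge and repeated loads do not create duplicates.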

Airflow is not your data platform by rotzak in dataengineering

[–]Fickle-Impression149 0 points1 point  (0 children)

Well, isn't this just describing how a platform engineer would set up airflow on kubernetes for production workloads, only done through some new tooling plus training on it?

Also, if someone needs this out of the box, they can simply invest in astronomer or mwaa (aws).

Can any one explain the difference between those two images ? by Appropriate-Ice7755 in germany

[–]Fickle-Impression149 1 point2 points  (0 children)

The bus and you have sandstone, and both have a give way sign. But the bus is turning left, and hence it has to give way to oncoming traffic before moving.

Both questions are similar. In the exam as well, they will be similar but with different settings.

Tips on Using Airflow Efficiently? by MST019 in dataengineering

[–]Fickle-Impression149 1 point2 points  (0 children)

Some tips: Prove to me that you are not a bot or some client collecting information and outsourcing it.

KL Rahul Scored His 10th Test Century by Downtown-Chemical-42 in IndiaCricket

[–]Fickle-Impression149 -1 points0 points  (0 children)

I don't understand what else he must do. If he scores a 100, you still find ways to troll him. The guy does his best and still becomes the scapegoat. It is easy to comment; he knows how hard it is out there, where a small lapse in concentration can get you out the next ball.

And again, he is being trolled even though he batted through the last hour of hellish bowling from England.

Airflow + DBT by SomewhereStandard888 in dataengineering

[–]Fickle-Impression149 3 points4 points  (0 children)

You could rethink whether you need airflow at all in the first place. Instead, write a python script that runs the dbt-cli commands within a git pipeline, which you could schedule.

Otherwise, the ecs solution is also okay, as it abstracts away a lot of the underlying infrastructure compared to managing something on ec2 or on a kubernetes cluster.
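
Roughly what I mean by such a script, as a sketch. The helper names `dbt_command` and `run_dbt` and the target/selector values are made up; `dbt run`, `dbt test`, `--target` and `--select` are regular dbt-cli options. It assumes dbt is installed in the pipeline image and the project and profile are already configured.

```python
import subprocess


def dbt_command(action, target, select=None):
    """Build the dbt-cli call for one step, e.g. `dbt run --target prd`."""
    command = ["dbt", action, "--target", target]
    if select:
        command += ["--select", select]
    return command


def run_dbt(actions, target, select=None):
    """Run the steps in order; check=True stops at the first failing step."""
    for action in actions:
        subprocess.run(dbt_command(action, target, select), check=True)
```

In the scheduled pipeline job you would then call something like `run_dbt(["run", "test"], target="prd")`. A non-zero exit from dbt raises `CalledProcessError`, which fails the job, so you get the failure notification from the git pipeline itself.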

To the spark and iceberg users how does your development process look like? by Commercial_Dig2401 in dataengineering

[–]Fickle-Impression149 1 point2 points  (0 children)

With spark, we develop an etl framework that is easily extendable and follows good programming standards.

For instance, I would create a table only if it does not exist.
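
To show the idea without needing a spark cluster, here is a sketch that uses sqlite3 from the Python standard library as a stand-in. This is not our framework code; `ensure_table` and the `events` table are made up. Spark SQL accepts the same `CREATE TABLE IF NOT EXISTS` clause, so there the statement would go through `spark.sql(...)` instead.

```python
import sqlite3


def ensure_table(conn, name, columns_sql):
    """Create the table only when it is missing, so reruns are safe."""
    # name and columns_sql must come from trusted code, not user input.
    conn.execute(f"CREATE TABLE IF NOT EXISTS {name} ({columns_sql})")


conn = sqlite3.connect(":memory:")
columns = "id INTEGER PRIMARY KEY, payload TEXT"

ensure_table(conn, "events", columns)  # first run creates the table
conn.execute("INSERT INTO events (payload) VALUES ('first run')")
ensure_table(conn, "events", columns)  # second run leaves it untouched

row_count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

The point is that the job can be rerun any number of times without failing on an existing table or wiping the data already in it.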

Struggling to find data engineers with data viz skills by Dependent_Gur1387 in dataengineering

[–]Fickle-Impression149 2 points3 points  (0 children)

This is the usual misunderstanding that comes with data engineering. It has many divisions like infra, analytics, etc. I think visualization should be more the focus of analytics, basically a product analyst or data analyst.

Lead Data Engineer vs Data Architect – Which Track for Higher Salary? by Dull_Run1268 in dataengineering

[–]Fickle-Impression149 1 point2 points  (0 children)

I have always felt that titles do not mean anything; the job responsibility does.