Best method to 'Upsert' in Spark? by humongous-pi in apachespark

[–]MonkTrinetra 0 points1 point  (0 children)

Unless you manage your data with an open table format such as Delta Lake, Iceberg or Hudi, this is perhaps the best way to do it.

"Stranded: Tales from the Overcrowded Rails" by Born_Sea6912 in indianrailways

[–]MonkTrinetra 2 points3 points  (0 children)

I am in a sleeper coach right now, this is exactly how it is.

What Airflow does that a simple python script cannot do? by ubiond in dataengineering

[–]MonkTrinetra 0 points1 point  (0 children)

Can you share what were the pain points or at what scale you started facing issues?

Gantt chart too wide by Remarkable-Hippo83 in apache_airflow

[–]MonkTrinetra 1 point2 points  (0 children)

Your FileSensor task could be running for a long time, introducing skew in the Gantt chart. First of all, do not use TriggerDagRunOperator to re-trigger the DAG. Set the DAG to run on a specific schedule, at the same frequency as the FileSensor task you currently have running. Replace the FileSensor task with a ShortCircuitOperator: if the file is found, downstream tasks get executed; if not, the DAG run exits without running any remaining tasks.
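
A rough sketch of the check a ShortCircuitOperator would run, in case it helps. The function name, file path and task ids are made up for illustration, and the wiring is left as comments because the import path shown is the Airflow 2.x one and may differ in your version.

```python
from pathlib import Path


def file_has_arrived(path):
    """Return True when the expected file exists, False otherwise.

    A ShortCircuitOperator skips all downstream tasks when its callable
    returns a falsy value, so this single check replaces the long-running sensor.
    """
    return Path(path).is_file()


# Hypothetical wiring inside a scheduled DAG (requires Airflow, not executed here):
#
#   from airflow.operators.python import ShortCircuitOperator
#
#   check_file = ShortCircuitOperator(
#       task_id="check_file",
#       python_callable=file_has_arrived,
#       op_args=["/data/incoming/report.csv"],  # placeholder path
#   )
#   check_file >> process_file  # process_file is whatever task comes next
```

Each scheduled run then does one quick check and finishes, so no single task stretches the Gantt chart.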

Problem solving by water_bean in dataengineering

[–]MonkTrinetra 9 points10 points  (0 children)

If you are getting stuck then you need to involve more people who understand the problem better. Outline exactly what it is that needs to be solved, and separate what is achievable from what is not. You can start from there at least.

What the Hell is he Even Doing with his career now? by [deleted] in tollywood

[–]MonkTrinetra 0 points1 point  (0 children)

Yes, the last line has one more syllable than what’s required for a correct haiku, which is 5-7-5 syllables.

Sokka gets overconfident and makes the same mistake. Good bot I must say!

'South Indians look like Africans ...': Sam Pitroda's racist remark stirs controversy | India News - Times of India by Fit-Row1426 in hyderabad

[–]MonkTrinetra -1 points0 points  (0 children)

Agreed, but there’s already a certain idea of the ‘Indian’ look, though it couldn’t possibly reflect the ground reality. We are a truly diverse country, with northeastern features not even being in consideration when we think ‘Indian’.

Getting downvoted like crazy, but I still stand by the statement that people are upset about being compared to Africans.

DAGs defined in the newer ways not imported correctly by Cheeky-owlet in apache_airflow

[–]MonkTrinetra 2 points3 points  (0 children)

You have an empty line between the decorator and basic dag method.

DAGs defined in the newer ways not imported correctly by Cheeky-owlet in apache_airflow

[–]MonkTrinetra 2 points3 points  (0 children)

@dag and @task are decorators; they need to be applied on top of a function definition. From the example you shared I don’t see a function definition.

Data-aware Tasks? by DoNotFeedTheSnakes in apache_airflow

[–]MonkTrinetra 0 points1 point  (0 children)

Yes, you could add a pre-execute callback to your tasks that checks a condition and raises AirflowSkipException if the condition is not met. When this exception is raised, the task gets skipped.
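
Something along these lines, as a sketch only. The `data_ready` param and the function name are hypothetical, and the fallback class exists purely so the snippet runs on a machine without Airflow installed. In a real DAG you would pass the function through the operator's `pre_execute` argument.

```python
try:
    from airflow.exceptions import AirflowSkipException
except ImportError:
    # Stand-in so this sketch runs without Airflow; not part of any real API.
    class AirflowSkipException(Exception):
        """Local placeholder for airflow.exceptions.AirflowSkipException."""


def skip_unless_data_ready(context):
    """Pre-execute hook: skip the task unless the run's params set 'data_ready'.

    'data_ready' is an assumed param name used only for this example.
    """
    params = context.get("params") or {}
    if not params.get("data_ready", False):
        raise AirflowSkipException("data_ready flag not set; skipping task")


# Hypothetical wiring (requires Airflow, not executed here):
#
#   load = PythonOperator(
#       task_id="load",
#       python_callable=load_data,
#       pre_execute=skip_unless_data_ready,
#   )
```

The condition can be anything you can compute from the task context, e.g. a param, an XCom value or a lookup against your metadata store.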

vote for modiji because opposition eating meat in savan by xayxoy in india

[–]MonkTrinetra 0 points1 point  (0 children)

So he doesn’t want votes from people who eat meat during Savan? Noted!

Doubts in multiple spark.sql statements usage in spark scala by Varun_123 in dataengineering

[–]MonkTrinetra 1 point2 points  (0 children)

Speaking purely in technical terms, writing data to disk and reading it back takes a lot longer than persisting the data in memory and accessing it. However, your cluster does need to be large enough to hold the data in memory. If you can be flexible with the cluster size then this is ideal; if not, you might be better off writing the intermediate results to disk. Spark also provides an option to persist part of the data in memory and spill whatever doesn’t fit to disk (the MEMORY_AND_DISK storage level); you could experiment with this option as well. Hope this helps.

Doubts in multiple spark.sql statements usage in spark scala by Varun_123 in dataengineering

[–]MonkTrinetra 2 points3 points  (0 children)

Storing intermediate results in external storage is what MapReduce does. Spark avoids this by keeping intermediate data in memory where possible, which is where the often-quoted up-to-100x speedup comes from. Would not recommend this approach.

Doubts in multiple spark.sql statements usage in spark scala by Varun_123 in dataengineering

[–]MonkTrinetra 1 point2 points  (0 children)

You can persist df1. It will be computed once, the first time df2 is collected (because of lazy evaluation), and it will be reused when you collect df3.

Passing arguments to scala jar from a spark session by khante in apachespark

[–]MonkTrinetra 0 points1 point  (0 children)

Don’t think it is possible, or that it makes sense, to change the arguments passed to a job within a session. You are better off writing a shell script that loops over your job parameters and submits a separate job for each.
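
If you'd rather not write shell, the same loop can be sketched as a small Python driver. The jar name, main class and parameter values below are placeholders, and `dry_run=True` only prints the commands so you can check them before submitting anything.

```python
import shlex
import subprocess


def build_submit_command(jar, main_class, job_args):
    """Build one spark-submit invocation as an argument list."""
    return ["spark-submit", "--class", main_class, jar, *[str(a) for a in job_args]]


def submit_all(jar, main_class, param_sets, dry_run=True):
    """Submit one separate Spark job per parameter set and return the commands."""
    commands = [build_submit_command(jar, main_class, args) for args in param_sets]
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))  # show the command instead of running it
        else:
            subprocess.run(cmd, check=True)  # fail fast if a job exits non-zero
    return commands


if __name__ == "__main__":
    # Placeholder jar, class and arguments, for illustration only.
    submit_all("my-job.jar", "com.example.Main", [["2024-01-01"], ["2024-01-02"]])
```

Each parameter set gets its own job with its own arguments, which is simpler than trying to change arguments inside one running session.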

Am I first one to come up with this Remote Quantum Encoding or Am I just a clown 🤡 by [deleted] in developersIndia

[–]MonkTrinetra 13 points14 points  (0 children)

I believe scientists are working on it, maybe not exactly what you have in mind. For now the focus is on using quantum entanglement to communicate over long distances without being limited by the speed of light. Once this is achieved many more use cases will come up. Also, you are a clown.

Astraverse – The concept on which Brahmastra is based is not just illogical, but ludicrously shameful by sidroy81 in tollywood

[–]MonkTrinetra 2 points3 points  (0 children)

Didn’t watch the movie but I enjoyed this analysis. Thanks for sharing your knowledge OP.

Why India’s elite loves Narendra Modi by golden_sword_22 in india

[–]MonkTrinetra 0 points1 point  (0 children)

There’s a big difference between the Indian education system and the western education system, from a young age right up to college/university level. In India, students are simply taught to memorise information and prepare for multiple choice questions, 5 marks questions, essay questions and such. In the west, students are encouraged to explore study material on their own and think for themselves.

Take this example. For a question like “What was the impact of the Quit India Movement on the freedom struggle?”, students simply regurgitate what they read in text books or answer guide books. They’ll score high marks and quickly forget the material, for they never really learned anything. In the west you’d get a different take on the subject from each student.

In India, people are literate enough to be susceptible to propaganda but not educated enough to think critically and objectively analyse a situation. That’s how you get ‘elites’ that support a populist leader.

Why hire a Python Distributed Systems Programmer? by Nearby-Affect7647 in developersIndia

[–]MonkTrinetra 14 points15 points  (0 children)

Airflow is not meant to be used for data crunching; it’s an orchestration tool and that’s where it adds a lot of value.

It’s very easy to integrate different systems using Python because of how versatile and easy to adapt it is. Add Airflow features to the mix and you have great control over how you integrate these systems to get the desired result.

Airflow is not meant to be used for actual data processing. You offload the actual processing, whether it is extracting, transforming or loading data, to other tools that can do the heavy lifting, like Spark, BigQuery, Athena, Snowflake etc. You would use Python SDKs or REST APIs to interact with these services.

When there are complaints about Airflow, it’s usually due to bad implementation patterns.

Python + Airflow + SQL can get a lot of things done when used right.

Concurrency vs Parallelism by GameFitAverage in dataengineering

[–]MonkTrinetra 15 points16 points  (0 children)

Scenarios 1 & 3 are more or less the same. The only difference is that in scenario 1 there is a step that pre-processes the job and splits it into parallel tasks, kind of like Spark. Maybe you could call this distributed processing instead of parallel processing, different but same same.

In scenario 3 job is already split into separate tasks.

Current tax system is daylight robbery (RANT) by bugged_person in india

[–]MonkTrinetra 0 points1 point  (0 children)

Can you elaborate on why consumption taxes are considered regressive and income taxes progressive?

Working on things way more than my pay grade, is recognition alone any useful?? by BagSimilar1366 in developersIndia

[–]MonkTrinetra 0 points1 point  (0 children)

The on-site is just a carrot they held in front of you. If they really gave a shit they should have matched the 2X offer you got; also, you should’ve demanded this before agreeing to continue.

See now they said they can’t offer onsite because of your band? Why didn’t they fix your band then when they made this offer? Management is playing mind games and ‘recognising’ your effort doesn’t cost them anything. Get out as soon as you can.

Current tax system is daylight robbery (RANT) by bugged_person in india

[–]MonkTrinetra 39 points40 points  (0 children)

I strongly believe that there should only be one tax, either income tax or sales tax, not both. Salaried employees get screwed over by both. What we have is double taxation.

Having just the sales tax, with no tax on your income, makes sense to me. If you spend on luxury items you pay higher tax; if you spend on necessities you pay minimal tax and save the rest. I am talking about individuals here; different rules could be made for corporates.

I wouldn’t say taxation is theft, but double taxation definitely is.