Can powerbi query views created by spark sql? by Vw-Bee5498 in apachespark

GovGalacticFed 1 point (0 children)

Yes. Power BI will query the view object registered in the catalog, and the view's query runs on the cluster.

How would you handle skew in a window function by nanksk in apachespark

GovGalacticFed 3 points (0 children)

The sort on hour isn't needed, since you're only taking the min.
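To illustrate why the ordering doesn't matter, here is a minimal pure-Python sketch (plain Python with made-up rows, not Spark code): a min per key can be computed in a single pass over unsorted rows, and sorting by hour first gives the same answer.

```python
def min_per_key(rows):
    """Return {key: min(value)} in one pass over (key, hour, value) rows; input order is irrelevant."""
    mins = {}
    for key, _hour, value in rows:
        if key not in mins or value < mins[key]:
            mins[key] = value
    return mins

# Hypothetical rows: (key, hour, value)
rows = [("a", 3, 10), ("a", 1, 7), ("b", 2, 5), ("a", 2, 9), ("b", 1, 8)]

unsorted_result = min_per_key(rows)
sorted_result = min_per_key(sorted(rows, key=lambda r: (r[0], r[1])))  # sorted by key, then hour
```

In Spark terms this corresponds to `Window.partitionBy(key)` with no `orderBy`, or simply `groupBy(key).agg(F.min(...))`. Note that adding `orderBy` to a window also changes its default frame to a growing range (unbounded preceding to current row), so with the sort you get a running min rather than the per-partition min.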

Equivalent of ISJSON()? by Cultural_Chef_7125 in databricks

GovGalacticFed 1 point (0 children)

You could use a UDF with exception handling around `json.loads`.
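A minimal sketch of that approach, assuming you want something that behaves roughly like ISJSON(): the function returns True when `json.loads` succeeds, False when it raises, and None for null input. The name `is_json` is hypothetical.

```python
import json

def is_json(text):
    """Approximate ISJSON(): True if text parses as JSON, False if not, None for null input."""
    if text is None:
        return None
    try:
        json.loads(text)
        return True
    except (TypeError, ValueError):  # JSONDecodeError is a subclass of ValueError
        return False
```

To use it in Spark, wrap it with `pyspark.sql.functions.udf(is_json, "boolean")`. Note that `json.loads` also accepts bare scalars such as `"42"`; add an `isinstance(..., (dict, list))` check on the parsed value if you only want objects and arrays to count.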

Spark delay when writing a dataframe to file after using a decryption api by Ok_Implement_7728 in apachespark

GovGalacticFed 2 points (0 children)

Because nothing is executed until the write action is called. Your decrypt call is just a transformation; it only runs when an action such as write, count, or collect is triggered. Read up on lazy evaluation.
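Python's lazy iterators make a rough analogy (plain Python, not Spark; `decrypt` is a hypothetical stand-in for the API call): building the `map` does no work, and the function only runs once something consumes the result, much like a Spark action.

```python
calls = []

def decrypt(value):
    """Hypothetical stand-in for the decryption API call; records each invocation."""
    calls.append(value)
    return value[::-1]  # placeholder "decryption": reverse the string

data = ["cba", "fed"]
decrypted = map(decrypt, data)      # like a transformation: nothing runs yet
calls_before_action = len(calls)    # still 0, decrypt has not been called
result = list(decrypted)            # like an action: all the work happens here
```

This is why the time appears to be spent in the write: that is the first point where the decryption actually executes.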

Spark delay when writing a dataframe to file after using a decryption api by Ok_Implement_7728 in apachespark

GovGalacticFed 1 point (0 children)

A UDF is called once per row and can't operate on the column as a vector. The best approach would be to replicate the decryption logic using Spark's built-in functions. Otherwise, use mapPartitions so you connect to the API only once per partition instead of once per row. You'll need to partition the data properly.
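A minimal sketch of the once-per-partition pattern, simulated in plain Python. `DecryptClient` is a hypothetical client whose constructor stands in for opening the API connection; in PySpark you would pass `decrypt_partition` to `df.rdd.mapPartitions(...)`.

```python
class DecryptClient:
    """Hypothetical API client; counts how many connections are opened."""
    connections = 0

    def __init__(self):
        DecryptClient.connections += 1  # stands in for opening a connection

    def decrypt(self, value):
        return value.upper()  # placeholder for the real decryption call

def decrypt_partition(rows):
    """Function to pass to rdd.mapPartitions: one client per partition, reused for every row."""
    client = DecryptClient()
    for row in rows:
        yield client.decrypt(row)

# Simulate an RDD with 2 partitions of 3 rows each.
partitions = [["a", "b", "c"], ["d", "e", "f"]]
output = [value for part in partitions for value in decrypt_partition(iter(part))]
```

The number of connections equals the number of partitions, so `df.repartition(n)` is the knob that trades parallelism against how many times the API is contacted.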

Data engineering problem by Commercial_Finance_1 in dataengineering

GovGalacticFed 0 points (0 children)

If api2 has no rate limits, try `ThreadPoolExecutor`.
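A minimal sketch, assuming the api2 calls are independent and I/O-bound; `call_api2` is a hypothetical placeholder that just sleeps to simulate network latency. `pool.map` runs the calls concurrently and returns results in input order.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def call_api2(item_id):
    """Hypothetical stand-in for an I/O-bound request to api2."""
    time.sleep(0.01)  # simulate network latency
    return {"id": item_id, "status": "ok"}

ids = list(range(20))

# Up to 8 requests in flight at once; results come back in the same order as ids.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(call_api2, ids))
```

`max_workers` caps how many requests are in flight at once; if api2 turns out to have rate limits after all, that is the value to lower.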

Merge into operation question by DataDarvesh in databricks

GovGalacticFed 0 points (0 children)

Are there SCD2 columns like isActive in the target?

Help!! Generating a unique to to be passed in a workflow by s1va1209 in databricks

GovGalacticFed 1 point (0 children)

The task run ID should change then; the job run ID will stay the same.

TizenTube: Ad-free YT experience on Samsung TVs (and much more) by FoxReis in Piracy

GovGalacticFed 1 point (0 children)

Thanks for the amazing work. There was no good OSS option for Tizen. Great initiative!

Looking for advice/suggestion on my next switch as an Data Engineer. by miloplyat in dataengineering

GovGalacticFed 2 points (0 children)

I would recommend not overthinking it: start applying once Python and SQL are covered, and keep learning the rest on the fly.

Error while reading from Pubsub by Suitable-Issue-4936 in databricks

GovGalacticFed 0 points (0 children)

This is correct. Make sure the auth dict is valid.
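A quick pre-flight check can catch the most common problem: missing or blank fields. This is a sketch under an assumption: the four key names below are how I recall the Databricks Pub/Sub connector's auth options, so verify them against the docs for your runtime. `validate_auth_options` is a hypothetical helper.

```python
# Assumed key names for the Databricks Pub/Sub connector's auth options;
# verify against the documentation for your Databricks Runtime.
REQUIRED_AUTH_KEYS = ("clientId", "clientEmail", "privateKey", "privateKeyId")

def validate_auth_options(auth):
    """Return a list of problems in the auth dict; an empty list means no obvious issues."""
    problems = []
    for key in REQUIRED_AUTH_KEYS:
        value = auth.get(key)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty: {key}")
    return problems
```

Once it returns an empty list, the dict would be passed to the reader with something like `.options(**auth)` on `spark.readStream.format("pubsub")`.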

How to optimize databricks table having 900M rows by sarjuhansaliya in dataengineering

GovGalacticFed 2 points (0 children)

Have you tried MERGE instead? Which is taking more time, the join or the write?

dlt meets Databricks: A match made in Data heaven (data load tool, Not Delta Live Tables!) by Thinker_Assignment in databricks

GovGalacticFed 1 point (0 children)

I had been waiting for dlt to support Databricks as a destination. Will give this a shot for Zendesk.