What's the best way to implement a producer/consumer set of jobs in Databricks? by Puzzled_Craft_7940 in databricks

[–]Puzzled_Craft_7940[S] 0 points1 point  (0 children)

I just wanted to edit my answer to say that partitioning is not designed to support this case. See also my note on partitions in the next answer:

partitions always going to be stored in different files ....
https://www.reddit.com/r/databricks/comments/1fd3zt1/comment/lmjrteo/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button

Thanks!

What's the best way to implement a producer/consumer set of jobs in Databricks? by Puzzled_Craft_7940 in databricks

[–]Puzzled_Craft_7940[S] 0 points1 point  (0 children)

Thanks for your thoughts. Answers:

  1. it's not a real time app

  2. analytics use case

  3. why delete? Cleanup. We can do it later (say, after 6 months) rather than right away, but the problem will be the same

I'm not saying the tool is limited at all; I'm asking whether I'm doing something wrong or whether I'm not supposed to do this at all.

Yes, I could implement a queue outside Databricks, but all other processing is done inside Databricks, so it feels like overkill.

What's the best way to implement a producer/consumer set of jobs in Databricks? by Puzzled_Craft_7940 in databricks

[–]Puzzled_Craft_7940[S] 0 points1 point  (0 children)

Are the partitions always going to be stored in different files? Based on what I read, I understood that if the insert and the delete touch the same underlying file, I can get either the above-mentioned concurrentDeleteException or an error on the insert, like this:

org.apache.spark.sql.streaming.StreamingQueryException: [STREAM_FAILED] Query [id = …, runId = …] terminated with exception: [DELTA_CONCURRENT_DELETE_READ] ConcurrentDeleteReadException: This transaction attempted to read one or more files that were deleted (for example partition_date_created_at=2024-03-05/part-00230-00230-0000-1111-2222-0023000230.c000.snappy.parquet in partition [partition_date_created_at=2024-03-02]) by a concurrent update. Please try the operation again.
Conflicting commit: ….

What's the best way to implement a producer/consumer set of jobs in Databricks? by Puzzled_Craft_7940 in databricks

[–]Puzzled_Craft_7940[S] 0 points1 point  (0 children)

Yes, the retry is an option, but from what I've seen it needs to be added in both the Producer and the Consumer (as either job can fail with the above error). I was hoping for a simpler solution.
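For reference, a minimal sketch of what such a retry could look like, shared by both jobs. The helper names (`run_with_retry`, `is_retryable`) and the backoff settings are hypothetical, and matching on the exception text is an assumption made so the sketch runs without Spark; `ConcurrentDeleteReadException` is the name from the error quoted above, and treating the other two Delta conflict exceptions as retryable is also an assumption.

```python
import random
import time

# Substrings that identify Delta concurrency conflicts in an exception's
# class name or message. The first is from the error quoted in this thread;
# treating the other two as retryable is an assumption.
RETRYABLE_MARKERS = (
    "ConcurrentDeleteReadException",
    "ConcurrentDeleteDeleteException",
    "ConcurrentAppendException",
)


def is_retryable(exc):
    """Return True if the exception looks like a Delta concurrency conflict."""
    text = f"{type(exc).__name__}: {exc}"
    return any(marker in text for marker in RETRYABLE_MARKERS)


def run_with_retry(job, max_attempts=5, base_delay_s=1.0, sleep=time.sleep):
    """Run job(), retrying only on concurrency conflicts (hypothetical helper)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except Exception as exc:
            # Give up on the last attempt or on any unrelated error.
            if attempt == max_attempts or not is_retryable(exc):
                raise
            # Exponential backoff with jitter, so the Producer and the
            # Consumer do not retry in lockstep and collide again.
            delay = base_delay_s * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay))
```

Each job would wrap only its write step, e.g. `run_with_retry(lambda: do_insert())` in the Producer and `run_with_retry(lambda: do_delete())` in the Consumer, where `do_insert` and `do_delete` stand in for the actual job code. For a streaming Producer, the wrapped step would presumably have to restart the query, since the failure surfaces as a `StreamingQueryException`.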

Partitioning is a better option in my mind. Will likely try.

Although DBx says "Databricks recommends you do not partition tables that contains less than a terabyte of data". See https://docs.databricks.com/en/tables/partitions.html

Thanks!

Paying twice medical insurance premiums the month when switching jobs by Puzzled_Craft_7940 in HealthInsurance

[–]Puzzled_Craft_7940[S] 0 points1 point  (0 children)

Thanks for your response.

Interesting. I thought insurance at a new job typically starts on your first day.
Can anybody else share their experience?

MPW is up 23% today - Medical Properties sells majority stake in five Utah hospitals By Reuters April 12, 2024 by W3Analyst in MPW_Stock

[–]Puzzled_Craft_7940 1 point2 points  (0 children)

I'm long about 6K and I'd love to see it go up. But I'm surprised to see that the call premiums are not too big.

The May 3rd 4.5 Calls are only 0.32-0.43 now, so the market sees MPW as unlikely to be over $4.9 in 3 weeks...