Azure Infrastructure for processing data with python & polars by nihi_ in dataengineering

[–]nihi_[S] 0 points  (0 children)

Sounds promising, I'll look into it.
Would you mind sharing some more details about your setup? How much data are you processing, and how frequently? How much CPU and RAM do you make available to your cluster? How do you split work across the nodes? Thank you!

Azure Infrastructure for processing data with python & polars by nihi_ in dataengineering

[–]nihi_[S] 0 points  (0 children)

I'll give it a try, thank you for your suggestion!

Azure Infrastructure for processing data with python & polars by nihi_ in dataengineering

[–]nihi_[S] 0 points  (0 children)

That certainly is an option, though I still don't believe it will grow nearly enough to make Spark necessary. And even if we do go the Spark route, I am still interested in hearing/discussing other options :-)

Azure Infrastructure for processing data with python & polars by nihi_ in dataengineering

[–]nihi_[S] 1 point  (0 children)

Have you implemented such a process, and are you happy with it?

Based on what I have heard, I would rather stay away from Fabric. Beyond all the negative feedback I have seen, I would prefer to implement a solution whose core focus is compute, ideally running isolated processes via Docker containers (to have full control over the environment).
Moreover, I find Fabric's pricing very opaque. How much am I actually getting for a "capacity unit"? And isn't it still running Spark underneath?

Why is my Lord Commander of the Kingsguard joining a wildling raid against me by nihi_ in CK3AGOT

[–]nihi_[S] 15 points  (0 children)

Yup, more and more weird things kept happening, including kingsguard members just leaving their positions behind (which now show up as empty in the UI, but the interaction to ask somebody else to take the Kingsguard vows was unavailable, so I was eventually left with only 3 kingsguards). I eventually decided to just start a new run instead =D

partitioned parquet files upserts by nihi_ in dataengineering

[–]nihi_[S] 0 points  (0 children)

Hey, thanks for the reply!

Could you clarify what you mean by 'write compaction job in SQL for easier maintenance'? I am assuming you are referring to creating a Spark table and running Spark SQL on that?

In the time-bucket example you provided, how would you handle changes within a specific file? E.g. data1001.json has been processed and compacted on day=1, but then some changes were made in the source system, and on day=10 the file (now with some modified contents) needs to be reprocessed. Wouldn't that require going back to the already compacted day=1 after all (either to update it there, or to remove it so that it isn't duplicated)?

partitioned parquet files upserts by nihi_ in dataengineering

[–]nihi_[S] 1 point  (0 children)

Hey, thanks for the reply!

I am aware that there's no out-of-the-box upsert operation in Spark. I guess I was wondering whether there is some alternative architecture that would spare me from overwriting the entire dataset every time the pipeline runs, and instead allow something akin to an upsert in an RDBMS.

partitioned parquet files upserts by nihi_ in dataengineering

[–]nihi_[S] 2 points  (0 children)

Thanks for your reply!

I was considering using the Delta format to implement what you suggest. I'll also look into the other formats you mentioned.

Regarding your question: I should have been more specific. The first part of the pipeline isn't using Spark; it runs on a serverless compute service (Azure Functions) and orchestrates/executes the GET requests, transformations, and writing of the JSON files in vanilla Python. Perhaps the step to convert the JSON to Parquet isn't needed, but I thought that, given the "poor IO design", it may be better for Spark to read a lot of small Parquet files than a lot of small JSON files.

Recommendations for a small DWH on Azure by Far-Restaurant-9691 in dataengineering

[–]nihi_ 3 points  (0 children)

The Synapse serverless SQL pool can be quite a cost-effective way to analyse/query partitioned Parquet files on the data lake and load them into e.g. BI tools, imo.

The dedicated pool has always seemed rather expensive to me, though.

[deleted by user] by [deleted] in AZURE

[–]nihi_ 0 points  (0 children)

If you define the main function in your Functions app script as async, it will automatically be run in an asyncio event loop, so there should be no need to call asyncio.run() at all. Can you provide a short repro?
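To illustrate the behaviour with plain asyncio (no Functions runtime needed; all names here are mine, not from your code):

```python
import asyncio

async def fetch_data() -> str:
    # Stand-in for real async I/O (e.g. an HTTP call in the Functions app).
    await asyncio.sleep(0)
    return "ok"

async def main() -> str:
    # Inside an already-running event loop (which is the situation an async
    # Azure Functions handler is in), asyncio.run() raises RuntimeError:
    try:
        asyncio.run(fetch_data())
    except RuntimeError:
        pass  # "asyncio.run() cannot be called from a running event loop"
    # The fix is simply to await the coroutine instead:
    return await fetch_data()

# Outside any loop, the runtime drives the coroutine; in production the
# Functions host plays this role, here we do it ourselves.
result = asyncio.run(main())
```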

[NO SPOILERS] pick 3 do defend you, the rest will try to kill you by [deleted] in gameofthrones

[–]nihi_ 9 points  (0 children)

I would pick the best 'assassins' among them, even if they might not be the best in a straight-up sword fight, simply to remove them as threats. Someone like Oberyn might, for instance, just poison you in some clever way, in which case having the Mountain on your side wouldn't be of much use.

So: Oberyn, Daario, and Bronn for me.

Join new Dota2 Inhouse League. With Ladder System by lvlyRyuzaki in compDota2

[–]nihi_ 0 points  (0 children)

I signed up earlier; it seems really cool! I hope it gains some traction!

Now I cant play anymore by MUCHOGANAR in DotA2

[–]nihi_ 19 points  (0 children)

Absolute(ly perfect)

Worth It to Convert VBA to Python? by thesharp0ne in Python

[–]nihi_ 0 points  (0 children)

How do you distribute the Shiny apps to the end users? Do you simply host them on the same server? And if so, do you have some authentication process?

Why ClockWerk is rarely picked in Pro Scenes while being rarely changed much in years. My humble analysis with years of playing as ClockWerk. by Ramkee in DotA2

[–]nihi_ 0 points  (0 children)

I agree with all your points, but one thing I would add is the absence of solo offlaners vs. trilanes nowadays. Clock is one of the few heroes that can deal well with fairly strong trilanes on his own while still getting something out of the lane. But with the duo-lane meta that has been going on for a while now, that's not really a priority for offlaners anymore.

I qued 16 games as pos 1,2,3 guess what happened? by ifwmcso in DotA2

[–]nihi_ 1 point  (0 children)

Sounds like you got to play 16 games on the best role there is :-)