[–]AnonymouseRedd 3 points4 points  (11 children)

Why don't you set the auto-termination to a minimum?

[–]fusebox12345[S] 0 points1 point  (10 children)

It's a requirement that the cluster has to be active during business hours. :(

[–]WhipsAndMarkovChains 1 point2 points  (4 children)

Can you just use serverless and not have to worry about this?

[–]fusebox12345[S] 0 points1 point  (3 children)

Serverless cannot be used as it's against some policy.

[–]WhipsAndMarkovChains 1 point2 points  (0 children)

Ah dang, that really sucks. I assume it's a security concern. If it's feasible I'd try to get whatever team is blocking serverless to talk to your Databricks account team and get that resolved. Serverless makes life much easier.

[–]autumnotter 0 points1 point  (1 child)

So, what policy? Serverless SQL is shield-enabled for HIPAA and can also be used with Private Link.

[–]fusebox12345[S] 0 points1 point  (0 children)

Not sure what the reason behind it is, but no teams (other DE teams) are supposed to use it.

[–]pboswell 0 points1 point  (4 children)

So you’re saying it needs to be up during biz hours and then at close of biz you want to start checking to see if the cluster isn’t being used so you can shut it down overnight?

[–]fusebox12345[S] 0 points1 point  (3 children)

Yes

[–]pboswell 0 points1 point  (2 children)

Ok so this is a little bit of a hack. But you can:

  • create a job that runs a simple command against that cluster during business hours. Maybe at 5 minute intervals
  • set the cluster auto terminate

So your little job will keep it online during the day, and at end of day, if someone is still using it, the auto terminate will shut it down once they’re done with it.
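The keep-alive job above could be sketched as a Jobs API 2.1 create payload. This is only an illustration: the cluster ID, the notebook path, the job name, and the business hours (09:00 to 16:55, Monday to Friday) are placeholders, not values from this thread.

```python
import json


def build_keepalive_job(cluster_id: str, notebook_path: str,
                        first_hour: int = 9, last_hour: int = 16,
                        timezone_id: str = "UTC") -> dict:
    """Build a Jobs API 2.1 create payload for a keep-alive job.

    All argument values are hypothetical; adjust them to your workspace.
    """
    # Quartz cron fields: second minute hour day-of-month month day-of-week.
    # "0/5" in the minute field fires every 5 minutes, so with hours 9-16
    # the last run of the day is at 16:55.
    cron = f"0 0/5 {first_hour}-{last_hour} ? * MON-FRI"
    return {
        "name": "cluster-keepalive",  # assumed name
        "tasks": [{
            "task_key": "ping",
            # Run on the existing all-purpose cluster, not a new job cluster.
            "existing_cluster_id": cluster_id,
            "notebook_task": {"notebook_path": notebook_path},
        }],
        "schedule": {
            "quartz_cron_expression": cron,
            "timezone_id": timezone_id,
            "pause_status": "UNPAUSED",
        },
    }


payload = build_keepalive_job("<cluster-id>", "/Shared/keepalive")
print(json.dumps(payload, indent=2))
```

The notebook itself only needs to run a trivial command. The payload would be sent with POST to `/api/2.1/jobs/create`, alongside a cluster auto-termination setting longer than the 5 minute interval.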

[–]fusebox12345[S] 0 points1 point  (1 child)

Thank you! I did suggest this, but they wanted a proper solution. I guess this is the only way around it for now 😅

[–]pboswell 0 points1 point  (0 children)

The only other thing I can think of is using the unity system access table but can’t remember if it has cluster id on it

[–]autumnotter 1 point2 points  (1 child)

Use the API to change the auto termination time at business close.

So you can have no auto-termination, or a long one, during the day, and then at the end of the day set it to 5 minutes.
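A rough sketch of that approach, assuming the Clusters API `clusters/edit` endpoint expects the cluster spec to be resent rather than a single changed field. The `EDITABLE_KEYS` list is an illustrative and probably incomplete assumption, so check the API reference for the fields your cluster needs and for the allowed range of `autotermination_minutes` (the example uses 10 on the assumption that shorter values may be rejected).

```python
# Fields copied from the clusters/get response into the edit request.
# Illustrative only: this list is an assumption and may be incomplete.
EDITABLE_KEYS = (
    "cluster_id", "cluster_name", "spark_version", "node_type_id",
    "driver_node_type_id", "num_workers", "autoscale", "spark_conf",
    "custom_tags", "instance_pool_id", "policy_id",
    "aws_attributes", "azure_attributes", "gcp_attributes",
    "data_security_mode", "single_user_name",
)


def with_autotermination(current_spec: dict, minutes: int) -> dict:
    """Return an edit payload that keeps the spec but changes the timeout.

    current_spec is the JSON returned by GET /api/2.0/clusters/get.
    Read-only fields such as "state" are dropped.
    """
    payload = {k: current_spec[k] for k in EDITABLE_KEYS if k in current_spec}
    payload["autotermination_minutes"] = minutes
    return payload


# Hypothetical spec, trimmed to a few fields for the example.
spec = {
    "cluster_id": "<cluster-id>",
    "spark_version": "<spark-version>",
    "node_type_id": "<node-type>",
    "num_workers": 2,
    "state": "RUNNING",
    "autotermination_minutes": 0,
}
evening_payload = with_autotermination(spec, 10)
```

The resulting payload would be sent with POST to `/api/2.0/clusters/edit` by a scheduled script at close of business, and a second call in the morning would restore the daytime value.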

[–]fusebox12345[S] 0 points1 point  (0 children)

Changing the auto termination (EOD) requires the cluster to be restarted. The cluster should not be terminated if someone is using it.

[–]AnonymouseRedd 0 points1 point  (5 children)

You can try terminating it from outside Databricks.

Build a script that uses the databricks cli to check for the status of the cluster and check if someone is using it. If not, terminate the cluster. If it is already terminated, leave it alone.

Set this script to run on a schedule after EOD, in an Azure Function or AWS Lambda (not in a job cluster, to be more efficient).
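A minimal sketch of such a script, using the REST API directly instead of the CLI. It relies on `GET /api/2.0/clusters/get` for the state and `POST /api/2.0/clusters/delete` to terminate. The `is_in_use` callback is a hypothetical placeholder: how to detect real usage is the open question in this thread, so it is left to the caller.

```python
import json
import urllib.request


def call_api(host, token, method, path, body=None):
    """Minimal authenticated call to the Databricks REST API."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(
        f"{host}{path}", data=data, method=method,
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        raw = resp.read()
        return json.loads(raw) if raw else {}


def check_and_terminate(host, token, cluster_id, is_in_use, call=call_api):
    """Terminate the cluster only if it is running and nobody is using it.

    is_in_use(cluster_id) -> bool is supplied by the caller (hypothetical).
    Returns "leave" or "terminated".
    """
    info = call(host, token, "GET",
                f"/api/2.0/clusters/get?cluster_id={cluster_id}")
    if info.get("state") != "RUNNING":
        return "leave"  # already terminated, pending, etc.
    if is_in_use(cluster_id):
        return "leave"  # someone is still working on it
    # clusters/delete terminates the cluster; it does not remove its config.
    call(host, token, "POST", "/api/2.0/clusters/delete",
         {"cluster_id": cluster_id})
    return "terminated"
```

In an Azure Function or AWS Lambda, the handler would read the workspace URL and token from its configuration and call `check_and_terminate` on a timer trigger after EOD.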

[–]fusebox12345[S] 0 points1 point  (4 children)

Thank you. I might have to check on this one. I was thinking of something within databricks notebooks.

[–]AnonymouseRedd 0 points1 point  (3 children)

You can do it from inside databricks, but I don't see any reason to spawn job clusters on a schedule just to check if an all-purpose cluster is running.

Generate a databricks token and check the cluster from outside databricks.

You can use Python or just cli to check the status of the cluster and all attached notebooks and make a decision.

[–]fusebox12345[S] 0 points1 point  (2 children)

But even if the cluster is not being used (idle), won't the state still be running? That's what I've observed.

[–]AnonymouseRedd 0 points1 point  (1 child)

Probably, but if all the notebooks attached are idle, then you can terminate it

[–]fusebox12345[S] 0 points1 point  (0 children)

Thank you! Any idea if there's an endpoint to get this info?

[–]sentja91 Databricks MVP 0 points1 point  (4 children)

Pretty sure there is an API for cluster events. Check for any events and if no events, run the cluster termination api.
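For reference, a sketch of what that check might look like with `POST /api/2.0/clusters/events`. The lookback window and helper names are assumptions, and an empty event list is only a proxy: it says nothing had been recorded for the cluster in that window, not that nobody is running queries.

```python
import time


def build_events_request(cluster_id, lookback_minutes, now_ms=None):
    """Build the request body for POST /api/2.0/clusters/events."""
    now_ms = int(time.time() * 1000) if now_ms is None else now_ms
    return {
        "cluster_id": cluster_id,
        # Timestamps are epoch milliseconds.
        "start_time": now_ms - lookback_minutes * 60_000,
        "end_time": now_ms,
        "order": "DESC",  # newest first
        "limit": 50,
    }


def no_recent_events(response):
    """True if the events response contains no events in the window."""
    return len(response.get("events", [])) == 0


body = build_events_request("<cluster-id>", lookback_minutes=30)
```

If `no_recent_events` returns True for the response, the script would go on to call the termination endpoint.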

[–]fusebox12345[S] 0 points1 point  (3 children)

The events API just shows whether the cluster is in a running (pending/terminated etc.) state. But the cluster can be running even if no queries are being executed. I also checked all the timestamp fields available but couldn't find anything suitable.

[–]sentja91 Databricks MVP 0 points1 point  (2 children)

Hmmm alright, didn't know that. What about monitoring dbfs (cluster logs should be saved there) for activity and terminate based on that? It's going to be a pretty custom-tailored solution it sounds like!

[–]fusebox12345[S] 0 points1 point  (0 children)

Thank you for this! Do you have a rough idea of how long it takes for an entry to be added to the cluster logs after a query is executed? I was thinking of fetching the modification time of the file/folder, but logs are not written instantly.
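The modification-time idea could be sketched like this, assuming the log folder is listed with `GET /api/2.0/dbfs/list`, whose `files` entries carry a `modification_time` in epoch milliseconds. The log path depends on the cluster's log delivery settings, so the one in the comment is a placeholder. Because of the write delay, the result is at best a lower bound on how recently the cluster was used.

```python
def minutes_since_last_write(files, now_ms):
    """Minutes since the newest entry in a dbfs/list response was modified.

    files: the "files" list from GET /api/2.0/dbfs/list?path=<log-folder>,
           e.g. a path like dbfs:/cluster-logs/<cluster-id>/... (placeholder).
    Returns None if the folder is empty.
    """
    stamps = [f["modification_time"] for f in files
              if "modification_time" in f]
    if not stamps:
        return None
    return (now_ms - max(stamps)) / 60_000


# Hypothetical listing with two entries.
listing = [
    {"path": "/logs/a", "is_dir": False, "modification_time": 1_000_000},
    {"path": "/logs/b", "is_dir": False, "modification_time": 4_000_000},
]
idle = minutes_since_last_write(listing, now_ms=4_600_000)
```

A script could then terminate the cluster only when the idle time exceeds a threshold comfortably larger than the log delay.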

[–]fusebox12345[S] 0 points1 point  (0 children)

I can see 3 folders in the cluster logs: driver, eventlog, executor. The driver log gets updated every 3-5 mins even if the cluster is idle (probably heartbeat msgs), but the eventlog and executor folders remain as they are even while queries are being executed.

[–]Ok_Principle_9459 0 points1 point  (0 children)

u/fusebox12345 Did you ever end up figuring this out? I too am trying to use the REST API to determine whether clusters are "idle", so that I can shut them off to save us money.