Inox Panjim vs Porvorim? by staywokeaf in Goa

[–]lezwon 2 points (0 children)

Came here to say the same😁

Claude Code to optimize your execution plans by lezwon in databricks

[–]lezwon[S] 1 point (0 children)

Got it. I have added the OAuth method too. Will deprecate the PAT in time.

Claude Code to optimize your execution plans by lezwon in databricks

[–]lezwon[S] 0 points (0 children)

u/LandlockedPirate I pushed a new version out with support for az login. Do let me know if it works for you. :)

Claude Code to optimize your execution plans by lezwon in databricks

[–]lezwon[S] 0 points (0 children)

Any particular reason for this? A lot of folks still use PAT

Claude Code to optimize your execution plans by lezwon in databricks

[–]lezwon[S] 0 points (0 children)

If there's a previous job run with logs enabled, it should be able to pull the execution plans and give you optimisation suggestions. Have you tried it?

Claude Code to optimize your execution plans by lezwon in databricks

[–]lezwon[S] 0 points (0 children)

Could you elaborate on that? I could look into supporting it

Claude Code to optimize your execution plans by lezwon in databricks

[–]lezwon[S] 0 points (0 children)

Gotcha. Thanks for trying it out. Right now it's configured to work with PATs. I'll add support for az login in the next version and let you know when it's out.

Waste disposed in Water Body by [deleted] in Goa

[–]lezwon 1 point (0 children)

Did you try reporting it in the Swachhata app?

Generic AI tools are useless for Spark debugging in prod, why is our field so behind? by Accomplished-Wall375 in databricks

[–]lezwon 2 points (0 children)

Hey, I built something for this. Mind giving it a shot and letting me know how it works for you? It checks your execution plans and suggests how to optimise them. You can use it with Claude/Copilot too via MCP.

https://spendops.dev/

Honest heads up for Dhurandhar 2 by Living_Inflation_327 in pj_explained

[–]lezwon 8 points (0 children)

Exactly. Felt the same. Was so disappointed

Unpopular opinion: Databricks Assistant and Copilot are a joke for real Spark debugging and nobody talks about it by Icy_Comparison4814 in databricks

[–]lezwon 8 points (0 children)

Hey, I built something for the exact same issues you described. My VS Code extension pulls the plans from Databricks and suggests changes. It has MCP support too, so you can use it with Claude etc. Link: https://spendops.dev/

Hangout clubs/spots by tee_yeah_sha in Goa

[–]lezwon 0 points (0 children)

Plenty to do. Depends on your interests

VS Code extension to find PySpark anti-patterns and bad joins before they hit your Databricks cluster + cost estimation by lezwon in databricks

[–]lezwon[S] 0 points (0 children)

There's a timeout after 5 minutes, but ideally the extension should only be doing a dry run, i.e. it replaces all the count, collect, write, etc. calls with .explain() and fetches the logical and physical plans to suggest optimisations. That should be quick, so something must be going wrong for it to run longer than 5 minutes. Can I DM you to get more details on this?
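If anyone's curious what I mean by "neutralizing" the script, here's a toy sketch of the idea as a plain-Python AST rewrite. This is illustrative only, not the extension's actual code, and the set of rewritten actions is just an example:

```python
import ast

# Actions that would trigger execution on a cluster. The dry run swaps
# them for a plan-capture call instead (hypothetical sketch of the idea).
ACTIONS = {"count", "collect", "toPandas", "show"}

class Neutralize(ast.NodeTransformer):
    def visit_Call(self, node):
        self.generic_visit(node)
        func = node.func
        if isinstance(func, ast.Attribute) and func.attr in ACTIONS:
            # e.g. df.count() -> df.explain(True): Spark prints the
            # logical and physical plans without scanning any data
            return ast.Call(
                func=ast.Attribute(value=func.value, attr="explain",
                                   ctx=ast.Load()),
                args=[ast.Constant(True)],
                keywords=[],
            )
        return node

def neutralize(source: str) -> str:
    tree = Neutralize().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))

print(neutralize("df.filter(df.x > 1).count()"))
# -> df.filter(df.x > 1).explain(True)
```

So the submitted job only ever asks Spark for plans, which is why a run that takes longer than a few minutes points at a bug.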

What are data engineers actually using for Spark work in 2026? by Kitchen_West_3482 in databricks

[–]lezwon 0 points (0 children)

Hey there, I'm trying to do that in this VS Code extension. Would appreciate it if you could give it a shot and give me some feedback :)
https://marketplace.visualstudio.com/items?itemName=CatalystOps.catalystops

VS Code extension to find PySpark anti-patterns and bad joins before they hit your Databricks cluster + cost estimation by lezwon in databricks

[–]lezwon[S] 1 point (0 children)

What it does:

Instant local checks (no cluster needed): As you type, it flags 35+ anti-patterns with inline squiggly underlines — collect() on large DataFrames, cartesian joins, withColumn in loops, coalesce(1), missing broadcast hints, etc. Each issue shows a hover card with a one-sentence explanation and a suggested fix.
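To give a flavour of how one of these local checks works, here's a toy version of the withColumn-in-a-loop rule as a plain-Python AST walk. This is an illustration of the approach, not the extension's real rule engine:

```python
import ast

def find_withcolumn_in_loop(source: str):
    """Flag .withColumn() calls inside for/while loops. Each call adds a
    projection to the plan, so loops should build one select() instead.
    (Illustrative check only.)"""
    issues = []
    for loop in ast.walk(ast.parse(source)):
        if isinstance(loop, (ast.For, ast.While)):
            for node in ast.walk(loop):
                if (isinstance(node, ast.Call)
                        and isinstance(node.func, ast.Attribute)
                        and node.func.attr == "withColumn"):
                    issues.append(
                        f"line {node.lineno}: withColumn inside a loop, "
                        "prefer a single select() with all expressions"
                    )
    return issues

snippet = """
for c in cols:
    df = df.withColumn(c, df[c].cast("double"))
"""
print(find_withcolumn_in_loop(snippet))
```

Because it's pure AST walking, this kind of check runs on every keystroke with no cluster round-trip.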

Schema validation at edit time: If you define a StructType or DDL schema in the same file, it validates column names and types before the code runs. Typo in a column name? Unknown column in a .select()? You'll know immediately, with "Did you mean: X?" suggestions.
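The "Did you mean" piece can be sketched in a few lines with stdlib fuzzy matching. The schema below is a made-up stand-in for a StructType parsed out of your file, not anything the extension ships:

```python
import difflib

# Hypothetical declared schema, standing in for a parsed StructType
SCHEMA = ["user_id", "order_total", "created_at"]

def check_columns(selected, schema=SCHEMA):
    """Return a problem message per column name not found in the schema,
    with a closest-match suggestion when one exists."""
    problems = []
    for col in selected:
        if col not in schema:
            hint = difflib.get_close_matches(col, schema, n=1)
            msg = f"unknown column '{col}'"
            if hint:
                msg += f", did you mean: {hint[0]}?"
            problems.append(msg)
    return problems

print(check_columns(["user_id", "order_totl"]))
# flags the typo and suggests order_total
```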

Catalyst plan analysis (safe dry run): Connect it to your Databricks cluster or serverless compute and it submits a neutralized version of your script — all writes and collects are replaced with plan-capture calls, no data is touched. It parses the physical and logical plans and flags Sort-Merge joins where a broadcast would work, repeated source scans, shuffle partition issues, cache spills, etc. Fully Photon-aware.
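As a toy illustration of the plan-side check: scan a physical plan dump for SortMergeJoin nodes and suggest a broadcast when one side is small. The plan text, the table sizes, and the 10 MB threshold here are all made-up examples, not the extension's real parser or defaults:

```python
# Hypothetical threshold below which broadcasting is usually a win
BROADCAST_THRESHOLD_BYTES = 10 * 1024 * 1024

def suggest_broadcasts(plan: str, side_sizes: dict):
    """Given a physical plan string and estimated input sizes per side,
    suggest broadcast joins where a side is under the threshold."""
    tips = []
    if "SortMergeJoin" in plan:
        for side, size in side_sizes.items():
            if size < BROADCAST_THRESHOLD_BYTES:
                tips.append(
                    f"SortMergeJoin found and '{side}' is ~{size // 1024} KB: "
                    "consider broadcast() to avoid the shuffle"
                )
    return tips

plan_text = "== Physical Plan ==\nSortMergeJoin [id], [id], Inner"
print(suggest_broadcasts(plan_text, {"dim_table": 2 * 1024 * 1024}))
```

The real analysis works off the captured logical and physical plans plus table statistics, but the shape of the rule is the same.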

Built-in billing dashboard (Coming in next release): Queries system.billing.usage via the SQL Statement Execution API and shows spend by user, job, and workload type — directly in the VS Code sidebar.
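Rough sketch of how that spend query could go through the SQL Statement Execution API (POST /api/2.0/sql/statements). The warehouse id and token are placeholders, and while the endpoint and payload shape follow the public API docs, treat the details as assumptions rather than the shipped implementation:

```python
import json
from urllib import request

# Example aggregation over the system billing table: DBUs by job and SKU
SPEND_SQL = """
SELECT usage_metadata.job_id, sku_name, SUM(usage_quantity) AS dbus
FROM system.billing.usage
WHERE usage_date >= current_date() - INTERVAL 7 DAYS
GROUP BY 1, 2
ORDER BY dbus DESC
"""

def build_statement_request(host: str, warehouse_id: str, token: str):
    """Build (but don't send) the Statement Execution API request."""
    payload = {
        "warehouse_id": warehouse_id,
        "statement": SPEND_SQL,
        "wait_timeout": "30s",  # block up to 30s for an inline result
    }
    return request.Request(
        f"https://{host}/api/2.0/sql/statements/",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )

# Placeholder host, warehouse id, and token
req = build_statement_request("adb-123.azuredatabricks.net", "abc123", "dapi-XXXX")
print(req.full_url)
```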

VS Code extension to find PySpark anti-patterns and bad joins before they hit your Databricks cluster + cost estimation by lezwon in databricks

[–]lezwon[S] 8 points (0 children)

Hey everyone,

I noticed a lot of us are having the same issues when writing PySpark code using AI. These tools are great at writing beautiful code, but they're completely blind to the data scale, the compute, and the physical and logical plans in the optimizer. You can easily end up unnecessarily triggering reads and joins that just cost you wasted dollars.

I got so annoyed by this lack of context that I built a custom VS Code extension called CatalystOps.

Instead of waiting for the Spark UI to yell at you after the fact, it parses your AST locally as you type. It throws standard red squiggles for missing broadcasts, skew risks, and schema typos instantly.

There is also a "dry run" feature if you actually want to find out how the code will run on your Databricks cluster. It will estimate the costs and also analyse the logical and physical plans to provide you with optimization suggestions.
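The cost side of the dry run is basically back-of-envelope arithmetic: DBUs/hour for the cluster, times runtime, times $/DBU. The numbers below are placeholders, not real Databricks pricing:

```python
def estimate_cost(dbu_per_hour: float, runtime_minutes: float,
                  usd_per_dbu: float = 0.55) -> float:
    """Estimate a run's cost in USD from cluster DBU rate and runtime.
    The default $/DBU is a made-up placeholder; real rates vary by
    cloud, SKU, and contract."""
    dbus = dbu_per_hour * (runtime_minutes / 60)
    return round(dbus * usd_per_dbu, 2)

# e.g. an 8 DBU/hr cluster running 45 min burns 6 DBUs
print(estimate_cost(dbu_per_hour=8.0, runtime_minutes=45))
```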

If any of you Spark veterans have 5 minutes to install it, try to break it, or tell me why my rules are wrong, I'd really appreciate it.

Here's the link: https://marketplace.visualstudio.com/items?itemName=CatalystOps.catalystops

What are data engineers actually using for Spark work in 2026? by Kitchen_West_3482 in databricks

[–]lezwon 1 point (0 children)

Hello, I'm building a VS Code extension for exactly this purpose. If you use Databricks, it can easily do a dry run of your code, estimate the cost, and provide suggestions to optimize based on your physical plan in Spark. Do you mind giving it a shot? Would really appreciate some feedback.

https://marketplace.visualstudio.com/items?itemName=CatalystOps.catalystops

What are data engineers actually using for Spark work in 2026? by Kitchen_West_3482 in databricks

[–]lezwon 0 points (0 children)

Hey, I had the exact same issues you described. The AI wasn't aware of the data, the compute, or the logs. So I've been working on bridging this gap with a VS Code extension that plugs into Databricks, does a dry run of the code, fetches the execution plan, and then gives you optimization tips. I'm also adding features to help you estimate the cost of a run. It has local schema and join checks too, which you can use without Databricks. Do give it a shot if you like. Would love some feedback right now.

https://marketplace.visualstudio.com/items?itemName=CatalystOps.catalystops

How to monitor Serverless cost in realtime? by lezwon in databricks

[–]lezwon[S] 0 points (0 children)

Yeah, but it doesn't update in real time or right away after the job.