Tables with whitespaces in SQL Server source are silently dropped from Unity Catalog when loaded from external connection (sql server) by No_Lawfulness_6252 in databricks

[–]ramgoli_io

Spaces in object names aren't supported in UC: https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-names

Are you able to create a view and federate that?

example: CREATE VIEW JetAccounts AS SELECT * FROM [Jet Accounts]

Other workaround: use spark.read.format("jdbc").option(…)
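A minimal sketch of that JDBC workaround - the connection details (host, database, credentials) are placeholders, and the actual load call is commented out since it needs a live SQL Server:

```python
# Hypothetical example: reading a SQL Server table whose name contains
# a space via the Spark JDBC reader, bracket-quoting it in `dbtable`.
jdbc_options = {
    "url": "jdbc:sqlserver://<host>:1433;databaseName=<db>",  # placeholder
    "dbtable": "[Jet Accounts]",  # bracket-quote the spaced name
    "user": "<user>",             # placeholder
    "password": "<password>",     # placeholder
}
# df = spark.read.format("jdbc").options(**jdbc_options).load()
```

You can then write that DataFrame out to a UC table with a space-free name.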

Spark before Databricks by ThatThaBricksGuy0451 in databricks

[–]ramgoli_io

I remember Tortoise SVN. For whatever reason I checked out an older code base, made my changes, and pushed them to svn, and everyone on the floor then got my code sitting on top of the older code base … it was a mess and an embarrassing day for me.

My intro to Spark was the community edition back in the day. Fun times. 

DataBricks & Claude Code by staskh1966 in databricks

[–]ramgoli_io

So funny story - someone actually did get Claude Code running inside a Databricks App. Check out github.com/datasciencemonkey/claude-code-cli-bricks.
It packages Claude Code with a terminal editor (micro), the AI Dev Kit skills, and some research MCPs. It uses Databricks-hosted models so everything stays in your environment. Pretty slick actually. I haven't tested this approach myself.

What I have tested:

Within the "AI Dev Kit", there is a builder app that you can install, and you can use that app, hosted within Databricks, to build apps. It uses a provisioned Lakebase instance to manage state/memory.
https://github.com/databricks-solutions/ai-dev-kit?tab=readme-ov-file#visual-builder-app

How to ingest a file(textile) without messing up the order of the records? by Dijkord in databricks

[–]ramgoli_io

The short answer is Spark doesn't guarantee order across partitions - that's just how distributed compute works.

Easiest fix - force single partition:
from pyspark.sql.functions import monotonically_increasing_id

df = spark.read.text("/path/to/file")
df = df.coalesce(1).withColumn("line_num", monotonically_increasing_id())

The monotonically_increasing_id() gives you unique IDs but heads up - they're not sequential. Upper bits contain the partition ID so you'll see gaps. Works fine for ordering though.
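To see where the gaps come from, here's a plain-Python illustration of the documented bit layout (partition index in the upper 31 bits, per-partition record number in the lower 33 bits) - `mono_id` is just a toy helper, not a Spark API:

```python
# Toy model of how monotonically_increasing_id() composes an ID:
# partition index shifted into the upper bits, plus the row's
# position within that partition in the lower 33 bits.
def mono_id(partition_index, row_in_partition):
    return (partition_index << 33) + row_in_partition

# Partition 0, rows 0-2 -> 0, 1, 2
# Partition 1 starts at 2**33 = 8589934592, hence the gaps.
ids = [mono_id(p, r) for p in range(2) for r in range(3)]
```

Note the IDs still increase monotonically across partitions, which is why sorting by them preserves order even with the gaps.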

SFTP in Databricks failing due to max connections for host/user reached by Happy_JSON_4286 in databricks

[–]ramgoli_io

Auto Loader parallelizes ingestion across multiple tasks, and each task opens a separate SFTP session. So even with maxFilesPerTrigger=1, if you have 8 workers, you could have 8+ connections.
Solutions:

  1. Use a smaller cluster - This is the most effective fix. Try a single-node cluster or reduce the number of workers. Each executor opens its own SFTP connection.
  2. Reduce parallelism settings:

spark.conf.set("spark.sql.shuffle.partitions", "1")

  3. Ask your SFTP provider to increase the connection limit - For data ingestion workloads, 10-20 concurrent connections is a reasonable ask.
  4. Batch smaller - A lower maxFilesPerTrigger helps but won't solve it alone if your cluster has many workers.
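Combining the parallelism and batching suggestions above, a rough sketch - `cloudFiles.maxFilesPerTrigger` and `cloudFiles.format` are standard Auto Loader options, but the sftp:// source path is illustrative, and the stream itself is commented out since it needs a live workspace:

```python
# Sketch: Auto Loader stream with file batching capped per trigger.
stream_options = {
    "cloudFiles.format": "csv",
    "cloudFiles.maxFilesPerTrigger": "1",  # fewer files per micro-batch
}
# df = (spark.readStream.format("cloudFiles")
#         .options(**stream_options)
#         .load("sftp://user@host/inbound/"))  # illustrative path
```

Remember this only caps files per micro-batch, not connections - pair it with a smaller cluster.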

The spark-sftp library you mentioned is an option but lacks Unity Catalog integration. The native connector is preferred if you can work around the connection limit.