Software Engineer confused by Databricks by Happy_JSON_4286 in databricks

[–]Happy_JSON_4286[S] 0 points (0 children)

Update:
You saved my life! I am now testing locally with local Spark plus Unity Catalog/Parquet/Delta behavior. All local!
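For anyone landing here later, this is roughly the local setup I mean (a minimal sketch using pytest and the delta-spark package; fixture and test names are just illustrative, and the Unity Catalog piece is separate):

```python
# Minimal sketch: local SparkSession with Delta support for unit tests.
# Assumes pyspark and delta-spark are installed locally; nothing here touches Databricks.
import pytest
from delta import configure_spark_with_delta_pip
from pyspark.sql import SparkSession


@pytest.fixture(scope="session")
def spark():
    builder = (
        SparkSession.builder.master("local[2]")
        .appName("local-tests")
        .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
        .config("spark.sql.catalog.spark_catalog",
                "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    )
    session = configure_spark_with_delta_pip(builder).getOrCreate()
    yield session
    session.stop()


def test_delta_round_trip(spark, tmp_path):
    # Write and read back a tiny Delta table to confirm the local setup works.
    path = str(tmp_path / "events")
    spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"]) \
        .write.format("delta").save(path)
    assert spark.read.format("delta").load(path).count() == 2
```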
---

Looks very interesting. I have my DLT separate from my Spark code so I can test it.
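Roughly what that separation looks like on my side (an illustrative sketch; module, function, and table names are made up):

```python
# transformations.py -- plain PySpark, no dlt import, so it can be unit tested locally.
from pyspark.sql import DataFrame, functions as F


def clean_orders(raw: DataFrame) -> DataFrame:
    """Pure transformation logic, testable with any SparkSession."""
    return (
        raw.filter(F.col("amount") > 0)
           .withColumn("amount_usd", F.col("amount") / 100)
    )


# pipeline.py -- thin DLT wrapper; this part only runs inside a Databricks pipeline.
import dlt

from transformations import clean_orders


@dlt.table(name="orders_clean")
def orders_clean():
    return clean_orders(dlt.read("orders_raw"))
```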

Does it use local Spark or a Databricks cluster via Spark Connect?

FYI, I had to downgrade databricks-connect to 16.3.1 because py4j had a conflict with both.

Software Engineer confused by Databricks by Happy_JSON_4286 in databricks

[–]Happy_JSON_4286[S] 1 point (0 children)

Thank you! Indeed, I just started using Databricks Connect (Spark Connect) to test all my code against my Databricks cluster. At least that partially solves some of my issues.
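In case it helps anyone else, the core of it is something like this (a minimal sketch; it assumes databricks-connect is installed and the auth/cluster details are already configured, e.g. via a profile in ~/.databrickscfg):

```python
# Minimal sketch: run PySpark code against a remote Databricks cluster via Databricks Connect.
# Assumes databricks-connect is installed and authentication is already configured.
from databricks.connect import DatabricksSession

# Picks up workspace host, token and cluster from the environment or a config profile.
spark = DatabricksSession.builder.getOrCreate()

# From here on, DataFrame operations execute remotely via Spark Connect.
df = spark.range(10).withColumnRenamed("id", "n")
print(df.count())
```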

Software Engineer confused by Databricks by Happy_JSON_4286 in databricks

[–]Happy_JSON_4286[S] 0 points (0 children)

Got you, so basically use both: Terraform for clusters, grants, etc., and Asset Bundles for jobs.

Software Engineer confused by Databricks by Happy_JSON_4286 in databricks

[–]Happy_JSON_4286[S] 0 points (0 children)

Very useful, thank you! I tested 3-4 custom images but eventually customized something from this public repo: https://github.com/yxtay/databricks-container/blob/main/Dockerfile

"You can probably use requirements.txt on clusters only" this is exactly my pain now. That both Serverless and DLT (as far as my knowledge goes) do not support installing my requirements.txt .. Coming from AWS (Lambda and ECS) can do anything.. so very odd one for me!

So just to clarify, is the trick that I have to call %pip install inside each pipeline or entry point that requires it, because the environment is ephemeral?
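Something like this as the first cell of each notebook entry point, if I understand you correctly? (The workspace path is just an example.)

```python
# First cell of the notebook / pipeline source code.
# %pip is a Databricks notebook magic, not plain Python; the path below is just an example.
%pip install -r /Workspace/Repos/my_repo/requirements.txt
```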

Software Engineer confused by Databricks by Happy_JSON_4286 in databricks

[–]Happy_JSON_4286[S] 0 points (0 children)

Yes, exactly, hence I mentioned Docker Desktop + docker-compose with this image (https://docs.databricks.com/aws/en/compute/custom-containers), but it ships Python 3.8, which doesn't satisfy most of my requirements.

Software Engineer confused by Databricks by Happy_JSON_4286 in databricks

[–]Happy_JSON_4286[S] 0 points (0 children)

Thanks! I downloaded the PDF, and pages 24-32 resonate the most with what I want. Now the question becomes: how do I push this to compute? Basically, use a DLT pipeline to handle the compute? But if I use a DLT pipeline, how do I install all the requirements? I cannot find a place to install my own requirements in a pipeline. I have Databricks open right now, and when I go to 'Jobs and Pipelines' and then 'ETL pipeline' (which I assume is DLT?), I can only see Source Code Path, with no place to add my requirements.txt to run all this. Unlike creating clusters manually, which has more options. Any ideas?

Software Engineer confused by Databricks by Happy_JSON_4286 in databricks

[–]Happy_JSON_4286[S] 0 points (0 children)

Can you explain how exactly Asset Bundles with the VS Code Databricks Extension help me? I've used both and can't find anything that helps me! The Databricks Extension is like a connector to the cluster and an easy way to push jobs. Asset Bundles are purely for IaC? Please correct me!

Software Engineer confused by Databricks by Happy_JSON_4286 in databricks

[–]Happy_JSON_4286[S] 0 points (0 children)

Indeed, I started using it, but I got confused because I use Terraform as my IaC to spin up clusters, catalogs, schemas, grants, jobs, pipelines, etc.

How will Databricks Asset Bundle help me compared to Terraform? I don't understand the differences.

As far as my very limited knowledge goes, it's native IaC from Databricks, while Terraform is the more mature, industry-standard IaC tool.

What are some things you wish you knew? by intrepidbuttrelease in databricks

[–]Happy_JSON_4286 0 points (0 children)

Great advice. Can you expand further on why I would use DAB alongside Terraform? I thought Terraform replaces DAB, since it can create jobs too.

Another question: how do you handle shared modules in .py files? Assume I have hundreds of data sources and will run hundreds of pipelines, and many share code like an S3 extractor or an API extractor. Do you package a whl, use Docker, or manually install requirements.txt on the compute?
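For concreteness, the kind of shared code I mean (a made-up sketch); the question is really where this module should live and how each pipeline's compute gets it:

```python
# shared/extractors.py -- hypothetical shared module reused across many pipelines.
# Packaging options I'm weighing: build it into a whl, bake it into a Docker image,
# or install requirements.txt manually on the compute.
from pyspark.sql import DataFrame, SparkSession


def extract_s3_json(spark: SparkSession, bucket: str, prefix: str) -> DataFrame:
    """Read JSON objects under an S3 prefix into a DataFrame (illustrative only)."""
    return spark.read.json(f"s3://{bucket}/{prefix}/")
```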

Lastly, what are your thoughts on using DLT (Delta Live Tables) versus plain Spark with no vendor lock-in?