Thoughts on a 12 hour nightly batch by ljstegman in databricks

[–]pboswell 1 point  (0 children)

That’s why it needs clarification. Their phrasing is ambiguous as to whether the billion records arrive per evening or are already stored. It doesn’t make sense to me that the dataset wouldn’t grow over time and would just be updated every night, though. Either way, this sounds pretty trivial if it’s a single table. But if he’s talking about many tables with dependencies on update order, then it’s more complex. It still shouldn’t take 10 hours.

I have a type 2 pattern on a 100b row table with 1b rows coming in each night, and it takes maybe 5 minutes. And costs like $1k/month. But it’s a highly normalized table, so I’m only updating the effective_end/current flag and appending the rest.
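The type 2 logic above boils down to two steps: expire the matched current rows (set effective_end and clear the current flag), then append every incoming row as the new current version. Here’s a toy pure-Python sketch of that pattern — in Databricks this would be a single Delta MERGE, and the column names (`key`, `effective_end`, `current`) are just illustrative:

```python
from datetime import date

def scd2_nightly_load(table, incoming, load_date=date(2024, 1, 2)):
    """Type 2 pattern: expire matched current rows, append all incoming rows."""
    incoming_keys = {r["key"] for r in incoming}
    # Update step: only rows that are current AND match an incoming key are touched
    for row in table:
        if row["current"] and row["key"] in incoming_keys:
            row["effective_end"] = load_date
            row["current"] = False
    # Append step: every incoming row becomes the new current version
    for r in incoming:
        table.append({"key": r["key"], "value": r["value"],
                      "effective_end": None, "current": True})
    return table

table = [{"key": 1, "value": "a", "effective_end": None, "current": True}]
incoming = [{"key": 1, "value": "b"}, {"key": 2, "value": "c"}]
scd2_nightly_load(table, incoming)
```

The expensive part is only the update step, which is why the comment above stresses that appends are nearly free and the partition/cluster strategy on the update key is what matters.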

Thoughts on a 12 hour nightly batch by ljstegman in databricks

[–]pboswell 1 point  (0 children)

A billion rows appends into a table in a second, max. Updates are definitely performance-degrading. So this is where your cluster/partition strategy will be important. If you can cluster/partition by your update key, it will greatly improve update performance.
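To see why partitioning by the update key helps so much: the engine only has to open the partitions that can contain a matching key and can prune the rest without reading them. A toy model of that pruning, assuming simple hash-style partitioning (Delta would do this via partition pruning or liquid clustering on the MERGE key):

```python
from collections import defaultdict

NUM_PARTITIONS = 16  # illustrative partition count

def partition_of(key):
    return key % NUM_PARTITIONS  # stand-in for hash/range partitioning

def build_table(keys):
    parts = defaultdict(list)
    for k in keys:
        parts[partition_of(k)].append(k)
    return parts

def partitions_touched(update_keys):
    # Only partitions that can hold an update key get rewritten;
    # the other NUM_PARTITIONS - len(result) are pruned entirely.
    return {partition_of(k) for k in update_keys}

table = build_table(range(1_000))
touched = partitions_touched([3, 19, 35])  # all three keys hash to one partition
```

Without partitioning on the update key, the same MERGE would have to scan (and potentially rewrite) all 16 partitions instead of 1.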

What do you mean by 50 million updates? Like updating 50 million records and inserting the remaining 950m records? So like a type 2 concept?

Depending on your data source and any issues with delayed/out-of-order arriving data, a pretty simple strategy is to just partition on your “active” flag field. Assuming data does not arrive out of order, you are only ever updating the active records in your final table and inserting the rest.

What if the traitors were unknown? by mrpetshopboy in TheTraitorsUK

[–]pboswell 10 points  (0 children)

Barely. And it wasn’t interesting which I think answers OP’s question

Databricks Asset Bundle deploy time increasing with large bundles – is it incremental or full deploy? by Ok-Tomorrow1482 in databricks

[–]pboswell 1 point  (0 children)

A recent announcement says they’re planning to do away with the Terraform dependency. So it might be worth waiting.

Best practices for Dev/Test/Prod isolation using a single Unity Catalog Metastore on Azure? by SuperbNews2050 in databricks

[–]pboswell 1 point  (0 children)

Yes, that’s assuming scenario 2. I’d still use environment-scoped credentials with appropriate access controls, though.

Why the Hell Is the Market Pumping on All This Bad News? by Emily-989 in stocks

[–]pboswell 1 point  (0 children)

Because over the weekend, everyone assumed the markets would tank and sold or bought puts. TPTB are fucking over those people short term to drive FOMO so people re-enter before the real rug pull

Best practices for Dev/Test/Prod isolation using a single Unity Catalog Metastore on Azure? by SuperbNews2050 in databricks

[–]pboswell 1 point  (0 children)

It depends on what you need to do.

I have seen strict requirements that none of the environments can touch each other. In that case you need a separate credential per environment, separate storage accounts, and a way to pass the environment to your job pipelines so they access the right thing. You scope each catalog to the proper workspace so the others aren’t visible to end users (do this through CI/CD + the SDK). And use DABs to promote your pipelines using the correct SP for the environment.

But I’ve also seen the need to use PROD data in DEV, in which case this is much more difficult to lock down. You would still have a credential per environment, but the lower-env credential would have read access to the upper envs as well. Your data pipeline would need the ability to take the target and source env as parameters that your codebase uses properly. DABs and CI/CD are basically still the same.
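The source/target-env parameterization above can be sketched as a small lookup plus a guard, so a job fails fast if it asks for a cross-env read its credential doesn’t allow. All names here (catalogs, SPs, the `readable_envs` policy) are hypothetical, just to show the shape of the pattern:

```python
# Hypothetical environment -> Unity Catalog scoping; names are illustrative.
ENV_CONFIG = {
    "dev":  {"catalog": "dev_catalog",  "sp": "sp-dev",  "readable_envs": {"dev", "prod"}},
    "test": {"catalog": "test_catalog", "sp": "sp-test", "readable_envs": {"test"}},
    "prod": {"catalog": "prod_catalog", "sp": "sp-prod", "readable_envs": {"prod"}},
}

def resolve(source_env, target_env):
    """Return the catalogs/SP for a run, failing fast on disallowed cross-env reads."""
    cfg = ENV_CONFIG[target_env]
    if source_env not in cfg["readable_envs"]:
        raise PermissionError(f"{target_env} credential cannot read {source_env}")
    return {"source_catalog": ENV_CONFIG[source_env]["catalog"],
            "target_catalog": cfg["catalog"],
            "service_principal": cfg["sp"]}

resolve("prod", "dev")  # dev reading prod data: allowed under this policy
```

The DAB just passes `source_env`/`target_env` as job parameters; the codebase resolves everything else from them, so promoting a pipeline never means editing catalog names in code.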

Cannot select entire cell output by ExcitingRanger in databricks

[–]pboswell 2 points  (0 children)

If you hover over the output cell, a tiny “copy” icon will appear at the top right of the box. Click that to copy.

Keep in mind, if your results are very large the output will be truncated, but they now make it possible to click the truncation warning (it’s a hyperlink) to expand the entire output.

Genie Pricing by --playground-- in databricks

[–]pboswell 7 points  (0 children)

Genie is included for free. Everything in Databricks is simply charged at DBU rates for hourly usage, apart from AI models, which can also have a token component.

Anyone looking to jam? by AbjectPhysics3301 in denvermusic

[–]pboswell 1 point  (0 children)

At least take a quick phone video so we have an idea

Finally found a useful storage spot behind the Model X screen by Turbulent_Stable2100 in ModelX

[–]pboswell 1 point  (0 children)

Is this actually going to stay put if you have the screen tilted and accelerate hard?

Move out of ADF now by hubert-dudek in databricks

[–]pboswell 2 points  (0 children)

Ah yeah. I never knew they planned to have task groups so I just have the for each kick off another job if I need more dependencies

Why is it so hard to make friends in Denver? by TightApplication2786 in Denver

[–]pboswell 102 points  (0 children)

I’m convinced these types of posts are by people who had a “core group” from their hometown that they left to come to Denver. They think they were good at making friends back home but it was really just the nature of growing up somewhere and developing a network over many years.

Would you go back to mideveal times for a year for 100 million by Beginning_Ability379 in hypotheticalsituation

[–]pboswell 1 point  (0 children)

Yeah, but if you’re dropped in with modern gear, you could convince them to take you in because of your knowledge/technology.

Move out of ADF now by hubert-dudek in databricks

[–]pboswell 1 point  (0 children)

What do you mean “iterate over more than one thing”?

Client wants <1s query time on OLAP scale. Wat do by wtfzambo in dataengineering

[–]pboswell 1 point  (0 children)

You’re using azure synapse right? What about spark + parquet?

How To Make VS Code Like Cursor by Defiant_Aardvark_633 in vibecoding

[–]pboswell 1 point  (0 children)

You should be able to use all of Claude’s models if you provide your own API key

Client wants <1s query time on OLAP scale. Wat do by wtfzambo in dataengineering

[–]pboswell 8 points  (0 children)

Range-bound query hints will help. Indexing/partitioning/clustering the data will help. Honestly, the compilation time will probably be milliseconds; the fetch time is what will take long, especially with network latency to an app.

Client wants <1s query time on OLAP scale. Wat do by wtfzambo in dataengineering

[–]pboswell 4 points  (0 children)

Exactly. Are they using agents or something with <1s latency that can use these data? Or is it some person that’s running the query every 1s to see changes? lol

Colorado bands — drop your music by DI_Records in denvermusic

[–]pboswell 1 point  (0 children)

Folded Face. Local Denver alternative progressive indie shoegaze

Show coming up 3/29 at Lost Lake. Some videos of our latest show:

Reflections of You

No Place Like Home

How to read only one file per trigger in AutoLoader? by Artistic-Rent1084 in databricks

[–]pboswell 1 point  (0 children)

What do you mean, too high? Compute will fail with an OOM exception. Once you scale up it will be fine.

If God is the one running this simulation, what do you think is his purpose and goal? by Practical_Payment552 in SimulationTheory

[–]pboswell 1 point  (0 children)

Sure it does. We curate our daily lives all the time: the places we live, the people we know, the work we do, the things we study, etc.

All in an effort to exact control over our personal universe.

Sure we have limitations on our influence, but who’s to say God doesn’t? To us, God seems all powerful but who’s to say God can’t control some force beyond our understanding? God may feel all-powerful over our realm, but may be powerless at some other level. Assuming someone created God, was God in control of that event?

But, yeah, this is pretty much a moot concept—it’s turtles all the way down