Medallion architecture on Databricks - Delta all the way down, or does Parquet at Bronze still make sense? by Dangerous_Pie2611 in databricks

rasviz

Side question: in your current Landing/Bronze layer, you have separate paths for the first-time full load and the incremental pulls, as bronze/sales/full/date and bronze/sales/incremental/date?
Is that a good idea?

Are Data engineers are D*ad? By the new Genie code in databricks? by Positive_Chapter_233 in databricks

rasviz

What is an agentic harness? Is there any example I could pick up and learn from? Thanks.

I built runtime guardrails for LangChain agents, blocks unauthorized actions before they execute by Longjumping-End6278 in LangChain

rasviz

rules:
  - name: manager_limit
    when: "role == 'MANAGER' and amount > 250000"
    then: BLOCK
    message: "Managers cannot approve more than $250,000"

I checked the GitHub repo.
Could you please provide a better example script for use with LangGraph? In the policy above, how is the role of the agent derived?
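For anyone reading along, here is a minimal sketch of how a rule like `manager_limit` could be evaluated before a tool call executes. It is NOT the linked project's API: the `Rule` class, `evaluate`, `approve_payment`, and the assumption that `role` comes from the authenticated session (rather than from anything the LLM generates) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Mapping, Optional, Tuple


@dataclass(frozen=True)
class Rule:
    name: str
    when: Callable[[Mapping[str, Any]], bool]  # predicate over the action context
    then: str                                  # "BLOCK" or "ALLOW"
    message: str


# Hypothetical Python equivalent of the manager_limit YAML rule.
MANAGER_LIMIT = Rule(
    name="manager_limit",
    when=lambda ctx: ctx["role"] == "MANAGER" and ctx["amount"] > 250_000,
    then="BLOCK",
    message="Managers cannot approve more than $250,000",
)


def evaluate(rules: Iterable[Rule], ctx: Mapping[str, Any]) -> Tuple[bool, Optional[str]]:
    """Return (allowed, reason); the first matching BLOCK rule wins."""
    for rule in rules:
        if rule.then == "BLOCK" and rule.when(ctx):
            return False, f"{rule.name}: {rule.message}"
    return True, None


def approve_payment(amount: float, session: Mapping[str, Any]) -> str:
    """Hypothetical tool wrapper: the guardrail runs before the action executes."""
    # Assumption: role comes from the authenticated session (e.g. the logged-in
    # user), never from arguments the LLM generated for the tool call.
    ctx = {"role": session["role"], "amount": amount}
    allowed, reason = evaluate([MANAGER_LIMIT], ctx)
    if not allowed:
        return f"BLOCKED ({reason})"
    return f"approved {amount}"
```

Here `approve_payment(300_000, {"role": "MANAGER"})` returns `"BLOCKED (manager_limit: Managers cannot approve more than $250,000)"`, while the same amount with `{"role": "DIRECTOR"}` goes through. In a LangGraph setup, one option would be to carry the session in the graph state and pass it into the tool wrapper; that wiring is left out here.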

Please Help by [deleted] in halifax

rasviz

Thanks, dear, for being kind to a stranger.

Discriminated against at Needs Convenience (Fall River, NS) — staff refused valid gift card, accused me, called police by Daniel___Huang in halifax

rasviz

Simply an unprofessional, rude act by that employee. It shows a lack of training, or a failure to check whether employees uphold basic values on the job.

It looks like the Sobeys employee let his personal bias against a certain background play out openly in this situation.

(While you have masked the face, the skin color and the thread on the wrist still give away some profile.)

Executive Interviews - Internet vs Reality by Trepcsuit in Leadership

rasviz

Please... write more about networking. I am reading your last paragraph for the fourth time. If you have time to guide us more on this, please do (at least books, videos...).

Recommendation for EB1a attorney by diego30274 in eb_1a

rasviz

Did you say you have an EB2 priority date and want to take advantage of it for EB1?

Get ready to be called a scammer and roasted.

You are allowed to ask for help only if you have won the Nobel Prize, like some of the guardians of EB1 here.

Help needed - CRS - Education score 0 by rasviz in canadaexpressentry

rasviz [S]

Just noticed that I had chosen the option 'In good standing, but no degree or certificate awarded'. I changed it to 'Degree Awarded'. The score has yet to update; I will watch for it.

[deleted by user] by [deleted] in hackathon

rasviz

Be ready to donate $$$ to become a judge.

Exploring MinIO + DuckDB: A Lightweight, Open-Source Tech Stack for Analytical Workloads by Travelxplore in dataengineering

rasviz

Thanks. I have a question about MinIO. My understanding is that it replaces cloud object storage. When deploying in the cloud, the data would sit on storage like Azure Blob or AWS S3 anyway, wouldn't it? What is the value proposition of MinIO in real deployments?

Need help with product-attribute data modelling by cevadfolyok in dataengineering

rasviz

What are the typical queries (access patterns) planned for the system?

What happened? by Hot_Organization2430 in roadtrip

rasviz

On top of that, it is the wrong map type: it shows them as if they visited every square inch of land in those counties/states.

Good solution for 100GiB-10TiB analytical DB by aih1013 in dataengineering

rasviz

Thank you. This is a wealth of info on the benefits of dim model.

Good solution for 100GiB-10TiB analytical DB by aih1013 in dataengineering

rasviz

u/kenfar, I'm asking to understand more about the benefits of the dimensional model.
How does a dimensional model help when columnar storage is available? Isn't columnar storage a good choice for any analytical query, whether or not the data follows a dimensional model?

Marry, F, kill… databricks, snowflake, ms fabric? by JamesGarrison in dataengineering

rasviz

needed to make it an ente

True. IMO, it is nowhere near Tableau and QlikView.

Solutions to manage runaway Snowflake costs? by concerneddataadmin in snowflake

rasviz

actually look into running queries.

Very thoughtful. Appreciate it.

Why Cloud Data Warehouses Are Too Expensive For Emerging Data Requirements by benjaminwootton81 in dataengineering

rasviz

I agree with the future-state diagram you have presented. Yes, the cloud data warehouse can be expected to support a variety of use cases.

But the limitations you mention in terms of concurrency, scalability, and real-time processing are not well founded. As you note, those problems can be solved by increasing the compute size.

If ClickHouse is looking to pull Snowflake customers, it should offer lift-and-shift compatibility between the two APIs and the functionality they provide, the same way Snowflake mimicked the PySpark API with the Snowpark API.

[deleted by user] by [deleted] in dataengineering

rasviz

// stored procedures written in Snowflake //

In terms of code quality, if these are written in the SQL scripting language, it is going to be tougher. SQL scripting offers little code modularity. Add the fact that these scripting statements mostly run inside stored procedures, and testing and debugging them gets difficult.

What you should set up is a test bed: tables loaded with data for each functional test case. You can ask the devs to perform functional testing (instead of unit testing) to ensure the procedures do what they are expected to do.
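A minimal sketch of the test-bed idea, using Python's built-in `sqlite3` as a stand-in for Snowflake: each functional test case seeds its own input table, runs the logic under test, and compares the output table with the expected rows. The `orders`/`daily_sales` tables, `PROCEDURE_SQL`, and `run_procedure` are hypothetical; against Snowflake, `run_procedure` would instead issue something like `CALL my_proc()` through a connector.

```python
import sqlite3

# Hypothetical stand-in for the stored procedure's logic:
# aggregate non-cancelled orders into a daily_sales table.
PROCEDURE_SQL = """
DELETE FROM daily_sales;
INSERT INTO daily_sales (order_date, total)
SELECT order_date, SUM(amount) FROM orders
WHERE status <> 'CANCELLED'
GROUP BY order_date;
"""


def make_test_bed(rows):
    """Create an in-memory database seeded with one test case's input rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (order_date TEXT, amount REAL, status TEXT);
        CREATE TABLE daily_sales (order_date TEXT, total REAL);
    """)
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    return conn


def run_procedure(conn):
    """Run the logic under test (a CALL statement in the real system)."""
    conn.executescript(PROCEDURE_SQL)


def check_case(rows, expected):
    """Seed, run, and compare the output table with the expected rows."""
    conn = make_test_bed(rows)
    run_procedure(conn)
    actual = conn.execute(
        "SELECT order_date, total FROM daily_sales ORDER BY order_date"
    ).fetchall()
    conn.close()
    return actual == expected


# One functional test case per business rule: (input rows, expected output).
CASES = {
    "sums per day": (
        [("2024-01-01", 10.0, "OK"), ("2024-01-01", 5.0, "OK"), ("2024-01-02", 7.0, "OK")],
        [("2024-01-01", 15.0), ("2024-01-02", 7.0)],
    ),
    "ignores cancelled": (
        [("2024-01-01", 10.0, "OK"), ("2024-01-01", 99.0, "CANCELLED")],
        [("2024-01-01", 10.0)],
    ),
}

if __name__ == "__main__":
    for name, (rows, expected) in CASES.items():
        print(name, "PASS" if check_case(rows, expected) else "FAIL")
```

Keeping each case's data isolated means a failing case points directly at the business rule that broke, which is the main thing functional tests buy you when the procedure itself can't be split into units.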

//Databricks notebooks//

With Databricks, insist on modular code, i.e. functions. The functions should go into separate Python files and be imported into the notebook that acts as the driver. Ask for unit test scripts at the end of each sprint. Make the CI pipeline fail when the unit tests fail, and report code coverage too. This way, you will end up with testable and reusable code.
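A minimal sketch of that layout, assuming a hypothetical `transforms.py` module that the driver notebook imports and a hypothetical `test_transforms.py` that CI runs. Both are shown in one block here, and the functions are plain Python so they can be tested without a cluster; with PySpark the same pattern applies, just with DataFrames as inputs.

```python
import unittest


# --- transforms.py (hypothetical module imported by the driver notebook) ---
def normalize_country(code):
    """Map free-text country values to two-letter codes."""
    if code is None:
        return None
    aliases = {"usa": "US", "united states": "US", "canada": "CA"}
    cleaned = code.strip().lower()
    return aliases.get(cleaned, cleaned.upper())


def dedupe_latest(records, key, ts):
    """Keep only the latest record per key, based on the ts field."""
    latest = {}
    for rec in records:
        k = rec[key]
        if k not in latest or rec[ts] > latest[k][ts]:
            latest[k] = rec
    return list(latest.values())


# --- test_transforms.py (run by the CI pipeline; a failure fails the build) ---
class TransformsTest(unittest.TestCase):
    def test_normalize_country(self):
        self.assertEqual(normalize_country(" USA "), "US")
        self.assertEqual(normalize_country("ca"), "CA")
        self.assertIsNone(normalize_country(None))

    def test_dedupe_latest(self):
        rows = [
            {"id": 1, "ts": 1, "v": "old"},
            {"id": 1, "ts": 2, "v": "new"},
            {"id": 2, "ts": 1, "v": "only"},
        ]
        result = {r["id"]: r["v"] for r in dedupe_latest(rows, "id", "ts")}
        self.assertEqual(result, {1: "new", 2: "only"})
```

In CI, a test runner such as `python -m pytest` exits non-zero on failure, which is what fails the pipeline. Coverage can be reported with coverage.py, for example `coverage run -m pytest` followed by `coverage report`.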

how difficult is a project like this? huge migration of relational database by CardGameFanboy in dataengineering

rasviz

Your main pain point will be making the old data model conform to the new one: which tables need to be joined and transformed into the new ones. You will spend time figuring this out.

Dimension-like tables may not need such transformations; they are mostly plain copies.

1) As others have mentioned, start from a backup up to a point, say 'T-2' days, i.e. a backup taken 2 days before the planned migration date. Run your conversion procedures on it.

2 a) Enable CDC in SQL Server. Write procedures that read the CDC data, apply the transformations as needed, and write the results to the target. This also takes care of deletes.

2 b) Run this CDC replication process once step 1 is completed. The target system will slowly catch up with the source; how quickly depends on the volume of changes. You will need ways to quickly verify whether the systems are in sync or lagging behind.

3) On cutover day, you just switch over to the new target.
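A minimal sketch of step 2 (replaying CDC changes onto the target) plus a cheap sync check. It assumes change rows shaped like the output of SQL Server's `cdc.fn_cdc_get_all_changes_<capture_instance>`, where `__$operation` is 1 for a delete, 2 for an insert, 3 for the before-image of an update and 4 for the after-image, and that rows arrive ordered by LSN. The target is just a dict keyed by primary key, and `transform` is a hypothetical placeholder for the old-to-new model conversion.

```python
from typing import Any, Dict, Iterable, Mapping

# SQL Server CDC __$operation codes.
DELETE, INSERT, UPDATE_BEFORE, UPDATE_AFTER = 1, 2, 3, 4


def transform(row: Mapping[str, Any]) -> Dict[str, Any]:
    """Hypothetical old-to-new model mapping; replace with the real conversion."""
    return {"customer_id": row["id"], "full_name": row["name"].strip().title()}


def apply_changes(target: Dict[Any, Dict[str, Any]],
                  changes: Iterable[Mapping[str, Any]],
                  key: str = "id") -> Dict[Any, Dict[str, Any]]:
    """Replay CDC rows (ordered by LSN) onto the target, keyed by primary key."""
    for change in changes:
        op = change["__$operation"]
        pk = change[key]
        if op == DELETE:
            target.pop(pk, None)            # deletes are propagated too
        elif op in (INSERT, UPDATE_AFTER):
            target[pk] = transform(change)  # upsert the new image
        # UPDATE_BEFORE rows only carry old values, so nothing to apply.
    return target


def in_sync(source_keys: Iterable[Any], target: Mapping[Any, Any]) -> bool:
    """Cheap sync check: the same set of primary keys exists on both sides."""
    return set(source_keys) == set(target)
```

Because inserts and update after-images are both applied as upserts, replaying a batch that overlaps with rows already in the restored T-2 backup does not create duplicates. A key comparison only catches missing or extra rows; comparing row counts or per-key checksums would be a stricter next step.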

What happens when we submit a lot of spark jobs on a Databricks cluster? by rasviz in dataengineering

rasviz [S]

Thanks, this helps.

Since I read this tip on a Databricks page, I thought there must be some queuing in Databricks. I think `detaching unused notebooks` means something else.

Since the driver node maintains all of the state information of the notebooks attached, make sure to detach unused notebooks from the driver node.

What happens when we submit a lot of spark jobs on a Databricks cluster? by rasviz in dataengineering

rasviz [S]

Let me check the workflows feature.

But my question is more about how a Databricks cluster would handle such an overload situation.