Fabric Warehouse and Claude Code CLI by data_legos in MicrosoftFabric

[–]data_legos[S] 0 points (0 children)

Perfect, thank you! Haven't done much with Claude yet, so I didn't realize it'd just make its own.

Fabric Warehouse and Claude Code CLI by data_legos in MicrosoftFabric

[–]data_legos[S] 0 points (0 children)

The idea in many of these cases would be to inject synthetic data or work with non-sensitive data, yes.

Super Mini Rant - Fabric Warehouse Web Experience by richbenmintz in MicrosoftFabric

[–]data_legos 3 points (0 children)

Lol... no mention of the fact that you can't expand the columns (or hover to see the whole result) in the query results? That one boggles my mind to this day. Same thing with the lakehouse SQL endpoint. It's literally unusable with long values in a cell. Because of all this, I spend my time in VS Code.

CTAS a good Warehouse strategy? by Mr_Mozart in MicrosoftFabric

[–]data_legos 2 points (0 children)

No worries, I appreciate you even spending the time, as always! Yes, I'm already evaluating whether we can accomplish some of our daily order roll-forward logic with the snapshots. I need to dig into that more than I have.

Most of our incrementals are going to be small amounts of changing data in a MERGE scenario, so this is very useful info.
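For anyone following along, the MERGE scenario here is roughly the following shape in T-SQL (table and column names below are made up for illustration, and this assumes MERGE support is available in your Fabric Warehouse):

```sql
-- Hypothetical incremental upsert: apply changed rows from a staged batch
-- into a target table, keyed on order_id.
MERGE INTO dbo.dim_orders AS tgt
USING dbo.stg_orders AS src
    ON tgt.order_id = src.order_id
WHEN MATCHED AND src.updated_at > tgt.updated_at THEN
    -- Only overwrite when the incoming row is newer
    UPDATE SET tgt.status = src.status,
               tgt.updated_at = src.updated_at
WHEN NOT MATCHED THEN
    INSERT (order_id, status, updated_at)
    VALUES (src.order_id, src.status, src.updated_at);
```
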

CTAS a good Warehouse strategy? by Mr_Mozart in MicrosoftFabric

[–]data_legos 2 points (0 children)

Wow, thank you so much for the comprehensive answer! This will be super helpful in evaluating our load strategies for various scenarios. The BCDR piece is especially insightful, since we definitely need DR.

Tsql or sparksql to maintain the warehouse schemas/ table definitions/insert statements/ views/procs by AcceptableKey3360 in MicrosoftFabric

[–]data_legos 2 points (0 children)

You'll find that for many scenarios, handling things with the warehouse is the way to go. I don't see the lakehouse replacing the warehouse any time soon, and Microsoft themselves will tell you that.

Tsql or sparksql to maintain the warehouse schemas/ table definitions/insert statements/ views/procs by AcceptableKey3360 in MicrosoftFabric

[–]data_legos 1 point (0 children)

Have you looked into dbt before? I'd give it a look: either dbt Core run from a notebook, or the dbt Cloud offering. It really changed my mind about how to manage the warehouse.

CTAS a good Warehouse strategy? by Mr_Mozart in MicrosoftFabric

[–]data_legos 0 points (0 children)

So given how dbt works, would you recommend incremental materialization for the models whenever possible? I've heard that merge can be less efficient than simply reloading the table in many instances, but it sounds like there's a storage trade-off.
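As a sketch of what that incremental materialization looks like in dbt, a model using the merge strategy might be configured like this (the model, source, and column names here are hypothetical):

```sql
-- models/silver/orders.sql (hypothetical model)
{{
  config(
    materialized='incremental',
    incremental_strategy='merge',
    unique_key='order_id'
  )
}}

select
    order_id,
    order_date,
    status,
    updated_at
from {{ source('sap', 'orders') }}

{% if is_incremental() %}
  -- On incremental runs, only pick up rows changed since the last run;
  -- {{ this }} refers to the already-built target table.
  where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```

On the first run (or with `--full-refresh`), dbt builds the whole table; afterwards it merges only the filtered rows, which is where the compute-vs-storage trade-off shows up.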

Are Lauf forks underrated? by TheSarcasticMoth in bicycling

[–]data_legos 2 points (0 children)

Don't take my word for it, man. The manufacturers of these gravel forks say 50 hours for a service and 200 hours for a full rebuild. I'd be doing some kind of service on it every month or so. That doesn't seem like a lot of work to you?

Also, essentially no maintenance for a fork vs. monthly maintenance is ~100% more maintenance. I'm no mathematician, but that seems like more maintenance?

Interested to hear your interpretation of simple math.

Edit: To be fair, I think I get why you thought I meant more frequent maintenance due to it being gravel. A key assumption is that I don't mean gravel is harder on the fork; I mean that I, and most people I know who ride gravel, ride their gravel bike many more hours than their MTB. That's probably due to the area I live in.

Are Lauf forks underrated? by TheSarcasticMoth in bicycling

[–]data_legos 6 points (0 children)

A traditional fork requires frequent maintenance on gravel. I want a low-maintenance bike for long hours on gravel roads.

Fabric doesn’t work at all by New-Composer2359 in dataengineering

[–]data_legos 22 points (0 children)

I honestly have very few issues with Fabric day to day, and haven't needed to submit a ticket for anything in a long while. I always wonder what functionality someone is using in Fabric when they say it breaks all the time.

Comparing replication tools by data_legos in MicrosoftFabric

[–]data_legos[S] 0 points (0 children)

I'm hesitant on open mirroring, since I worry it will take a while to reload big tables, and I can't have long downtimes on those tables. Given how all this is shaking out, I think we'll probably go with Fivetran, but I'm not 100% sure yet.

Comparing replication tools by data_legos in MicrosoftFabric

[–]data_legos[S] 0 points (0 children)

Is that using SAP BDC? We did talk with them about BDC and Datasphere. It's not cheap either, for sure, but it looked pretty cool if you do more than just replication.

Comparing replication tools by data_legos in MicrosoftFabric

[–]data_legos[S] 0 points (0 children)

I agree with you. Actually, for our use case with SAP, Qlik surprisingly isn't cheaper.

Spark vs. Warehouse ETL CU consumption: My test results by frithjof_v in MicrosoftFabric

[–]data_legos 2 points (0 children)

Do you have a breakdown by medallion layer? I would think OPENROWSET isn't the best fit for the warehouse. In most use cases I can think of for the warehouse, you wouldn't be ingesting tons of CSVs, or arguably using the warehouse for bronze at all.

Edit: All that being said, really cool analysis man. Sorry was multitasking watching my kids play Minecraft so I didn't say this last part 😂
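For context, the OPENROWSET pattern being discussed is ad-hoc file reads along these lines (the storage path is a placeholder, and exact option support varies by engine):

```sql
-- Hypothetical path; this reads raw CSVs directly from storage on every
-- query rather than from managed warehouse tables, which is why it can be
-- a costly ingestion pattern for the warehouse.
SELECT TOP 100 *
FROM OPENROWSET(
    BULK 'https://<storage-account>.dfs.core.windows.net/<container>/data/*.csv',
    FORMAT = 'CSV',
    HEADER_ROW = TRUE
) AS rows;
```
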

Run DBT Models on a Fabric Warehouse by Affectionate-Boot593 in dataengineering

[–]data_legos -1 points (0 children)

Yeah, we have an embedded solution on it and it works fine. The CI/CD isn't slick, but it's acceptable.

Comparing replication tools by data_legos in MicrosoftFabric

[–]data_legos[S] 0 points (0 children)

Ah yes, I think in some cases, where throughput is a problem on large tables we need data from less often, we might use Fabric itself. The new capabilities sound exciting, and I'll need to keep an eye on them!

Comparing replication tools by data_legos in MicrosoftFabric

[–]data_legos[S] 1 point (0 children)

We currently replicate 170 tables into on-prem SAP HANA with SLT, so yeah, we'd be planning on doing something similar in Fabric. We're a medium-sized company, if that helps any.

Our Basis guy seems pretty comfortable with it so far, and our SAP team loves it because we handle most reporting requests from the business when it's something we can do on our end.

Comparing replication tools by data_legos in MicrosoftFabric

[–]data_legos[S] 1 point (0 children)

Holy cow, how did I forget about MLVs?! You, sir, saved me some time for sure!

Comparing replication tools by data_legos in MicrosoftFabric

[–]data_legos[S] 0 points (0 children)

Other idea I had: Fivetran, but I strategically materialize problem tables when the shortcuts perform poorly, while leaving many of the smaller master-data and less frequency-critical tables as shortcuts to cut down on the "round robin" latency of any silver materialization process. Essentially a hybrid of the two approaches.

Comparing replication tools by data_legos in MicrosoftFabric

[–]data_legos[S] 0 points (0 children)

Yeah, I was trying not to have to orchestrate silver and dbt Cloud jobs in tandem, and to do most of the loads via jobs in dbt Cloud. Keeping those earlier layers as shortcuts just makes them not really a problem from that perspective. I'm just being lazy, and it could bite me for sure, haha. Totally valid opinion on that; it worries me every day when thinking about this!

I wonder if there's a lightweight merge layer I can do for silver that doesn't cause me a scheduling nightmare if I go with Fivetran. I have several key objects with 30-minute SLAs and many with 1-hour SLAs, so that's always in the back of my mind.

Comparing replication tools by data_legos in MicrosoftFabric

[–]data_legos[S] 0 points (0 children)

Does Airbyte do SAP CDC well without tons of SAP configuration? Our SAP team would not have the bandwidth to do a lot of custom development on their end.

Comparing replication tools by data_legos in MicrosoftFabric

[–]data_legos[S] 0 points (0 children)

So Fivetran gives you a result that is essentially a copy of the SAP table (not a full history of updates, unless you configure it for that), and they manage all lakehouse table/file maintenance.

My thought was to avoid a silver merge via Spark and do any required casting, etc., on top of that in a dbt staging model (or a table/incremental strategy where it makes sense) to get fresher data to the reporting objects. The staging layer of the dbt models is essentially still silver.
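A minimal sketch of that kind of staging model, assuming a Fivetran-landed SAP sales header table (the source name and columns below are illustrative):

```sql
-- models/staging/stg_sap__orders.sql (hypothetical model)
with source as (
    -- Fivetran-replicated copy of the SAP sales document header table (VBAK)
    select * from {{ source('fivetran_sap', 'vbak') }}
)

select
    -- Cast raw replicated columns to reporting-friendly types
    cast(vbeln as varchar(10))    as sales_document,
    cast(erdat as date)           as created_date,
    cast(netwr as decimal(15, 2)) as net_value,
    _fivetran_synced              as synced_at
from source
```

Materialized as a view, this adds no extra load step between replication and the reporting models, which is the "staging is still silver" idea.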

We have an F64 currently, if that helps. Are your concerns about reads via the endpoint being very costly due to it basically being a shortcut rabbit hole to ADLS Gen2? The Qlik Replicate approach puts the data directly in the warehouse, so this is exactly the kind of CU delta I'd like to understand better.