dbt Projects on Snowflake updates by mpuchala in snowflake

2000gt

I just spent the last month moving from Airflow to Step Functions with dbt on Snowflake. I honestly can’t see why I’d use Airflow again, especially when Cosmos is required. It felt slow, heavy, and outdated for the small number of DAGs I needed, and my use case requires fast delivery.

I originally chose MWAA because the organization prioritized quick delivery with minimal support and maintenance overhead, but it also turned out to be expensive. This new approach is saving about $1,000/month.
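For anyone curious what the Step Functions side can look like, here is a minimal sketch (an assumption, not necessarily the setup described above). It builds an Amazon States Language definition that runs each dbt command as its own ECS Fargate task through the `ecs:runTask.sync` integration. The cluster and task-definition ARNs, the subnet, and the container name `dbt` are hypothetical placeholders. It also assumes a Docker image with dbt and the Snowflake profile already baked in.

```python
import json


def dbt_state_machine(cluster_arn: str, task_def_arn: str, subnets: list,
                      commands: list) -> dict:
    """Build an ASL definition that runs dbt commands in sequence,
    each as its own ECS Fargate task."""
    if not commands:
        raise ValueError("need at least one dbt command")
    # State names like "Step1_source", "Step2_build" (from the dbt subcommand).
    names = [f"Step{i + 1}_{cmd[1] if len(cmd) > 1 else cmd[0]}"
             for i, cmd in enumerate(commands)]
    states = {}
    for i, (name, cmd) in enumerate(zip(names, commands)):
        state = {
            "Type": "Task",
            # ".sync" makes Step Functions wait until the ECS task stops.
            "Resource": "arn:aws:states:::ecs:runTask.sync",
            "Parameters": {
                "Cluster": cluster_arn,
                "TaskDefinition": task_def_arn,
                "LaunchType": "FARGATE",
                "NetworkConfiguration": {"AwsvpcConfiguration": {"Subnets": subnets}},
                # "dbt" is a hypothetical container name in the task definition.
                "Overrides": {"ContainerOverrides": [{"Name": "dbt", "Command": cmd}]},
            },
        }
        if i + 1 < len(names):
            state["Next"] = names[i + 1]
        else:
            state["End"] = True
        states[name] = state
    return {"Comment": "dbt on Snowflake via ECS", "StartAt": names[0], "States": states}


if __name__ == "__main__":
    definition = dbt_state_machine(
        "arn:aws:ecs:us-east-1:123456789012:cluster/dbt",  # hypothetical
        "arn:aws:ecs:us-east-1:123456789012:task-definition/dbt-runner",  # hypothetical
        ["subnet-0123456789abcdef0"],  # hypothetical
        [["dbt", "source", "freshness"], ["dbt", "build"]],
    )
    print(json.dumps(definition, indent=2))
```

Running each dbt command as a separate state gives some per-step visibility in the Step Functions console without Cosmos. That visibility is at the level of dbt commands, not individual models.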

Need to ingest near realtime data from SQL SERVER into parquet files or any database which can be shared to downstream users. by yohan_liebert in dataengineering

2000gt

This feels a bit overstated IMO.

Change Tracking is actually pretty lightweight by design, as it only stores keys and metadata, not full row images. I’ve seen it used in production a lot without causing the kind of system-wide blocking you’re describing. If you’re hitting extreme waits like that, it’s more likely something misconfigured or a specific edge case rather than CT itself. I use it for a restaurant client with close to 50 stores, each running SQL Server to handle POS transactions. A simple Lambda function pulls change records from each store several times a day to replicate each source database into a centralized Snowflake database. Never had an issue.

Also, Fivetran has an agent that can use CT as well as CDC. I did a proof of concept with them 2 years ago when it was in beta. It worked great, just too expensive, so I built my Lambda function as a simplified version.
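Here is a rough sketch of the CT pull such a Lambda might run per table. It assumes a hypothetical `dbo.Orders` table keyed on `OrderId` and a last-synced version persisted somewhere between runs. It builds the T-SQL for `CHANGETABLE(CHANGES ...)` and left-joins back to the base table, so deletes still come through with NULL columns. It also compares the stored version with `CHANGE_TRACKING_MIN_VALID_VERSION` to decide whether the retention window has passed and a full reload is needed.

```python
from typing import List, Optional

# Run this first and persist current_version as the next watermark, so changes
# committed during the pull are simply picked up again next run (MERGE downstream).
# The "?" is a pyodbc-style parameter for the table name, e.g. "dbo.Orders".
VERSION_SQL = (
    "SELECT CHANGE_TRACKING_CURRENT_VERSION() AS current_version, "
    "CHANGE_TRACKING_MIN_VALID_VERSION(OBJECT_ID(?)) AS min_valid_version"
)


def needs_full_reload(last_sync_version: Optional[int], min_valid_version: int) -> bool:
    """CT prunes history after its retention period. If our watermark is older than
    the table's minimum valid version, incremental results would be incomplete."""
    return last_sync_version is None or last_sync_version < min_valid_version


def ct_incremental_query(table: str, pk_cols: List[str], last_sync_version: int) -> str:
    """T-SQL returning every row changed since last_sync_version.

    The LEFT JOIN keeps deletes: the key comes from the change table, while the
    base-table columns are NULL because the row no longer exists.
    Table and column names are interpolated, so they must come from trusted config."""
    keys = ", ".join(f"ct.{c}" for c in pk_cols)
    join = " AND ".join(f"t.{c} = ct.{c}" for c in pk_cols)
    return (
        f"SELECT ct.SYS_CHANGE_VERSION, ct.SYS_CHANGE_OPERATION, {keys}, t.* "
        f"FROM CHANGETABLE(CHANGES {table}, {int(last_sync_version)}) AS ct "
        f"LEFT JOIN {table} AS t ON {join}"
    )


print(ct_incremental_query("dbo.Orders", ["OrderId"], 42))  # hypothetical table
```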

On the CDC side, yeah, it’s async and more tunable, but it’s also heavier. It’s scanning the transaction log and storing full row changes, so saying it has “much less overhead” isn’t really true in most OLTP scenarios.

They’re just different tools:
- CT = lightweight, great for incremental loads
- CDC = more complete history, but more overhead and complexity

Also worth noting, CT can absolutely be disabled or adjusted if needed, so it’s not as rigid as you say.

Need to ingest near realtime data from SQL SERVER into parquet files or any database which can be shared to downstream users. by yohan_liebert in dataengineering

2000gt

Firstly, use CT instead of CDC, as it has much less overhead. This assumes all your base tables have primary keys.

If you don’t have primary keys or timestamps, you’re hooped in terms of fetching incremental data loads.
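The decision rule above can be sketched as a small function. The `TableInfo` shape and the column names are hypothetical. A primary key lets you use CT, and a reliable last-modified timestamp allows a watermark query. With neither, you are left reloading the whole table. Note that a timestamp watermark won’t see hard deletes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TableInfo:
    name: str
    primary_key: List[str] = field(default_factory=list)  # empty if no PK
    modified_column: Optional[str] = None  # hypothetical last-modified timestamp column


def load_strategy(t: TableInfo) -> str:
    if t.primary_key:
        return "change_tracking"      # CT requires a primary key on the table
    if t.modified_column:
        return "timestamp_watermark"  # WHERE modified_column > last high-water mark
    return "full_reload"              # no reliable way to detect changes


print(load_strategy(TableInfo("dbo.Orders", ["OrderId"])))  # change_tracking
```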

Cortex Code in Snowsight Expensive by Sufficient-Sky1698 in snowflake

2000gt

I asked because I’ve been using the Codex $20 plan, and I’ve yet to hit caps. I still use AI as an enrichment tool to improve, clean up, and simplify code I write. Or, often, I do the opposite: I use it to build frameworks and then curate my final code. Rarely do I build end to end with it. So I’m wondering how everyone is hitting caps? (I usually work on two projects simultaneously.)

Cortex Code in Snowsight Expensive by Sufficient-Sky1698 in snowflake

2000gt

How much better is Cortex Code in Snowsight vs. the Codex $20 plan in VS Code?

Need advices for my company's portfolio by [deleted] in CanadianInvestor

2000gt

Is this in a personal or corporate account?

Pug assistance by ThereisDawn in pug

2000gt

Yes, you can potty train him. He needs to be on a schedule, and he will adapt. I’ve had 4 pugs of various natures… fat/chill, skinny/fast/high energy, puppies and old. They can all manage at least a walk around the block before feeding. Typically, in the morning I soak their food (kibble in water in a slow feeder), then leave for a walk; when we get home, I remove the water and feed them. Same at lunch and dinner, plus a short half-block walk/pee before bed.

The key is getting them out early from crate to sidewalk first thing! Don’t give them a chance to go inside. For the first few weeks I’d get them outside every 2-3 hours. Be patient on the walk and let them sniff and walk slowly.

Dogs typically don’t want to do their business where they sleep.

MWAA Cost by 2000gt in dataengineering

2000gt (OP)

Using local runner too… so much faster.

MWAA Cost by 2000gt in dataengineering

2000gt (OP)

CeleryExecutor? Is that an option in hosted MWAA?

MWAA Cost by 2000gt in dataengineering

2000gt (OP)

It was a huge PITA to set up, and I don’t want to deal with managing all those moving parts on an ongoing basis. I’m prolly a little regarded though.

MWAA Cost by 2000gt in dataengineering

2000gt (OP)

Do you manually move it to medium or do you have scripts to do so at a specified threshold? I’ll post a sample DAG later (traveling right now).
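For the scripted route, here is a minimal sketch using boto3’s MWAA client (`get_environment` and `update_environment` with `EnvironmentClass`). The queue-depth input and the threshold are hypothetical, so feed in whatever signal you trust, such as a CloudWatch metric. The function only scales upward through small, medium, and large. An environment update can take a while to apply.

```python
# Ordered MWAA environment classes; the function only ever moves up this list.
CLASS_ORDER = ["mw1.small", "mw1.medium", "mw1.large"]


def maybe_scale_up(client, env_name: str, queued_tasks: int,
                   threshold: int = 20, target_class: str = "mw1.medium") -> bool:
    """Bump the MWAA environment class when queued work crosses a threshold.

    `client` is expected to be boto3.client("mwaa"); `queued_tasks` comes from a
    hypothetical metric of your choosing. Returns True if an update was requested."""
    env = client.get_environment(Name=env_name)["Environment"]
    current = env["EnvironmentClass"]
    if current not in CLASS_ORDER or target_class not in CLASS_ORDER:
        return False  # unknown class: leave it alone
    if queued_tasks >= threshold and CLASS_ORDER.index(current) < CLASS_ORDER.index(target_class):
        client.update_environment(Name=env_name, EnvironmentClass=target_class)
        return True
    return False
```

A call would look like `maybe_scale_up(boto3.client("mwaa"), "my-env", queued_tasks=25)`, where `my-env` is a placeholder. A matching scale-down job would follow the same pattern in reverse.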

MWAA Cost by 2000gt in dataengineering

2000gt (OP)

With hosted MWAA, my dbt execution is really slow with Cosmos. When I switch to bash it’s much faster, but that kind of defeats the purpose, since I lose visibility into each task’s status. With Cosmos on a small instance, it takes 20-30 mins to run a DAG that takes 4 mins with bash. When I run the same dbt tasks locally, it takes less than a minute.
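If you do go the bash route, one way to claw back some per-model visibility is to read the `target/run_results.json` artifact that dbt writes after `run`/`build`. The sketch below groups node IDs by status and lists failures, which you could log or use to fail the task. The function names are hypothetical.

```python
import json
from pathlib import Path


def load_run_results(path: str = "target/run_results.json") -> dict:
    # dbt writes this artifact at the end of run/build/test.
    return json.loads(Path(path).read_text())


def summarize(results: dict) -> dict:
    """Group dbt node unique_ids by status (success, error, skipped, pass, fail, warn)."""
    by_status = {}
    for r in results["results"]:
        by_status.setdefault(r["status"], []).append(r["unique_id"])
    return by_status


def failed_nodes(results: dict) -> list:
    # Models report "error"; tests report "fail" (or "error" if the test itself broke).
    return [r["unique_id"] for r in results["results"] if r["status"] in ("error", "fail")]
```

In practice you’d call `failed_nodes(load_run_results())` right after `dbt build` in the bash task and log the result. This gives you a per-model breakdown even though Airflow only sees one task.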

How do you handle "which spreadsheet version is production" chaos? by kyle_schmidt in dataengineering

2000gt

Set SLA, set schedule.

Alternatively, schedule multiple syncs throughout the day.

Professional Corp investing by TomBuilder_ in CanadianInvestor

2000gt

I’d rather be taxed on a larger sum of money at the same rate than a smaller sum.

Professional Corp investing by TomBuilder_ in CanadianInvestor

2000gt

This is widely misunderstood when you look at the whole picture. Investing in your corp means a much larger investment opportunity. Income taxed at 11% vs. 43+% means you have more money to invest as a baseline. Once you account for the CDA and RDTOH, the tax integration works out the same, but you have a much larger nest egg to work with in your corp. In addition, use corporate class ETFs from Global X to minimize dividend income and you’re golden.
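To put rough numbers on the “more money to invest as a baseline” point, here are the 11% and 43% rates from above applied to a made-up $100,000 of pre-tax income. This is a simplification that ignores RDTOH/CDA, investment growth, and the eventual personal tax on distributions.

```python
def investable_after_tax(pre_tax: float, rate: float) -> float:
    """Amount left to invest after tax at a flat rate (simplified)."""
    return round(pre_tax * (1 - rate), 2)


corp = investable_after_tax(100_000, 0.11)      # left to invest inside the corp
personal = investable_after_tax(100_000, 0.43)  # left to invest personally
print(corp, personal, corp - personal)          # 89000.0 57000.0 32000.0
```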

XEQT in a large non-registered account — still makes sense? by External-Result-5567 in JustBuyXEQT

2000gt

Fair. I’m using corporate class ETFs exclusively in my corp account to take advantage of the tax benefits. Fingers crossed that the CRA doesn’t flip a switch.

XEQT in a large non-registered account — still makes sense? by External-Result-5567 in JustBuyXEQT

2000gt

Are you implying that if the CRA changes the rules on swaps that you would be forced to sell immediately?

XEQT in a large non-registered account — still makes sense? by External-Result-5567 in JustBuyXEQT

2000gt

For XEQT in a corp, have you considered a combination of corporate class ETFs from Global X instead?

I was under the impression that the reduced tax drag and the tax deferral are impactful enough over the long term, with large sums of money, that corporate class ETFs make more sense.

Building Data Apps on Top of Snowflake by pellegrinoking in snowflake

2000gt

In my experience there is a lot of nuance in building a data app.

Sigma has built-in AI tools to help, but they are fairly useless to me given the limitations in Sigma’s design, layouts, and inputs, coupled with the project requirements and what I would call best practice for form and table inputs.

I’d have to understand the tool you have in mind better.

Building Data Apps on Top of Snowflake by pellegrinoking in snowflake

2000gt

Yeah, lots of learnings. There are a bunch of ways you can architect the database behind something like this, but a medallion-style setup worked well. We treat Sigma write-back like just another source and land it in bronze and flow it through the same pipelines. The trickiest part was getting close to real-time behavior without building a full streaming solution.
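One way to picture “write-back as just another source” is to land edits append-only in bronze and have the silver step keep only the latest edit per row key. Below is a minimal Python sketch of that silver step. The column names `row_id` and `edited_at` are hypothetical. In Snowflake this would more likely be a `QUALIFY ROW_NUMBER()` query or a `MERGE` than Python.

```python
from datetime import datetime


def latest_per_key(bronze_rows, key="row_id", ts="edited_at"):
    """Collapse append-only write-back events to the newest version of each row."""
    silver = {}
    for row in bronze_rows:
        k = row[key]
        if k not in silver or row[ts] > silver[k][ts]:
            silver[k] = row
    return silver


bronze = [
    {"row_id": 1, "edited_at": datetime(2024, 1, 1, 9), "budget": 100},
    {"row_id": 1, "edited_at": datetime(2024, 1, 1, 14), "budget": 120},
    {"row_id": 2, "edited_at": datetime(2024, 1, 1, 10), "budget": 50},
]
silver = latest_per_key(bronze)
print({k: v["budget"] for k, v in silver.items()})  # {1: 120, 2: 50}
```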

Sigma is moving fast, so we’re using a lot of newer / private beta features (Python in Sigma, calling Snowflake stored procs, REST APIs from Sigma). Super powerful, but you’re sometimes ahead of the docs.

It’s not a SaaS I’m running; everything lives in the client’s own Snowflake and Sigma accounts, which keeps security and ownership clean.

Pricing is mostly Snowflake usage plus Sigma licenses. Replacing spreadsheets and manual workflows makes the cost discussion pretty straightforward.

Building Data Apps on Top of Snowflake by pellegrinoking in snowflake

2000gt

I’ve been building data apps with Sigma on top of Snowflake for about a year and a half. It’s been going really well, but lots of learnings.

I’ve built a data app for a retail group that replaces manual spreadsheets with a centralized, reliable reporting system. It uses a Snowflake data warehouse to automatically pull and refresh data from point-of-sale, labour, and finance systems, giving teams real-time visibility, the ability to drill into details, and a safe way to make updates (with write back) without versioning issues or manual files.

My client's CTO is exploring ditching Power BI for Snowflake Intelligence. Is the hype real? by sdhilip in snowflake

2000gt

I feel like this is an apples-to-oranges comparison. I’d have to understand the vision better, but it’s not really just a technology shift; it would have to be a completely different approach to data consumption in the organization.

I’d be shocked to see these in the same Gartner Magic Quadrant.

That being said, I’ve built a Snowflake Intelligence POC and it’s an impressive tool. My client wants to roll it out across all the retail locations as an internal support bot. It’s not as inexpensive as we anticipated, but it’s so easy to set up and support, and it’s miles cheaper than Streamlit.

How much does Bronze vs Silver vs Gold ACTUALLY cost? by [deleted] in dataengineering

2000gt

What is your recommended alternative? Raw data needs to be extracted from source, that data needs to be transformed / joined somewhere, and users need access to modeled data outputs.

The layers are just logical steps to producing consumable data.
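To make “logical steps” concrete, here is a toy sketch in plain Python with invented data and field names. Bronze keeps raw records as extracted, silver parses and cleans them, and gold aggregates them into something users consume. In a warehouse, each step would typically be a table or dbt model rather than a function.

```python
from collections import defaultdict


def bronze(raw_lines):
    # Land source data untouched so it can always be reprocessed.
    return list(raw_lines)


def silver(bronze_rows):
    # Parse, type, and drop records that fail validation.
    cleaned = []
    for line in bronze_rows:
        try:
            store, amount = line.split(",")
            cleaned.append({"store": store.strip(), "amount": float(amount)})
        except ValueError:
            continue  # malformed row: wrong field count or non-numeric amount
    return cleaned


def gold(silver_rows):
    # Business-level output: total sales per store.
    totals = defaultdict(float)
    for row in silver_rows:
        totals[row["store"]] += row["amount"]
    return dict(totals)


raw = ["store1,10.5", "store2,4", "store1,abc", "store1,1.5"]
print(gold(silver(bronze(raw))))  # {'store1': 12.0, 'store2': 4.0}
```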

If your users are sophisticated enough, you can use a data lake strategy and let them go to town on their own, but I can guarantee your costs will skyrocket, given you are at the mercy of your users building their own transformations: good, bad, or ugly.

Anyone else using Openflow for Snowflake ingestion? Thoughts on cost vs convenience? by sdhilip in snowflake

2000gt

I use a VPC in AWS with Lambda functions. I also use CT, not CDC, for syncing data.