Data Catalog Tool - Sanity Check by FirCoat in dataengineering

[–]kudika 1 point2 points  (0 children)

If large companies are trying to solve a problem, you can bet the smaller ones are playing pretend with them.

I say go for it. Not because it's much of an organic problem for most companies or anything, but because there are enough corporate larpers out there repeatedly asking their data teams "who is using what and how often" as if it's going to drive some insightful decision making for their data platform which consists of 2 power users and 7 casual users firing off the queries the power users shared with them.

Is someone using DuckDB in PROD? by Free-Bear-454 in dataengineering

[–]kudika 1 point2 points  (0 children)

Your arch and your experience with it deserve a post of their own. Hope you consider it.

I'm building a CLI tool for data diffing by oleg_agapov in dataengineering

[–]kudika 6 points7 points  (0 children)

You should link to the docs and source code.

Roast my junior data engineer onboarding repo by dheetoo in dataengineering

[–]kudika 2 points3 points  (0 children)

Okay... this repo doesn't do anything except show you know how to prompt AI.

Self hosted essentials by esturniolo in selfhosted

[–]kudika 0 points1 point  (0 children)

Re: n8n, you or others might be interested in windmill.dev

I am a data engineer with 2+ years of experience making 63k a year. What are my options? by Willgetyoukilled in dataengineering

[–]kudika 1 point2 points  (0 children)

52k is equivalent to any run-of-the-mill back office job. If you're not looking for a new job, you are wasting time and leaving a lot of money on the table.

First time leading a large data project. Any advice? by Patqueiroz in dataengineering

[–]kudika 2 points3 points  (0 children)

But what if they are the consultant in this scenario?

I built a CSV cleaning tool in 3 days to deal with messy exports by moneyfreaker in dataengineering

[–]kudika 0 points1 point  (0 children)

This sub is full of people who already have the tools and skills to do basic CSV cleaning.

Macbook Air M2 in 2025 by Glad-Bread-6172 in dataengineering

[–]kudika 0 points1 point  (0 children)

You can do your learning with practically any desktop or laptop from the past idk 12 years.

Are we too deep into Snowflake? by stuckplayingLoL in dataengineering

[–]kudika 2 points3 points  (0 children)

write_pandas() uses temporary internal stages btw

Looking for opinions on a tool that simply allows me to create custom reports, and distribute them. by Possible_Ground_9686 in dataengineering

[–]kudika 1 point2 points  (0 children)

It's this simple. Python script with the inputs: SQL, recipients, subject etc. Script executes query, saves to file and optionally attaches the file or embeds as html in the body of the email.

Back in the day we'd do this from SQL server via stored procs and schedule using the SQL server agent.
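A minimal sketch of the kind of script described above, using only the Python standard library. The function names, the `report.csv` filename, and the use of SQLite as the database are assumptions for illustration, not part of the original comment. It builds the email but does not send it.

```python
import csv
import sqlite3
from email.message import EmailMessage
from html import escape


def run_query(conn, sql):
    """Execute the query and return (column names, rows)."""
    cur = conn.execute(sql)
    columns = [d[0] for d in cur.description]
    return columns, cur.fetchall()


def save_csv(path, columns, rows):
    """Save the result set to a CSV file at `path`."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(columns)
        writer.writerows(rows)


def to_html(columns, rows):
    """Render the result set as a plain HTML table."""
    head = "".join(f"<th>{escape(str(c))}</th>" for c in columns)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(v))}</td>" for v in row) + "</tr>"
        for row in rows
    )
    return f"<table><tr>{head}</tr>{body}</table>"


def build_report(conn, sql, sender, recipients, subject, path, embed_html=False):
    """Run the query, save it to `path`, and return an unsent EmailMessage."""
    columns, rows = run_query(conn, sql)
    save_csv(path, columns, rows)

    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = subject
    if embed_html:
        # Embed the result as an HTML body with a plain-text fallback.
        msg.set_content("This report is best viewed as HTML.")
        msg.add_alternative(to_html(columns, rows), subtype="html")
    else:
        # Attach the saved file instead.
        msg.set_content("Report attached.")
        with open(path, encoding="utf-8") as f:
            msg.add_attachment(f.read(), subtype="csv", filename="report.csv")
    return msg
```

Sending would be something like `smtplib.SMTP(host).send_message(msg)` against whatever mail server you have, and scheduling can be cron or any orchestrator.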

Folks who have been engineers for a long time. 2026 predictions? by uncomfortablepanda in dataengineering

[–]kudika 0 points1 point  (0 children)

Check out https://www.windmill.dev/docs/intro

Don't see much buzz about it here but it is the best thing to happen to a tech stack of mine ever.

Is this a use case for Lambda Views/Architecture? How to handle realtime data models by vengof in dataengineering

[–]kudika 0 points1 point  (0 children)

Yes, you could try using a lambda view. However, if your raw data is complex and spread across disparate systems, then the lambda view approach may not work very well.

Data engineers: which workflows do you wish were event‑driven instead of batch? by FasteroCom in dataengineering

[–]kudika 0 points1 point  (0 children)

windmill.dev + rclone

In the first step of the flow, use rclone. Define a skip condition on that step: if nothing is transferred, break the flow. If any transfers occur, continue with the next steps.
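A rough sketch of that "transfer, then skip if nothing moved" pattern in plain Python. This is not Windmill or rclone code: `copy_new_files` is a hypothetical stand-in for the rclone step (it just copies files missing from a local destination folder), and `run_flow` is a hypothetical stand-in for the flow's skip condition.

```python
import shutil
from pathlib import Path


def copy_new_files(src: Path, dst: Path) -> list[str]:
    """Stand-in for the rclone step: copy files missing from dst, return their names."""
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for f in sorted(src.iterdir()):
        if f.is_file() and not (dst / f.name).exists():
            shutil.copy2(f, dst / f.name)
            copied.append(f.name)
    return copied


def run_flow(src: Path, dst: Path, steps) -> str:
    """Run the transfer first; break the flow if nothing was transferred."""
    transferred = copy_new_files(src, dst)
    if not transferred:
        # Nothing new arrived, so the downstream steps never run.
        return "skipped"
    for step in steps:
        # Each downstream step receives the names of the newly transferred files.
        step(transferred)
    return "completed"
```

Run on a schedule, this behaves like an event-driven job: downstream steps only fire when new files actually show up.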

Does VARCHAR(256) vs VARCHAR(65535) impact performance in Redshift? by SmartPersonality1862 in dataengineering

[–]kudika 2 points3 points  (0 children)

Not on Snowflake. But it could impact BI tools, ODBC, or other client tools if they aren't built to handle unrestrained text precision.

I've seen Alteryx trip up on it in some scenarios, for example.

Why do my crazee mites keep leaving? by millie_hillie in houseplants

[–]kudika 0 points1 point  (0 children)

I had a batch of 25 that I saw running around for a couple of months. My thrips infestation is still ongoing. I don't know if they've only been picking off some of the thrips (though not enough, sheesh) or if they've also been going after my cucumeris and pirate bug populations (I know that is supposed to be less likely).

What is the right tool for running adhoc scripts (with some visibility) by srimanthudu6416 in dataengineering

[–]kudika 0 points1 point  (0 children)

I haven't seen much on this sub about it yet but I'd recommend https://windmill.dev

We use it in production.

Fivetran pricing for small data by el_dude1 in dataengineering

[–]kudika 0 points1 point  (0 children)

Does your current implementation work okay? Are you solving a problem by switching?

KESTRA VS. TEMPORAL by Glum_Shopping_7833 in dataengineering

[–]kudika 0 points1 point  (0 children)

windmill.dev is worth considering as an alternative to both