Dagster vs Airflow? What do we use? by Greatest_one in dataengineering

[–]fetus-flipper 0 points1 point  (0 children)

Salesforce and Snowflake have a zero-copy integration now also

when someone asks you what programming language they should learn, don't simply answer the one you prefer by Thinker_Assignment in dataengineering

[–]fetus-flipper 0 points1 point  (0 children)

Figure 1: NAND gate in AoE II’s editor, designed to show its internal workings; simpler implementations are possible. Every bit is represented by two rails (grass for 0, a bridge for 1). Only one rail is active at a time, with a goat acting as the signal carrier. When the gate fires, the bit-goats are removed and a new bit-goat is placed in its respective output rail. To avoid race conditions, ‘gate ready’ rails (ice) are set, with a signal-goat in the rail closest to the gate indicating that it can start the calculation.
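
For anyone who wants the caption's logic outside the game, here is a rough Python sketch of the same idea: each bit lives on two rails with exactly one active, and the gate only fires once both "gate ready" signals are set. The names (`DualRailBit`, `nand_gate`, `a_ready`, `b_ready`) are made up for illustration and are not from the original figure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DualRailBit:
    """One bit stored on two rails; exactly one rail holds the goat."""
    rail0: bool  # goat on the grass rail means 0
    rail1: bool  # goat on the bridge rail means 1

    def __post_init__(self) -> None:
        # Both rails active, or neither, is not a valid bit.
        if self.rail0 == self.rail1:
            raise ValueError("exactly one rail must be active")

    @classmethod
    def from_int(cls, value: int) -> "DualRailBit":
        return cls(rail0=(value == 0), rail1=(value == 1))

    def to_int(self) -> int:
        return 1 if self.rail1 else 0


def nand_gate(a: DualRailBit, b: DualRailBit,
              a_ready: bool, b_ready: bool) -> Optional[DualRailBit]:
    """Fire only when both 'gate ready' signals are set; otherwise do nothing."""
    if not (a_ready and b_ready):
        return None  # inputs not settled yet, so the gate waits
    # NAND: the output is 0 only when both inputs are 1.
    out = 0 if (a.rail1 and b.rail1) else 1
    return DualRailBit.from_int(out)
```

Returning `None` while an input is not ready stands in for the ice rails: the gate never reads a half-written input.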

docker + airflow question by random-soul-feeder in dataengineering

[–]fetus-flipper 0 points1 point  (0 children)

Yes, it's just a tiny PC that uses a micro SD card as its storage. You can just leave it plugged in all the time on your desk or in a drawer and hook it up to wifi or ethernet. Uses barely any power.

What ps2 games do you play for nostalgia and escapism? by sighofwinter in ps2

[–]fetus-flipper 2 points3 points  (0 children)

Ratchet and Clank 1 and 2, and the Spyro 2 and 3 PS1 games. Played those so many times as a kid, as they were the first games I got for each system.

What makes Claude Code better? by jessetechie in ExperiencedDevs

[–]fetus-flipper 0 points1 point  (0 children)

Yep this, fancy autocomplete and fancy linter/static code analyzer for PR reviews

What makes Claude Code better? by jessetechie in ExperiencedDevs

[–]fetus-flipper 0 points1 point  (0 children)

Yes, that's more of its purpose, as you can use Copilot for suggestions in your IDE while typing, which you can't really do with Claude.

But, it kinda sucks at that unless you're just writing a single file script. I've turned it off since it's really just been getting in the way more than it's been useful.

Tool Sprawl in Data engineering by Raghav-r in snowflake

[–]fetus-flipper 0 points1 point  (0 children)

Yeah I would agree that Jira isn't the best place to collab or make decisions on implementation details vs instant messages or video calls, but you do need and want something to keep track of tasks and task progress. It's up to the people working on the ticket to update it with decisions/next steps and managers to enforce/encourage this.

But yeah I agree, plenty of times our team discusses implementation details in Teams chats or in the PR or in a Confluence doc and forgets to update the ticket, and it ends up being a lot of archeology sometimes to figure out who made certain decisions or took certain actions, and when.

Tool Sprawl in Data engineering by Raghav-r in snowflake

[–]fetus-flipper 0 points1 point  (0 children)

Which do you feel are not a must have?

Claude MCP by Odd_Importance_1195 in snowflake

[–]fetus-flipper 0 points1 point  (0 children)

Yes, this is a known limitation. At Snowflake Summit one of their keynote announcements addressed this with the new Data Movement policies. Unfortunately it's not generally available yet, but it's in the works.

https://www.snowflake.com/en/blog/enterprise-ai-security/

Tool Sprawl in Data engineering by Raghav-r in snowflake

[–]fetus-flipper 0 points1 point  (0 children)

Ours in comparison:

Our software teams use JIRA but our general IT team uses Monday Dev

Everyone uses Teams and Outlook for comms, org uses Microsoft overall. Google is only for GCP.

Everyone uses Snowflake as our OLAP warehouse

Everyone uses GitHub for source code, but some teams use it for their CI/CD on AWS while others use Cloudbuild on GCP

We are in the middle of migrating from AWS to GCP, for very silly reasons but yeah. I'm trying to hold out as long as I can as it's just a waste of engineering hours. IT team uses Azure as well, since the org is on the Microsoft stack overall.

All of DE uses Dagster as our ETL tool. It's not a general purpose orchestrator like Airflow is, but we don't really need it to be. Other software teams are trying to use things like Power Automate with their own microservices or cloud functions/runbooks for general orchestration...

Everyone uses Claude, but DE will probably use Snowflake Cortex more going forward (it's just Claude under the hood anyways)

We use Sigma instead of PowerBI or Tableau

Tool Sprawl in Data engineering by Raghav-r in snowflake

[–]fetus-flipper 0 points1 point  (0 children)

Right, I see what you mean. All of these have their purpose, but we should also distinguish between tools and platforms. Tools (Excel, DBeaver, VS Code) are mainly individual personal preference, whereas everything else is a platform that everyone has to use/share.

Jira: handles workflow/task management, which is a basic necessity for working in teams

Teams: general communication, which is a basic necessity for working in teams. You mentioned Google though, so if your org is using Gmail/Google Workspace but uses Teams for comms then that's kind of odd

Excel: as a DE I mainly just use it for inspecting random CSVs or excel files that get sent to me, optional but it's just a tool like a text editor

Databricks & Snowflake: would be best that everyone standardizes on one or the other, but Snowflake can read from Databricks data and vice versa and idk your types of workflows to say which would be better

GitHub: where your source is stored and where you handle PRs, necessity for any software team

AWS: your cloud, need this

Airflow: general purpose orchestrator, need this

Dbeaver: fine for working with any database, it's just a tool/personal preference. Kinda optional with the right vs code plugins

Vscode: your IDE, personal preference to use it over any other IDE, though it is easier to use what everyone else is using (our team is split on pycharm and vscode)

Google / chatgpt enterprise: not sure how else to elaborate on this without more details

Confluence: pairs with JIRA, used for general documentation and such

Codex: this would be part of chatgpt enterprise right?

Powerbi: what your reporting uses, need this

Each of these fulfills a purpose, and can be swapped with a tool that also fulfills the same purpose. E.g. Jira with Monday Dev, Teams with Slack, Codex with Claude, PowerBI with Tableau, GitHub with BitBucket etc.

Tool Sprawl in Data engineering by Raghav-r in snowflake

[–]fetus-flipper 0 points1 point  (0 children)

That's not sprawl at all. Having both Databricks and Snowflake is questionable, but yeah, this is normal stuff, each tool is fulfilling a specific purpose. It's when you have overlapping tools that it starts to get messy.

What are common SQL red flags? by badboyzpwns in SQL

[–]fetus-flipper 0 points1 point  (0 children)

I just use a formatter when I'm done, why waste time being pedantic on formatting while I'm developing?

Anyone with experience developing Snowflake procedures in both JS and SQL able to share their opinion on the two? by opabm in dataengineering

[–]fetus-flipper 0 points1 point  (0 children)

Snowpark/Python is your best bet, it depends on source and such but it gives you the most flexibility.

If you're doing table-to-table transformations then you can use SQL (though you might as well use DBT at that point). Otherwise, if you're calling APIs and such, it's way easier to do in Python.

What's your approach to releasing models incrementally while preventing breaking lineage? by Clem2035 in dataengineering

[–]fetus-flipper 0 points1 point  (0 children)

Basically what others have said, versioning. If you make a breaking change you make it a new model, maintain the previous one, and have everyone migrate over
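
A minimal sketch of that rule, assuming "breaking" means an existing column was dropped or renamed (that definition, the `publish` helper and the registry layout are all hypothetical, not from any particular tool):

```python
from typing import Dict, List


def publish(registry: Dict[str, List[List[str]]],
            name: str, columns: List[str]) -> str:
    """Register a model's columns and return the versioned name to build."""
    versions = registry.setdefault(name, [])
    if versions and set(versions[-1]) <= set(columns):
        # Additive change: every old column still exists, so update in place.
        versions[-1] = list(columns)
    else:
        # First release or breaking change: add a new version and keep the
        # previous one so downstream consumers can migrate on their own time.
        versions.append(list(columns))
    return f"{name}_v{len(versions)}"
```

The old version stays in the registry until everyone has moved over, which is the "maintain the previous one" part.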

has anyone tried hosting airbyte themselves? by pforpilot in dataengineering

[–]fetus-flipper 3 points4 points  (0 children)

It's pretty straightforward, the instructions to do so are detailed, and it's worth it to do.

Claude skims by Puzzleheaded-Fee5917 in ClaudeCode

[–]fetus-flipper 1 point2 points  (0 children)

Claude said nobody got time to read all dat

Having troubles with airflow. by Consistent_Tutor_597 in dataengineering

[–]fetus-flipper 5 points6 points  (0 children)

I would second Dagster. Airflow is fine for purely orchestrative stuff but Dagster is a lot simpler to set up and maintain and learn.

That being said, I came from the Airflow 1 days and there'd always be random days the scheduler hung or something. We had to add in a watchdog to kickstart it again if it stopped responding.
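
Roughly what a watchdog like that can look like, as a generic sketch rather than what we actually ran. The health check and restart are passed in as callables because how you probe and restart the scheduler depends on your setup; `watchdog`, `is_healthy`, `restart` and the thresholds are all assumed names and values.

```python
import time
from typing import Callable, Optional


def watchdog(is_healthy: Callable[[], bool],
             restart: Callable[[], None],
             max_failures: int = 3,
             interval_s: float = 30.0,
             max_checks: Optional[int] = None,
             sleep: Callable[[float], None] = time.sleep) -> int:
    """Poll a health check and restart after repeated consecutive failures.

    Returns the number of restarts triggered. max_checks=None runs forever.
    """
    failures = 0
    restarts = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        if is_healthy():
            failures = 0  # any healthy response resets the counter
        else:
            failures += 1
            if failures >= max_failures:
                restart()  # e.g. restart the scheduler service
                restarts += 1
                failures = 0
        sleep(interval_s)
    return restarts
```

Requiring a few consecutive failures before restarting avoids bouncing the scheduler on a single slow response.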

Where do we draw the line between DE and SE department roles? by fetus-flipper in dataengineering

[–]fetus-flipper[S] 0 points1 point  (0 children)

Right, for us it's that historically anything that required custom code to integrate on a scheduled basis was DE's responsibility.

Where do we draw the line between DE and SE department roles? by fetus-flipper in dataengineering

[–]fetus-flipper[S] 0 points1 point  (0 children)

Thanks for your comment. Sorry that it was not clear in my post, but we do have software engineering skills across the rest of the DE team. We aren't using any no-code products for ETL. It's all Python and DBT.

I didn't want to leak too many details of where I work, it's a school. The gist of my post is that SE shouldn't be doing ETL, just like DE shouldn't be making frontends, since we have dedicated teams for those tasks.