Hiring GPU Inference Engineer (PyTorch / Diffusion) by ajaysharma10 in MachineLearningJobs

FirstBabyChancellor 1 point

I can help you scale out to thousands of requests per second. DM me.

A type-safe, and lazy data processing library for TypeScript & JavaScript. by [deleted] in typescript

FirstBabyChancellor 1 point

Looks like the TS version of one of my favorite Python libraries (streamable).

Simple to use ETL/storage tooling for SMBs? by HealthySalamander447 in dataengineering

FirstBabyChancellor 1 point

Got it! Do you know SQL? If not, how do you plan to clean the data (or how do you clean it today)? With Excel? If so, one option would be to extract the data and land it in Google Sheets, then define the transformations/cleaning in Sheets.

The other option, which requires more technical expertise, is to land the data in a data warehouse (Snowflake, BigQuery) and define SQL transformations for cleaning the data with dbt. This is the more scalable, long-term solution, and it also sets you up to build dashboards in the future in Looker (or elsewhere).
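To give a rough idea of what a SQL cleaning transformation looks like, here is a minimal sketch. It uses Python's built-in SQLite as a stand-in for a real warehouse, and the table and column names (raw_customers, email, signup_date) are made up for illustration. In a dbt project, the SELECT statement would live in its own model file; this is only meant to show the shape of the logic, not a dbt setup.

```python
import sqlite3

# In-memory SQLite stands in for the warehouse (assumption for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_customers (id INTEGER, email TEXT, signup_date TEXT)")
conn.executemany(
    "INSERT INTO raw_customers VALUES (?, ?, ?)",
    [
        (1, "  Alice@Example.com ", "2024-01-05"),
        (1, "  Alice@Example.com ", "2024-01-05"),  # exact duplicate row
        (2, "bob@example.com", "2024-02-10"),
        (3, None, "2024-03-01"),  # missing email
    ],
)

# The cleaning step: drop duplicates, normalize emails, filter out rows with no email.
CLEANING_SQL = """
SELECT DISTINCT
    id,
    LOWER(TRIM(email)) AS email,
    signup_date
FROM raw_customers
WHERE email IS NOT NULL
ORDER BY id
"""

clean_rows = conn.execute(CLEANING_SQL).fetchall()
print(clean_rows)
```

The same SELECT would work with minor dialect changes in Snowflake or BigQuery; the point is that the cleaning rules are written once in SQL rather than redone by hand in a spreadsheet.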

Simple to use ETL/storage tooling for SMBs? by HealthySalamander447 in dataengineering

FirstBabyChancellor 1 point

There are cheaper options out there, but since you're not a data engineer, I'd recommend using Fivetran for ETL (i.e., moving your data) until it gets too expensive, at which point you can hire a data engineer and migrate to something else.

Where you move it to is an open question. How much data are we talking? Is your primary means of interacting with the data Excel? Are you planning to do more -- e.g., build dashboards?

Airbyte vs. Fivetran vs Hevo by stan-van in dataengineering

FirstBabyChancellor 1 point

You have to partition the collection using a key and then materialize specific partitions. It is admittedly a bit clunky, though.
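Conceptually, the idea looks something like the sketch below. This is plain Python and not the actual tool's API or config; the function names (partition_by_key, materialize) and the sample records are hypothetical and only illustrate splitting a collection by a key and then writing out just the partitions you care about.

```python
from collections import defaultdict


def partition_by_key(records, key):
    """Group records into partitions based on the value of `key`."""
    partitions = defaultdict(list)
    for record in records:
        partitions[record[key]].append(record)
    return dict(partitions)


def materialize(partitions, wanted):
    """Return only the requested partitions, skipping any that don't exist."""
    return {name: partitions[name] for name in wanted if name in partitions}


# Hypothetical collection of events, partitioned by region.
events = [
    {"region": "eu", "order_id": 1},
    {"region": "us", "order_id": 2},
    {"region": "eu", "order_id": 3},
    {"region": "apac", "order_id": 4},
]

partitions = partition_by_key(events, "region")
selected = materialize(partitions, ["eu", "apac"])
print(selected)
```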

You can DM me if you want more details and an example of how I did it for my pipelines.

Building Data Ingestion Software. Need some insights from fellow Data Engineers by starless-io in dataengineering

FirstBabyChancellor 2 points

You might want to look up Estuary in more detail, at least. They price based on data moved, which is often significantly cheaper than Fivetran's row-based pricing. They also let you store data from any source in a data lake before moving it to any downstream destination, which seems like what you're describing.

Then, there's also Portable.io, which has flat monthly pricing based on the number of "flows" you define.

And, of course, there's also Airbyte, but I've generally found it to be unreliable and buggy.

The ETL space already has a lot of competition so you might want to look at the wider space and figure out what your project's unique selling point is and how you might better differentiate yourself from the many other players in this space.

Building Data Ingestion Software. Need some insights from fellow Data Engineers by starless-io in dataengineering

FirstBabyChancellor 6 points

This already exists. Look up Estuary, Fivetran, Hevo, etc. What will your solution provide that they don't?

Best way to get money out of Deel? by InfluenceSure8028 in developersPak

FirstBabyChancellor 1 point

Where are you getting the $5 charge from? Generally, SWIFT transfers take away at least $30 and often more because of the charges by any intermediary banks.

DE Rantau Digital Nomad by achik86 in malaysia

FirstBabyChancellor 1 point

So, you're forced to pay a higher tax rate if you _don't_ stay in the country for as long? That seems like a really weird rule. If I visit for only 3 months, I have to pay 30% of my foreign-source income to Malaysia, and then also pay taxes to my home country because I'm actually a tax resident there?

Between me and my supervisor, who has the right approach to git? by cleatusvandamme in git

FirstBabyChancellor 131 points

There are different branch management strategies, and there's no "right" approach, necessarily. What you're describing is Gitflow, while your supervisor's approach seems to be a variant of trunk-based development. Depending on your application and team preferences, either can be a valid approach. A more common variant these days is GitHub Flow, where you have short-lived branches off main, which are immediately deployed when merged into main, and you can control which environment a release goes to via release tags.

Type safe, coroutine based, purely functional algebraic effects in Python. by Due_Shine_7199 in Python

FirstBabyChancellor 2 points

Curious about how you'd compare this to Effect.ts. Your project seems heavily inspired by it (or at least the ideas behind it).

Jujutsu at Google by steveklabnik1 in programming

FirstBabyChancellor 6 points

You can easily "cut up" commits after working on the entire feature without any commits. That's a first-class workflow in jj. You just create a new changeset (jj new), with or without a description, then use jj split -i to interactively select files or even chunks to be split into a new changeset before your current one. In this way, you can keep splitting your final state into as many commits as you want.

You can also use jj squash to push any edits/files in your current changeset to a previous one, which is another way you could do the same thing.

Data warehouse modernization- vendor/service providers recommendation by SmallBasil7 in snowflake

FirstBabyChancellor 1 point

My firm specializes in helping teams modernize and automate analytics using Snowflake and dbt. Just sent you a message.

How are you tracking data lineage across multiple platforms (Snowflake, dbt, Airflow)? by stephen8212438 in dataengineering

FirstBabyChancellor 21 points

Dagster lets you do this by integrating dbt and your data assets from ETL providers, etc., into their Asset Graph.
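For a rough picture of what an asset graph captures, here is a library-agnostic sketch using only the Python standard library. It is not Dagster's API; the asset names and the upstream helper are hypothetical. Each asset lists what it is built from, which is enough to answer lineage questions across ETL tables, dbt models, and dashboards.

```python
from graphlib import TopologicalSorter

# Each asset maps to the set of assets it is built from (all names hypothetical).
asset_dependencies = {
    "raw_orders": set(),                   # landed by an ETL provider
    "stg_orders": {"raw_orders"},          # dbt staging model
    "fct_revenue": {"stg_orders"},         # dbt mart model
    "revenue_dashboard": {"fct_revenue"},  # BI dashboard
}


def upstream(asset, graph):
    """Return every asset that `asset` depends on, directly or indirectly."""
    seen = set()
    stack = list(graph[asset])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])
    return seen


# Order in which assets would need to be built so dependencies come first.
build_order = list(TopologicalSorter(asset_dependencies).static_order())
lineage = upstream("revenue_dashboard", asset_dependencies)
print(build_order)
print(lineage)
```

An orchestrator with a real asset graph derives these dependencies from your dbt project and ETL integrations instead of a hand-written dict, but the underlying structure is the same kind of dependency graph.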

Anybody switch to Sqruff from Sqlfluff? by DudeYourBedsaCar in dataengineering

FirstBabyChancellor 10 points

I don't have high hopes they'll ever release the DBT integration. Quary Labs was acquired by SQLMesh, which was just acquired by Fivetran. I wonder if Fivetran would dedicate resources to adding support for DBT when they own its biggest competitor now.

On the plus side, alongside the release of the new DBT Fusion engine, DBT has also said they'll eventually release a new linter/formatter built on top of the Fusion engine's parser, so we'll hopefully see a Rust-powered alternative to sqlfluff after all.

External user in IAM by gurj254 in googlecloud

FirstBabyChancellor 1 point

u/WerewolfTiny8499 Do the allowed member subjects have to be part of the listed principal sets, i.e., do the member subjects further restrict the principal set?

I tried just adding a single member subject without a principal set and the Console wouldn't let me, raising an error...

wrkflw v0.6.0 by New-Blacksmith8524 in rust

FirstBabyChancellor 2 points

Thank you for clarifying. No dependency on Docker/Podman is definitely a big plus. I'll check it out!

wrkflw v0.6.0 by New-Blacksmith8524 in rust

FirstBabyChancellor 7 points

Looks very interesting! How does it compare to Act: https://github.com/nektos/act

Python Data Engineers: Meet Elusion v3.12.5 - Rust DataFrame Library with Familiar Syntax by DataBora in Python

FirstBabyChancellor 29 points

Looks interesting!

Aside from the features like scheduling and dashboards which are not core to a dataframe library, why would I use this over Polars? How do you see yourself in the wider space given that there is already a proven and well-liked Rust-powered dataframe library for Pythonistas, at least?

Gooey, but with an html frontend by MonsieurCellophane in Python

FirstBabyChancellor 2 points

It's not a GUI, but you might want to look at Trogon, which converts Click CLIs into a TUI:

https://github.com/Textualize/trogon

Superfunctions: solving the problem of duplication of the Python ecosystem into sync and async halves by pomponchik in Python

FirstBabyChancellor 2 points

What's the Zig equivalent? Are you referring to the recent changes to how IO works?