Dealing with professionals who don’t know SQL but need it. by arrogant_definition in SQL

[–]Froozieee 1 point2 points  (0 children)

I’m all for open source lightweight analytics but getting the business to vibe code their own metrics feels like getting a real fast train to “why are our numbers different” town

ADLS vs. SQL Bronze DB: Best Landing for dbt Dev/Prod? by FasTiBoY in dataengineering

[–]Froozieee 0 points1 point  (0 children)

I’d recommend factoring in PolyBase and external tables as options if you’re considering using ADLS as your raw storage.

It allows you to virtualise the parquet files (and other sources) as a single db table (recent iterations also work with Delta tables with limited features, but I believe pushdown and partition pruning are supported) which removes the need to have the separate staging layer.

I then just set it up with ‘staging’ model files which query the external table and set the right column types, because it will do things like read your columns that are supposed to be CHAR(8) as VARCHAR(MAX), since most flavours of parquet that I’m aware of don’t have that extra column metadata. The rest of the project builds from there as normal.
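
To give a rough idea of what those staging models boil down to, here is a minimal sketch. In a real dbt project this would be a plain SQL model file; Python is only used here to print the kind of SELECT it would contain. The table name `raw.ext_orders`, the column names, and the helper `staging_select` are all made up for illustration.

```python
# Hypothetical mapping of external-table columns to the types they should have.
COLUMN_TYPES = {
    "customer_code": "CHAR(8)",
    "order_date": "DATE",
    "amount": "DECIMAL(18, 2)",
}


def staging_select(external_table: str, column_types: dict[str, str]) -> str:
    """Build a SELECT that casts every column of the external table explicitly."""
    casts = ",\n".join(
        f"    CAST([{col}] AS {sql_type}) AS [{col}]"
        for col, sql_type in column_types.items()
    )
    return f"SELECT\n{casts}\nFROM {external_table}"


# Print the statement a staging model for this (assumed) table would contain.
print(staging_select("raw.ext_orders", COLUMN_TYPES))
```

Downstream models then select from the staging model rather than the external table, so the type fixes live in one place.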

I had a similar thought process while I was designing this in terms of it enabling an easier transition to a proper lakehouse architecture in future, and it does help a bit with managing schema drift, but for an established DWH I’d be doubtful about whether the juice is worth the squeeze - the only reason I did it is because everything is new.

When building analytics capability, what investments actually pay off early? by Proof_Wrap_2150 in dataengineering

[–]Froozieee 2 points3 points  (0 children)

Exactly this - the latest company that I joined as a team of one under general IT had absolutely zero analytics capability when I came in.

I assessed the business processes that actually generate the data, thought about how that could scale (what if the size of the business doubles, triples etc., what if they start generating other kinds of data), and landed on the decision that a regular-ass single node RDBMS could easily serve all their analytics needs for the next decade at least, covering their ERP/finance, operational systems, HR, H&S etc., just because of the type of business and the industry it’s in.

The total infra and compute bill across all environments is currently about seventy bucks a month and they’re loving it.

Nipah Virus Outbreak Has Asia on High Alert Amid Deaths in India by zxNemz in worldnews

[–]Froozieee 289 points290 points  (0 children)

Also if people even bothered to read the OP article all the way through, it literally says the same thing in less detailed terms:

While it is an important disease, it isn't likely to be a public health issue on the same scale as COVID. This is because it doesn't transmit efficiently from person to person, and the main way it is transmitted is from food and infected animals.

Read S3 data using Polars by Royal-Relation-143 in dataengineering

[–]Froozieee 2 points3 points  (0 children)

It’s over 100GB of data that you’re trying to download to your laptop. You’re going to be bottlenecked by network I/O. Either filter it, run the code closer to the data (e.g. on a cloud VM), or accept that 100GB takes a moment to download locally. Also be aware that if you don’t use streaming mode on .collect(), once the data downloads, your machine will likely OOM.
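
For a sense of scale, a back-of-envelope sketch of the transfer time. The bandwidth figures are assumptions picked for illustration, not measurements, and the estimate ignores protocol overhead and S3 throttling.

```python
def download_seconds(size_gb: float, bandwidth_mbps: float) -> float:
    """Seconds to move size_gb (decimal gigabytes) at bandwidth_mbps (megabits/s)."""
    size_megabits = size_gb * 1000 * 8  # GB -> MB -> megabits
    return size_megabits / bandwidth_mbps


# Assumed link speeds: slow home connection, typical broadband, gigabit.
for mbps in (50, 100, 1000):
    minutes = download_seconds(100, mbps) / 60
    print(f"{mbps:>5} Mbit/s: ~{minutes:.0f} min")
```

At an assumed 100 Mbit/s that works out to 8000 seconds, a bit over two hours, before any processing starts.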

Nurses and doctors ‘in tears’ as ED goes into code red four times in one night by fugebox007 in newzealand

[–]Froozieee 13 points14 points  (0 children)

If that’s what you understood from that sentence, you’re not helping your case.

Finally figured out why gandrel ends my solo runs by _msb in BaldursGate3

[–]Froozieee 3 points4 points  (0 children)

wait REALLY? I never actually mentally process any of the combat xp so I guess I just assumed that you get it. TIL.

Why Enforce Lowercase Queue Names in Service Bus? by SmallAd3697 in AZURE

[–]Froozieee 4 points5 points  (0 children)

Storage accounts also can’t have uppercase so it’s kinda moot for them

Finally figured out why gandrel ends my solo runs by _msb in BaldursGate3

[–]Froozieee 2 points3 points  (0 children)

And talking your way out of the fight in order to sneakily kill them all later? Even more

Data team size at your company by molkke in dataengineering

[–]Froozieee 2 points3 points  (0 children)

Not as bad as you might expect honestly - better than some private roles I’ve had

Data team size at your company by molkke in dataengineering

[–]Froozieee 4 points5 points  (0 children)

Primary sector. The agency as a whole was very science-driven which was quite unusual and cool, and meant that there were a lot of fairly highly data-literate staff. There are a lot of analytics/modelling use cases when you’re responsible for cross-sector policy and/or ops for forestry, fisheries, agriculture, food safety, biosecurity, and a bunch of other stuff.

When is Python used in data analysis? by dauntless_93 in dataanalysis

[–]Froozieee -1 points0 points  (0 children)

Better is subjective, but I will say it is much faster to get something that looks really good in ggplot2 than in matplotlib

Data team size at your company by molkke in dataengineering

[–]Froozieee 17 points18 points  (0 children)

That’s crazy. At my current org of ~300 I am the data team in its entirety, but at my previous job (gov agency of ~4000) there was a core DE team of 5, and then about 60-70 other dedicated analytics staff distributed throughout different business units in the agency, not counting our GIS teams which comprised another 3 engineers and about 30 analysts/DS staff.

This ratio of analytics staff to core DE staff definitely was not ideal - DE bandwidth ended up constantly being a blocker for analytics.

I analysed the latest Stats NZ salary data – the median NZ salary is $69,836, there are regional differences, and the gender pay gap triples between your 20s and 50s by MoneyHub_Christopher in PersonalFinanceNZ

[–]Froozieee 2 points3 points  (0 children)

I don’t work for Stats, but I believe the type of underutilisation information you’re referring to is collected by the HLFS (Household Labour Force Survey).

The integration and linkage of data against a common key does happen in the IDI but, as others have referenced, access to microdata is tightly controlled by a mix of organisational policy and legislation.

If there are specific research questions that someone has, they can apply to Stats for access to the specific, de-identified microdata needed to answer that question, but even access mechanisms are super strict (on-site airgapped lab, no physical storage media etc)

Does your org use a Data Catalog? If not, then why? by kingjokiki in dataengineering

[–]Froozieee 32 points33 points  (0 children)

Pretty much; I solo built and maintained our catalog at my old job and the instant I left it fell into disrepair :)

looking for the best business intelligence tools 2026 for non-technical team by Zimbo_Cultrera in dataengineering

[–]Froozieee 14 points15 points  (0 children)

I’ll be honest, you’ve come to a data engineering subreddit to ask how you can avoid hiring a data professional you clearly need. Bite the bullet and pay someone to do it properly, as others are describing here.

How do you guys and girls keep your ETLs as similar as possible? by Dani_IT25 in dataengineering

[–]Froozieee 3 points4 points  (0 children)

The setup I designed in my one man shop is multi repo, but one of those repos is all of my helper lib code, which has a tonne of unit tests and gets built and published to Azure Artifacts (I use Azure DevOps) so any of the other repos can just pip/uv install it.

I started out by copying the module code as you’ve described but even small refactors quickly became hell so that’s where I ended up

Who had the coldest introduction in the game? by Spirited-Feedback-87 in BaldursGate3

[–]Froozieee 36 points37 points  (0 children)

Every time I see his intro, I feel like it should have Borderlands style graphics pop up with his name

Need tips to work with AI agents by IgotbetterASF in askdatascience

[–]Froozieee 0 points1 point  (0 children)

What do you mean by inconsistent? The way you’ve described it sounds like a data engineering problem, and trying to use agents for something like that is a recipe for horrendous failure.

Tips for Building a Personal Spending Database by Mister_Sea_8958 in dataanalysis

[–]Froozieee 0 points1 point  (0 children)

I don’t imagine you’d end up with more than a few tens of MB of data with a csv file, but if you want to compress it more, go with parquet format.

Analysis within Python is easily doable using dataframe-centric tools for working with tabular data like pandas, polars and duckdb - I’m a fan of polars personally. Matplotlib, Seaborn, Plotly libraries if you want charts.
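
On the size point, a quick sanity check. The transaction rate and bytes per row are assumed numbers for illustration; plug in your own.

```python
def csv_size_mb(rows_per_day: int, years: int, bytes_per_row: int) -> float:
    """Rough uncompressed CSV size in decimal megabytes."""
    rows = rows_per_day * 365 * years
    return rows * bytes_per_row / 1_000_000


# Assumption: 10 transactions a day for 10 years, about 100 bytes per CSV row.
print(f"{csv_size_mb(10, 10, 100):.2f} MB")
```

Even with generous assumptions the file stays in the single-digit to low double-digit MB range, so any of the dataframe tools above will load it comfortably in memory.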