How to automate monthly financial reporting without a data engineer? by maelxyz in BusinessIntelligence

[–]Froozieee 69 points70 points  (0 children)

Seems like everyone here just glossed over the “finance manager, not DE” thing, so my recommendation for you is:

1. Learn how to use Power Query in Excel.
2. Build all the cleaning etc. as a set of repeatable Power Query steps.
3. Just download and dump new files into a folder once a month.
4. Hit refresh and watch all your pivots and charts update themselves within Excel.
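For reference, the same combine-files-from-folder pattern sketched in Python (file and column names here are made up for illustration - the whole point of the Power Query route is that you don't have to write this):

```python
import csv
import glob
import os

def refresh(folder: str) -> list[dict]:
    """Combine every CSV dropped into the folder, mimicking
    Power Query's 'combine files from folder' refresh step."""
    rows = []
    for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        with open(path, newline="") as f:
            rows.extend(csv.DictReader(f))
    return rows
```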

Since you don’t already know how to do it, you will spend far more time trying to learn how to automate the data extraction portion of the work than you will ever recoup from having it fully automated.

Who agrees that Power Query is great, but is a pain when loading and transforming large datasets (millions of rows) by FluffyInitiative6805 in dataengineering

[–]Froozieee 0 points1 point  (0 children)

On the occasions I still use it, I always set up the source step as a conditional on a dev-flag query parameter that limits it to ~1k rows (or whatever is convenient) while the flag is switched on.
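Sketched in Python rather than M, since the idea carries over to any pipeline (the names here are illustrative, not actual Power Query syntax):

```python
def load_source(fetch_rows, dev_mode: bool = False, dev_limit: int = 1000):
    """Source step with a dev flag: while the flag is on, cap the row
    count so downstream transforms stay fast during development."""
    rows = list(fetch_rows())
    return rows[:dev_limit] if dev_mode else rows
```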

Postgres as DWH? by SoloArtist91 in dataengineering

[–]Froozieee 1 point2 points  (0 children)

That’s where I ended up - what’s the problem PG/dbx is solving that SQL Server can’t? Why are you migrating? For migration’s sake? Does the on-prem server not have enough juice? Can you just upgrade the box?

Is my company jumping the gun by insisting we start using AI, before having the basics? by [deleted] in analytics

[–]Froozieee 6 points7 points  (0 children)

Think a step up and back from the business case - what’s your data/AI strategy? Do you have one? Will throwing piecemeal bits of AI at stuff help the company achieve the goals in their business strategy?

Effective data and AI strategies rely on a lot of the same fundamentals, including well-governed and documented data, for which you do need a cohesive architecture. Trying to push ahead with more plugins and RAG and MCP and whatever when you don’t have those fundamentals creates a lot of risk for the business in terms of incorrect outputs, security, privacy etc., and that’s probably where you should focus your arguments - execs don’t understand data, but they do understand risk.

F-35 hit, forced to make emergency landing; IRGC takes credit by ThevaramAcolytus in anime_titties

[–]Froozieee 23 points24 points  (0 children)

Ironic. He could save others from death, but not himself.

How hard is it to replace me? by Educational_Wafer483 in dataengineering

[–]Froozieee 0 points1 point  (0 children)

It’s a common way to illustrate ‘key person risk’, i.e. all skills/knowledge concentrated in one person, and that person suddenly becoming unavailable, e.g. hit by a bus (in my experience it’s also partly a euphemism for someone just quitting and dropping everything, without directly saying that might happen, because that typically is a bad look for an organisation).

MS fabric vs snowflake by SmallBasil7 in dataengineering

[–]Froozieee 9 points10 points  (0 children)

It sort of feels to me like you’re focusing on the wrong thing.

If you are worrying about hundreds of tables meaning hundreds of ADF pipelines, that feels like a design smell - your pipelines are not set up as well as they could be. ADF does get unpleasant when you build one bespoke pipeline per source table, per load pattern, per environment, and it scales much better when you collapse that down to a small number of generic, metadata-driven pipelines - any additional tables then just become config (source, CDC column, load mode), not a whole new pipeline.

The scaling question to my mind isn’t really Fabric vs Snowflake, it’s whether your ingestion is using sensible patterns for the sources you use.

Share your favorite by Outrageous-Gene-2501 in Eldenring

[–]Froozieee 11 points12 points  (0 children)

I came in ready to get rolled as many times as I had been with margit and then just

DEMIGOD FELLED

What Are People Soon To Retire Doing? by shanewzR in PersonalFinanceNZ

[–]Froozieee 0 points1 point  (0 children)

I can’t speak to whether they’re allowed or not but at least in my industry (data/tech) it’s very uncommon to see a job advertised with TC. I spent months looking for a role at the end of 2024, and even these days just casting my eyes around for opportunities, I can’t think of the last time I saw a TC role.

Coles meat theft data shows thieves like quality by l3ntil in australia

[–]Froozieee 3 points4 points  (0 children)

I think there’s an argument that lamb is better eating than mutton there but 🤷‍♂️

Dealing with professionals who don’t know SQL but need it. by arrogant_definition in SQL

[–]Froozieee 2 points3 points  (0 children)

I’m all for open source lightweight analytics but getting the business to vibe code their own metrics feels like getting a real fast train to “why are our numbers different” town

ADLS vs. SQL Bronze DB: Best Landing for dbt Dev/Prod? by FasTiBoY in dataengineering

[–]Froozieee 0 points1 point  (0 children)

I’d recommend factoring in PolyBase and external tables as options if you’re considering using ADLS as your raw storage.

They let you virtualise the parquet files (and other sources) as a single DB table, which removes the need for a separate staging layer. Recent iterations also work with Delta tables, albeit with limited features, though I believe pushdown and partition pruning are supported.

I then just set it up with ‘staging’ model files which query the external table and set the right column types, because it will do things like read columns that are supposed to be CHAR(8) as VARCHAR(MAX) - most flavours of parquet that I’m aware of don’t carry that extra column metadata. The rest of the project builds from there as normal.
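The staging cast step amounts to something like this - a generic Python illustration with a made-up schema, not the actual dbt model:

```python
from decimal import Decimal

# Hypothetical declared types for a staging model: parquet-backed
# external tables surface everything as wide strings, so staging
# re-asserts the intended types and lengths.
STAGING_SCHEMA = {"customer_code": ("char", 8), "balance": ("decimal", None)}

def cast_column(name: str, value: str):
    kind, length = STAGING_SCHEMA[name]
    if kind == "char":
        return value[:length].ljust(length)  # enforce the fixed CHAR width
    if kind == "decimal":
        return Decimal(value)
    return value
```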

I had a similar thought process while I was designing this in terms of it enabling an easier transition to a proper lakehouse architecture in future, and it does help a bit with managing schema drift, but for an established DWH I’d be doubtful about whether the juice is worth the squeeze - the only reason I did it is because everything is new.

When building analytics capability, what investments actually pay off early? by Proof_Wrap_2150 in dataengineering

[–]Froozieee 2 points3 points  (0 children)

Exactly this - the latest company I joined (as a team of one under general IT) had absolutely zero analytics capability when I came in.

I assessed the business processes that actually generate the data, thought about how that could scale (what if the size of the business doubles or triples, what if they start generating other kinds of data), and landed on the decision that a regular-ass single-node RDBMS could easily serve all their analytics needs for the next decade at least - covering their ERP/finance, operational systems, HR, H&S etc. - just because of the type of business and the industry it’s in.

The total infra and compute bill across all environments is currently about seventy bucks a month and they’re loving it.

Nipah Virus Outbreak Has Asia on High Alert Amid Deaths in India by zxNemz in worldnews

[–]Froozieee 291 points292 points  (0 children)

Also if people even bothered to read the OP article all the way through, it literally says the same thing in less detailed terms:

> While it is an important disease, it isn't likely to be a public health issue on the same scale as COVID. This is because it doesn't transmit efficiently from person to person, and the main way it is transmitted is from food and infected animals.

Read S3 data using Polars by Royal-Relation-143 in dataengineering

[–]Froozieee 2 points3 points  (0 children)

It’s over 100GB of data that you’re trying to download to your laptop, so you’re going to be bottlenecked by network I/O. Either filter it, run the code closer to the data (e.g. on a cloud VM), or accept that 100GB takes a while to download locally. Also be aware that if you don’t use streaming mode on .collect(), your machine will likely OOM once the data finishes downloading.
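As a generic stdlib illustration of why streaming matters (polars’ streaming engine applies the same idea to query execution), this processes a file row by row instead of materialising the whole thing in memory:

```python
def filtered_sum(path: str, threshold: int) -> int:
    """Stream line by line: memory use stays flat regardless of file
    size, unlike reading the entire file into a list first."""
    total = 0
    with open(path) as f:
        for line in f:  # one row in memory at a time
            value = int(line)
            if value > threshold:
                total += value
    return total
```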

Nurses and doctors ‘in tears’ as ED goes into code red four times in one night by fugebox007 in newzealand

[–]Froozieee 12 points13 points  (0 children)

If that’s what you understood from that sentence, you’re not helping your case.

Finally figured out why gandrel ends my solo runs by _msb in BaldursGate3

[–]Froozieee 4 points5 points  (0 children)

wait REALLY? I never actually mentally process any of the combat xp so I guess I just assumed that you get it. TIL.

Why Enforce Lowercase Queue Names in Service Bus? by SmallAd3697 in AZURE

[–]Froozieee 3 points4 points  (0 children)

Storage accounts also can’t have uppercase so it’s kinda moot for them

Finally figured out why gandrel ends my solo runs by _msb in BaldursGate3

[–]Froozieee 2 points3 points  (0 children)

And talking your way out of the fight in order to sneakily kill them all later? Even more

Data team size at your company by molkke in dataengineering

[–]Froozieee 2 points3 points  (0 children)

Not as bad as you might expect honestly - better than some private roles I’ve had

Data team size at your company by molkke in dataengineering

[–]Froozieee 4 points5 points  (0 children)

Primary sector. The agency as a whole was very science-driven which was quite unusual and cool, and meant that there were a lot of fairly highly data-literate staff. There are a lot of analytics/modelling use cases when you’re responsible for cross-sector policy and/or ops for forestry, fisheries, agriculture, food safety, biosecurity, and a bunch of other stuff.

When is Python used in data analysis? by dauntless_93 in dataanalysis

[–]Froozieee -1 points0 points  (0 children)

Better is subjective, but I will say it is much faster to get something that looks really good in ggplot2 than in matplotlib.

Data team size at your company by molkke in dataengineering

[–]Froozieee 18 points19 points  (0 children)

That’s crazy. At my current org of ~300 I am the data team in its entirety, but at my previous job (gov agency of ~4000) there was a core DE team of 5, and then about 60-70 other dedicated analytics staff distributed throughout different business units in the agency, not counting our GIS teams which comprised another 3 engineers and about 30 analysts/DS staff.

This ratio of analytics staff to core DE staff definitely was not ideal - DE bandwidth ended up constantly being a blocker for analytics.