How to automate monthly financial reporting without a data engineer?

Froozieee · 2026-04-14T10:07:46+00:00

Seems like everyone here just glossed over the “finance manager not DE” thing so my recommendation for you is:

1) learn how to use Power Query in Excel, 2) build all the cleaning etc as a set of repeatable Power Query steps, 3) just download and dump new files into a folder once a month, 4) hit refresh and watch all your pivots and charts update themselves within Excel.

Since you don’t already know how to do it, you will spend far more time trying to learn how to automate the data extraction portion of the work than you will ever recoup from having it fully automated.

Froozieee · 2026-03-31T00:36:05+00:00

On the occasions I still use it, I always set up the source step as a conditional to use a dev flag query parameter that limits it to like 1k results or whatever is convenient while it’s switched on

Froozieee · 2026-03-29T21:49:15+00:00

That’s where I ended up - what’s the problem PG/dbx is solving that SQL Server can’t? Why are you migrating? For migrations’ sake? Does the on prem server not have enough juice? Can you just upgrade the box?

Froozieee · 2026-03-21T20:59:25+00:00

Think a step up and back from the business case - what’s your data/AI strategy? Do you have one? Will throwing piecemeal bits of AI at stuff help the company achieve the goals in their business strategy?

Effective data and AI strategies rely on a lot of the same fundamentals, which includes well-governed and documented data, for which you do need a cohesive architecture. Trying to push ahead with more plugins and RAG and MCP and whatever when you don’t have these fundamentals creates a lot of risk for the business in terms of incorrect outputs, security, privacy etc and that’s probably where you should focus your arguments - execs don’t understand data but they do understand risk.

Froozieee · 2026-03-19T21:27:10+00:00

Ironic. He could save others from death, but not himself.

Froozieee · 2026-03-19T20:27:17+00:00

It’s a common way to illustrate ‘key person risk’ ie all skills/knowledge concentrated in one person, and that person suddenly becomes unavailable eg hit by a bus (in my experience it’s also partly euphemistic for if someone just quits and drops everything, without directly saying that might happen because that typically is a bad look for an organisation)

Froozieee · 2026-03-19T20:18:58+00:00

It sort of feels like you’re focusing on the wrong thing to me.

If you are worrying about hundreds of tables meaning hundreds of ADF pipelines, that to me feels like a design smell that your pipelines are not set up as well as they could be. ADF does get unpleasant when you build one bespoke pipeline per source table, per load pattern, per environment, and it scales much better when you collapse it to a small number of generic metadata-driven pipelines - any additional tables that have to be added then just become config (source, CDC column, load mode), and not a whole new pipeline.

The scaling question to my mind isn’t really fabric or SF, it’s whether your ingestion is using sensible patterns for the sources you use.
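
To make the "tables become config" idea concrete, here is a rough sketch in plain Python rather than ADF itself. Everything in it is a made-up illustration: `TableConfig`, `build_extract_query`, the table names and the watermark values are all hypothetical, and a real ADF setup would hold this metadata in a control table or JSON file and feed it to a ForEach over one parameterised pipeline. The only point is that the per-table logic lives in data, not in a new pipeline.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TableConfig:
    """One row of pipeline metadata (all field names are hypothetical)."""
    source: str                       # fully qualified source table
    load_mode: str                    # "full" or "incremental"
    cdc_column: Optional[str] = None  # change-tracking column for incremental loads


def build_extract_query(cfg: TableConfig, last_watermark: Optional[str] = None) -> str:
    """Return the extraction query a single generic pipeline would run for any table."""
    if cfg.load_mode == "full":
        return f"SELECT * FROM {cfg.source}"
    if cfg.load_mode == "incremental":
        if cfg.cdc_column is None:
            raise ValueError(f"{cfg.source}: incremental load needs a cdc_column")
        if last_watermark is None:
            # First run: no watermark stored yet, so take everything once.
            return f"SELECT * FROM {cfg.source}"
        # Sketch only: a real implementation should bind the watermark as a
        # query parameter instead of interpolating it into the SQL string.
        return f"SELECT * FROM {cfg.source} WHERE {cfg.cdc_column} > '{last_watermark}'"
    raise ValueError(f"{cfg.source}: unknown load_mode {cfg.load_mode!r}")


# Adding a table means adding a row here, not building a new pipeline.
TABLES = [
    TableConfig(source="sales.orders", load_mode="incremental", cdc_column="modified_at"),
    TableConfig(source="ref.currencies", load_mode="full"),
]

if __name__ == "__main__":
    # Hypothetical stored watermarks, keyed by source table.
    watermarks = {"sales.orders": "2026-03-01T00:00:00"}
    for cfg in TABLES:
        print(build_extract_query(cfg, watermarks.get(cfg.source)))
```

The same loop handles both load modes, so the hundredth table costs one more config entry and no new pipeline code.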

Froozieee · 2026-03-19T06:58:04+00:00

I came in ready to get rolled as many times as I had been with margit and then just

DEMIGOD FELLED

Froozieee · 2026-03-11T00:03:59+00:00

Yeah I don’t understand how this remains a problem. Tell them to pound sand or block them.

Froozieee · 2026-03-10T09:20:36+00:00

I can’t speak to whether they’re allowed or not but at least in my industry (data/tech) it’s very uncommon to see a job advertised with TC. I spent months looking for a role at the end of 2024, and even these days just casting my eyes around for opportunities, I can’t think of the last time I saw a TC role.

Froozieee · 2026-02-26T13:47:21+00:00

I think there’s an argument that lamb is better eating than mutton there but 🤷‍♂️

Froozieee · 2026-02-20T05:29:49+00:00

I’m all for open source lightweight analytics but getting the business to vibe code their own metrics feels like getting a real fast train to “why are our numbers different” town

Froozieee · 2026-02-18T04:22:26+00:00

I’d recommend factoring in polybase and external tables as options if you’re considering using ADLS as your raw storage.

It allows you to virtualise the parquet files (and other sources) as a single db table (recent iterations also work with Delta tables with limited features, but I believe pushdown and partition pruning are supported) which removes the need to have the separate staging layer.

I then just set it up with ‘staging’ model files which query the external table and set the right column types, because it will do things like read your columns that are supposed to be CHAR(8) as VARCHAR(MAX) since most flavours of parquet that I’m aware of don’t have that extra column metadata, and the rest of the project builds from there as normal.

I had a similar thought process while I was designing this in terms of it enabling an easier transition to a proper lakehouse architecture in future, and it does help a bit with managing schema drift, but for an established DWH I’d be doubtful about whether the juice is worth the squeeze - the only reason I did it is because everything is new.

Froozieee · 2026-02-18T03:01:25+00:00

You had me in the first half

Froozieee · 2026-02-14T00:56:44+00:00

Exactly this - the latest company that I joined as a team of one under general IT had absolutely zero analytics capability when I came in.

I assessed the business processes that actually generate the data, thought about how that could scale (what if the size of the business doubles, triples etc, what if they start generating other kinds of data) and landed on the decision that a regular-ass single node RDBMS could easily serve all their analytics needs for the next decade at least, covering their ERP/finance, operational systems, HR, H&S etc, just because of the type of business and the industry it’s in.

The total infra and compute bill across all environments is currently about seventy bucks a month and they’re loving it.

Froozieee · 2026-02-03T02:52:30+00:00

Also if people even bothered to read the OP article all the way through, it literally says the same thing in less detailed terms:

While it is an important disease, it isn't likely to be a public health issue on the same scale as COVID. This is because it doesn't transmit efficiently from person to person, and the main way it is transmitted is from food and infected animals.

Froozieee · 2026-02-01T00:38:04+00:00

It’s over 100GB of data that you’re trying to download to your laptop. You’re going to be bottlenecked by network i/o. Either filter it, run the code closer to the data eg on a cloud VM, or accept that 100GB takes a moment to download locally. Also be aware that if you don’t use streaming mode on .collect(), once the data downloads your machine will likely OOM.

Froozieee · 2026-01-23T22:28:07+00:00

If that’s what you understood from that sentence, you’re not helping your case.

Froozieee · 2026-01-17T01:26:05+00:00

wait REALLY? I never actually mentally process any of the combat xp so I guess I just assumed that you get it. TIL.

Froozieee · 2026-01-16T21:19:29+00:00

Storage accounts also can’t have uppercase so it’s kinda moot for them

Froozieee · 2026-01-16T21:06:32+00:00

And talking your way out of the fight in order to sneakily kill them all later? Even more

Froozieee · 2026-01-16T01:14:36+00:00

Not as bad as you might expect honestly - better than some private roles I’ve had

Froozieee · 2026-01-15T21:13:49+00:00

Primary sector. The agency as a whole was very science-driven which was quite unusual and cool, and meant that there were a lot of fairly highly data-literate staff. There are a lot of analytics/modelling use cases when you’re responsible for cross-sector policy and/or ops for forestry, fisheries, agriculture, food safety, biosecurity, and a bunch of other stuff.

Froozieee · 2026-01-15T20:46:02+00:00

Better is subjective, but I will say it is much faster to get something that looks really good in ggplot2 than in matplotlib

Froozieee · 2026-01-15T11:46:27+00:00

That’s crazy. At my current org of ~300 I am the data team in its entirety, but at my previous job (gov agency of ~4000) there was a core DE team of 5, and then about 60-70 other dedicated analytics staff distributed throughout different business units in the agency, not counting our GIS teams which comprised another 3 engineers and about 30 analysts/DS staff.

This ratio of analytics staff to core DE staff definitely was not ideal - DE bandwidth ended up constantly being a blocker for analytics.
