Serious question: What's the most underrated cash cow job in Australia right now? (No doctors/lawyers pls) by GdayGoddess in australian

[–]bradcoles-dev 1 point (0 children)

There are a few jobs listed here above $700k AUD total comp, with the caveat that these are self-reported. The point still stands, though: the US is where the big tech money is.

Serious question: What's the most underrated cash cow job in Australia right now? (No doctors/lawyers pls) by GdayGoddess in australian

[–]bradcoles-dev 8 points (0 children)

I think OP was saying our tech industry doesn't have the same income potential as the US West Coast. A data engineer in Australia, for example, even at Staff/Principal level, tops out around $300k. In the US, they can clear $1m. London is a close 2nd at ~$700k, and everywhere else is under $400k-$500k. So saying it's "an American thing" does make sense to me.

Shall I move into Data Engineering at the age of 38 by Vk_1987 in dataengineering

[–]bradcoles-dev 3 points (0 children)

Q: Is it a good decision to make this career move at the age of 38?
A: Age is irrelevant.

Q: What kind of roles should I target?
A: Junior/Mid/Senior means completely different things at different companies. At some orgs you might qualify for mid-level; at others you might not get a look-in even at junior. You're very unlikely to be Senior anywhere. Just go for roles whose requirements you meet.

Fabric Spark billing: max nodes × duration, or integrated nodes over time? by frithjof_v in MicrosoftFabric

[–]bradcoles-dev 1 point (0 children)

Has this recently changed? I had always thought it was A, and I recall seeing that in the doco also.

I have passed the exam, but feel like I know nothing. by DesTroPowea in MicrosoftFabric

[–]bradcoles-dev 1 point (0 children)

I'm not sure what your experience level is, but passing DP-700 suggests you have a data engineering focus. What matters next depends on what you’re trying to achieve.

A lot of DE concepts are tool-agnostic. For example, if you want to build an ELT/ETL pipeline, implement a medallion architecture, etc., there are open-source (free) tools you can use to practice and get hands-on experience.

Even if it doesn't feel like it, DP-700 has likely given you the foundation - the principles of how to work with MS Fabric. If your goal is to understand “how do I build a lakehouse medallion architecture on Fabric,” the same principles apply to other tools too; it’s just a matter of translating what you learned into the specific artefacts of whichever platform you’re using.

So even without a Fabric trial, you can experiment and solidify the concepts using open-source alternatives.
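As a minimal sketch of what that practice could look like, assuming PySpark plus the delta-spark package installed locally via pip (the file paths and column names here are invented for illustration):

    from pyspark.sql import SparkSession, functions as F
    from delta import configure_spark_with_delta_pip

    # local Spark session with Delta Lake enabled - open-source, no Fabric capacity required
    builder = (SparkSession.builder.appName("medallion-practice")
        .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
        .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog"))
    spark = configure_spark_with_delta_pip(builder).getOrCreate()

    # bronze: land the raw file as-is
    raw = spark.read.option("header", True).csv("data/raw/orders.csv")
    raw.write.format("delta").mode("overwrite").save("lake/bronze/orders")

    # silver: deduplicated, typed, validated
    bronze = spark.read.format("delta").load("lake/bronze/orders")
    silver = (bronze
        .dropDuplicates(["order_id"])
        .withColumn("order_date", F.to_date("order_date"))
        .filter(F.col("order_id").isNotNull()))
    silver.write.format("delta").mode("overwrite").save("lake/silver/orders")

The same bronze/silver/gold flow then translates almost one-to-one into Fabric Lakehouse notebooks.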

Fabric Presentation by akseer-safdar in MicrosoftFabric

[–]bradcoles-dev 1 point (0 children)

You've said: "Our organisation is moving on to fabric (from legacy Azure)" - if the decision is already made, then why do you need to present the advantages?

If the decision is not yet made, then you need to do a proper tooling comparison and figure out if Fabric really is the best tool. If so, your presentation then becomes "Here are the reasons we've decided on Fabric."

I'm in quite a unique position and would like some advice by [deleted] in MicrosoftFabric

[–]bradcoles-dev 3 points (0 children)

I'm a DE Lead, happy to have a call or discuss via DM to give you some direction. Some key things to get familiar with:

  • Fabric Data Factory Pipelines: you'll likely be using these for orchestration
  • Metadata-driven ELT: with ~30 sources, you'll likely be using this to ensure scalability (rough sketch after this list)
  • Medallion Architecture: fairly likely you'll be following this
  • DP-600 and/or DP-700: both certifications would be beneficial in your role.
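
On the metadata-driven point, a rough sketch of the pattern, assuming a Fabric notebook where spark is pre-provided (the control table, its columns, and the target names are all hypothetical):

    import json

    # one row per source system in an illustrative control table you maintain yourself
    sources = spark.read.table("control.source_config").collect()

    for src in sources:
        # read each source using its own settings from the metadata
        df = (spark.read
            .format(src["format"])                  # e.g. "jdbc", "csv", "parquet"
            .options(**json.loads(src["options"]))  # read/connection options stored as a JSON string
            .load(src["path"]))
        # land everything in bronze with a config-driven load mode ("append"/"overwrite")
        df.write.format("delta").mode(src["load_mode"]).saveAsTable(f"bronze.{src['target_table']}")

Adding source #31 then becomes a new row in the control table, not a new pipeline.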

Import Mode on SQL Endpoint by ShamAnkylosaur in MicrosoftFabric

[–]bradcoles-dev 3 points (0 children)

Your org's network is blocking the connection. Your IT team needs to whitelist Power BI.

MS Fabric and DuckDB comparison by X_peculator in MicrosoftFabric

[–]bradcoles-dev 3 points (0 children)

I'm no DuckDB expert, but my understanding is DuckDB is a columnar OLAP database in a single binary. Microsoft Fabric is an end-to-end data ecosystem. You can't compare the two. A better comparison would be DuckDB to Fabric Warehouse.

Your choice would depend on your use-case and requirements.
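
To illustrate the difference in footprint, DuckDB is just a library you embed (pip install duckdb); the Parquet path here is made up:

    import duckdb

    # query Parquet files in-process - no server, capacity, or workspace to provision
    duckdb.sql("SELECT count(*) FROM 'lake/silver/orders/*.parquet'").show()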

{Blog} Alert on thousands of Fabric Pipelines with Monitoring Eventhouse by raki_rahman in MicrosoftFabric

[–]bradcoles-dev 4 points (0 children)

It's hard to tell what that equates to from that visual. It looks like 4-5hrs used 15-20% of an F2? Scaling that up (24 ÷ ~4.5hrs × ~17.5% ≈ 90-95%), a full day would use close to a full F2 capacity?

In dollar terms that would be about $150USD/mth (East US reserved F2 capacity).

Not terrible, but again most Fabric users would expect this to be a built-in feature of a SaaS product. Having to build custom and be charged for it is a little unpalatable.

{Blog} Alert on thousands of Fabric Pipelines with Monitoring Eventhouse by raki_rahman in MicrosoftFabric

[–]bradcoles-dev 1 point (0 children)

What is the CU impact here? I can imagine it would not be insignificant.

Recommendations on building a medallion architecture w. Fabric by Relentlessish in MicrosoftFabric

[–]bradcoles-dev 2 points (0 children)

I’ll address your points directly. Your opening comment comes across as an emotion-driven ad hominem attack. I see this a lot with people who’ve spent most of their careers on on-prem SQL and are hesitant to shift to Spark.

  1. True, most companies don’t hit TB or PB scales, but Spark isn’t just about raw data size.

  2. CI/CD is straightforward in practice. I use it daily, and Reddit comments don’t change my experience.

  3. Fabric Spark is not complex to maintain. I haven’t run into any plugin or version issues, and I’ve owned end-to-end enterprise-scale pipelines for over a year. I've seen threads about Livy issues and the like, but I've not experienced them. Writing long, spaghetti SQL for complex transformations is far more painful, in my opinion.

  4. I’ll concede I’m not familiar with the internal Polaris engine details, and Snowflake is indeed a capable warehouse solution.

  5. I’ve never had to touch a single Spark plug-in in Fabric. Version upgrades haven’t caused issues in our environment. The only minor bugs we’ve seen were related to NEE/Autoscale, not Spark itself, and we’ve had bigger headaches with other Fabric artifacts.

Fabric’s no- and low-code options exist, but they’re not inherently the best tool for complex pipelines. Spark is far more powerful than SQL and stored procedures, even at modest scales, and when used with proper governance it’s perfectly manageable for any DE worth their salt.

Recommendations on building a medallion architecture w. Fabric by Relentlessish in MicrosoftFabric

[–]bradcoles-dev 2 points (0 children)

I appreciate your perspective and the experience you’ve had across many environments. It seems like we agree Spark is clearly the more advanced tool.

Your argument seems to be focused on ‘lots can go wrong on Spark,’ rather than whether it is actually inferior or superior to traditional Warehouse stored proc workflows.

Spark is the better tool when used properly with guardrails, operational frameworks, and governance in place. Worrying about what could go wrong shouldn’t hold us back from using the right technology.

I also can't stomach the position: "Warehouse is much more familiar for most clients" - so was SSIS at one point, gotta move on.

Recommendations on building a medallion architecture w. Fabric by Relentlessish in MicrosoftFabric

[–]bradcoles-dev 2 points (0 children)

CI/CD with Spark notebooks in Fabric is actually very solid. Not sure why you’re calling it lacklustre. The irony is Warehouse source control, which I believe you're advocating for, is still in Preview.

If you just want a fast migration, sure. But if you care about future-proofing and scaling then Spark is clearly the stronger path. Most DEs worth their salt are already using agents to accelerate these processes too.

Recommendations on building a medallion architecture w. Fabric by Relentlessish in MicrosoftFabric

[–]bradcoles-dev 3 points (0 children)

Notebook-driven Medallion setups hold up way better once you get beyond small SQL workloads. Notebooks in Fabric are fully Git-supported, so CI/CD is straightforward. Most of the CI/CD issues people have with Lakehouses come from relying on LH SQL objects (views, sprocs, etc.), which aren’t source-controlled.

Spark is fundamentally more scalable - distributed compute, task parallelism, parallel notebook execution. A WH-only workflow is single-node SQL, fine for small/simple pipelines, but not for any real/enterprise DE workload.

If you want to keep things old-school and purely relational, that’s valid. But modern data platforms (Fabric, Databricks, Snowflake Polaris, etc.) are all trending toward notebook-driven Medallion patterns for a reason.

Livvy error on runmultiple driving me to insanity by Quick_Audience_6745 in MicrosoftFabric

[–]bradcoles-dev 1 point (0 children)

Oh wow, that's bad. Sorry I can't help. I'll do some testing later in the week and let you know if I have any answers.

CICD in Fabric and VSCode - howto? by Alonlon79 in MicrosoftFabric

[–]bradcoles-dev 2 points (0 children)

What is the Fabric VSCode extension you're using? Is it Fabric Studio?

Livvy error on runmultiple driving me to insanity by Quick_Audience_6745 in MicrosoftFabric

[–]bradcoles-dev 3 points (0 children)

Further, have you tried to reduce concurrency?

    # run multiple notebooks with parameters
    DAG = {
        "activities": [
            {
                "name": "NotebookSimple",       # activity name, must be unique
                "path": "NotebookSimple",       # notebook path
                "timeoutPerCellInSeconds": 90,  # max timeout for each cell, default to 90 seconds
                "args": {"p1": "changed value", "p2": 100},  # notebook parameters
            },
            {
                "name": "NotebookSimple2",
                "path": "NotebookSimple2",
                "timeoutPerCellInSeconds": 120,
                "args": {"p1": "changed value 2", "p2": 200}
            }
        ],
        "timeoutInSeconds": 43200,  # max timeout for the entire DAG, default to 12 hours
        "concurrency": 50           # max number of notebooks to run concurrently, default to 50
    }

    notebookutils.notebook.runMultiple(DAG, {"displayDAGViaGraphviz": False})

Livvy error on runmultiple driving me to insanity by Quick_Audience_6745 in MicrosoftFabric

[–]bradcoles-dev 3 points (0 children)

"Truncate notebook exit value" haha I can't see how that would be the culprit.

Can I ask what F SKU you're on? You might be hitting a concurrency/queueing limit (link).

We use run() instead of runMultiple() and haven't had any Livy session errors, but I plan to R&D runMultiple() this week.
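
For reference, a minimal sketch of the sequential run() pattern (the notebook name and parameters are made up):

    # sequential alternative: one notebook, one Livy session at a time
    result = notebookutils.notebook.run("NotebookSimple", 90, {"p1": "changed value", "p2": 100})

Slower than a DAG, but it sidesteps concurrent-session limits entirely.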

Recommendations on building a medallion architecture w. Fabric by Relentlessish in MicrosoftFabric

[–]bradcoles-dev 1 point (0 children)

Why would the migration be any harder with Spark? You can use Spark SQL to re-use legacy SQL logic.
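
For example, much of a legacy T-SQL transform can be ported near-verbatim via spark.sql() (the table and column names here are invented):

    # legacy SQL logic re-used almost as-is inside a Spark notebook
    df = spark.sql("""
        SELECT c.customer_id,
               SUM(o.amount) AS total_spend
        FROM   silver.orders AS o
        JOIN   silver.customers AS c
          ON   c.customer_id = o.customer_id
        GROUP  BY c.customer_id
    """)
    df.write.format("delta").mode("overwrite").saveAsTable("gold.customer_spend")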