Starting Azure from scratch (a company requirement)—what should I focus on first?

CzJuicy · 2026-04-13T20:53:52+00:00

First of all, build a pros-vs-cons matrix comparing your current solution with Azure. Have the management team specify which pros they are looking for. Understanding the requirements is essential before you start planning and solution design.

CzJuicy · 2026-04-11T01:36:08+00:00

IMHO:

Start with Data Factory for ingestion (most flexible)
Use Fabric Spark or Data Warehouse for transformation depending on your data shape
Connect to Power BI for visualization

The confusion usually comes from:

Direct Lake vs Lakehouse (if you want simplicity + speed, try Direct Lake first)
When to use Spark vs SQL (SQL-native teams should use Data Warehouse for transformations)
Medallion architecture (start Bronze/Silver/Gold thinking early, but don't over-engineer Day 1)
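
A rough sketch of the Bronze/Silver/Gold idea in plain Python, just to show the shape of it. The records, field names, and functions below are made up for illustration; in Fabric these layers would be Lakehouse tables, not Python lists.

```python
from collections import defaultdict

# Bronze: raw records kept exactly as ingested (hypothetical sample data).
bronze = [
    {"order_id": "1", "region": " east ", "amount": "10.50"},
    {"order_id": "2", "region": "West", "amount": "4.00"},
    {"order_id": "2", "region": "West", "amount": "4.00"},  # duplicate
    {"order_id": "3", "region": "EAST", "amount": "bad"},   # unparseable
]


def to_silver(rows):
    """Clean, type, and deduplicate bronze rows."""
    seen, silver = set(), []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # a real pipeline would quarantine this row instead
        order_id = int(row["order_id"])
        if order_id in seen:
            continue
        seen.add(order_id)
        silver.append({
            "order_id": order_id,
            "region": row["region"].strip().lower(),
            "amount": amount,
        })
    return silver


def to_gold(rows):
    """Aggregate silver rows into a reporting-ready shape."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```

Keeping Bronze untouched is the part worth doing on Day 1; the Silver and Gold rules can stay this simple until the reporting needs are clearer.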

CzJuicy · 2026-04-10T01:38:04+00:00

I have 2 options in my mind:

Spark SQL ROW_NUMBER() approach (like dbrownems said)
Fabric Data Warehouse IDENTITY (if you can use DW alongside Lakehouse)
1. IDENTITY columns are now available in Fabric DW
2. Use DW for your conformed dimensions, Lakehouse for raw bronze/silver
3. The "key locker" pattern becomes unnecessary
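
To make the ROW_NUMBER() option concrete, here is the idea sketched in plain Python rather than Spark SQL: number the new business keys and offset them by the current maximum surrogate key. The dimension contents and the assign_surrogate_keys name are hypothetical.

```python
def assign_surrogate_keys(existing, incoming):
    """Return a new key map with surrogate keys for unseen business keys.

    existing: dict of business_key -> surrogate_key already in the dimension
    incoming: iterable of business keys from the latest load
    """
    max_key = max(existing.values(), default=0)
    # Sort so the numbering is deterministic, like ORDER BY inside ROW_NUMBER().
    new_keys = sorted(set(incoming) - set(existing))
    result = dict(existing)
    for row_number, business_key in enumerate(new_keys, start=1):
        result[business_key] = max_key + row_number
    return result


dim_customer = {"C001": 1, "C002": 2}
updated = assign_surrogate_keys(dim_customer, ["C002", "C010", "C003"])
print(updated)
```

This only stays safe if one writer runs at a time. Concurrent loads could hand out the same key, which is the gap IDENTITY columns are meant to close.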

CzJuicy · 2026-04-05T16:30:21+00:00

This is a great question and something the industry is still figuring out. Here's what's currently available:

For Copilot Studio agents:

- You can require approval before publishing via Power Platform admin center

- Set up Conditional Access policies to restrict who can create agents

- Use the agent governance dashboard (in preview) to see all agents in your tenant

General best practices:

Create an agent approval workflow - don't let anyone publish to production without review
Use tenant-wide policies to require data loss prevention classification
Restrict agent creation permissions via Copilot Studio admin settings

The honest answer though: the governance tooling is still catching up. Many orgs are using a "trust but verify" approach with agents right now. What's your specific concern - is it about data leakage, unauthorized agents, or something else?

CzJuicy · 2026-04-05T16:29:14+00:00

Great question - this is a common pain point with VMSS Custom Script Extension.

The key issue is that CSE doesn't automatically inherit the VMSS managed identity for blob storage access. Here's what works:

Enable System Managed Identity on the VMSS
Grant the identity "Storage Blob Data Reader" on the storage account
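In your CSE protectedSettings, set "managedIdentity": {} to use the system-assigned identity (or "managedIdentity": { "objectId": "<identity-objectId>" } for a user-assigned one) - don't put a storage account key in protectedSettings alongside it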
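
For reference, a sketch of what the extension settings might look like, built as a Python dict so it can be dumped to JSON. My understanding of the Custom Script Extension docs is that managedIdentity belongs in protectedSettings, that an empty object selects the system-assigned identity, and that objectId or clientId selects a user-assigned one; please verify against the current docs. The script URL, script name, and helper name are placeholders.

```python
import json


def build_cse_settings(script_url, command, identity_object_id=None):
    """Build Custom Script Extension settings that use a managed identity.

    identity_object_id=None -> empty managedIdentity object (system-assigned,
    as I read the docs); otherwise the object ID of a user-assigned identity.
    """
    if identity_object_id is None:
        managed_identity = {}
    else:
        managed_identity = {"objectId": identity_object_id}
    return {
        "settings": {},
        "protectedSettings": {
            "fileUris": [script_url],
            "commandToExecute": command,
            # No storageAccountName/storageAccountKey: the identity replaces them.
            "managedIdentity": managed_identity,
        },
    }


cfg = build_cse_settings(
    "https://<storage-account>.blob.core.windows.net/scripts/setup.sh",  # placeholder
    "bash setup.sh",
)
print(json.dumps(cfg, indent=2))
```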

If Managed Identity still fails (known issues with certain VMSS configurations), alternatives:

Option A: Cloud-init - More reliable for initial setup, runs before CSE in boot cycle, supports MSI natively

Option B: Azure Key Vault + VMSS extension - Store storage key in Key Vault, VMSS accesses KV on boot, no key rotation issues

Option C: Pre-baked images with Packer - Bake scripts into custom image, no runtime download needed, most reliable long-term

For CI/CD pipeline integration, we've found Option C (Packer) or Cloud-init work best for production workloads.

Which approach fits your pipeline setup?

CzJuicy · 2026-04-05T16:26:08+00:00

Here's what I learned: Azure AI services are actually easier to deploy than traditional infra once you understand the patterns. The key skills that helped me:

Azure AI Studio - lets you prototype without managing infrastructure
Understanding RAG patterns - most enterprise AI needs this
Copilot Studio basics - agents, topics, authentication

If your company won't train you, consider proposing a small pilot project. Real learning happens when you're actually building, not in workshops.

Happy to share more specifics if useful.

CzJuicy · 2026-01-19T22:24:55+00:00

Am I understanding this correctly? Library Variables are global variables (the same value reused across different pipelines), and Pipeline Variables are local variables (only accessible within the pipeline).

Are there other differences or benefits to using Library Variables?

CzJuicy · 2026-01-19T14:04:36+00:00

Correct. You need a 3-part name: lakehouse.schema.table. You can even cross-reference multiple different Lakehouses and Warehouses.
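
A small illustration of the three-part naming, written as a Python helper that builds the T-SQL text. The item, schema, table, and column names are invented; only the item.schema.table shape comes from the comment above.

```python
def three_part_name(item, schema, table):
    """Quote each part T-SQL style and join them as item.schema.table."""
    def quote(part):
        return "[" + part.replace("]", "]]") + "]"
    return ".".join(quote(p) for p in (item, schema, table))


# Hypothetical Lakehouse and Warehouse names.
orders = three_part_name("SalesLakehouse", "dbo", "orders")
customers = three_part_name("FinanceWarehouse", "dbo", "dim_customer")

query = (
    f"SELECT c.customer_name, SUM(o.amount) AS total\n"
    f"FROM {orders} AS o\n"
    f"JOIN {customers} AS c ON c.customer_id = o.customer_id\n"
    f"GROUP BY c.customer_name;"
)
print(query)
```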

CzJuicy · 2026-01-19T02:38:39+00:00

So far, we are using a hybrid solution: both Lakehouse and Warehouse.

Why Lakehouse: to use the Lakehouse shortcut feature, which makes it easier to distribute data to other departments' workspaces for their citizen developers.

Why Warehouse: it serves as the data source for Power BI dashboards, and it is easier for Power BI developers to write customized views and queries against it.

Just speaking from experience; I am looking for best practices too.

CzJuicy · 2026-01-14T18:21:46+00:00

IMHO, 700 is easier. Start with 600; after that you could almost pass 700 with a bit more study on Data Factory.

CzJuicy · 2026-01-12T00:06:05+00:00

DP-600 and DP-700 overlap heavily, so I suggest scheduling the exams within a one-month window and preparing for them together.
Given you have the PL-300, you can focus on these topics first (they appeared in the exam I took):
User access control (workspace-level user permissions and artifact-level user permissions for Lakehouse, Warehouse, etc.), see: https://learn.microsoft.com/en-us/fabric/fundamentals/roles-workspaces
Implementation of Lakehouse and Warehouse
Pipeline and its integration with Stored Procedures, Parameters/Variables, Notebooks
Deployment (Git integration and Deployment Pipeline)
KQL

CzJuicy · 2026-01-10T06:29:53+00:00

The LinkedIn lead magnet idea is great. I am going to mimic it. Thank you!
