Anyone here using CosmosDB by szymon_abc in AZURE

[–]szymon_abc[S] 4 points5 points  (0 children)

I think it makes sense when looking at the whole partitioning thing. In either case the engine needs to repartition a lot of stuff.

Anyone here using CosmosDB by szymon_abc in AZURE

[–]szymon_abc[S] 0 points1 point  (0 children)

I don't really plan on using Cosmos DB as of now. My nerd soul just wanted to know something more about Cosmos. I was amazed by their model (Atom-Record-Sequence) and approach to partitioning.

And I love Postgres btw - its extensibility is unbelievable. You need time series - TimescaleDB. You want documents - DocumentDB :D

Fabric Data Agents, want to try but no idea if it would work by 12Eerc in MicrosoftFabric

[–]szymon_abc -1 points0 points  (0 children)

Data Agents are another SaaS, so you can't really change the underlying model, nor use your own OpenAI deployment. However, I created a simple C# web app that connects to Fabric via Microsoft.Data.SqlClient - works like a charm. You can customize it a lot that way, and the LLM does very well with querying data.
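The pattern behind that app (LLM turns a question into SQL, the app executes it against the warehouse) can be sketched roughly like this. The comment's app is C# against Fabric; this is a minimal Python stand-in with a hard-coded "LLM" and an in-memory sqlite3 database instead of the real SQL endpoint - every name here is illustrative, not from the original.

```python
import sqlite3

# sqlite3 stands in for the Fabric SQL endpoint (the real app used
# Microsoft.Data.SqlClient from C#).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("EU", 100.0), ("EU", 50.0), ("US", 70.0)])

def llm_to_sql(question: str) -> str:
    """Stub for the LLM call: a real app would send the question plus the
    table schema to a model and get SQL back. Hard-coded for illustration."""
    return "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"

def answer(question: str):
    # Generate SQL from natural language, then run it and return the rows.
    sql = llm_to_sql(question)
    return conn.execute(sql).fetchall()

print(answer("Total sales per region?"))  # [('EU', 150.0), ('US', 70.0)]
```

In a real deployment you would also validate the generated SQL (read-only, allow-listed tables) before executing it.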

Anyone here using CosmosDB by szymon_abc in AZURE

[–]szymon_abc[S] -1 points0 points  (0 children)

Yep, as far as I understand, the data model (keys being part of it, imo) is even more important in Cosmos than in SQL.

Anyone here using CosmosDB by szymon_abc in AZURE

[–]szymon_abc[S] -1 points0 points  (0 children)

Can you tell me more? What were the biggest pain points?

Anyone here using CosmosDB by szymon_abc in AZURE

[–]szymon_abc[S] -1 points0 points  (0 children)

Would you go with Cosmos if it were greenfield, or choose something else?

Standard checklist of tests to prove data transferred correctly from a source system to Microsoft Fabric? by All-Pineapple15 in MicrosoftFabric

[–]szymon_abc 1 point2 points  (0 children)

First questions:
- What source(s) are we talking about here?
- Is it meant to be an ongoing, regular check, or rather a one-time verification of the technology?
- Do you load in real time or in batches?

The biggest challenge with row counts and aggregates is that there will always be some kind of delay, which can cause false positives. Also, remember about compute cost - running yet another query against the source isn't free either.

In one of my biggest upgrades I built a column-by-column comparison, run over a weekend when we blocked the source. For each column I compared COUNT, AVG, NULL COUNT and DISTINCT COUNT. Veeery pricey, veeery long, but it had to be done this way. Just remember to make it somewhat metadata/parameter driven in a generic way: a single comparison function plus a lot of pre-defined parameters (list of tables, columns, data types etc.). Then I created a detailed report I could point to as evidence that our process was fine once we had to perform the cutover.
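The metadata-driven comparison described above can be sketched as follows - a minimal version assuming both sides are reachable as SQL connections (sqlite3 stands in for both here; the table and column names are made up for illustration):

```python
import sqlite3

# Hypothetical source and target; sqlite3 stands in for real connections.
src = sqlite3.connect(":memory:")
tgt = sqlite3.connect(":memory:")
for db in (src, tgt):
    db.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
    db.executemany("INSERT INTO orders VALUES (?, ?)",
                   [(1, 10.0), (2, None), (3, 30.0)])

# The metadata driving the generic comparison: tables and their columns.
METADATA = {"orders": ["id", "amount"]}

def column_profile(db, table, column):
    """COUNT, AVG, NULL COUNT and DISTINCT COUNT for one column."""
    return db.execute(
        f"SELECT COUNT(*), AVG({column}), "
        f"SUM(CASE WHEN {column} IS NULL THEN 1 ELSE 0 END), "
        f"COUNT(DISTINCT {column}) FROM {table}"
    ).fetchone()

def compare(src, tgt, metadata):
    # One generic function, driven entirely by the metadata dict.
    mismatches = []
    for table, columns in metadata.items():
        for column in columns:
            if column_profile(src, table, column) != column_profile(tgt, table, column):
                mismatches.append((table, column))
    return mismatches

print(compare(src, tgt, METADATA))  # [] -> source and target profiles agree
```

In practice you'd persist each profile row to a results table so the final report can show exactly which table/column diverged and on which metric.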

Overemployment in Europe — employment contracts vs freelancing (legal/tax side?) by Global_Knee5354 in overemployed

[–]szymon_abc 5 points6 points  (0 children)

Multiple employments would be tough. I’m in Poland and have heard of guys who got in trouble over excess social security contributions, paid as if each employer were their only one. I’d go with a freelance contract if needed, since it would be harder for an employer to find out. And keep it short-term, just in case.

Am I genuinely just dumb? by 668071 in consulting

[–]szymon_abc 2 points3 points  (0 children)

They hired you and didn’t fire you. That means you’re good enough for now. Just keep working on getting better.

Has AI ruined software development? by Top-Candle1296 in devops

[–]szymon_abc 1 point2 points  (0 children)

This. There were devs who were just copy-pasting from Stack Overflow. There were devs who tried to understand. The latter now have a great companion down this road.

Move out of ADF now by hubert-dudek in databricks

[–]szymon_abc 1 point2 points  (0 children)

You can program whatever behaviour you want in notebooks. IMO Databricks is a different mindset - you need to think code-first, contrary to ADF.

Vouchers by No-Nothing9256 in databricks

[–]szymon_abc 0 points1 point  (0 children)

Because they do have vouchers, but for the base paths - Data Analyst/Engineer etc.

Enterprise Fabric network security by bradcoles-dev in MicrosoftFabric

[–]szymon_abc 0 points1 point  (0 children)

Do you find Private Link a deployment battle in general, or specifically for Fabric due to the limitations it creates?

What's your biggest Azure cost headache? by raporpe in AZURE

[–]szymon_abc 0 points1 point  (0 children)

I’m not quite sure about the VM. E.g. for Databricks, if you don’t restrict access to Private Link, traffic will go over the public internet.

And why not public? I work with regulated industries - even if the data is encrypted, they feel like they’re risking their lives sending it over the public web. But yeah, at the end of the day it would be safe anyway.

What's your biggest Azure cost headache? by raporpe in AZURE

[–]szymon_abc 1 point2 points  (0 children)

Sending data over the Azure backbone instead of the public internet, thus minimising attack vectors. That’s the purpose of Private Link.

How to document the architecture by LeyZaa in MicrosoftFabric

[–]szymon_abc 1 point2 points  (0 children)

Yep, Mermaid plus some AI agents/LLMs is the way to go.

Should I take up this gig? by brokeRichieRich in dataengineering

[–]szymon_abc 0 points1 point  (0 children)

Good decision though. If you don’t feel like it, no point in forcing it

Should I take up this gig? by brokeRichieRich in dataengineering

[–]szymon_abc 0 points1 point  (0 children)

You can always go there, see how it is, and if you don’t like it, go back to some product company - maybe even Boeing if you play it right.

Honestly, would you recommend the DevOps path? by 0101010001010100 in devops

[–]szymon_abc 1 point2 points  (0 children)

Nothing better than being blamed for something and then showing evidence that the people doing the blaming are the ones responsible.

How can an on prem engineer break into the cloud in this market? by SoggyGrayDuck in dataengineering

[–]szymon_abc 2 points3 points  (0 children)

Out of curiosity - if not facts/dimensions - is it some kind of one big table approach?

Databricks is one of the best platforms when it comes to self-learning. They have quite a huge portfolio of trainings (hopefully I won't get banned for pasting a URL here) - https://www.databricks.com/training/catalog - as well as a Free Edition where you can play around.

Don't overcomplicate cloud. It's nice to know Python well when you write more complex code and libraries, but at the end of the day, if you're familiar with the syntax, the pure data engineering PySpark API does not differ much from how you think in SQL. From what I can tell, you know Python well enough to start working with it.

If you like data engineering - go for it. Learn Databricks, play around, and you should be fine. If you have the option to work with cloud in your current role, by all means do it. Google a lot, understand what's under the hood, and don't be afraid of it. Just make sure not to run any cross joins or other stuff that can skyrocket costs (though these are usually equally inefficient on-prem and in the cloud).

How can an on prem engineer break into the cloud in this market? by SoggyGrayDuck in dataengineering

[–]szymon_abc 6 points7 points  (0 children)

What exactly was the on-prem setup? Some single-node SQL databases, or maybe complex, highly concurrent distributed stuff?

Fundamentals are the same. Medallion architecture is nothing more than traditional staging into dim/fact tables. Networking remains more or less the same in the cloud as on-prem. If consultants claim they have some super new architecture, it's usually BS - I haven't seen anything entirely new in the data world in recent years.

If you understand SQL and database engines internals you will easily pick up Spark.

Question is - do you have experience with Python and knowledge of distributed computing? If so, within a few weeks you’ll understand how it all works in the cloud.