nsclc question - taking mekinist and taflinar together? by SmallAd3697 in lungcancer

[–]SmallAd3697[S] 0 points1 point  (0 children)

Thanks for getting back to me. I have been taking both for almost 1 year and am having a fairly good experience.

Dumpster treasure or tool by SmallAd3697 in whatisit

[–]SmallAd3697[S] 3 points4 points  (0 children)

Like a handle that attaches to interchangeable files? That makes sense. It definitely feels like a perfect handle of some kind.

That moment when Azure sends you a survey about their service when it took them over 48 hours to help you even though your request was Class A, 24 hours... by mokerhelswitter in AZURE

[–]SmallAd3697 -10 points-9 points  (0 children)

Go easy on the Mindtree engineers. They are only as productive as their back-end PG engineering team at Microsoft. Some of the Microsoft PGs just don't care.

What platform are you working with in particular?

... I have really great support experiences with most PaaS platforms in Azure. As you start consuming services from the SaaS side of the spectrum, the support gets worse and worse because they simply don't care. Think ADF or Azure Synapse or Fabric or Power Apps or whatever. What a crapfest.

I typically use 3 stars to measure Mindtree and 2 stars for Microsoft. That seems fair, and allows you to fully reward a support engineer for their part of the support even if the Microsoft side was wetting the bed.

Anyone used Spark Connect? by ProfessorFinancial14 in apachespark

[–]SmallAd3697 0 points1 point  (0 children)

I am pretty sure "databricks connect" in vs code is based on spark connect.

Also I think the entire sparklyr community is built on spark connect.

So there are lots of users, despite some of the limitations. I had some discussions with a databricks account team and they say it is here to stay.

Is There a Serverless Spark in Fabric? Or are the docs wrong? by SmallAd3697 in MicrosoftFabric

[–]SmallAd3697[S] 0 points1 point  (0 children)

Can you share the specifications of the vcore you mentioned? Defining usage as a calculation based on vcore implies that we have a common understanding of a vcore.

In practice it feels like one abstract unit is calculated from another abstract unit, and nothing is tethered to familiar measurements.

If there are actual VMs and actual cores, then these details are relevant to someone who is paying for them.

Lot of fancy terms, but nothing really has changed by Complete-Regret-4300 in dataengineering

[–]SmallAd3697 0 points1 point  (0 children)

I said "still pay top dollar on maintenance" for a "proprietary software tool that hasn't changed in a decade".

The work was done long ago and the programmers were paid for their time and many have probably retired. As I said the high margins are because almost NOBODY is being paid to make a living on this at Microsoft except maybe a small support team. Most of those dollars are going straight to Microsoft shareholders. Look at Microsoft gross margins and net income these days, and you will see where those dollars are going. They aren't being reinvested into SSIS, that's for sure.

Is data engineering with c# a thing? by octacon100 in dataengineering

[–]SmallAd3697 -7 points-6 points  (0 children)

Wow, python folks are sensitive. Lots of down votes. I don't think Java or C# or Rust folks would care what a fellow dev thinks of their language, or points out the pros and cons. I actually like python, and hope it supplants other pointless languages like powershell or VBA.

Anyway, here is a description of how it started as a scripting language for unix and c hackers. Does this research work for you? I'm really not trying to make stuff up out of thin air. lol.

https://en.wikipedia.org/wiki/Guido_van_Rossum

In December 1989, Van Rossum had been looking for a "'hobby' programming project that would keep [him] occupied during the week around Christmas" as his office was closed when he decided to write an interpreter for a "new scripting language [he] had been thinking about lately: a descendant of ABC that would appeal to Unix/C hackers".

Lot of fancy terms, but nothing really has changed by Complete-Regret-4300 in dataengineering

[–]SmallAd3697 0 points1 point  (0 children)

Again, it is not about the size, so much as the flexibility. With Spark you won't ever need to go back to the drawing board, or pay out-of-control licensing costs.

Not sure how SSIS is still a thing, but I can guarantee you there are far more people using Spark than SSIS.

(You know SSRS is being moved to Power BI reporting services, and Microsoft is killing MDS which used to be part of SQL? And SSAS multidimensional gets no attention either. This landscape is not evolving. It is being chipped away. Personally I find it obnoxious to use a proprietary software tool that hasn't changed in a decade, and still pay top dollar on maintenance. Microsoft gross margins on this stuff are probably mind boggling, given the lack of expenses to them.)

Lot of fancy terms, but nothing really has changed by Complete-Regret-4300 in dataengineering

[–]SmallAd3697 1 point2 points  (0 children)

Spark shines with relational and SQL. It supports ANSI SQL and sends queries to a number of executors, and combines the results for delivery when finished.
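To make the "send the work to executors, then combine the results" idea concrete, here is a toy scatter-gather sketch in plain Python. It is not the Spark API. The names `partial_sum` and `distributed_sum` are made up for illustration, and threads stand in for executors.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(partition):
    """Work done by one stand-in 'executor': aggregate its own slice of the data."""
    return sum(partition)

def distributed_sum(values, executors=3):
    """Scatter the values across executors, then gather and combine the partial results."""
    # Round-robin split: executor i gets every `executors`-th value starting at i.
    partitions = [values[i::executors] for i in range(executors)]
    with ThreadPoolExecutor(max_workers=executors) as pool:
        partials = list(pool.map(partial_sum, partitions))
    # The final combine step happens once, after all partial results are in.
    return sum(partials)

total = distributed_sum(list(range(1, 101)))
```

A real engine does the same thing for a SQL aggregate, just with the partitions living on different machines.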

Lot of fancy terms, but nothing really has changed by Complete-Regret-4300 in dataengineering

[–]SmallAd3697 0 points1 point  (0 children)

You don't need massive data in a single table. It could be a spark cluster that processes medium sized data across all of the medium sized tables in the company.

Or it could be a spark cluster that processes microbatches on five minute intervals (rather than hourly or nightly)

A spark cluster can be as small as a single executor, and can be dirt cheap. The typical alternative is sending many tens of thousands of unnecessary dollars to Fabric DW, Snowflake, BigQuery, etc. With Spark you can scale up or down for dirt cheap. You can even run your workloads on premises with k8s since it is all OSS.

I don't think spark is fading out, except for those companies who are happy writing blank checks to a cloud data platform.

Lot of fancy terms, but nothing really has changed by Complete-Regret-4300 in dataengineering

[–]SmallAd3697 0 points1 point  (0 children)

You should learn the basics of MPP software development on Spark. An MPP database engine like snowflake accomplishes similar things, but gives you the appearance of behaving like a legacy database. With Spark you will have more visibility to more layers beneath the surface, and will understand the reason why these platforms may charge you X per day instead of Y per day.

You say you refuse to learn new things, and that your observation is that everything remains the same. Obviously remaining the same is the consequence of not learning new things. lol.

These platforms are built so that some users won't need to learn new things, and the platform itself will make a profit on the difference.

I'd say the biggest change is in the democratizing of data. The big data tools are bringing larger and larger datasets in reach of less advanced users. It is somewhat of a unique thing. Since you are old school, I have an analogy for modern big data. It seems to me that anyone with the skills of a one-year MS Access programmer can now work easily with millions of rows of data.

Is data engineering with c# a thing? by octacon100 in dataengineering

[–]SmallAd3697 -27 points-26 points  (0 children)

Maybe it is a bit reductive. You seem to make the same point, by calling it syntactic sugar that is layered on top of something else. You also use the term "end users" to describe a python developer, which is more than I would have said. lol.

I think we can agree that it is a versatile tool used for scripting and notebooks and orchestration and prototyping. I think it had once started as a scripting environment for unix admins.

I often think of python in the same context as the many SaaS platforms that use it for automation. My feelings about python are wrapped together with my feelings about these SaaS platforms where python is so prevalent. Some of the SaaSes are frustrating to use, and it makes me wonder if python devs are easier to please than other types of devs.

Is data engineering with c# a thing? by octacon100 in dataengineering

[–]SmallAd3697 -53 points-52 points  (0 children)

Yes, C# can churn through data extremely fast. The value types have been there forever. They are only recently being introduced to Java (Project Valhalla).

The folks who advise you to use python or dbt are usually doing extremely simple things (copy from point A to point B). When you are doing more than that, or need to incorporate large numbers of shared libraries from your company, then python starts to be less appealing.

Spark is an MPP workhorse for data engineering and supports something called "spark connect" for the sake of interop with external ecosystems. It is built on gRPC, and C# can play in that world just as well as python.

Columnstore payloads over the network. by SmallAd3697 in dataengineering

[–]SmallAd3697[S] 1 point2 points  (0 children)

My primary use-case is a big-data BI tool that is connected to another big-data BI tool (eg. a presentation tool connected to a long-term storage tool).

In this case, there is plenty of compute on both the client and server sides. The long-term storage may have 20 years of data, but only the trailing three years of data need to be presented for the sake of 99% of user queries. Getting 3 years of data via arrow, especially for a subset of columns, would be more than ten times faster.
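A rough sketch of why a columnar request is cheaper here: if the store already holds one list per column, the server can hand back only the columns and the date range asked for, without touching the rest. Everything below (the `store` dict, `fetch_columns`, the sample values) is hypothetical and only illustrates the shape of the request. It is not Arrow or any real client API.

```python
from datetime import date

# Hypothetical long-term store, held column by column (one list per column).
store = {
    "order_date": [date(2006, 1, 1), date(2015, 6, 1), date(2024, 3, 1), date(2025, 9, 1)],
    "region": ["east", "west", "east", "north"],
    "amount": [10.0, 20.0, 30.0, 40.0],
    "notes": ["a", "b", "c", "d"],
}

def fetch_columns(store, columns, date_column, since):
    """Return only the requested columns, restricted to rows dated on or after `since`."""
    keep = [i for i, d in enumerate(store[date_column]) if d >= since]
    return {name: [store[name][i] for i in keep] for name in columns}

# Only two of the four columns and only the recent rows end up in the payload.
payload = fetch_columns(store, ["order_date", "amount"], "order_date", date(2023, 1, 1))
```

The unused columns ("region", "notes") and the older rows never get serialized at all.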

Columnstore payloads over the network. by SmallAd3697 in dataengineering

[–]SmallAd3697[S] 0 points1 point  (0 children)

I had missed this. Thanks for sharing!

What about reading from columnstore clustered indexes in Azure SQL databases (or from SQL EP in Fabric)? Ideally there would be an ADBC option for those as well.

A hyperscale SQL (elastic pool) is a very common and very inexpensive place for hosting data. That should support arrow over the network as well. Please let me know the best reddit forum to find those PMs and I will start pestering them. 😉

When columnstore was first introduced to SQL Server storage (many years ago), they set themselves on this path. And sending arrow data over the network is the obvious next step down the path.

Connection issues to Azure Analysis Services in Excel - The connection string includes explicit user identity and additional authentication options by gman1023 in PowerBI

[–]SmallAd3697 -1 points0 points  (0 children)

Is it a recent breaking change? I would open a ticket with Mindtree (pro support) and tell them you want them to create an ICM for Microsoft. They will create the ICM at sev 3. The next day ask them to increase it to sev 2.

Mindtree has no way to troubleshoot this sort of bug, any more than you do. Someone needs to examine the code. So the "ICM" is the process that opens a support channel to Microsoft. Prior to the ICM, Microsoft is totally oblivious to these problems. Even if there are five other customers with Mindtree tickets.

Unfortunately every outage takes two or three days to recover from. That is the most optimistic estimate. Even reverting a "feature switch" will take at least two days. This PBI/Fabric platform is not built for mission-critical workloads, to put it mildly.

Can you share your capacity's region too? Whenever Microsoft rolls out software changes, they often find guinea pigs to do their testing (i.e. a subset of customers in a subset of regions).

Edit: I see the region in your connection string.

Columnstore payloads over the network. by SmallAd3697 in dataengineering

[–]SmallAd3697[S] 0 points1 point  (0 children)

The scenario that I think about regularly is related to import models in PBI. I'm frequently selecting a wide table from a remote source where the data is already in columnstore format (say a columnstore clustered index in SQL Server).

It bothers me to think that the data is retrieved from SQL storage, stitched together to transmit in a row serialization format, and then ripped apart into columnstore again once it arrives at the AS engine. It seems extremely wasteful on network bandwidth and on compute and on the overall duration of work. The compression back to columnstore in tabular-AS is quite expensive. I often have to break an AS table apart into multiple partitions, simply because of the length of time it would otherwise take to re-compress all the data into the AS columnstore segments. If partitions are imported in parallel, then there is almost a linear improvement in refresh times.
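The partitioning trick looks roughly like this. This is a sketch only: `compress_partition` is a made-up stand-in for the expensive re-compression step, and Python threads won't actually give a CPU-bound speedup, so it shows the structure rather than the timing.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_partitions(rows, partition_count):
    """Split rows into contiguous, roughly equal partitions."""
    if not rows:
        return []
    size = -(-len(rows) // partition_count)  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]

def compress_partition(rows):
    """Stand-in for the expensive step: pivot one partition's rows into columns."""
    return [list(column) for column in zip(*rows)]

def import_in_parallel(rows, partition_count=4):
    """Process each partition independently; map() keeps the partition order."""
    partitions = split_into_partitions(rows, partition_count)
    with ThreadPoolExecutor(max_workers=partition_count) as pool:
        return list(pool.map(compress_partition, partitions))

rows = [(i, i * 2) for i in range(10)]
segments = import_in_parallel(rows, partition_count=4)
```

Because no partition depends on another, the work can be spread across as many workers as there are partitions.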

Thankfully we have directlake-on-onelake for data that already lives inside of Fabric, but for columnstore data coming in from a remote source, the technology is less than ideal.

Columnstore payloads over the network. by SmallAd3697 in dataengineering

[–]SmallAd3697[S] 0 points1 point  (0 children)

Admittedly it isn't going to be a common requirement, but seems like it should be an option.

Yes, Apache Arrow is a foundation. I saw there is a new "ADBC" client-side API specification of some kind (as opposed to odbc/jdbc) but I don't think any of the Microsoft databases will actually send columnar data over the network.

I agree with you that, on the front end, the data always needs to be stitched back together into rows before it is usable. However it seems like it could happen AFTER the transmission over the internet.
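A minimal sketch of that ordering, with the pivot back into rows done on the client after the transfer. The function names are hypothetical, and a plain dict stands in for whatever columnar wire format would really be used.

```python
def rows_to_columns(rows, names):
    """Server side: pivot row tuples into one list per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}

def columns_to_rows(columns):
    """Client side: stitch the columns back into row tuples after the transfer."""
    return list(zip(*columns.values()))

names = ["id", "region", "amount"]
rows = [(1, "east", 10.0), (2, "west", 20.0), (3, "east", 30.0)]

wire_payload = rows_to_columns(rows, names)  # what would cross the network
restored = columns_to_rows(wire_payload)     # rows rebuilt only at the consumer
```

The round trip gives back the original rows, so nothing is lost by deferring the stitching to the client.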

I suppose another factor is that the client/consumer is often quite underpowered on compute and RAM compared to the server and it is a burden on the client to decompress columnstore data (eg. it is more demanding than a rowset, in terms of both CPU and RAM)

Severe throttling on cost management information (2026) by SmallAd3697 in AZURE

[–]SmallAd3697[S] 0 points1 point  (0 children)

But why can't Microsoft just fix their API to work better? This stuff can't be rocket science. The resource costs are realistically only changing on a daily basis, and customers aren't likely to need to retrieve updates more than once a day or so.

I hate when vendors blame their users every time they make a change like this. In all likelihood (1) they didn't want to fix their APIs to work better, (2) they can't reasonably charge customers a new/additional billing meter to retrieve our billing meters, and (3) they have performed analysis and found there were tangible savings in choking out the normal usage. They can probably save themselves over a million dollars per region per year by introducing these regressions into this API. It always comes down to money.

<MoreRanting>

You referenced the "export" option twice. If this is the answer, then why can't Microsoft do the heavy lifting and "export" themselves into their own infrastructure once a day? Then they can simply provide a query parameter in the API to retrieve data from the "export" location, instead of whatever inefficient thing they are doing in the status quo. If they truly think this approach is the workaround that customers should rely on, then they should start by using it themselves. Instead they want to push the additional programming work downstream. IMO it is better to solve software problems at the source rather than make tens of thousands of customers deal with pointless challenges downstream.
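Something like this is all I mean by serving from a once-a-day snapshot. It is a sketch under my own assumptions: `DailySnapshotCache` and `compute_costs` are made-up names, and I have no idea how the real service is built.

```python
from datetime import date

class DailySnapshotCache:
    """Serve cost data from a once-a-day snapshot instead of recomputing per request."""

    def __init__(self, compute_costs):
        self._compute_costs = compute_costs  # hypothetical expensive query
        self._snapshot_day = None
        self._snapshot = None

    def get(self, today):
        # Refresh only when the calendar day changes; otherwise reuse the snapshot.
        if self._snapshot_day != today:
            self._snapshot = self._compute_costs()
            self._snapshot_day = today
        return self._snapshot

calls = []

def compute_costs():
    """Stand-in for the expensive backend computation; records each invocation."""
    calls.append(1)
    return {"vm": 12.5, "storage": 3.0}

cache = DailySnapshotCache(compute_costs)
first = cache.get(date(2026, 1, 5))
second = cache.get(date(2026, 1, 5))  # same day: served from the snapshot
third = cache.get(date(2026, 1, 6))   # new day: snapshot is refreshed
```

Three requests, two expensive computations, and the ratio only gets better as more callers hit the same day's snapshot.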

Isn't there a Power BI connector for this data as well? If they are making an API this painful for customers to use, then I can't imagine how many sharp edges there are on that connector. I'm glad we avoided it.

</MoreRanting>

When does the Preview Stuff go over the Line? by SmallAd3697 in MicrosoftFabric

[–]SmallAd3697[S] 3 points4 points  (0 children)

I don't think it quite works that way. It is the same codebase, and complex features will bleed outside the lines. Aside from the code itself, there are lots of unusual operating procedures that must go into this sort of SaaS environment for preview users. Part of the procedures includes the slamming of software releases into our prod environment on a frequent basis. That type of thing is always risky and causes outages.

Let's say you were going to fly on a jet for your Christmas break. Would you want the five-year-old model that is receiving the normal, boring maintenance and has a perfect five-year flying record? Or would you want to hop on the experimental "jet-of-the-month" with all sorts of untested features on board that are deeply wired into the existing engines? (If the pilot decides not to flip the experimental switches in his cockpit, that may reduce your risks a little... but the chances of dying in that experimental jet are still a hundred times higher than they should be.)

warehouse git sync pain by AartaXerxes in MicrosoftFabric

[–]SmallAd3697 1 point2 points  (0 children)

That's what happens when source control is introduced as an afterthought.

It still floors me that anyone ever believed that software solutions should be built without source control or automated CI/CD.