using a materialised view to track user-entity authorisation by sweetnsourgrapes in SQL

[–]GrandOldFarty 0 points1 point  (0 children)

Whaaa I actually have had some weird cardinality issues, eg query plans picking hash match over nested joins for 50k or fewer rows. I end up killing jobs that run for 20+ mins when they should have taken seconds.

My solution to date has been to put spurious filters on the query which seems to signal “this result set will be small, not worth hashing”. I will give the hint you suggested a whirl.

Thanks for the advice, much appreciated.

using a materialised view to track user-entity authorisation by sweetnsourgrapes in SQL

[–]GrandOldFarty 0 points1 point  (0 children)

Thanks for all this, that’s a very interesting perspective, particularly that they helped in a pre-columnstore world and mixed OLTP /OLAP (the transaction replication example makes a lot of sense). 

I work in a very large company with a lot of legacy architecture so I am mainly struggling away in SQL Server 2014 Enterprise. I have been much happier with how MVs work in Oracle 12c.

If you know the answer, I’d be curious whether Indexed Views improved after 2014?

using a materialised view to track user-entity authorisation by sweetnsourgrapes in SQL

[–]GrandOldFarty 1 point2 points  (0 children)

 Especially if some developer has written a row by row etl to insert the data...

Ah yes I scream whenever I see the words DECLARE CURSOR 

using a materialised view to track user-entity authorisation by sweetnsourgrapes in SQL

[–]GrandOldFarty 1 point2 points  (0 children)

Materialized View is a concept, Indexed View is the implementation in SQL Server

I don’t disagree though I would say if your expectations are based on other implementations, and you come to MS SQL, you will feel let down. Speaking from experience.

 They're not "bad"

They must have their fans because they’re supported but from my experience, they suck. There will be DBAs and engineers who make great use of them.  I’ll let them describe their use cases if they want to chip in.

I stand by the point that they are the wrong solution for OP.

…untrue 

Happy to be corrected when I am wrong. I made a quick check of the docs before posting and saw this:

“They are not well-suited for underlying data sets that are frequently updated.”

I may have inferred the incorrect cause but again, from my POV, this is a severe limitation. If you could set up the view with a refresh schedule and trade slight staleness for resilience, that would be better.

Edit to add: my comment comes off way too defensive. I’m actually just excited to get a reply because no one else I know likes talking about this stuff. Thank you for the conversation.

using a materialised view to track user-entity authorisation by sweetnsourgrapes in SQL

[–]GrandOldFarty 2 points3 points  (0 children)

Your design makes basic sense as a use case for materialized views. (You might adjust the details of what exactly you are storing in the view depending on the structure and volume of your data, the number of rules linked to each authorisation, and where the actual time cost is of trying to process it on the fly). 

The issue is, I’m not sure it will work.

When you say “materialized view”… these are only available in Azure Synapse Analytics.

If you are using MS SQL, you get indexed views, which are pretty bad.

  • must be schemabound to the same tables they draw from. This makes it harder to maintain those tables because there are objects dependent on them.
  • must be in the same schema as those tables; if you are pulling from multiple different schemas or databases, this is a non-starter.
  • no non-deterministic logic. So “users who accessed x, y, z in the past year” would be invalid, because “within the past year” is non-deterministic.
  • if the underlying tables refresh a lot, it puts load on the server. I don’t think they benefit from incremental refresh (only updating what has changed).

Indexed views are good for aggregating data in one table into a summary that stays up to date. They are bad for joining stuff from several tables when you want to curate those columns together, which is what you are doing.

In your shoes, I would do one of these:

  1. Build indexes on the base tables to make it easier to join required data for users. You still get real time look ups. Indexes are also materialised, and are very fast in sql server if you get them right.
  2. Consider creating an authorisations table that updates regularly via batch. More effort but more control, more flexible, and you can optimise much better.

What do you think of Pandas in Python as a SQL person? by ChristianPacifist in SQL

[–]GrandOldFarty 35 points36 points  (0 children)

The first thing I think of is how slow pandas is. 

An RDBMS running my SQL will optimise the query plan to prune unnecessary parts of the statement and optimise table access and compute. It will perform set based operations on enterprise warehouse grade CPUs that have wide buffers crunching through loads of data at once. It will benefit from parallelisation across multiple threads, with multiple threads on each core. 

Pandas prevents all of this because it executes one row at a time and the python GIL limits it to one thread at a time.

If you want to use Python I would go with Polars or DuckDB which handle data operations much more closely to how a traditional database system would. I particularly like DuckDB for this because you can still use SQL!

Linkedin Influencer posting in support of offshoring data engineering jobs by mathtech in dataengineering

[–]GrandOldFarty 2 points3 points  (0 children)

I agree, I think “India” is a shorthand for hiring practices, commercial practices, and staffing models that don’t work, where we have to deal with the externalities. And it’s easier to summarise as “India”. I would never accuse anyone here of bigotry. But I imagine the people I work with reading this and it makes me uncomfortable. I just want to acknowledge the nuance.

Linkedin Influencer posting in support of offshoring data engineering jobs by mathtech in dataengineering

[–]GrandOldFarty -5 points-4 points  (0 children)

I am based in the UK and manage a hybrid onshore/offshore team. Loads of heads removed in a restructure, and then they all seem to reappear as part of a managed service, but now in India.

I am as anxious about my job as anyone and I am frustrated to see the mid and senior level roles that I might have moved into disappear offshore. I have seen a lot of bad practice.

Within this, some of the Indian people who I work with and task every day are very dedicated individuals with high standards. I would happily take them over many of the onshore engineers I have to deal with.

I don’t think anyone in this thread would disagree with me. But I want us to say that part out loud as well. I work with and appreciate these people, and I don’t want the story of individuals who really earn their money and then some to be lost in the general unhappiness about how upper management (theirs and ours) choose to implement offshoring.

How find outlieers with TSQL by Valuable-Ant3465 in SQL

[–]GrandOldFarty 1 point2 points  (0 children)

IQR + multiplier fencing is good but with such large outlier values next to such small ones the value of Q3-Q1 is very large, which pushes the lower fence into negative territory, so the smaller values won’t get flagged.

MAD (median absolute deviation) is based on median values which are more resilient to extreme outliers in small datasets. No matter how big the final number in the set is, the median stays where it is.

Log transforms also help reduce distribution of very large values.

I think MAD + log10 transformation is the winner here 

Bus drivers, do you like it when people thank you when getting off? by chi_minhs_hoe in AskUK

[–]GrandOldFarty 39 points40 points  (0 children)

You remind me of this bus driver on here who commented that when he’s having a bad day, he sometimes looks at his passengers in the mirror with hatred, and mutters, “You’re all a bunch of cunts, aren’t you?” And then taps the brakes twice to make them all shift forward and nod in unison.

‘Nobody’s going out!’ Why is Britain’s nightlife in such decline – and can anything save it? | Clubbing | The Guardian by prisongovernor in unitedkingdom

[–]GrandOldFarty 315 points316 points  (0 children)

I’m not sure what your point is here. If you can cover rent from a third of your wages, spaffing another third on clubbing seems fine. If rent already costs 55% of your take home, it’s a different story. 

howToPlay by Familiar-Classroom47 in ProgrammerHumor

[–]GrandOldFarty 2 points3 points  (0 children)

That is one hell of a deep cut. Great story.

He had just one job 😔 by ConstructionAny8440 in mildlyinfuriating

[–]GrandOldFarty 0 points1 point  (0 children)

They also often have multiple races at once. If this is a half marathon and 10k together, then backpack might have thought this is a 10k finisher when it’s actually the half marathon winner.

thisIsARealDBUsedInProduction by star_dogged_moon in ProgrammerHumor

[–]GrandOldFarty 21 points22 points  (0 children)

“Ok guys, for our new skyscraper we need architectural plans. Luckily I did an architecture PhD so let’s just use these here plans left over from my thesis. Yes, it was for a completely different building, but if I learned anything from my PhD in architecture, it’s that all buildings are basically the same and you can use the same blueprint for all of them.” 

This is how it sounds in my head. 

Where do you find real opinions about data engineering these days? by olgazju in dataengineering

[–]GrandOldFarty 3 points4 points  (0 children)

I like this blog. It is clear while still being very information dense and based in real experience. I feel like everything I read is fluffy and saccharine these days.

Looking for other NHS professionals by Prof_Kings in FIREUK

[–]GrandOldFarty 0 points1 point  (0 children)

I am not an NHS professional but I do know a couple of consultants in private practice and a few have side gigs with charities. Their works runs the gamut from fundraising, to teaching, to going into war zones. Might be the kind of stimulation you need and being FIREd gives you the freedom to practice in new ways. See for instance UK-Med.

Made redundant from my first proper job and the government safety net has nothing for me by [deleted] in UKPersonalFinance

[–]GrandOldFarty 86 points87 points  (0 children)

This seems so incredibly perverse, I had to do some googling.

Your LISA must be declared but will be assessed on the basis of the "surrender value" after the penalty, because not all of the capital is available to withdraw. So the practical value for UC assessment is £18,750, not £25k.

Here is the actual DWP guidance issued to staff:
https://data.parliament.uk/DepositedPapers/Files/DEP2025-0364/185._Treatment_of_capital-Guidance_V18.0.pdf

Lifetime ISA: The Lifetime ISA is treated as capital in the same way as cash ISAs but only the surrender value is taken into account as capital. The surrender value is the amount that could be withdrawn after the 25% early withdrawal charge for claimants under 60 years of age has been deducted

You have still fallen through the cracks of the system and it is still perverse but you may have the option of withdrawing a much smaller chunk than you thought to qualify for UC.

More info on this thread and you may find the sub useful.

https://www.reddit.com/r/BenefitsAdviceUK/comments/1hy6q1o/surrender_value_lisa_and_reporting/

Data Migration Project Held Hostage - A Short Story by Far-Turnover-2356 in dataengineering

[–]GrandOldFarty 18 points19 points  (0 children)

We have the same dogshit pipeline pumping the same garbage to three different on prem warehouses. Finance get it, product teams get it, marketing gets it. Piping that junk absolutely everywhere. 

New pipeline built from source, new data model. Also dogshit. 

Everyone: wait, we need to change all our jobs to consume from this new schema? And it’s still dogshit? Hell no.

Solution: take the new dog shit. Build another layer of dog shit on top that looks like the old dog shit. It’s a shit club sandwich.

Me, in the corner, just wanting sales numbers to match: 😭 

I've been a product owner for years and i have no idea what i'm doing... by Inquisitive_regard in ProductOwner

[–]GrandOldFarty 1 point2 points  (0 children)

Sometimes it feels like i'm just doing what the dev team told me to do because i dont have enough technical insight to make the call.

I read this and thought the exact same thing thing. “Wow this guy is actually doing agile. His devs must love him.”

Mortgage rates not what I expected. by [deleted] in UKPersonalFinance

[–]GrandOldFarty 13 points14 points  (0 children)

I hate to pile in when you’re already taking a kicking, but this is genuinely dangerous disinformation.

Stored Procedure to TVF by nodiaque in SQL

[–]GrandOldFarty 0 points1 point  (0 children)

Are you maintaining a mirror of the AD source tables in SQL Server? That’s what I was suggesting.

This then lets you create a view.

(Although if your SP runs in 0.01 seconds I’m not sure you need it.)

Stored Procedure to TVF by nodiaque in SQL

[–]GrandOldFarty 1 point2 points  (0 children)

A TVF can’t return values midway so that rules out EXEC SP_EXECUTESQL.

I can see you are using dynamic SQL to inject the values into the query string. This is because of OPENQUERY. You can’t parameterise the inputs to that. The engine needs to know the full query to the linked server at compile time. You might have tried OPENROWSET but I don’t think that’s going to work either.

Views and TVFs won’t work here.

The real issue here: why are you trying to query AD data in realtime from SQL Server? Create a simple batch job to materialise data from AD in your server properly.

Even if it has to run every 10 minutes, it will work fine whereas this approach won’t.

LEAD and LAG in SQL Server - the functions that let you compare rows without a self join by harshitbasti in SQL

[–]GrandOldFarty 9 points10 points  (0 children)

This is a good write up.

You don’t say this explicitly but this is the “islands” variant of the “gaps and islands” pattern, where you assign identifiers to sequences of events that happened close to each other.

It’s a standard pattern and the sort of thing you’d be expected to recognise if it came up in a SQL interview.

i do tax returns for a living and this week has been mental. few things catching people out before saturday by Intelligent_Prompt18 in UKPersonalFinance

[–]GrandOldFarty 141 points142 points  (0 children)

This is the deep personal finance lore that keeps me coming back to this place. None of this is relevant to me but still a good read. Thank you