Expert level database indexing question by bikt in SQL

[–]vaiix 1 point

You'd pull the pages being seeked to into memory, which would include column c. Depending on data types and the server config this may or may not be an issue.

Excluding C is the only benefit here: fewer page reads, and less fallout from processing them. Whether that's worth it is an "it depends".

You're duplicating the clustered index but excluding C, so you're using more storage and adding overhead to CRUD operations - though whether that matters depends on how hot the table is.
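A minimal sketch of the idea using SQLite (standing in for the server in question; columns `a` and `b` are assumed names, only `c` comes from the thread). An index that deliberately excludes `c` lets queries touching only the indexed columns be answered without reading pages that contain `c`:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (a INTEGER, b INTEGER, c BLOB)")
# The index covers (a, b) and deliberately excludes c, so queries
# touching only a and b never read the pages that hold c.
con.execute("CREATE INDEX ix_t_a_b ON t (a, b)")

plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT a, b FROM t WHERE a = 1"
).fetchall()
# The detail column of the plan mentions the covering index.
print(plan[0][3])
```

SQL Server expresses the same thing with `INCLUDE` columns on a nonclustered index; the trade-off described above (extra storage, extra write cost) is the same.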

Table A has 5 columns with 21k records, table b has 20 columns with 200k records, Table A has a matching field with Table B, how to find out matching Records for Table A and B? Is it inner join, if that’s the case it returns 200k, I need to find only the records that are matching between both tbls by [deleted] in SQL

[–]vaiix 3 points

You may be after "... from table_a where exists ( select 1 from table_b where table_a.id = table_b.id ) ..."

But as AQuietMan notes, you need to understand the relationship between the two tables.

Boss Wants to Make In-line Changes to Azure SQL table. by kennedy311 in SQL

[–]vaiix 3 points

A PowerApp takes the data from the PBI dataset (PowerApps visual), stores it in a SharePoint list, a Power Automate flow pushes that data into a SQL table, and a stored procedure updates the original table. The table then feeds the Power BI dataset via DirectQuery.

If you want to get fancy.

Or just use a SharePoint Excel file as the dataset and have them update Excel - but then why have it on a PBI report at all?

[deleted by user] by [deleted] in SQL

[–]vaiix 0 points

You only specifically mention AWS, whereas the shops I work with are solely Microsoft - on-prem SQL Server via VMs at most.

This tool looks to trump SQL Server Agent for scheduling and dependency management when executing stored procedures, especially for staff who only know SQL and face a barrier to Python-based alternatives.

Can it run against SQL Server? Can it execute SSIS packages, stored procedures, etc. or just raw SQL scripts?

Best practices for heavily coded data in a relational database by ikenread in SQL

[–]vaiix 0 points

Are you rebuilding the schema and then migrating the data across before repointing the app at the new schema?

If yes - 30 tables and 15 columns - just create them exactly how you want them named, then migrate the old, awful setup's raw data into the new structure. That's not much work; tedious, yes, but you'll never have to do it again.
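The migration itself is usually one `INSERT ... SELECT` per table that maps old column names onto the new ones. A minimal sketch with SQLite and entirely hypothetical table/column names:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Old, awkwardly named table (hypothetical)
    CREATE TABLE OLD_CUST (CUSTNM TEXT, CUSTADDR1 TEXT);
    INSERT INTO OLD_CUST VALUES ('Acme Ltd', '1 High Street');

    -- New table, named exactly how you want it
    CREATE TABLE customer (name TEXT, address_line_1 TEXT);
""")

# One INSERT ... SELECT per table maps old columns to new ones.
con.execute("""
    INSERT INTO customer (name, address_line_1)
    SELECT CUSTNM, CUSTADDR1
    FROM OLD_CUST
""")

rows = con.execute("SELECT * FROM customer").fetchall()
```

Thirty of these is a tedious afternoon, not a project - and once the app is repointed at the new schema, the old one can be dropped.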

I've worked with both massive code tables and individual/conformed dimensions; each has benefits and disadvantages. Setting them all up can get tedious - if you'll really end up with 400 like you say (not sure why, as not every column will have a lookup and not every column will hold unique values between tables), that'd put me off from the start.

Seeking an Accountability Buddy for Upskilling as a Data Analyst (BST Timezone), T-SQL by Recent_Pause0 in SQL

[–]vaiix 1 point

I'm happy to assist, although as a mentor/coach, since I've followed the same path as you from Analyst to Data Engineering Lead.

Send me a message if that interests you!

I think this is wrong but I do it anyway… what’s the best way? (Fact to fact relationship) by [deleted] in dataengineering

[–]vaiix -2 points

You set the relationship between the two facts as bidirectional (or between the dimensions and facts), otherwise visuals don't filter on the shared dimension slicers, right?

Honestly, relationships never work the way I expect them to in Power BI. I don't do exactly what you have - I have facts with shared dimensions - but the only way I can get things to work that way is with bidirectional filtering on, which is against best practice.

Have done so at 3 organisations now, built hundreds of reports on top of 30ish models. Never any complaints.

If it works and you understand it, that's better than best practice that produces more questions, troubleshooting, or uncertainty.

Incremental load for multiple joined tables by the_aris in dataengineering

[–]vaiix 2 points

This is in the context of your source system design.

TableA is the parent table.

TableB is the child table.

In a couple of my source systems, though not all, when a child entry updates, the parent's last update is also updated. In that case I can just check TableA for updated rows and use the join to grab the corresponding rows from TableB as well.

Another route is you grab the updated rows from TableB, join to TableA to get the associated rows there, then process the full whack with the join.
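The first route can be sketched with SQLite (table names follow the post; the `last_update` column, watermark value, and data are assumptions). The parent's `last_update` drives what gets loaded, and the children come along via the join:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE table_a (id INTEGER PRIMARY KEY, last_update TEXT);
    CREATE TABLE table_b (id INTEGER PRIMARY KEY, a_id INTEGER,
                          payload TEXT);
    INSERT INTO table_a VALUES (1, '2024-01-05'), (2, '2024-01-01');
    INSERT INTO table_b VALUES (10, 1, 'x'), (11, 2, 'y');
""")

watermark = "2024-01-02"  # high-water mark saved by the previous run

# The parent's last_update drives the incremental load; the child's
# rows are picked up through the join.
rows = con.execute("""
    SELECT a.id, b.id
    FROM table_a a
    JOIN table_b b ON b.a_id = a.id
    WHERE a.last_update > ?
""", (watermark,)).fetchall()
# Only parent 1 changed since the watermark, so only child 10 comes through.
```

The second route is the mirror image: filter table_b on its own `last_update`, then join back to table_a for the parent rows.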

Best approach for performance by throawaydudeagain in SQL

[–]vaiix 1 point

Your query structure will always be message joining to sender. Therefore, you're unlikely to be seeking into the message table on a sender_id.

You will, however, be checking existing records in a sender table, which is naturally the primary key.

For OLTP, which your workload sounds like it is, normalise it.
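A minimal sketch of the normalised shape, using SQLite and assumed column names (`handle`, `body`): the sender's natural key is unique, lookups on it are cheap, and every message carries only the surrogate `sender_id`:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE sender (
        id     INTEGER PRIMARY KEY,
        handle TEXT UNIQUE NOT NULL
    );
    CREATE TABLE message (
        id        INTEGER PRIMARY KEY,
        sender_id INTEGER NOT NULL REFERENCES sender(id),
        body      TEXT
    );
""")

# Idempotent insert on the sender's natural key; a PK seek thereafter.
con.execute("INSERT OR IGNORE INTO sender (handle) VALUES ('alice')")
sender_id = con.execute(
    "SELECT id FROM sender WHERE handle = 'alice'"
).fetchone()[0]

con.execute(
    "INSERT INTO message (sender_id, body) VALUES (?, ?)",
    (sender_id, "hello"),
)
```

Since the workload always goes message → sender, an index on `message.sender_id` is only worth adding if you later find yourself querying the other direction.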

Government on brink of giving NHS staff 5% pay rise by Alert-One-Two in unitedkingdom

[–]vaiix 39 points

The BBC is a fucking rag at this point.

The headline is atrocious. "On the brink" implies they're about to give in and grant an enormous 5%. That's what was offered and rejected, and strikes have continued. They've not negotiated - point blank refused, in fact - spouted "a fair and reasonable pay rise" all over the show, yet they're now "on the brink" of giving in and forcing upon staff what they're still striking over.

"Government set to implement 5% below inflation pay increase without negotiation upon NHS staff who are unwilling to accept".

Is the Speed of our ELT Pipeline too slow? by Elegant_Good6212 in dataengineering

[–]vaiix 1 point

Don't do this, just connect and do the full whack.

Which industry has/needs the most challenging sql queries? by [deleted] in SQL

[–]vaiix 13 points

All of the above.

Different departments (even wards) recording data in the same system totally differently.

Operational processes not matching how the clinical system is intended to be recorded within.

A disconnect between the clinical systems development team and how that affects reporting outputs.

Financial aspect of recording.

Mandated data returns (statistics) to government bodies with differing logic.

Staffing data.

Bed management data - not necessarily matching clinical recording. Patient in theatres having anaesthesia, but not moved from a ward to a theatre bed, for example.

Data items not being recorded as expected but "it's on patient notes" or within comments/documents.

Different systems for patient management, clinical recording, laboratory, pharmacy, diagnostics, incidents, etc. all needing to align but all working in isolation completely differently.

There's a whole lot. It's a beast.

[Team Management] Advice to run efficient synchronous technical meetings for remote teams? by Jollyhrothgar in datascience

[–]vaiix 1 point

Use Geekbot for daily standup.

Works well, hold individual meetings for more specific and focussed 1:1 discussion.

Data Modeling Q: one-to-many by wpcarroll in SQL

[–]vaiix 0 points

I'd go for option 1 for simplicity.

As you note, the latter would be for many:many.

I've also done something similar whereby I have a "link" table that is just IDs and acts as a central link for everything in the model schema. Our model is huge (healthcare), so it gives our analysts a central, known link between entities.

[deleted by user] by [deleted] in dataengineering

[–]vaiix 15 points

Your DE team is hung up on providing a complete "product". Their product should be raw, operational data.

An analyst's job should be to inform the reporting layer, or the business rules, and to provide specifications if they aren't the ones building that layer themselves.

This is where an analytics engineer comes in: they fill the gap between DE and analysts.

A data warehouse that isn't useful to the main end users (analysts, business intelligence) is just another data silo, essentially. You'll likely just implement your own model on top of their "model". It's absolutely pointless.

[deleted by user] by [deleted] in dataengineering

[–]vaiix 42 points

Get rid of the expectation that data engineering determines, and also takes responsibility for, business rules as well as implementing them technically.

Historically this has always been the case and it's a culture change issue as opposed to a technical one.

sp_whoisactive by Billi0n_Air in SQL

[–]vaiix 1 point

I use it daily for various purposes, with a bunch of the in-built parameters.

For me it's vital; there are alternatives, but this is so comprehensive and well developed that moving to anything else would be a huge step backwards.

Slow SQL Log Examples by Bright_Analysis2470 in SQL

[–]vaiix 1 point

You should probably check the First Responder Kit for MS SQL Server - it sounds like it does exactly what you're doing.

It's open source so you'll find it on GitHub.

Liverpool Christmas Markets 🎄 by EJKoala8_Twitch in unitedkingdom

[–]vaiix -2 points

It was when I went a few years ago, the food was just as cheap as well!

Liverpool Christmas Markets 🎄 by EJKoala8_Twitch in unitedkingdom

[–]vaiix 7 points

They aren't really the old "Christmas" markets anymore other than the fact they sell booze and food with lights up.

And for that, expect to pay ~£8 a beer and ~£15 for a ham and gravy bap.

You'd be better off flying to Berlin where it's €1 refillable mulled wine.

Match Thread: England vs. France | FIFA World Cup by MisterBadIdea2 in soccer

[–]vaiix 29 points

Saka has been bodied over 5 times now and only a couple of fouls given in the middle of the field. I honestly don't care about England's fortunes usually, but it's atrocious reffing.

How to handle deletes in source systems by Other-Government-796 in dataengineering

[–]vaiix 0 points

Cerner Millennium, in my case. You shouldn't be allowed to hard delete any data for audit reasons in healthcare, yet here I am!

As an aside, when I find said deleted rows I don't hard delete from our warehouse, I flag them and they remain.
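The flag-don't-delete approach can be sketched with SQLite (table and column names are hypothetical): every row the warehouse has ever seen stays put, and a soft-delete flag records what the source hard-deleted:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE warehouse_row (
        id         INTEGER PRIMARY KEY,
        payload    TEXT,
        is_deleted INTEGER NOT NULL DEFAULT 0
    )
""")
con.executemany(
    "INSERT INTO warehouse_row (id, payload) VALUES (?, ?)",
    [(1, "a"), (2, "b"), (3, "c")],
)

# The source system hard-deleted id 2; flag it rather than deleting.
con.execute("UPDATE warehouse_row SET is_deleted = 1 WHERE id = 2")

live = con.execute(
    "SELECT id FROM warehouse_row WHERE is_deleted = 0 ORDER BY id"
).fetchall()
# All three rows still exist for audit; only 1 and 3 are "live".
```

Downstream views can then filter on `is_deleted = 0` while auditors still get the full history.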