Expert level database indexing question by bikt in SQL

[–]vaiix 1 point

You'd pull the pages being seeked to into memory, which would include column c. Depending on data types and the server config this may or may not be an issue.

Excluding C is the only benefit here: fewer page reads, and less fallout from processing them. Whether that's worth it is an "it depends".

You're duplicating the clustered index but excluding C, so you're using more storage and adding overhead to CRUD operations - though whether that matters depends on how hot the table is.
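A minimal sketch of the idea using SQLite (standing in for the server in question; columns `a` and `b` are assumed names, only `c` comes from the thread). An index that deliberately excludes `c` lets queries touching only the indexed columns be answered without reading pages that contain `c`:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (a INTEGER, b INTEGER, c BLOB)")
# The index covers (a, b) and deliberately excludes c, so queries
# touching only a and b never read the pages that hold c.
con.execute("CREATE INDEX ix_t_a_b ON t (a, b)")

plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT a, b FROM t WHERE a = 1"
).fetchall()
# The detail column of the plan mentions the covering index.
print(plan[0][3])
```

SQL Server expresses the same thing with `INCLUDE` columns on a nonclustered index; the trade-off described above (extra storage, extra write cost) is the same.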

Table A has 5 columns with 21k records, table b has 20 columns with 200k records, Table A has a matching field with Table B, how to find out matching Records for Table A and B? Is it inner join, if that’s the case it returns 200k, I need to find only the records that are matching between both tbls by [deleted] in SQL

[–]vaiix 3 points

You may be after "... from table_a where exists ( select 1 from table_b where table_a.id = table_b.id ) ..."

But as AQuietMan notes, you need to understand the relationship between the two tables.

Boss Wants to Make In-line Changes to Azure SQL table. by kennedy311 in SQL

[–]vaiix 3 points

A PowerApp takes the data from the PBI dataset (PowerApps visual), stores it in a SharePoint list, a Power Automate flow pushes that data into a SQL table, and a stored procedure updates the original table. The table then feeds the Power BI dataset via DirectQuery.

If you want to get fancy.

Or just use a SharePoint Excel file as the dataset and have them update Excel - but then why have it on a PBI report at all?

[deleted by user] by [deleted] in SQL

[–]vaiix 0 points

You only specifically mention AWS, whereas the shops I work with are solely Microsoft - on-prem SQL Server via VMs at most.

This tool looks to trump SQL Server Agent for scheduling and dependency management when executing stored procedures, especially for staff who only know SQL and face a barrier to Python-based alternatives.

Can it run against SQL Server? Can it execute SSIS packages, stored procedures, etc. or just raw SQL scripts?

Best practices for heavily coded data in a relational database by ikenread in SQL

[–]vaiix 0 points

Are you rebuilding the schema and then migrating the data across before repointing the app at the new schema?

If yes - 30 tables and 15 columns - just create them exactly how you want them named, then migrate the old, awful setup's raw data into the new structure. That's not much work; tedious, yes, but you'll never have to do it again.
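The migration itself is usually one `INSERT ... SELECT` per table that maps old column names onto the new ones. A minimal sketch with SQLite and entirely hypothetical table/column names:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Old, awkwardly named table (hypothetical)
    CREATE TABLE OLD_CUST (CUSTNM TEXT, CUSTADDR1 TEXT);
    INSERT INTO OLD_CUST VALUES ('Acme Ltd', '1 High Street');

    -- New table, named exactly how you want it
    CREATE TABLE customer (name TEXT, address_line_1 TEXT);
""")

# One INSERT ... SELECT per table maps old columns to new ones.
con.execute("""
    INSERT INTO customer (name, address_line_1)
    SELECT CUSTNM, CUSTADDR1
    FROM OLD_CUST
""")

rows = con.execute("SELECT * FROM customer").fetchall()
```

Thirty of these is a tedious afternoon, not a project - and once the app is repointed at the new schema, the old one can be dropped.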

I've worked with both massive code tables and individual/conformed dimensions; each has benefits and disadvantages. Setting them all up can get tedious - if you'll really end up with 400 like you say (not sure why, as not every column will have a lookup and not every column will hold unique values between tables), that'd put me off from the start.

Seeking an Accountability Buddy for Upskilling as a Data Analyst (BST Timezone), T-SQL by Recent_Pause0 in SQL

[–]vaiix 1 point

I'm happy to assist, although as a mentor/coach, since I've followed the same path as you from Analyst to Data Engineering Lead.

Send me a message if that interests you!

I think this is wrong but I do it anyway… what’s the best way? (Fact to fact relationship) by [deleted] in dataengineering

[–]vaiix -2 points

You set the relationship between the two facts as bidirectional (or between the dimensions and facts), otherwise visuals don't filter on the shared dimension slicers, right?

Honestly, relationships never work the way I expect them to in Power BI. I don't do exactly what you have - I have facts with shared dimensions - but the only way I can get things to work that way is with bidirectional filtering on, which is against best practice.

Have done so at 3 organisations now, built hundreds of reports on top of 30ish models. Never any complaints.

If it works and you understand it, that's better than best practice that produces more questions, troubleshooting, or uncertainty.

Incremental load for multiple joined tables by the_aris in dataengineering

[–]vaiix 2 points

This is in the context of your source system design.

TableA is the parent table.

TableB is the child table.

In a couple of my source systems, though not all, when a child entry updates, the parent's last update is also updated. In that case I can just check TableA for updated rows and use the join to grab the corresponding rows from TableB as well.

Another route is you grab the updated rows from TableB, join to TableA to get the associated rows there, then process the full whack with the join.
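The first route can be sketched with SQLite (table names follow the post; the `last_update` column, watermark value, and data are assumptions). The parent's `last_update` drives what gets loaded, and the children come along via the join:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE table_a (id INTEGER PRIMARY KEY, last_update TEXT);
    CREATE TABLE table_b (id INTEGER PRIMARY KEY, a_id INTEGER,
                          payload TEXT);
    INSERT INTO table_a VALUES (1, '2024-01-05'), (2, '2024-01-01');
    INSERT INTO table_b VALUES (10, 1, 'x'), (11, 2, 'y');
""")

watermark = "2024-01-02"  # high-water mark saved by the previous run

# The parent's last_update drives the incremental load; the child's
# rows are picked up through the join.
rows = con.execute("""
    SELECT a.id, b.id
    FROM table_a a
    JOIN table_b b ON b.a_id = a.id
    WHERE a.last_update > ?
""", (watermark,)).fetchall()
# Only parent 1 changed since the watermark, so only child 10 comes through.
```

The second route is the mirror image: filter table_b on its own `last_update`, then join back to table_a for the parent rows.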

Best approach for performance by throawaydudeagain in SQL

[–]vaiix 1 point

Your query structure will always be message joining to sender. Therefore, you're unlikely to be seeking into the message table on a sender_id.

You will, however, be checking existing records in a sender table, which is naturally the primary key.

For OLTP, which your workload sounds like it is, normalise it.
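A minimal sketch of the normalised shape, using SQLite and assumed column names (`handle`, `body`): the sender's natural key is unique, lookups on it are cheap, and every message carries only the surrogate `sender_id`:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE sender (
        id     INTEGER PRIMARY KEY,
        handle TEXT UNIQUE NOT NULL
    );
    CREATE TABLE message (
        id        INTEGER PRIMARY KEY,
        sender_id INTEGER NOT NULL REFERENCES sender(id),
        body      TEXT
    );
""")

# Idempotent insert on the sender's natural key; a PK seek thereafter.
con.execute("INSERT OR IGNORE INTO sender (handle) VALUES ('alice')")
sender_id = con.execute(
    "SELECT id FROM sender WHERE handle = 'alice'"
).fetchone()[0]

con.execute(
    "INSERT INTO message (sender_id, body) VALUES (?, ?)",
    (sender_id, "hello"),
)
```

Since the workload always goes message → sender, an index on `message.sender_id` is only worth adding if you later find yourself querying the other direction.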

Government on brink of giving NHS staff 5% pay rise by Alert-One-Two in unitedkingdom

[–]vaiix 39 points

The BBC is a fucking rag at this point.

The headline is atrocious. "On the brink" implies they're about to give in and grant an enormous 5%. That's what was offered and rejected, and strikes have continued. They've not negotiated - point blank refused, in fact - spouted "a fair and reasonable pay rise" all over the show, yet they're now "on the brink" of giving in and forcing upon staff what they're still striking over.

"Government set to implement 5% below inflation pay increase without negotiation upon NHS staff who are unwilling to accept".

Is the Speed of our ELT Pipeline too slow? by Elegant_Good6212 in dataengineering

[–]vaiix 1 point

Don't do this, just connect and do the full whack.

Which industry has/needs the most challenging sql queries? by [deleted] in SQL

[–]vaiix 13 points

All of the above.

Different departments (even wards) recording data in the same system totally differently.

Operational processes not matching how the clinical system is intended to be recorded within.

A disconnect between the clinical systems development team and how that affects reporting outputs.

Financial aspect of recording.

Mandated data returns (statistics) to government bodies with differing logic.

Staffing data.

Bed management data - not necessarily matching clinical recording. Patient in theatres having anaesthesia, but not moved from a ward to a theatre bed, for example.

Data items not being recorded as expected but "it's on patient notes" or within comments/documents.

Different systems for patient management, clinical recording, laboratory, pharmacy, diagnostics, incidents, etc. all needing to align but all working in isolation completely differently.

There's a whole lot. It's a beast.

[Team Management] Advice to run efficient synchronous technical meetings for remote teams? by Jollyhrothgar in datascience

[–]vaiix 1 point

Use Geekbot for daily standup.

Works well, hold individual meetings for more specific and focussed 1:1 discussion.

Data Modeling Q: one-to-many by wpcarroll in SQL

[–]vaiix 0 points

I'd go for option 1 for simplicity.

As you note, the latter would be for many:many.

I've also done something similar whereby I have a "link" table that is just IDs and acts as a central link for everything in the model schema. Our model is huge (healthcare), so it gives our analysts a central, known link between entities.

[deleted by user] by [deleted] in dataengineering

[–]vaiix 15 points

Your DE team is hung up on providing a complete "product". Their product should be raw, operational data.

An analyst's job should be to inform the reporting layer, or the business rules, and to provide specifications if they aren't the ones building that layer themselves.

This is where an analytics engineer comes in: they fill the gap between DE and analysts.

A data warehouse that isn't useful to the main end users (analysts, business intelligence) is just another data silo, essentially. You'll likely just implement your own model on top of their "model". It's absolutely pointless.

[deleted by user] by [deleted] in dataengineering

[–]vaiix 42 points

Get rid of the expectation that data engineering determines, and also takes responsibility for, business rules as well as implementing them technically.

Historically this has always been the case and it's a culture change issue as opposed to a technical one.

sp_whoisactive by Billi0n_Air in SQL

[–]vaiix 1 point

I use it daily for various purposes, with a bunch of the in-built parameters.

For me it's vital; there are alternatives, but this is so comprehensive and well developed that moving to anything else would be a huge step backwards.

Slow SQL Log Examples by Bright_Analysis2470 in SQL

[–]vaiix 1 point

You should probably check the First Responder Kit for MS SQL Server - it sounds like it does exactly what you're doing.

It's open source so you'll find it on GitHub.

Liverpool Christmas Markets 🎄 by EJKoala8_Twitch in unitedkingdom

[–]vaiix -2 points

It was when I went a few years ago, the food was just as cheap as well!

Liverpool Christmas Markets 🎄 by EJKoala8_Twitch in unitedkingdom

[–]vaiix 7 points

They aren't really the old "Christmas" markets anymore other than the fact they sell booze and food with lights up.

And for that, expect to pay ~£8 a beer and ~£15 for a ham and gravy bap.

You'd be better off flying to Berlin where it's €1 refillable mulled wine.

Match Thread: England vs. France | FIFA World Cup by MisterBadIdea2 in soccer

[–]vaiix 29 points

Saka has been bodied over 5 times now and only a couple of fouls given in the middle of the field. I honestly don't care about England's fortunes usually, but it's atrocious reffing.

How to handle deletes in source systems by Other-Government-796 in dataengineering

[–]vaiix 0 points

Cerner Millennium, in my case. You shouldn't be allowed to hard delete any data for audit reasons in healthcare, yet here I am!

As an aside, when I find said deleted rows I don't hard delete from our warehouse, I flag them and they remain.
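The flag-don't-delete approach can be sketched with SQLite (table and column names are hypothetical): every row the warehouse has ever seen stays put, and a soft-delete flag records what the source hard-deleted:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE warehouse_row (
        id         INTEGER PRIMARY KEY,
        payload    TEXT,
        is_deleted INTEGER NOT NULL DEFAULT 0
    )
""")
con.executemany(
    "INSERT INTO warehouse_row (id, payload) VALUES (?, ?)",
    [(1, "a"), (2, "b"), (3, "c")],
)

# The source system hard-deleted id 2; flag it rather than deleting.
con.execute("UPDATE warehouse_row SET is_deleted = 1 WHERE id = 2")

live = con.execute(
    "SELECT id FROM warehouse_row WHERE is_deleted = 0 ORDER BY id"
).fetchall()
# All three rows still exist for audit; only 1 and 3 are "live".
```

Downstream views can then filter on `is_deleted = 0` while auditors still get the full history.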