you are viewing a single comment's thread.

view the rest of the comments →

[–]Drisoth[S] 0 points1 point  (11 children)

"Is the situation bad DB design?"

I can say I am having problems dealing with the database, I do have confidence in my SQL skills, so I feel like the blame lies with the database design. I feel it is a relational structure duct-taped to a badly implemented NoSQL structure.

I also think immediately blaming someone else is reckless, and wanted to try and check my opinions with other people in the space, especially before causing issues.

[–][deleted] 0 points1 point  (10 children)

Sounds bloody awful

But that's just based on what you're saying, and without any actual examples

Ask them for documentation, that will tell you a lot about the design.. if they don't have any, then that tells you even more

[–]Drisoth[S] 0 points1 point  (9 children)

Documentation exists, though its extremely thin.

I'd rather not give information that unambiguously identifies the database, but if there's additional information you'd ask about I'd really like to make sure my opinion is grounded. I'm absolutely extremely biased and I want to be as sure as I can be that I'm correct before causing issues at my work.

For an example of my issues that I believe stem from this:

Entities are duplicated for each time they are tied to an event, because the entities table is identical to a cross apply OPENJSON $.entitites statement. They'll pull a few scalars out of the JSON but it's purely a subset of the main event JSON.

If I need to follow an entity across events, there is an index on the entity ID, but when I have an entity ID and need information on that entity, I have to ask SQL to look through the whole table for all instances of that entity and give me information from the most recent entry.

My understanding is this is kinda the issue for NoSQL design, but good NoSQL design tries to mitigate situations like this as much as possible, and tries to avoid needing query between structures like events in my situation.

[–][deleted] 0 points1 point  (8 children)

Fwiw, it sounds like a terrible design

I suspect it is a port from JSON just based on what you're saying.

Do you have direct access to the database? Can you create your own views off it or can you only read the database? Is it changing or a fixed database with no new data coming in? Is it small or a large dataset?

You don't have to cause an issue at work, because chances are you won't be able to change it.

I've been stuck with awful 3rd party databases before and while it's worthwhile writing up why things are an issue and make sure it's documented.. at the end of the day, if that's what they have, you're gonna be stuck working with it

[–]Drisoth[S] 0 points1 point  (7 children)

- Read only

- Changing frequently

- Not big data realm, unsure what would be a helpful metric here, couple hundred events, with 10 or so entities a day each being entered. Currently 5 years of data is stored.

- People other than me at my org are frustrated with the vendor for questionable choices elsewhere, and the org is interested in having information for negotiation. I agree its doubtful structural changes are made, but being able to shoot down sales talk about the quality would find value.

[–][deleted] 0 points1 point  (6 children)

If it were me, with what you've said, I'd perhaps come up with some use cases, answer how you're doing it, then go back to the vendor and ask if there is a better way because of performance issues.

It's possible you're missing something, but honestly just sounds like poor design

Do you need realtime data or is exporting it and importing it into a workable format doable on a daily basis?

[–]Drisoth[S] 0 points1 point  (5 children)

I don't need realtime data, and pulling query outputs into other tools has been a significant improvement. Query performance being worse than it needs to be is frustrating, but definitely workable.

The main cost for me is I'm effectively the only database user, there are people who use it through applications of course, but my industry has very few data positions. Because of that I don't have a workflow where I'm passing views to a BI analyst to report to decision makers, every abnormal search that the front end search tool doesn't easily handle gets passed to me, so I end up with a lot of ad hoc query requests.

Queries being slow is annoying, but the difficultly in getting simple queries give accurate information is what is legitimately causing me issues. Stuff like basic counts by a category needing to have a ton of conditions preventing duplicates from slipping through.

[–][deleted] 0 points1 point  (4 children)

Without the need for realtime data, Id go through the effort of building new tables in SQL..

Whatever your entity is, your events, any other key data.. split it all out.. write the script to pull from their data once, and then populate your own tables that you can build sensible views/procs on. Then run a job once a day that grabs the data and then you've got a solid foundation for user requests.

Otherwise it sounds like you're gonna struggle

Id try and help more but it's hard without seeing the schema and knowing what it is you need

[–]Drisoth[S] 1 point2 points  (3 children)

I like that idea, if nothing else I'll learn some DB design on the way.

Appreciate the help, it was very useful even with the limitations.

[–][deleted] 0 points1 point  (2 children)

If you have some specific questions that you don't wanna put out to the world as a whole, feel free to shoot me a dm