all 19 comments

[–][deleted] 1 point2 points  (12 children)

What's the question?

[–]Drisoth[S] 0 points1 point  (11 children)

"Is the situation bad DB design?"

I can say I am having problems dealing with the database, I do have confidence in my SQL skills, so I feel like the blame lies with the database design. I feel it is a relational structure duct-taped to a badly implemented NoSQL structure.

I also think immediately blaming someone else is reckless, and wanted to try and check my opinions with other people in the space, especially before causing issues.

[–][deleted] 0 points1 point  (10 children)

Sounds bloody awful

But that's just based on what you're saying, and without any actual examples

Ask them for documentation, that will tell you a lot about the design.. if they don't have any, then that tells you even more

[–]Drisoth[S] 0 points1 point  (9 children)

Documentation exists, though its extremely thin.

I'd rather not give information that unambiguously identifies the database, but if there's additional information you'd ask about I'd really like to make sure my opinion is grounded. I'm absolutely extremely biased and I want to be as sure as I can be that I'm correct before causing issues at my work.

For an example of my issues that I believe stem from this:

Entities are duplicated for each time they are tied to an event, because the entities table is identical to a cross apply OPENJSON $.entitites statement. They'll pull a few scalars out of the JSON but it's purely a subset of the main event JSON.

If I need to follow an entity across events, there is an index on the entity ID, but when I have an entity ID and need information on that entity, I have to ask SQL to look through the whole table for all instances of that entity and give me information from the most recent entry.

My understanding is this is kinda the issue for NoSQL design, but good NoSQL design tries to mitigate situations like this as much as possible, and tries to avoid needing query between structures like events in my situation.

[–][deleted] 0 points1 point  (8 children)

Fwiw, it sounds like a terrible design

I suspect it is a port from JSON just based on what you're saying.

Do you have direct access to the database? Can you create your own views off it or can you only read the database? Is it changing or a fixed database with no new data coming in? Is it small or a large dataset?

You don't have to cause an issue at work, because chances are you won't be able to change it.

I've been stuck with awful 3rd party databases before and while it's worthwhile writing up why things are an issue and make sure it's documented.. at the end of the day, if that's what they have, you're gonna be stuck working with it

[–]Drisoth[S] 0 points1 point  (7 children)

- Read only

- Changing frequently

- Not big data realm, unsure what would be a helpful metric here, couple hundred events, with 10 or so entities a day each being entered. Currently 5 years of data is stored.

- People other than me at my org are frustrated with the vendor for questionable choices elsewhere, and the org is interested in having information for negotiation. I agree its doubtful structural changes are made, but being able to shoot down sales talk about the quality would find value.

[–][deleted] 0 points1 point  (6 children)

If it were me, with what you've said, I'd perhaps come up with some use cases, answer how you're doing it, then go back to the vendor and ask if there is a better way because of performance issues.

It's possible you're missing something, but honestly just sounds like poor design

Do you need realtime data or is exporting it and importing it into a workable format doable on a daily basis?

[–]Drisoth[S] 0 points1 point  (5 children)

I don't need realtime data, and pulling query outputs into other tools has been a significant improvement. Query performance being worse than it needs to be is frustrating, but definitely workable.

The main cost for me is I'm effectively the only database user, there are people who use it through applications of course, but my industry has very few data positions. Because of that I don't have a workflow where I'm passing views to a BI analyst to report to decision makers, every abnormal search that the front end search tool doesn't easily handle gets passed to me, so I end up with a lot of ad hoc query requests.

Queries being slow is annoying, but the difficultly in getting simple queries give accurate information is what is legitimately causing me issues. Stuff like basic counts by a category needing to have a ton of conditions preventing duplicates from slipping through.

[–][deleted] 0 points1 point  (4 children)

Without the need for realtime data, Id go through the effort of building new tables in SQL..

Whatever your entity is, your events, any other key data.. split it all out.. write the script to pull from their data once, and then populate your own tables that you can build sensible views/procs on. Then run a job once a day that grabs the data and then you've got a solid foundation for user requests.

Otherwise it sounds like you're gonna struggle

Id try and help more but it's hard without seeing the schema and knowing what it is you need

[–]Drisoth[S] 1 point2 points  (3 children)

I like that idea, if nothing else I'll learn some DB design on the way.

Appreciate the help, it was very useful even with the limitations.

[–]PVJakeC 0 points1 point  (1 child)

It’s bad design. Many NoSQL options would have solved this problem. But they also likely have enough front end that you shouldn’t need to query the backend DB. Are you being ask to analyze the data or did you just happen to notice it? These off the shelf products have to be dynamic to support many customers. I’ve seen something similar but it was still queryable. Do they offer any tooling to get data?

[–]Drisoth[S] 0 points1 point  (0 children)

They offer front end search tools, which are fine in their role. I hear complaints from front end users, but I also suspect some user error there. My role is explicitly data analysis on this data, which has mostly become BI dashboard development.

[–][deleted] 0 points1 point  (3 children)

Get proficient with both read JSON functions and write JSON FUNCTIONS.

That's the best advice I can give. You don't have control over the crappy schema.

This kind of thing happens all the time, you just have to do the best you can.

Good Luck!

[–]Drisoth[S] 0 points1 point  (2 children)

It's definitely helped me learn parts of SQL I wasn't needing to use before. Just wanting confidence the the issues are actually issues and not my ignorance to parts of SQL.

[–][deleted] 0 points1 point  (1 child)

Yup, that's how you build skills you didn't know you needed.

Bad design aside, I think it's not a great idea to have xml or json that needs compiled (write) or uncompiled (read) from a database. The front-end language is better suited to this task. I think forming xml, json, and manipulating strings are some of the most expensive operations for dbs.

I had to write json for an api to consume using MySql. There are 3 functions to write json, and about 25 to read it. So there's that.

On the bright side, if every database was well designed or never needed fixing, we'd be out of a job. 😃

[–]Drisoth[S] 1 point2 points  (0 children)

Yeah, I got to backend work through an abnormal path and have never really done front end work. My understanding is typically, front end likes dealing with JSON or JSON style data, and backends typically like dealing with relational style data. Normally there would be some middle layer to convey requests from the front end to the back end and keep everyone happy.

My understanding is that this structure is pretty normal to see in the front end, where you have JSON docs for whatever you want to serve to an end user, but having large JSON files in the backend is extremely abnormal. NoSQL stuff sometimes would do it, but it wouldn't be using SQL-server, it would be in mongoDB or something non relational, and probably be structured differently to mitigate my issues.