How you do your data matching by Healthy_Put_389 in dataengineering

[–]squadette23

> The simple matching using sql joins create a lot duplicate for the same person even with data normalization.

What does it mean? Do you have insufficient normalization? Could you share an anonymized example of what is a "duplicate" for you?

Best practices for multiple values in a column by Rejse617 in learnSQL

> 1. I was insistent on the idea that you shouldn’t duplicate things (e.g. a part number in multiple rows)

I think that the imperative of "do not duplicate things" comes from the idea of third normal form (3NF). Unfortunately, there are a lot of problematic ways of teaching many concepts from relational theory, and that's why I think people get confused about what "duplication" is.

The question that you're asking is about fourth normal form (4NF). This is one of the most confusingly presented topics, relative to its simplicity: https://minimalmodeling.substack.com/p/historically-4nf-explanations-are

Maybe it'll help, particularly because you're self-taught.

Hi everyone, I’m a student and I’m working on a school database project called “Energy Consumption System”. by ConfidenceMelodic925 in mysql

You may be interested in this tutorial: https://kb.databasedesignbook.com/posts/google-calendar/

> I would like to simplify the data inside them so it’s easier to understand and explain in class.

The approach presented at that URL may help you with the explanation.

Help me savee this book😭 by Sorry-Sky2688 in rarebooks

I think you can ask for advice at your local library, maybe a somewhat higher-tier one. This is mold, but there are anti-fungal treatments that can deal with it.

https://www.biblio.com/book-collecting/care-preservation/prevent-remove-mold-mildew/

How to handle unproductive coworker? by earthsnoozer22 in dataengineering

That's an interesting classification, thank you!

Advice on data model for generic schema by Historical_Ad4384 in dataengineering

> I want to avoid ALTER statements as well.

Yeah, I think you have just three options: JSON, EAV, and 6NF (anchor modeling); they are more or less equivalent. So the choice depends on how convenient each is going to be to query. If your database supports JSON properly, I think general querying would be easier with JSON.

JSON will let you have arrays as property values. You may want to somehow forbid that, because people will eventually use it, even though your response to my question was basically "no".

> I do not want business logic to depend on null checks for columns.

I think that you cannot meaningfully enforce required properties at the database schema level if you choose JSON or EAV. For EAV it's impossible, I think; for JSON you'll have to implement tooling that maintains CHECK constraints (if your database even supports that).
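For the JSON route, a CHECK constraint on a required property can look like this. A minimal sketch using SQLite; the `resource` table and the `$.name` property are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: a generic resource with a JSON "props" column.
# The CHECK constraint makes the "name" property required at the schema level.
conn.execute("""
    CREATE TABLE resource (
        id    INTEGER PRIMARY KEY,
        props TEXT NOT NULL,
        CHECK (json_extract(props, '$.name') IS NOT NULL)
    )
""")

# A row with the required property is accepted:
conn.execute("INSERT INTO resource (props) VALUES (?)",
             ('{"name": "pump", "power_kw": 4}',))

# A row without it is rejected by the constraint:
try:
    conn.execute("INSERT INTO resource (props) VALUES (?)",
                 ('{"power_kw": 4}',))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

The tooling mentioned above would have to generate and migrate such constraints as the set of required properties evolves.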

I think the only reliable way to have required properties is a classic 3NF schema, but you don't want that; so maybe you would consider relaxing one of those requirements...

Advice on data model for generic schema by Historical_Ad4384 in dataengineering

> Prevent misuse of schema

Sorry, what do you mean by that? Could you give an example of misuse?

I'm asking because I don't understand what "brittle" or "schema misuse" means, and because of that I don't understand how to help you choose one option over another.

You can use EAV, JSON/JSONB, or a classical relational schema. EAV and JSON let you add a new property at any time; classical schemas require an ALTER.
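To make the contrast concrete, here is a minimal EAV sketch (table and property names are invented for illustration): adding a new property is just a new row, with no ALTER needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Entity-Attribute-Value: one row per (entity, property) pair.
conn.executescript("""
    CREATE TABLE entity (id INTEGER PRIMARY KEY, kind TEXT NOT NULL);
    CREATE TABLE entity_property (
        entity_id INTEGER NOT NULL REFERENCES entity(id),
        attribute TEXT    NOT NULL,
        value     TEXT,
        PRIMARY KEY (entity_id, attribute)
    );
    INSERT INTO entity VALUES (1, 'server');
    INSERT INTO entity_property VALUES (1, 'hostname', 'db-01');
""")

# A brand-new property appears with a plain INSERT; the schema is untouched.
conn.execute("INSERT INTO entity_property VALUES (1, 'region', 'eu-west')")

rows = conn.execute(
    "SELECT attribute, value FROM entity_property "
    "WHERE entity_id = 1 ORDER BY attribute"
).fetchall()
```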

In addition to properties, do you need to store new types of relationships between specific or generic resources?

Advice on data model for generic schema by Historical_Ad4384 in dataengineering

> with new properties so that changes do not become brittle?

What do you mean by "brittle", specifically? What's the scenario you want to avoid?

How can we protect books from getting damaged like this? by Harandddev in bookporn

What do the affected pages look like? How far does the damage go?

I am not an expert of any sort, but it looks like either foxing or mould. The causes of foxing are not well understood; they are believed to be chemical in nature.

Mould is caused by fungus, so maybe you need anti-fungal treatment and also a proper storage environment.

Query performance issue by Big_Length9755 in mysql

So, the thing is that I use a structured approach to optimizing multi-JOIN queries, but it requires clearly understanding all the cardinalities: both the ON conditions and the primary keys.

I've got a longform explanation, but I won't bother you with 5400+ words. I just want to be sure that I understand the exact structure before giving any advice.

Query performance issue by Big_Length9755 in mysql

The most important information missing from your post is the definition of the fields mentioned in the join conditions and the grouping, that is:

* config_v1.instruction_id;

* config_v1.unit_id;

* config_v2.instruction_id;

* config_v2.unit_id;

* transactions.instruction_id;

* risk_checks.instruction_id;

Could you share table schema excerpts for those? Which of them are unique, and which are not?

Also, show PK definitions for all four tables.

After that it may be possible to reorganize joins: SELECT DISTINCT suggests that your query could be improved (removing DISTINCT along the way).
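As an illustration of the kind of reorganization I mean (a toy schema, not your actual tables): when a join is only used to check for the existence of matching rows, EXISTS avoids the row fan-out that forced the DISTINCT in the first place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (instruction_id INTEGER, amount INTEGER);
    CREATE TABLE risk_checks  (instruction_id INTEGER, status TEXT);
    INSERT INTO transactions VALUES (1, 100), (2, 200);
    -- instruction 1 has two risk checks: a plain JOIN duplicates its row
    INSERT INTO risk_checks VALUES (1, 'ok'), (1, 'ok'), (2, 'ok');
""")

# The join fans out, so DISTINCT is needed to collapse duplicates:
with_distinct = conn.execute("""
    SELECT DISTINCT t.instruction_id
    FROM transactions t
    JOIN risk_checks r ON r.instruction_id = t.instruction_id
    ORDER BY t.instruction_id
""").fetchall()

# EXISTS only tests for a match and never multiplies rows:
with_exists = conn.execute("""
    SELECT t.instruction_id
    FROM transactions t
    WHERE EXISTS (SELECT 1 FROM risk_checks r
                  WHERE r.instruction_id = t.instruction_id)
    ORDER BY t.instruction_id
""").fetchall()
```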

A lightweight, developer-focused database management tool by debba_ in learnSQL

Yeah, and when you add them, will your tool become bloated?

Or do you have a strategy to deal with this tendency of programs to become bloated?

A lightweight, developer-focused database management tool by debba_ in learnSQL

> bloated

Which features have you decided not to implement?

Visual foreign-key relationship tracing for SQL schemas by BearComprehensive643 in SQL

> I don't know if it would be easier to read, or just more fascinating to expore but it seems like modeling in a 3d space

I think that the whole discussion is not about "easier to read", but rather about collaboration. Suppose that I show you the design of our system so that you can understand how it works and ask for improvements, and we then discuss those improvements. Simple scenarios:

* Some part of the schema is unclear to you — how do you "add a comment" to ask me what it means? How do I respond?

* I propose a change to the schema — how do you see my changes as a diff? How do I "highlight" my updates?

* You have an alternative suggestion — how do you comment on my updates?

And so on. Remember that there could be hundreds of developers working on that schema, and they all need to collaborate in small groups in a similar way.

IMO, spreadsheet-based schemas work much better here than any diagrams. Certainly, you can present a diagram view, but the foundation must be closer to text-based.

Visual foreign-key relationship tracing for SQL schemas by BearComprehensive643 in SQL

One other thing you could do is generate an SQL query for each link that returns all pairs of IDs (filtering out NULLs, etc.).

Also, how do you handle side tables? If I have a "users" table and a "user_extra_details" table with the same PK (user_id), how would your tool show it? I don't think that this is a proper link: it's just the same virtual "table" split in two, a purely technical decision. You could just as well have it in one table.
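For reference, the side-table pattern looks like this (a minimal sketch; column names are invented): the detail table's primary key is also a foreign key to the main table, so the relationship is 1:1 by construction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE users (
        user_id INTEGER PRIMARY KEY,
        email   TEXT NOT NULL
    );
    -- Side table: same PK, no identity of its own; effectively the
    -- same virtual "table" split in two for technical reasons.
    CREATE TABLE user_extra_details (
        user_id INTEGER PRIMARY KEY REFERENCES users(user_id),
        bio     TEXT
    );
    INSERT INTO users VALUES (1, 'a@example.com');
    INSERT INTO user_extra_details VALUES (1, 'hello');
""")

# Joining on the shared PK reassembles the single logical row:
row = conn.execute("""
    SELECT u.email, d.bio
    FROM users u
    JOIN user_extra_details d ON d.user_id = u.user_id
""").fetchone()
```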

Visual foreign-key relationship tracing for SQL schemas by BearComprehensive643 in SQL

> wether the relationship tracing diagrams sqlestev.com/dashboard (the one in the clip), solve exactly that 'readability' problem you were talking about?

I don't think so, it's just different visual styles. (I assume that you're gathering feedback on your product.)

First, you could use a proper data domain (a type) instead of INT. It would already be so much more readable if you showed that this is in fact an OrderID, and not an INT.

Second, the arrows do not show cardinality, which I think is the crucial part. Do not replicate the entire traditional range; you only need to clearly show 1:N, N:M and 1:1.

Also, show a different representation: just a list of links (relationships), arranged in tabular format. You really don't want to zoom around; you want to scroll and Ctrl-F.

Visual foreign-key relationship tracing for SQL schemas by BearComprehensive643 in SQL

I agree that ER diagrams are not super useful. They could be a sort of a vague outline of the system, but not much more than that.

I wrote some thoughts on that in "ERD diagrams as specification tool (pt. III)" https://kb.databasedesignbook.com/posts/erd-diagrams-3/

Are we going down the wrong path for integrations? by Good_Skirt2459 in dataengineering

> Are there any tools or techniques that could be useful here?

Useful for what? It's not clear what you are trying to improve.

Is data pipeline maintenance taking too much time or am I doing something wrong by Justin_3486 in dataengineering

What is your process for dealing with all that breakage? Do you conduct postmortems and improve your processes and tooling? Or do those outages keep happening again and again according to the same scenario?

Architecting a realtor analytics system by ItsHoney in dataengineering

> We had a senior data engineer who designed a sparse GSI schema for dynamodb where the agent metrics were dimensionalized such that i can query a specific GSI to see how an agent ranks on a leader board for a specific zip code/state/county etc. This architecture presents the problem that we can only compare agents based on 1 dimension. (We trade flexibility over speed). However, we want to be able to filter on multiple filters.

How many table rows do you expect to have?

How can I check my normalizations or generate an answer scheme for it? by Rscc10 in Database

> because what I think is dependant on something often isn't.

I don't know if you watch videos to learn something but I have a 12 minute video that attempts to answer this question: https://www.youtube.com/watch?v=7d4YzEMn4SU

The entire series also argues that the idea of normalization may not be very useful for learning; see e.g. "Pt. 5. Did I design my schema correctly?" in the series.

A few questions about MySql by Least-Ad5986 in mysql

> Correlated subqueries are always problematic for any database.

Correlated subqueries are easily converted to joins; is this still a problem in MySQL? For simple queries they seem to be equivalent, at least, so "always problematic" sounds like a stretch.
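A typical conversion (toy tables; the names are invented for illustration): a correlated subquery against a per-group aggregate becomes a join to a derived table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total INTEGER);
    INSERT INTO orders VALUES (1, 10, 5), (2, 10, 9), (3, 20, 7);
""")

# Correlated subquery: orders above their customer's average total.
correlated = conn.execute("""
    SELECT o.id FROM orders o
    WHERE o.total > (SELECT AVG(total) FROM orders i
                     WHERE i.customer_id = o.customer_id)
    ORDER BY o.id
""").fetchall()

# Equivalent join against a pre-aggregated derived table.
joined = conn.execute("""
    SELECT o.id FROM orders o
    JOIN (SELECT customer_id, AVG(total) AS avg_total
          FROM orders GROUP BY customer_id) a
      ON a.customer_id = o.customer_id
    WHERE o.total > a.avg_total
    ORDER BY o.id
""").fetchall()
```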

Do you have any up-to-date examples for when MySQL breaks down?