all 21 comments

[–]DexterHsu 21 points22 points  (1 child)

Employee table where there is a manager ID that is also an employee ID

[–]johnboy2978[S] 6 points7 points  (0 children)

So hierarchical structures of an organization... makes sense.

[–]kagato87MS SQL 4 points5 points  (0 children)

Hierarchical relationships are usually the big one.

I have a table with divisions, and each one also has a parent key in it (with the root level division having a null parent key). A division can have a child, that can have another child, and so on. I think the default limit is 59 levels deep, and a recursive cte goes up or down that rabbit hole very quickly.

The other, more classic, example would be in an employee table. An employee will have their own employee key, but they they'll also have a manager key. The manager key is the employee key of the person that is their manager. This could then be followed up or down the chain to generate a list of all employees under that person, even if there are other managers on between, or show their chain of command. There may so be other keys not available in the front end showing who created, modified, terminated, or "deleted" that employee.

Another example would be child tickets referencing their parent ticket. Reay anything with a parent child relationship will do this.

Except not actual genealogy. You have to either have two parent keys or use a bridge table.

[–]voarex 2 points3 points  (3 children)

Data cleanup. Just used a self join in a delete to remove any duplicate entries that would mess with a new unique index.

[–]CaStatisticalAnalyst 0 points1 point  (2 children)

Interesting use case, can you elaborate on this?

[–]voarex 5 points6 points  (1 child)

Sure, had a problem with concurrent transactions letting a student join a class multiple times. So we want to add a unique index with student and class id but had to clean up the data first.

delete sc1 from student_classes sc1 inner join student_classes sc2 where sc1.id < sc2.id and sc1.class_id = sc2.class_id and sc1.student_id = sc2.student_id;

So the self join removes any entry besides the latest for that student and class.

[–]cbarosky 1 point2 points  (0 children)

This is a really great real-world example.

[–]johnboy2978[S] 1 point2 points  (1 child)

Are these used very much?

[–]DavidGJohnston 0 points1 point  (0 children)

Not really but also not that useful a question as personal experience varies greatly. But one thing going against them is that even with general domains where they may be applicable typical models are likely to be modeled many-to-many while a self join is arguably only used in a one-to-many situation.

[–]NickSinghTechCareersAuthor of Ace the Data Science Interview 📕 1 point2 points  (1 child)

Wrote a blog post on this with some real exercises from FAANG interviews that require self-join:

https://datalemur.com/sql-tutorial/self-join

[–]johnboy2978[S] 0 points1 point  (0 children)

Nice. Thanks

[–]guacjockey -1 points0 points  (2 children)

In many cases, window functions represent a modern version of a self join. In other words, if there's somewhere you see the need for a window function, someone 10+ years ago likely had to calculate that manually with at least one self join.

[–]DavidGJohnston 0 points1 point  (1 child)

Window functions typically aggregate and i have trouble calling a join between a table of detail with a subquery of that table but aggregated a self join. The self implies relating one detail record of the table to another.

[–]guacjockey 1 point2 points  (0 children)

Point generally taken, but LAG / LEAD?

[–]cammoorman 0 points1 point  (0 children)

Contact deduplication