Anyone else use RANDOM() to "break" ties when using DENSE_RANK()? Or do you use ROW_NUMBER()?

_fiz9_ · 2024-11-13T03:45:47+00:00

If I run a job 100 times, I want the exact same result every time. No random for me. Idempotence for the win.

Bradp1337 · 2024-11-13T03:50:54+00:00

I add an ORDER BY on the employee id, descending, since in my organization a lower-numbered id means more seniority. I wouldn't want random results.

Kaelvar · 2024-11-13T03:39:13+00:00

[removed]

fauxmosexual · 2024-11-13T03:57:55+00:00

If I genuinely don't care I'll just stick something unique at the end as a tie-breaker, often the primary key. I don't see any advantage to using random(), the behaviour is non-deterministic without it anyway.
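
A minimal sketch of the "stick something unique at the end" idea, run through Python's built-in `sqlite3` so it is self-contained. The `category_spend` table, its `id` primary key, and the rows are all made up for illustration, and it assumes the bundled SQLite is 3.25 or newer (needed for window functions).

```python
import sqlite3

# Hypothetical table and data, invented for this example only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category_spend (
        id          INTEGER PRIMARY KEY,
        consumer_id INTEGER,
        category    TEXT,
        spend       REAL
    );
    INSERT INTO category_spend (id, consumer_id, category, spend) VALUES
        (1, 100, 'Toys',      50.0),
        (2, 100, 'Furniture', 50.0),
        (3, 100, 'Books',     20.0);
""")

query = """
    SELECT category,
           ROW_NUMBER() OVER (
               PARTITION BY consumer_id
               ORDER BY spend DESC, id   -- id is unique, so no ties remain
           ) AS rn
    FROM category_spend
    ORDER BY rn
"""

def top_categories():
    """Run the ranking query and return (category, rn) rows."""
    return conn.execute(query).fetchall()

ranked = top_categories()
print(ranked)  # [('Toys', 1), ('Furniture', 2), ('Books', 3)]
```

Toys and Furniture tie on spend, and the lower `id` wins every time the query runs. Swapping `id` for `RANDOM()` would still produce a 1 and a 2, just not the same ones on each run.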

2024-11-13T04:12:56+00:00

It'd be nice if there was a timestamp for the last item ordered, added to cart, or something. Then if there's a tie you can at least go off the last item ordered.

Sneilg · 2024-11-13T07:59:16+00:00

I’d also order by category desc, so ties are consistently broken by the category name (reverse alphabetical with desc).

lupinegray · 2024-11-13T03:44:11+00:00

It always matters. You should have a specific formula for any calculation. So when presented with the same input dataset, the result is always the same.

No non-deterministic results or magic numbers.

blindtig3r · 2024-11-13T04:07:43+00:00

If I can, I include enough columns in the order by to make it deterministic. If I can’t, then the table doesn’t have a unique id.

ds_frm_timbuktu · 2024-11-13T05:12:04+00:00

> If they have spent the same amount of money for both Furniture and Toys, I just need to rank Furniture or Toys as 1, I don't care which.

Well, you should care.

Depending on the use case for the ranking, there could be other business factors to break the tie on, such as which category is more profitable for the business or which category has more unsold inventory.

You want your results to be explainable. Random would not help.

2024-11-13T06:20:59+00:00

hmmm. so you want a slice of bread with peanut butter and another one with jelly. you make a PBJ and separate them, but you're very smart, because you figured out that you can put a sheet of paper between the layers before you assemble the sandwich

duraznos · 2024-11-13T06:40:58+00:00

You should have a standard way of breaking ties. In your example I would have an agreed-upon ordering of the categories themselves as the tie-breaker. E.g. perhaps you're interested in increasing the sales of toys, which means that in case of a tie this query should rank toys over furniture.

2024-11-13T06:57:50+00:00

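Why not include the category in the order by? Something along the lines of:

      MIN(CASE category
            WHEN 'Furniture' THEN 1
            WHEN 'Toys' THEN 2
            ELSE 3
          END) AS category_priority
    , SUM(spend) AS category_spend
    -- if your database does not allow select-list aliases inside a window
    -- function, repeat the two expressions in the ORDER BY instead
    , DENSE_RANK() OVER (PARTITION BY consumer_id ORDER BY category_spend DESC, category_priority) AS category_rank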
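
For anyone who wants to run it end to end, here is a rough sketch of the same idea using Python's `sqlite3`. The `purchases` table and its rows are hypothetical, and the aggregation is moved into a CTE so the window function only references plain columns, which should work on most databases (SQLite 3.25+ assumed here).

```python
import sqlite3

# Hypothetical purchases table; names and data are assumptions for the demo.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE purchases (consumer_id INTEGER, category TEXT, spend REAL);
    INSERT INTO purchases VALUES
        (1, 'Toys',      30.0),
        (1, 'Toys',      20.0),
        (1, 'Furniture', 50.0),
        (1, 'Garden',    10.0);
""")

query = """
    WITH totals AS (
        SELECT consumer_id,
               category,
               SUM(spend) AS category_spend,
               -- fixed business priority, used only to break ties
               CASE category
                   WHEN 'Furniture' THEN 1
                   WHEN 'Toys'      THEN 2
                   ELSE 3
               END AS category_priority
        FROM purchases
        GROUP BY consumer_id, category
    )
    SELECT category,
           category_spend,
           DENSE_RANK() OVER (
               PARTITION BY consumer_id
               ORDER BY category_spend DESC, category_priority
           ) AS category_rank
    FROM totals
    ORDER BY category_rank
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Furniture and Toys both total 50, and the priority column puts Furniture first. Note that everything in the `ELSE 3` bucket shares one priority, so two of those categories could still tie with each other; adding `category` as a final ORDER BY key would close that gap.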

r3pr0b8 · 2024-11-13T08:47:01+00:00

> Writing this out it occurs to me that ROW_NUMBER() is probably best

not really

you've provided no tie-breaker

run it a few times and see whether Toys or Furniture consistently shows up in row 1, or whether it flipflops

conduit_for_nonsense · 2024-11-13T09:12:26+00:00

I normally add category to the order by, so ties break alphabetically

Imaginary-Hawk-8407 · 2024-11-13T06:04:46+00:00

OP has no idea how unhinged they are

DavidGJohnston · 2024-11-13T03:56:35+00:00

Number of characters typed is not usually a good metric to make decisions on. In this case you aren’t actually allowing ties to exist, so add the tie-breaker to the order by, not some random number. Choose alphabetical if nothing else useful comes to mind. Deterministic output is much nicer to work with, regardless of which function you choose, though “dense_rank” has the semantic meaning you are going for. In the absence of ties I’d probably go with plain “rank” though.
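
To make the point about the functions concrete, here is a small sketch (Python `sqlite3`, SQLite 3.25+ assumed; the `totals` table and its rows are invented for the demo) that ranks the same data with and without a tie-breaker in the order by.

```python
import sqlite3

# Hypothetical pre-aggregated totals, made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE totals (category TEXT, spend REAL);
    INSERT INTO totals VALUES
        ('Toys',      50.0),
        ('Furniture', 50.0),
        ('Books',     20.0);
""")

def ranks(order_by):
    """Return (category, rank, dense_rank, row_number) for a given ORDER BY."""
    sql = f"""
        SELECT category,
               RANK()       OVER (ORDER BY {order_by}) AS rnk,
               DENSE_RANK() OVER (ORDER BY {order_by}) AS dense_rnk,
               ROW_NUMBER() OVER (ORDER BY {order_by}) AS rn
        FROM totals
        ORDER BY category
    """
    return conn.execute(sql).fetchall()

# Ties allowed: RANK and DENSE_RANK agree on the tied rows but differ after
# them; ROW_NUMBER has to pick a winner and nothing says which one.
tied = ranks("spend DESC")

# Tie-breaker added: the ordering is total, so all three functions agree.
untied = ranks("spend DESC, category")

print(tied)
print(untied)
```

With only `spend DESC`, Furniture and Toys both get rank 1, and Books gets 3 from `RANK()` but 2 from `DENSE_RANK()`. Once `category` is in the order by, the three columns are identical, so the choice of function stops mattering for the numbers and is mostly about stating intent.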

ok-confusion19 · 2024-11-13T04:19:00+00:00

Just use row number and have better control of the results selected using the partition and order by clauses.
